12.5 System-level Design Space Exploration


Date: Thursday 27 March 2014
Time: 16:00 - 17:30
Location / Room: Konferenz 3

Chair:
Frederic Petrot, TIMA, FR

Co-Chair:
Luciano Lavagno, Politecnico di Torino, IT

The session discusses novel aspects and objectives in the exploration of embedded architectures. Papers cover topics including integration of diagnosis, approximate circuit design, custom instruction optimization, and scheduling issues.

Time  Label  Presentation Title
Authors
16:00  12.5.1  NON-INTRUSIVE INTEGRATION OF ADVANCED DIAGNOSIS FEATURES IN AUTOMOTIVE E/E-ARCHITECTURES
Speakers:
Ulrich Abelein¹, Alejandro Cook², Piet Engelke³, Michael Glaß⁴, Felix Reimann⁴, Laura Rodríguez Gómez², Thomas Russ⁴, Jürgen Teich⁴, Dominik Ull² and Hans-Joachim Wunderlich²
¹AUDI AG, Ingolstadt, DE; ²University of Stuttgart, DE; ³Infineon Technologies AG, DE; ⁴University of Erlangen-Nuremberg, DE
Abstract
With ever more complex automotive systems, the current approach of using functional tests to locate faulty components results in very long analysis procedures and poor diagnostic accuracy. Built-In Self-Test (BIST) offers a promising alternative to collect structural diagnostic information during E/E-architecture test. However, as the automotive industry is quite cost-driven, structural diagnosis shall not deteriorate traditional design objectives. With this goal in mind, the work at hand proposes a design space exploration to integrate structural diagnostic capabilities into an E/E-architecture design. The proposed integration is performed non-intrusively, i.e., the addition and execution of tests (a) does not affect any functional applications and (b) does not require any costly changes in the communication schedules.
16:30  12.5.2  ABACUS: A TECHNIQUE FOR AUTOMATED BEHAVIORAL SYNTHESIS OF APPROXIMATE COMPUTING CIRCUITS
Speakers:
Kumud Nepal, Yueting Li, R. Iris Bahar and Sherief Reda, Brown University, Providence, Rhode Island, US
Abstract
Many classes of applications, especially in the domains of signal and image processing, computer graphics, computer vision, and machine learning, are inherently tolerant to inaccuracies in their underlying computations. This tolerance can be exploited to design approximate circuits that perform within acceptable accuracies but have much lower power consumption and smaller area footprints (and often better run times) than their exact counterparts. In this paper, we propose a new class of automated synthesis methods for generating approximate circuits directly from behavioral-level descriptions. In contrast to previous methods that operate at the Boolean level or use custom modifications, our automated behavioral synthesis method enables a wider range of possible approximations and can operate on arbitrary designs. Our method first creates an abstract synthesis tree (AST) from the input behavioral description, and then applies variant operators to the AST using an iterative stochastic greedy approach to identify the optimal inexact designs in an efficient way. Our method is able to identify the optimal designs that represent the Pareto frontier trade-off between accuracy and power consumption. Our methodology is developed into a tool we call ABACUS, which we integrate with a standard ASIC experimental flow based on industrial tools. We validate our methods on three realistic Verilog-based benchmarks from three different domains: signal processing, computer vision and machine learning. Our tool automatically discovers optimal designs, providing area and power savings of up to 50% while maintaining good accuracy.
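The Pareto frontier mentioned in this abstract can be illustrated with a small sketch. This is not the ABACUS tool or its search procedure; it only shows how non-dominated designs would be filtered from a set of candidates once each has an error and a power figure. The design names and numbers are made up for illustration.

```python
from typing import List, Tuple

# (name, error, power): hypothetical candidate designs; lower is better for both.
Design = Tuple[str, float, float]


def pareto_front(designs: List[Design]) -> List[Design]:
    """Return the designs that no other design dominates in (error, power)."""
    front = []
    for name, err, pwr in designs:
        # A design is dominated if another one is no worse in both metrics
        # and strictly better in at least one of them.
        dominated = any(
            (e <= err and p <= pwr) and (e < err or p < pwr)
            for _, e, p in designs
        )
        if not dominated:
            front.append((name, err, pwr))
    return sorted(front, key=lambda d: d[1])


# Illustrative values only, not results from the paper.
candidates = [
    ("exact", 0.00, 10.0),
    ("v1", 0.02, 7.0),
    ("v2", 0.05, 7.5),  # dominated by v1
    ("v3", 0.10, 5.0),
    ("v4", 0.10, 6.0),  # dominated by v3
]
front = pareto_front(candidates)
print([name for name, _, _ in front])
```

Each design left in `front` represents a different accuracy/power trade-off; picking one is a decision for the designer.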
17:00  12.5.3  AUTOMATIC GENERATION OF CUSTOM SIMD INSTRUCTIONS FOR SUPERWORD LEVEL PARALLELISM
Speakers:
Taemin Kim and Yatin Hoskote, Intel/Intel Labs, US
Abstract
Application-specific instruction-set processors (ASIPs) have drawn significant attention from the System-on-a-Chip (SoC) community due to their fine-grained flexibility and customizability. To maximize the benefit of ASIPs, automatic instruction set extension (ISE) is required. Over the past decade, there has been a plethora of research on automatic ISE for custom scalar instructions. However, given the increasing use of SIMD instructions to exploit data-level parallelism (DLP), which exists both across loop iterations and within a basic block (the latter called Superword Level Parallelism, SLP), automatic generation of custom SIMD instructions is an inevitable direction for automatic ISE. In this paper, we propose an algorithm that automatically generates custom SIMD instructions from a set of custom scalar instructions to exploit SLP. We have demonstrated 52.4% and 30.8% performance improvement on average over the base instruction set and over additional custom scalar instructions, respectively.
17:15  12.5.4  SYSTEM-LEVEL SCHEDULING OF REAL-TIME STREAMING APPLICATIONS USING A SEMI-PARTITIONED APPROACH
Speakers:
Emanuele Cannella, Mohamed Bamakhrama and Todor Stefanov, Leiden University, NL
Abstract
Modern multiprocessor streaming systems have hard real-time constraints that must always be met to ensure correct functionality. At the same time, these streaming systems must be designed to use the minimum required amount of resources (such as processors and memory). In order to meet such constraints, using scheduling algorithms from the classical real-time scheduling theory represents an attractive solution approach. These algorithms enable: (1) providing timing guarantees to the applications running on the system, and (2) analytically deriving the minimum number of processors required to schedule the applications. So far, designers in the embedded systems community have focused on global and partitioned scheduling algorithms. However, recently, a new hybrid class of scheduling algorithms has been proposed. In this work, we investigate the applicability of a sub-class of these hybrid algorithms, called semi-partitioned algorithms, to applications modeled as Cyclo-Static Dataflow (CSDF) graphs. The contribution of this paper is twofold. First, we devise an approach that enables semi-partitioned scheduling algorithms, even soft real-time ones, to be applied to CSDF graphs while providing hard real-time guarantees at the input/output interfaces with the external environment. Second, we focus on an existing soft real-time semi-partitioned approach, for which we propose an allocation heuristic, called FFD-SP. The proposed heuristic reduces the minimum number of processors required to schedule the applications compared to a pure partitioned scheduling algorithm, while trying to minimize the buffer size and latency increases incurred by the soft real-time approach.
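To make the partitioned baseline concrete, here is a minimal sketch of plain first-fit decreasing (FFD) allocation of tasks to processors by utilization. This is not the FFD-SP heuristic from the paper, whose details are not given in the abstract. It assumes each processor can host tasks up to a total utilization of 1.0 (as under EDF with implicit deadlines), and the task utilizations are hypothetical.

```python
import math
from typing import List


def first_fit_decreasing(utilizations: List[float],
                         capacity: float = 1.0) -> List[List[float]]:
    """Assign each task to the first processor with room, largest task first."""
    processors: List[List[float]] = []  # tasks placed on each processor
    loads: List[float] = []             # total utilization per processor
    for u in sorted(utilizations, reverse=True):
        if u > capacity:
            raise ValueError("a single task exceeds the processor capacity")
        for i, load in enumerate(loads):
            if load + u <= capacity + 1e-9:  # small tolerance for float rounding
                processors[i].append(u)
                loads[i] += u
                break
        else:
            # No existing processor has room: open a new one.
            processors.append([u])
            loads.append(u)
    return processors


# Hypothetical task set: three tasks, each using 60% of a processor.
tasks = [0.6, 0.6, 0.6]
partitioned = first_fit_decreasing(tasks)
lower_bound = math.ceil(sum(tasks))  # processors needed if tasks could be split
print(len(partitioned), lower_bound)
```

In this example no two tasks fit on the same processor, so pure partitioning needs three processors although the total utilization is only 1.8. Semi-partitioned approaches aim to close that gap by letting some tasks migrate between processors.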
17:30  End of session