12.5 System-level Design Space Exploration

Date: Thursday 27 March 2014
Time: 16:00 - 17:30
Location / Room: Konferenz 3

Chair:
Frederic Petrot, TIMA, FR

Co-Chair:
Luciano Lavagno, Politecnico di Torino, IT

The sessions discusses novel aspects and objectives in the exploration of embedded architectures. Papers cover topics including integration of diagnosis, approximate circuit design, custom instruction optimization, and scheduling issues.

Time	Label	Presentation Title Authors
16:00	12.5.1	NON-INTRUSIVE INTEGRATION OF ADVANCED DIAGNOSIS FEATURES IN AUTOMOTIVE E/E-ARCHITECTURES Speakers: Ulrich Abelein¹, Alejandro Cook², Piet Engelke³, Michael Glaß⁴, Felix Reimann⁴, Laura Rodríguez Gómez², Thomas Russ⁴, Jürgen Teich⁴, Dominik Ull² and Hans-Joachim Wunderlich² ¹AUDI AG, Ingolstadt, DE; ²University of Stuttgart, DE; ³Infineon Technologies AG, DE; ⁴University of Erlangen-Nuremberg, DE Abstract With ever more complex automotive systems, the current approach of using functional tests to locate faulty components results in very long analysis procedures and poor diagnostic accuracy. Built-In Self-Test (BIST) offers a promising alternative to collect structural diagnostic information during E/E-architecture test. However, as the automotive industry is quite cost-driven, structural diagnosis shall not deteriorate traditional design objectives. With this goal in mind, the work at hand proposes a design space exploration to integrate structural diagnostic capabilities into an E/E-architecture design. The proposed integration is performed non-intrusively, i.e., the addition and execution of tests (a) does not affect any functional applications and (b) does not require any costly changes in the communication schedules.
16:30	12.5.2	ABACUS: A TECHNIQUE FOR AUTOMATED BEHAVIORAL SYNTHESIS OF APPROXIMATE COMPUTING CIRCUITS Speakers: Kumud Nepal, Yueting Li, R. Iris Bahar and Sherief Reda, Brown University, Providence, Rhode Island, US Abstract Many classes of applications, especially in the domains of signal and image processing, computer graphics, computer vision, and machine learning, are inherently tolerant to inaccuracies in their underlying computations. This tolerance can be exploited to design approximate circuits that perform within acceptable accuracies but have much lower power consumption and smaller area footprints (and often better run times) than their exact counterparts. In this paper, we propose a new class of automated synthesis methods for generating approximate circuits directly from behavioral-level descriptions. In contrast to previous methods that operate at the Boolean level or use custom modifications, our automated behavioral synthesis method enables a wider range of possible approximations and can operate on arbitrary designs. Our method first creates an abstract synthesis tree (AST) from the input behavioral description, and then applies variant operators to the AST using an iterative stochastic greedy approach to identify the optimal inexact designs in an efficient way. Our method is able to identify the optimal designs that represent the Pareto frontier trade-off between accuracy and power consumption. Our methodology is developed into a tool we call ABACUS, which we integrate with a standard ASIC experimental flow based on industrial tools. We validate our methods on three realistic Verilog-based benchmarks from three different domains --- signal processing, computer vision and machine learning. Our tool automatically discovers optimal designs, providing area and power savings of up to 50% while maintaining good accuracy.
17:00	12.5.3	AUTOMATIC GENERATION OF CUSTOM SIMD INSTRUCTIONS FOR SUPERWORD LEVEL PARALLELISM Speakers: Taemin Kim and Yatin Hoskote, Intel/Intel Labs, US Abstract Application specific instruction-set processors (ASIPs) have drawn significant attention from System-on-a-Chip (SoC) community due to its capability of fine grain flexibility and customizability. In order to maximize the benefit of ASIP, automatic instruction set extension (ISE) is required. In the past decade, there have been plethora researches on automatic ISE for custom scalar instruction. However, due to increasing usage of SIMD instructions to exploit data level parallelism (DLP) that exists both across loop iterations and within a basic block called Superword Level Parallelism (SLP), automatic generation of custom SIMD instructions is inevitable direction of automatic ISE. In this paper, we propose an algorithm that automatically generates custom SIMD instructions from a set of custom scalar instructions to exploit SLP. We have demonstrated 52.4% and 30.8% performance improvement on average over base instruction set and additional custom scalar instructions, respectively.
17:15	12.5.4	SYSTEM-LEVEL SCHEDULING OF REAL-TIME STREAMING APPLICATIONS USING A SEMI-PARTITIONED APPROACH Speakers: Emanuele Cannella, Mohamed Bamakhrama and Todor Stefanov, Leiden University, NL Abstract Modern multiprocessor streaming systems have hard real-time constraints that must be always met to ensure correct functionality. At the same time, these streaming systems must be designed to use the minimum required amount of resources (such as processors and memory). In order to meet such constraints, using scheduling algorithms from the classical real-time scheduling theory represents an attractive solution approach. These algorithms enable: (1) providing timing guarantees to the applications running on the system, and (2) deriving analytically the minimum number of processors required to schedule the applications. So far, designers in the embedded systems community have focused on global and partitioned scheduling algorithms. However, recently, a new hybrid class of scheduling algorithms has been proposed. In this work, we investigate the applicability of a sub-class of these hybrid algorithms, called semi-partitioned algorithms, to applications modeled as Cyclo-Static Dataflow (CSDF) graphs. The contribution of this paper is two fold. First, we devise an approach that enables semi-partitioned scheduling algorithms, even soft real-time ones, to be applied to CSDF graphs while providing hard real-time guarantees at the input/output interfaces with the external environment. Second, we focus on an existing soft real-time semi-partitioned approach, for which we propose an allocation heuristic, called FFD-SP. The proposed heuristic reduces the minimum number of processors required to schedule the applications compared to a pure partitioned scheduling algorithm, while trying to minimize the buffer size and latency increases incurred by the soft real-time approach.
17:30		End of session

< Return to last page

Submissions

12.5 System-level Design Space Exploration