10.7 Advances in Synthesis


Date: Thursday 27 March 2014
Time: 11:00 - 12:30
Location / Room: Konferenz 5

Chair:
John Hayes, University of Michigan, US

Co-Chair:
Kim Taemin, Intel Labs, US

Papers in this session address synthesis algorithms and tools at different levels, targeting power, area and delay minimization.

Time Label Presentation Title
Authors
11:00 10.7.1 PROVABLY MINIMAL ENERGY USING COORDINATED DVS AND POWER GATING
Speakers:
Nathaniel Conos, Saro Meguerdichian, Foad Dabiri and Miodrag Potkonjak, UCLA, US
Abstract
Both energy and execution speed can be greatly impacted by clock and power gating, nonlinear voltage scaling, and leakage power. We address the problem of coordinated power gating and dynamic voltage scaling (DVS) to minimize the overall energy consumption of an application under user-specified timing constraints. We prove that a solution provided by our convex programming formulation that uses at most two versions of hardware, where each version uses its own constant voltages, is optimal. Comprehensive evaluation of the new approach demonstrates energy improvements over traditional DVS and over combined DVS and power gating techniques by factors of 1.44X-2.97X and 1.44X-2.82X, respectively.
11:30 10.7.2 A TREE ARBITER CELL FOR HIGH SPEED RESOURCE SHARING IN ASYNCHRONOUS ENVIRONMENTS
Speakers:
Syed Rameez Naqvi and Andreas Steininger, Vienna University of Technology, AT
Abstract
We present a novel tree arbiter cell that allows pipelined processing of asynchronous requests. In this way it can achieve significantly lower delay in the critical case of frequent requests coming from different clients. We elaborate the necessary extension to facilitate a cascaded use of this cell in a tree-like fashion, and we show by theoretical analysis that in this configuration our cell provides better fairness than the standard approach. We implement our approach and quantitatively compare its performance properties with related work in a gate-level simulation. In our sample asynchronous Networks-on-Chip application, our new cell increases the throughput of three different designs available in the literature by approximately 61.28%, 69.24%, and 186.85%, respectively.
12:00 10.7.3 AN EFFICIENT MANIPULATION PACKAGE FOR BICONDITIONAL BINARY DECISION DIAGRAMS
Speakers:
Luca Amaru, Pierre-Emmanuel Gaillardon and Giovanni De Micheli, EPFL, CH
Abstract
Biconditional Binary Decision Diagrams (BBDDs) are a novel class of binary decision diagrams where the branching condition, and its associated logic expansion, is biconditional on two variables. Reduced and ordered BBDDs are remarkably compact and unique for a given Boolean function. In order to exploit BBDDs in Electronic Design Automation (EDA) applications, efficient manipulation algorithms must be developed and integrated in a software package. In this paper, we present the theory for efficient BBDD manipulation and its practical software implementation. The key features of the proposed approach are (i) strong canonical form pre-conditioning of stored BBDD nodes, (ii) recursive formulation of Boolean operations in terms of biconditional expansions, (iii) performance-oriented memory management and (iv) dedicated BBDD re-ordering techniques. Experimental results show that the developed BBDD package achieves an average node count reduction of 19.48% and a speed-up factor of 1.63x with respect to a state-of-the-art decision diagram manipulation package. Employed in the synthesis of datapath circuits, the BBDD manipulation package is capable of advantageously restructuring arithmetic operations, producing 11.02% smaller and 32.29% faster circuits as compared to a commercial synthesis flow.
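As an illustration of the biconditional expansion mentioned in item (ii), the sketch below checks by brute force that a function can be rebuilt from two cofactors selected by whether two variables differ. It is not the authors' package. It assumes the expansion has the form f = (v XOR w)·f(v = NOT w) + (v XNOR w)·f(v = w); the helper name `biconditional_expand` and the `majority` example are hypothetical.

```python
from itertools import product

def biconditional_expand(f, i, j):
    """Rebuild f from the two cofactors of a biconditional expansion on
    input i (primary variable v) and input j (secondary variable w).
    Assumed form: f = (v XOR w) * f[v := NOT w] + (v XNOR w) * f[v := w]."""
    def g(*x):
        x_neq = list(x)
        x_neq[i] = 1 - x[j]          # cofactor evaluated with v = NOT w
        x_eq = list(x)
        x_eq[i] = x[j]               # cofactor evaluated with v = w
        differ = x[i] ^ x[j]         # the biconditional branching condition
        return (differ & f(*x_neq)) | ((1 - differ) & f(*x_eq))
    return g

def majority(a, b, c):
    # Hypothetical example function: 3-input majority.
    return (a & b) | (a & c) | (b & c)

expanded = biconditional_expand(majority, 0, 1)
same = all(majority(*x) == expanded(*x) for x in product((0, 1), repeat=3))
print(same)
```

The check passes for any Boolean function, because exactly one of the two branches is active for each input and the active cofactor substitutes the value that v already has.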
12:15 10.7.4 SYNTHESIS ALGORITHM OF PARALLEL INDEX GENERATION UNITS
Speaker:
Yusuke Matsunaga, Kyushu University, JP
Abstract
The index generation function is a multi-valued logic function which checks whether the given input vector is registered, and returns its index value if the vector is registered. If the latency of the operation is critical, dedicated hardware is used for implementing the index generation functions. This paper proposes a method for implementing the index generation functions using parallel index generation units. A novel and efficient algorithm called `conflict free partitioning' is proposed to synthesize parallel index generation units. Experimental results show the proposed method outperforms other existing methods.
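A small software model may help make the index generation function concrete. The abstract does not describe the partitioning algorithm, so the greedy placement below is only a stand-in and not the paper's `conflict free partitioning'. Further assumptions: unregistered vectors return 0, each unit is addressed by a fixed subset of input bits, and the unit outputs are combined with a bitwise OR. The names `build_units`, `lookup`, and `bit_subsets` are hypothetical.

```python
def build_units(registered, bit_subsets):
    """Place each registered vector in the first unit whose address slot,
    formed from that unit's bit subset, is still free (greedy stand-in)."""
    units = [dict() for _ in bit_subsets]
    for index, vec in enumerate(registered, start=1):   # indices start at 1
        for table, bits in zip(units, bit_subsets):
            addr = tuple(vec[b] for b in bits)
            if addr not in table:
                table[addr] = (vec, index)
                break
        else:
            raise ValueError(f"no conflict-free slot for {vec}")
    return units

def lookup(units, bit_subsets, vec):
    """Query all units; OR the results. Returns 0 if vec is not registered."""
    result = 0
    for table, bits in zip(units, bit_subsets):
        stored = table.get(tuple(vec[b] for b in bits))
        if stored is not None and stored[0] == vec:      # full-vector compare
            result |= stored[1]
    return result

registered = [(0, 0, 1, 0), (0, 1, 1, 0), (1, 0, 1, 1), (0, 0, 0, 1)]
bit_subsets = [(0, 1), (2, 3)]        # unit 0 reads bits 0-1, unit 1 bits 2-3
units = build_units(registered, bit_subsets)
print(lookup(units, bit_subsets, (0, 0, 0, 1)))   # registered as index 4
print(lookup(units, bit_subsets, (1, 1, 1, 1)))   # not registered
```

In this model each registered vector lives in exactly one unit, so at most one unit produces a nonzero index and the OR simply passes it through.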
12:30 IP5-7, 104 AUTOMATING DATA REUSE IN HIGH-LEVEL SYNTHESIS
Speakers:
Wim Meeus1 and Dirk Stroobandt2
1Imec and Ghent University, BE; 2Ghent University, BE
Abstract
Current High-Level Synthesis (HLS) tools perform excellently for the synthesis of computation kernels, but they often don't optimize memory bandwidth. As memory access is a bottleneck in many algorithms, the performance of the generated circuit will benefit substantially from memory access optimization. In this paper we present an automated method and a toolchain to detect reuse of array data in loop nests and to build hardware that exploits this data reuse. This saves memory bandwidth and improves circuit performance. We make use of the polyhedral representation of the source program, which makes our method computationally easy. Our software complements the existing HLS flows. Starting from a loop nest written in C, our tool generates a reuse buffer and a loop controller, and preprocesses the loop body for synthesis with an existing HLS tool. Our automated tool produces designs from unoptimized source code that are as efficient as those generated by a commercial HLS tool from manually-optimized source code.
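The effect of a reuse buffer on memory traffic can be seen in a toy software model. The sketch below is not the authors' toolchain and involves no polyhedral analysis; it only assumes a hypothetical one-dimensional sliding-window loop and counts array reads with and without a small buffer. The function names are illustrative.

```python
from collections import deque

def window_sum_naive(a, w):
    """Re-read every element of each window from the array."""
    reads, out = 0, []
    for i in range(len(a) - w + 1):
        s = 0
        for j in range(w):
            s += a[i + j]
            reads += 1                 # one memory access per operand
        out.append(s)
    return out, reads

def window_sum_reuse(a, w):
    """Read each element once and keep the last w values in a reuse buffer."""
    buf, reads, out = deque(maxlen=w), 0, []
    for value in a:
        buf.append(value)              # oldest value is dropped automatically
        reads += 1                     # single memory access per element
        if len(buf) == w:
            out.append(sum(buf))
    return out, reads

data = list(range(8))
naive_out, naive_reads = window_sum_naive(data, 3)
reuse_out, reuse_reads = window_sum_reuse(data, 3)
print(naive_out == reuse_out, naive_reads, reuse_reads)
```

For this input the naive loop performs 18 array reads and the buffered loop 8, with identical results.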
12:31 IP5-8, 12 A UNIVERSAL SYMMETRY DETECTION ALGORITHM
Speaker:
Peter Maurer, Dept. of Computer Sci., Baylor University, US
Abstract
Research on symmetry detection focuses on identifying and detecting new types of symmetry. We present an algorithm that is capable of detecting any type of permutation-based symmetry, including many types for which there are no existing algorithms. General symmetry detection is library-based, but symmetries that can be parameterized (i.e., total, partial, rotational, and dihedral symmetry) can be detected without using libraries. In many cases the algorithm is faster than existing techniques. Furthermore, it is simpler than most existing techniques, and can easily be incorporated into existing software.
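To make "permutation-based symmetry" concrete, the sketch below tests whether a Boolean function is unchanged when its inputs are permuted. This is an exhaustive truth-table check written for illustration only, not the algorithm presented in the paper; the helper name `is_symmetric_under` and the example function are hypothetical.

```python
from itertools import product

def is_symmetric_under(f, n, perm):
    """Return True if f(x) == f(x permuted by perm) for all 2**n inputs.
    perm[k] gives the original input position fed to argument k."""
    return all(
        f(*x) == f(*(x[p] for p in perm))
        for x in product((0, 1), repeat=n)
    )

def example(a, b, c):
    # Hypothetical function: symmetric in a and b, but not in all three inputs.
    return (a & b) | c

swap_ab = is_symmetric_under(example, 3, (1, 0, 2))    # partial symmetry
rotate = is_symmetric_under(example, 3, (1, 2, 0))     # rotational symmetry
print(swap_ab, rotate)
```

The cost grows as 2^n per permutation, which is one reason dedicated detection algorithms are of interest.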
12:32 IP5-9, 525 OPTIMIZATION OF DESIGN COMPLEXITY IN TIME-MULTIPLEXED CONSTANT MULTIPLICATIONS
Speakers:
Levent Aksoy1, Paulo Flores2 and Jose Monteiro3
1INESC-ID, PT; 2INESC-ID/IST ULisbon, PT; 3INESC-ID / IST, ULisbon, PT
Abstract
The multiplication of constants by a data input is an essential operation in digital signal processing (DSP) systems. For applications requiring a large number of constant multiplications under stringent hardware constraints, it is generally realized under a folded architecture, where a single constant selected from a set of multiple constants is multiplied by the data input at each time, called time-multiplexed constant multiplication (TMCM). This paper addresses the problem of optimizing the complexity of a TMCM design and introduces an algorithm that finds the least complex TMCM design by sharing the logic operators, i.e., adders, subtractors, adders/subtractors, and multiplexors (MUXes). It includes efficient search methods, yielding better results than existing TMCM algorithms.
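The folded architecture described above can be modeled in a few lines. The sketch below is a hand-built example, not the output of the paper's algorithm: it assumes a hypothetical constant set {5, 7} and shows how one shared adder/subtractor plus MUXes on the shift amount and operation can realize both products.

```python
def tmcm(x, sel):
    """Time-multiplexed multiplication of x by 5 (sel=0) or 7 (sel=1).
    5*x = (x << 2) + x and 7*x = (x << 3) - x share one adder/subtractor."""
    shift = (2, 3)[sel]               # MUX selecting the shift amount
    subtract = (False, True)[sel]     # MUX selecting add or subtract
    shifted = x << shift
    return shifted - x if subtract else shifted + x

print(tmcm(3, 0), tmcm(3, 1))
```

For x = 3 this prints 15 and 21. A design with a separate shift-add network per constant would need two arithmetic operators here, while the shared version needs one.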
12:33 IP5-10, 807 HARDWARE PRIMITIVES FOR THE SYNTHESIS OF MULTITHREADED ELASTIC SYSTEMS
Speakers:
Giorgos Dimitrakopoulos1, Seitanidis Ioannis2, Anastasios Psarras1, Konstantinos Tsiouris1, Pavlos Matthaiakis3 and Jordi Cortadella4
1Democritus University of Thrace, GR; 2Democritus University of Thrace, GR; 3Mentor Graphics, FR; 4Universitat Politecnica de Catalunya, ES
Abstract
Elastic systems operate in a dataflow-like mode using a distributed scalable control and tolerating variable latency computations. At the same time, multithreading increases the utilization of processing units and hides the latency of each operation by time-multiplexing operations of different threads in the datapath. This paper proposes a model to unify multithreading and elasticity. A new multithreaded elastic control protocol is introduced, supported by low-cost elastic buffers that minimize the storage requirements without sacrificing performance. To enable the synthesis of multithreaded elastic architectures, new hardware primitives are proposed and utilized in two circuit examples to prove the applicability of the proposed approach.
12:30 End of session
Lunch Break in Exhibition Area
Sandwich lunch