7.4 System-Level Synthesis

Time	Label	Presentation Title Authors
14:30	7.4.1	SYSTEM LEVEL SYNTHESIS FOR VIRTUAL MEMORY ENABLED HARDWARE THREADS. Speaker: Nicolas Estibals, IRISA, FR Authors: Nicolas Estibals¹, Gaël Deest², Ali Hassan El Moussawi² and Steven Derrien³ ¹University of Rennes 1/IRISA, FR; ²University of Rennes 1, FR; ³IRISA, FR Abstract Newly introduced ARM-based FPGA platforms enable transparent hardware/software multithreading by providing cache-coherent memory accesses to hardware accelerators. However, the lack of support for virtual memory on the accelerator side impedes the acceleration of legacy applications. To address this problem, we propose a fully automated, High-Level Synthesis-based source-to-source flow to efficiently support virtual memory in hardware accelerators. Download Paper (PDF; Only available from the DATE venue WiFi)
15:00	7.4.2	COMPOSABLE, PARAMETERIZABLE TEMPLATES FOR HIGH-LEVEL SYNTHESIS Speaker: Dajung Lee, University of California, San Diego, US Authors: Janarbek Matai, Dajung Lee, Alric Althoff and Ryan Kastner, University of California, San Diego, US Abstract High-level synthesis tools aim to make FPGA programming easier by raising the level of programming abstraction. Yet in order to get an efficient hardware design from HLS tools, the designer must know how to write HLS code that results in an efficient low-level hardware architecture. Unfortunately, this requires substantial hardware knowledge, which limits wide adoption of HLS tools beyond hardware designers. In this work, we develop an approach based upon parameterizable templates that can be composed using common data access patterns. This creates a methodology for efficient hardware implementations. Our results demonstrate that a small number of optimized templates can be hierarchically composed to develop highly optimized hardware implementations for large applications. Download Paper (PDF; Only available from the DATE venue WiFi)
15:30	7.4.3	LEVERAGING POWER SPECTRAL DENSITY FOR SCALABLE SYSTEM-LEVEL ACCURACY EVALUATION Speaker: Benjamin Barrois, University of Rennes, INRIA, FR Authors: Benjamin Barrois¹, Karthick Parashar² and Olivier Sentieys³ ¹University of Rennes, INRIA, FR; ²IMEC, BE; ³INRIA, FR Abstract The choice of fixed-point word-lengths critically impacts the system performance by affecting the quality of computation, its energy, speed and area. Making a good choice of fixed-point word-length generally requires solving an NP-hard problem by exploring a vast search space. Therefore, the entire fixed-point refinement process becomes critically dependent on evaluating the effects of accuracy degradation. In this paper, a novel technique for the system-level evaluation of fixed-point systems that is more scalable and renders better accuracy is proposed. This technique makes use of the information hidden in the power spectral density of quantization noise. It is found to be very effective in systems consisting of more than one frequency-sensitive component. Compared to the state-of-the-art hierarchical methods that are agnostic of the hidden information in the quantization noise spectrum, we show that the proposed technique is 5X to 500X more accurate on some representative signal processing kernels. Download Paper (PDF; Only available from the DATE venue WiFi)
16:00	IP3-11, 132	LOW NORMALIZED ENERGY DERIVATION ASYNCHRONOUS CIRCUIT SYNTHESIS FLOW THROUGH FORK-JOIN SLACK MATCHING FOR CRYPTOGRAPHIC APPLICATIONS Speaker: Nan Liu, Nanyang Technological University, SG Authors: Nan Liu, Kwen-Siong Chong, Weng-Geng Ho, Bah-Hwee Gwee and Joseph S. Chang, Nanyang Technological University, SG Abstract In this paper, an automatic synthesis flow of asynchronous (async) Quasi-Delay-Insensitive (QDI) circuits for cryptographic applications is presented. The synthesis flow accepts Verilog netlists as primary inputs, in part leverages commercial electronic design automation tools for synthesis and verification, and relies heavily on the proposed translation processes for async netlist conversion and optimization. In particular, a three-step synchronous-to-asynchronous-direct-translation (SADT) process is proposed. The first step is to translate a Verilog netlist into a direct circuit graph, allowing us to model QDI pipelines for performance analysis based on the same netlist function. Second, graph coarsening in combination with dynamic programming is adopted to analyze the fork-join slack matching of the QDI pipelines, aiming to balance the pipeline depths in any fork-join pipelines to optimize the system performance, and to reduce energy variations of the overall pipelines to guard against power-analysis attacks. The last step is to insert async local controllers/gates to ensure that the async circuits are consistent with the QDI protocol, hence enhancing their timing robustness to accommodate Process-Voltage-Temperature variations. We show that, on the basis of simulations on the ISCAS benchmark circuits, the QDI circuits based on our proposed automatic synthesis flow are on average 20% faster and feature 30% less normalized energy derivations than un-optimized circuits. Download Paper (PDF; Only available from the DATE venue WiFi)
16:00		End of session Coffee Break in Exhibition Area

Visit us at DATE 2016