10.5 Reconfigurable Architectures and Applications

Date: Thursday 12 March 2015
Time: 11:00 - 12:30
Location / Room: Meije

Chair:
Christian Plessl, University of Paderborn, DE

Co-Chair:
Enno Lübbers, Intel Labs Europe, DE

Reconfigurable computing has vast potential for enhancing application performance, especially when combined with architectural optimizations. Two papers in this session focus on architectural enhancements, while the third demonstrates a hardware-accelerated bioinformatics application.

Time	Label	Presentation Title Authors
11:00	10.5.1	HYBRID ADAPTIVE CLOCK MANAGEMENT FOR FPGA PROCESSOR ACCELERATION Speakers: Alexandru Gheolbanoiu¹, Lucian Petrica¹ and Sorin Cotofana² ¹University POLITEHNICA of Bucharest, RO; ²Delft University of Technology, NL Abstract: As FPGA speed, power efficiency, and logic capacity increase, so does the number of applications that make use of FPGA processors. However, due to placement and routing constraints, balancing instruction delays in FPGA processors is a real challenge, especially when the implementation approaches the FPGA resource capacity. Consequently, even though some instructions can operate at high frequencies, the slow instructions determine the processor clock period, resulting in the underutilisation of the processor potential. However, the latent performance of the fast instructions may be harnessed through Adaptive Clock Management (ACM), i.e., by dynamically adapting the clock frequency such that each instruction gets sufficient time for correct completion. To date, ACM-augmented FPGA processors have been proposed based on Clock Multiplexing (CM), but they suffer from long clock switching delays, which could nullify most of the potential ACM performance gain. This paper proposes an effective FPGA-tailored clock manipulation approach able to leverage the ACM potential. We first evaluate Clock Stretching (CS), i.e., the temporary augmentation of the clock period, as a CM alternative in FPGA processor designs and introduce an FPGA-specific CS circuit implementation. Subsequently, we evaluate the advantages and drawbacks of the two techniques and propose a Hybrid ACM, which monitors the processor instruction stream and determines the optimal adaptive clocking strategy in order to provide the maximum speedup for the executing program. Given that CS has very low latency at the expense of limited accuracy and dynamic range, we rely on it when the program requires frequent clock period changes. Otherwise we utilise CM, which is rather slow but enables the FPGA processor to operate at the edge of its hardware capabilities. We evaluate our proposal on a vector processor mapped on a Xilinx Zynq FPGA. Our experiments indicate that, on execution traces of a Sum of Squared Differences algorithm, a neural network, and an FIR filter, the hybrid ACM provides up to a 14% performance increase over the CM-based ACM.
11:30	10.5.2	A SCALABLE AND HIGH-DENSITY FPGA ARCHITECTURE WITH MULTI-LEVEL PHASE CHANGE MEMORY Speakers: Chunan Wei, Ashutosh Dhar and Deming Chen, University of Illinois, Urbana-Champaign, US Abstract: As CMOS technology is stretched to its limits, it has become imperative to look to alternative solutions for the next generation of FPGAs. In particular, due to the configurable nature of FPGAs, on-chip memory remains a major concern for designers. In this work we explore the use of Phase-Change Memory (PCM). We exploit the ability of PCM to exist in multiple intermediate states to store 2 bits per cell and develop a new Look-Up Table (LUT) architecture. The new LUT can store either two functions with shared inputs or a single function with an additional input. We also explore the use of PCM in local routing mechanisms and thus propose a new Configurable Logic Block (CLB) composed of CMOS and PCM. The new design promises significant improvements in logic density and performance, with area improvements of over 40% for all LUT sizes and delay improvements of 7% to 13% on average for LUTs of size 10 to 6.
12:00	10.5.3	FPGA ACCELERATED DNA ERROR CORRECTION Speaker: Ashutosh Dhar, University of Illinois at Urbana-Champaign, US Authors: Anand Ramachandran, Yun Heo, Wen-mei Hwu, Jian Ma and Deming Chen, University of Illinois at Urbana-Champaign, US Abstract: Correcting errors in DNA sequencing data is an important process that can improve the quality of downstream analysis using the data. Even though many error-correction methods have been proposed for Illumina reads, their throughput is not high enough to process data from large genomes. This paper describes the first FPGA-based error-correction tool, called FPGA Accelerated DNA Error Correction (FADE), which aims to improve the throughput of DNA error correction for Illumina reads. The base algorithm of FADE is BLESS, which is highly accurate but slow. The Bloom filter that is the main data structure of BLESS, together with the BLESS error-correction subroutines for different types of errors, has been implemented on an FPGA. We compared our design with the software version of BLESS using DNA sequencing data generated from four genomes and achieved a speedup of up to 43 times in the best case and 36 times on average.
12:30	IP5-4, 1007	(Best Paper Award Candidate) DESIGN FLOW AND RUN-TIME MANAGEMENT FOR COMPRESSED FPGA CONFIGURATIONS Speakers: Christophe Huriaux¹, Antoine Courtay¹ and Olivier Sentieys² ¹University of Rennes 1 - IRISA, FR; ²INRIA, FR Abstract: The aim of partially and dynamically reconfigurable hardware is to provide increased flexibility by loading multiple applications on the same reconfigurable fabric at the same time. However, a configuration bit-stream loaded at run-time must be created offline for each task of the application. Moreover, modern applications use many specialized hardware blocks to perform complex operations, which tends to undermine the "single bit-stream for a single application" paradigm, as the logic content for different locations of the reconfigurable fabric may be different. In this paper we propose a design flow for generating compressed configuration bit-streams abstracted from their final position on the logic fabric. Those configurations are then decoded and finalized in real time, at run-time, by a dedicated reconfiguration controller so that they can be placed at a given physical location. Our experiments show that densely routed applications gain the most, with a compression factor of more than 2× using the finest cluster size, but coarser coding can be implemented to achieve a compression factor of up to 10×.
12:30		End of session
		Lunch Break, Keynote lecture from 13:20 - 13:50 (Room Oisans) in Les Écrins

Coffee Break in Exhibition Area
On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Break
On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Écrins (for fully registered conference delegates only).

Tuesday, March 10, 2015
Coffee Break 10:30 - 11:30
Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics
Coffee Break 16:00 - 17:00

Wednesday, March 11, 2015
Coffee Break 10:00 - 11:00
Lunch Break 12:30 - 14:30; Keynote lectures from 12:50 - 14:20 (Room Oisans)
Coffee Break 16:00 - 17:00

Thursday, March 12, 2015
Coffee Break 10:00 - 11:00
Lunch Break 12:30 - 14:00; Keynote lecture from 13:20 - 13:50
Coffee Break 15:30 - 16:00