10.5 Reconfigurable Architectures and Applications

Printer-friendly version PDF version

Date: Thursday 12 March 2015
Time: 11:00 - 12:30
Location / Room: Meije

Chair:
Christian Plessl, University of Paderborn, DE

Co-Chair:
Enno Lübbers, Intel Labs Europe, DE

Reconfigurable computing has vast potential for enhancing the performance of applications especially when using architectural optimizations. This session has two papers that focus on architectural enhancements while the third demonstrates a hardware accelerated bioinformatics application

TimeLabelPresentation Title
Authors
11:0010.5.1HYBRID ADAPTIVE CLOCK MANAGEMENT FOR FPGA PROCESSOR ACCELERATION
Speakers:
Alexandru Gheolbanoiu1, Lucian Petrica1 and Sorin Cotofana2
1University POLITEHNICA of Bucharest, RO; 2Delft University of Technology, NL
Abstract
As FPGAs speed, power efficiency, and logic capacity are increasing, so does the number of applications which make use of FPGA processors. However, due to placement and routing constraints, FPGA processors instruction delay balancing is a real challenge, especially when the implementation approaches the FPGA resource capacity. Consequently, even though some instructions can operate at high frequencies, the slow instructions determine the processor clock period, resulting in the underutilisation of the processor potential. However, the fast instructions latent performance may be harnessed through Adaptive Clock Management (ACM), i.e., by dynamically adapting the clock frequency such that each instruction gets sufficient time for correct completion. Up to date, ACM augmented FPGA processors have been proposed based on Clock Multiplexing (CM), but they suffer from long clock switching delays, which could nullify most of the ACM potential performance gain. This paper proposes an effective FPGA tailored clock manipulation approach able to leverage the ACM potential. We first evaluate Clock Stretching (CS), i.e., the temporary clock period augmentation, as a CM alternative in FPGA processor designs and introduce an FPGA specific CS circuit implementation. Subsequently, we evaluate the advantages and drawbacks of the two techniques and propose a Hybrid ACM, which monitors the processor instruction stream and determines the optimal adaptive clocking strategy in order to provide the maximum speedup for the executing program. Given that CS has very low latency at the expense of limited accuracy and dynamic range we rely on it when the program requires frequent clock period changes. Otherwise we utilise CM, which is rather slow but enables the FPGA processor operation at the edge of its hardware capabilities. We evaluate our proposal on a vector processor mapped on a Xilinx Zynq FPGA. Our experiments indicate that on Sum of Squared Differences algorithm, Neural network, and FIR filter execution traces the hybrid ACM provides up to $14$\% performance increase over the CM based ACM.

Download Paper (PDF; Only available from the DATE venue WiFi)
11:3010.5.2A SCALABLE AND HIGH-DENSITY FPGA ARCHITECTURE WITH MULTI-LEVEL PHASE CHANGE MEMORY
Speakers:
Chunan Wei, Ashutosh Dhar and Deming Chen, University of Illinois, Urbana-Champaign, US
Abstract
As CMOS technology is stretched to its limits it has become imperative to look to alternative solutions for the next generation of FPGAs. In particular, due to the configurable nature of FPGAs, on-chip memory remains to be a major concern for designers. In this work we explore the use of Phase-Change Memory (PCM). We exploit the ability of PCM to exist in multiple intermediate states to store 2 bits per cell and develop a new Look Up Table (LUT) architecture. The new LUT can either store two functions with shared inputs or a single function with an additional input. We also explore the use of PCM in local routing mechanisms and thus propose a new Configurable Logic Block (CLB) composed of CMOS and PCM. The new design promises significant improvements in logic density and performance with area improvements of over 40% for all LUT sizes and delay improvements of 7% to 13% on an average for LUTs of size 10 to 6 .

Download Paper (PDF; Only available from the DATE venue WiFi)
12:0010.5.3FPGA ACCELERATED DNA ERROR CORRECTION
Speaker:
Ashutosh Dhar, University of Illinois at Urbana-Champaign, US
Authors:
Anand Ramachandran, Yun Heo, Wen-mei Hwu, Jian Ma and Deming Chen, University of Illinois at Urbana-Champaign, US
Abstract
Correcting errors in DNA sequencing data is an important process that can improve the quality of downstream analysis using the data. Even though many error-correction methods have been proposed for Illumina reads, their throughput is not high enough to process data from large genomes. The current paper describes the first FPGA-based error-correction tool, called FPGA Accelerated DNA Error Correction (FADE), which targets to improve the throughput of DNA error correction for Illumina reads. The base algorithm of FADE is BLESS that is highly accurate but slow. A Bloom filter that is the main data structure of BLESS and BLESS' error correction subroutines for different types of errors have been implemented on a FPGA. We compared our design with the software version of BLESS using DNA sequencing data generated from four genomes and we could achieve up to 43 times speedup for the best case, and 36 times speedup on the average.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30IP5-4, 1007(Best Paper Award Candidate)
DESIGN FLOW AND RUN-TIME MANAGEMENT FOR COMPRESSED FPGA CONFIGURATIONS
Speakers:
Christophe Huriaux1, Antoine Courtay1 and Olivier Sentieys2
1University of Rennes 1 - IRISA, FR; 2INRIA, FR
Abstract
The aim of partially and dynamically reconfigurable hardware is to provide an increased flexibility through the load of multiple applications on the same reconfigurable fabric at the same time. However, a configuration bit-stream loaded at runtime should be created offline for each task of the application. Moreover, modern applications use a lot of specialized hardware blocks to perform complex operations, which tends to cancel the "single bit-stream for a single application" paradigm, as the logic content for different locations of the reconfigurable fabric may be different. In this paper we propose a design flow for generating compressed configuration bit-streams abstracted from their final position on the logic fabric. Those configurations will then be decoded and finalized in real-time and at run-time by a dedicated reconfiguration controller to be placed at a given physical location. Our experiments show that densely routed applications gain the most with a compression factor of more than 2× using the finest cluster size, but coarser coding can be implemented to achieve a compression factor up to 10×.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30End of session
Lunch Break, Keynote lecture from 1320 - 1350 (Room Oisans) in Les Écrins

Coffee Break in Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Break

On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only).

Tuesday, March 10, 2015

Coffee Break 10:30 - 11:30

Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics

Coffee Break 16:00 - 17:00

Wednesday, March 11, 2015

Coffee Break 10:00 - 11:00

Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans)

Coffee Break 16:00 - 17:00

Thursday, March 12, 2015

Coffee Break 10:00 - 11:00

Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50

Coffee Break 15:30 - 16:00