12.6 Reconfigurable Computing Platforms and Architectures


Date: Thursday 17 March 2016
Time: 16:00 - 17:30
Location / Room: Konferenz 4

Chair:
Dirk Stroobandt, Ghent University, BE

Co-Chair:
Jürgen Becker, Karlsruhe Institute of Technology (KIT), DE

This session presents three papers on the design of platforms and architectures for reconfigurable computing. The first paper describes a dedicated hardware accelerator addressing the prohibitive computing demand of homomorphic encryption. The second paper develops larger, more efficient overlays using multiple DSP blocks and then maximises their utilisation. The third paper proposes a novel scheme to dynamically optimize a reconfigurable VLIW processor by predicting and matching the number of active data-paths for each application phase.

Time  Label  Presentation Title / Authors
16:00  12.6.1  SECURING THE CLOUD WITH RECONFIGURABLE COMPUTING: AN FPGA ACCELERATOR FOR HOMOMORPHIC ENCRYPTION
Speaker:
Alessandro Cilardo, University of Naples Federico II, IT
Authors:
Alessandro Cilardo and Domenico Argenziano, University of Naples Federico II, IT
Abstract
Homomorphic encryption, a hot topic in current cloud security research, is a recently introduced technique that allows computation to take place on encrypted data. This work presents the architecture and implementation of a dedicated FPGA-based accelerator addressing the prohibitive computing demand of homomorphic encryption. In particular, the accelerator targets the most time-consuming operation used by the encryption primitive: large integer multiplication. Based on an Altera Stratix V FPGA platform, the prototype implementation achieves significant improvements in execution time, at a comparable hardware cost, over alternative solutions previously presented in the technical literature.
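
As a rough illustration of why large integer multiplication is costly, the following minimal sketch multiplies two integers limb by limb using the schoolbook method. It is not the accelerator's algorithm, which the abstract does not specify; the 32-bit limb width and all function names are assumptions chosen for illustration.

```python
LIMB_BITS = 32                      # assumed limb width, for illustration only
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(n):
    """Split a non-negative integer into little-endian fixed-width limbs."""
    limbs = []
    while n:
        limbs.append(n & LIMB_MASK)
        n >>= LIMB_BITS
    return limbs or [0]


def from_limbs(limbs):
    """Reassemble an integer from normalised little-endian limbs."""
    n = 0
    for i, limb in enumerate(limbs):
        n |= limb << (i * LIMB_BITS)
    return n


def schoolbook_multiply(a, b):
    """Multiply two non-negative integers with O(len(x) * len(y)) limb products."""
    x, y = to_limbs(a), to_limbs(b)
    result = [0] * (len(x) + len(y))
    for i, xi in enumerate(x):
        carry = 0
        for j, yj in enumerate(y):
            total = result[i + j] + xi * yj + carry
            result[i + j] = total & LIMB_MASK   # keep the low limb
            carry = total >> LIMB_BITS          # propagate the high part
        result[i + len(y)] += carry
    return from_limbs(result)
```

The number of limb products grows quadratically with operand length, which hints at why operands of the size used in homomorphic encryption motivate dedicated hardware and asymptotically faster multiplication methods.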

16:30  12.6.2  THROUGHPUT ORIENTED FPGA OVERLAYS USING DSP BLOCKS
Speaker:
Douglas L. Maskell, Nanyang Technological University, SG
Authors:
Abhishek K. Jain (1), Douglas L. Maskell (1) and Suhaib A. Fahmy (2)
(1) Nanyang Technological University, SG; (2) University of Warwick, GB
Abstract
Design productivity is a major concern preventing the mainstream adoption of FPGAs. Overlay architectures have emerged as one possible solution to this challenge, offering fast compilation and software-like programmability. However, overlays typically suffer from area and performance overheads due to limited consideration for the underlying FPGA architecture. These overlays have often been of limited size, supporting only relatively small compute kernels. This paper examines the possibility of developing larger, more efficient overlays using multiple DSP blocks and then maximising utilisation by mapping multiple instances of a kernel simultaneously onto the overlay to exploit kernel-level parallelism. We show a significant improvement in achievable overlay size and overlay utilisation, with a reduction of almost 70% in the overlay tile requirement compared to existing overlay architectures, an operating frequency in excess of 300 MHz, and kernel throughputs of almost 60 GOPS.
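
The idea of filling an overlay with several copies of one kernel can be sketched with simple arithmetic. The function below is a hypothetical model, not the paper's mapping tool: its name, its parameters, and the assumption that every instance sustains a fixed number of operations per cycle are all illustrative.

```python
def replicate_kernel(overlay_tiles, kernel_tiles, ops_per_instance_per_cycle, clock_mhz):
    """Estimate instance count, tile utilisation and throughput for one kernel.

    All parameters are hypothetical inputs; none are taken from the paper.
    """
    if kernel_tiles <= 0 or kernel_tiles > overlay_tiles:
        raise ValueError("kernel must need between 1 and overlay_tiles tiles")
    instances = overlay_tiles // kernel_tiles              # whole copies that fit
    utilisation = instances * kernel_tiles / overlay_tiles # fraction of tiles used
    gops = instances * ops_per_instance_per_cycle * clock_mhz / 1000.0
    return instances, utilisation, gops
```

For example, with made-up values of 64 tiles, a 10-tile kernel, 12 operations per instance per cycle and a 300 MHz clock, `replicate_kernel(64, 10, 12, 300)` reports 6 instances. A kernel that needs fewer tiles allows more copies, which is where a lower tile requirement translates into throughput.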

17:00  12.6.3  RUN-TIME PHASE PREDICTION FOR A RECONFIGURABLE VLIW PROCESSOR
Speaker:
Stephan Wong, TUDelft, NL
Authors:
Qi Guo (1), Anderson Sartor (2), Anthony Brandon (3), Xuehai Zhou (1) and Stephan Wong (3)
(1) University of Science and Technology of China, CN; (2) Universidade Federal do Rio Grande do Sul (UFRGS), BR; (3) TUDelft, NL
Abstract
It is well known that different applications exhibit varying amounts of instruction-level parallelism (ILP). Executing these applications on the same fixed-width VLIW processor results in (1) wasted energy due to underutilized resources if the issue-width of the processor is larger than the inherent ILP, or (2) lower performance if the issue-width is smaller than the inherent ILP. Moreover, even within a single application, distinct phases can be observed with varying ILP and therefore changing resource requirements. With this in mind, we designed the rVEX processor, which is a VLIW processor that can change its issue-width at run-time. In this paper, we propose a novel scheme to dynamically (i.e., at run-time) optimize the resource utilization by predicting and matching the number of active data-paths for each application phase. The purpose is to achieve low energy consumption for applications with low ILP, and high performance for applications with high ILP, on a single VLIW processor design. We prototyped the rVEX processor on an FPGA and obtained the dynamic traces of applications running on top of a Linux port. Our results show that it is possible in some cases to achieve the performance of an 8-issue core with 10% lower energy consumption, while in others we achieve the energy consumption of a 2-issue core with close to 20% lower execution time.
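
Matching issue-width to predicted ILP can be illustrated with a simple last-value predictor. This is a sketch under stated assumptions rather than the scheme proposed in the paper: the set of widths (the abstract mentions only 2-issue and 8-issue cores, so 4 is assumed), the function names, and the availability of a per-interval ILP estimate independent of the current width (for example, from offline traces) are all hypothetical.

```python
ISSUE_WIDTHS = (2, 4, 8)  # assumed configurations; 4-issue is not in the abstract


def choose_issue_width(predicted_ilp, widths=ISSUE_WIDTHS):
    """Return the narrowest width that covers the predicted ILP."""
    for width in sorted(widths):
        if width >= predicted_ilp:
            return width
    return max(widths)  # ILP exceeds every option: use the widest core


def schedule_widths(ilp_per_interval, initial_width=8):
    """Last-value prediction: interval k+1 uses the width suited to interval k."""
    chosen = []
    width = initial_width
    for ilp in ilp_per_interval:
        chosen.append(width)              # width in effect during this interval
        width = choose_issue_width(ilp)   # prediction for the next interval
    return chosen
```

With hypothetical ILP estimates `[1.5, 1.8, 5.2, 6.0, 3.1]`, `schedule_widths` returns `[8, 2, 2, 8, 8]`. The third interval runs too narrow because the predictor lags the phase change by one interval, which is the kind of misprediction a more capable phase predictor aims to reduce.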

17:30  End of session