UB01 Session 1

Label	Presentation Title Authors
UB01.1	NOXIM-XT: A BIT-ACCURATE POWER ESTIMATION SIMULATOR FOR NOCS Presenter: Pierre Bomel, Université de Bretagne Sud, FR Authors: André Rossi¹, Johann Laurent² and Erwan Moreac² ¹LERIA, Université d'Angers, Angers, France, FR; ²Lab-STICC, Université de Bretagne Sud, Lorient, FR Abstract We have developped an enhanced version of Noxim (Noxim-XT) to estimate the energy consumption of a NoC in a SOC. Noxim-XT is used in a two-step methodology. First, applications are mapped on a SoC and their traffics are extracted by simulation with MPSOcBench. Second, Noxim-XT tests various hardware configurations of the NoC, and for each configuration, the application's traffic is re-injected and replayed, an accurate performance and power breakdown is provided, and the user can choose different data coding strategies. With the help of Noxim XT, each configuration is bit-accurately estimated in terms of energy consumption. After simulation, a spatial mapping of the energy consumption is provided and highlights the hot-spots. Moreover, the new coding strategies allows significant energy saving. Noxim XT simulations and a FPGA-based prototype of a new coding strategy will be demonstrated at the U-booth to illustrate these works. More information ...
UB01.2	TFA: TRANSPARENT CODE OFFLOADING ON FPGA Presenter: Roberto Rigamonti, HEIG-VD/HES-SO, CH Authors: Anthony Convers, Baptiste Delporte, Xavier Ruppen and Alberto Dassatti, HEIG-VD/HES-SO, CH Abstract Genomics, molecular dynamics, and machine learning are just the most recent examples of fields where FPGAs could provide the means to achieve interesting breakthroughs. However, HDL programming requires considerable multi-disciplinary skills, experience, large budgets, time, and a bit of wizardry. Given that most implementations are short-lived, the investment simply does not pay off. In this demo we propose a multi-vendor LLVM-based automated framework that can transparently - without the user or developer being aware of it - offload computing-intensive code fragments to FPGAs. The system relies on a performance monitor to detect computing-intensive code sections and, if they are suitable for offloading, extracts the Data Flow Graph and uses it to program an overlay pre-programmed on the FPGA, which then interacts with the Just-In-Time compiler executing the program. The overall process requires hundreds of microseconds, and can be easily reverted should the outcome be unsatisfactory. More information ...
UB01.3	DEMONSTRATION OF HW/SW CO-PROCESSING WITH FPGA FOR FAST VISUAL NAVIGATION OF ROVERS Presenter: Konstantinos Maragos, National Technical University of Athens, GR Authors: George Lentaris and Dimitrios Soudris, National Technical University of Athens, GR Abstract Autonomy, speed and accuracy constitute vital factors for the successful rover-exploration missions. However, the extremely low performance of the on-board space-grade CPUs in conjunction with the increased complexity of the sophisticated computer vision algorithms become a serious bottleneck for fast rover navigation. In this work, we present a HW/SW co-design solution based on FPGA to accelerate visual odometry algorithms tailored to the needs of future Mars exploration missions being scheduled by European Space Agency. For demonstration purposes, we use a Xilinx Kintex-7 FPGA to process images and perform feature detection, description, and matching. The FPGA communicates via ethernet port with the host CPU, which performs filtering and egomotion estimation with absolute orientation. We present the navigation path of a hypothetical moving rover which processes successively stereo images acquired by a hypothetical Martian surface while live-recording the CPU-FPGA co-processing. More information ...
UB01.4	MULTI-CORE VERIFICATION: COMBINING MICROTESK AND SPIN FOR VERIFICATION OF MULTI-CORE MICROPROCESSORS Presenter: Mikhail Chupilko, ISPRAS, RU Authors: Alexander Kamkin, Mikhail Lebedev and Andrei Tatarnikov, ISPRAS, RU Abstract The complexity of modern cache coherence protocols (CCP) in multi-core microprocessors prevents from complete verification of shared memory subsystems by means of random test-program generators (TPG). The following steps are suggested to target the problem. The first step is to separately specify CCP features and generate CCP-specific events to be used in TPG when generating a test program (TP). The protocol is specified in Promela, with Spin making a test template (TT). Spin also produces UVM (or C++TESK) testbench to make the execution of the resulting TPs to be controlable and deterministic. The second step is to let TPG produce the memory access instructions causing desired CCP-specific behavior. As a TPG we use MicroTESK. Its Ruby-based TTs abstractly describe future TPs. MicroTESK processes that TT making TP with CCP-specific events. The resulting TP is executed together with the testbench to exactly reproduce the situation Spin had found to be important for such a protocol. More information ...
UB01.5	A VOLTAGE-SCALABLE FULLY DIGITAL ON-CHIP MEMORY FOR ULTRA-LOW-POWER IOT PROCESSORS Presenter: Jun Shiomi, Kyoto University, JP Authors: Tohru Ishihara and Hidetoshi Onodera, Kyoto University, JP Abstract A voltage-scalable RISC processor integrating standard-cell based memory (SCM) is demonstrated. Unlike conventional processors, the processor has Standard-Cell based Memories (SCMs) as an alternative to conventional SRAM macros, enabling it to operate at a 0.4 V single-supply voltage. The processor is implemented with the fully automated cell-based design, which leads to low design costs. By scaling the supply voltage and applying the back-gate biasing techniques, the power dissipation of the SCMs is less than 20 uW, enabling the SCMs to operate with ambient energy source only. In this demonstration, the SCMs of the processor operates with a lemon battery as the ambient energy source. More information ...
UB01.6	MARGOT: APPLICATION ADAPTATION THROUGH RUNTIME AUTOTUNING Presenter: Gianluca Palermo, Politecnico di Milano, IT Authors: Davide Gadioli, Emanuele Vitali and Cristina Silvano, Politecnico di Milano, IT Abstract Several classes of applications expose parameters that influence their extra-functional properties, such as the quality of the result or the performance. This leads the application designer to tune these parameters to find the configuration that produces the desired outcome. Given that the application requirements and the resources assigned to each application might vary at runtime, finding a one-fit-all configuration is not a trivial task. For this reason, we implemented the mARGOt framework that enhances an application with an adaptation layer in order to continuously tune the parameters according to the evolving situation. More in detail, mARGOt is composed of a monitoring infrastructure, an application-level adaptation engine and an extra-functional configuration framework based on the separation of concerns paradigm between functional and extra-functional aspects. At the booth, we plan to demonstrate the effectiveness of the proposed infrastructure on three real-life applications. More information ...
UB01.7	XBARGEN: A TOOL FOR DESIGN SPACE EXPLORATION OF MEMRISTOR BASED CROSSBAR ARCHITECTURES. Presenter: Marcello Traiola, LIRMM, FR Authors: Mario Barbareschi¹ and Alberto Bosio² ¹University of Naples Federico II, IT; ²University of Montpellier - LIRMM laboratories, FR Abstract The unceasing shrinking process of CMOS technology is leading to its physical limits, impacting several aspects, such as performances, power consumption and many others.Alternative solutions are under investigation in order to overcome CMOS limitations.Among them, the memristor is one of promising technologies.Several works have been proposed so far, describing how to synthesize boolean logic functions on memristors-based crossbar architecture.However, depending on the synthesis parameters, different architectures can be obtained.In this demo, we show a Design Space Exploration (DSE) that we use to select the best crossbar configuration on the basis of workload dependent and independent parameters, such as area, time and power consumption.The main advantage is that it does not require any simulation and thus it avoid any runtime overheads.The demo aims to show the tool prototype on a selected set of benchmarks which will be synthesized on a memristor-based crossbar circuit. More information ...
UB01.8	MTA: MANCHESTER THERMAL ANALYZER Presenter: Scott Ladenheim, University of Manchester, GB Authors: Yi-Chung Chen, Vasilis Pavlidis and Milan Mihajlović, University of Manchester, GB Abstract The Manchester Thermal Analyzer (MTA) is a fast thermal analysis tool to compute temperature profiles of integrated circuits (ICs) in 3-D. The thermal simulations use the finite element method to discretize the heat equation in space coupled to an implicit time-integration method and are implemented with the open-source C++ library deal.II. The MTA supports higher-order elements, several time-integration methods, and fully adaptive spatiotemporal refinement. State-of-the-art preconditioned iterative methods solve the linear systems arising from the discretized equations as efficiently as possible. Using shared memory parallelization, the MTA solves systems on the order of tens of millions enabling modeling ICs at the cell-level. We present a thermal simulation of an Intel Xeon processor within a FCLGA package with heatsink to show the diverse structures of modern ICs the MTA simulates. The MTA also models other 3-D structures such as bonded tiers, TSVs, heatsinks, and heat spreaders. More information ...
UB01.9	SEFILE: A SECURE FILESYSTEM IN USERSPACE VIA SECUBE™ Presenter: Giuseppe Airofarulla, CINI, IT Authors: Paolo Prinetto¹ and Antonio Varriale² ¹CINI & Politecnico di Torino, IT; ²Blu5 Labs Ltd., IT Abstract The SEcube™ Open Source platform is a combination of three main cores in a single-chip design. Low-power ARM Cortex-M4 processor, a flexible and fast Field-Programmable-Gate-Array (FPGA), and an EAL5+ certified Security Controller (SmartCard) are embedded in an extremely compact package. This makes it a unique Open Source security environment where each function can be optimized, executed, and verified on its proper hardware device. In this demo, we present a Windows wrapper for a Filesystem in Userspace (FUSE) with an HDD firewall resorting to the hardware built-in capabilities, and the software libraries, of the SEcube™. More information ...
UB01.10	PER: METHOD AND TOOL FOR ANALYZING THE INTERPLAY BETWEEN PERFORMANCE, ENERGY AND SCALING IN MULTI- AND MANY-CORE PLATFORMS Presenter: Fei Xia, Newcastle University, GB Authors: Ashur Rafiev, Alexander Romanovsky and Alex Yakovlev, Newcastle University, GB Abstract Parallelization has been used to maintain a reasonable balance between energy consumption and performance in computing systems. However, the effectiveness of parallelization scaling is different for different hardware platforms. This is because the reliable operation region (ROR), a region defined in the voltage-throughput space for any hardware platform, is platform-dependent and its shape determines how effective parallelization scaling is in improving throughput and/or reducing power consumption. Although many of the interlinked issues are known, a unifying analysis method has just now been proposed to study the interplay between performance, energy, reliability and parallelization scaling. The method of bi-normalization of the ROR is designed to help achieve a meaningful cross-platform analysis of this interplay. The PER tool brings all these issues together and helps designers reason about hardware parallelization, DVFS and software parallelizability. More information ...
12:30	End of session
13:00	Lunch Break in Garden Foyer Keynote Lecture session 3.0 in "Garden Foyer" 1350 - 1420 Lunch Break in the Garden Foyer On all conference days (Tuesday to Thursday), a buffet lunch will be offered in the Garden Foyer, in front of the session rooms. Kindly note that this is restricted to conference delegates possessing a lunch voucher only. When entering the lunch break area, delegates will be asked to present the corresponding lunch voucher of the day. Once the lunch area is being left, re-entrance is not allowed for the respective lunch.

Label

Presentation Title
Authors

UB01.1

NOXIM-XT: A BIT-ACCURATE POWER ESTIMATION SIMULATOR FOR NOCS
Presenter:
Pierre Bomel, Université de Bretagne Sud, FR
Authors:
André Rossi¹, Johann Laurent² and Erwan Moreac²
¹LERIA, Université d'Angers, Angers, France, FR; ²Lab-STICC, Université de Bretagne Sud, Lorient, FR
Abstract
We have developped an enhanced version of Noxim (Noxim-XT) to estimate the energy consumption of a NoC in a SOC. Noxim-XT is used in a two-step methodology. First, applications are mapped on a SoC and their traffics are extracted by simulation with MPSOcBench. Second, Noxim-XT tests various hardware configurations of the NoC, and for each configuration, the application's traffic is re-injected and replayed, an accurate performance and power breakdown is provided, and the user can choose different data coding strategies. With the help of Noxim XT, each configuration is bit-accurately estimated in terms of energy consumption. After simulation, a spatial mapping of the energy consumption is provided and highlights the hot-spots. Moreover, the new coding strategies allows significant energy saving. Noxim XT simulations and a FPGA-based prototype of a new coding strategy will be demonstrated at the U-booth to illustrate these works.
More information ...

UB01.2

TFA: TRANSPARENT CODE OFFLOADING ON FPGA
Presenter:
Roberto Rigamonti, HEIG-VD/HES-SO, CH
Authors:
Anthony Convers, Baptiste Delporte, Xavier Ruppen and Alberto Dassatti, HEIG-VD/HES-SO, CH
Abstract
Genomics, molecular dynamics, and machine learning are just the most recent examples of fields where FPGAs could provide the means to achieve interesting breakthroughs. However, HDL programming requires considerable multi-disciplinary skills, experience, large budgets, time, and a bit of wizardry. Given that most implementations are short-lived, the investment simply does not pay off. In this demo we propose a multi-vendor LLVM-based automated framework that can transparently - without the user or developer being aware of it - offload computing-intensive code fragments to FPGAs. The system relies on a performance monitor to detect computing-intensive code sections and, if they are suitable for offloading, extracts the Data Flow Graph and uses it to program an overlay pre-programmed on the FPGA, which then interacts with the Just-In-Time compiler executing the program. The overall process requires hundreds of microseconds, and can be easily reverted should the outcome be unsatisfactory.
More information ...

UB01.3

DEMONSTRATION OF HW/SW CO-PROCESSING WITH FPGA FOR FAST VISUAL NAVIGATION OF ROVERS
Presenter:
Konstantinos Maragos, National Technical University of Athens, GR
Authors:
George Lentaris and Dimitrios Soudris, National Technical University of Athens, GR
Abstract
Autonomy, speed and accuracy constitute vital factors for the successful rover-exploration missions. However, the extremely low performance of the on-board space-grade CPUs in conjunction with the increased complexity of the sophisticated computer vision algorithms become a serious bottleneck for fast rover navigation. In this work, we present a HW/SW co-design solution based on FPGA to accelerate visual odometry algorithms tailored to the needs of future Mars exploration missions being scheduled by European Space Agency. For demonstration purposes, we use a Xilinx Kintex-7 FPGA to process images and perform feature detection, description, and matching. The FPGA communicates via ethernet port with the host CPU, which performs filtering and egomotion estimation with absolute orientation. We present the navigation path of a hypothetical moving rover which processes successively stereo images acquired by a hypothetical Martian surface while live-recording the CPU-FPGA co-processing.
More information ...

UB01.4

MULTI-CORE VERIFICATION: COMBINING MICROTESK AND SPIN FOR VERIFICATION OF MULTI-CORE MICROPROCESSORS
Presenter:
Mikhail Chupilko, ISPRAS, RU
Authors:
Alexander Kamkin, Mikhail Lebedev and Andrei Tatarnikov, ISPRAS, RU
Abstract
The complexity of modern cache coherence protocols (CCP) in multi-core microprocessors prevents from complete verification of shared memory subsystems by means of random test-program generators (TPG). The following steps are suggested to target the problem. The first step is to separately specify CCP features and generate CCP-specific events to be used in TPG when generating a test program (TP). The protocol is specified in Promela, with Spin making a test template (TT). Spin also produces UVM (or C++TESK) testbench to make the execution of the resulting TPs to be controlable and deterministic. The second step is to let TPG produce the memory access instructions causing desired CCP-specific behavior. As a TPG we use MicroTESK. Its Ruby-based TTs abstractly describe future TPs. MicroTESK processes that TT making TP with CCP-specific events. The resulting TP is executed together with the testbench to exactly reproduce the situation Spin had found to be important for such a protocol.
More information ...

UB01.5

A VOLTAGE-SCALABLE FULLY DIGITAL ON-CHIP MEMORY FOR ULTRA-LOW-POWER IOT PROCESSORS
Presenter:
Jun Shiomi, Kyoto University, JP
Authors:
Tohru Ishihara and Hidetoshi Onodera, Kyoto University, JP
Abstract
A voltage-scalable RISC processor integrating standard-cell based memory (SCM) is demonstrated. Unlike conventional processors, the processor has Standard-Cell based Memories (SCMs) as an alternative to conventional SRAM macros, enabling it to operate at a 0.4 V single-supply voltage. The processor is implemented with the fully automated cell-based design, which leads to low design costs. By scaling the supply voltage and applying the back-gate biasing techniques, the power dissipation of the SCMs is less than 20 uW, enabling the SCMs to operate with ambient energy source only. In this demonstration, the SCMs of the processor operates with a lemon battery as the ambient energy source.
More information ...

UB01.6

MARGOT: APPLICATION ADAPTATION THROUGH RUNTIME AUTOTUNING
Presenter:
Gianluca Palermo, Politecnico di Milano, IT
Authors:
Davide Gadioli, Emanuele Vitali and Cristina Silvano, Politecnico di Milano, IT
Abstract
Several classes of applications expose parameters that influence their extra-functional properties, such as the quality of the result or the performance. This leads the application designer to tune these parameters to find the configuration that produces the desired outcome. Given that the application requirements and the resources assigned to each application might vary at runtime, finding a one-fit-all configuration is not a trivial task. For this reason, we implemented the mARGOt framework that enhances an application with an adaptation layer in order to continuously tune the parameters according to the evolving situation. More in detail, mARGOt is composed of a monitoring infrastructure, an application-level adaptation engine and an extra-functional configuration framework based on the separation of concerns paradigm between functional and extra-functional aspects. At the booth, we plan to demonstrate the effectiveness of the proposed infrastructure on three real-life applications.
More information ...

UB01.7

XBARGEN: A TOOL FOR DESIGN SPACE EXPLORATION OF MEMRISTOR BASED CROSSBAR ARCHITECTURES.
Presenter:
Marcello Traiola, LIRMM, FR
Authors:
Mario Barbareschi¹ and Alberto Bosio²
¹University of Naples Federico II, IT; ²University of Montpellier - LIRMM laboratories, FR
Abstract
The unceasing shrinking process of CMOS technology is leading to its physical limits, impacting several aspects, such as performances, power consumption and many others.Alternative solutions are under investigation in order to overcome CMOS limitations.Among them, the memristor is one of promising technologies.Several works have been proposed so far, describing how to synthesize boolean logic functions on memristors-based crossbar architecture.However, depending on the synthesis parameters, different architectures can be obtained.In this demo, we show a Design Space Exploration (DSE) that we use to select the best crossbar configuration on the basis of workload dependent and independent parameters, such as area, time and power consumption.The main advantage is that it does not require any simulation and thus it avoid any runtime overheads.The demo aims to show the tool prototype on a selected set of benchmarks which will be synthesized on a memristor-based crossbar circuit.
More information ...

UB01.8

MTA: MANCHESTER THERMAL ANALYZER
Presenter:
Scott Ladenheim, University of Manchester, GB
Authors:
Yi-Chung Chen, Vasilis Pavlidis and Milan Mihajlović, University of Manchester, GB
Abstract
The Manchester Thermal Analyzer (MTA) is a fast thermal analysis tool to compute temperature profiles of integrated circuits (ICs) in 3-D. The thermal simulations use the finite element method to discretize the heat equation in space coupled to an implicit time-integration method and are implemented with the open-source C++ library deal.II. The MTA supports higher-order elements, several time-integration methods, and fully adaptive spatiotemporal refinement. State-of-the-art preconditioned iterative methods solve the linear systems arising from the discretized equations as efficiently as possible. Using shared memory parallelization, the MTA solves systems on the order of tens of millions enabling modeling ICs at the cell-level. We present a thermal simulation of an Intel Xeon processor within a FCLGA package with heatsink to show the diverse structures of modern ICs the MTA simulates. The MTA also models other 3-D structures such as bonded tiers, TSVs, heatsinks, and heat spreaders.
More information ...

UB01.9

SEFILE: A SECURE FILESYSTEM IN USERSPACE VIA SECUBE™
Presenter:
Giuseppe Airofarulla, CINI, IT
Authors:
Paolo Prinetto¹ and Antonio Varriale²
¹CINI & Politecnico di Torino, IT; ²Blu5 Labs Ltd., IT
Abstract
The SEcube™ Open Source platform is a combination of three main cores in a single-chip design. Low-power ARM Cortex-M4 processor, a flexible and fast Field-Programmable-Gate-Array (FPGA), and an EAL5+ certified Security Controller (SmartCard) are embedded in an extremely compact package. This makes it a unique Open Source security environment where each function can be optimized, executed, and verified on its proper hardware device. In this demo, we present a Windows wrapper for a Filesystem in Userspace (FUSE) with an HDD firewall resorting to the hardware built-in capabilities, and the software libraries, of the SEcube™.
More information ...

UB01.10

PER: METHOD AND TOOL FOR ANALYZING THE INTERPLAY BETWEEN PERFORMANCE, ENERGY AND SCALING IN MULTI- AND MANY-CORE PLATFORMS
Presenter:
Fei Xia, Newcastle University, GB
Authors:
Ashur Rafiev, Alexander Romanovsky and Alex Yakovlev, Newcastle University, GB
Abstract
Parallelization has been used to maintain a reasonable balance between energy consumption and performance in computing systems. However, the effectiveness of parallelization scaling is different for different hardware platforms. This is because the reliable operation region (ROR), a region defined in the voltage-throughput space for any hardware platform, is platform-dependent and its shape determines how effective parallelization scaling is in improving throughput and/or reducing power consumption. Although many of the interlinked issues are known, a unifying analysis method has just now been proposed to study the interplay between performance, energy, reliability and parallelization scaling. The method of bi-normalization of the ROR is designed to help achieve a meaningful cross-platform analysis of this interplay. The PER tool brings all these issues together and helps designers reason about hardware parallelization, DVFS and software parallelizability.
More information ...

12:30

End of session

13:00

Lunch Break in Garden Foyer

Keynote Lecture session 3.0 in "Garden Foyer" 1350 - 1420

Lunch Break in the Garden Foyer
On all conference days (Tuesday to Thursday), a buffet lunch will be offered in the Garden Foyer, in front of the session rooms. Kindly note that this is restricted to conference delegates possessing a lunch voucher only. When entering the lunch break area, delegates will be asked to present the corresponding lunch voucher of the day. Once the lunch area is being left, re-entrance is not allowed for the respective lunch.

available at

Visit us at DATE 2017

Booth: 20+21

Booth: 30

Booth: 17

Booth: 26

Booth: 1

Booth: 23

Submissions

DATE Smartphone App

Visit us at DATE 2017