UB07 Session 7


Label

Presentation Title
Authors

UB07.1

LOOPINVADER: A COMPILER FOR TIGHTLY COUPLED PROCESSOR ARRAYS
Presenter:
Alexandru Tanase, Friedrich-Alexander-Universität Erlangen-Nürnberg, DE
Authors:
Alexandru Tanase, Michael Witterauf, Ericles Sousa, Vahid Lari, Frank Hannig and Jürgen Teich, Friedrich-Alexander-Universität Erlangen-Nürnberg, DE
Abstract
In today's coarse-grained reconfigurable architectures (CGRAs), application performance depends mostly on exploiting loop-level and instruction-level parallelism. However, programming such architectures manually in machine language is tedious and error-prone; only a compiler can make such architectures feasible. To solve this problem, we present a compiler for programming massively parallel processor arrays, in particular so-called tightly coupled processor arrays (TCPAs). Using a domain-specific language as design entry, our compiler symbolically parallelizes the code by applying symbolic loop tiling techniques in the polyhedron model. Then, by replacing the parameters, e.g., with the desired number of processor elements (PEs), the compiler generates assembly code and interconnect configuration for the different PEs, which are combined into one binary. Finally, we demonstrate our tool flow for several selected examples.
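As a rough illustration of the loop tiling idea mentioned in the abstract, the sketch below splits a two-dimensional iteration space into rectangular tiles, one per processing element. It is not LoopInvader's symbolic technique: the tile sizes here are concrete integers rather than symbolic parameters, and the function name `tiled_iterations` is hypothetical.

```python
def tiled_iterations(n_rows, n_cols, tile_rows, tile_cols):
    """Yield (tile_index, iterations) pairs for a tiled n_rows x n_cols loop nest.

    Each tile could be assigned to one processing element (an assumption made
    for illustration; the real mapping is decided by the compiler).
    """
    for ti in range(0, n_rows, tile_rows):
        for tj in range(0, n_cols, tile_cols):
            # Clamp the tile at the border of the iteration space.
            tile = [
                (i, j)
                for i in range(ti, min(ti + tile_rows, n_rows))
                for j in range(tj, min(tj + tile_cols, n_cols))
            ]
            yield (ti // tile_rows, tj // tile_cols), tile


if __name__ == "__main__":
    for index, tile in tiled_iterations(4, 6, 2, 3):
        print(index, tile)
```

Changing the tile sizes changes how many tiles (and thus how many PEs) are used, while the set of executed iterations stays the same.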

UB07.2

INVADESIM: A SIMULATOR FOR HETEROGENEOUS MULTI-PROCESSOR SYSTEMS-ON-CHIP
Presenter:
Sascha Roloff, Friedrich-Alexander-Universität Erlangen-Nürnberg, DE
Authors:
Sascha Roloff, Frank Hannig and Jürgen Teich, Friedrich-Alexander-Universität Erlangen-Nürnberg, DE
Abstract
Innovative simulation mechanisms at the system level are key for embedded hardware designers and parallel software developers to predict performance. This is especially important in a very early development phase, where design space exploration (DSE) helps to guide design decisions in the right direction. For modern MPSoCs, DSE can be very costly and time-consuming, depending on the underlying simulation techniques. We present InvadeSIM, a parallel execution-driven simulator for fast functional and timing simulation of heterogeneous NoC-based MPSoCs. For this purpose, InvadeSIM combines a fast direct-execution simulation approach with different parallelization strategies. We will showcase our work by simulating a stream processing application from the computer vision domain on a tiled MPSoC architecture in real time. In particular, we present an object tracking chain that continuously captures frames from a robot camera, followed by object detection and a control loop back to the camera.

UB07.3

ETEAK: ASYNCHRONOUS DATAFLOWS SYNTHESIS ONTO FPGAS USING THE ETEAK FRAMEWORK
Presenter:
Mahdi Jelodari Mamaghani, The University of Manchester, GB
Authors:
Mahdi Jelodari Mamaghani, Jim Garside and Steve Furber, The University of Manchester, GB
Abstract
We exploit eTeak (De-Elastisation [DATE'15] enabled) to synthesise asynchronous dataflow descriptions in Balsa into synchronous structures loadable onto an FPGA. We will also be able to demonstrate the software realisation of the same architecture running on a laptop and let the audience compare hardware and software concurrency. In a brief experiment conducted in our recent study, a prime number generator (the sieve of Eratosthenes) was implemented both in software using the CSP compiler and in hardware using eTeak: on average, the hardware implementation runs 90-120x faster than its software counterpart, while the processor clock speed is almost the same as the hardware clock speed (1.2 GHz). This allows us to plan ahead and exploit eTeak toward energy-efficient synthesis. According to EPSRC's research portfolio, this work falls under the fastest-growing research subject, "Energy Efficiency", which aims to achieve an energy reduction of 26-43% by exploiting ICT.

UB07.4

RC3E: DESIGN AND TEST AUTOMATIZATION IN THE CLOUD
Presenter:
Patrick Lehmann, Technische Universität Dresden, DE
Authors:
Patrick Lehmann, Oliver Knodel, Martin Zabel and Rainer G. Spallek, Technische Universität Dresden, DE
Abstract
Cloud computing is becoming increasingly interesting for companies because of its flexibility to provide apparently endless resources and new services while reducing the total cost of ownership for the user. Fields of application range from web technologies and storage solutions to complex business processes. The domain of chip and system design is well known for offloading resource-intensive and long-running synthesis or simulation tasks onto centralized servers. As hardware designs grow exponentially and verification requirements are strengthened, cloud services are being investigated to meet these needs. In the end, however, real hardware tests cannot be avoided. Our RC3E ecosystem brings close-to-hardware prototype development and automated hardware testing into the cloud, continuing the principle of "test often and test early". The architecture offers virtualized and shared FPGA resources for prototyping, with automated remote debugging capabilities.

UB07.5

T-RIDE: A MOBILE-HEALTH NEURODIAGNOSTIC SYSTEM BASED ON SPATIO-TEMPORAL P300 MONITORING: DESIGN, DEVELOPMENT AND TEST IN VIVO
Presenter:
Valerio Francesco Annese, Politecnico di Bari, IT
Authors:
Valerio Francesco Annese, Giovanni Mezzina and Daniela De Venuto, Politecnico di Bari, IT
Abstract
A mobile health solution for neuro-cognitive impairment monitoring, based on P300 spatio-temporal characterization achieved by tuned Residue Iteration Decomposition (t-RIDE), is presented. The proposed m-health service allows remote monitoring of neuro-cognitive impairment through a 'plug and play' application, while doctor customization and data collection are enabled by cloud bridging. The developed t-RIDE method overcomes the limitations of previous approaches (ICA, PCA, grand average, etc.). It has been tested on 8 subjects performing three different cognitive tasks of increasing difficulty. P300 amplitude ranges (3.6 µV to 11 µV), latencies (280 ms to 390 ms) and frontal-cortex spatial evidence (Pz, Fz, Cz) fully match medical references. t-RIDE convergence is reached in 148 iterations, ensuring an 80% accuracy in P300 amplitude using only 13 trials (worst case) on a single channel.

UB07.6

HYPERDIMENSIONAL COMPUTING FOR TEXT CLASSIFICATION: AN EFFICIENT SOFTWARE IMPLEMENTATION
Presenter:
Fateme Rasti Najafabadi, Sharif University of Technology, IR
Authors:
Fateme Rasti Najafabadi¹, Abbas Rahimi², Pentti Kanerva² and Jan Rabaey²
¹Sharif University of Technology, IR; ²University of California, Berkeley, US
Abstract
The mathematical properties of high-dimensional spaces show remarkable agreement with behaviors controlled by the brain. Hyperdimensional computing explores the emulation of cognition by computing with hypervectors as an alternative to computing with numbers. Hypervectors are high-dimensional (e.g., 10,000 dimensions) and holographic, and they appear random. These properties provide an opportunity for efficient computing, while aligning well with undesirable hardware variations in nanoscale fabrics. We focused on an application of hyperdimensional computing to text classification. Accordingly, we developed an algorithm to classify news stories from a stream of letters. Using pentagrams, the algorithm achieved a classification accuracy above 95% for eight news topics, which surpasses other techniques reported in the literature, including Bayes, K-NN, and SVM. We demonstrated a fully software-based framework that enables execution of such algorithms on contemporary hardware fabrics.
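The following is a minimal sketch of how text can be classified with hypervectors built from letter n-grams. It is not the authors' implementation: the permute-and-multiply n-gram encoding, the bipolar (+1/-1) vectors, the cosine comparison, the function names, and the two toy training sentences are all assumptions chosen for illustration.

```python
import math
import random
import string


def make_item_memory(alphabet, dim, seed=0):
    """Assign a fixed random bipolar hypervector to every symbol."""
    rng = random.Random(seed)
    return {ch: [rng.choice((-1, 1)) for _ in range(dim)] for ch in alphabet}


def rotate(vec, k):
    """Cyclically shift a vector by k positions (used to encode letter order)."""
    k %= len(vec)
    return vec[-k:] + vec[:-k] if k else vec[:]


def encode_text(text, item_memory, dim, n=5):
    """Sum the hypervectors of all letter n-grams in the text."""
    letters = [ch for ch in text.lower() if ch in item_memory]
    total = [0] * dim
    for i in range(len(letters) - n + 1):
        gram = [1] * dim
        for pos, ch in enumerate(letters[i:i + n]):
            # Bind letters by elementwise multiplication of rotated vectors.
            rotated = rotate(item_memory[ch], n - 1 - pos)
            gram = [g * r for g, r in zip(gram, rotated)]
        total = [t + g for t, g in zip(total, gram)]
    return total


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(text, prototypes, item_memory, dim, n=5):
    """Return the label whose prototype is most similar to the encoded text."""
    query = encode_text(text, item_memory, dim, n)
    return max(prototypes, key=lambda label: cosine(query, prototypes[label]))


if __name__ == "__main__":
    dim = 2000  # smaller than 10,000 to keep this pure-Python sketch fast
    memory = make_item_memory(string.ascii_lowercase + " ", dim)
    training = {  # toy data, invented for this example
        "sports": "the team won the football match in the final minute",
        "finance": "the bank raised interest rates on the stock market",
    }
    prototypes = {k: encode_text(v, memory, dim) for k, v in training.items()}
    print(classify("a football match", prototypes, memory, dim))
```

A class prototype here is simply the sum of all n-gram vectors seen for that class, so training and querying use the same encoding step.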

UB07.7

Q27: PUTTING QUEENS IN CARRY CHAINS
Presenter:
Thomas Preußer, Technische Universität Dresden, DE
Author:
Thomas Preußer, Technische Universität Dresden, DE
Abstract
The N-Queens Puzzle is a fascinating combinatorial problem. Up to now, the number of distinct valid placements of N non-attacking queens on a generalized N×N chessboard cannot be computed by a formula. Solution counts obtained from extensive explorations of the solution space are currently known for all N up to 26. The parallelization of this exploration is embarrassingly simple and is achieved by pre-placing the queens of a certain board region. This very flexible partitioning approach makes the N-Queens Puzzle a great showcase for tremendously parallel computation approaches. This demo illustrates an approach to computing the next, as yet unknown solution count for the 27-Queens Puzzle. It is based on a coronal pre-placement that not only partitions the overall computation but also cuts the size of the search space significantly by exploiting inherent symmetries. It presents highly effective hardware solvers that back an ongoing tremendously parallel computation.
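The partitioning idea can be sketched in software as follows. This is a deliberately simplified illustration, not the Q27 method: it pre-places only the queen of the first row (rather than a coronal region), does not exploit symmetries, and uses a plain bitmask backtracking solver. All function names are hypothetical.

```python
def count_completions(n, row, cols, diag_left, diag_right):
    """Count ways to fill rows row..n-1 given attacked squares as bitmasks."""
    if row == n:
        return 1
    full = (1 << n) - 1
    free = ~(cols | diag_left | diag_right) & full
    total = 0
    while free:
        bit = free & -free  # lowest free column in this row
        free ^= bit
        total += count_completions(
            n,
            row + 1,
            cols | bit,
            (diag_left | bit) << 1,   # diagonal attacks move one column left
            (diag_right | bit) >> 1,  # and one column right in the next row
        )
    return total


def partitioned_counts(n):
    """Solve one independent subproblem per pre-placed first-row queen."""
    counts = []
    for col in range(n):
        bit = 1 << col
        counts.append(count_completions(n, 1, bit, bit << 1, bit >> 1))
    return counts


def total_solutions(n):
    return sum(partitioned_counts(n))


if __name__ == "__main__":
    print(partitioned_counts(8), total_solutions(8))
```

Because the subproblems share no state, each entry of `partitioned_counts` could be computed by a separate worker and the results summed afterwards, which is what makes the exploration easy to parallelize.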

UB07.8

CONTREP: A SINGLE-SOURCE FRAMEWORK FOR UML-BASED MODELLING AND DESIGN OF MIXED-CRITICALITY SYSTEMS
Presenter:
Fernando Herrera, University of Cantabria, ES
Authors:
Fernando Herrera and Eugenio Villar, University of Cantabria, ES
Abstract
Mixed-criticality systems integrate applications, platform resources and requirements with different criticality. A criticality reflects the impact of either a failure of a component or a violation of a requirement, which can range from irrelevant to catastrophic effects. This booth presents the CONTREP framework, which supports UML/MARTE-based modeling, analysis and design of mixed-criticality embedded systems. The booth shows a model of a quadcopter control system which integrates safety-critical (e.g., flight control), mission-critical (e.g., a video processing payload), and non-critical (e.g., monitoring) functions. The booth shows how mixed-criticality is captured, together with the description of the functional architecture and of the multi-core embedded platform where the system is implemented, and how CONTREP automates different design activities, i.e., model validation, performance assessment and design space exploration, exploiting mixed-criticality information in every case.

UB07.9

DAC GENERATOR: A DAC STAGE ANALOG CIRCUIT GENERATOR FOR UDSM AND FD-SOI TECHNOLOGIES
Presenter:
Benjamin Prautsch, Fraunhofer Institute for Integrated Circuits IIS, Design Automation Division EAS, DE
Authors:
Benjamin Prautsch, Sunil Rao, Uwe Eichler, Ajith Puppala and Torsten Reich, Fraunhofer Institute for Integrated Circuits IIS, Design Automation Division EAS, DE
Abstract
The design of analog integrated circuits requires extensive manual work, which is error-prone and inefficient. With advanced ultra-deep sub-micron (UDSM) technologies, the manual design effort increases even more dramatically. This work presents the application of a rethought generator approach for the efficient, reusable design of a 12-bit current-steering DAC. The current mirror stage of the DAC, which is arranged in the complex Q² random walk scheme for high intrinsic matching [1], is realized by a circuit generator which automatically creates the schematic, symbol, and layout of the required cells within a few minutes. Originally focused on a 28 nm bulk technology, the generator code was also executed in a 28 nm FD-SOI technology with minor migration effort due to the generic nature of our tool. In addition, the fast circuit generation enables efficient layout optimization, showcasing the benefit of analog circuit generators for "bottom-up" design [2] in advanced technology nodes.

UB07.10

LLBMC / QPR-VERIFY: HIGH-PRECISION BOUNDED MODEL CHECKING FOR AUTOMOTIVE SOFTWARE
Presenter:
Carsten Sinz, Karlsruhe Institute of Technology (KIT), DE
Authors:
Carsten Sinz, David Farago, Florian Merz and Reimo Schaupp, Karlsruhe Institute of Technology (KIT), DE
Abstract
LLBMC (the low-level bounded model checker) is a static software analysis tool for finding bugs in C (and, to some extent, in C++) programs. It is mainly intended for checking low-level system code and is based on the technique of Bounded Model Checking. LLBMC is fully automatic and requires minimal preparation effort and user interaction. It supports all C constructs, including less common features such as bitfields. LLBMC models memory accesses (heap, stack, global variables) with high precision and is thus able to find hard-to-detect memory access errors like heap or stack buffer overflows. LLBMC can also uncover errors due to uninitialized variables or other sources of non-deterministic behavior. Due to its precise analysis, LLBMC produces almost no false alarms (false positives). LLBMC is developed at Karlsruhe Institute of Technology and will soon be commercially available via a university spin-off, QPR Technologies.

16:00

End of session
Coffee Break in Exhibition Area

Visit us at DATE 2016