Booklet Proof Reading


1.1 Opening Session: Plenary, Awards Ceremony & Keynote Addresses

Date: Tuesday 15 March 2016
Time: 08:30 - 10:30
Location / Room: Großer Saal

Time  Label  Presentation Title
Authors
08:30  1.1.1  WELCOME ADDRESSES
Speakers:
Luca Fanucci¹ and Jürgen Teich²
¹DATE 2016 General Chair, University of Pisa, IT; ²DATE 2016 Programme Chair, Friedrich-Alexander-Universität Erlangen-Nürnberg, DE
08:45  1.1.2  PRESENTATION OF DISTINGUISHED AWARDS
09:15  1.1.3  KEYNOTE: FROM THE HAPPY FEW TO THE HAPPY MANY - TOWARDS AN INTUITIVE INTERNET OF THINGS
Speaker:
Luc Van den hove, IMEC, BE
Abstract
Over the last year, every high-tech company has been talking about the Internet of Things. In the coming decade, we will indeed see a rise in smart connected systems. Machines, buildings, vehicles and personal appliances will all be equipped with more intelligence and will be interconnected. Smart systems will be unobtrusive, ultra-small, cheap, intelligent, and ultra-low power. They will include sensors, actuators, and processing and communication abilities, often in a one-chip wireless solution. Imec aims at bringing the Internet of Things to the next level. Imec develops the building blocks to create an easy-to-use Internet of Things that surrounds us, that interacts with us as individuals, that learns our habits, our preferences, our health… An Internet of Things that will connect diverse unconnected systems and that will turn the massive amount of measured data into information to make the right decisions and to take the right actions, exactly as we need or want, while of course taking into account our privacy preferences. This Intuitive Internet of Things will help manage the sustainability, complexity and safety of our world. It will increase our comfort and wellbeing. Not only of the happy few. Imec will bring the Intuitive Internet of Things to the happy many.
1.1.4  KEYNOTE: DESIGN WILL MAKE EVERYTHING DIFFERENT
Speaker:
Antun Domic, Synopsys, US
Abstract
How many different silicon manufacturing process technologies will there be at 10, 7, or 5 nanometers? Probably only three. How many design starts will there be at 10, 7, or 5 nanometers? According to IBS [1], in 2025 there will be fewer than 250 design starts at 10 nanometers and below, about 3% of the total number of design starts that year, and only about five of those design starts, i.e. 2% of the 3% (0.05% of the total) will take place in Europe. But this is not the end! This is not even the beginning of the end. There is a great deal of opportunity beyond the relentless progression of Moore's law. Design innovation can be the enabler, and the differentiator, regardless of the process technology node. Automotive is a great example: according to Bosch [2], electronics represents 80% of the innovation in cars, and 40% of their cost; the car is a computer - actually, over one hundred computers - on four wheels already, and it will get smarter and smarter, with new layers of services and players just around the corner. Design and design automation can help increase and accelerate innovation, and at the same time, improve efficiency. The "Internet of Things" is another, potentially greater example: smartness going way beyond the phone. Everything will get smarter: cars, homes, cities, agriculture, farming, factories, etc. Most of the IoT enablement and differentiation will stem from design and design automation, which includes IP and an increasing amount of software. After performance and power consumption, systems reliability and security have already become critical design considerations at the dawn of a new era, in which design will be critical to make everything better.
[1] Design Starts by Geographic Region 2010-2025, International Business Strategies, Inc. (IBS), 2015
[2] "Can EDA Solve the Problems of Electronic Design for the Car of the Future?", Peter van Staa, Robert Bosch, ICCAD 2014 Keynote Address
10:30  End of session
Coffee Break in Exhibition Area

UB01 Session 1

Date: Tuesday 15 March 2016
Time: 10:30 - 12:30
Location / Room: Booth 15, Exhibition Area

Label  Presentation Title
Authors
UB01.1  HIGH-END 122GHZ MINIATURE RADAR SENSOR FOR AUTONOMOUS AIRCRAFTS
Presenter:
Federico Nava, Heinz Nixdorf Institute - Universität Paderborn, DE
Authors:
Federico Nava¹ and Christoph Scheytt²
¹Heinz Nixdorf Institute - Universität Paderborn, DE; ²Heinz Nixdorf Institute - Paderborn, DE
Abstract
High-precision sensors, sensor arrays and the concept of sensor fusion are attracting growing interest in research on autonomous vehicles. For this reason, the System and Circuit Technology group at the Heinz Nixdorf Institute is currently developing a highly integrated radar module as a sensor for Unmanned Aerial Vehicle applications. The presented system is composed of a radar IC (130 nm SiGe) with in-package antennas and an operating frequency of 122 GHz, mounted on a flex PCB that includes a Cortex-M4 MCU, for a total size of 30 x 30 mm.
The presentation will show the FMCW/CW radar functions of the device, which allow the velocity and distance of multiple objects to be tracked. The results of the radar measurements will be presented on a screen showing the raw data acquired in the time domain and an FFT representation. Different objects will move simultaneously within the reception area of the sensor. The tracked distances will then be plotted on screen.
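As a rough illustration of how FMCW and CW measurements translate into distance and velocity, the sketch below applies the textbook relations for a linear sawtooth chirp and a Doppler shift. It is not the presenters' implementation: only the 122 GHz carrier comes from the abstract, while the sweep bandwidth, sweep duration, the example frequencies and the function names are assumptions chosen for illustration. The range formula also assumes a target with negligible radial velocity during the sweep.

```python
# Illustrative FMCW/CW radar arithmetic (textbook relations, not the demo's code).
C = 299_792_458.0      # speed of light in m/s
CARRIER_HZ = 122e9     # operating frequency stated in the abstract


def fmcw_range_m(beat_hz, bandwidth_hz, sweep_s):
    """Distance of a static target from the beat frequency of a linear chirp."""
    return C * beat_hz * sweep_s / (2.0 * bandwidth_hz)


def doppler_velocity_mps(doppler_hz, carrier_hz=CARRIER_HZ):
    """Radial velocity from the Doppler shift observed in CW mode."""
    return doppler_hz * C / (2.0 * carrier_hz)


# Hypothetical sweep parameters and measured frequencies (assumptions).
bandwidth_hz = 1e9     # 1 GHz sweep bandwidth
sweep_s = 1e-3         # 1 ms sweep duration
beat_hz = 10e3         # peak found in the FFT of the FMCW beat signal
doppler_hz = 1e3       # peak found in the FFT of the CW signal

print(f"range    = {fmcw_range_m(beat_hz, bandwidth_hz, sweep_s):.3f} m")
print(f"velocity = {doppler_velocity_mps(doppler_hz):.3f} m/s")
```

In a multi-object scene, each peak in the FFT of the beat signal would be converted in the same way, giving one distance per tracked object.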

UB01.2  D-VASIM: TIMING ANALYSIS OF GENETIC LOGIC CIRCUITS USING D-VASIM
Presenter:
Hasan Baig, Technical University of Denmark, DK
Authors:
Hasan Baig and Jan Madsen, Technical University of Denmark, DK
Abstract
A genetic logic circuit is a gene regulatory network implemented by re-engineering the DNA of a cell in order to control gene expression or metabolic pathways through a logic combination of external signals, such as chemicals or proteins. As for electronic logic circuits, timing and propagation delay analysis may also play a very significant role in the design of genetic logic circuits. In this demonstration, we present the capability of D-VASim (Dynamic Virtual Analyzer and Simulator) to perform the timing and propagation delay analysis of single as well as cascaded genetic logic circuits. D-VASim allows the user to change the circuit parameters during runtime simulation to observe their effects on the circuit's timing behavior. The results obtained from D-VASim can be used not only to characterize the timing behavior of genetic logic circuits but also to analyze the timing constraints of cascaded genetic logic circuits.

UB01.3  ALPT: A FAST PROTOTYPING METHODOLOGY WITH CONSTRAINED FLOORPLANING ON ANALOG LAYOUT GENERATION
Presenter:
Po-Cheng Pan, National Chiao Tung University, TW
Authors:
Po-Cheng Pan, Hung-Wen Huang and Hung-Ming Chen, National Chiao Tung University, TW
Abstract
Layout generation in recent analog design is challenging because of critical layout-dependent effects (LDEs). For the same netlist, different layouts lead to distinct performance. Therefore, it is necessary to observe and avoid LDEs during generation. Traditionally, analog layout generation strategies have mostly relied on experienced designers. However, that experience is built on time-consuming manual trial runs, which is inefficient and unreliable. In this work, we develop a fast prototyping methodology for analog layout generation. In our approach, we apply a fast floorplanning algorithm for multi-layout generation and select the feasible results with respect to the predefined analog constraints. For practical usage, we implement this approach embedded in an EDA tool so that layout designers can work efficiently with such prototypes. The demonstration includes layout prototype generation, the integration between our program and the EDA tool, and the resulting layout prototypes.

UB01.4  MICROTESK ARMV8 EDITION: SPECIFICATION-BASED TEST PROGRAM GENERATOR
Presenter:
Andrei Tatarnikov, Russian Academy of Sciences (RAS), RU
Authors:
Andrei Tatarnikov, Alexander Kamkin and Artem Kotsynyak, Russian Academy of Sciences (RAS), RU
Abstract
This work presents a test program generation tool for ARMv8 microprocessors. The tool consists of two parts: an architecture-independent test program generation core and ARMv8 specifications. The specifications provide information on the instruction set architecture and the memory management unit of an ARMv8 microprocessor. Test programs are generated on the basis of test templates provided by users and testing knowledge extracted from the specifications. Test templates describe scenarios to be covered in terms of test situations, while testing knowledge specifies constraints that should be satisfied in order for these situations to occur. The architecture-independent test program generation core implements a wide range of test generation techniques including random generation, combinatorial generation, constraint solving and symbolic execution. The flexible architecture of the tool allows different generation methods to be integrated and the test generation core to be extended with new engines.

UB01.5  AGAMID: A TLM FRAMEWORK FOR EVALUATION OF HARDWARE-ENHANCED MANY-CORE RUN-TIME MANAGEMENT
Presenter:
Daniel Gregorek, University of Bremen, DE
Authors:
Daniel Gregorek and Alberto Garcia-Ortiz, University of Bremen, DE
Abstract
The advent of many-core processors places novel demands on system design. Power limitations and abundant parallelism require efficient and scalable run-time management (RTM). But the design of a many-core run-time manager generally suffers from long evaluation times. AGAMID is a novel research framework for design space exploration of hardware-enhanced many-core run-time management. In this demo, we use AGAMID for the interactive analysis of many-core architectures and run-time management systems. We perform a hands-on comparison of RTM architectures, RTM algorithms and HW/SW partitionings. We also give insights into the design and architecture of the framework itself.

UB01.6  A-LOOP: AMP SYSTEM WITH A DUAL-CORE ARM CORTEX A9 PROCESSOR WITH LINUX OPERATING SYSTEM AND A QUAD-CORE LEON3 PROCESSOR WITH LINUX OPERATING SYSTEM, OPENMP LIBRARY AND HARDWARE PROFILING SYSTEM
Presenter:
Giacomo Valente, Università Degli Studi Dell'Aquila, IT
Authors:
Giacomo Valente and Vittoriano Muttillo, Università Degli Studi Dell'Aquila, IT
Abstract
Isles of computational elements with different characteristics can be exploited for separate tasks with different non-functional requirements. This can lead to the realization of smart Systems on Module (SoMs). In such a context, SoCs with FPGAs can be viewed as platforms useful for prototyping these architectures. This demo shows a SoM prototype for aerospace applications developed on a Zynq-7000 SoC, composed of a dual-core ARM Cortex A9 with a Linux operating system (isle#1), able to interface with external data, and a quad-core Leon3 with an SMP Linux operating system (isle#2), able to execute parallel applications based on the OpenMP library. These two computational isles share an external DDR memory, so that isle#1 can provide data to and collect results from isle#2. Moreover, isle#1 is able to monitor the performance of isle#2 without introducing software overhead (i.e. no SW instrumentation) by using a hardware profiling system. The whole system, executing a MANET localization algorithm, will be presented.

UB01.7  RETRASCOPE: TOOLKIT FOR ANALYSIS AND VERIFICATION OF HDL DESIGNS
Presenter:
Sergey Smolov, Russian Academy of Sciences (RAS), RU
Authors:
Sergey Smolov, Alexander Kamkin and Mikhail Lebedev, Russian Academy of Sciences (RAS), RU
Abstract
Retrascope is an open-source toolkit for Reverse Engineering and TRAnsformation of digital hardware designs described in such hardware description languages as Verilog and VHDL. The toolkit allows analyzing HDL descriptions, reconstructing the underlying models (guarded actions, extended finite state machines, high-level decision diagrams etc.) and using the derived models for test generation, property checking and other tasks. Retrascope is organized as an extendible framework with the ability to add new types of models as well as tools for their analysis and transformation. The primary application domain of the toolkit is functional verification of hardware at the unit level.

UB01.8  PFPSIM: A PROGRAMMABLE FORWARDING PLANE SIMULATOR
Presenter:
Gordon Bailey, Concordia University, CA
Author:
Gordon Bailey, Concordia University, CA
Abstract
We demonstrate PFPSim, a host-compiled simulator for early validation and analysis of packet processing applications on programmable forwarding plane architectures, used in software defined networks. The simulation model is automatically generated from a high-level description of the hardware/software architecture of the forwarding device and the behavioral description of the various modules in the architecture. Our high-level architectural description language is capable of defining many-core network processors as well as reconfigurable pipelines. The behavior of the fixed-function processing elements in the architecture is defined in C++. The code targeted for the processor cores, or reconfigurable pipeline stages, is compiled from P4, an emerging programming language for packet processing applications. Network dataplane programmers can use PFPSim as a virtual prototype to simulate and debug their applications before hardware availability.

UB01.9  BIOVIZ: AN INTERACTIVE VISUALIZATION ENGINE FOR MICROFLUIDIC BIOCHIPS
Presenter:
Oliver Keszöcze, University of Bremen, DE
Authors:
Oliver Keszöcze¹, Jannis Stoppe², Robert Wille³ and Rolf Drechsler²
¹University of Bremen, DE; ²DFKI and University of Bremen, DE; ³Johannes Kepler University, AT, DFKI and University of Bremen, DE
Abstract
In order to shorten the time required for the analysis of medical substances, digital microfluidic biochips (DMFBs) have been suggested. Issues such as routing and layout are complex and are currently being investigated. Although the first automatic solutions assist designers, the results are usually provided in a complex and non-intuitive fashion. Creating solutions requires testing different setups, comparing the results and debugging algorithms. Solutions, while technically correct, often include negative aspects such as unnecessary cell usage. These aspects are difficult to spot without being able to visually inspect the design. Still, while designers would benefit from visualization tools, no dedicated tools have been built yet. We present BioViz, an interactive visualization tool for DMFBs that explicitly addresses these problems.

12:30  End of session
13:00  Lunch Break in Großer Saal + Saal 1

2.1 Executive Track Panel: Enabling a Connected World via Internet of Things

Date: Tuesday 15 March 2016
Time: 11:30 - 13:00
Location / Room: Saal 2

Organiser:
Yervant Zorian, Synopsys, US

Enabling a connected world through the Internet of Things empowers a variety of applications, including medical wearables, home automation, energy, transportation, environmental monitoring, etc. This results in several new approaches and innovative methods that work together to enable the network of smart devices. The executives in this session will discuss the impact of IoT on the semiconductor industry and its influence on the ecosystem players.

Executives:

  • Christoph Heer, Intel, DE
  • Jamil Kawa, Synopsys, US
  • Rudy Lauwereins, IMEC, BE
  • Cheng-Wen Wu, Industrial Technology Research Institute, TW
13:00  End of session
Lunch Break in Großer Saal + Saal 1

2.2 Embedded Tutorial: The Dark Silicon Problem: Technology to the Rescue?

Date: Tuesday 15 March 2016
Time: 11:30 - 13:00
Location / Room: Konferenz 6

Organisers:
Michael Niemier, University of Notre Dame, South Bend, US
Siddharth Garg, New York University, US

Chair:
Muhammad Shafique, Karlsruhe Institute of Technology (KIT), DE

Co-Chair:
Umit Ogras, Arizona State University, US

In 2014, Jörg Henkel organized a "hot topic" special session that provided the DATE community with a snapshot of current research activities related to the grand challenge of dark silicon (DS). A primary purpose of that session was to introduce this important problem to the design automation community and to engage the community in addressing it. The lead presentation in the 2014 session was by Prof. Michael Taylor, who spoke about the "landscape of the new dark silicon design regime." He defined a taxonomy termed "the four horsemen" for addressing the DS challenge. These are:

  • The shrinking horseman - i.e., addressing power density and thermal challenges caused by transistor scaling
  • The dim horseman - i.e., mitigating the DS challenge using near-threshold voltage scaling
  • The "deus ex machina" horseman - i.e., leveraging emerging and/or disruptive device technologies with more appealing power, performance and power density trade-offs
  • The specialization horseman - i.e., provisioning chips with a large number of application-specific accelerators

Taylor notes: "Future chips are likely to employ not just one horseman, but all of them, in interesting and unique combinations." In this embedded tutorial, we consider how researchers are leveraging new technologies - especially 3D integration and new transistor technologies - to address the DS problem. For continuity, we frame technology-based solutions in the context of the four horsemen identified by Taylor in 2014.

Time  Label  Presentation Title
Authors
11:30  2.2.1  TOWARDS PERFORMANCE AND RELIABILITY-EFFICIENT COMPUTING IN THE DARK SILICON ERA
Speaker:
Jörg Henkel, Karlsruhe Institute of Technology (KIT), DE
Authors:
Jörg Henkel, Santiago Pagani, Heba Khdr, Florian Kriebel, Semeen Rehman and Muhammad Shafique, Karlsruhe Institute of Technology (KIT), DE
Abstract
This paper discusses the power density and temperature-induced issues in modern on-chip systems that arise from high integration density and the roadblock on voltage scaling. First, the emerging dark silicon problem is discussed, and the corresponding critical research challenges in future chips are enumerated. Afterwards, we present an overview of some key research efforts and concepts that leverage dark silicon for performance and reliability optimization of on-chip systems under power or temperature constraints. The summarized works account for heat transfer inside a chip, as well as the varying performance and power trade-offs of gray silicon, that is, the potential benefits of operating at lower-than-nominal voltage and frequency levels. Besides realizing reliability-heterogeneous architectures, the reliability of an on-chip system is enhanced by exploiting dark silicon for aging deceleration and resilience-driven resource management to mitigate soft errors. Several of the tools discussed in this paper are available for download at http://ces.itec.kit.edu/download.

12:00  2.2.2  TOWARDS NEAR-THRESHOLD SERVER PROCESSORS
Speaker:
David Atienza, École Polytechnique Fédérale de Lausanne (EPFL), CH
Authors:
Ali Pahlevan¹, Javier Picorel¹, Arash Pourhabibi Zarandi¹, Davide Rossi², Marina Zapater³, Andrea Bartolini⁴, Pablo G. del Valle¹, David Atienza¹, Luca Benini⁴ and Babak Falsafi¹
¹École Polytechnique Fédérale de Lausanne (EPFL), CH; ²ETH Zurich, CH; ³CEI Campus Moncloa, UCM-UPM, ES; ⁴Università di Bologna, IT
Abstract
The popularity of cloud computing has led to a dramatic increase in the number of data centers in the world. Ever-increasing computational demands, along with the slowdown in technology scaling, have ushered in an era of power-limited servers. Techniques such as near-threshold computing (NTC) can be used to improve energy efficiency in the post-Dennard scaling era. This paper describes an architecture based on the FD-SOI process technology for near-threshold operation in servers. Our work explores the trade-offs in energy and performance when running a wide range of applications found in private and public clouds, ranging from traditional scale-out applications, such as web search or media streaming, to virtualized banking applications. Our study demonstrates the benefits of near-threshold operation and proposes several directions to synergistically increase the energy proportionality of a near-threshold server.

12:30  2.2.3  CAN BEYOND-CMOS DEVICES ILLUMINATE DARK SILICON?
Speaker:
Michael Niemier, University of Notre Dame, US
Authors:
Robert Perricone, X. Sharon Hu, Joseph Nahas and Michael Niemier, University of Notre Dame, US
Abstract
Throughout the last decade, the microprocessor industry has been struggling to preserve the benefits of Moore's Law scaling. The persistent scaling of CMOS technology no longer yields exponential performance gains due in part to the growth of dark silicon. With each subsequent technology node generation, power constraints resulting from factors such as sub-threshold leakage currents are projected to further limit the number of transistors that can be simultaneously powered. To overcome the limits of CMOS devices, researchers are working to develop "beyond-CMOS" device technologies. To determine the most promising beyond-CMOS devices, it is necessary to benchmark them against CMOS. In this paper, we present the design and validation of an analytical benchmarking model that evaluates CMOS and beyond-CMOS devices at the architectural level. Our model is built from the device to the architectural/application level. Our target architecture is a symmetric multi-core processor executing highly parallel applications (i.e., PARSEC). As a case study, we select one class of promising beyond-CMOS devices, tunneling field-effect transistors, to evaluate against CMOS.

13:00  End of session
Lunch Break in Großer Saal + Saal 1

2.3 Automotive Systems and Smart Energy Systems

Date: Tuesday 15 March 2016
Time: 11:30 - 13:00
Location / Room: Konferenz 1

Chair:
Geoff Merrett, University of Southampton, GB

Co-Chair:
Frank Hannig, Friedrich-Alexander-Universität Erlangen-Nürnberg, DE

This session considers the state of the art in automotive systems and smart energy systems including novel approaches for efficient embedded software in automobiles, formal analyses and fault detection, and joint optimisation approaches for lifetime and functionality improvements in electric vehicles.

Time  Label  Presentation Title
Authors
11:30  2.3.1  (Best Paper Award Candidate)
OTEM: OPTIMIZED THERMAL AND ENERGY MANAGEMENT FOR HYBRID ELECTRICAL ENERGY STORAGE IN ELECTRIC VEHICLES
Speaker:
Mohammad Al Faruque, University of California, Irvine, US
Authors:
Korosh Vatanparvar and Mohammad Abdullah Al Faruque, University of California, Irvine, US
Abstract
Electric Vehicles (EVs) pose challenges in terms of reliability and performance, which are due to stringent design constraints. For instance, insufficient energy storage restricts the EV driving range. Highly dense battery packs that provide an EV with the required power may generate extreme internal heat, which causes the battery temperature to rise significantly and thereby results in reliability and safety issues. Moreover, both high battery utilization and high temperature may degrade the battery capacity and Battery LifeTime (BLT), which should be extended as much as possible to postpone expensive battery replacements. Although researchers have provided separate battery energy and thermal management approaches for EVs to address the above-mentioned challenges, in this paper we propose a joint optimized solution. To this end, we introduce a novel metric, the Thermal and Energy Budget (TEB), in a Hybrid Electrical Energy Storage (HEES) with an active battery cooling system. Furthermore, we propose a novel Optimized Thermal and Energy Management (OTEM) methodology which optimizes the battery/ultracapacitor utilization, the battery temperature, and thereby the TEB, in order to improve the driving range, extend the BLT, and maintain the battery temperature in the safe zone. Our methodology provides a significant improvement in BLT (on average 16.8%) and in average energy consumption (on average a 12.1% reduction) compared to state-of-the-art methodologies.

12:00  2.3.2  SUPERTASK: MAXIMIZING RUNNABLE-LEVEL PARALLELISM IN AUTOSAR APPLICATIONS
Speaker:
Sebastian Kehr, Denso Automotive Deutschland GmbH, DE
Authors:
Sebastian Kehr¹, Milos Panic², Eduardo Quinones³, Bert Boeddeker¹, Jorge Becerril Sandoval¹, Jaume Abella³, Francisco Cazorla⁴ and Günter Schäfer⁵
¹Denso Automotive Deutschland GmbH, DE; ²Barcelona Supercomputing Center and Universitat Politècnica de Catalunya, ES; ³Barcelona Supercomputing Center, ES; ⁴Barcelona Supercomputing Center and IIIA-CSIC, ES; ⁵Ilmenau University of Technology, DE
Abstract
The migration of legacy AUTOSAR automotive software from a single-core ECU to a multicore ECU faces two main challenges: 1) data dependencies between AUTOSAR runnables must be respected, which may limit the level of parallelism; 2) the original data-flow from the single-core must be reproduced, in order to guarantee the same functional behaviour without exhaustive validation and testing efforts afterwards. This article proposes the concept of the supertask, which maximizes the level of parallelism among runnables and maintains the original data-flow from the single-core. Supertasks group consecutively scheduled AUTOSAR tasks into a unique scheduling entity with a period equal to the least common multiple of the periods of the tasks composing it. We evaluate supertasks with a real automotive application and compare them with existing state-of-the-art approaches with the same objectives. Our results show that supertasks effectively increase performance with respect to the current state of the art, resulting in an overall performance improvement of the application when supertasks are combined with current approaches.
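The period rule for supertasks mentioned in the abstract can be sketched in a few lines. This is only an illustration of the least-common-multiple arithmetic, not the authors' scheduling method; the function names and the example task periods (5, 10 and 20 ms) are hypothetical.

```python
from functools import reduce
from math import gcd


def supertask_period(task_periods_ms):
    """Least common multiple of the periods of the grouped tasks."""
    if not task_periods_ms:
        raise ValueError("a supertask needs at least one task")
    return reduce(lambda a, b: a * b // gcd(a, b), task_periods_ms)


def activations_per_period(task_periods_ms):
    """How often each grouped task is activated within one supertask period."""
    period = supertask_period(task_periods_ms)
    return [period // p for p in task_periods_ms]


# Hypothetical periods of three consecutively scheduled tasks, in milliseconds.
periods_ms = [5, 10, 20]
print("supertask period (ms):", supertask_period(periods_ms))
print("activations per supertask period:", activations_per_period(periods_ms))
```

Within one supertask period, every grouped task completes a whole number of activations, so the sequence of activations simply repeats from one supertask period to the next.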

12:30  2.3.3  FORMAL ANALYSIS BASED EVALUATION OF SOFTWARE DEFINED NETWORKING FOR TIME-SENSITIVE ETHERNET
Speaker:
Daniel Thiele, Technische Universität Braunschweig, DE
Authors:
Daniel Thiele and Rolf Ernst, Technische Universität Braunschweig, DE
Abstract
Software defined networking (SDN) aims to standardize the control and configuration of network infrastructure. It consolidates network control by moving the network's control plane to a (logically) centralized controller and downgrading switches to simple forwarding devices. This offers huge advantages for future automotive Ethernet networks, including admission control (e.g. to prevent/limit congestion) or network reconfiguration (e.g. in case of faults), both based on a centralized view of the current network state. SDN's centralized architecture, however, requires additional communication, which entails a certain overhead. If SDN is used in safety-critical real-time networks, this communication is subject to strict timing requirements. In this paper, we present a formal analysis based evaluation of the general suitability of SDN for time-sensitive networks including overhead, scalability, and timing guarantees by using a realistic automotive setup.

12:45  2.3.4  ACCELERATED ARTIFICIAL NEURAL NETWORKS ON FPGA FOR FAULT DETECTION IN AUTOMOTIVE SYSTEMS
Speaker:
Shreejith Shanker, Nanyang Technological University, SG
Authors:
Shreejith Shanker¹, Bezborah Anshuman¹ and Suhaib A. Fahmy²
¹Nanyang Technological University, SG; ²University of Warwick, GB
Abstract
Modern vehicles are complex distributed systems with critical real-time electronic controls that have progressively replaced their mechanical/hydraulic counterparts, for performance and cost benefits. The harsh and varying vehicular environment can induce multiple errors in the computational/communication path, with temporary or permanent effects, thus demanding the use of fault-tolerant schemes. Constraints in location, weight, and cost prevent the use of physical redundancy for critical systems in many cases, such as within an internal combustion engine. Alternatively, algorithmic techniques like artificial neural networks (ANNs) can be used to detect errors and apply corrective measures in computation. Though the adaptability of ANNs presents advantages for fault-detection and fault-tolerance measures for critical sensors, an implementation on automotive-grade processors may not meet the required hard deadlines and accuracy simultaneously. In this work, we present an ANN-based fault-tolerance system based on hybrid FPGAs and evaluate it using a diesel engine case study. We show that the hybrid platform outperforms an optimised software implementation on an automotive-grade ARM Cortex M4 processor in terms of latency and power consumption, while also providing better consolidation.

13:00  IP1-1, 56  A SCALABLE LANE DETECTION ALGORITHM ON COTSS WITH OPENCL
Speaker:
Kai Huang, Sun Yat-Sen University, CN
Authors:
Kai Huang¹, Biao Hu², Jan Botsch³, Nikhil Madduri³ and Alois Knoll³
¹Sun Yat-Sen University, CN; ²Technische Universität München (TUM), DE; ³Technische Universität München (TUM), DE
Abstract
Road lane detection is a classical requirement for advanced driver assistance systems. With new computer technologies, lane detection algorithms can be run on COTS platforms. This paper investigates the use of OpenCL and develops a particle-filter-based lane detection algorithm that can tune the trade-off between detection accuracy and speed. Our algorithm is tested on 14 video streams from different data sets with different scenarios on different COTS hardware. With an average deviation of less than 5 pixels, the average frame rates for the 14 videos can reach about 400 fps on both GPU and FPGA. The peak frame rates for certain videos on GPU can reach almost 1000 fps.

13:01  IP1-2, 611  SIMULATION OF FALLING RAIN FOR ROBUSTNESS TESTING OF VIDEO-BASED SURROUND SENSING SYSTEMS
Speaker:
Dennis Hospach, Universität Tübingen, DE
Authors:
Dennis Hospach¹, Stefan Mueller¹, Wolfgang Rosenstiel¹ and Oliver Bringmann²
¹Universität Tübingen, DE; ²Universität Tübingen / FZI, DE
Abstract
Recently, optical sensors have become a standard item in modern cars, raising questions with respect to the necessary testing under various ambient effects. In order to achieve a high test coverage of vision-based surround sensing systems, many different environmental conditions need to be tested. Unfortunately, it is far too time-consuming to build test sets of all relevant environmental conditions by recording real video data. This paper presents a novel approach for ambient-aware virtual prototyping and robustness testing. We propose a method to significantly reduce the on-road captures needed for the design and validation of vision-based Advanced Driver Assistance Systems (ADAS) and fully automated driving. Our approach facilitates the generation of comparable test sets by using a greatly reduced amount of real on-road captures and applying computer-generated variations of falling rain to them in a comprehensive virtual prototyping environment. In combination with the simulation of camera properties, which influence the visual effects of falling rain to a great extent, we are able to generate different rain scenarios under a wide variety of parameters. Our approach has been applied to an automotive lane detection system using a series of multiple rain scenarios. We have explored how falling rain can influence such a system and how such behavior can be detected using simulated rain scenarios.

Download Paper (PDF; Only available from the DATE venue WiFi)
13:02 IP1-3, 618 PROPOSAL FOR FAST DIRECTIONAL ENERGY INTERCHANGE USED IN MCMC-BASED AUTONOMOUS DECENTRALIZED MECHANISM TOWARD RESILIENT MICROGRID
Speaker:
Yusuke Sakumoto, Tokyo Metropolitan University, JP
Authors:
Yusuke Sakumoto1 and Ittetsu Taniguchi2
1Tokyo Metropolitan University, JP; 2Ritsumeikan University, JP
Abstract
The microgrid is well known as a key technology for improving the ease of use of renewable energy. Some previous works focused on a microgrid that is divided into autonomous electricity subsystems (AESs) for reliability and scalability. We have proposed the MCMC-based autonomous decentralized mechanism (ADM) to perform energy interchange between AESs so as to supply energy appropriately for the different energy demands among AESs. In this paper, toward resilient microgrids, we design a method to realize directional energy interchange in our ADM on the basis of convection-diffusion. We investigate the effectiveness of the proposed method through simulation experiments considering energy shortage and emergency situations. We show that the proposed method can quickly supply energy from an external power grid to a microgrid in an energy shortage situation, and can quickly gather distributed energy at a specific AES (e.g., a safe shelter) in an emergency situation.

Download Paper (PDF; Only available from the DATE venue WiFi)
13:00 End of session
Lunch Break in Großer Saal + Saal 1

2.4 Physical Design for Cutting-edge Lithography

Date: Tuesday 15 March 2016
Time: 11:30 - 13:00
Location / Room: Konferenz 2

Chair:
Jens Lienig, Technische Universität Dresden, DE

Co-Chair:
Patrick Groeneveld, Synopsys Inc., US

Major developments in lithography covered in this session include multiple patterning, optical proximity correction and directed self-assembly. The papers contribute numerical and graph-theoretic techniques for analysis, design and optimization. The last paper explores circuit partitioning for heterogeneous 3D integration.

Time Label Presentation Title
Authors
11:30 2.4.1 OPTIMIZATION FOR MULTIPLE PATTERNING LITHOGRAPHY WITH CUTTING PROCESS AND BEYOND
Speaker:
Jian Kuang, The Chinese University of Hong Kong, HK
Authors:
Jian Kuang and Evangeline F. Y. Young, The Chinese University of Hong Kong, HK
Abstract
Multiple Patterning Lithography (MPL) is indispensable for producing sub-22nm devices. Recently, multiple patterning with cutting (MPC) was proposed. For example, in triple patterning with cutting (LELECUT), the first two masks are used to do double patterning, whereas the third mask is used to cut off the unwanted parts. In this paper, we will systematically study the problem of cut candidate generation, and propose a flow to optimally minimize the manufacturing cost for standard cell based design with MPC. We will further extend the optimization flow to handle multiple patterning with e-beam cuts. Experiments demonstrate the effectiveness of the proposed algorithms.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:00 2.4.2 A FAST MANUFACTURABILITY AWARE OPTICAL PROXIMITY CORRECTION (OPC) ALGORITHM WITH ADAPTIVE WAFER IMAGE ESTIMATION
Speaker:
Ahmed Awad, Tokyo Institute of Technology, JP
Authors:
Ahmed Awad1, Atsushi Takahashi1 and Chikaaki Kodama2
1Tokyo Institute of Technology, JP; 2Toshiba Corporation, JP
Abstract
Aggressive Optical Proximity Correction (OPC) has been widely adopted in optical lithography to preserve circuit performance for sub-20nm technology nodes. However, it produces complex mask patterns, resulting in high mask manufacturing cost and long computation time. In this paper, we propose a fast OPC algorithm in which intensity estimation during OPC is improved for better pattern fidelity, and in which post-processing is applied to effectively improve mask manufacturability while preserving acceptable pattern fidelity.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30 2.4.3 REDUNDANT VIA INSERTION IN DIRECTED SELF-ASSEMBLY LITHOGRAPHY
Speaker:
Woohyun Chung, Korea Advanced Institute of Science and Technology, KR
Authors:
Woohyun Chung, Seongbo Shim and Youngsoo Shin, Korea Advanced Institute of Science and Technology, KR
Abstract
In directed self-assembly lithography (DSAL), vias that are located close together are clustered and patterned together. A large and complex cluster, however, is not allowed in this process due to its potential danger of pattern failure. We address redundant via insertion in DSAL. The goal is to insert the maximum number of redundant vias while ensuring that adjacent vias do not form a large and complex cluster. The problem is formulated as a maximum independent set (MIS) problem on a conflict graph. Experiments demonstrate 13% more redundant vias inserted compared to a simple approach, namely basic insertion with no consideration of DSAL followed by removal of redundant vias in large and complex clusters. We also introduce DSA defect probability in order to quantitatively define which clusters should be allowed during the insertion process.
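
To make the conflict-graph formulation above concrete, here is a minimal sketch, not the authors' method. It assumes each redundant-via candidate is a node and that an edge joins two candidates that cannot both be inserted. The candidate names, the conflict list and the greedy minimum-degree heuristic are illustrative assumptions; a greedy pass returns a maximal independent set, which is not necessarily a maximum one.

```python
def build_conflict_graph(candidates, conflicts):
    """Return adjacency sets for hypothetical redundant-via candidates."""
    graph = {c: set() for c in candidates}
    for a, b in conflicts:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def greedy_independent_set(graph):
    """Greedy heuristic: repeatedly pick a remaining node of minimum degree.

    The result is a maximal independent set (nothing more can be added),
    which may be smaller than the true maximum independent set.
    """
    remaining = {v: set(nbrs) for v, nbrs in graph.items()}
    chosen = []
    while remaining:
        # Ties are broken by name so that the result is deterministic.
        v = min(remaining, key=lambda u: (len(remaining[u]), u))
        chosen.append(v)
        removed = remaining[v] | {v}
        for u in removed:
            remaining.pop(u, None)
        for nbrs in remaining.values():
            nbrs -= removed
    return chosen


# Hypothetical example: four candidates whose conflicts form a chain.
candidates = ["r1", "r2", "r3", "r4"]
conflicts = [("r1", "r2"), ("r2", "r3"), ("r3", "r4")]
graph = build_conflict_graph(candidates, conflicts)
selected = greedy_independent_set(graph)
print(selected)
```

On this chain the greedy pass keeps two of the four candidates, and no two selected candidates share a conflict edge. An exact MIS solver would be needed to guarantee optimality on larger graphs.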

Download Paper (PDF; Only available from the DATE venue WiFi)
12:45 2.4.4 IMPROVED PERFORMANCE OF 3DIC IMPLEMENTATIONS THROUGH INHERENT AWARENESS OF MIX-AND-MATCH DIE STACKING
Speaker:
Andrew B. Kahng, UCSD, US
Authors:
Kwangsoo Han, Andrew B. Kahng and Jiajia Li, University of California, San Diego, US
Abstract
3D logic-logic integration is an important future lever for continued cost and density scaling value propositions in the semiconductor industry. In the 3DIC context, several works have proposed "mix-and-match" of multiple stacked die, according to binning information, to improve overall product yield. However, each of the stacked die in these works is independently designed: there is no holistic "design for eventual stacking" of any of the die. Separately, many approaches have been proposed for design partitioning and implementation with multiple die, including 3D stacked-die implementation. However, the signoff criteria used to implement such a multi-die solution must necessarily validate timing correctness for all combinations of process conditions on the multiple die. To our knowledge, no previous work has examined the fundamental issue of design partitioning and signoff specifically for mix-and-match die stacking. In this work, we study performance improvements of 3DIC implementation that leverage knowledge of mix-and-match die stacking during manufacturing. We propose partitioning methodologies to partition timing-critical paths across tiers to explicitly optimize the signed-off timing across the reduced set of corner combinations that can be produced by the stacked-die manufacturing. These include both an ILP-based methodology and a heuristic with novel maximum-cut partitioning, solved by semidefinite programming, and a signoff timing-aware FM optimization. We also extend two existing 3DIC implementation flows to incorporate mix-and-match-aware partitioning and signoff, demonstrating the simplicity of adopting our techniques. Experimental results show that our optimization flow achieves up to 16% timing improvement as compared to the existing 3DIC implementation flow in the context of mix-and-match die stacking.

Download Paper (PDF; Only available from the DATE venue WiFi)
13:00 IP1-4, 544 GRID-BASED SELF-ALIGNED QUADRUPLE PATTERNING AWARE TWO DIMENSIONAL ROUTING PATTERN
Speaker:
Atsushi Takahashi, Tokyo Institute of Technology, JP
Authors:
Takeshi Ihara1, Toshiyuki Hongo1, Atsushi Takahashi1 and Chikaaki Kodama2
1Tokyo Institute of Technology, JP; 2Toshiba, JP
Abstract
Self-Aligned Quadruple Patterning (SAQP) is an important manufacturing technique for sub-14 nm technology nodes. Although various routing algorithms for SAQP have been proposed, it is not easy to find a dense SAQP-compliant routing pattern efficiently. Even though a grid for SAQP-compliant routing patterns was proposed, it is not easy to find a valid routing pattern on the grid. The routing pattern of SAQP on the grid consists of three types of routing. Among them, the third type has a turn prohibition constraint on the grid. Typical routing algorithms often fail to find a valid routing for the third type. In this paper, SAQP-compliant two-dimensional routing patterns are found effectively on the grid by finding an optimal valid tertiary pattern. Experiments show that SAQP-compliant routing patterns are found efficiently.

Download Paper (PDF; Only available from the DATE venue WiFi)
13:01 IP1-5, 356 PRACTICAL ILP-BASED ROUTING OF STANDARD CELLS
Speaker:
Rung-Bin Lin, Yuan Ze University, TW
Authors:
Hsueh-Ju Lu, En-Jang Jang, Ang Lu, Yu Ting Zhang, Yu-He Chang, Chi-Hung Lin and Rung-Bin Lin, Yuan Ze University, TW
Abstract
This paper proposes a two-stage transistor routing approach that synergizes the merits of channel routing and integer linear programming for CMOS standard cells. It can route 185 cells in 611 seconds. About 21% of cells obtained by our approach have smaller wire length than their handcrafted counterparts. Only 11% of cells use more vias than their handcrafted counterparts. Our router completes routing of many cells that cannot be routed by an industrial one.

Download Paper (PDF; Only available from the DATE venue WiFi)
13:02 IP1-6, 732 A PROCEDURE FOR IMPROVING THE DISTRIBUTION OF CONGESTION IN GLOBAL ROUTING
Speaker:
Azadeh Davoodi, University of Wisconsin - Madison, US
Authors:
Daohang Shi, Azadeh Davoodi and Jeffrey Linderoth, University of Wisconsin - Madison, US
Abstract
This work introduces a procedure which takes as input a global routing solution that is already improved for routability based on the traditional total overflow (TOF) metric, and then improves the distribution of congestion without increasing the TOF. Our router is able to significantly decrease the number of edges in undesirable ranges of congestion by optimizing a convex piecewise-linear penalty function. The penalties are flexible and may be specified by the user. In our experiments, using the already-optimized global routing solutions of the ISPD'11 benchmarks, most of which have 0 units of TOF, we show that the number of edges which are utilized very close to capacity can be significantly reduced. This work is the first to explicitly target improving the distribution of edge congestion corresponding to an already-optimized global routing solution without sacrificing the TOF.
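
As an illustration of what a convex piecewise-linear congestion penalty can look like, the sketch below takes the maximum of a few affine pieces over edge utilization (demand divided by capacity). The slopes, breakpoints and edge data are hypothetical values chosen for the example and are not taken from the paper.

```python
# Each piece is (slope, intercept); the maximum of affine functions is convex.
# Hypothetical shape: no penalty below 80% utilization, a mild slope up to
# 90%, and a steep slope beyond that.
DEFAULT_PIECES = ((0.0, 0.0), (1.0, -0.8), (5.0, -4.4))


def congestion_penalty(utilization, pieces=DEFAULT_PIECES):
    """Convex piecewise-linear penalty for one edge's utilization."""
    return max(slope * utilization + intercept for slope, intercept in pieces)


def total_penalty(edges, pieces=DEFAULT_PIECES):
    """Sum the penalty over (demand, capacity) pairs of routing edges."""
    return sum(congestion_penalty(d / c, pieces) for d, c in edges)


# Two hypothetical solutions with the same total demand and zero overflow:
# the second spreads demand away from the nearly full edge.
concentrated = [(10, 10), (5, 10), (5, 10)]
spread = [(8, 10), (6, 10), (6, 10)]
print(total_penalty(concentrated), total_penalty(spread))
```

Both hypothetical solutions have no edge over capacity, so an overflow-only metric cannot tell them apart, while the penalty favours the one that keeps edges away from full utilization.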

Download Paper (PDF; Only available from the DATE venue WiFi)
13:00 End of session
Lunch Break in Großer Saal + Saal 1

2.5 Energy Efficient Systems and Architectures

Date: Tuesday 15 March 2016
Time: 11:30 - 13:00
Location / Room: Konferenz 3

Chair:
Mladen Berekovic, TU Braunschweig, DE

Co-Chair:
Rolf Ernst, TU Braunschweig, DE

This session will explore novel technologies to reduce the energy and power of computing systems. The first paper explores system-level DVFS approaches that maximize performance within a fixed thermal envelope. The second paper introduces a highly introspective system that can monitor and optimize its own energy usage at run-time. The third paper explores a control algorithm design that can utilize a specialized SRAM cell design that trades performance and reliability. The fourth paper finds new ways to better utilize GPU power resources by co-scheduling synergistic kernels.

Time Label Presentation Title
Authors
11:30 2.5.1 A DISCRETE THERMAL CONTROLLER FOR CHIP-MULTIPROCESSORS
Speaker:
Yingnan Cui, Nanyang Technological University, SG
Authors:
Yingnan Cui1, Wei Zhang2 and Bingsheng He1
1Nanyang Technological University, SG; 2Hong Kong University of Science and Technology, HK
Abstract
The ever-increasing power density poses challenges to the thermal management of modern chip-multiprocessors (CMPs). Closed-loop thermal controllers have the benefits of high response speed, high robustness and high accuracy. Most previously proposed closed-loop automatic thermal controllers are designed using continuous control theory. However, thermal controllers for microprocessors are discrete controllers by nature. The traditional design methodology fails to analyze the discrete features of the thermal controllers, such as the influence of sampling frequency and signal distortion. In this paper, we propose an automatic thermal controller for microprocessors that is designed using discrete control theory. With specific attention to the discrete features of thermal control systems, our discrete thermal controller increases the performance of CMPs by reducing the sampling frequency and improves the control quality of the thermal control system. When compared with state-of-the-art thermal controllers, our discrete thermal controller achieves up to 50% reduction in sampling frequency and up to 20% higher performance of the CMPs.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:00 2.5.2 SWALLOW: BUILDING AN ENERGY-TRANSPARENT MANY-CORE EMBEDDED REAL-TIME SYSTEM
Speaker:
Steve Kerrison, University of Bristol, GB
Authors:
Steve Kerrison and Simon Hollis, University of Bristol, GB
Abstract
Swallow is a many-core platform of interconnected embedded real-time processors with time-deterministic execution and a cache-less memory subsystem. Its largest current configuration is 480 × 32-bit processors. It is open-source, designed from the ground up to allow the exploration of flexibility, scalability and energy efficiency in large systems of embedded processors. Further, it enables the behavior of various structures of parallel programs to be explored. It is a proof of concept and design example for other potential systems of this kind. We present the energy transparency features and proportional energy scaling of the system that allow it to be expanded beyond hundreds of cores. We discuss the design choices, construction and novel network implementation of Swallow. Currently, the system provides up to 240 GIPS, with each core consuming 71-193 mW, dependent on workload. Its power per instruction is lower than almost all systems of comparable scale. We discuss the challenges associated with efficiently utilizing this system, particularly communication/computation ratios, and give recommendations for future systems and their software.
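
The figures quoted above allow a rough back-of-the-envelope check, sketched here under two assumptions that the abstract does not state: that the 240 GIPS peak refers to the 480-processor configuration, and that throughput is spread evenly over the cores. Peak throughput and the quoted power range need not occur under the same workload, so the energy values are only indicative.

```python
def per_core_figures(total_gips, cores, core_power_mw):
    """Return (instructions per second per core, energy per instruction in pJ).

    Assumes throughput is divided evenly among the cores.
    """
    ips_per_core = total_gips * 1e9 / cores
    joules_per_instruction = (core_power_mw * 1e-3) / ips_per_core
    return ips_per_core, joules_per_instruction * 1e12


# Numbers from the abstract: 240 GIPS, 480 cores, 71-193 mW per core.
for power_mw in (71, 193):
    ips, energy_pj = per_core_figures(240, 480, power_mw)
    print(f"{power_mw} mW -> {ips / 1e9:.2f} GIPS per core, about {energy_pj:.0f} pJ per instruction")
```

Under these assumptions each core executes about 0.5 GIPS, which puts the energy per instruction in the range of roughly 142 to 386 pJ.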

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30 2.5.3 A NOVEL CACHE-UTILIZATION BASED DYNAMIC VOLTAGE FREQUENCY SCALING (DVFS) MECHANISM FOR RELIABILITY ENHANCEMENTS
Speaker:
Yen-Hao Chen, National Tsing Hua University, Taiwan, TW
Authors:
Yen-Hao Chen1, Yi-Lun Tang1, Yi-Yu Liu2, Allen C.-H. Wu3 and TingTing Hwang1
1National Tsing Hua University, TW; 2Yuan Ze University, TW; 3Jiangnan University, CN
Abstract
We propose a cache architecture using a 7T/14T SRAM [1] and a control mechanism for reliability enhancements. Our control mechanism differs from conventional DVFS methods in that it considers not only the CPI behavior but also the cache utilization. To measure cache utilization, a novel metric is proposed. The experimental results show that our proposed method achieves a thousand times fewer bit-error occurrences compared to conventional DVFS methods under ultra-low voltage operation. Moreover, the results show that our proposed method not only incurs no performance or energy overhead but also achieves on average a 5.1% performance improvement and a 5% energy reduction compared to conventional DVFS methods.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:45 2.5.4 EFFICIENT KERNEL MANAGEMENT ON GPUS
Speaker:
Xiuhong Li, Peking University, CN
Authors:
Xiuhong Li and Yun Liang, Peking University, CN
Abstract
As the complexity of applications continues to grow, each new generation of GPUs has been equipped with advanced architectural features and more resources to sustain its performance acceleration capability. Recent GPUs feature concurrent kernel execution, which is designed to improve resource utilization by executing multiple kernels simultaneously. However, prior systems only achieve limited performance improvement, as they neither optimize the thread-level parallelism (TLP) nor model the resource contention for the concurrently executing kernels. In this paper, we design a framework that optimizes the performance and energy efficiency of multiple kernel execution on GPUs. It employs two key techniques. First, we develop an algorithm to adjust the TLP for the concurrently executing kernels. Second, we employ cache bypassing to mitigate cache contention. Experiments indicate that our framework can improve performance by 1.42X on average (energy efficiency by 1.33X on average), compared with the default concurrent kernel execution framework.

Download Paper (PDF; Only available from the DATE venue WiFi)
13:00 IP1-7, 83 (Best Paper Award Candidate)
MACHINE LEARNED MACHINES: ADAPTIVE CO-OPTIMIZATION OF CACHES, CORES, AND ON-CHIP NETWORK
Speaker:
Rahul Jain, Indian Institute of Technology Delhi, IN
Authors:
Rahul Jain1, Preeti Ranjan Panda1 and Sreenivas Subramoney2
1Indian Institute of Technology Delhi, IN; 2Intel, IN
Abstract
Modern multicore architectures require runtime optimization techniques to address the problem of mismatches between the dynamic resource requirements of different processes and the runtime allocation. Choosing between multiple optimizations at runtime is complex due to their non-additive effects, which makes the adaptiveness of machine learning techniques useful. We present a novel method, Machine Learned Machines (MLM), which uses Online Reinforcement Learning (RL) to perform dynamic partitioning of the last level cache (LLC), along with dynamic voltage and frequency scaling (DVFS) of the core and uncore (interconnection network and LLC). We show that the co-optimization results in a much lower energy-delay product (EDP) than any of the techniques applied individually. The results show an average of 19.6% EDP and 2.6% execution time improvement over the baseline.

Download Paper (PDF; Only available from the DATE venue WiFi)
13:00 End of session
Lunch Break in Großer Saal + Saal 1

2.6 Fault-Tolerant Embedded Systems

Date: Tuesday 15 March 2016
Time: 11:30 - 13:00
Location / Room: Konferenz 4

Chair:
Lothar Thiele, ETH Zurich, CH

Co-Chair:
Jian-Jia Chen, TU Dortmund, DE

This session presents new results on timing and schedulability bounds for fault-tolerant systems, covering both transient and permanent faults.

Time Label Presentation Title
Authors
11:30 2.6.1 (Best Paper Award Candidate)
PROBABILISTIC WCET ESTIMATION IN PRESENCE OF HARDWARE FOR MITIGATING THE IMPACT OF PERMANENT FAULTS
Speaker:
Damien Hardy, University of Rennes 1/IRISA, FR
Authors:
Damien Hardy1, Isabelle Puaut1 and Yiannakis Sazeides2
1University of Rennes 1/IRISA, FR; 2University of Cyprus, CY
Abstract
Fine-grained disabling and reconfiguration of hardware elements (functional units, cache blocks) will become economically necessary to recover from permanent failures, whose rate is expected to increase dramatically in the near future. This fine-grained disabling will lead to degraded performance as compared to a fault-free execution. Until recently, all static worst-case execution time (WCET) estimation methods assumed fault-free processors, resulting in unsafe estimates in the presence of faults. The first static WCET estimation technique dealing with the presence of permanent faults in instruction caches was proposed in [1]. This study probabilistically quantified the impact of permanent faults on WCET estimates. It demonstrated that the probabilistic WCET (pWCET) estimates of tasks increase rapidly with the probability of faults as compared to fault-free WCET estimates. In this paper, we show that very simple reliability mechanisms can mitigate the impact of faulty cache blocks on pWCETs. Two mechanisms that make part of the cache resilient to faults are analyzed. Experiments show that the gains in pWCET for these two mechanisms are on average 48% and 40% as compared to an architecture with no reliability mechanism.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:00 2.6.2 A FOUR-MODE MODEL FOR EFFICIENT FAULT-TOLERANT MIXED-CRITICALITY SYSTEMS
Speaker:
Zaid Al-bayati, McGill University, CA
Authors:
Zaid Al-bayati1, Jonah Caplan1, Brett Meyer1 and Haibo Zeng2
1McGill University, CA; 2Virginia Tech, US
Abstract
Mixed-criticality systems (MCS) integrate components from different levels of criticality onto the same platform. MCS, like all other electronic systems, are susceptible to transient faults. These systems must mitigate the effects of faults and provide recovery mechanisms when faults occur. In this paper, we consider the problem of designing and scheduling certifiable fault-tolerant mixed-criticality systems. To address certification and transient faults, two-mode models must treat any single overrun or fault as a combination of the two, reserving time for the re-execution of tasks with extended execution time. We therefore propose a new four-mode model that addresses fault and execution time overrun with separate modes. This model, combined with the selective continuation of low-criticality tasks, improves the quality of service (QoS) to these tasks while providing the same guarantee to high-criticality tasks. Experimental results show that QoS improvements of up to 42.9% can be achieved by the new model. Furthermore, we show how the model and its schedulability analysis can be calibrated to realistic failure rates to achieve even more efficient designs.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30 2.6.3 PROVIDING FORMAL LATENCY GUARANTEES FOR ARQ-BASED PROTOCOLS IN NETWORKS-ON-CHIP
Speaker:
Eberle A Rambo, Technische Universität Braunschweig, DE
Authors:
Eberle A Rambo, Selma Saidi and Rolf Ernst, Technische Universität Braunschweig, DE
Abstract
Networks-on-Chip (NoCs) are the backbone of Multiprocessor Systems-on-Chip (MPSoCs). In this paper, we perform a formal worst-case communication time analysis of Automatic Repeat reQuest (ARQ) protocols for NoCs. To this end, we integrate the transport layer analysis for general networks and the network layer analysis for NoCs. An ARQ variant optimized for DMA transfers (DMA ARQ) is introduced and analyzed. An experimental evaluation with Stop-and-Wait, Go-Back-N, and DMA ARQ in the context of real-time memory traffic is presented, including both error-free and error cases. DMA ARQ achieves a factor of 6 improvement in latency bounds over conventional Stop-and-Wait.

Download Paper (PDF; Only available from the DATE venue WiFi)
13:00 IP1-8, 739 IMPROVING PERFORMANCE BY MONITORING WHILE MAINTAINING WORST-CASE GUARANTEES
Speaker:
Syed Md Jakaria Abdullah, Uppsala University, SE
Authors:
Syed Md Jakaria Abdullah, Kai Lampka and Wang Yi, Uppsala University, SE
Abstract
With real-time systems, feasibility analysis is based on worst-case scenarios. At run-time, worst-case situations are often very unlikely to occur. With the system being dimensioned for the worst case, one faces low resource utilization and an implicit loss in performance at run-time. We propose to use run-time monitoring to evaluate the deviation of job releases from their worst-case release bound. This allows us to compute a conservative bound on the future workload. Based on this, we design a scheme for reclaiming computation time that was originally allocated to jobs which are now known to be absent. By organizing the consumption of extra computing time in a dynamic and time-safe manner, we improve the run-time performance of applications and provably maintain the worst-case guarantees for their response times. We evaluate the usefulness of the presented approach by using randomly generated traces of job releases.

Download Paper (PDF; Only available from the DATE venue WiFi)
13:00 End of session
Lunch Break in Großer Saal + Saal 1

2.7 Variability Challenges in Nanoscale Designs

Date: Tuesday 15 March 2016
Time: 11:30 - 13:00
Location / Room: Konferenz 5

Chair:
Vikas Chandra, ARM Research, US

Co-Chair:
Said Hamdioui, TU Delft, NL

Process variation continues to be an important challenge across new technologies. This session explores methods for systematically designing test chips, building photonic interconnects and constructing spatial models.

Time Label Presentation Title
Authors
11:30 2.7.1 ACHIEVING 100% CELL-AWARE COVERAGE BY DESIGN
Speaker:
Zeye Liu, Carnegie Mellon University, US
Authors:
Zeye Liu, Benjamin Niewenhuis, Soumya Mittal and Ronald Blanton, Carnegie Mellon University, US
Abstract
A comprehensive investigation of new integrated circuit design and fabrication technologies is crucial for yielding reliable parts. Prior work proposed a novel logic characterization vehicle called the Carnegie Mellon Logic Characterization Vehicle (CM-LCV), and an implementation flow that ensures that a test chip is product-like with near-optimal testability and diagnosability. This work describes an enhanced implementation methodology for the CM-LCV that not only guarantees 100% intra-cell defect testability for all standard cells but also reflects the user-specified design characteristics. Experiments comparing intra-cell defect testability between a CM-LCV and various benchmark circuits demonstrate the efficacy of this approach. Specifically, the CM-LCV achieves 92.4% overall input pattern fault coverage and 100% cell-aware fault coverage using an optimal, minimal test set.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:00 2.7.2 (Best Paper Award Candidate)
MODELING FABRICATION NON-UNIFORMITY IN CHIP-SCALE SILICON PHOTONIC INTERCONNECTS
Speaker:
Mahdi Nikdast, Polytechnique Montréal and McGill University, CA
Authors:
Mahdi Nikdast1, Gabriela Nicolescu2, Jelena Trajkovic3 and Odile Liboiron-Ladouceur4
1Polytechnique Montréal and McGill University, CA; 2Polytechnique Montréal, CA; 3Concordia University, CA; 4McGill University, CA
Abstract
Silicon photonic interconnect (SPI) is a promising candidate for the communication infrastructure in multiprocessor systems-on-chip (MPSoCs). When employing SPIs with wavelength-division multiplexing (WDM), different devices, such as photonic switches and filters, must be precisely matched in terms of their central wavelengths. Nevertheless, SPIs are vulnerable to fabrication non-uniformity (a.k.a. process variations), which influences the reliability and performance of such systems. Understanding process variations helps develop system design strategies to compensate for the variations, as well as estimate the implementation cost of such compensations. For the first time, this paper presents a computationally efficient and accurate bottom-up method to systematically study different process variations in passive SPIs. Analytical models to study the impact of silicon thickness and waveguide width variations on strip waveguides and microresonator (MR)-based add-drop filters are developed. Numerical simulations are used to evaluate our proposed method. Furthermore, we designed, fabricated, and tested several identical MRs to demonstrate process variations. The proposed method is applied to a case study of a passive WDM-based photonic switch, which is the building block in passive SPIs, to evaluate its optical signal-to-noise ratio (OSNR) under different variations. The efficiency of our proposed method enables its application to large-scale SPIs in MPSoCs, where employing numerical simulations is not feasible.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30 2.7.3 EFFICIENT SPATIAL VARIATION MODELING VIA ROBUST DICTIONARY LEARNING
Speaker:
Changhai Liao, Fudan University, CN
Authors:
Changhai Liao1, Jun Tao1, Xuan Zeng1, Yangfeng Su1, Dian Zhou2 and Xin Li3
1Fudan University, CN; 2Fudan University & The University of Texas at Dallas, US; 3Carnegie Mellon University, US
Abstract
In this paper, we propose a novel spatial variation modeling method based on robust dictionary learning for nanoscale integrated circuits. This method takes advantage of the historical data to efficiently improve the accuracy of wafer-level spatial variation modeling with extremely low measurement cost. Robust regression is adopted by our implementation to reduce the bias posed by outliers. An iterative coordinate descent method is further introduced to solve the dictionary learning problem with consideration of missing data. Our numerical experiments based on industrial measurement data demonstrate that the proposed method achieves up to 70% error reduction over the conventional VP approach without increasing the measurement cost.

Download Paper (PDF; Only available from the DATE venue WiFi)
13:00 IP1-9, 239 FAULT TOLERANT NON-VOLATILE SPINTRONIC FLIP-FLOP
Speaker:
Rajendra Bishnoi, Karlsruhe Institute of Technology (KIT), DE
Authors:
Rajendra Bishnoi, Fabian Oboril and Mehdi Tahoori, Karlsruhe Institute of Technology (KIT), DE
Abstract
With technology down scaling, static power has become one of the biggest challenges in a System-On-Chip. Normally-off computing using non-volatile sequential elements is a promising solution to address this challenge. Recently, many non-volatile shadow flip-flop architectures were introduced, in which Magnetic Tunnel Junction (MTJ) cells are employed as backup storing elements. Due to the emerging fabrication processes of magnetic layers, MTJs are more susceptible to manufacturing defects than their CMOS counterparts. Moreover, unlike memory arrays that can effectively be repaired with well-established memory repair and coding schemes, flip-flops scattered in the layout are more difficult to repair. So, without effective defect and fault tolerance for non-volatile flip-flops, the manufacturing yield will be severely affected. Therefore, we propose a Fault Tolerant Non-Volatile Latch (FTNV-L) design, in which we arrange several MTJ cells in such a way that it is resilient to various MTJ faults. Simulation results show that our proposed FTNV-L can effectively tolerate all single MTJ faults with considerably lower overhead than traditional approaches.

Download Paper (PDF; Only available from the DATE venue WiFi)
13:01 IP1-10, 291 TOWARDS AUTOMATIC DIAGNOSIS OF MINORITY CARRIERS PROPAGATION PROBLEMS IN HV/HT AUTOMOTIVE SMART POWER ICS
Speaker:
Yasser Moursy, Sorbonne Universités, UPMC, FR
Authors:
Yasser Moursy1, Hao Zou1, Ramy Iskander1, Pierre Tisserand2, Dieu-My Ton2, Giuseppe Pasetti3, Ehrenfried Seebacher4, Alexander Steinmair4, Thomas Gneiting5 and Heidrun Alius5
1Sorbonne Universités, UPMC, FR; 2Valeo, Creteil, FR; 3AMS, Navacchio, IT; 4AMS AG, Unterpremstaetten, AT; 5AdMOS, Frickenhausen, DE
Abstract
In this paper, a methodology to identify the substrate coupling effects in smart power integrated circuits is presented. This methodology is based on a tool called AUTOMICS that extracts the substrate parasitic network. This network comprises diodes and resistors that are able to maintain the continuity of minority carrier concentration. The contribution of minority carriers to the substrate noise is significant in high-voltage and high-temperature applications. The proposed methodology, along with conventional latch-up problem identification, is presented for a test-case automotive chip, AUTOCHIP1. The proposed methodology takes significantly less time than the conventional one. It could significantly shorten the time-to-market and improve the robustness of the design.

Download Paper (PDF; Only available from the DATE venue WiFi)
13:00 End of session
Lunch Break in Großer Saal + Saal 1

2.8 Revolutionising the Teaching of Computer Architecture and System on Chip Design

Date: Tuesday 15 March 2016
Time: 11:30 - 13:00
Location / Room: Exhibition Theatre

Moderator:
Robert Owen, Imagination Technologies, GB

This special themed exhibition theatre session is for academics teaching Computer Architecture, Verification or System-on-Chip design. Companies with a commercial interest are also welcome.

The real 'industrial' RTL code for a MIPS processor is now freely available for academic use under the Imagination University Programme and is called 'MIPSfpga'. MIPSfpga provides the RTL source code of the MIPS microAptiv UP (microprocessor) core together with teaching materials and reference designs for implementation on an FPGA. The MIPS microAptiv UP core is a member of the same microcontroller family found in many embedded devices, including the popular PIC32MZ micro-controller from Microchip and Samsung's new Artik1.

The session begins with a short overview of Imagination's Worldwide University Programme ("IUP"), and will be presented by Robert Owen, Manager: IUP. Robert is well known in Universities around the world, having specialised in this field for more than 22 years, including the establishment of Texas Instruments' very first program in 1994.

Alex Wong, Technical Systems Analyst from Digilent, will talk about Digilent's educational mission and how Digilent collaborates with Imagination's University Programme to bring the latest technology to universities. The talk includes an introduction to the Nexys 4 DDR board, which can be used with MIPSfpga.

Munir Hasan, Solutions Engineer, will then present and demo MIPSfpga Fundamentals. It is a complete package of teaching materials, including slides, student manual and lab exercises. The package shows how to go from digital design blocks in RTL to a microprocessor and then to an SoC.

Zubair Kakakhel, Graduate Software Engineer, will present and demo MIPSfpga SoC. Linux is one of the most popular and scalable operating systems in the world. After learning the basics in MIPSfpga Fundamentals, we will now demonstrate how system-level design tools such as Vivado IP Integrator can be used to make a complex soft-SoC that is capable of running Linux. We will then switch attention to the software, and show how we port the Linux kernel and Buildroot to run on our soft-SoC based platform. MIPSfpga SoC comes with structured labs that walk through the entire process in a digestible academic format. MIPSfpga SoC gives a genuine "behind the curtain" view of how the semiconductor industry works: vendors sell various IP blocks, which are stitched together by chip manufacturers to make the brains of the embedded systems around you.

Additional WORKSHOP

For those who want to go deeper, there's a half-day hands-on workshop at DATE on Wednesday 16th March, 08:30 to 12:30, Seminar Room 1.
- Early registration is recommended!

Time Label Presentation Title
Authors
11:30 2.8.1 IMAGINATION TECHNOLOGIES WORLDWIDE UNIVERSITY PROGRAMME
Speaker:
Robert Owen, Imagination Technologies, GB
11:45 2.8.2 HARDWARE TOOLS FOR UNIVERSITY LABS
Speaker:
Alex Wong, Digilent Inc., GB
11:50 2.8.3 MIPSFPGA FUNDAMENTALS
Speaker:
Munir Hasan, Imagination Technologies, GB
12:20 2.8.4 MIPSFPGA SOC
Speaker:
Zubair Kakakhel, Imagination Technologies, GB
12:50 2.8.5 Q&A
13:00 End of session
Lunch Break in Großer Saal + Saal 1

UB02 Session 2

Date: Tuesday 15 March 2016
Time: 12:30 - 15:00
Location / Room: Booth 15, Exhibition Area

Label Presentation Title
Authors
UB02.1 IN-NODE PROCESSING: MODELLING FRAMEWORK FOR IN-NODE PROCESSING IN INDUSTRIAL SENSOR AND ACTUATOR NETWORKS
Presenter:
Qaiser Anwar, Mid Sweden University, SE
Authors:
Qaiser Anwar, Muhammad Imran and Mattias O'Nils, Mid Sweden University, SE
Abstract
Architecting efficient systems with on-board sensing capabilities and a growing number of sensing devices is a challenging task, in particular because of the breadth of the technological field as well as the diversity and complexity of the requirements. We present a novel modelling framework that can describe different implementation strategies for computing data locally. In this framework, we first describe the system in the Architecture Analysis and Design Language (AADL); the described system is then exported to XML, which is given as input to a Java-based software program. This program automatically generates different implementation options and reports parameters such as processing energy, communication energy, latency and design complexity. As a proof of concept, we have modelled a real-life system in the framework, which shows that the framework can be of use in automated design space and architecture exploration for in-node processing.

Download Paper (PDF)
UB02.2 EXTRA-FUNCTIONAL PROPERTY SIMULATION WITH VIRTUAL PLATFORMS
Presenter:
Ralph Görgen, OFFIS - Institute for Information Technology, DE
Authors:
Ralph Görgen, Kim Grüttner and Sören Schreiner, OFFIS - Institute for Information Technology, DE
Abstract
The demo shows the use of virtual platforms and model-based design to perform early analyses of extra-functional properties in a mixed-critical scenario. The application shown is a quadro-copter equipped with a camera system. The copter's flight controller is safety critical; the video processing is less critical. Both parts of the system are implemented in a single chip, a Xilinx Zynq SoC. The video processing is implemented on the dual-core ARM processor; the flight controller is realized in the FPGA part and based on two MicroBlaze cores. This platform has been modeled as an OVP-based virtual platform, which is extended with finer-grained timing models as well as power models. Furthermore, it can be coupled with a model of the quadro-copter physics and environment realized in iXtronics CamelView. We will show how to use this setup to analyze the timing, power, and temperature behavior of the system and the interference between the high- and low-critical parts with respect to these properties.

Download Paper (PDF)
UB02.3 COMPSOC: VIRTUALISING CONTROL APPLICATIONS ON A DISTRIBUTED COMPSOC PLATFORM
Presenter:
Kees Goossens, Eindhoven University of Technology, NL
Author:
Kees Goossens, Eindhoven University of Technology, NL
Abstract
In our University Booth we will demonstrate that multiple real-time control applications can be developed independently even though they share platform resources. We show that they can run together with other applications on a wireless network of multiple CompSOC platforms, where each platform has multiple processors, NOC, and a complete microkernel, streaming software, and resource management stack. We will also show that (control) applications can be quickly and safely loaded and started without interference to other (real-time control) applications, thus implementing a network of MPSOCs for distributed mixed time-criticality applications.

Download Paper (PDF)
UB02.4 RC3E: DESIGN AND TEST AUTOMATIZATION IN THE CLOUD
Presenter:
Patrick Lehmann, Technische Universität Dresden, DE
Authors:
Patrick Lehmann, Oliver Knodel, Martin Zabel and Rainer G. Spallek, Technische Universität Dresden, DE
Abstract
Cloud computing is becoming increasingly interesting for companies because of its flexibility in providing apparently endless resources and new services while reducing the total cost of ownership for the user. Applications range from web technologies and storage solutions to complex business processes. The domain of chip and system design is well known for offloading resource-intensive and long-running synthesis or simulation tasks onto centralized servers. As hardware designs grow exponentially and verification requirements become stricter, cloud services are being investigated to meet these needs. In the end, however, tests on real hardware cannot be avoided. Our RC3E ecosystem brings close-to-hardware prototype development and automated hardware testing into the cloud, continuing the principle of "test often and test early". The architecture offers virtualized and shared FPGA resources for prototyping, with automated remote debugging capabilities.

Download Paper (PDF)
UB02.5 AIPHS: ADAPTIVE PROFILING HARDWARE SUB-SYSTEM
Presenter:
Luigi Pomante, Università degli Studi dell'Aquila, IT
Authors:
Luigi Pomante1, Giacomo Valente2 and Vittoriano Muttillo2
1Università degli Studi dell'Aquila, IT; 2Università degli Studi dell'Aquila, IT
Abstract
Run-time monitoring systems on reconfigurable logic have the advantage that they can be customized for specific applications; in the context of automated testing, this can lead to powerful scenarios. This demo presents a smart monitoring system by showing both a customization for stall identification in a message-passing scenario (based on four MicroBlaze cores that execute a bare-metal FFT application) and a customization for bus utilization monitoring in a symmetric multi-processing scenario (based on four Leon3 cores running a custom Linux kernel). The whole development flow (and the related prototype EDA tools) will be shown: it starts from a library of elements used to compose the desired hardware profiler, continues with the introduction of the profiler into the target architecture, and ends with the collection and analysis of profiling data. Moreover, a comparison among different functionalities will be illustrated. Both systems will be demonstrated on a Zynq-7000 SoC.

Download Paper (PDF)
UB02.7 DIGITALLY DRIVEN TOP-DOWN METHODOLOGY FOR MIXED SIGNAL CIRCUIT DESIGN
Presenter:
Markus Mueller, University of Heidelberg, DE
Authors:
Markus Mueller, Maximilian Thuermer and Ulrich Bruening, University of Heidelberg, DE
Abstract
In this methodology, synthesizable modules and full custom blocks are first described in an HDL in a top-down approach. For analog cells, real-number-based models are created. Once the complete mixed-signal model is done, each cell in the design is completely described in terms of interface and behavior. The models then serve as the specification for the full custom cell development. Schematics which don't include any primitives are automatically generated from the HDL description by a scripted flow to ensure consistency. Design space exploration can be done quickly and very efficiently this way. Cells which can be reused at different places in the design are identified, and problems arising from interactions on the system level are found early in the design phase. This methodology accelerates the design process significantly, avoids errors and provides higher flexibility for design changes. A digital-centric design example of a High Speed SerDes IP is demonstrated using the described methodology.

Download Paper (PDF)
UB02.8 GPCDS: AN INTERACTIVE TOOL FOR CREATING SCHEMATIC MODULE GENERATORS IN ANALOG IC DESIGN
Presenter:
Matthias Greif, Reutlingen University, DE
Authors:
Matthias Greif and Juergen Scheible, Reutlingen University, DE
Abstract
While digital design automation is highly developed, analog design automation still lags behind demand. Previous approaches to circuit creation, which are usually based on optimization algorithms, do not satisfy industrial requirements. A promising alternative is given by procedural approaches, which imitate the solution strategy of a human expert. We are working on parameterized generators (such as PCells) for analog circuit and layout modules, which are special kinds of such procedures. We present "gPCDS", a novel tool for the creation of schematic generators for analog circuit design. Associated with a common design environment, gPCDS offers a sophisticated interactive design flow for the development of schematic PCells. gPCDS thus replaces the crucial process of manual code writing with an intuitive, graphics-based way of creating schematic PCells. The GUI of gPCDS provides a variety of useful functions, such as defining parameter ranges or placing predefined building blocks.

Download Paper (PDF)
UB02.10 6CH-SDR-PLATFORM: 6 CHANNEL SDR PROTOTYPING PLATFORM FOR VEHICLE SELF-LOCALIZATION
Presenter:
Marko Rößler, Technische Universität Chemnitz, DE
Authors:
Marko Rößler1, Ulrich Heinkel1, Daniel Fross1 and Ahmad El-Assaad2
1Technische Universität Chemnitz, DE; 2Novero GmbH, DE
Abstract
Many modern applications depend on location information. Precision and availability, both outdoors and indoors, are becoming more and more crucial. Acquiring this information from the radio links used for wireless data transfer is a logical step. Link availability, RSSI, timing and phase shifts are byproducts that carry knowledge about the distance between communication endpoints. Extensive signal processing, advanced receiver setups and statistical algorithms allow the extraction of reliable position information. We present a high-performance multichannel SDR platform based on an FPGA that allows the quick development of the respective technology parts. It is based on a KC705 board connected to a Linux PC via PCIe. Featuring three RF front-ends (AD-FMCOMM-S3), we are able to control six independent paths time-synchronously. With 50 MSa/s at 12 bit resolution, a data stream of 7.2 Gbit/s can be processed. We target radio-frequency-based vehicle self-localization using smart array antennas.
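
The abstract does not break down the 7.2 Gbit/s figure. One plausible reading, sketched below, is that each of the six paths delivers complex (I/Q) samples, so every sample carries two 12-bit components. The I/Q factor is an assumption made for illustration, not something the abstract states; without it the same numbers give 3.6 Gbit/s.

```python
# Back-of-the-envelope check of the aggregate data rate quoted in the abstract.
PATHS = 6                     # independent RF paths (from the abstract)
SAMPLE_RATE_HZ = 50_000_000   # 50 MSa/s per path (from the abstract)
BITS_PER_COMPONENT = 12       # 12 bit resolution (from the abstract)
COMPONENTS_PER_SAMPLE = 2     # ASSUMPTION: complex I/Q samples, not stated in the abstract


def aggregate_bit_rate(paths: int, sample_rate_hz: int, bits: int, components: int) -> int:
    """Return the raw aggregate bit rate in bit/s for all paths combined."""
    return paths * sample_rate_hz * bits * components


rate_iq = aggregate_bit_rate(PATHS, SAMPLE_RATE_HZ, BITS_PER_COMPONENT, COMPONENTS_PER_SAMPLE)
rate_real = aggregate_bit_rate(PATHS, SAMPLE_RATE_HZ, BITS_PER_COMPONENT, 1)

print(f"with I/Q samples:  {rate_iq / 1e9:.1f} Gbit/s")
print(f"real samples only: {rate_real / 1e9:.1f} Gbit/s")
```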

Download Paper (PDF)
15:00 End of session
16:00 Coffee Break in Exhibition Area

3.1 Executive Track Panel: New Opportunities in Automotive Electronics

Date: Tuesday 15 March 2016
Time: 14:30 - 16:00
Location / Room: Saal 2

Organiser:
Yervant Zorian, Synopsys, US

While the robustness requirements for automotive chips remain crucial due to their safety-critical mission, new automotive chips keep growing in functionality and complexity. The executives in this session will discuss the impact of the automotive market on these semiconductor chips and the new opportunities it may bring in designing today's automotive chips.

Executives:

  • Josef Stockinger, STMicroelectronics, FR
  • Rainer Kress, Infineon Technologies, DE
  • Dan Kochpatcharin, TSMC Europe, NL
  • Frank Schirrmeister, Cadence Design Systems, US
16:00 End of session
Coffee Break in Exhibition Area

3.2 Hot Topic: 3D ICs: Leap Forward to 1,000X Performance

Date: Tuesday 15 March 2016
Time: 14:30 - 16:00
Location / Room: Konferenz 6

Organiser:
Vikas Chandra, ARM, US

Chair:
Vikas Chandra, ARM, US

Co-Chair:
Norbert Wehn, University of Kaiserslautern, DE

This session covers the entire spectrum of 3D-IC integration, from technology innovations to applications that stand to benefit by multiple orders of magnitude. The first talk discusses the N3XT architecture, which significantly improves the energy efficiency of abundant-data applications, thereby enabling new frontiers of applications for both mobile devices and the cloud. The second talk discusses the opportunities brought by 3D sequential integration and highlights the applications that benefit from a very small 3D contact pitch. The third talk concludes the session by discussing how upcoming 3D-IC technologies interact with the system-level interconnect hierarchy in the design of next-generation applications.

Time Label Presentation Title
Authors
14:30 3.2.1 THE N3XT 1,000X
Speaker and Author:
Subhasish Mitra, Stanford University, US
15:00 3.2.2 3D SEQUENTIAL INTEGRATION FOR MONOLITHIC 3DIC DESIGN
Speaker and Author:
Olivier Billoint, CEA-Leti, FR
15:30 3.2.3 3D TECHNOLOGY DRIVEN BY 3D APPLICATION REQUIREMENTS: A 3D-LANDSCAPE FOR 3D SYSTEM DESIGN
Speaker and Author:
Dragomir Milojevic, IMEC, BE
16:00 End of session
Coffee Break in Exhibition Area

3.3 On-Chip Security Testing

Date: Tuesday 15 March 2016
Time: 14:30 - 16:00
Location / Room: Konferenz 1

Chair:
Giorgio Di Natale, LIRMM, FR

Co-Chair:
Marc Witteman, Riscure, NL

This session deals with the question of whether the actual chip satisfies the design and whether all mechanisms work securely. This includes on-the-fly testing of the quality of a random number generator as well as methods to detect hardware trojans.

Time Label Presentation Title
Authors
14:30 3.3.1 (Best Paper Award Candidate)
TOTAL: TRNG ON-THE-FLY TESTING FOR ATTACK DETECTION USING LIGHTWEIGHT HARDWARE
Speaker:
Bohan Yang, Katholieke Universiteit Leuven, BE
Authors:
Bohan Yang1, Vladimir Rozic1, Nele Mentens1, Wim Dehaene2 and Ingrid Verbauwhede1
1Katholieke Universiteit Leuven, BE; 2KU Leuven and IMEC, BE
Abstract
We present a design methodology for embedded tests of entropy sources. These tests are necessary to detect attacks and failures of true random number generators. The central idea of this work is to use an empirical design methodology consisting of two phases: collecting the data under attack and finding a useful statistical feature. In this work we focus on statistical features that are implementable in lightweight hardware. This is the first paper to address the design of on-the-fly tests based on the attack effects. The presented design methodology is illustrated with 2 examples: an elementary ring-oscillator based TRNG and a carry-chain based TRNG. The effectiveness of the tests was confirmed on FPGA prototypes.
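
To make the idea of a lightweight statistical feature concrete, here is a minimal sketch of one such feature: a windowed ones-counter that raises an alarm when the bit balance drifts too far from 50%. This is an illustrative stand-in only. The abstract does not say which features the authors selected, and the window size and tolerance below are arbitrary assumptions.

```python
from typing import Sequence


def monobit_alarm(bits: Sequence[int], window: int = 256, tolerance: int = 48) -> bool:
    """Return True if any full window of bits is suspiciously unbalanced.

    The feature is a plain ones-counter per window, which in hardware would
    need only a counter and a comparator. `window` and `tolerance` are
    hypothetical values chosen for this sketch.
    """
    expected = window // 2
    # Step through non-overlapping windows; a trailing partial window is ignored.
    for start in range(0, len(bits) - window + 1, window):
        ones = sum(bits[start:start + window])
        if abs(ones - expected) > tolerance:
            return True
    return False


# Deterministic example streams (no real entropy source involved):
balanced = [0, 1] * 256        # 50% ones in every window
stuck_at_one = [1] * 512       # e.g. an oscillator locked by an attack
biased = [1, 1, 1, 0] * 128    # 75% ones

print(monobit_alarm(balanced), monobit_alarm(stuck_at_one), monobit_alarm(biased))
```

In the two-phase flow the abstract describes, a threshold like `tolerance` would be chosen from data collected under attack rather than fixed up front.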

Download Paper (PDF; Only available from the DATE venue WiFi)
15:00 3.3.2 ON-CHIP FINGERPRINTING OF IC TOPOLOGY FOR INTEGRITY VERIFICATION
Speaker:
Maxime Lecomte, CEA, FR
Authors:
Maxime Lecomte1, Jacques Fournier1 and Philippe Maurine2
1CEA, FR; 2CEA/LIRMM, FR
Abstract
The integrity of integrated circuits (ICs), in particular the detection of malicious add-ons like Hardware Trojans (HTs), has been studied in several recent research papers. The main limit of the techniques proposed so far is that the bias induced by process variations restricts their efficiency and practicality. Most of those techniques compare two ICs' signatures while trying to get rid of the process variations. In this paper we propose a novel approach which in practice eliminates this limit. We first make the assumption that IC infection is done at lot level, which is more realistic than models where infections are done on individual circuits. We introduce a variation model for the performance of CMOS structures in real designs, which differ from test chips dedicated to the measurement of process variations. This model is used to create signatures of lots that are independent of the process variations, and serves as a basis for methods that detect HTs and counterfeits in a straightforward way. The model and the methods are validated experimentally on 30 FPGA boards.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30 3.3.3 ACTIVATION OF LOGIC ENCRYPTED CHIPS: PRE-TEST OR POST-TEST?
Speaker:
Ozgur Sinanoglu, New York University, Abu Dhabi, AE
Authors:
Muhammad Yasin1, Samah Mohamed Saeed2, Jeyavijayan (JV) Rajendran3 and Ozgur Sinanoglu4
1New York University, US; 2University of Washington, Tacoma, US; 3The University of Texas at Dallas, US; 4New York University, Abu Dhabi, AE
Abstract
Logic encryption has been a popular defense against Intellectual Property (IP) piracy, hardware Trojans, reverse engineering, and IC overproduction. It protects a design from these threats by inserting key-gates that break the functionality when controlled by wrong keys. Researchers have made multiple attempts at breaking logic encryption and leaking its secret key, and have also proposed difficult-to-break logic encryption techniques. State-of-the-art logic encryption techniques mainly pursue two models that differ in when the manufactured chips are activated by loading the secret key into the chip's memory: activation prior to manufacturing test (pre-test) versus subsequent to manufacturing test (post-test). In this paper, we shed light on the interaction between manufacturing test and logic encryption. We assess and compare the pre-test and post-test activation models not only in terms of the impact of logic encryption on test parameters such as fault coverage, test pattern count and test power consumption, but also in terms of the impact of manufacturing test on the security of logic encryption. We outline a test data mining attack that can successfully determine the logic encryption key of a pre-test activated chip by utilizing the test data.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:01 IP1-12, 753 TOWARDS HIGHLY RELIABLE SRAM-BASED PUFS
Speaker:
Elena Ioana Vatajelu, Politecnico di Torino, IT
Authors:
Elena Ioana Vatajelu1, Giorgio Di Natale2 and Paolo Prinetto3
1POLITO, IT; 2LIRMM, FR; 3Politecnico di Torino, IT
Abstract
Physically Unclonable Functions (PUFs) are emerging cryptographic primitives used to implement low-cost device authentication and secure secret key generation. Several solutions exist for classical CMOS devices; the most investigated solutions today for weak PUF implementation are based on SRAMs, which offer the advantage of reusing the memories that already exist in many designs. The efficiency of PUF implementations is strongly dependent on the unclonability and reliability of their responses. It has been shown that SRAM PUFs can guarantee high levels of both unclonability and reliability. However, high reliability is today achieved by using fuzzy extractor structures combined with complex error correcting codes (ECCs), which increase the complexity and cost of the design. The overheads associated with these techniques increase with their error correction capability. In this paper we define an effective method to identify the unreliable cells in the PUF implementation based on an SRAM stability test. This information is used to significantly reduce the need for complex ECCs, resulting in efficient, low-cost PUF implementations.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:02 IP1-13, 791 CURRENT BASED PUF EXPLOITING RANDOM VARIATIONS IN SRAM CELLS
Speaker:
Fengchao Zhang, University of Florida, US
Authors:
Fengchao Zhang1, Shuo Yang1, Jim Plusquellic2 and Swarup Bhunia1
1University of Florida, US; 2University of New Mexico, US
Abstract
A Physical Unclonable Function (PUF) is a security primitive that has been proven to be effective in diverse security solutions ranging from hardware authentication to on-die entropy generation. PUFs can be implemented in a design in two possible ways: (1) adding a separate dedicated circuit; and (2) reusing an existing on-chip structure for generating random signatures. A large percentage of existing PUFs fall into the first category, which suffers from the important drawback of often unacceptable hardware and design overhead. Moreover, they cannot be applied to legacy designs, which do not allow insertion of additional circuit structures. Intrinsic PUFs, which rely on pre-existing circuit structures such as static random-access memory (SRAM), fall into the second category. They, however, typically suffer from poor entropy as well as lack of robustness. In this paper, we introduce a novel PUF implementation of the second category that exploits the effect of manufacturing process variations on SRAM read access current. In particular, we note that transistor-level variations in SRAM cells cause significant variations in the read current, and that the variation changes with the content stored in an SRAM cell. We propose a method to transform the analog read current value for an SRAM array into robust binary signatures. The proposed PUF can be easily employed for authentication of commercial SRAM chips without any design modification. Furthermore, it can be realized, with minor hardware modification, in chips with embedded memory, e.g., a processor, for on-die entropy generation. Simulation results for a 45nm CMOS process for 1000 chips, as well as measurement results based on 30 commercial SRAM chips, show promising randomness, uniqueness and robustness under environmental fluctuations.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00 End of session
Coffee Break in Exhibition Area

3.4 Application-specific Low-power Techniques

Date: Tuesday 15 March 2016
Time: 14:30 - 16:00
Location / Room: Konferenz 2

Chair:
Sheldon X.-D. Tan, University of California at Riverside, US

Co-Chair:
Masaaki Kondo, University of Tokyo, JP

This session introduces power and energy management techniques that are tailored to application-specific characteristics. The first paper introduces how simplified neurons without a multiplier can perform in artificial neural networks. The second paper again demonstrates power savings in artificial neural networks, this time with novel hybrid SRAM cells. The third paper introduces network-aware energy management for mobile applications.

Time Label Presentation Title
Authors
14:30 3.4.1 MULTIPLIER-LESS ARTIFICIAL NEURONS EXPLOITING ERROR RESILIENCY FOR ENERGY-EFFICIENT NEURAL COMPUTING
Speaker:
Syed Shakib Sarwar, Purdue University, US
Authors:
Syed Shakib Sarwar, Swagath Venkataramani, Anand Raghunathan and Kaushik Roy, Purdue University, US
Abstract
Large-scale artificial neural networks have shown significant promise in addressing a wide range of classification and recognition applications. However, their large computational requirements stretch the capabilities of computing platforms. The fundamental components of these neural networks are the neurons and their synapses. The core of a digital hardware neuron consists of a multiplier, an accumulator and an activation function. Multipliers consume most of the processing energy in the digital neurons, and thereby in the hardware implementations of artificial neural networks. We propose an approximate multiplier that utilizes the notion of computation sharing and exploits the error resilience of neural network applications to reduce energy consumption. We also propose a Multiplier-less Artificial Neuron (MAN) for an even larger improvement in energy consumption, and adapt the training process to ensure minimal degradation in accuracy. We evaluated the proposed design on 5 recognition applications. The results show 35% and 60% reductions in energy consumption for neuron sizes of 8 bits and 12 bits, respectively, with a maximum of ~2.83% loss in network accuracy, compared to a conventional neuron implementation. We also achieve 37% and 62% reductions in area for neuron sizes of 8 bits and 12 bits, respectively, under iso-speed conditions.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:00 3.4.2 SIGNIFICANCE DRIVEN HYBRID 8T-6T SRAM FOR ENERGY-EFFICIENT SYNAPTIC STORAGE IN ARTIFICIAL NEURAL NETWORKS
Speaker:
Syed Shakib Sarwar, Purdue University, US
Authors:
Gopalakrishnan Srinivasan, Parami Wijesinghe, Syed Shakib Sarwar, Akhilesh Jaiswal and Kaushik Roy, Purdue University, US
Abstract
Multilayered artificial neural networks have found widespread utility in classification and recognition applications. The scale and complexity of such networks together with the inadequacies of general purpose computing platforms have led to a significant interest in the development of efficient hardware implementations. In this work, we focus on designing energy-efficient on-chip storage for the synaptic weights, motivated primarily by the observation that the number of synapses is orders of magnitude larger than the number of neurons. Typical digital CMOS implementations of such large-scale networks are power hungry. In order to minimize the power consumption, the digital neurons could be operated reliably at scaled voltages by reducing the clock frequency. In contrast, on-chip synaptic storage designed using a conventional 6T SRAM is susceptible to bitcell failures at reduced voltages. However, the intrinsic error resiliency of neural networks to small synaptic weight perturbations enables us to scale the operating voltage of the 6T SRAM. Our analysis on a widely used digit recognition dataset indicates that the voltage can be scaled by 200 mV from the nominal operating voltage (950 mV) for practically no loss (less than 0.5%) in accuracy (22 nm predictive technology). Scaling beyond that causes substantial performance degradation owing to the increased probability of failures in the MSBs of the synaptic weights. We therefore propose a significance-driven hybrid 8T-6T SRAM, wherein the sensitive MSBs are stored in 8T bitcells that are robust at scaled voltages due to decoupled read and write paths. In an effort to further minimize the area penalty, we present a synaptic-sensitivity-driven hybrid memory architecture consisting of multiple 8T-6T SRAM banks. Our circuit to system-level simulation framework shows that the proposed synaptic-sensitivity-driven architecture provides a 30.91% reduction in the memory access power with a 10.41% area overhead, for less than 1% loss in the classification accuracy.
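
The reason MSB failures hurt far more than LSB failures can be seen with a small sketch. It assumes an unsigned 8-bit fixed-point weight, which is a simplification chosen for illustration; the abstract does not specify the weight encoding, and the function names are hypothetical.

```python
def flip_bit(weight: int, bit: int, width: int = 8) -> int:
    """Return `weight` with one bit inverted, modelling a single bitcell failure."""
    if not 0 <= bit < width:
        raise ValueError("bit index outside the stored word")
    return (weight ^ (1 << bit)) & ((1 << width) - 1)


def relative_error(weight: int, bit: int, width: int = 8) -> float:
    """Perturbation caused by the flip, as a fraction of the full-scale value."""
    full_scale = (1 << width) - 1
    return abs(flip_bit(weight, bit, width) - weight) / full_scale


def worst_unprotected_error(width: int, protected_msbs: int) -> int:
    """Largest single-flip perturbation when the top `protected_msbs` bits never fail.

    Protected bits stand in for the 8T bitcells; the remaining bits stand in
    for the 6T bitcells that may fail at scaled voltage.
    """
    unprotected = width - protected_msbs
    return 0 if unprotected <= 0 else 1 << (unprotected - 1)


weight = 0b01100100  # decimal 100, an arbitrary example weight
lsb_error = relative_error(weight, bit=0)
msb_error = relative_error(weight, bit=7)

print(f"LSB flip: {lsb_error:.4f} of full scale")
print(f"MSB flip: {msb_error:.4f} of full scale")
print("worst flip with 3 protected MSBs:", worst_unprotected_error(8, 3))
```

Each additional protected MSB halves the worst-case single-flip perturbation, which is the trade-off against the area cost of the larger 8T cells.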

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30 3.4.3 NETWORK DELAY-AWARE ENERGY MANAGEMENT FOR MOBILE SYSTEMS
Speaker:
Soontae Kim, Korea Advanced Institute of Science and Technology, KR
Authors:
Minho Ju, Hyeonggyu Kim and Soontae Kim, Korea Advanced Institute of Science and Technology, KR
Abstract
Smartphones and tablets have come to occupy every facet of our daily life in recent years. According to a recent survey, users spend over 3 hours a day on their mobile devices. In addition, 76% and 75% of smartphone users perform web browsing and social networking at least once a day, respectively. To fully enjoy their benefits, those mobile systems require a long battery life. However, network errors such as packet losses drain the battery more quickly. We analyzed the reason for this through measurements using real smartphones and mobile full-system simulation. We found that smartphones maintain a high performance level during packet losses without doing useful work. To address this problem, we propose a method for reducing energy consumption by lowering the performance level with a Dynamic Voltage and Frequency Scaling mechanism when a long network delay is expected due to packet losses. Experimental results show that the total energy consumption is reduced by 8.4% without performance loss.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00 IP1-14, 635 BEHAVIORAL MODELING OF TIMING SLACK VARIATION IN DIGITAL CIRCUITS DUE TO POWER SUPPLY NOISE
Speaker:
Taesik Na, Georgia Institute of Technology, US
Authors:
Taesik Na and Saibal Mukhopadhyay, Georgia Institute of Technology, US
Abstract
Timing error due to power supply noise (PSN) is a key challenge in the design of digital systems. This paper presents an accurate time-domain behavioral model of timing slack variation due to PSN while accounting for clock-data compensation (CDC). The accuracy of the model is verified against SPICE for complex designs including an AES engine and a LEON3 processor. As a case study, the model is used for time-domain co-simulation of a power distribution network (PDN) and a LEON3 processor with circuit-based noise tolerance techniques. The analysis shows that the model helps reduce pessimism in estimated timing slack by considering the effects of PSN and CDC.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00 End of session
Coffee Break in Exhibition Area

3.5 Emerging Devices and Methodologies for Energy Efficient Systems

Date: Tuesday 15 March 2016
Time: 14:30 - 16:00
Location / Room: Konferenz 3

Chair:
Mehdi Tahoori, Karlsruhe Institute of Technology, DE

Co-Chair:
Aida Todri-Sanial, LIRMM, FR

This session explores how new devices can be used to build energy efficient systems. The first paper presents a novel simultaneously bi-directional TSV technology, promising area and energy benefits. The second paper presents programmable logic circuit designs based on nanowire transistors. The last paper examines how inherent characteristics of reversible logic circuits can be exploited to check combinational equivalence in faster ways.

Time Label Presentation Title
Authors
14:30 3.5.1 ENABLING SIMULTANEOUSLY BI-DIRECTIONAL TSV SIGNALING FOR ENERGY AND AREA EFFICIENT 3D-ICS
Speaker:
Sunghyun Park, Massachusetts Institute of Technology (MIT), US
Authors:
Sunghyun Park1, Alice Wang2, Uming Ko2, Li-Shiuan Peh1 and Anantha Chandrakasan1
1Massachusetts Institute of Technology (MIT), US; 2MediaTek Inc., US
Abstract
This paper presents an analytic and experimental study on a simultaneously bi-directional (SBD) TSV interconnect capable of energy and area efficient 3D-IC vertical signaling. We first explore TSV channel characteristics that differ from well-known off-chip channel properties, then analyze circuit design tradeoffs for SBD TSV signaling in terms of energy, bandwidth and noise margin. Based on this analysis, we propose a novel SBD TSV signaling circuit optimized for our system-level design goals and given TSV technology. Measurement results on a 28nm CMOS test chip show that the proposed SBD TSV interconnect enables 10.3-31.1% lower energy at 34.4% less area than equivalent two uni-directional TSVs. Although our single SBD TSV has 12.5% lower bandwidth than two uni-directional TSVs, the SBD TSV can support up to 9.1Gb/s/TSV bi-directional signaling (i.e. 4.55GHz maximum clock speed) at 1.05V.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:00 3.5.2 RECONFIGURABLE NANOWIRE TRANSISTORS WITH MULTIPLE INDEPENDENT GATES FOR EFFICIENT AND PROGRAMMABLE COMBINATIONAL CIRCUITS
Speaker:
Jens Trommer, Namlab gGmbH, DE
Authors:
Jens Trommer1, Michael Raitza2, André Heinzig2, Tim Baldauf2, Marcus Völp2, Thomas Mikolajick3 and Walter Weber4
1Namlab gGmbH, DE; 2Technische Universität Dresden, DE; 3NaMLab Gmbh / TU Dresden, DE; 4NaMLab gGmbH and CfAED, DE
Abstract
We present MUX-based programmable logic circuits built from newly proposed compact and efficient designs of combinational logic gates. These are enabled by reconfigurable Schottky barrier nanowire transistors with multiple independent gates, which can be dynamically switched between p- and n-type functionality. It will be shown that a single device can be used to replace paths of several transistors in series. This leads to topological differences and increased flexibility in circuit design. We found that especially complex functions, like Majority and Parity gates of many inputs, which are generally avoided in standard CMOS technology, benefit from the new device type. This can be exploited to directly map reconfigurable building blocks, e.g. dynamically switching NAND to NOR. Six exemplary functional logic circuits will be shown, which exhibit up to 80% reduction in transistor count, while maintaining the same functionality as compared to the CMOS reference design. Logical effort analysis indicates that 20% less circuit delay and 33% less normalized dynamic power consumption can be achieved.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30 3.5.3 EXPLOITING INHERENT CHARACTERISTICS OF REVERSIBLE CIRCUITS FOR FASTER COMBINATIONAL EQUIVALENCE CHECKING
Speaker:
Luca Amaru, École Polytechnique Fédérale de Lausanne (EPFL), CH
Authors:
Luca Amaru1, Pierre-Emmanuel Gaillardon2, Robert Wille3 and Giovanni De Micheli1
1École Polytechnique Fédérale de Lausanne (EPFL), CH; 2University of Utah, US; 3Johannes Kepler University Linz, AT
Abstract
Reversible circuits implement invertible logic functions. They are of great interest to cryptography, coding theory, interconnect design, computer graphics, quantum computing, and many other fields. As for conventional circuits, checking the combinational equivalence of two reversible circuits is an important but difficult (coNP-complete) problem. In this work, we present a new approach for solving this problem significantly faster than the state-of-the-art. For this purpose, we exploit inherent characteristics of reversible computation, namely bi-directional (invertible) execution and the XOR-richness of reversible circuits. Bi-directional execution allows us to create an identity miter out of two reversible circuits to be verified, which naturally encodes the equivalence checking problem in the reversible domain. Then, the abundant presence of XOR operations in the identity miter enables an efficient problem mapping into XOR-CNF satisfiability. The resulting XOR-CNF formulas are eventually more compact than pure CNF formulas and potentially easier to solve. As previously anticipated, experimental results show that our equivalence checking methodology is more than one order of magnitude faster, on average, than the state-of-the-art solution based on established CNF-formulation and standard SAT solvers.
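
As a rough illustration of the identity-miter idea in the abstract above, the following Python sketch appends the inverse of one reversible circuit to another and checks that the result acts as the identity. The gate encoding, the function names and the exhaustive simulation are assumptions made for this sketch only; the paper maps the miter to XOR-CNF satisfiability instead of enumerating inputs.

```python
from itertools import product

# Hypothetical gate encoding: (controls, target). The target line is flipped
# when every control line is 1. No controls = NOT, one = CNOT, two = Toffoli.

def run(circuit, bits):
    """Simulate a reversible circuit on a tuple of input bits."""
    state = list(bits)
    for controls, target in circuit:
        if all(state[c] for c in controls):
            state[target] ^= 1
    return tuple(state)

def inverse(circuit):
    """Each such gate is its own inverse, so reverse the gate order."""
    return list(reversed(circuit))

def identity_miter(circ_a, circ_b):
    """Circuit A followed by the inverse of circuit B."""
    return list(circ_a) + inverse(circ_b)

def equivalent(circ_a, circ_b, n_lines):
    """A and B are equivalent iff the miter maps every input to itself.
    Brute force stands in here for the SAT-based check."""
    miter = identity_miter(circ_a, circ_b)
    return all(run(miter, x) == x for x in product((0, 1), repeat=n_lines))

# Two ways of swapping lines 0 and 1 with three CNOT gates each.
swap_a = [((0,), 1), ((1,), 0), ((0,), 1)]
swap_b = [((1,), 0), ((0,), 1), ((1,), 0)]
single_cnot = [((0,), 1)]

print(equivalent(swap_a, swap_b, 2))
print(equivalent(swap_a, single_cnot, 2))
```

Exhaustive simulation is exponential in the number of lines, which is why a satisfiability formulation is attractive for circuits of realistic size.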

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00 IP1-15, 515 LOSSLESS COMPRESSION ALGORITHM BASED ON DICTIONARY CODING FOR MULTIPLE E-BEAM DIRECT WRITE SYSTEM
Speaker:
Pei-Chun Lin, National Taiwan University, TW
Authors:
Pei-Chun Lin, Yu-Hsuan Pai, Yu-Hsiang Chiu, Shao-Yuan Fang and Charlie Chung-Ping Chen, National Taiwan University, TW
Abstract
Electron-beam direct-write (EBDW) lithography is an attractive candidate for next-generation lithography in advanced semiconductor processes. The huge data stream bandwidth required for the data delivery path in EBDW systems could seriously deteriorate throughput, which is one of the major deficiencies constraining EBDW lithography from mass production. A lossless electron-beam layout data compression and decompression algorithm is proposed in this paper for 5-bit gray level bitmaps. Compared with the state-of-the-art LineDiff Entropy algorithm, the proposed method improves the compression rate by 18% on average and achieves more than 7.5 times speedup for decompression.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:01 IP1-16, 283 PHONOCMAP: AN APPLICATION MAPPING TOOL FOR PHOTONIC NETWORKS-ON-CHIP
Speaker:
Edoardo Fusella, University of Naples Federico II, IT
Authors:
Edoardo Fusella and Alessandro Cilardo, University of Naples Federico II, IT
Abstract
While providing a promising solution for high-performance on-chip communication, photonic networks-on-chip suffer from insertion loss and crosstalk noise, which may severely constrain their scalability. In this paper, we introduce a methodology and a related tool, PhoNoCMap, for the design space exploration of optical NoC mapping solutions, which automatically assigns application tasks to the nodes of a generic photonic NoC architecture such that either the worst-case insertion loss or the worst-case crosstalk noise is minimized. The experimental results show significant benefits in terms of insertion loss and crosstalk noise, allowing improved network scalability.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00 End of session
Coffee Break in Exhibition Area

3.6 Timing Analysis and Measurement

Date: Tuesday 15 March 2016
Time: 14:30 - 16:00
Location / Room: Konferenz 4

Chair:
Marko Bertogna, Università di Modena e Reggio Emilia, IT

Co-Chair:
Damien Hardy, University of Rennes 1/IRISA, FR

The papers in this session provide timing estimation techniques for a variety of real-time systems and components, ranging from engine control to networked systems.

Time Label Presentation Title
Authors
14:30 3.6.1 CONSERVATIVE MODELING OF SHARED RESOURCE CONTENTION FOR DEPENDENT TASKS IN PARTITIONED MULTI-CORE SYSTEMS
Speaker:
Junchul Choi, Seoul National University, KR
Authors:
Junchul Choi, Donghyun Kang and Soonhoi Ha, Seoul National University, KR
Abstract
In a multi-core system with shared resources, the accesses to the shared resources from several cores may experience non-deterministic arbitration delay due to resource contention. Such delay should be considered conservatively in the worst case response time (WCRT) analysis of multi-core systems. Recently, several techniques have been proposed to account for arbitration delay for shared resource contention, based on the event stream modeling of resource access. While they all assume independent tasks, in this paper, we propose a conservative modeling technique of shared resource contention supporting dependent tasks. To find a tight upper bound of arbitration delay, we derive a shared resource demand bound for each processing element, considering the task dependency. The proposed technique is not specific to a particular WCRT analysis method, and supports both preemptive and non-preemptive scheduling policy. In the experiments, the significance of considering data dependency of parallel applications and the performance of our technique are verified by synthetic examples and a real-life example.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:00 3.6.2 FORMAL WORST-CASE TIMING ANALYSIS OF ETHERNET TSN'S BURST-LIMITING SHAPER
Speaker:
Daniel Thiele, Technische Universität Braunschweig, DE
Authors:
Daniel Thiele and Rolf Ernst, Technische Universität Braunschweig, DE
Abstract
Future in-vehicle networks will use Ethernet as their communication backbone. As many automotive applications are latency-sensitive and have strict real-time requirements, a key challenge in automotive network design is the deterministic low-latency transport of latency-critical Ethernet frames. Time-sensitive networking (TSN) is an upcoming set of Ethernet standards, which address these requirements by specifying new quality of service mechanisms in the form of different traffic shapers. One of these traffic shapers is the burst-limiting shaper (BLS). In this paper, we evaluate whether BLS is able to fulfill these strict timing requirements. We present a formal timing analysis for BLS in order to compute worst-case latency bounds. We use a realistic automotive Ethernet setup to compare BLS against Ethernet AVB and Ethernet following IEEE 802.1Q.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30 3.6.3 REAL-TIME ANALYSIS OF ENGINE CONTROL APPLICATIONS WITH SPEED ESTIMATION
Speaker:
Alessandro Biondi, Scuola Superiore Sant'Anna, IT
Authors:
Alessandro Biondi and Giorgio Buttazzo, Scuola Superiore Sant'Anna, IT
Abstract
Engine control applications include computational activities that adapt their behavior as a function of the engine speed, referred to as adaptive variable-rate (AVR) tasks. Although a substantial amount of work has been done to analyze the timing behavior of real-time applications with AVR tasks, most of the authors assumed the knowledge of the instantaneous engine speed at any instant. In practice, however, the instantaneous engine speed is not known and can only be estimated by various techniques, which hence introduce an error with respect to the ideal case of perfect knowledge. If not properly handled, such an error can result in a potentially unsafe analysis. This paper proposes a general approach to include speed estimators in the analysis of engine control applications and shows two particular examples using common speed estimators. Finally, estimators are also characterized through a numerical evaluation and experimental results are presented to evaluate their impact in terms of system schedulability.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:45 3.6.4 TRACE-BASED ANALYSIS METHODOLOGY OF PROGRAM FLASH CONTENTION IN EMBEDDED MULTICORE SYSTEMS
Speaker:
Lin Li, Infineon Technologies, DE
Authors:
Lin Li and Albrecht Mayer, Infineon Technologies, DE
Abstract
Contention for shared resources is a major performance issue in multicore systems. In embedded multicore microcontrollers, contention between program flash accesses has a significant performance impact, because a flash access has a large latency compared to a core clock cycle. Therefore, the detection and analysis of program flash contention are necessary to remedy this situation. Because no existing tool can fulfill this task, a novel post-processing analysis methodology is proposed in this paper to acquire detailed information on program flash contention based on the non-intrusive trace. This information can be utilized to improve the overall performance and particularly to enhance the real-time performance of specific threads or functions for hard real-time multicore systems.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00 IP1-17, 397 DESIGN OF AN EFFICIENT READY QUEUE FOR EARLIEST-DEADLINE-FIRST (EDF) SCHEDULER
Speaker and Author:
Risat Mahmud Pathan, Chalmers University of Technology, SE
Abstract
Although the dynamic-priority-based EDF algorithm is known to be theoretically optimal for scheduling sporadic real-time tasks on a uniprocessor, fixed-priority (FP) scheduling is mostly used in practice. One of the main reasons for FP scheduling being popular in the industry is its efficient implementation: operations on the ready queue can be done in constant time. On the other hand, the ready queue of an EDF scheduler is generally implemented as a priority queue, for example, using a binary min-heap data structure in which insertion and deletion operations cannot be done in constant time. This paper proposes a new design of the ready queue for an EDF scheduler: a simple data structure for the ready queue and efficient operations to insert and remove task control blocks (TCBs) to and from the ready queue are proposed. Insertion of a TCB of a newly released job (that cannot preempt the currently-executing job) is done in non-constant time. However, insertion of a TCB of a preempted job or the removal of the TCB of the job having the highest EDF priority from the ready queue can be done in constant time. Simulation using randomly generated task sets shows that the overhead of managing jobs in our proposed ready queue for the EDF scheduler is significantly lower than that of other approaches. We believe that the theoretically optimal EDF algorithm implemented based on our proposed ready-queue data structure will make EDF popular in industry.
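
For orientation, the conventional baseline that the abstract contrasts against, an EDF ready queue kept as a binary min-heap keyed on absolute deadline, can be sketched in Python as follows. This is not the data structure proposed in the paper, whose details the abstract does not give; the class and method names are hypothetical.

```python
import heapq
import itertools

class HeapReadyQueue:
    """Baseline EDF ready queue: binary min-heap ordered by absolute deadline.
    Insert and remove both cost O(log n), not constant time."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so that TCBs with equal deadlines are never compared.
        self._seq = itertools.count()

    def insert(self, deadline, tcb):
        """Add a task control block with the given absolute deadline."""
        heapq.heappush(self._heap, (deadline, next(self._seq), tcb))

    def pop_earliest(self):
        """Remove and return (deadline, tcb) for the earliest deadline."""
        deadline, _, tcb = heapq.heappop(self._heap)
        return deadline, tcb

    def __len__(self):
        return len(self._heap)

queue = HeapReadyQueue()
queue.insert(30, "task_A")
queue.insert(10, "task_B")
queue.insert(20, "task_C")
order = [queue.pop_earliest() for _ in range(3)]
print(order)
```

Every operation here walks one root-to-leaf path of the heap, which is the logarithmic cost the abstract refers to when comparing against constant-time fixed-priority queues.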

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00 End of session
Coffee Break in Exhibition Area

3.7 Dealing with Runtime Failures

Date: Tuesday 15 March 2016
Time: 14:30 - 16:00
Location / Room: Konferenz 5

Chair:
Lorena Anghel, TIMA Laboratory, FR

Co-Chair:
Michel Renovell, LIRMM, FR

Reliability is an important consideration in modern design. Two key issues in runtime resilience are robustness against soft errors and tolerance of aging effects. The papers in this session consider both effects.

Time Label Presentation Title
Authors
14:30 3.7.1 A CROSS-LAYER ANALYSIS OF SOFT ERROR, AGING AND PROCESS VARIATION IN NEAR THRESHOLD COMPUTING
Speaker:
Mehdi B. Tahoori, Karlsruhe Institute of Technology (KIT), DE
Authors:
Anteneh Gebregiorgis, Saman Kiamehr, Fabian Oboril, Rajendra Bishnoi and Mehdi B. Tahoori, Karlsruhe Institute of Technology (KIT), DE
Abstract
Near Threshold Computing (NTC) is a promising approach to reduce the power consumption of modern VLSI designs. However, NTC designs suffer from functional failures and performance loss. Understanding the characteristics of the functional failures and variability effects is of decisive importance in order to mitigate them, and get the most out of NTC. This paper presents a cross-layer reliability analysis in the presence of soft errors, aging and process variation effects in the near threshold voltage domain. The objective is to quantify the reliability of different SRAM designs and to find a reliability-performance optimal cache organization for an NTC microprocessor. In this work, the Soft Error Rate (SER) and Signal Noise Margin (SNM) of 6T and 8T SRAM cells and their dependencies on aging and process variation are investigated by considering device, circuit and architecture level analysis. Our experimental results reveal that in NTC, process variation and aging-induced SNM degradation is 2.5X higher than in the super threshold domain while SER is 8X higher. The use of 8T instead of 6T SRAM cells can reduce the system-level SNM and SER by 14% and 22% respectively. Besides, we observe that we can find the right balance between performance and reliability by using an appropriate cache organization at NTC which is different from the super threshold.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:00 3.7.2 FAST-YET-ACCURATE VARIATION-AWARE CURRENT AND VOLTAGE MODELLING OF RADIATION-INDUCED TRANSIENT FAULT
Speaker:
Yuwen Lin, National Chiao Tung University, TW
Authors:
Yuwen (Dave) Lin, Yuwen Lin and Hung-Pin Wen, National Chiao Tung University, TW
Abstract
For robust systems, it is important to mitigate radiation effects in early stages to reduce the design cost. Traditionally, a double-exponential current source is widely used to model the transient fault for analyzing the radiation effects. However, in light of complicating effects in the advanced technologies, such an approach is no longer sufficient to estimate transient faults and may lead to inaccurate results. Therefore, we propose a fast-yet-accurate approach to model the radiation-induced transient fault, meanwhile considering the interaction between its transient current and transient voltage. Experimental results show that the proposed method can achieve 10^5X speedup with an average accuracy loss of only 2.6% compared to the 3D mixed-mode TCAD simulation. Moreover, variation sources also become big issues with the progressing technology nodes and thus the proposed method is then extended to incorporate these variations during transient-fault analysis. As a result, sensitivity analysis that covers voltage, gate-length and device-width variations can be performed fast and accurately in our method.
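
The traditional double-exponential current source mentioned in the abstract is commonly written as I(t) = Q / (tau_fall - tau_rise) * (exp(-t / tau_fall) - exp(-t / tau_rise)). The Python sketch below evaluates that textbook pulse and checks numerically that it integrates to the collected charge Q. It illustrates only the traditional baseline, not the model proposed in the paper, and the parameter values (100 fC, 200 ps, 50 ps) are arbitrary assumptions chosen for the example.

```python
import math

def double_exp_current(t, q, tau_fall, tau_rise):
    """Double-exponential transient current at time t (seconds).
    q is the collected charge; tau_fall > tau_rise are the time constants."""
    if t < 0:
        return 0.0
    scale = q / (tau_fall - tau_rise)
    return scale * (math.exp(-t / tau_fall) - math.exp(-t / tau_rise))

def collected_charge(q, tau_fall, tau_rise, t_end, steps=10000):
    """Trapezoidal integral of the current pulse from 0 to t_end."""
    dt = t_end / steps
    total = 0.0
    for k in range(steps):
        left = double_exp_current(k * dt, q, tau_fall, tau_rise)
        right = double_exp_current((k + 1) * dt, q, tau_fall, tau_rise)
        total += 0.5 * (left + right) * dt
    return total

# Illustrative values only (assumptions, not taken from the paper).
Q = 100e-15         # 100 fC collected charge
TAU_FALL = 200e-12  # 200 ps collection time constant
TAU_RISE = 50e-12   # 50 ps track-establishment time constant

charge = collected_charge(Q, TAU_FALL, TAU_RISE, t_end=5e-9)
print(charge / Q)
```

The pulse starts at zero, peaks, and decays; its shape is fixed by the three parameters alone, so it does not react to the voltage of the struck node. That is the kind of current/voltage interaction the abstract says the proposed approach takes into account.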

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30 3.7.3 A DETAILED METHODOLOGY TO COMPUTE SOFT ERROR RATES IN ADVANCED TECHNOLOGIES
Speaker:
Marc Riera, Universitat Politècnica de Catalunya (UPC), ES
Authors:
Marc Riera1, Ramon Canal2, Jaume Abella3 and Antonio Gonzalez2
1Universitat Politècnica de Catalunya (UPC), ES; 2UPC-Barcelona, ES; 3Barcelona Supercomputing Center, ES
Abstract
System reliability has become a key design aspect for computer systems due to the aggressive technology miniaturization. Errors are typically dominated by transient faults due to radiation and are strongly related to the technology used to build hardware. However, there is a lack of detailed methodologies to model and fairly compare Soft Error Rates (SER) across different advanced technologies. This work first describes a common methodology that from (1) technology models, (2) location (latitude, longitude and altitude), (3) operating conditions and (4) circuit descriptions (i.e. SRAM, latches, logic gates) can obtain accurate Soft Error Rates. Then, we use it to characterize soft errors through current and future technologies. Results at the technology layer show that new technologies, such as FinFET and SOI, can reduce SER up to 100x while the location can increase SER up to 650x.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:45 3.7.4 ANALYSIS OF NBTI EFFECTS ON HIGH FREQUENCY DIGITAL CIRCUITS
Speaker:
Ahmet Unutulmaz, OFFIS Institute for Information Technology, DE
Authors:
Ahmet Unutulmaz1, Domenik Helms1, Reef Eilers1, Malte Metzdorf1, Ben Kaczer2 and Wolfgang Nebel3
1OFFIS Institute for Information Technology, DE; 2IMEC, BE; 3University of Oldenburg and OFFIS, DE
Abstract
This paper analyzes some of the secondary effects in estimating negative bias temperature instability (NBTI) induced threshold voltage shift on high frequency digital circuits. Therefore, a circuit model is developed to be used for statistical estimation of the threshold voltage shift. Making use of this model as well as technology computer aided design (TCAD) and SPICE simulations, a methodology is developed to estimate NBTI induced threshold voltage shift. Simulation results reveal that commonly made assumptions on digital circuits, such as the square signal assumption and a negligible effect of drain bias, may yield overestimation of the NBTI induced threshold voltage shift by more than 10% after five years of operation, which may lead to a severe underestimation of a circuit's reliability.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00 IP1-18, 91 RT LEVEL TIMING MODELING FOR AGING PREDICTION
Speaker:
Nils Koppaetzky, OFFIS Institute for Information Technology, DE
Authors:
Nils Koppaetzky1, Malte Metzdorf1, Reef Eilers1, Domenik Helms1 and Wolfgang Nebel2
1OFFIS Institute for Information Technology, DE; 2University of Oldenburg and OFFIS, DE
Abstract
The simulation of aging-related degradation mechanisms is a challenging task for timing and reliability estimations during all design phases of digital systems. Some good approaches towards accurate, efficient and applicable timing models at the register transfer level (RTL) have already been made. However, recent state-of-the-art models often have to access lower levels of abstraction, such as the underlying gate-level netlist, for each timing estimation and require every analysis step to be repeated if parameters, input signals or designs are changed. This work introduces a new RTL timing model concept that provides a separation of design analysis and aging estimation. It allows more efficient design evaluations with respect to aging. Although this is work in progress and systematic evaluations are still ongoing, early results indicate the applicability and capability of the approach to compete with recent models both in accuracy and efficiency.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00 End of session
Coffee Break in Exhibition Area

3.8 Presentations from FDSOI-Campus and from European Projects Booths: Leveraging new Semiconductor Technologies

Date: Tuesday 15 March 2016
Time: 14:30 - 16:00
Location / Room: Exhibition Theatre

Organiser:
Hans-Jürgen Brand, IDT/ZMDI, DE

In this session GLOBALFOUNDRIES gives an introduction to the applications of the ultra-low-power technology FDSOI and its innovation potential. The REPARA project will show how to make better use of the advances of new semiconductor technologies for parallel computing architectures, boosting application performance and energy efficiency. Attendees are invited to also visit the ultra-low-power technologies campus and project booths for further details and discussions.

Time Label Presentation Title
Authors
14:30 3.8.1 GLOBALFOUNDRIES 22FDX INNOVATION POTENTIAL
Speaker:
Gerd Teepe, GLOBALFOUNDRIES, DE
15:30 3.8.2 REPARA - REENGINEERING AND ENABLING PERFORMANCE AND POWER OF APPLICATIONS
Speaker:
Imre Pechan, evopro Innovation Kft., HU
Abstract

In recent years, traditional processors have not been able to translate the advances of silicon fabrication technology into corresponding performance gain due to physical constraints and weaknesses of the sequential programming model. These difficulties have forced a shift from CPU-based homogeneous machines to heterogeneous architectures combining different kinds of computing devices, programmed in a highly parallel fashion yet poorly optimizing the available resources towards performance and low energy consumption.

The REPARA project aims to help the transformation and deployment of new and legacy applications in parallel heterogeneous computing architectures while maintaining a balance between application performance, energy efficiency and source code maintainability. The REPARA framework consists of a set of tools assisting the developer in the course of transforming and deploying the source code on heterogeneous platforms, supporting multicore CPU, GPU, DSP as well as reconfigurable FPGA devices.

The main contribution of evopro Innovation Kft. to the project is to provide industrial use case applications for evaluating the REPARA methodology and tools. One of the use cases is a typical HPC application called molecular docking; the other derives from the dynamic railway diagnosis system (eRDM) of evopro. Both use cases were transformed to various accelerator targets using the REPARA framework. Initial evaluation of the framework shows that considerable performance and energy efficiency improvement can be achieved with low developer intervention for real-world industrial use case applications.

 

15:50 3.8.3 Q&A
16:00 End of session
Coffee Break in Exhibition Area

UB03 Session 3

Date: Tuesday 15 March 2016
Time: 15:00 - 17:30
Location / Room: Booth 15, Exhibition Area

Label Presentation Title
Authors
UB03.1 HIGH-END 122GHZ MINIATURE RADAR SENSOR FOR AUTONOMOUS AIRCRAFTS
Presenter:
Federico Nava, Heinz Nixdorf Institute - Universität Paderborn, DE
Authors:
Federico Nava1 and Christoph Scheytt2
1Heinz Nixdorf Institute - Universität Paderborn, DE; 2Heinz Nixdorf Institute - Paderborn, DE
Abstract
High-precision sensors, sensor arrays and the concept of sensor fusion are attracting growing interest in research on autonomous vehicles. For this reason the System and Circuit Technology group at the Heinz Nixdorf Institute is currently developing a highly integrated radar module as a sensor for Unmanned Aerial Vehicle applications. The presented system is composed of a radar IC (130nm SiGe) with in-package antennas and an operating frequency of 122GHz, mounted on a FLEX-PCB including a CORTEX M4 MCU, for a total size of 30x30mm.
The presentation will show the FMCW/CW radar functions of the device, allowing the tracking of velocity and distance for multiple objects. The results of the radar measurements will be presented on a screen showing the raw data acquired in the time domain and an FFT representation. Different objects will move simultaneously in the reception area of the sensor. The tracked distances will then be plotted on screen.

Download Paper (PDF)
UB03.2 EXTRA-FUNCTIONAL PROPERTY SIMULATION WITH VIRTUAL PLATFORMS
Presenter:
Ralph Görgen, OFFIS - Institute for Information Technology, DE
Authors:
Ralph Görgen, Kim Grüttner and Sören Schreiner, OFFIS - Institute for Information Technology, DE
Abstract
The demo shows the usage of virtual platforms and model-based design to perform early analyses of extra-functional properties in a mixed-critical scenario. The application shown is a quadro-copter equipped with a camera system. The copter's flight controller is safety critical; the video processing is less critical. Both parts of the system are implemented in a single chip, a Xilinx Zynq SoC. The video processing is implemented in the ARM dual-core; the flight controller is realized in the FPGA part and based on two MicroBlaze cores. This platform has been modeled as an OVP-based virtual platform, which is extended by more fine-grained timing models as well as power models. Furthermore, it can be coupled with a model of the quadro-copter physics and environment realized in iXtronics CamelView. We will show how to use this setup to analyze timing, power, and temperature behavior of the system and the interference between the high- and low-critical parts with respect to these properties.

Download Paper (PDF)
UB03.3 ETEAK: ASYNCHRONOUS DATAFLOWS SYNTHESIS ONTO FPGAS USING THE ETEAK FRAMEWORK
Presenter:
Mahdi Jelodari Mamaghani, The University of Manchester, GB
Authors:
Mahdi Jelodari Mamaghani, Jim Garside and Steve Furber, The University of Manchester, GB
Abstract
We exploit eTeak (De-Elastisation [DATE'15] enabled) to synthesise asynchronous dataflow descriptions in Balsa into a synchronous structure loadable onto an FPGA. We will also be able to demonstrate the software realisation of the same architecture running on a laptop and let the audience compare the hardware vs. software concurrency. In a brief experiment from our recent study, a prime number generator (aka sieve of Eratosthenes) was implemented both in software using the CSP compiler and in hardware using eTeak: on average the hardware implementation runs 90-120x faster than its software counterpart while the processor clock speed is almost the same as the hardware clock speed (1.2GHz). This allows us to plan ahead and exploit eTeak toward energy-efficient synthesis. According to EPSRC's research portfolio this work falls under the fastest-growing research subject of "Energy Efficiency", which aims to achieve an energy reduction of 26-43% by exploiting ICT.
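
For readers unfamiliar with the benchmark named in the abstract, a plain sequential sieve of Eratosthenes can be sketched in Python as below. This is only the textbook algorithm with a hypothetical function name; the demo's versions are dataflow implementations in Balsa/eTeak and CSP, which this sketch does not attempt to reproduce.

```python
def primes_up_to(n):
    """Return all primes <= n using the sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= n:
        if is_prime[p]:
            # Cross out multiples of p, starting at p*p because smaller
            # multiples were already removed by smaller primes.
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
        p += 1
    return [i for i, flag in enumerate(is_prime) if flag]

print(primes_up_to(30))
```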

Download Paper (PDF)
UB03.4 LISA: ENABLING LAYERED INTEROPERABILITY FOR INTERNET OF THINGS THROUGH LISA
Presenter:
Behailu Shiferaw Negash, University of Turku, FI
Authors:
Behailu Shiferaw Negash1, Amir-Mohammad Rahmani1, Tomi Westerlund1, Pasi Liljeberg1 and Hannu Tenhunen2
1University of Turku, FI; 2University of Turku, FI and Royal Institute of Technology (KTH), SE
Abstract
Expectations are high for the changes that will come with the implementation of the Internet of Things (IoT). However, this vision is limited by the heterogeneous nature of IoT devices, which has led to vertical application silos that are incapable of working together. To ease this problem of heterogeneity, we have developed a lightweight interoperability framework, LISA, to hide variations in communication technology and data formats and provide a uniform API for programmers. LISA is inspired by Network on Terminal Architecture (NoTA), an open framework from Nokia Research Center. There are a few frameworks for interoperability of IoT. However, these frameworks fail to address the resource limitations of the majority of IoT devices. To the best of our knowledge, LISA is the first framework designed for resource constrained devices. This demonstration shows LISA in action, helping heterogeneous devices interoperate through a gateway in the fog layer between the devices and the cloud.

Download Paper (PDF)
UB03.5 NEURODSP: A MULTI-PURPOSE ENERGY-OPTIMIZED ACCELERATOR FOR NEURAL NETWORKS
Presenter:
Jean-Marc PHILIPPE, CEA LIST, FR
Authors:
Jean-Marc PHILIPPE, Alexandre CARBON and Renaud SCHMIT, CEA LIST, FR
Abstract
Deep Neural Networks (e.g. Convolutional Neural Networks) are a promising approach to design smart machines for a wide range of application domains (automotive, home automation, industry, etc.). Due to their structure, these processing chains are compute intensive and difficult to embed into low power systems. To tackle this challenge, CEA LIST investigated the NeuroDSP hardware accelerator IP, able to be embedded into FPGA- or ASIC-based systems. Providing the system with a dramatic performance/watt ratio improvement, the IP can sustain 450GMACS/W in FDSOI 28nm technology, meeting the requirements of high-end embedded applications. The proposed demonstration features a comparison between three implementations of a CNN processing chain used to detect faces in a large image database. It shows that a single cluster FPGA-based implementation of the NeuroDSP IP at 100MHz is able to outperform both a Raspberry Pi 2 and an Odroid-XU3 board by factors of 10 and 6 in performance, respectively.

Download Paper (PDF)
UB03.6 IDDD: AN INTERACTIVE DEPENDABILITY DRIVEN DESIGN SPACE EXPLORATION
Presenter:
Stefan Scharoba, Brandenburg University of Technology Cottbus-Senftenberg, DE
Authors:
Stefan Scharoba, Jacob Lorenz and Heinrich T. Vierhaus, Brandenburg University of Technology Cottbus-Senftenberg, DE
Abstract
Due to the downscaling of transistor feature sizes, today's integrated circuits are much more likely to be affected by transient or permanent faults. In order to still meet certain dependability requirements, many different fault tolerance techniques have been developed, which can handle these faults in the field. Each of these techniques is associated with distinct costs and benefits. As a consequence, finding the fault tolerant implementation of the system that best meets the actual requirements is a challenging task. We propose a tool that supports this process. It offers a set of hardware based fault tolerance techniques that can be applied to a given VHDL model. Afterwards, costs and benefits of the respective design choice are estimated automatically. Thus several fault tolerant versions of the design can be evaluated and compared with each other without implementing them manually. Finally, the VHDL code of the preferred design candidate can be generated by the tool.

Download Paper (PDF)
UB03.7 DIGITALLY DRIVEN TOP-DOWN METHODOLOGY FOR MIXED SIGNAL CIRCUIT DESIGN
Presenter:
Markus Mueller, University of Heidelberg, DE
Authors:
Markus Mueller, Maximilian Thuermer and Ulrich Bruening, University of Heidelberg, DE
Abstract
In this methodology, synthesizable modules and full custom blocks are first described in an HDL in a top-down approach. For analog cells, real number based models are created. Once the complete mixed signal model is done, each cell in the design is completely described concerning interface and behavior. The models then serve as specification for the full custom cell development. Schematics which don't include any primitives are automatically generated from the HDL description by a scripted flow to ensure consistency. Design space exploration can be done quickly and very efficiently this way. Cells which can be reused at different places in the design are identified and problems arising from interactions on the system level are found early in the design phase. This methodology accelerates the design process significantly, avoids errors and provides higher flexibility for design changes. A digital centric design example of a High Speed SerDes IP is demonstrated using the described methodology.

Download Paper (PDF)
UB03.8 GPCDS: AN INTERACTIVE TOOL FOR CREATING SCHEMATIC MODULE GENERATORS IN ANALOG IC DESIGN
Presenter:
Matthias Greif, Reutlingen University, DE
Authors:
Matthias Greif and Juergen Scheible, Reutlingen University, DE
Abstract
While digital design automation is highly developed, analog design automation still lags behind demand. Previous approaches to circuit creation, which are usually based on optimization algorithms, do not satisfy industrial requirements. A promising alternative is given by procedural approaches, which imitate the solution strategy of a human expert. We are working on parameterized generators (such as PCells) for analog circuit and layout modules, which are special kinds of such procedures. We present "gPCDS", a novel tool for the creation of schematic generators for analog circuit design. Associated with a common design environment, gPCDS offers a sophisticated interactive design flow for the development of schematic PCells. gPCDS thus replaces the crucial process of manual code writing with an intuitive, graphics-based way of creating schematic PCells. The GUI of gPCDS provides a variety of useful functions, such as defining parameter ranges or placing predefined building blocks.

Download Paper (PDF)
UB03.9 PFPSIM: A PROGRAMMABLE FORWARDING PLANE SIMULATOR
Presenter:
Gordon Bailey, Concordia University, CA
Author:
Gordon Bailey, Concordia University, CA
Abstract
We demonstrate PFPSim, a host-compiled simulator for early validation and analysis of packet processing applications on programmable forwarding plane architectures, used in software defined networks. The simulation model is automatically generated from a high-level description of the hardware/software architecture of the forwarding device and the behavioral description of the various modules in the architecture. Our high-level architectural description language is capable of defining many-core network processors as well as reconfigurable pipelines. The behavior of the fixed-function processing elements in the architecture is defined in C++. The code targeted for the processor cores, or reconfigurable pipeline stages, is compiled from P4, an emerging programming language for packet processing applications. Network dataplane programmers can use PFPSim as a virtual prototype to simulate and debug their applications before hardware availability.

Download Paper (PDF)
UB03.10 ALPT: A FAST PROTOTYPING METHODOLOGY WITH CONSTRAINED FLOORPLANING ON ANALOG LAYOUT GENERATION
Presenter:
Po-Cheng Pan, National Chiao Tung University, TW
Authors:
Po-Cheng Pan, Hung-Wen Huang and Hung-Ming Chen, National Chiao Tung University, TW
Abstract
Layout generation in recent analog design is challenging because of critical layout-dependent effects (LDEs). For the same netlist, different layouts lead to distinct performance. Therefore, it is necessary to observe and avoid LDEs during generation. Traditionally, analog layout generation strategies have mostly relied on experienced designers. However, that experience is built on time-consuming manual trial runs, which are inefficient and unreliable. In this work, we develop a fast prototyping methodology for analog layout generation. In our approach, we apply a fast floorplanning algorithm for multi-layout generation and select the feasible results with respect to the predefined analog constraints. For practical usage, we embed this approach in the EDA tool so that layout designers can work with such prototypes efficiently. The demonstration includes layout prototype generation, the integration between our program and the EDA tool, and the resulting layout prototypes.

Download Paper (PDF)
17:30 End of session

IP1 Interactive Presentations

Date: Tuesday 15 March 2016
Time: 16:00 - 16:30
Location / Room: Conference Level, Foyer

Interactive Presentations run simultaneously during a 30-minute slot. A poster associated with the IP paper is on display throughout the afternoon. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session, prior to the actual Interactive Presentation. Moreover, one "Best Interactive Presentation Award" will be given.

Label Presentation Title
Authors
IP1-1 A SCALABLE LANE DETECTION ALGORITHM ON COTSS WITH OPENCL
Speaker:
Kai Huang, Sun Yat-Sen University, CN
Authors:
Kai Huang1, Biao Hu2, Jan Botsch3, Nikhil Madduri3 and Alois Knoll3
1Sun Yat-Sen University, CN; 2Technische Universität München (TUM), DE; 3Technische Universität München (TUM), DE
Abstract
Road lane detection is a classical requirement for advanced driver assistance systems. With new computer technologies, lane detection algorithms can be exploited on COTS platforms. This paper investigates the use of OpenCL and develops a particle-filter-based lane detection algorithm that can tune the trade-off between detection accuracy and speed. Our algorithm is tested on 14 video streams from different data sets with different scenarios on different COTS hardware. With an average deviation of less than 5 pixels, the average frame rates for the 14 videos can reach about 400 fps on both GPU and FPGA. The peak frame rates for certain videos on GPU can reach almost 1000 fps.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-2 SIMULATION OF FALLING RAIN FOR ROBUSTNESS TESTING OF VIDEO-BASED SURROUND SENSING SYSTEMS
Speaker:
Dennis Hospach, Universität Tübingen, DE
Authors:
Dennis Hospach1, Stefan Mueller1, Wolfgang Rosenstiel1 and Oliver Bringmann2
1Universität Tübingen, DE; 2Universität Tübingen / FZI, DE
Abstract
Recently, optical sensors have become a standard item in modern cars, raising questions with respect to the necessary testing under various ambient effects. In order to achieve a high test coverage of vision-based surround sensing systems, many different environmental conditions need to be tested. Unfortunately, it is far too time-consuming to build test sets of all relevant environmental conditions by recording real video data. This paper presents a novel approach for ambient-aware virtual prototyping and robustness testing. We propose a method to significantly reduce the number of on-road captures needed for the design and validation of vision-based Advanced Driver Assistance Systems (ADAS) and fully automated driving. Our approach facilitates the generation of comparable test sets by using a greatly reduced amount of real on-road captures and applying computer-generated variations of falling rain to them in a comprehensive virtual prototyping environment. In combination with the simulation of camera properties, which influence the visual effects of falling rain to a great extent, we are able to generate different rain scenarios under a wide variety of parameters. Our approach has been applied to an automotive lane detection system using a series of rain scenarios. We have explored how falling rain can influence such a system and how such behavior can be detected using simulated rain scenarios.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-3 PROPOSAL FOR FAST DIRECTIONAL ENERGY INTERCHANGE USED IN MCMC-BASED AUTONOMOUS DECENTRALIZED MECHANISM TOWARD RESILIENT MICROGRID
Speaker:
Yusuke Sakumoto, Tokyo Metropolitan University, JP
Authors:
Yusuke Sakumoto1 and Ittetsu Taniguchi2
1Tokyo Metropolitan University, JP; 2Ritsumeikan University, JP
Abstract
The microgrid is well known as a key technology for improving the ease of use of renewable energy. Some previous works focused on a microgrid that is divided into autonomous electricity subsystems (AESs) for reliability and scalability. We have proposed the MCMC-based autonomous decentralized mechanism (ADM) to perform energy interchange between AESs so as to supply energy appropriately for the different energy demands among AESs. In this paper, toward resilient microgrids, we design a method to realize directional energy interchange in our ADM on the basis of convection-diffusion. We investigate the effectiveness of the proposed method through simulation experiments considering energy shortage and emergency situations. We show that the proposed method can quickly supply energy from an external power grid to a microgrid in an energy shortage situation, and can quickly gather distributed energy at a specific AES (e.g., a safe shelter) in an emergency situation.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-4 GRID-BASED SELF-ALIGNED QUADRUPLE PATTERNING AWARE TWO DIMENSIONAL ROUTING PATTERN
Speaker:
Atsushi Takahashi, Tokyo Institute of Technology, JP
Authors:
Takeshi Ihara1, Toshiyuki Hongo1, Atsushi Takahashi1 and Chikaaki Kodama2
1Tokyo Institute of Technology, JP; 2Toshiba, JP
Abstract
Self-Aligned Quadruple Patterning (SAQP) is an important manufacturing technique for sub-14 nm technology nodes. Although various routing algorithms for SAQP have been proposed, it is not easy to find a dense SAQP-compliant routing pattern efficiently. Even though a grid for SAQP-compliant routing patterns was proposed, it is not easy to find a valid routing pattern on the grid. The routing pattern of SAQP on the grid consists of three types of routing. Among them, the third type has a turn prohibition constraint on the grid. Typical routing algorithms often fail to find a valid routing for the third type. In this paper, SAQP-compliant two-dimensional routing patterns are found effectively on the grid by finding an optimal valid tertiary pattern. Experiments show that SAQP-compliant routing patterns are found efficiently.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-5 PRACTICAL ILP-BASED ROUTING OF STANDARD CELLS
Speaker:
Rung-Bin Lin, Yuan Ze University, TW
Authors:
Hsueh-Ju Lu, En-Jang Jang, Ang Lu, Yu Ting Zhang, Yu-He Chang, Chi-Hung Lin and Rung-Bin Lin, Yuan Ze University, TW
Abstract
This paper proposes a two-stage transistor routing approach that synergizes the merits of channel routing and integer linear programming for CMOS standard cells. It can route 185 cells in 611 seconds. About 21% of cells obtained by our approach have smaller wire length than their handcrafted counterparts. Only 11% of cells use more vias than their handcrafted counterparts. Our router completes routing of many cells that cannot be routed by an industrial one.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-6 A PROCEDURE FOR IMPROVING THE DISTRIBUTION OF CONGESTION IN GLOBAL ROUTING
Speaker:
Azadeh Davoodi, University of Wisconsin - Madison, US
Authors:
Daohang Shi, Azadeh Davoodi and Jeffrey Linderoth, University of Wisconsin - Madison, US
Abstract
This work introduces a procedure which takes as input a global routing solution that is already improved for routability based on the traditional total overflow (TOF) metric, and then improves the distribution of congestion without increasing the TOF. Our router is able to significantly decrease the number of edges in undesirable ranges of congestion by optimizing a convex piecewise-linear penalty function. The penalties are flexible and may be specified by the user. In our experiments, using the already-optimized global routing solutions of the ISPD'11 benchmarks, most of which have 0 units of TOF, we show that the number of edges utilized very close to capacity can be significantly reduced. This work is the first to explicitly target improving the distribution of edge congestion corresponding to an already-optimized global routing solution without sacrificing the TOF.
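
As a rough illustration of the kind of objective mentioned in this abstract, the following sketch evaluates a convex piecewise-linear penalty on edge utilization (demand divided by capacity). The breakpoints, slopes and function names are hypothetical and are not taken from the paper; the abstract only states that the penalties can be specified by the user.

```python
# Illustrative only: the pieces below are made-up values, not the paper's penalties.
# A convex piecewise-linear function can be written as the maximum of affine pieces.

# Each piece is (slope, intercept); increasing slopes keep the maximum convex.
PIECES = [
    (0.0, 0.0),    # no penalty at low utilization
    (1.0, -0.8),   # mild penalty once utilization exceeds 0.8
    (5.0, -4.4),   # steep penalty once utilization exceeds 0.9
]


def edge_penalty(utilization):
    """Penalty for one routing edge, given its demand / capacity ratio."""
    return max(slope * utilization + intercept for slope, intercept in PIECES)


def total_penalty(demands, capacities):
    """Sum of the edge penalties over all edges of a routing solution."""
    return sum(edge_penalty(d / c) for d, c in zip(demands, capacities))
```

Because the penalty grows faster near full capacity, moving demand from a nearly full edge to a lightly used one lowers the total even when the overflow is unchanged.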

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-7 (Best Paper Award Candidate)
MACHINE LEARNED MACHINES: ADAPTIVE CO-OPTIMIZATION OF CACHES, CORES, AND ON-CHIP NETWORK
Speaker:
Rahul Jain, Indian Institute of Technology Delhi, IN
Authors:
Rahul Jain1, Preeti Ranjan Panda1 and Sreenivas Subramoney2
1Indian Institute of Technology Delhi, IN; 2Intel, IN
Abstract
Modern multicore architectures require runtime optimization techniques to address the problem of mismatches between the dynamic resource requirements of different processes and the runtime allocation. Choosing between multiple optimizations at runtime is complex due to their non-additive effects, which makes the adaptiveness of machine learning techniques useful. We present a novel method, Machine Learned Machines (MLM), which uses Online Reinforcement Learning (RL) to perform dynamic partitioning of the last level cache (LLC), along with dynamic voltage and frequency scaling (DVFS) of the core and uncore (interconnection network and LLC). We show that the co-optimization results in a much lower energy-delay product (EDP) than any of the techniques applied individually. The results show an average improvement of 19.6% in EDP and 2.6% in execution time over the baseline.
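
For readers unfamiliar with the metric, the energy-delay product is the energy consumed multiplied by the execution time, so it rewards savings in either quantity. The short sketch below shows how a relative EDP improvement is computed; the function names and numbers are hypothetical and unrelated to the measurements reported in the paper.

```python
def edp(energy_joules, delay_seconds):
    """Energy-delay product: lower is better."""
    return energy_joules * delay_seconds


def relative_improvement(baseline, optimized):
    """Fraction by which the optimized value undercuts the baseline."""
    return (baseline - optimized) / baseline


# Made-up example: 10% less energy at the same execution time.
baseline_edp = edp(10.0, 2.0)
optimized_edp = edp(9.0, 2.0)
gain = relative_improvement(baseline_edp, optimized_edp)
```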

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-8 IMPROVING PERFORMANCE BY MONITORING WHILE MAINTAINING WORST-CASE GUARANTEES
Speaker:
Syed Md Jakaria Abdullah, Uppsala University, SE
Authors:
Syed Md Jakaria Abdullah, Kai Lampka and Wang Yi, Uppsala University, SE
Abstract
In real-time systems, feasibility analysis is based on worst-case scenarios. At run-time, worst-case situations are often very unlikely to occur. With the system dimensioned for the worst case, one faces low resource utilization and an implicit loss in performance at run-time. We propose to use run-time monitoring for evaluating the deviation of job releases from their worst-case release bound. This allows us to compute a conservative bound on the future workload. Based on this, we design a scheme for reclaiming computation time that was originally allocated for jobs which are now known to be absent. By organizing the consumption of extra computing time in a dynamic and time-safe manner, we improve the run-time performance of applications and provably maintain the worst-case guarantees for their response times. We evaluate the usefulness of the presented approach by using randomly generated traces of job releases.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-9 FAULT TOLERANT NON-VOLATILE SPINTRONIC FLIP-FLOP
Speaker:
Rajendra Bishnoi, Karlsruhe Institute of Technology (KIT), DE
Authors:
Rajendra Bishnoi, Fabian Oboril and Mehdi Tahoori, Karlsruhe Institute of Technology (KIT), DE
Abstract
With technology downscaling, static power has become one of the biggest challenges in a System-on-Chip. Normally-off computing using non-volatile sequential elements is a promising solution to address this challenge. Recently, many non-volatile shadow flip-flop architectures were introduced, in which Magnetic Tunnel Junction (MTJ) cells are employed as backup storage elements. Because the fabrication processes for magnetic layers are still emerging, MTJs are more susceptible to manufacturing defects than their CMOS counterparts. Moreover, unlike memory arrays that can effectively be repaired with well-established memory repair and coding schemes, flip-flops scattered in the layout are more difficult to repair. So, without effective defect and fault tolerance for non-volatile flip-flops, the manufacturing yield will be severely affected. Therefore, we propose a Fault Tolerant Non-Volatile Latch (FTNV-L) design, in which we arrange several MTJ cells in such a way that it is resilient to various MTJ faults. Simulation results show that our proposed FTNV-L can effectively tolerate all single MTJ faults with considerably lower overhead than traditional approaches.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-10 TOWARDS AUTOMATIC DIAGNOSIS OF MINORITY CARRIERS PROPAGATION PROBLEMS IN HV/HT AUTOMOTIVE SMART POWER ICS
Speaker:
Yasser Moursy, Sorbonne Universités, UPMC, FR
Authors:
Yasser Moursy1, Hao Zou1, Ramy Iskander1, Pierre Tisserand2, Dieu-My Ton2, Giuseppe Pasetti3, Ehrenfried Seebacher4, Alexander Steinmair4, Thomas Gneiting5 and Heidrun Alius5
1Sorbonne Universités, UPMC, FR; 2Valeo, Creteil, FR; 3AMS, Navacchio, IT; 4AMS AG, Unterpremstaetten, AT; 5AdMOS, Frickenhausen, DE
Abstract
This paper presents a methodology to identify the substrate coupling effects in smart power integrated circuits. The methodology is based on a tool called AUTOMICS, which extracts the substrate parasitic network. This network comprises diodes and resistors that are able to maintain the continuity of the minority carrier concentration. The contribution of minority carriers to the substrate noise is significant in high-voltage and high-temperature applications. The proposed methodology is presented alongside conventional latch-up problem identification for a test-case automotive chip, AUTOCHIP1. The proposed methodology takes significantly less time than the conventional one. It could significantly shorten the time-to-market and improve the robustness of the design.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-12 TOWARDS HIGHLY RELIABLE SRAM-BASED PUFS
Speaker:
Elena Ioana Vatajelu, Politecnico di Torino, IT
Authors:
Elena Ioana Vatajelu1, Giorgio Di Natale2 and Paolo Prinetto3
1POLITO, IT; 2LIRMM, FR; 3Politecnico di Torino, IT
Abstract
Physically Unclonable Functions (PUFs) are emerging cryptographic primitives used to implement low-cost device authentication and secure secret key generation. Several solutions exist for classical CMOS devices; the most investigated solutions today for weak PUF implementation are based on the use of SRAMs, which offer the advantage of reusing the memories that already exist in many designs. The efficiency of PUF implementations is strongly dependent on the unclonability and reliability of their responses. It has been shown that SRAM PUFs can guarantee high levels of both unclonability and reliability. However, high reliability is today achieved by using fuzzy extractor structures combined with complex error correcting codes (ECCs), which increase the complexity and cost of the design. The overheads associated with these techniques increase with their error correction capability. In this paper we define an effective method to identify the unreliable cells in the PUF implementation based on an SRAM stability test. This information is used to significantly reduce the need for complex ECCs, resulting in efficient, low-cost PUF implementations.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-13 CURRENT BASED PUF EXPLOITING RANDOM VARIATIONS IN SRAM CELLS
Speaker:
Fengchao Zhang, University of Florida, US
Authors:
Fengchao Zhang1, Shuo Yang1, Jim Plusquellic2 and Swarup Bhunia1
1University of Florida, US; 2University of New Mexico, US
Abstract
A Physical Unclonable Function (PUF) is a security primitive that has been proven to be effective in diverse security solutions ranging from hardware authentication to on-die entropy generation. PUFs can be implemented in a design in two possible ways: (1) adding a separate dedicated circuit; and (2) reusing an existing on-chip structure for generating random signatures. A large percentage of existing PUFs fall into the first category, which suffers from the important drawback of often unacceptable hardware and design overhead. Moreover, they cannot be applied to legacy designs, which do not allow insertion of additional circuit structures. Intrinsic PUFs, which rely on pre-existing circuit structures such as static random-access memory (SRAM), fall into the second category. They, however, typically suffer from poor entropy as well as a lack of robustness. In this paper, we introduce a novel PUF implementation of the second category that exploits the effect of manufacturing process variations on the SRAM read access current. In particular, we note that transistor-level variations in SRAM cells cause significant variations in the read current, and that the variation changes with the content stored in an SRAM cell. We propose a method to transform the analog read current value for an SRAM array into robust binary signatures. The proposed PUF can be easily employed for authentication of commercial SRAM chips without any design modification. Furthermore, it can be realized, with minor hardware modification, in chips with embedded memory, e.g., a processor, for on-die entropy generation. Simulation results for a 45 nm CMOS process for 1000 chips, as well as measurement results based on 30 commercial SRAM chips, show promising randomness, uniqueness and robustness under environmental fluctuations.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-14 BEHAVIORAL MODELING OF TIMING SLACK VARIATION IN DIGITAL CIRCUITS DUE TO POWER SUPPLY NOISE
Speaker:
Taesik Na, Georgia Institute of Technology, US
Authors:
Taesik Na and Saibal Mukhopadhyay, Georgia Institute of Technology, US
Abstract
Timing error due to power supply noise (PSN) is a key challenge in the design of digital systems. This paper presents an accurate time-domain behavioral model of timing slack variation due to PSN while accounting for clock-data compensation (CDC). The accuracy of the model is verified against SPICE for complex designs including an AES engine and a LEON3 processor. As a case study, the model is used for time-domain co-simulation of a power distribution network (PDN) and a LEON3 processor with circuit-based noise tolerance techniques. The analysis shows that the model helps reduce pessimism in the estimated timing slack by considering the effects of PSN and CDC.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-15 LOSSLESS COMPRESSION ALGORITHM BASED ON DICTIONARY CODING FOR MULTIPLE E-BEAM DIRECT WRITE SYSTEM
Speaker:
Pei-Chun Lin, National Taiwan University, TW
Authors:
Pei-Chun Lin, Yu-Hsuan Pai, Yu-Hsiang Chiu, Shao-Yuan Fang and Charlie Chung-Ping Chen, National Taiwan University, TW
Abstract
Electron-beam direct-write (EBDW) lithography is an attractive candidate for next-generation lithography in advanced semiconductor processes. The huge data stream bandwidth required for the data delivery path in EBDW systems could seriously deteriorate throughput, which is one of the major deficiencies keeping EBDW lithography from mass production. A lossless electron-beam layout data compression and decompression algorithm is proposed in this paper for 5-bit gray-level bitmaps. Compared with the state-of-the-art LineDiff Entropy algorithm, the proposed method improves the compression rate by 18% on average and achieves a speedup of more than 7.5 times for decompression.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-16 PHONOCMAP: AN APPLICATION MAPPING TOOL FOR PHOTONIC NETWORKS-ON-CHIP
Speaker:
Edoardo Fusella, University of Naples Federico II, IT
Authors:
Edoardo Fusella and Alessandro Cilardo, University of Naples Federico II, IT
Abstract
While providing a promising solution for high-performance on-chip communication, photonic networks-on-chip suffer from insertion loss and crosstalk noise, which may severely constrain their scalability. In this paper, we introduce a methodology and a related tool, PhoNoCMap, for the design space exploration of optical NoC mapping solutions. The tool automatically assigns application tasks to the nodes of a generic photonic NoC architecture such that either the worst-case insertion loss or the worst-case crosstalk noise is minimized. The experimental results show significant benefits in terms of insertion loss and crosstalk noise, allowing improved network scalability.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-17 DESIGN OF AN EFFICIENT READY QUEUE FOR EARLIEST-DEADLINE-FIRST (EDF) SCHEDULER
Speaker and Author:
Risat Mahmud Pathan, Chalmers University of Technology, SE
Abstract
Although the dynamic-priority-based EDF algorithm is known to be theoretically optimal for scheduling sporadic real-time tasks on a uniprocessor, fixed-priority (FP) scheduling is mostly used in practice. One of the main reasons for the popularity of FP scheduling in industry is its efficient implementation: operations on the ready queue can be done in constant time. On the other hand, the ready queue of an EDF scheduler is generally implemented as a priority queue, for example, using a binary min-heap data structure in which insertion and deletion operations cannot be done in constant time. This paper proposes a new design of the ready queue for an EDF scheduler: a simple data structure for the ready queue, together with efficient operations to insert task control blocks (TCBs) into and remove them from the ready queue. Insertion of the TCB of a newly released job (that cannot preempt the currently executing job) is done in non-constant time. However, insertion of the TCB of a preempted job, or removal of the TCB of the job having the highest EDF priority from the ready queue, can be done in constant time. Simulation using randomly generated task sets shows that the overhead of managing jobs in our proposed ready queue for an EDF scheduler is significantly lower than that of other approaches. We believe that the theoretically optimal EDF algorithm, implemented on the basis of our proposed ready-queue data structure, will make EDF popular in industry.
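
For context, the conventional baseline mentioned in this abstract, a ready queue kept as a binary min-heap ordered by absolute deadline, can be sketched as follows. This is not the data structure proposed in the paper, whose details the abstract does not give; the class name, method names and TCB labels are hypothetical.

```python
import heapq
import itertools


class HeapReadyQueue:
    """Baseline EDF ready queue: a binary min-heap keyed by absolute deadline."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so that jobs with equal deadlines leave in insertion order.
        self._order = itertools.count()

    def insert(self, deadline, tcb):
        # O(log n): the new entry sifts up towards the root.
        heapq.heappush(self._heap, (deadline, next(self._order), tcb))

    def pop_earliest(self):
        # O(log n): the root is removed and the heap property is restored.
        deadline, _, tcb = heapq.heappop(self._heap)
        return deadline, tcb

    def __len__(self):
        return len(self._heap)


# Small demonstration with made-up deadlines.
demo = HeapReadyQueue()
demo.insert(30, "tcb_a")
demo.insert(10, "tcb_b")
demo.insert(20, "tcb_c")
order = [demo.pop_earliest()[1] for _ in range(len(demo))]
```

Both operations cost logarithmic time in the number of ready jobs, which is the overhead the abstract contrasts with the constant-time queue operations of fixed-priority schedulers.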

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-18 RT LEVEL TIMING MODELING FOR AGING PREDICTION
Speaker:
Nils Koppaetzky, OFFIS Institute for Information Technology, DE
Authors:
Nils Koppaetzky1, Malte Metzdorf1, Reef Eilers1, Domenik Helms1 and Wolfgang Nebel2
1OFFIS Institute for Information Technology, DE; 2University of Oldenburg and OFFIS, DE
Abstract
The simulation of aging-related degradation mechanisms is a challenging task for timing and reliability estimations during all design phases of digital systems. Some good approaches towards accurate, efficient and applicable timing models at the register transfer level (RTL) have already been made. However, recent state-of-the-art models often have to access lower levels of abstraction, such as the underlying gate-level netlist, for each timing estimation, and require every analysis step to be repeated if parameters, input signals or designs are changed. This work introduces a new RTL timing model concept that separates design analysis from aging estimation. It allows more efficient design evaluations with respect to aging. Although this is work in progress and systematic evaluations are still ongoing, early results indicate the applicability of the approach and its capability to compete with recent models in both accuracy and efficiency.

Download Paper (PDF; Only available from the DATE venue WiFi)

4.1 Executive Track Panel: Trends & Challenges to Ensure Security

Date: Tuesday 15 March 2016
Time: 17:00 - 18:30
Location / Room: Saal 2

Organiser:
Yervant Zorian, Synopsys, US

While new chips in mission-critical applications keep growing in both functionality and number, protecting the security of their content remains a major challenge. The extent of connectedness and the wealth of accessibility provided in today's chips negatively impact the security of these applications. The speakers in this executive session will address the current trends and challenges of hardware security.

Executives:

  • Mike Borza, Synopsys, CA
  • Leo Dorrendorf, ARM, US
  • Bill Eklow, Cisco Systems, US
  • Serge Leef, Mentor, US
18:30 End of session

4.2 Hot Topic: Nanoelectronic Design Tools Addressing Coupled Problems for 3D-IC Integration

Date: Tuesday 15 March 2016
Time: 17:00 - 18:30
Location / Room: Konferenz 6

Organisers:
Caren Tischendorf, Humboldt University of Berlin, DE
Jan ter Maten, University of Wuppertal, DE

Chair:
Wim Schoenmaker, Magwel NV, Leuven, BE

Co-Chair:
Caren Tischendorf, Humboldt University of Berlin, DE

The 3D-IC integration involves strong feedback coupled problems caused by electrical proximity and heat dissipation as well as new design challenges due to immense variety and complexity. New sophisticated modeling and simulation techniques are required in order to facilitate robust designs and enable complex analyses. Within a special hot-topic session, speakers from industry (NXP, ACCO Semiconductor), CAD tool vendors (MAGWEL NV, ON Semiconductor Belgium) and research institutions (University of Wuppertal, TU Darmstadt, Humboldt University of Berlin, Max Planck Institute for Dynamics of Complex Technical Systems, University of Applied Sciences Upper Austria) shall present new jointly developed CAD tools enabling coupled electromagnetic field-circuit-heat simulations, coupled electro-thermal-stress analyses as well as aging effect predictions based on enhanced, parameterized model order reduction techniques, multirate methods, monolithic field-circuit modeling, holistic electro-thermal modeling and uncertainty quantification via adapted probability distributions.

Time Label Presentation Title
Authors
17:00 4.2.1 FAST TIME DOMAIN SIMULATION FOR RELIABLE FAULT DETECTION
Speaker:
Jos J. Dohmen, NXP Semiconductors, NL
Authors:
Bratislav Tasic1, Jos J. Dohmen1, Rick Janssen1, E. Jan W. ter Maten2, Theo J.G. Beelen3 and Roland Pulch4
1NXP Semiconductors, NL; 2Bergische Universität Wuppertal, DE; 3Eindhoven University of Technology, NL; 4Ernst-Moritz-Arndt-Universität Greifswald, DE
Abstract
Imperfections in manufacturing processes may cause unwanted connections (faults) that are added to the nominal, "golden", design of an electronic circuit. By fault simulation we simulate all situations: a huge number of new connections and each with many different values, up to the regime of large deviations, for the newly added element. We also consider "opens" (broken connections). A strategy is developed to efficiently simulate the faulty solutions until their moment of detection. We fully exploit the hierarchical structure of the circuit. Fast fault simulation is achieved in which the golden solution and all faulty solutions are calculated over the same time step.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:22 4.2.2 HOLISTIC COUPLED FIELD AND CIRCUIT SIMULATION
Speaker:
Christian Strohm, Humboldt University of Berlin, DE
Authors:
Peter Meuris1, Wim Schoenmaker1, Christian Strohm2 and Caren Tischendorf2
1Magwel NV, Leuven, BE; 2Humboldt University of Berlin, DE
Abstract
Circuit simulators used in the semiconductor industry are based on lumped element models described in the form of netlists. In order to incorporate the mutual electromagnetic influence of neighboring elements (e.g. crosstalk), one needs refined models based on a sufficiently exact discretization of the full Maxwell equations. Here, we present a holistic simulation approach for lumped circuit models including 3D electromagnetic field models for specific devices.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:44 4.2.3 MODEL ORDER REDUCTION FOR NANOELECTRONICS COUPLED PROBLEMS WITH MANY INPUTS
Speaker:
Nicodemus Banagaaya, Max Planck Institute for Dynamics of Complex Technical Systems, DE
Authors:
Nicodemus Banagaaya1, Lihong Feng1, Wim Schoenmaker2, Peter Meuris2, Aarnout Wieers3, Renaud Gillon3 and Peter Benner1
1Max Planck Institute for Dynamics of Complex Technical Systems, DE; 2Magwel NV, Leuven, BE; 3ON Semiconductor, BE
Abstract
This paper is concerned with Model Order Reduction (MOR) for nanoelectronics coupled problems with many inputs. Our main applications are electro-thermal coupled problems described by nonlinear quadratic differential-algebraic systems (DAEs). We present algorithms that combine the advantages of the splitting techniques for DAEs and the existing MOR methods for systems with many inputs such as sparse implicit projection (SIP) for RC/RLC networks and MOR based on the superposition principle.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:06 4.2.4 SHAPE OPTIMIZATION OF A POWER MOS DEVICE TRANSISTOR UNDER UNCERTAINTIES
Speaker:
Piotr Putek, Bergische Universität Wuppertal, DE
Authors:
Piotr Putek1, Peter Meuris2, Roland Pulch3, E. Jan W. ter Maten1, Michael Günther1, Wim Schoenmaker2, Frederik Deleu4 and Aarnout Wieers4
1Bergische Universität Wuppertal, DE; 2Magwel NV, Leuven, BE; 3Ernst-Moritz-Arndt-Universität Greifswald, DE; 4ON Semiconductor, BE
Abstract
In this paper we focus on a shape/topology optimization problem of a power MOS transistor under geometrical and material uncertainties to reduce the current density overshoot. This problem, occurring in the automotive industry, yields a stochastic electro-thermal coupled problem. Its solution makes it possible to investigate the propagation of uncertainties through a 3-D model, which affect the yield and performance of a power transistor. In our work, the Stochastic Collocation Method (SCM) has been used for this purpose. In particular, uncertainties, which result from imperfections of industrial production, are modeled by random variables with a priori known probability density distributions, for example, of Gaussian or uniform type. Then, the Polynomial Chaos Expansion (PCE) with the basis associated with the assumed distribution can be used to construct numerical methods for a stochastic representation of the random-dependent solutions. Furthermore, this optimization is formulated in terms of statistical moments such as the mean and the variance. The gradient directions of a bi-objective cost functional are calculated using the Continuum Design Shape Sensitivity and the PCE in conjunction with the SCM. Finally, the optimization results for a relevant nanoelectronics problem demonstrate that the proposed method is robust and efficient.
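Editorial illustration (not taken from the paper): the idea of stochastic collocation, estimating the mean and variance of a model output by evaluating a deterministic model at a few quadrature nodes, can be sketched as follows. The sketch assumes a single Gaussian uncertain parameter and a three-point Gauss-Hermite rule; `toy_response` is a hypothetical stand-in for the expensive 3-D electro-thermal solve, and all numbers are invented.

```python
import math

# Three-point Gauss-Hermite rule for a standard normal variable.
# It integrates polynomials up to degree 5 exactly.
NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)


def collocation_moments(model, mu, sigma):
    """Estimate mean and variance of model(X) for X ~ N(mu, sigma^2)."""
    # Evaluate the deterministic model once per collocation point.
    values = [model(mu + sigma * z) for z in NODES]
    mean = sum(w * v for w, v in zip(WEIGHTS, values))
    second_moment = sum(w * v * v for w, v in zip(WEIGHTS, values))
    return mean, second_moment - mean * mean


def toy_response(width):
    # Hypothetical quantity of interest depending on one geometric parameter.
    return width * width


mean, variance = collocation_moments(toy_response, mu=1.0, sigma=0.5)
print(mean, variance)
```

For this quadratic toy model the rule is exact, so the estimates agree with the analytic moments of the squared Gaussian variable up to rounding. A robust optimization loop would then combine such mean and variance estimates in its cost functional.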

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30 End of session

4.3 Firmware Security

Date: Tuesday 15 March 2016
Time: 17:00 - 18:30
Location / Room: Konferenz 1

Chair:
Nele Mentens, Katholieke Universiteit Leuven, BE

Co-Chair:
Aurelien Francillon, EURECOM, FR

The papers in this session tackle firmware security vulnerabilities caused by threats such as software updates and code reuse. Protection against such threats includes special programming approaches, symbolic execution and authenticated encryption.

Time Label Presentation Title
Authors
17:00 4.3.1 PRACTICAL EVALUATION OF CODE INJECTION IN ENCRYPTED FIRMWARE UPDATES
Speaker:
Oscar Guillen, Technische Universität München (TUM), DE
Authors:
Oscar Guillen1, Dawin Schmidt2 and Georg Sigl1
1Technische Universität München (TUM), DE; 2LMU München, DE
Abstract
Several firmware update mechanisms in microcontrollers still make use of confidentiality-only block cipher modes, ultimately lulling the users into a false sense of security. In this work we show how easy it is to apply well known malleability attacks to successfully inject arbitrary code into an encrypted firmware image. We demonstrate this vulnerability by attacking the Advanced Encryption Standard in Cipher Block Chaining mode on an ARM-based microcontroller. The attack makes use of patterns in the structure of the firmware image to obtain known-plaintexts which may be used to modify an encrypted image. Subsequently, malicious code may be injected to extract the memory contents of the device. This work shall help motivate the use of authenticated encryption modes even in resource constrained devices.
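Editorial illustration (not taken from the paper): the malleability of CBC mode that the abstract refers to can be sketched as below. To stay self-contained, the sketch assumes a toy 16-byte Feistel cipher built from SHA-256 in place of AES; the key, the image contents and the helper names (`inject`, `encrypt_block`, and so on) are hypothetical. The property shown belongs to CBC itself: XORing a difference into ciphertext block i-1 XORs the same difference into plaintext block i.

```python
import hashlib

BLOCK = 16  # block size in bytes


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key, i, half):
    # Round function of the toy cipher: keyed hash truncated to half a block.
    return hashlib.sha256(key + bytes([i]) + half).digest()[: BLOCK // 2]


def encrypt_block(key, block):
    left, right = block[: BLOCK // 2], block[BLOCK // 2:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right


def decrypt_block(key, block):
    left, right = block[: BLOCK // 2], block[BLOCK // 2:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right


def cbc_encrypt(key, iv, plaintext):
    blocks, prev = [], iv
    for i in range(0, len(plaintext), BLOCK):
        prev = encrypt_block(key, _xor(plaintext[i:i + BLOCK], prev))
        blocks.append(prev)
    return b"".join(blocks)


def cbc_decrypt(key, iv, ciphertext):
    out, prev = [], iv
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        out.append(_xor(decrypt_block(key, block), prev))
        prev = block
    return b"".join(out)


def inject(ciphertext, index, known, wanted):
    """Rewrite plaintext block `index` (>= 1) without knowing the key."""
    start = (index - 1) * BLOCK
    patched = _xor(ciphertext[start:start + BLOCK], _xor(known, wanted))
    return ciphertext[:start] + patched + ciphertext[start + BLOCK:]


key, iv = b"device-secret", bytes(BLOCK)
image = b"HEADER" + b"." * 10 + b"NOP " * 4 + b"TAIL" + b"." * 12
wanted = b"JMP EVIL CODE".ljust(BLOCK, b"!")

# The attacker knows plaintext block 1 from the structure of the image.
tampered = inject(cbc_encrypt(key, iv, image), 1, image[16:32], wanted)
result = cbc_decrypt(key, iv, tampered)
print(result[16:32])
```

The targeted block decrypts to the attacker's bytes, the preceding block is garbled, and later blocks are untouched. An authenticated encryption mode would reject the modified image before it is decrypted.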

Download Paper (PDF; Only available from the DATE venue WiFi)
17:30 4.3.2 INTEGRATION OF ROP/JOP MONITORING IPS IN AN ARM-BASED SOC
Speaker:
Yunheung Paek, Seoul National University, KR
Authors:
Yongje Lee, Jinyong Lee, Ingoo Heo, Dongil Hwang and Yunheung Paek, Seoul National University, KR
Abstract
Code reuse attack (CRA) is a powerful technique that allows attackers to perform arbitrary computation by reusing existing code fragments. To defend against CRAs while complying with the conventional ARM-based SoC design principles, the previous hardware solution suggests the use of the ARM debug interface to acquire the control flow information of an application running on the host. However, it requires tremendous storage space to store the complementary data necessary to trace the execution flow. In this paper, we propose a new hardware CRA monitor which gives both low storage overhead and high performance. For this, we have used an instrumentation technique which transforms the original ARM binary code into a form that makes it easier for the CRA monitor to efficiently extract, through the debug interface, all crucial pieces of runtime information from the trace outcomes. In addition, while the previous solution was only built to detect one type of CRA, called return-oriented programming (ROP), ours has been designed to unify the detection logic for ROP and another important type of CRA, called jump-oriented programming (JOP). Empirical results show that our solution dramatically reduces the storage overhead for CRA detection, while successfully detecting both ROP and JOP attacks simultaneously with negligibly low runtime overhead and moderate area overhead.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:00 4.3.3 VERIFYING INFORMATION FLOW PROPERTIES OF FIRMWARE USING SYMBOLIC EXECUTION
Speaker:
Sharad Malik, Princeton University, US
Authors:
Pramod Subramanyan1, Sharad Malik1, Hareesh Khattri2, Abhranil Maiti2 and Jason Fung2
1Princeton University, US; 2Intel Corporation, US
Abstract
Verifying security requirements of the firmware in contemporary system-on-chip (SoC) designs is a critical challenge. There are two main difficulties in addressing this problem. First, security properties like confidentiality and integrity cannot be specified with commonly-used property specification schemes like assertion-based verification/linear temporal logic (LTL). Second, firmware interacts closely with other hardware and firmware which may be untrusted/malicious, and their behavior has to be correctly modelled for the verification to be sound and complete. In this paper, we propose an approach to verify firmware security properties using symbolic execution. We introduce a property specification language for information flow properties of firmware which intuitively captures the requirements of confidentiality and integrity. We also propose an algorithm based on symbolic execution to verify these properties. Evaluation on a commercial SoC design uncovered a complex security bug missed by simulation-based testing.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30 IP2-1, 361 (Best Paper Award Candidate)
ANALYZING THE IMPACT OF INJECTED SENSOR DATA ON AN ADVANCED DRIVER ASSISTANCE SYSTEM USING THE OP2TIMUS PROTOTYPING PLATFORM
Speaker:
Alexander Stühring, University of Oldenburg, DE
Authors:
Alexander Stühring1, Günter Ehmen1 and Sibylle Fröschle2
1University of Oldenburg, DE; 2OFFIS Institute for Information Technology, DE
Abstract
Modern vehicles run complex and safety-critical applications distributed over several Electronic Control Units (ECUs). Some ECUs are equipped with communication interfaces providing access to other devices, networks or remote services. Since the number of attack vectors is increasing, an early investigation of the impact of attacks becomes steadily more important. This paper gives an example of how manipulated sensor data injected into the CAN bus affects an Advanced Driver Assistance System (ADAS). In multiple experiments we illustrate the impact of different aspects such as the sending rate.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:31 IP2-2, 302 HARDWARE TROJANS IN INCOMPLETELY SPECIFIED ON-CHIP BUS SYSTEMS
Speaker:
Nicole Fern, UC Santa Barbara, US
Authors:
Nicole Fern, Ismail San, Cetin Kaya Koc and Kwang-Ting (Tim) Cheng, UC Santa Barbara, US
Abstract
The security, functionality, and performance of the on-chip bus system are critical in an SoC design. We highlight the susceptibility of current bus implementations to Hardware Trojans hiding in unspecified functionality. Unlike existing Trojans, which aim to disrupt normal bus behavior and are often designed for a specific protocol and topology, we present a general model for creating a covert Trojan communication channel between SoC components. From our channel model, which is applicable to any topology and protocol, one can create circuitry allowing information to flow covertly by altering existing bus signals only when they are unspecified. We give the specifics of this circuitry for AMBA AXI4 and APB, then create a system comprised of several master and slave units connected by an AXI4-Lite interconnect to quantify the overhead of the Trojan channel and illustrate the ability of our Trojans to evade a suite of protocol compliance checking assertions from ARM. We further outline several detection strategies for this class of hardware Trojan.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30 End of session

4.4 System-Level Energy Management

Date: Tuesday 15 March 2016
Time: 17:00 - 18:30
Location / Room: Konferenz 2

Chair:
William Fornaciari, Politecnico di Milano - DEIB, IT

Co-Chair:
Soontae Kim, KAIST, KR

The goal of this session is to provide a comprehensive perspective on the design and management of power and energy, tackling the problem from several standpoints. The first paper proposes a methodology to reduce the energy consumed by OLED displays exploiting image-specific pixel-by-pixel transformations, aimed at preserving the contrast of the image as much as possible while reducing the overall power. The second paper presents an efficient Energy Management Unit (EMU) to supply generic loads when the average harvested power is much smaller than required for sustained system operation. A dynamic energy burst scaling (DEBS) technique is proposed to dynamically configure the EMU. The third paper aims at optimizing acoustic monitoring by exploiting a two-stage architecture with low-power pattern recognition for feature extraction, combined with an optimized wake-up stage.

Time Label Presentation Title
Authors
17:00 4.4.1 LOW-OVERHEAD ADAPTIVE CONTRAST ENHANCEMENT AND POWER REDUCTION FOR OLEDS
Speaker:
Massimo Poncino, Politecnico di Torino, IT
Authors:
Daniele Jahier Pagliari, Massimo Poncino and Enrico Macii, Politecnico di Torino, IT
Abstract
Organic Light Emitting Diode (OLED) display panels are becoming increasingly popular, especially in mobile devices; one of the key characteristics of these panels is that their power consumption strongly depends on the displayed image. In this paper, we propose a new methodology to reduce the energy consumed by OLED displays that relies on image-specific pixel-by-pixel transformations, aimed at preserving the contrast of the image as much as possible while reducing the overall power. Unlike previous approaches, our method focuses specifically on the minimization of time and power overheads to implement the image transformation at runtime. To this end, we propose a transformation that can be executed online in real time, either in software, with low time overhead, or in a hardware accelerator with a small silicon footprint. Despite the great reduction in complexity, our results are comparable to those achieved with more complex approaches in terms of image quality. Moreover, our method makes it easy to explore the full quality-versus-power tradeoff by acting on a few basic parameters; thus, it enables the runtime selection among multiple display quality settings, according to the status of the system.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:30 4.4.2 DYNAMIC ENERGY BURST SCALING FOR TRANSIENTLY POWERED SYSTEMS
Speaker:
Andres Gomez, ETH Zurich, CH
Authors:
Andres Gomez, Lukas Sigrist, Michele Magno, Luca Benini and Lothar Thiele, ETH Zurich, CH
Abstract
Energy harvesting is generally seen to be the key to powering cyber-physical systems in a low-cost, long-term, efficient manner. However, harvesting has traditionally been coupled with large energy storage devices to mitigate the effects of the source's variability. The emerging class of transiently powered systems avoids this issue by performing computation only as a function of the harvested energy, minimizing the obtrusive and expensive storage element. In this work, we present an efficient Energy Management Unit (EMU) to supply generic loads when the average harvested power is much smaller than required for sustained system operation. By building up charge to a pre-defined energy level, the EMU can generate short energy bursts predictably, even under variable harvesting conditions. Furthermore, we propose a dynamic energy burst scaling (DEBS) technique to adjust these bursts to the load's requirements. Using a simple interface, the load can dynamically configure the EMU to supply small bursts of energy at its optimal power point, independent of the harvester's operating point. Extensive theoretical and experimental data demonstrate the high energy efficiency of our approach, reaching up to 73.6% even when harvesting only 110 µW to supply a load of 3.89 mW.
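Editorial illustration (not taken from the paper): a back-of-the-envelope sketch of what the quoted figures imply for burst operation. It assumes that the 73.6% figure is the end-to-end efficiency from harvester to load, that harvesting continues at a constant rate, and that each burst lasts 10 ms; the burst length and the function name `burst_schedule` are hypothetical.

```python
def burst_schedule(p_harvest_w, p_load_w, efficiency, burst_s):
    """Return (duty_cycle, charge_time_s) for one energy burst."""
    # Energy the load draws during one burst.
    burst_energy_j = p_load_w * burst_s
    # Energy that has to be harvested to deliver it, given conversion losses.
    harvested_j = burst_energy_j / efficiency
    # Time needed to accumulate that energy at the harvesting rate.
    charge_time_s = harvested_j / p_harvest_w
    # Fraction of time the load can be active.
    duty_cycle = burst_s / charge_time_s
    return duty_cycle, charge_time_s


duty, charge = burst_schedule(110e-6, 3.89e-3, 0.736, burst_s=0.010)
print(f"duty cycle {duty:.2%}, one burst every {charge:.2f} s")
```

Under these assumptions the load can be active for roughly 2% of the time, which is why the energy is accumulated and released in short bursts and not supplied continuously.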

Download Paper (PDF; Only available from the DATE venue WiFi)
18:00 4.4.3 LOW-POWER MULTICHANNEL SPECTRO-TEMPORAL FEATURE EXTRACTION CIRCUIT FOR AUDIO PATTERN WAKE-UP
Speaker:
Dinko Oletic, University of Zagreb, HR
Authors:
Dinko Oletic1, Vedran Bilas1, Michele Magno2, Norbert Felber2 and Luca Benini2
1University of Zagreb, HR; 2ETH Zurich, CH
Abstract
In many distributed sensing applications, continuous sensor monitoring requires processing with a significant energy footprint, which hinders autonomous operation and battery lifetime of sensor nodes. In our research we explore the power savings gained by splitting the hardware architecture for continuous monitoring into two stages: an always-on ultra-low-power mixed-signal wake-up circuit placed near the sensor, performing coarse recognition (e.g. wake-up circuit) and waking up the main digital processing unit only on event detection. This enables activation of energy-hungry digital processing only at the rate of event occurrence without penalising responsiveness and monitoring continuity. We focus on the wake-up circuit performing recognition of spectro-temporal audio patterns, consisting of the spectro-temporal feature extraction and the classification sub-circuits. We propose a novel design of the feature extraction circuit. It consists of a spectral decomposition multi-channel analog band-pass filter bank, implemented in generalized impedance converter (GIC) topology, and a bank of passive channel detectors for measuring the intervals of in-band signals. Experimental filter characterization demonstrated the benefits of the proposed filtering topology for low-power applications in the audio frequency range even with operational amplifiers of very limited bandwidth. The detector's response was verified in a multi-channel environment. Preliminary analysis showed power consumption ranging from 10.5 to 13.5 µW per channel using off-the-shelf components.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30 End of session

4.5 Ultra-low Energy Memory Devices

Date: Tuesday 15 March 2016
Time: 17:00 - 18:30
Location / Room: Konferenz 3

Chair:
Fabien Clermidy, CEA-Leti, FR

Co-Chair:
Walter Weber, Namlab, DE

This session explores the use of emerging memory devices for energy efficiency. The first paper proposes a compact SRAM design employing silicon-based tunnel FETs. A new type of tunneling device is also used in the second paper to build circuit designs of flip-flops and latches. Finally, an energy-saving system integration of non-volatile ternary content addressable memory cells is presented in the third paper.

Time Label Presentation Title
Authors
17:00 4.5.1 3T-TFET BITCELL BASED TFET-CMOS HYBRID SRAM DESIGN FOR ULTRA-LOW POWER APPLICATIONS
Speaker:
Costin Anghel, Institut Supérieur d'Électronique de Paris (ISEP), FR
Authors:
Navneet Gupta1, Adam Makosiej2, Andrei Vladimirescu3, Amara Amara3 and Costin Anghel3
1Institut Supérieur d'Électronique de Paris (ISEP) and CEA-Leti, FR; 2CEA-Leti, FR; 3Institut Supérieur d'Électronique de Paris (ISEP), FR
Abstract
This paper presents a TFET/CMOS hybrid SRAM architecture designed to address the requirements of ULP (Ultra-Low Power) applications, like IoT (Internet of Things). A novel 3-Transistor TFET SRAM cell is used for the array, while CMOS is used for the periphery. The simulation extractions for power and speed include wiring and device parasitic capacitance from a 4Kb SRAM designed in a 28nm FDSOI CMOS process using MOSFETs & Tunnel FETs (TFETs). The proposed 3T-TFET SRAM cell supports aggressive voltage scaling without impacting data stability and allows application of performance boosting techniques without impacting cell leakage. A 0.35 fA/bit memory array leakage current was achieved, showing a 14x to 10000x improvement compared with state-of-the-art TFET and CMOS SRAM bitcells. The minimum read and write access pulse is evaluated at 1.27ns at sub-1V supply voltage.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:30 4.5.2 DESIGN OF LATCHES AND FLIP-FLOPS USING EMERGING TUNNELING DEVICES
Speaker:
Xunzhao Yin, University of Notre Dame, US
Authors:
Xunzhao Yin, Behnam Sedighi, Michael Niemier and Xiaobo Sharon Hu, University of Notre Dame, US
Abstract
Tunneling field-effect transistors (TFETs) stand out among novel device technologies for low-power circuits and systems. While some TFETs exhibit behavior similar to MOSFETs, a group of emerging tunneling devices including symmetric tunneling FETs (SymFETs) and interlayer tunnel FETs (IFETs) demonstrate a bell-shaped I-V characteristic dissimilar to that of MOSFETs. They have shown potential for image processing and nontraditional computing in analog applications, and the design of Boolean gates with SymFETs has also been explored. This paper uses a SymFET as a proxy to design sequential circuits comprised of devices with bell-shaped I-V characteristics. Said circuits are essential as practically any application requires the indefinite storage of data and control modules during computation. We show that the negative differential resistance (NDR) behavior of SymFET transistors can be employed to build compact and low power latches and flip-flops. The relationship of the SymFET with another well-known tunneling device, namely the resonant tunneling diode (RTD), is investigated. We illustrate how previous research on RTD-based circuits -- such as monostable-bistable (MOBILE) self-latching circuits and highly compact MOBILE-based D flip-flop circuits -- can be adapted to SymFETs. Our paper provides a novel path of circuit designs based on devices that have characteristics similar to SymFETs and shows that SymFETs are a promising option for image processing applications in terms of power and area.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:00 4.5.3 MASC: ULTRA-LOW ENERGY MULTIPLE-ACCESS SINGLE-CHARGE TCAM FOR APPROXIMATE COMPUTING
Speaker:
Tajana Rosing, UC San Diego, US
Authors:
Mohsen Imani1, Shruti Patil1 and Tajana Rosing2
1UC San Diego, US; 2University of California, San Diego, US
Abstract
Memory-based computing using associative memory has emerged as a promising solution to reduce the energy consumption of important classes of streaming applications such as multimedia by avoiding redundant computations. In associative memory, a set of frequent patterns that represent basic functions are pre-stored in ternary content addressable memory (TCAM) and reused. The primary limitation to using associative memory in modern parallel processors is the large search energy required by TCAMs. In TCAMs, all match rows, except hit rows, precharge and discharge in every search operation, resulting in high and undesirable energy consumption. In this paper, we propose a new multiple-access single-charge (MASC) TCAM architecture which is capable of searching TCAM contents multiple times with a single precharging cycle. In contrast to previous designs, the MASC TCAM keeps the match-line voltage of all miss-rows high and uses their charge for the next search operation, while only the hit rows discharge. We use a periodic refresh scheme to guarantee the accuracy of the search. We also implement a new type of approximate associative memory by setting longer refresh times for MASC TCAMs, which yields search results within 1-2 bit Hamming distances of the exact result. Our evaluation on AMD Southern Island GPU shows that using MASC associative memory can improve the average GPGPU energy efficiency by 36.6%, 40.2% and 39.4% for exact matching, selective 1-HD and 2-HD approximations respectively, with acceptable quality of service (PSNR>30dB). These energy savings are 1.8X and 1.6X higher than GPGPU using exact matching TCAM and approximation TCAM that uses voltage overscaling, respectively.
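Editorial illustration (not taken from the paper): a functional sketch of exact and approximate TCAM lookup. It models only the search result, not the match-line precharge and refresh mechanism described in the abstract. Stored rows are strings over `0`, `1` and `X` (don't care), and the approximation is modeled as a tolerated Hamming distance; the function names and the example patterns are hypothetical.

```python
def hamming_distance(pattern, key):
    """Count mismatching bits; 'X' in a stored pattern matches either value."""
    return sum(1 for p, k in zip(pattern, key) if p != "X" and p != k)


def tcam_search(rows, key, max_distance=0):
    """Return the indices of stored rows within max_distance bits of the key."""
    return [
        i for i, row in enumerate(rows)
        if hamming_distance(row, key) <= max_distance
    ]


rows = ["1010", "10X1", "0111"]
exact = tcam_search(rows, "1011")                   # only the ternary row hits
approx = tcam_search(rows, "1011", max_distance=1)  # 1-HD search adds row 0
print(exact, approx)
```

With `max_distance=0` the search behaves like a conventional TCAM. Raising the tolerance to 1 or 2 corresponds to the 1-HD and 2-HD approximations mentioned in the abstract, where a stored result is reused for inputs that are close to, but not identical with, a stored pattern.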

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30 End of session

4.6 Managing Multi-Core and Flash Memory

Date: Tuesday 15 March 2016
Time: 17:00 - 18:30
Location / Room: Konferenz 4

Chair:
Akash Kumar, Technische Universität Dresden, DE

Co-Chair:
Olivier Sentieys, INRIA, FR

This session deals with methods to improve the management of multi- and many-core systems and flash memories. Various constraints and objectives are considered: real-time, process variation, fairness, power consumption and performance.

Time Label Presentation Title
Authors
17:00 4.6.1 DISTRIBUTED FAIR SCHEDULING FOR MANY-CORES
Speaker:
Anuj Pathania, Karlsruhe Institute of Technology (KIT), DE
Authors:
Anuj Pathania1, Vanchinathan Venkataramani2, Muhammad Shafique1, Tulika Mitra2 and Jörg Henkel1
1Karlsruhe Institute of Technology (KIT), DE; 2National University of Singapore, SG
Abstract
Transition of embedded processors from multi-cores to many-cores continues unabated. Many-cores execute tens of tasks in parallel and in some contexts, it is crucial that the processing cores are distributed fairly amongst the tasks. Traditional queue-based centralized fair schedulers designed for multi-cores will have excessive overhead on many-cores due to the enlarged optimization search-space. Further, the processing requirements of executing tasks may vary under different phases of their execution necessitating lightweight dynamic fair schedulers to regularly perform partial reallocation of the cores. We introduce a distributed dynamic fair scheduler that can scale up with the increase in number of cores because it disburses the processing overhead of scheduling amongst all the cores. Based on observations made for task executions on many-cores, we propose an optimal solution under certain constraints for the fair scheduling problem, which in general is NP-Hard.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:30 4.6.2 KEEP IT SLOW AND IN TIME: ONLINE DVFS WITH HARD REAL-TIME WORKLOADS
Speaker:
Kai Lampka, Uppsala University, SE
Authors:
Kai Lampka and Björn Forsberg, Uppsala University, SE
Abstract
To handle hot spots or power shortages, modern multicore processors are equipped with a supervisory dynamic thermal and power management (DTPM) system. When necessary, the DTPM system autonomously adapts the capacity of the cooling system or throttles the speed of core-local clocks via dynamic voltage and frequency scaling (DVFS) techniques. As opposed to best-effort scenarios, online DVFS with real-time workloads also needs to consider completion times of computations. Whereas execution times can be bounded adequately with worst-case estimates, arrival times of computation requests are potentially unknown. A deadline for completing a computation can easily be missed if workloads suddenly peak and past clock speed assignments have built up a non-negligible backlog of computations. To overcome this problem, we introduce an online DVFS management scheme which is history-aware. It operates a core at higher speed levels only if the future workload could result in timing violations unless clock speed assignments are raised in anticipation. We present an implementation of the scheme running on the Gem5 hardware simulator.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:00 4.6.3 EXPLOITING PROCESS VARIATION FOR RETENTION INDUCED REFRESH MINIMIZATION ON FLASH MEMORY
Speaker:
Yejia Di, Chongqing University, CN
Authors:
Yejia Di1, Liang Shi1, Kaijie Wu1 and Chun Jason Xue2
1Chongqing University, CN; 2City University of Hong Kong, HK
Abstract
Solid state drives (SSDs) are becoming the default storage medium as the cost of NAND flash memory drops. However, the cost reduction driven by density improvement and technology scaling brings new challenges. One challenge is the sharply decreasing retention time. The duration of time for which the data written in flash memory cells can be read reliably is called retention time. To deal with the decreasing retention time, refresh has been highly recommended. However, refresh seriously hurts performance and lifetime, especially at the end of life of flash memory. The second challenge is process variation (PV). Significant PV has been observed in flash memory, which introduces large variations in the endurance of flash blocks. Blocks with high endurance can provide long retention time, while the retention time is short for low-endurance blocks. Considering these two challenges, a novel refresh minimization scheme is proposed for lifetime and performance improvement. The main idea of the proposed approach is to preferentially allocate high-endurance blocks to data with long retention time requirements. In this way, the refresh operations can be minimized. Implementation and analysis show that the overhead of the proposed work is negligible. Simulation results show that both the lifetime and performance are significantly improved over the state-of-the-art scheme.
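Editorial illustration (not taken from the paper): the main idea, giving the data that must be retained longest the blocks that can retain it longest, can be sketched as a simple sorted pairing. The sketch assumes each block is characterized by the retention time it can sustain and each data item by the retention time it needs, both in days; the identifiers and numbers are hypothetical, and the real scheme operates inside a flash translation layer.

```python
def allocate(blocks, data):
    """Pair data with blocks so long-retention data gets high-endurance blocks.

    blocks: dict block_id -> retention time the block can sustain (days)
    data:   dict data_id  -> retention time the data needs (days)
    """
    by_endurance = sorted(blocks, key=blocks.get, reverse=True)
    by_need = sorted(data, key=data.get, reverse=True)
    return dict(zip(by_need, by_endurance))


def refresh_count(blocks, data, mapping):
    """Count placements whose block cannot hold the data long enough."""
    return sum(1 for d, b in mapping.items() if blocks[b] < data[d])


blocks = {"b0": 30, "b1": 365, "b2": 90}
data = {"log": 7, "firmware": 300, "photo": 60}

mapping = allocate(blocks, data)
naive = {"log": "b1", "firmware": "b0", "photo": "b2"}
print(mapping, refresh_count(blocks, data, mapping))
print(naive, refresh_count(blocks, data, naive))
```

In this toy case the sorted pairing needs no refresh at all, while the naive placement puts the long-lived firmware image on the weakest block and would have to refresh it.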

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30 IP2-3, 253 WORKLOAD-AWARE POWER OPTIMIZATION STRATEGY FOR ASYMMETRIC MULTIPROCESSORS
Speaker:
Emanuele Del Sozzo, Politecnico di Milano, IT
Authors:
Emanuele Del Sozzo, Gianluca Durelli, Ettore Trainiti, Antonio Miele, Marco Domenico Santambrogio and Cristiana Bolchini, Politecnico di Milano, IT
Abstract
Asymmetric multi-core architectures, such as the ARM big.LITTLE, are emerging as successful solutions for the embedded and mobile markets due to their capability to trade off performance and power consumption. However, neither the HMP scheduler integrated in commercial products nor previous research approaches are able to fully exploit this potential. We propose a new runtime resource management policy for the big.LITTLE architecture integrated in Linux, aimed at optimizing the power consumption while fulfilling performance requirements specified for the running applications. Experimental results show an improvement of 11% in performance and, at the same time, of 8% in peak power consumption w.r.t. the current Linux HMP solution.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:31 IP2-4, 18 (Best Paper Award Candidate)
THE SLOWDOWN OR RACE-TO-IDLE QUESTION: WORKLOAD-AWARE ENERGY OPTIMIZATION OF SMT MULTICORE PLATFORMS UNDER PROCESS VARIATION
Speaker:
Anup Das, University of Southampton, GB
Authors:
Anup Das, Geoff Merrett and Bashir Al-Hashimi, University of Southampton, GB
Abstract
The increasing use of high performance applications on multicore platforms has driven up energy consumption, making it a primary design optimization objective. Two widely used approaches for reducing energy consumption in multithreaded workloads are slowdown (using DVFS) and race-to-idle. In this paper, we first demonstrate that the most energy efficient choice is dependent on (1) workload (memory bound, CPU bound etc.), (2) process variation and (3) support for Simultaneous Multithreading (SMT). We then propose an approach for mapping application threads on SMT multicore systems at runtime, to minimize energy consumption. The proposed approach interfaces with the operating system and hardware performance counters and timers to characterize application threads. This characterization captures the effect of process variation on execution time and identifies the break-even operating point, where one strategy (slowdown or race-to-idle) outperforms the other. Thread mapping is performed using these characterized data by iteratively collapsing application threads (SMT) followed by binary programming-based thread mapping. Finally, performance slack is exploited at run-time to select between slowdown and race-to-idle, based upon the break-even operating point calculated for each individual thread. This end-to-end approach is implemented as a run-time manager for the Linux operating system and is validated across a range of high performance applications. Results demonstrate up to 13% energy reduction over all state-of-the-art approaches, with an average of 18% improvement over Linux.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:32 IP2-5, 165 TOWARDS GENERAL PURPOSE COMPUTATIONS ON LOW-END MOBILE GPUS
Speaker:
Leonidas Kosmidis, Barcelona Supercomputing Center and Universitat Politècnica de Catalunya, ES
Authors:
Matina Maria Trompouki1 and Leonidas Kosmidis2
1Universitat Politècnica de Catalunya, ES; 2Barcelona Supercomputing Center and Universitat Politècnica de Catalunya, ES
Abstract
GPUs traditionally offer high computational capabilities, frequently higher than their CPU counterparts. While high-end mobile GPU vendors recently introduced general purpose APIs, such as OpenCL, to leverage their computational power, the vast majority of mobile devices lack such support. Although their graphics APIs have similarities with desktop graphics APIs, they have significant differences, which prevent the use of well-known techniques that offer general-purpose computations over such interfaces. In this paper we show how these obstacles can be overcome, in order to achieve general purpose programmability of these devices. As a proof of concept we implemented our proposal on a real embedded platform (Raspberry Pi) based on Broadcom's VideoCore IV GPU, obtaining a speedup of 7.2X over the CPU.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30 End of session

4.7 Modeling of Devices and Mixed-Signal Circuits

Date: Tuesday 15 March 2016
Time: 17:00 - 18:30
Location / Room: Konferenz 5

Chair:
Nuno Horta, Instituto de Telecomunicacoes, PT

Co-Chair:
Jaijeet Roychowdhury, UC Berkeley, US

This session contains papers presenting surrogate models for RF inductors, compact models for bipolar transistors and nonlinear models for low power DC-DC converters.

Time Label Presentation Title
Authors
17:00 4.7.1 ACCURATE SYNTHESIS OF INTEGRATED RF PASSIVE COMPONENTS USING SURROGATE MODELS
Speaker:
Fabio Passos, CSIC, Universidad de Sevilla, ES
Authors:
F. Passos, R. González-Echeverría, E. Roca, R. Castro-López and F. V. Fernández, CSIC, Universidad de Sevilla, ES
Abstract
Passive components play a key role in the design of RF CMOS integrated circuits. Their synthesis, however, is still an unsolved problem due to the lack of accurate analytical models that can replace the computationally expensive electromagnetic (EM) simulations. Both physics-based and surrogate models have been reported that fail to accurately model the complete design space of inductors. Surrogate-assisted optimization techniques, where coarse models are locally enhanced during the inductor synthesis process by using new EM-simulated points to update the model, have been proposed, but either the efficiency is dramatically decreased due to the online EM simulations or the optimization may converge to suboptimal regions. In this paper, we present a new surrogate model, valid in the entire design space with less than 1% error when compared with EM simulations. This model can be generated offline and, when embedded within an optimization algorithm, allows the synthesis of integrated inductors with high accuracy and high efficiency, reducing the synthesis time by three orders of magnitude.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:30 4.7.2 IMPLEMENTATION AND QUALITY TESTING FOR COMPACT MODELS IMPLEMENTED IN VERILOG-A
Speaker:
Anindya Mukherjee, Technische Universität Dresden, DE
Authors:
Anindya Mukherjee1, Andreas Pawlak1, Michael Schröter1, Didier Celi2 and Zoltan Huszka3
1Technische Universität Dresden, DE; 2ST, FR; 3AMS, AG, HU
Abstract
An overview on the implementation of new physical effects into the compact heterojunction bipolar transistor model HICUM/L2 is presented along with a description of quality testing procedures before its public release for production circuit design in commercial simulators. Related topics such as potential measures for model run time improvements and failures are also discussed. Significant differences in run time for different commercial circuit simulators reflect their different approaches towards compact model implementation.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:00 4.7.3 MULTI-HARMONIC NONLINEAR MODELING OF LOW-POWER PWM DC-DC CONVERTERS OPERATING IN CCM AND DCM
Speaker:
Dani Tannir, Lebanese American University, LB
Authors:
Ya Wang1, Di Gao1, Dani Tannir2 and Peng Li1
1Texas A&M University, US; 2Lebanese American University, LB
Abstract
DC-DC converters form an essential component of modern low-power integrated circuits. This paper presents a novel nonlinear modeling technique for pulse-width modulated (PWM) DC-DC converters for low-power applications. Our enhanced model not only predicts the dc response, but also captures harmonics of arbitrary degrees. The proposed full-order model retains the inductor current as a state variable and accurately captures the circuit dynamics even in the transient state. Furthermore, by continuously monitoring state variables, our model seamlessly transitions between continuous conduction mode (CCM) and discontinuous conduction mode (DCM), a transition that often occurs in low-power applications, while also accounting for the non-idealities of the circuit devices. The proposed model, when tested with a system decoupling technique, obtains up to 10X runtime speedups over transistor-level simulations with a maximum output voltage error that never exceeds 4%.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30 End of session

4.8 Presentations from IoT-Campus (I): ASIC and Sensor Solutions

Date: Tuesday 15 March 2016
Time: 17:00 - 18:30
Location / Room: Exhibition Theatre

Organiser:
Hans-Jürgen Brand, IDT/ZMDI, DE

This session features presentations given by exhibitors from the Campus on IoT and Secure Systems, with a special focus on ASIC and sensor solutions for IoT applications. A second session (7.8) will highlight how IoT will change our lives and how to design IoT devices. Attendees are also invited to visit the campus booths for further details and discussions.

Time Label Presentation Title
Authors
17:00 4.8.1 SENSOR-PLATFORMS FOR IOT SOLUTIONS
Speaker:
Michael Georgi, IDT/ZMDI, DE
17:30 4.8.2 CHALLENGES IN ASIC DEVELOPMENT FOR IOT SENSOR NODES
Speaker:
Dirk Droste, Bosch Sensortec GmbH, DE
18:00 4.8.3 INFINEON: MAKING THE INTERNET OF THINGS SMART, SECURE AND POWER EFFICIENT
Speaker:
Uwe Gäbler, Infineon Technologies, DE
18:30 End of session

UB04 Session 4

Date: Tuesday 15 March 2016
Time: 17:30 - 19:30
Location / Room: Booth 15, Exhibition Area

Label Presentation Title
Authors
UB04.1 MICROTESK ARMV8 EDITION: SPECIFICATION-BASED TEST PROGRAM GENERATOR
Presenter:
Andrei Tatarnikov, Russian Academy of Sciences (RAS), RU
Authors:
Andrei Tatarnikov, Alexander Kamkin and Artem Kotsynyak, Russian Academy of Sciences (RAS), RU
Abstract
This work presents a test program generation tool for ARMv8 microprocessors. The tool consists of two parts: an architecture-independent test program generation core and ARMv8 specifications. The specifications provide information on the instruction set architecture and the memory management unit of an ARMv8 microprocessor. Test programs are generated on the basis of test templates provided by users and testing knowledge extracted from the specifications. Test templates describe scenarios to be covered in terms of test situations, while testing knowledge specifies constraints that should be satisfied in order for these situations to occur. The architecture-independent test program generation core implements a wide range of test generation techniques including random generation, combinatorial generation, constraint solving and symbolic execution. The flexible architecture of the tool allows different generation methods to be integrated and the test generation core to be extended with new engines.

Download Paper (PDF)
UB04.2 RT-POWMODS: RUN-TIME CPU POWER MODELS FROM REAL DATA
Presenter:
Matthew Walker, University of Southampton, GB
Authors:
Matthew Walker1, Stephan Diestelhorst2, Andreas Hansson2, Geoff Merrett1 and Bashir Al-Hashimi1
1University of Southampton, GB; 2ARM Ltd., GB
Abstract
Being able to accurately estimate CPU power consumption is a key requirement for both controlling online CPU energy-saving techniques and design-space exploration. Models built and validated using measured data from an actual device are valuable as their accuracy is known and trusted. We present our techniques and freely available software tools for running experiments on mobile development boards and using the recorded data to build accurate run-time power models. Our novel methodology uniquely considers the stability of the model and we demonstrate how it allows the models to achieve a higher accuracy on a wider range of workloads. We show how our tools are able to predict run-time power of an ARM Cortex-A15 CPU with an average error of less than 3% when validated with over 50 workloads.

Download Paper (PDF)
UB04.3 FORMAL VERIFICATION OF CLOCK DOMAIN CROSSING USING GATE-LEVEL MODELS OF METASTABLE FLIP-FLOPS
Presenter:
Ghaith Tarawneh, Newcastle University, GB
Authors:
Ghaith Tarawneh, Andrey Mokhov and Alex Yakovlev, Newcastle University, GB
Abstract
We present a first prototype of a gate-level tool that enables simple and intuitive verification of multi-clock designs. The tool's underlying methodology (described in the paper "Formal Verification of Clock Domain Crossing using Gate-level Models of Metastable Flip-Flops" to be presented in the conference) relies on transforming gate-level netlists so that they can reproduce problematic CDC behaviour digitally. Processed netlists can then be passed to formal verification tools to identify and debug CDC faults. The tool is at an early development stage but consists of a functional Verilog parser and CDC transformation functions that can be invoked from the command line. The demo will showcase the tool using simple sender-receiver circuits. Synthesized netlists will be processed by the tool and then fed to a formal verification tool to identify CDC issues (e.g. missing synchronizers, path convergence). Verification output from source and processed netlists will be compared.

Download Paper (PDF)
UB04.4 GRIP: GRAPH-REWRITING-BASED IP-INTEGRATION (GRIP) - AN EDA TOOL FOR SOFTWARE DEFINED SOC DESIGN
Presenter:
Munish Jassi, Technische Universität München, DE
Authors:
Munish Jassi, Yong Hu, Jian Lyu, Daniel Mueller-Gritschneder and Ulf Schlichtmann, Technische Universität München, DE
Abstract
The GRIP tool (Graph-Rewriting-Based IP-Integration) provides system engineers with a comprehensive platform that takes care of their IP-integration concerns for IP-centric SoC designs, also referred to as SW-defined SoCs. The tool uses the standardized meta-data IP-XACT format for HW descriptions and encodes the design IP-integration knowledge as a set of integration rules based on graph rewriting and grammar theory. The tool automates and encodes the step-by-step integration of IPs to build a desired system architecture. Multiple sequential IP-integration steps can be compiled to iteratively generate new architectures. For design space exploration (DSE), constraints can be given to generate a desired subset of candidate SoCs. Code generation produces the design files for each architecture. This is demonstrated as DSE for an OpenCV CV application running on a Xilinx Zynq-based Zedboard. GRIP additionally generates the HW drivers for both non-OS and Linux-based systems.

Download Paper (PDF)
UB04.5 A-LOOP: AMP SYSTEM WITH A DUAL-CORE ARM CORTEX A9 PROCESSOR WITH LINUX OPERATING SYSTEM AND A QUAD-CORE LEON3 PROCESSOR WITH LINUX OPERATING SYSTEM, OPENMP LIBRARY AND HARDWARE PROFILING SYSTEM
Presenter:
Giacomo Valente, Università Degli Studi Dell'Aquila, IT
Authors:
Giacomo Valente and Vittoriano Muttillo, Università Degli Studi Dell'Aquila, IT
Abstract
Isles of computational elements with different characteristics can be exploited for separate tasks with different non-functional requirements. This can lead to the realization of smart Systems on Module (SoM). In such a context, SoCs with FPGAs can be viewed as platforms useful for prototyping these architectures. This demo shows a SoM prototype for aerospace applications developed on a Zynq7000 SoC, composed of a dual-core ARM Cortex A9 with Linux operating system (isle#1), able to interface with external data, and a quad-core Leon3 with SMP Linux operating system (isle#2), able to execute parallel applications based on the OpenMP library. These two computational isles share an external DDR memory, so that isle#1 can provide data to and collect results from isle#2. Moreover, isle#1 is able to monitor the performance of isle#2 without introducing software overhead (i.e. no SW instrumentation) by using a hardware profiling system. The whole system, executing a MANET localization algorithm, will be presented.

Download Paper (PDF)
UB04.6 IDDD: AN INTERACTIVE DEPENDABILITY DRIVEN DESIGN SPACE EXPLORATION
Presenter:
Stefan Scharoba, Brandenburg University of Technology Cottbus-Senftenberg, DE
Authors:
Stefan Scharoba, Jacob Lorenz and Heinrich T. Vierhaus, Brandenburg University of Technology Cottbus-Senftenberg, DE
Abstract
Due to the downscaling of transistor feature sizes, today's integrated circuits are much more likely to be affected by transient or permanent faults. In order to still meet certain dependability requirements, many different fault tolerance techniques have been developed, which can handle these faults in the field. Each of these techniques is associated with distinct costs and benefits. As a consequence, finding the fault tolerant implementation of the system that meets the actual requirements best represents a challenging task. We propose a tool that supports this process. It offers a set of hardware based fault tolerance techniques that can be applied to a given VHDL model. Afterwards, costs and benefits of the respective design choice are estimated automatically. Thus several fault tolerant versions of the design can be evaluated and compared with each other without implementing them manually. Finally, the VHDL code of the preferred design candidate can be generated by the tool.

Download Paper (PDF)
UB04.7 CONTREP: A SINGLE-SOURCE FRAMEWORK FOR UML-BASED MODELLING AND DESIGN OF MIXED-CRITICALITY SYSTEMS
Presenter:
Fernando Herrera, University of Cantabria, ES
Authors:
Fernando Herrera and Eugenio Villar, University of Cantabria, ES
Abstract
Mixed-criticality systems integrate applications, platform resources and requirements with different criticality. A criticality reflects the impact of either a failure of a component or a violation of a requirement, which can range from irrelevant to catastrophic effects. This booth presents the CONTREP framework, which supports UML/MARTE based modeling, analysis and design of mixed-criticality embedded systems. The booth shows a model of a quadcopter control system which integrates safety-critical (e.g., flight control), mission-critical (e.g., a video processing payload), and non-critical (e.g., monitoring) functions. The booth shows how mixed-criticality is captured, together with the description of the functional architecture and of the multi-core embedded platform where the system is implemented, and how CONTREP automates different design activities, i.e. model validation, performance assessment and design space exploration, exploiting mixed-criticality information in every case.

Download Paper (PDF)
UB04.8 GPCDS: AN INTERACTIVE TOOL FOR CREATING SCHEMATIC MODULE GENERATORS IN ANALOG IC DESIGN
Presenter:
Matthias Greif, Reutlingen University, DE
Authors:
Matthias Greif and Juergen Scheible, Reutlingen University, DE
Abstract
While digital design automation is highly developed, analog design automation still remains behind the demands. Previous approaches of circuit creation, which are usually based on optimization algorithms, do not satisfy industrial requirements. A promising alternative is given by procedural approaches, which imitate the solution strategy of a human expert. We are working on parameterized generators (such as PCells) for analog circuit and layout modules, special kinds of such procedures. We present "gPCDS", a novel tool for the creation of schematic generators for analog circuit design. Associated with a common design environment, gPCDS offers a sophisticated interactive design flow for the development of schematic PCells. gPCDS thus substitutes the crucial process of manual code writing by an intuitive graphic-based way of schematic PCell creation. The GUI of gPCDS provides a variety of useful functions, such as defining parameter ranges or placing predefined building blocks.

Download Paper (PDF)
UB04.9 ANALYSIS AND VERIFICATION OF COMMUNICATION FABRICS
Presenter:
Frank Burns, Newcastle University, GB
Authors:
Frank Burns, Danil Sokolov and Alex Yakovlev, Newcastle University, GB
Abstract
xMASCraft is a tool for visual modelling, analysis and verification of GALS xMAS circuits. The tool is based on a structured approach which provides unique visual feedback about complex deadlocks occurring at both global and local levels. The deadlocks are identified by a novel unfolding algorithm that relies on structured occurrence nets driven by synchronisation policy. For deadlock analysis a new representation is used based on blocking/idle relations through which relational analysis can be made based on querying. This is fed back to the interface in the form of unique textual/graphical feedback providing detailed relational information. This enables enhanced visualisation of the causality of the deadlocks to be worked out. In particular it reveals vulnerable parts of the system which are susceptible to shut down, point-to-point causes of deadlock occurring between different modules and the original sources of deadlocks.

Download Paper (PDF)
UB04.10 WORKCRAFT: FRAMEWORK FOR INTERPRETED GRAPHS
Presenter:
Danil Sokolov, Newcastle University, GB
Author:
Danil Sokolov, Newcastle University, GB
Abstract
A large number of models employed in the design of concurrent systems, such as Petri nets, gate-level circuits and dataflow structures, have an underlying static graph structure. Their semantics, however, is defined using additional entities, e.g. tokens or node/arc states, which in turn form the overall state of the system. We jointly refer to such formalisms as interpreted graph models (IGMs). Workcraft is designed to provide a flexible common framework for development of IGMs, including visual editing, (co)simulation and analysis. The similarities between the IGMs allow for links between different formalisms to be created, either by means of adapter interfaces or by conversion from one model type into another. This greatly extends the range of applicable modelling and analysis techniques.

Download Paper (PDF)
19:30 End of session

Exhibition-Reception

Date: Tuesday 15 March 2016
Time: 18:30 - 19:30
Location / Room: Exhibition Area (Terrace Level)

The Exhibition Reception will take place on Tuesday, March 15, 2016, from 18:30 to 19:30 in the exhibition area (Terrace Level), where free drinks will be offered to all conference delegates and exhibition visitors.

19:30 End of session

5.1 SPECIAL DAY Hot Topic: Building Confidence in Advanced Driver Assistance Systems

Date: Wednesday 16 March 2016
Time: 08:30 - 10:00
Location / Room: Saal 2

Organisers:
Wolfgang Ecker, Infineon Technologies, DE
Samarjit Chakraborty, Technische Universität München (TUM), DE

Chair:
Sebastian Steinhorst, TUM CREATE, SG

Co-Chair:
Kai Lampka, Uppsala University, SE

Time Label Presentation Title
Authors
08:30 5.1.1 AVAILABILITY AND INTERPRETABILITY OF OPTIMAL CONTROL FOR CRITICALITY ESTIMATION IN VEHICLE ACTIVE SAFETY
Speaker:
Wolfgang Utschick, Technische Universität München (TUM), DE
Authors:
Stephan Herrmann and Wolfgang Utschick, Technische Universität München (TUM), DE

Download Paper (PDF; Only available from the DATE venue WiFi)
09:00 5.1.2 SAFETY ANALYSIS ON MULTIPLE ABSTRACTION LEVELS
Speaker:
Wolfgang Ecker, Infineon Technologies, DE
Authors:
Bogdan-Andrei Tabacaru, Moomen Chaari, Wolfgang Ecker, Thomas Kruse and Cristiano Novello, Infineon Technologies, DE
09:30 5.1.3 DEEP LEARNING IN ADVANCED DRIVER ASSISTANCE SYSTEMS
Speaker and Author:
Qing Rao, Daimler AG, DE
10:00 End of session
Coffee Break in Exhibition Area

5.2 Hot Topic: In-memory Computing: Status and Trends

Date: Wednesday 16 March 2016
Time: 08:30 - 10:00
Location / Room: Konferenz 6

Organiser:
Pierre-Emmanuel Gaillardon, University of Utah, Salt Lake City, US

Chair:
Ian O'Connor, Institut des Nanotechnologies de Lyon, Ecully, FR

Co-Chair:
Michael Niemier, University of Notre Dame, South Bend, US

With the recent evolution of nanometer transistor technologies, power consumption has emerged as the most critical limitation. Within advanced processors and computing architectures, processor-memory communication accounts for a significant part of the energy requirement. While alternative design approaches, such as the use of optimized accelerators or advanced power management techniques, are successfully employed in contemporary designs, the trend keeps worsening due to the ever-increasing gap between on-chip and off-chip memory data rates. This trend, known as the Von Neumann bottleneck, not only limits system performance, but nowadays also limits energy scaling. The quest for greater energy efficiency requires solutions that solve the Von Neumann bottleneck by tightly intertwining computing with memories. In this hot topic session, we intend to elaborate on in-memory computing by identifying its current applications and its promises in light of emerging technologies. In-memory computing is considered here in the general sense of computing information locally within large data storage. Four talks will be given. The first talk will cover the current industrial applications of in-memory computing to achieve energy-efficient acceleration. The three other talks will explore the opportunities of in-memory systems realized with emerging technologies. In particular, we will see how memristor theory can benefit Cellular Neural Networks (CNN). We will also dig into the recently introduced concept of memcomputing, which promises to speed up the execution of NP-complete problems. Finally, we will present a novel computer architecture that relies on resistive memory elements to compute and store information.

Time Label Presentation Title
Authors
08:30 5.2.1 SOFTWARE AND SYSTEM CO-OPTIMIZATION IN THE ERA OF HETEROGENEOUS COMPUTING
Speaker and Author:
Ruchir Puri, IBM, US
Abstract
Escalating costs of semiconductor technology and its lagging performance relative to historic trends are motivating acceleration and specialization as more impactful means to increase system value. Targeted specialization is being increasingly pursued as an important way to achieve dramatic improvements in workload acceleration. This requires a broad understanding of workloads, system structures, and algorithms to determine what to accelerate or specialize, and how (via SW, via HW, or via SW+HW). This presents many choices, necessitating co-optimization of SW and HW. In this talk, we will focus on an application-driven approach to software and system co-optimization, based on inventing new software algorithms that have strong affinity to hardware acceleration. A high-level design methodology that is needed to enable targeted specialization in hardware will also be described.
08:52 5.2.2 FADING MEMORY EFFECTS IN A MEMRISTOR FOR CELLULAR NANOSCALE NETWORK APPLICATIONS
Speaker:
Alon Ascoli, Technische Universität Dresden, DE
Authors:
Alon Ascoli1, Ronald Tetzlaff1, Leon O. Chua2, John Paul Strachan3 and R. Stanley Williams3
1Technische Universität Dresden, DE; 2University of California, Berkeley, US; 3Hewlett Packard Labs, US
Abstract
CNN based analogic cellular computing is a unified paradigm for universal spatio-temporal computation with several applications in a large number of different fields of research. By endowing CNN with local memory, control, and communication circuitry, many different hardware architectures with stored programmability have been realized, showing enormous computing power: trillions of operations per second may be executed on a single chip. The complex spatio-temporal dynamics emerging in certain CNN may lead to the development of more efficient information processing methods as compared to conventional strategies. Memristors exhibit a rich variety of nonlinear behaviours, occupy a negligible amount of integrated circuit area, consume very little power, are suited to a massively parallel data flow, and may combine data storage with signal processing. As a result, the use of memristors in future CNN based computing structures may improve and/or extend the functionalities of state-of-the-art hardware architectures. This contribution provides a detailed analysis of the system-theoretic model of a tantalum oxide memristor, in view of its potential adoption for the implementation of synaptic operators in CNN architectures.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:14 5.2.3 DIGITAL MEMCOMPUTING MACHINES
Speaker:
Fabio L. Traversa, University of California San Diego, US
Authors:
Massimiliano Di Ventra and Fabio L. Traversa, UC San Diego, US
Abstract
In this contribution we will discuss the digital, hence scalable, version of memcomputing machines. These are non-Turing machines that use memory to both process and store information at the same physical location. We will introduce their mathematical definition and provide as an example their implementation of an inverse three-bit sum gate using self-organizable logic gates, namely gates that organize dynamically to satisfy their logical propositions.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:36 5.2.4 THE PROGRAMMABLE LOGIC-IN-MEMORY (PLIM) COMPUTER
Speaker:
Pierre-Emmanuel Gaillardon, University of Utah, US
Authors:
Pierre-Emmanuel Gaillardon1, Luca Amaru2, Anne Siemon3, Eike Linn3, Rainer Waser3, Anupam Chattopadhyay4 and Giovanni De Micheli2
1University of Utah, US; 2École Polytechnique Fédérale de Lausanne (EPFL), CH; 3RWTH Aachen University, DE; 4Nanyang Technological University, SG
Abstract
Realization of logic and storage operations in memristive circuits has opened up a promising research direction of in-memory computing. Elementary digital circuits, e.g., Boolean arithmetic circuits, can be economically realized within memristive circuits with a limited performance overhead as compared to the standard computation paradigms. This paper takes a major step along this direction by proposing a fully-programmable in-memory computing system. In particular, we address, for the first time, the question of controlling the in-memory computation, by proposing a lightweight unit managing the operations performed on a memristive array. Assembly-level programming abstraction is achieved by a natively-implemented majority and complement operator. This platform enables diverse sets of applications to be ported with little effort. As a case study, we present a standardized symmetric-key cipher for lightweight security applications. The detailed system design flow and simulation results with accurate device models are reported, validating the approach.
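As a hedged illustration of why a combined majority-and-complement primitive can serve as a programming abstraction, the following Python sketch builds NOT, AND and OR from a single three-input majority operation in which one input is inverted. The function names (`rm3`, `not_`, `and_`, `or_`) and the choice of which input is inverted are assumptions made for illustration; they are not taken from the abstract.

```python
from itertools import product

def rm3(p, q, z):
    """Hypothetical primitive: majority of p, NOT q, and z (all inputs 0 or 1)."""
    return 1 if p + (1 - q) + z >= 2 else 0

def not_(a):
    # maj(1, NOT a, 0) equals NOT a
    return rm3(1, a, 0)

def and_(a, b):
    # Feeding NOT b into the inverted input yields maj(a, b, 0) = a AND b
    return rm3(a, not_(b), 0)

def or_(a, b):
    # Feeding NOT b into the inverted input yields maj(a, b, 1) = a OR b
    return rm3(a, not_(b), 1)

if __name__ == "__main__":
    # Print the truth tables: a, b, NOT a, a AND b, a OR b
    for a, b in product((0, 1), repeat=2):
        print(a, b, not_(a), and_(a, b), or_(a, b))
```

Since NOT, AND and OR are all expressible, any Boolean function can in principle be compiled to a sequence of such operations, at the cost of extra steps for the complements.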

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00 End of session
Coffee Break in Exhibition Area

5.3 Physical Attacks and Countermeasures

Date: Wednesday 16 March 2016
Time: 08:30 - 10:00
Location / Room: Konferenz 1

Chair:
Berndt Gammel, Infineon Technologies, DE

Co-Chair:
Francesco Regazzoni, ALaRI, CH

This session presents recent improvements in physical attacks and countermeasures. Papers discuss how to reconstruct the logic function of a camouflaged circuit, propose sensors that detect injected electromagnetic pulses, and present countermeasures against fault attacks implemented at register transfer level.

Time Label Presentation Title
Authors
08:30 5.3.1 ORACLE-GUIDED INCREMENTAL SAT SOLVING TO REVERSE ENGINEER CAMOUFLAGED LOGIC CIRCUITS
Speaker:
Daniel Holcomb, University of Massachusetts, Amherst, US
Authors:
Duo Liu, Cunxi Yu, Xiangyu Zhang and Daniel Holcomb, University of Massachusetts, Amherst, US
Abstract
Layout-level gate camouflaging has attracted interest as a countermeasure against reverse engineering of combinational logic. In order to minimize area overhead, typically only a subset of gates in a circuit are camouflaged, and each camouflaged gate layout can implement a few different logic functions. The security of camouflaging relies on the difficulty of learning the overall combinational logic function without knowing which logic functions the camouflaged gates implement. In this paper, we present an incremental-SAT approach to reconstruct the logic function of a circuit with camouflaged gates. Our algorithm uses the standard attacker model in which an adversary knows only the non-camouflaged gate functions, and has the ability to query the circuit to learn the correct output vector for any input vector. Our results demonstrate an order-of-magnitude speedup over the best existing deobfuscation algorithm. Beyond demonstrating speedup, we use our powerful approach to produce new insights about the strength of obfuscation. First we show that deobfuscation is feasible even in the more challenging setting where layout reveals nothing about the possible logic function of camouflaged gates. Additionally, our results question the common wisdom that strong obfuscation should maximize output corruption under incorrect deobfuscation hypotheses. We show that obfuscation decisions that maximize output corruption actually result in an easier deobfuscation problem.
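The oracle-guided attacker model described above can be sketched on a toy circuit. This is a minimal illustration only: exhaustive enumeration stands in for the incremental SAT solver, and the two-gate netlist, the candidate gate set and all names (`GATES`, `circuit`, `deobfuscate`, `SECRET`) are hypothetical. The loop repeatedly looks for a discriminating input, one on which the surviving hypotheses disagree, queries the oracle on it, and discards the hypotheses that are inconsistent with the answer.

```python
from itertools import product

# Assumed set of functions that each camouflaged gate might implement.
GATES = {
    "NAND": lambda a, b: 1 - (a & b),
    "NOR": lambda a, b: 1 - (a | b),
    "XOR": lambda a, b: a ^ b,
}

def circuit(x, g1, g2):
    """Toy netlist: out = g2(g1(a, b), c), where g1 and g2 are camouflaged."""
    a, b, c = x
    return GATES[g2](GATES[g1](a, b), c)

def deobfuscate(oracle):
    """Prune gate hypotheses using oracle queries on discriminating inputs."""
    candidates = list(product(GATES, repeat=2))
    queries = []
    while True:
        # A discriminating input is one where surviving candidates disagree.
        dip = next(
            (x for x in product((0, 1), repeat=3)
             if len({circuit(x, *c) for c in candidates}) > 1),
            None,
        )
        if dip is None:
            # All survivors are functionally equivalent on every input.
            return candidates, queries
        y = oracle(dip)
        candidates = [c for c in candidates if circuit(dip, *c) == y]
        queries.append(dip)

SECRET = ("NOR", "XOR")  # hidden from the attacker, known only to the oracle
survivors, queries = deobfuscate(lambda x: circuit(x, *SECRET))
print("surviving hypotheses:", survivors)
print("oracle queries used:", len(queries))
```

When the loop terminates, every surviving hypothesis computes the same function as the oracle, even if the individual gate assignments are not uniquely determined.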

Download Paper (PDF; Only available from the DATE venue WiFi)
09:00 5.3.2 A FULLY-DIGITAL EM PULSE DETECTOR
Speaker:
David El-Baze, Mines Saint-Etienne, FR
Authors:
David El-Baze1, Jean-Baptiste Rigaud1 and Philippe Maurine2
1Mines Saint-Etienne, FR; 2LIRMM, FR
Abstract
ElectroMagnetic Pulse Injection (EMPI) has recently been demonstrated to be an efficient fault injection technique with many advantages, especially when considering security issues of Systems on Chip (SoC) embedded in ball grid array packages, i.e. when adversaries do not have easy access to the backside. EMPI must therefore be considered a real threat against smartcards and SoCs from now on. Among the usual countermeasures against fault attacks, one can identify the use of embedded sensors. While voltage glitch or laser shot detectors can be found in the literature, there is only one proposal which puts forward the idea of detecting ElectroMagnetic Pulses (EMP). However, this former sensor requires fine tuning of some timing characteristics and, as a result, its use appears complex and even impractical with SoCs, which are heterogeneous by nature and designed by worldwide teams. Within this context, this paper introduces and experimentally validates a new sensor that detects EMP. Because the sensor is fully digital, it is low cost and above all fully compliant with the standard SoC design flow.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:30 5.3.3 ON THE DEVELOPMENT OF A NEW COUNTERMEASURE BASED ON A LASER ATTACK RTL FAULT MODEL
Speaker:
Athanasios Papadimitriou, Univ. Grenoble Alpes, LCIS, F-26000, Valence, FR
Authors:
Charalampos Ananiadis1, Athanasios Papadimitriou1, David Hely1, Vincent Beroulle1, Regis Leveugle2 and Paolo Maistri3
1Univ. Grenoble Alpes, LCIS, F-26000, Valence, FR; 2Univ. Grenoble Alpes, TIMA, F-38000, Grenoble, FR; 3CNRS, TIMA, F-38000, Grenoble, FR
Abstract
Secure integrated circuits that implement cryptographic algorithms (e.g., AES) require protection against laser attacks. The goal of such attacks is to inject errors during the computation and then use these errors to retrieve the secret key. Laser attacks can produce single or multiple-bit errors, but have a local and usually transient impact in the circuit. In order to detect such attacks, countermeasures must take into account the circuit implementation. This paper proposes a countermeasure implemented at the Register Transfer Level (RTL) according to a previously proposed laser attack RTL fault model. The efficiency of the implemented countermeasure is evaluated on a case study in terms of area overhead, error detection rates at RTL and fault detection capabilities with respect to layout information.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00 IP2-6, 926 ESTIMATING DELAY DIFFERENCES OF ARBITER PUFS USING SILICON DATA
Speaker:
Keshab Parhi, University of Minnesota, US
Authors:
Satya Venkata Sandeep Avvaru, Chen Zhou, Saroj Satapathy, Yingjie Lao, Chris Kim and Keshab Parhi, University of Minnesota, US
Abstract
This paper presents a novel approach to estimate delay differences of each stage in a standard MUX-based physical unclonable function (PUF). Test data collected from PUFs fabricated using a 32 nm process are used to train a linear model. The delay differences of the stages directly correspond to the model parameters. These parameters are trained by using a least mean square (LMS) adaptive algorithm. The accuracy of the response using the proposed model is around 97.5% and 99.5% for two different PUFs. Second, the PUF is also modeled by a perceptron. The perceptron has almost 100% classification accuracy. A comparison shows that the perceptron model parameters are scaled versions of the model derived by the LMS algorithm. Thus, the delay differences can be estimated from the perceptron model where the scaling factor is computed by comparing the models of the LMS algorithm and the perceptron. Because the delay differences are challenge independent, these parameters can be stored on the server. This will enable the server to issue random challenges whose responses need not be stored. An analysis of the proposed model confirms that the delay differences of all stages of the PUFs on the same chip belong to the same Gaussian probability density function.
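The idea of training a linear PUF model with an LMS update can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' procedure or their silicon measurements: it assumes the commonly used additive delay model with a parity feature transform, uses a normalized LMS step, and all names and parameter values (`features`, `train_lms`, `N`, `mu`, the random seed) are hypothetical.

```python
import random

def features(challenge):
    """Parity feature vector of the additive delay model (an assumed model)."""
    n = len(challenge)
    phi = [1.0] * (n + 1)          # the last entry stays 1 and acts as a bias
    prod = 1.0
    for i in range(n - 1, -1, -1):
        prod *= 1 - 2 * challenge[i]   # maps bit 0 to +1 and bit 1 to -1
        phi[i] = prod
    return phi

def response(weights, phi):
    """Model response: sign of the weighted sum, encoded as +1 or -1."""
    return 1 if sum(w * p for w, p in zip(weights, phi)) >= 0 else -1

def train_lms(samples, n_stages, mu=0.02, epochs=20):
    """Normalized LMS: w <- w + mu * error * phi / (phi . phi)."""
    w = [0.0] * (n_stages + 1)
    for _ in range(epochs):
        for phi, r in samples:
            err = r - sum(wi * pi for wi, pi in zip(w, phi))
            step = mu * err / (n_stages + 1)   # phi . phi = n_stages + 1
            w = [wi + step * pi for wi, pi in zip(w, phi)]
    return w

rng = random.Random(0)
N = 16                                           # number of stages (assumed)
true_w = [rng.gauss(0, 1) for _ in range(N + 1)]  # synthetic delay differences

def make_samples(k):
    """Draw k random challenges and label them with the synthetic PUF."""
    out = []
    for _ in range(k):
        c = [rng.randint(0, 1) for _ in range(N)]
        phi = features(c)
        out.append((phi, response(true_w, phi)))
    return out

train, test = make_samples(2000), make_samples(1000)
w = train_lms(train, N)
accuracy = sum(response(w, phi) == r for phi, r in test) / len(test)
print(f"prediction accuracy on held-out challenges: {accuracy:.3f}")
```

In this sketch the learned weights play the role of scaled stage delay differences, which is why a server holding only the weights could predict responses to fresh challenges.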

Download Paper (PDF; Only available from the DATE venue WiFi)
10:01IP2-7, 153ON THE USE OF FORWARD BODY BIASING TO DECREASE THE REPEATABILITY OF LASER-INDUCED FAULTS
Speaker:
Marc Lacruche, Ecole Nationale Supérieure des Mines de Saint Etienne (ENSM-SE), FR
Authors:
Marc Lacruche1, Noemie Beringuier-Boher1, Jean-Max Dutertre1, Jean-Baptiste Rigaud1 and Edith Kussener2
1Ecole Nationale Supérieure des Mines de Saint Etienne (ENSM-SE), FR; 2IM2NP, FR
Abstract
This paper presents a study on the effect of Forward Body Biasing on the laser fault sensitivity of a CMOS 90nm microcontroller. Tests were performed on a register of this target, under several supply voltage and body bias settings, showing significant laser sensitivity variations. Based on these results, a method which aims at decreasing fault repeatability by using variable supply voltage and body bias settings is proposed. Finally, tests are performed on an implementation of this method on a temporally redundant AES and the results are presented.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00End of session
Coffee Break in Exhibition Area

5.4 Architectural-level Low-power Design

Date: Wednesday 16 March 2016
Time: 08:30 - 10:00
Location / Room: Konferenz 2

Chair:
Alberto Macii, Politecnico di Torino, IT

Co-Chair:
Pascal Vivet, CEA LETI, FR

This session will demonstrate some new techniques to minimize power consumption at the architectural level. The first paper will present a 2-story power distribution network applied to a GPU. The technique is extended from circuit to architecture level. The workload is evenly partitioned between the cores so that the power network is never unbalanced. The second paper demonstrates several read-assist techniques on a 4T SRAM structure in a dual-Vt FinFET technology. Those techniques are efficiently evaluated and applied to the 4T structure. The third paper of this session will focus on reliability issues due to dark silicon in processors. A new physics-based EM reliability model will be presented, together with a Q-learning method to minimize the overall power consumption. Finally, an IP presentation will present two algorithms to detect and remove redundant resets for all registers in the design in one pass, saving design effort for RTL designers. This technique is demonstrated on multiple process technologies showing the impact on power and area.

TimeLabelPresentation Title
Authors
08:305.4.1MULTI-STORY POWER DISTRIBUTION NETWORKS FOR GPUS
Speaker:
Mark Gottscho, UCLA, US
Authors:
Qixiang Zhang1, Liangzhen Lai2, Mark Gottscho3 and Puneet Gupta3
1Zhejiang University, CN; 2ARM/UCLA, US; 3UCLA, US
Abstract
High-performance chips require many power pins to support large currents, which increases fabrication cost, limits scalability, and degrades power efficiency. Multi-story serial power distribution networks (PDNs) are a promising approach to reducing pin counts and power losses. We study the feasibility of 2-story PDNs for graphics processing units (GPUs). These PDNs use either an auxiliary off-chip regulator or integrated on-die supercapacitors to stabilize the virtual rail voltage. Static SIMT thread scheduling (SSTS) and dynamic current compensation (DCC) can reduce transient impedance mismatch when the auxiliary regulator is omitted. Simulation results show that compared to a traditional 1-story design, our 2-story GPU architectures can reduce the required number of core power pins by up to 2X, power losses in the PDN by up to 3.6X, and/or maximum voltage swing by up to 2X without any performance degradation. Our results demonstrate the efficiency and cost advantages of multi-story PDNs for GPUs without any impact on performance.
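The pin and loss figures quoted above can be related to a simple idealized calculation. The sketch below assumes perfectly balanced stories, a fixed total power, and an unchanged PDN resistance; the function name and the numeric values are hypothetical and are not taken from the paper.

```python
def pdn_current_and_loss(total_power_w, story_voltage_v, stories, pdn_resistance_ohm):
    """Ideal series-stacked PDN: the supply voltage grows with the number of stories."""
    supply_voltage = stories * story_voltage_v       # stories are stacked in series
    current = total_power_w / supply_voltage         # same power at a higher voltage
    loss = current ** 2 * pdn_resistance_ohm         # resistive (I^2 R) loss in the PDN
    return current, loss

# Hypothetical numbers: 100 W chip, 1 V per story, 1 milliohm PDN resistance.
i1, loss1 = pdn_current_and_loss(100.0, 1.0, 1, 0.001)
i2, loss2 = pdn_current_and_loss(100.0, 1.0, 2, 0.001)

current_ratio = i1 / i2      # ideal reduction in delivered current
loss_ratio = loss1 / loss2   # ideal reduction in resistive loss
print(current_ratio, loss_ratio)
```

Under these assumptions a 2-story stack halves the delivered current and cuts resistive loss by about a factor of four, which is the ideal bound against which the reported "up to 2X" pin and "up to 3.6X" loss reductions can be read.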

Download Paper (PDF; Only available from the DATE venue WiFi)
09:005.4.2ENERGY-EFFICIENT CACHE MEMORIES USING A DUAL-VT 4T SRAM CELL WITH READ-ASSIST TECHNIQUES
Speaker:
Massoud Pedram, University of Southern California, US
Authors:
Alireza Shafaei Bejestan and Massoud Pedram, University of Southern California, US
Abstract
In order to improve the energy-efficiency of cache memories, this paper presents a static random access memory (SRAM) cell composed of four transistors using dual-Vt FinFET devices. The proposed 4T SRAM cell is designed by (i) removing pull-down transistors of the standard 6T SRAM, and (ii) using low-leakage high-Vt devices for pull-up transistors and fast low-Vt devices for access transistors. This dual-Vt design simultaneously improves hold and write characteristics, but results in a destructive read operation. Accordingly, read-assist techniques are employed to ensure a non-destructive and robust read operation. A selective row address decoder is also proposed to prevent the undesired write operation in half-selected cells. The 4T SRAM cell compared with the all-single-fin 6T counterpart has a 25% smaller layout area with an aspect ratio closer to one. Furthermore, using 7nm FinFET devices with a nominal supply voltage of 0.45V, the 4T SRAM cell achieves 3.5X lower cell leakage power. Because of these features, the energy consumption of a 32KB L1 (256KB L2) cache memory using the 4T SRAM cell compared with its 6T counterpart is reduced by 18% (2X), with 35% (19%) higher cache access frequency.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:305.4.3LEARNING-BASED DYNAMIC RELIABILITY MANAGEMENT FOR DARK SILICON PROCESSOR CONSIDERING EM EFFECTS
Speaker:
Sheldon X.-D. Tan, University of California, Riverside, US
Authors:
Taeyoung Kim1, Xin Huang1, Hai-Bao Chen2, Valeriy Sukharev3 and Sheldon X.-D. Tan1
1University of California, Riverside, US; 2Shanghai Jiao Tong University, CN; 3Mentor Graphics Corporation, US
Abstract
In this article, we propose a new dynamic reliability management (DRM) technique for emerging dark silicon manycore processors. We formulate our DRM problem as minimizing the energy consumption subject to the reliability, performance and thermal constraints. The new approach is based on a newly proposed physics-based electromigration (EM) reliability model to predict the EM reliability of full-chip power grid networks. We consider thermal design power (TDP) as the power constraint for a dark silicon manycore processor. We employ both dynamic voltage and frequency scaling (DVFS) and dark silicon core using ON/OFF pulsing action as the two control knobs. To solve the problem, we apply the adaptive Q-learning based method, which is suitable for runtime operation as it can provide cost-effective yet good solutions. A large class of multithreaded applications is used as the benchmark to validate and compare the proposed dynamic reliability management methods. Experimental results on a 64-core dark silicon chip show that the proposed DRM algorithm can effectively reduce the energy consumption of a dark silicon manycore system when the system is not tightly constrained. The proposed method can outperform a simple global DVFS method significantly in this case.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00IP2-8, 245SEQUENTIAL ANALYSIS DRIVEN RESET OPTIMIZATION TO IMPROVE POWER, AREA AND ROUTABILITY
Speaker:
Srihari Yechangunja, Mentor Graphics Corporation, IN
Authors:
Srihari Yechangunja1, Raj Shekhar1, Mohit Kumar1, Nikhil Tripathi1, Abhishek Ranjan1, Abhishek Mittal1, Jianfeng Liu2, Minyoung Mo2, Kyungtae Do2, Jung Yun Choi2 and SungHo Park2
1Mentor Graphics Corporation, IN; 2S.LSI, Samsung Electronics Co. Ltd, KR
Abstract
Resets are required in the design to initialize the hardware for system operation and to force it into a known state for simulation or to recover from an error. Given the increasing design complexity and time-to-market pressures, figuring out which registers do not require resets is extremely challenging. In this paper, we present a novel algorithm which uses observability-based sequential analysis to identify the registers in the design which do not require resets. With the proposed algorithm, we have seen that in some cases 70% of the registers in the design can have redundant resets. Further, with removal of the redundant resets on registers, up to 22% sequential power savings and up to 3% area reduction post-layout can be obtained.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00End of session
Coffee Break in Exhibition Area

5.5 Alternative Computing Models

Date: Wednesday 16 March 2016
Time: 08:30 - 10:00
Location / Room: Konferenz 3

Chair:
Yiyu Shi, University of Notre Dame, US

Co-Chair:
Sébastien Le Beux, Ecole Centrale de Lyon, FR

The approximate nature of neuromorphic / machine learning approaches is explored from several perspectives. Two works focus on modeling techniques and tools for such architectures, while the third leverages approximate metrics of classification difficulty to trade between accuracy and classification cost.

TimeLabelPresentation Title
Authors
08:305.5.1MNSIM: SIMULATION PLATFORM FOR MEMRISTOR-BASED NEUROMORPHIC COMPUTING SYSTEM
Speaker:
Lixue Xia, Tsinghua University, CN
Authors:
Lixue Xia1, Boxun Li1, Tianqi Tang1, Peng Gu2, Xiling Yin1, Wenqin Huangfu1, Pai-Yu Chen3, Shimeng Yu3, Yu Cao3, Yu Wang1, Yuan Xie2 and Huazhong Yang1
1Tsinghua University, CN; 2UC Santa Barbara, US; 3Arizona State University, US
Abstract
Memristor-based neuromorphic computing systems provide a promising solution to significantly boost the power efficiency of computing systems. Such systems have a wide range of design choices, such as the various memristor crossbar cell designs and different parallelism degrees of peripheral circuits. However, a memristor-based neuromorphic computing system simulator, which is able to model the system and realize an early-stage design space exploration, is still missing. In this paper, we develop a memristor-based neuromorphic system simulation platform (MNSIM). MNSIM proposes a general hierarchical structure for memristor-based neuromorphic computing systems, and provides a flexible interface for users to customize the design. MNSIM also provides a detailed reference design for large-scale applications. MNSIM embeds estimation models of area, power, and latency to simulate the performance of the system. To estimate the computing accuracy of the memristor crossbar, MNSIM proposes a behavior-level model between computing error rate and crossbar design parameters considering the influence of interconnect lines and non-ideal device factors. The error rate between our accuracy model and SPICE simulation result is less than 1%. Experimental results show that MNSIM achieves more than 7000 times speed-up compared with SPICE and obtains reasonable accuracy (more than 90%). MNSIM can further estimate the trade-off between computing accuracy, energy, latency, and area among different designs for optimization.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:005.5.2CONDITIONAL DEEP LEARNING FOR ENERGY-EFFICIENT AND ENHANCED PATTERN RECOGNITION
Speaker:
Priyadarshini Panda, Purdue University, US
Authors:
Priyadarshini Panda, Abhronil Sengupta and Kaushik Roy, Purdue University, US
Abstract
Deep learning neural networks have emerged as one of the most powerful classification tools for vision related applications. However, the computational and energy requirements associated with such deep nets can be quite high, and hence their energy-efficient implementation is of great interest. Although traditionally the entire network is utilized for the recognition of all inputs, we observe that the classification difficulty varies widely across inputs in real-world datasets; only a small fraction of inputs require the full computational effort of a network, while a large majority can be classified correctly with very low effort. In this paper, we propose Conditional Deep Learning (CDL) where the convolutional layer features are used to identify the variability in the difficulty of input instances and conditionally activate the deeper layers of the network. We achieve this by cascading a linear network of output neurons for each convolutional layer and monitoring the output of the linear network to decide whether classification can be terminated at the current stage or not. The proposed methodology thus enables the network to dynamically adjust the computational effort depending upon the difficulty of the input data while maintaining competitive classification accuracy. We evaluate our approach on the MNIST dataset. Our experiments demonstrate that our proposed CDL yields 1.91x reduction in average number of operations per input, which translates to 1.84x improvement in energy. In addition, our results show an improvement in classification accuracy from 97.5% to 98.9% as compared to the original network.
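The conditional activation described in this abstract can be sketched as an early-exit loop. This is a minimal illustration, not the authors' CDL implementation: the softmax confidence threshold used as the exit rule, and the toy `stages` and `heads`, are assumptions made for this example.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)                 # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def conditional_classify(x, stages, heads, threshold=0.9):
    """Run stages one at a time; stop as soon as a linear head is confident."""
    features = x
    for depth, (stage, head) in enumerate(zip(stages, heads), start=1):
        features = stage(features)
        probs = softmax(head @ features)
        # Exit early when confident, or unconditionally at the last stage.
        if probs.max() >= threshold or depth == len(stages):
            return int(np.argmax(probs)), depth

# Toy two-stage "network": hypothetical stand-ins for convolutional layers.
stages = [lambda v: np.maximum(v, 0.0), lambda v: 4.0 * v]
heads = [np.eye(2), np.eye(2)]        # one linear output head per stage

easy = conditional_classify(np.array([5.0, 0.0]), stages, heads)
hard = conditional_classify(np.array([0.6, 0.4]), stages, heads)
print(easy, hard)   # (label, number of stages evaluated) for each input
```

In this toy case the clearly separated input leaves after the first stage, while the ambiguous one is passed on to the deeper stage, which is the source of the savings in average operations per input.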

Download Paper (PDF; Only available from the DATE venue WiFi)
09:305.5.3PROBABILISTIC ERROR MODELS FOR MACHINE LEARNING KERNELS IMPLEMENTED ON STOCHASTIC NANOSCALE FABRICS
Speaker:
Sai Zhang, University of Illinois at Urbana-Champaign, US
Authors:
Sai Zhang and Naresh Shanbhag, University of Illinois at Urbana-Champaign, US
Abstract
Presented in this paper are probabilistic error models for machine learning kernels implemented on low-SNR circuit fabrics where errors arise due to voltage overscaling (VOS), process variations, or defects. Four different variants of the additive error model are proposed that describe the error probability mass function (PMF): additive over Reals Error Model with independent Bernoulli RVs (REM-i), additive over Reals Error Model with joint Bernoulli RVs (REM-j), additive over Galois field Error Model with independent Bernoulli RVs (GEM-i), and additive over Galois field Error Model with joint Bernoulli RVs (GEM-j). Analytical expressions for the error PMF, mean and variance are derived. Kernel level model validation is accomplished by comparing the Jensen-Shannon divergence D_{JS} between the modeled PMF and the PMFs obtained via HDL simulations in a commercial 45nm CMOS process of MAC units used in a 2nd order polynomial support vector machine (SVM) to classify data from the UCI machine learning repository. Results indicate that at the MAC unit level, D_{JS} for the GEM-j models is 1-to-2-orders-of-magnitude lower (better) than for the REM models for VOS and process variation errors. However, when considering errors due to defects, D_{JS} for REM-j is between 1-to-2-orders-of-magnitude lower than the others. Performance prediction of the SVM using these models indicates that when compared with Monte Carlo with HDL generated error statistics, probability of detection p_{det} estimated using GEM-j is within 3% for VOS error when the error rate <= 80%, and within 5% for process variation error when supply voltage V_{dd} is between 0.3V and 0.7V. In addition, p_{det} using REM-j is within 2% for defect errors when the defect rate (the percentage of circuit nets subject to stuck-at faults) p_{saf} is between 10^{-3} and 0.2.
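For readers unfamiliar with the validation metric, the following sketch computes the Jensen-Shannon divergence D_{JS} between two PMFs using the standard definition with base-2 logarithms, so that the result lies between 0 and 1. The example PMFs are invented for illustration and are not data from the paper.

```python
import numpy as np

def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p == 0 contribute nothing."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def js_divergence(p, q):
    """Jensen-Shannon divergence between two PMFs defined on the same support."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()                   # normalize in case counts were passed
    q = q / q.sum()
    m = 0.5 * (p + q)                 # mixture distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical error PMFs: one from a model, one from simulation counts.
modeled = [0.70, 0.20, 0.10]
simulated = [68, 22, 10]
d_js = js_divergence(modeled, simulated)
print(f"D_JS = {d_js:.5f}")
```

A value near 0 means the modeled PMF closely matches the simulated one, and a value of 1 means the two PMFs have no overlapping support.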

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00IP2-9, 173EFFICIENT GLOBAL OPTIMIZATION OF MEMS BASED ON SURROGATE MODEL ASSISTED EVOLUTIONARY ALGORITHM
Speaker:
Bo Liu, Glyndwr University, GB
Authors:
Bo Liu1 and Anna Nikolaeva2
1Glyndwr University, GB; 2Bauman Moscow State Technical University, RU
Abstract
Optimization plays a key role in MEMS design. However, most MEMS design optimization (exploration) methods depend either on ad-hoc analytical / behavioural models or on time-consuming numerical simulations. Surrogate modeling techniques have been introduced to integrate generality and efficiency, but the number of design variables which can be handled by most existing efficient MEMS design optimization methods is often less than 5. To address the above challenges, a new method, called Adaptive Gaussian Process-Assisted Differential Evolution for MEMS Design Optimization (AGDEMO), is proposed. The key idea is the proposed ON-LINE adaptive surrogate model assisted optimization framework. In particular, AGDEMO performs global optimization of MEMS using numerical simulation and the differential evolution (DE) algorithm, and a Gaussian process surrogate model is constructed ON-LINE to predict the results of expensive numerical simulations. AGDEMO is tested on two actuators (both with 9 design variables). Comparisons with state-of-the-art methods verify advantages of AGDEMO in terms of efficiency, optimization capacity and scalability.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00End of session
Coffee Break in Exhibition Area

5.6 Efficient System Modeling with SystemC

Date: Wednesday 16 March 2016
Time: 08:30 - 10:00
Location / Room: Konferenz 4

Chair:
Gunar Schirner, Northeastern University, US

Co-Chair:
Christian Haubelt, University of Rostock, DE

SystemC has become an important tool to enable system-level modeling and simulation for early concept validation, design space exploration and virtual prototyping. However, the predominant single-threaded discrete event kernels used for its simulation limit efficiency and applicability in modeling of large systems. This session features two research papers that explore different techniques for speeding up simulation by means of parallelizing the kernel and by exploiting properties of the time-decoupled modeling approach. The third paper investigates a new modeling technique and SystemC extension to enable fast and accurate simulation of analog/mixed signal systems.

TimeLabelPresentation Title
Authors
08:305.6.1A NEW PARALLEL SYSTEMC KERNEL LEVERAGING MANYCORE ARCHITECTURES
Speaker:
Nicolas Ventroux, CEA LIST, FR
Authors:
Nicolas Ventroux and Tanguy Sassolas, CEA LIST, FR
Abstract
The complexity of system-level modeling is continuously increasing. Electronic System Level (ESL) design requires fast simulation techniques to control future SoC development cost and time-to-market. However, SystemC simulations are sequential and thus limited by single-thread performance. In this paper, we present a new parallel SystemC kernel that efficiently leverages the multiple cores of a host machine, reaching high simulation performance without relaxing accuracy. It supports atomic parallel evaluation of SystemC processes and repeatable execution for HW/SW debugging. This new kernel is fully compliant with existing standards and easy to integrate in any existing SystemC model. Evaluations show a maximum acceleration of 34x compared to Accellera SystemC on a 64-core AMD Opteron machine.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:005.6.2SYSTEMC-LINK: PARALLEL SYSTEMC SIMULATION USING TIME-DECOUPLED SEGMENTS
Speaker:
Jan Henrik Weinstock, RWTH Aachen University, DE
Authors:
Jan Henrik Weinstock1, Rainer Leupers1, Gerd Ascheid1, Dietmar Petras2 and Andreas Hoffmann2
1RWTH Aachen University, DE; 2Synopsys GmbH, DE
Abstract
Virtual platforms have become essential tools in the design process of modern embedded systems. Their accessibility and early availability make them ideal tools for design space exploration and debugging of target specific software. However, due to increasing platform complexity and the need to simulate more and more processors simultaneously, performance of virtual platforms degrades rapidly. This work presents SystemC-Link, a segment based parallel simulation framework for SystemC simulators. It achieves high simulation performance by using a parallel and time-decoupled simulation approach. Furthermore, it offers a virtual sequential environment for each simulation segment. This enables use of legacy models by allowing operation on global state without risking race conditions during parallel simulation. The approach is evaluated in a variety of scenarios, including a contemporary multi-core platform based on the OpenRISC architecture running Linux. For this benchmark, a 3.2x higher simulation performance was achieved with SystemC-Link compared to standard SystemC on a regular workstation PC.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:305.6.3ORTHOGONAL SIGNAL MODELING AND OPERATIONAL COMPUTATION OF AMS CIRCUITS FOR FAST AND ACCURATE SYSTEM SIMULATION
Speaker:
Leandro Gil, University of Stuttgart, DE
Authors:
Leandro Gil and Martin Radetzki, University of Stuttgart, DE
Abstract
We present a general mathematical model of signals for efficient and accurate simulation of analog and mixed signal (AMS) systems. It relies on signal coding and parameterization and allows heterogeneous system specification at different abstraction levels, as well as the operational computation of continuous time systems' dynamical behavior. In particular, we derive a matrix for operational subdivision of continuous signals and use it to capture accurately the interaction between continuous and discrete time systems. A key advantage of this signal representation is that continuous signal monitoring and analysis can be performed more efficiently, speeding up system verification. We implemented the proposed modeling approach in SystemC AMS 2.0 to exploit the dynamic reactive behavior of TDF MoC for accurate synchronization between the digital and analog system parts. With the example of a PLL system we evaluate the capabilities of our implementation to cope with heterogeneous designs at different design abstraction levels. The experimental results show a significant simulation speedup for highly accurate models.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00IP2-10, 609EFFICIENT MONITORING OF LOOSE-ORDERING PROPERTIES FOR SYSTEMC TLM
Speaker:
Yuliia Romenska, Univ. Grenoble Alpes, VERIMAG, FR
Authors:
Yuliia Romenska1 and Florence Maraninchi2
1Univ. Grenoble Alpes, VERIMAG, FR; 2Grenoble INP & Verimag, FR
Abstract
SystemC Transaction-level modeling (TLM) provides high-level component-based models for SoCs, for which Assertion-Based-Verification (ABV) allows property checking early in the design cycle. We introduce the notion of loose-ordering to specify when components interact with each other and we propose a set of patterns to capture this notion in assertions. This new notion can already be expressed in languages like PSL, for which there exist tools to generate ABV monitors. But the definition of dedicated patterns makes it easier to write the properties. Moreover we define a direct translation of these patterns into SystemC monitors, and we show that it avoids the combinatorial explosion that would occur during a prior translation into PSL.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00End of session
Coffee Break in Exhibition Area

5.7 RF, Power Converters, and ADC: Innovative Design and Test Solutions

Date: Wednesday 16 March 2016
Time: 08:30 - 10:00
Location / Room: Konferenz 5

Chair:
Marie-Minerve Louerat, Université Pierre & Marie Curie, (UPMC - Paris 6), FR

Co-Chair:
Christoph Grimm, University of Kaiserslautern, DE

This session presents innovative solutions for test of millimeter-wave circuits, power monitoring, and ADC optimization.

TimeLabelPresentation Title
Authors
08:305.7.1(Best Paper Award Candidate)
BUILT-IN TEST OF MILLIMETER-WAVE CIRCUITS BASED ON NON-INTRUSIVE SENSORS
Speaker:
Athanasios Dimakos, Université Grenoble Alpes, CNRS, TIMA, FR
Authors:
Athanasios Dimakos1, Haralampos-G. Stratigopoulos2, Alexandre Siligaris3, Salvador Mir1 and Emeric De Foucauld3
1Université Grenoble Alpes, CNRS, TIMA, FR; 2Sorbonne Universités, UPMC, FR; 3CEA-Leti, FR
Abstract
This paper addresses the high-volume production test problem for millimeter-wave (mm-Wave) circuits. Bit error rate testing is the only feasible solution nowadays for mm-Wave transceivers, but is extremely costly and challenging to implement on high-volume production test floors. The lack of alternative solutions is due to the difficulty of extracting and processing mm-Wave frequency signals off-chip. In this paper, we propose a built-in test solution that has two important attributes. First, it is based on non-intrusive sensors that are totally transparent to the mm-Wave circuit. They monitor variations in the performances of the mm-Wave circuit indirectly by virtue of offering an "image" of process variations. Second, the non-intrusive sensors operate at DC or low-frequency, thus dramatically simplifying the test of the mm-Wave circuit. We demonstrate the concept on a 65nm 60GHz mm-Wave low-noise amplifier (LNA).

Download Paper (PDF; Only available from the DATE venue WiFi)
09:005.7.2ADAPTIVE DELAY MONITORING FOR WIDE VOLTAGE-RANGE OPERATION
Speaker:
Jongho Kim, Seoul National University, KR
Authors:
Jongho Kim1, Gunhee Lee1, Kiyoung Choi1, Yonghwan Kim2, Wook Kim2, Kyungtae Do2 and Jungyun Choi2
1Seoul National University, KR; 2Samsung Electronics, KR
Abstract
As process technology scales down, circuit delay variations become more and more serious due to manufacturing and environmental variations. The delay variations are hardly predictable and thus require additional design margin and limit the opportunity to reduce area and power consumption of a chip. One way to alleviate the problem is to measure the circuit delay at run-time and control the supply voltage accordingly through a closed-loop dynamic voltage and frequency scaling (closed-loop DVFS) scheme. The circuit delay is typically measured by a monitoring circuit. However, the key issue of this scheme is the delay mismatch between the monitoring circuit and the target circuit block such as a CPU or a GPU. A large delay mismatch can negate the advantage of closed-loop DVFS, and the mismatch becomes worse as the circuit block operates over a wider voltage range. This paper proposes a novel adaptive delay monitoring scheme for wide voltage-range operation, which provides a better delay correlation between the monitor and the target compared to conventional monitoring approaches. The proposed approach reduces the average error in the measured delay by up to 45% and the maximum error by up to 68%. This error reduction allows a smaller design margin, resulting in a lower-power and lower-cost design.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:305.7.3ANALYTICAL DESIGN OPTIMIZATION OF SUB-RANGING ADC BASED ON STOCHASTIC COMPARATOR
Speaker:
Md. Maruf Hossain, The University of Tokyo, JP
Authors:
Md. Maruf Hossain, Tetsuya Iizuka, Toru Nakura and Kunihiro Asada, The University of Tokyo, JP
Abstract
An optimal design method for a sub-ranging Analog to Digital Converter (ADC) based on stochastic comparator is demonstrated by performing theoretical analysis of random fluctuations in the comparator offset voltage. The proposed performance model is based on a simple but rigorous Probability Density Function (PDF) for the effective resolution of a stochastic comparator. It is possible to approximately calculate the yield of a stochastic comparator by assuming that the correlations among different analog steps of the output transfer function are negligible. Comparison with Monte Carlo simulation shows that the proposed model precisely estimates the yield of the ADC when it is designed for a reasonable target yield of > 0.8, which is the most practical case while designing a high performance ADC. Application of this model to a stochastic comparator reveals that an additional calibration can significantly enhance the resolution, i.e. it can increase the Number of Bits (NOB) by approximately 2 bits under the same chip yield. Extending the model to a stochastic-comparator-based sub-ranging ADC indicates that the ADC design parameters can be tuned to find the optimal resource distribution between the deterministic coarse stage and the stochastic fine stage.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00IP2-11, 362TESTABLE DESIGN OF REPEATERLESS LOW SWING ON-CHIP INTERCONNECT
Speaker:
Naveen Kadayinti, Indian Institute of Technology Bombay, IN
Authors:
Naveen Kadayinti and Dinesh Sharma, Indian Institute of Technology Bombay, IN
Abstract
Repeaterless low swing interconnects use mixed signal circuits to achieve high performance at low power. When these interconnects are used in large scale and high volume digital systems their testability becomes very important. This paper discusses the testability of low swing repeaterless on-chip interconnects with equalization and clock synchronization. A capacitively coupled transmitter with a weak driver is used as the transmitter. The receiver samples the low swing input data at the center of the data eye and converts it to rail to rail levels and also synchronizes the data to the receiver's clock domain. The system is a mixed signal circuit and the digital components are all scan testable. For the analog section, just a DC test has a fault coverage of 50% of the structural faults. Simple techniques allow integration of the analog components into the digital scan chain increasing the coverage to 74%. Finally, a BIST with low overhead enhances the coverage to 95% of the structural faults. The design and simulations have been done in UMC 130 nm CMOS technology.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:01IP2-12, 320ALL-DIGITAL HYBRID-CONTROL BUCK CONVERTER FOR INTEGRATED VOLTAGE REGULATOR APPLICATIONS
Speaker:
Visvesh Sathe, University of Washington, US
Authors:
Ta-tung Yen, Bin Yu and Visvesh Sathe, University of Washington, US
Abstract
With efficiency and performance gains from subsequent CMOS technology generations continuing to taper off, power dissipation remains a roadblock to maintaining growth in computational performance. Power management systems are expected to continue to heavily rely on Dynamic Voltage and Frequency Scaling (DVFS), and Integrated Voltage Regulation (IVR) in particular, to drive improvements in energy-efficiency through finer supply-voltage control. As voltage domains continue to shrink, and multiple IVRs are employed within a System-on-Chip (SoC), all-digital buck converters will become increasingly important from a scalability, portability, and methodology-compatibility perspective. In addition to some of the existing challenges facing Voltage Regulator Modules (VRMs), IVR implementations are faced with additional efficiency and transient response challenges due to the limited available filter capacitance. In this paper, we propose an all-digital hybrid-control buck converter which addresses these key challenges effectively by regulating supply voltage based on slack information from a critical path monitor, a novel and accurate technique for digital derivative measurement for effective PID control, and the use of digital non-linear control for fast transient response. Simulations in an industrial 65nm process technology demonstrate stable, energy-efficient operation with fast load regulation. Operating with a single phase, using package mounted inductor and filter capacitor models, the converter achieves a 25mV droop for a 5A load current ramp at 500mA/ns. With a high-side supply voltage of 2V, the converter achieves a peak efficiency of 86% at 2A.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00End of session
Coffee Break in Exhibition Area

5.8 Model Based Design and Verification Day - Exhibition Keynote and Application Talk

Date: Wednesday 16 March 2016
Time: 08:30 - 10:00
Location / Room: Exhibition Theatre

Moderator:
John Zhao, MathWorks Inc., US

With its special "Model Based Design and Verification Day", DATE 2016 for the first time combines a visionary keynote from an industrial leader, application talks by experienced users and an industrial tutorial with two sessions of the DATE conference Technical Program on the latest research results in the field. This gives attendees the opportunity to get a comprehensive overview of the state of the art in model based design and test, ranging from industrial application to academic research.

This session starts the day with an Exhibition Keynote given by Jim Tung, MathWorks Fellow at MathWorks Inc., followed by an application talk given by Robert Stewart, MathWorks Professor at University of Strathclyde. Please see the abstracts of the talks for more details. It will be followed by the Technical Program sessions 6.6 and 7.6 covering research work on modelling and control of cyber-physical systems and techniques for the analysis and testing of embedded software, respectively. The day will be completed with the Exhibition Theatre session 8.8 giving an industrial tutorial on FPGA/ARM System Development and Verification.

TimeLabelPresentation Title
Authors
08:305.8.1INTRODUCTION
Speaker:
John Zhao, MathWorks Inc., US
08:355.8.2EXHIBITION KEYNOTE: THE TRANSFORMATIVE FUSION OF SENSING, COMPUTING, COMMUNICATION & CONTROL
Speaker:
Jim Tung, MathWorks Inc., US
Abstract

We are ushering in a new era of industrial transformation. Automotive and consumer technologies are merging. Telecom, media, and Internet and search providers are blending. Aerospace systems are enabling consumer services from product delivery to internet access. The catalysts - powerful, low-cost technologies for sensing, computing, communications, and control - are changing the markets of embedded hardware and software, as well as the companies and industries creating the increasingly smart and multi-function systems that leverage those technologies. This presentation will describe these broad transformations, the research opportunities that they create, and the approaches that trailblazing researchers and organizations are relying on to succeed in this new era.

09:205.8.3APPLICATION TALK: MODEL BASED DESIGN FOR 4G AND 5G WIRELESS COMMUNICATIONS SOFTWARE DEFINED RADIO USING MATLAB
Speaker:
Robert Stewart, MathWorks Professor, University of Strathclyde, GB
Abstract

Over recent years Software Defined Radio (SDR) has evolved from an expensive research lab based activity to a platform for the implementation of efficient DSP algorithms in real-time on FPGAs. In this presentation we will look at SDR model based design using MATLAB in combination with supported SDR hardware implementation platforms. As part of a concept-to-implementation design flow, we will show how to develop floating point simulations for the PHY layer of an LTE receiver, implement fixed point simulations, and verify performance using real off-the-air RF data. Using the same MATLAB models, we can generate HDL and C implementations and target a Xilinx Zynq SDR platform.

09:505.8.4Q&A
10:00End of session
Coffee Break in Exhibition Area

IP2 Interactive Presentations

Date: Wednesday 16 March 2016
Time: 10:00 - 10:30
Location / Room: Conference Level, Foyer

Interactive Presentations run simultaneously during a 30-minute slot. A poster associated to the IP paper is on display throughout the afternoon. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session, prior to the actual Interactive Presentation. Moreover, one "Best Interactive Presentation Award" will be given.

LabelPresentation Title
Authors
IP2-1(Best Paper Award Candidate)
ANALYZING THE IMPACT OF INJECTED SENSOR DATA ON AN ADVANCED DRIVER ASSISTANCE SYSTEM USING THE OP2TIMUS PROTOTYPING PLATFORM
Speaker:
Alexander Stühring, University of Oldenburg, DE
Authors:
Alexander Stühring1, Günter Ehmen1 and Sibylle Fröschle2
1University of Oldenburg, DE; 2OFFIS Institute for Information Technology, DE
Abstract
Modern vehicles are running complex and safety critical applications distributed over several Electronic Control Units (ECUs). Some ECUs are equipped with communication interfaces providing access to other devices, networks or remote services. Since the number of attack vectors is increasing, an early investigation of the impact of attacks becomes steadily more important. This paper gives an example of how manipulated sensor data injected into the CAN bus affects an Advanced Driver Assistance System (ADAS). In multiple experiments we illustrate the impact of different aspects such as the sending rate.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-2HARDWARE TROJANS IN INCOMPLETELY SPECIFIED ON-CHIP BUS SYSTEMS
Speaker:
Nicole Fern, UC Santa Barbara, US
Authors:
Nicole Fern, Ismail San, Cetin Kaya Koc and Kwang-Ting (Tim) Cheng, UC Santa Barbara, US
Abstract
The security, functionality, and performance of the on-chip bus system is critical in an SoC design. We highlight the susceptibility of current bus implementations to Hardware Trojans hiding in unspecified functionality. Unlike existing Trojans which aim to disrupt normal bus behavior and are often designed for a specific protocol and topology, we present a general model for creating a covert Trojan communication channel between SoC components. From our channel model, which is applicable to any topology and protocol, one can create circuitry allowing information to flow covertly by altering existing bus signals only when they are unspecified. We give the specifics of this circuitry for AMBA AXI4 and APB, then create a system comprised of several master and slave units connected by an AXI4-Lite interconnect to quantify the overhead of the Trojan channel and illustrate the ability of our Trojans to evade a suite of protocol compliance checking assertions from ARM. We further outline several detection strategies for this class of hardware Trojan.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-3WORKLOAD-AWARE POWER OPTIMIZATION STRATEGY FOR ASYMMETRIC MULTIPROCESSORS
Speaker:
Emanuele Del Sozzo, Politecnico di Milano, IT
Authors:
Emanuele Del Sozzo, Gianluca Durelli, Ettore Trainiti, Antonio Miele, Marco Domenico Santambrogio and Cristiana Bolchini, Politecnico di Milano, IT
Abstract
Asymmetric multi-core architectures, such as the ARM big.LITTLE, are emerging as successful solutions for the embedded and mobile markets due to their ability to trade off performance and power consumption. However, neither the HMP scheduler integrated in commercial products nor previous research approaches are able to fully exploit this potential. We propose a new runtime resource management policy for the big.LITTLE architecture, integrated in Linux, aimed at optimizing power consumption while fulfilling performance requirements specified for the running applications. Experimental results show an improvement of 11% in performance and, at the same time, of 8% in peak power consumption w.r.t. the current Linux HMP solution.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-4(Best Paper Award Candidate)
THE SLOWDOWN OR RACE-TO-IDLE QUESTION: WORKLOAD-AWARE ENERGY OPTIMIZATION OF SMT MULTICORE PLATFORMS UNDER PROCESS VARIATION
Speaker:
Anup Das, University of Southampton, GB
Authors:
Anup Das, Geoff Merrett and Bashir Al-Hashimi, University of Southampton, GB
Abstract
Increasing use of high performance applications on multicore platforms has driven up energy consumption, making it a primary design optimization objective. Two widely used approaches for reducing energy consumption in multithreaded workloads are slowdown (using DVFS) and race-to-idle. In this paper, we first demonstrate that the most energy-efficient choice depends on (1) workload (memory bound, CPU bound etc.), (2) process variation and (3) support for Simultaneous Multithreading (SMT). We then propose an approach for mapping application threads on SMT multicore systems at runtime, to minimize energy consumption. The proposed approach interfaces with the operating system and hardware performance counters and timers to characterize application threads. This characterization captures the effect of process variation on execution time and identifies the break-even operating point, where one strategy (slowdown or race-to-idle) outperforms the other. Thread mapping is performed using these characterized data by iteratively collapsing application threads (SMT) followed by binary programming-based thread mapping. Finally, performance slack is exploited at run-time to select between slowdown and race-to-idle, based upon the break-even operating point calculated for each individual thread. This end-to-end approach is implemented as a run-time manager for the Linux operating system and is validated across a range of high performance applications. Results demonstrate up to 13% energy reduction over all state-of-the-art approaches, with an average of 18% improvement over Linux.
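
The trade-off between slowdown and race-to-idle described in this abstract can be illustrated with a toy energy model. The sketch below is not the authors' method: the cubic dynamic-power term, the constant static and idle powers, and all function names and numbers are assumptions chosen for illustration, and process variation and SMT are not modeled.

```python
def exec_time(cpu_cycles, mem_time, freq):
    """Execution time in seconds: the CPU part scales with frequency,
    the memory-bound part (mem_time) does not."""
    return cpu_cycles / freq + mem_time


def energy(cpu_cycles, mem_time, freq, deadline, p_static, k_dyn, p_idle):
    """Energy in joules over one deadline period: run at `freq`, then idle."""
    t_active = exec_time(cpu_cycles, mem_time, freq)
    if t_active > deadline + 1e-12:
        raise ValueError("deadline missed at this frequency")
    # Assumed power model: constant static power plus cubic dynamic power.
    p_active = p_static + k_dyn * freq ** 3
    t_idle = max(0.0, deadline - t_active)
    return p_active * t_active + p_idle * t_idle


def slowdown_frequency(cpu_cycles, mem_time, deadline, f_min, f_max):
    """Lowest frequency that still meets the deadline, clamped to [f_min, f_max]."""
    if deadline <= mem_time:
        raise ValueError("deadline cannot be met at any frequency")
    f_needed = cpu_cycles / (deadline - mem_time)
    return min(max(f_needed, f_min), f_max)


def choose_strategy(cpu_cycles, mem_time, deadline, f_min, f_max,
                    p_static, k_dyn, p_idle):
    """Return (strategy name, race-to-idle energy, slowdown energy)."""
    f_slow = slowdown_frequency(cpu_cycles, mem_time, deadline, f_min, f_max)
    e_race = energy(cpu_cycles, mem_time, f_max, deadline, p_static, k_dyn, p_idle)
    e_slow = energy(cpu_cycles, mem_time, f_slow, deadline, p_static, k_dyn, p_idle)
    name = "slowdown" if e_slow < e_race else "race-to-idle"
    return name, e_race, e_slow


if __name__ == "__main__":
    # Units (all hypothetical): cycles in 1e9, frequency in GHz, time in s, power in W.
    for p_static in (0.5, 3.0):
        print(p_static, choose_strategy(1.0, 0.0, 1.0, 0.5, 2.0, p_static, 0.25, 0.05))
```

In this toy model, low static power favours slowdown and high static power favours race-to-idle, which is the kind of break-even behaviour the abstract refers to.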

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-5TOWARDS GENERAL PURPOSE COMPUTATIONS ON LOW-END MOBILE GPUS
Speaker:
Leonidas Kosmidis, Barcelona Supercomputing Center and Universitat Politècnica de Catalunya, ES
Authors:
Matina Maria Trompouki1 and Leonidas Kosmidis2
1Universitat Politècnica de Catalunya, ES; 2Barcelona Supercomputing Center and Universitat Politècnica de Catalunya, ES
Abstract
GPUs traditionally offer high computational capabilities, frequently higher than their CPU counterparts. While high-end mobile GPU vendors have recently introduced general purpose APIs, such as OpenCL, to leverage their computational power, the vast majority of mobile devices lack such support. Although their graphics APIs have similarities with desktop graphics APIs, they also have significant differences, which prevent the use of well-known techniques that offer general-purpose computations over such interfaces. In this paper we show how these obstacles can be overcome in order to achieve general purpose programmability of these devices. As a proof of concept we implemented our proposal on a real embedded platform (Raspberry Pi) based on Broadcom's VideoCore IV GPU, obtaining a speedup of 7.2X over the CPU.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-6ESTIMATING DELAY DIFFERENCES OF ARBITER PUFS USING SILICON DATA
Speaker:
Keshab Parhi, University of Minnesota, US
Authors:
Satya Venkata Sandeep Avvaru, Chen Zhou, Saroj Satapathy, Yingjie Lao, Chris Kim and Keshab Parhi, University of Minnesota, US
Abstract
This paper presents a novel approach to estimate delay differences of each stage in a standard MUX-based physical unclonable function (PUF). Test data collected from PUFs fabricated using a 32 nm process are used to train a linear model. The delay differences of the stages directly correspond to the model parameters. These parameters are trained by using a least mean square (LMS) adaptive algorithm. The accuracy of the response using the proposed model is around 97.5% and 99.5% for two different PUFs. Second, the PUF is also modeled by a perceptron. The perceptron has almost 100% classification accuracy. A comparison shows that the perceptron model parameters are scaled versions of the model derived by the LMS algorithm. Thus, the delay differences can be estimated from the perceptron model where the scaling factor is computed by comparing the models of the LMS algorithm and the perceptron. Because the delay differences are challenge independent, these parameters can be stored on the server. This will enable the server to issue random challenges whose responses need not be stored. An analysis of the proposed model confirms that the delay differences of all stages of the PUFs on the same chip belong to the same Gaussian probability density function.
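
As a rough illustration of training a linear arbiter-PUF model with LMS, the sketch below uses the commonly used additive delay model with parity features. This is an assumption, not taken from the abstract: the paper works with silicon measurements, whereas the data here is simulated from random Gaussian weights, and the step size, epoch count and all names are hypothetical.

```python
import math
import random


def features(challenge):
    """Parity feature vector of the additive delay model:
    phi[i] = product over j >= i of (1 - 2*c[j]), plus a constant bias term."""
    n = len(challenge)
    phi = [1.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        phi[i] = phi[i + 1] * (1 - 2 * challenge[i])
    return phi


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def response(weights, challenge):
    """Predicted response in {-1, +1}: sign of the modeled delay difference."""
    return 1 if dot(weights, features(challenge)) >= 0 else -1


def train_lms(crps, n_stages, mu=0.002, epochs=10):
    """LMS update w <- w + mu * (target - w.phi) * phi over challenge-response pairs."""
    w = [0.0] * (n_stages + 1)
    for _ in range(epochs):
        for challenge, target in crps:
            phi = features(challenge)
            err = target - dot(w, phi)
            w = [wi + mu * err * pi for wi, pi in zip(w, phi)]
    return w


def demo(n_stages=32, n_train=2000, n_test=1000, seed=1):
    """Train on simulated CRPs; return (test accuracy, cosine similarity to true weights)."""
    rng = random.Random(seed)
    true_w = [rng.gauss(0.0, 1.0) for _ in range(n_stages + 1)]  # simulated delay differences

    def random_crp():
        c = [rng.randint(0, 1) for _ in range(n_stages)]
        return c, response(true_w, c)

    train = [random_crp() for _ in range(n_train)]
    test = [random_crp() for _ in range(n_test)]
    w = train_lms(train, n_stages)
    accuracy = sum(response(w, c) == r for c, r in test) / n_test
    cosine = dot(w, true_w) / math.sqrt(dot(w, w) * dot(true_w, true_w))
    return accuracy, cosine


if __name__ == "__main__":
    print(demo())
```

The cosine similarity reported by `demo` reflects the point made in the abstract that learned parameters match the underlying delay differences only up to a scaling factor.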

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-7ON THE USE OF FORWARD BODY BIASING TO DECREASE THE REPEATABILITY OF LASER-INDUCED FAULTS
Speaker:
Marc Lacruche, Ecole Nationale Supérieure des Mines de Saint Etienne (ENSM-SE), FR
Authors:
Marc Lacruche1, Noemie Beringuier-Boher1, Jean-Max Dutertre1, Jean-Baptiste Rigaud1 and Edith Kussener2
1Ecole Nationale Supérieure des Mines de Saint Etienne (ENSM-SE), FR; 2IM2NP, FR
Abstract
This paper presents a study on the effect of Forward Body Biasing on the laser fault sensitivity of a CMOS 90nm microcontroller. Tests were performed on a register of this target, under several supply voltage and body bias settings, showing significant laser sensitivity variations. Based on these results, a method which aims at decreasing fault repeatability by using variable supply voltage and body bias settings is proposed. Finally, tests are performed on an implementation of this method on a temporally redundant AES and the results are presented.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-8SEQUENTIAL ANALYSIS DRIVEN RESET OPTIMIZATION TO IMPROVE POWER, AREA AND ROUTABILITY
Speaker:
Srihari Yechangunja, Mentor Graphics Corporation, IN
Authors:
Srihari Yechangunja1, Raj Shekhar1, Mohit Kumar1, Nikhil Tripathi1, Abhishek Ranjan1, Abhishek Mittal1, Jianfeng Liu2, Minyoung Mo2, Kyungtae Do2, Jung Yun Choi2 and SungHo Park2
1Mentor Graphics Corporation, IN; 2S.LSI, Samsung Electronics Co. Ltd, KR
Abstract
Resets are required in a design to initialize the hardware for system operation and to force it into a known state for simulation or to recover from an error. Given increasing design complexity and time-to-market pressures, figuring out which registers do not require resets is extremely challenging. In this paper, we present a novel algorithm which uses observability based sequential analysis to identify the registers in the design which do not require resets. With the proposed algorithm, we have seen that in some cases 70% of the registers in the design can have redundant resets. Further, with removal of the redundant resets on registers, up to 22% sequential power savings and up to 3% area reduction post-layout can be obtained.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-9EFFICIENT GLOBAL OPTIMIZATION OF MEMS BASED ON SURROGATE MODEL ASSISTED EVOLUTIONARY ALGORITHM
Speaker:
Bo Liu, Glyndwr University, GB
Authors:
Bo Liu1 and Anna Nikolaeva2
1Glyndwr University, GB; 2Bauman Moscow State Technical University, RU
Abstract
Optimization plays a key role in MEMS design. However, most MEMS design optimization (exploration) methods depend either on ad-hoc analytical / behavioural models or on time consuming numerical simulations. Surrogate modeling techniques have been introduced to integrate generality and efficiency, but the number of design variables which can be handled by most existing efficient MEMS design optimization methods is often less than 5. To address the above challenges, a new method, called Adaptive Gaussian Process-Assisted Differential Evolution for MEMS Design Optimization (AGDEMO), is proposed. The key idea is the proposed ON-LINE adaptive surrogate model assisted optimization framework. In particular, AGDEMO performs global optimization of MEMS using numerical simulation and the differential evolution (DE) algorithm, and a Gaussian process surrogate model is constructed ON-LINE to predict the results of expensive numerical simulations. AGDEMO is tested on two actuators (both with 9 design variables). Comparisons with state-of-the-art methods verify the advantages of AGDEMO in terms of efficiency, optimization capacity and scalability.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-10EFFICIENT MONITORING OF LOOSE-ORDERING PROPERTIES FOR SYSTEMC TLM
Speaker:
Yuliia Romenska, Univ. Grenoble Alpes, VERIMAG, FR
Authors:
Yuliia Romenska1 and Florence Maraninchi2
1Univ. Grenoble Alpes, VERIMAG, FR; 2Grenoble INP & Verimag, FR
Abstract
SystemC Transaction-level modeling (TLM) provides high-level component-based models for SoCs, for which Assertion-Based-Verification (ABV) allows property checking early in the design cycle. We introduce the notion of loose-ordering to specify when components interact with each other and we propose a set of patterns to capture this notion in assertions. This new notion can already be expressed in languages like PSL, for which there exist tools to generate ABV monitors. But the definition of dedicated patterns makes it easier to write the properties. Moreover we define a direct translation of these patterns into SystemC monitors, and we show that it avoids the combinatorial explosion that would occur during a prior translation into PSL.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-11TESTABLE DESIGN OF REPEATERLESS LOW SWING ON-CHIP INTERCONNECT
Speaker:
Naveen Kadayinti, Indian Institute of Technology Bombay, IN
Authors:
Naveen Kadayinti and Dinesh Sharma, Indian Institute of Technology Bombay, IN
Abstract
Repeaterless low swing interconnects use mixed signal circuits to achieve high performance at low power. When these interconnects are used in large scale and high volume digital systems their testability becomes very important. This paper discusses the testability of low swing repeaterless on-chip interconnects with equalization and clock synchronization. A capacitively coupled transmitter with a weak driver is used as the transmitter. The receiver samples the low swing input data at the center of the data eye and converts it to rail to rail levels and also synchronizes the data to the receiver's clock domain. The system is a mixed signal circuit and the digital components are all scan testable. For the analog section, just a DC test has a fault coverage of 50% of the structural faults. Simple techniques allow integration of the analog components into the digital scan chain increasing the coverage to 74%. Finally, a BIST with low overhead enhances the coverage to 95% of the structural faults. The design and simulations have been done in UMC 130 nm CMOS technology.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-12ALL-DIGITAL HYBRID-CONTROL BUCK CONVERTER FOR INTEGRATED VOLTAGE REGULATOR APPLICATIONS
Speaker:
Visvesh Sathe, University of Washington, US
Authors:
Ta-tung Yen, Bin Yu and Visvesh Sathe, University of Washington, US
Abstract
With efficiency and performance gains from successive CMOS technology generations continuing to taper off, power dissipation remains a roadblock to maintaining growth in computational performance. Power management systems are expected to continue to rely heavily on Dynamic Voltage and Frequency Scaling (DVFS), and Integrated Voltage Regulation (IVR) in particular, to drive improvements in energy efficiency through finer supply-voltage control. As voltage domains continue to shrink, and multiple IVRs are employed within a System-on-Chip (SoC), all-digital buck converters will become increasingly important from a scalability, portability, and methodology-compatibility perspective. In addition to some of the existing challenges facing Voltage Regulator Modules (VRMs), IVR implementations are faced with additional efficiency and transient-response challenges due to the limited available filter capacitance. In this paper, we propose an all-digital hybrid-control buck converter which addresses these key challenges effectively by regulating supply voltage based on slack information from a critical path monitor, a novel and accurate technique for digital derivative measurement for effective PID control, and the use of digital non-linear control for fast transient response. Simulations in an industrial 65nm process technology demonstrate stable, energy-efficient operation with fast load regulation. Operating with a single phase, using package mounted inductor and filter capacitor models, the converter achieves a 25mV droop for a 5A load current ramp at 500mA/ns. With a high-side supply voltage of 2V, the converter achieves a peak efficiency of 86% at 2A.
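
For readers unfamiliar with digital PID regulation, a minimal discrete-time PID loop might look like the sketch below. It is a generic textbook controller, not the converter proposed in the paper: the backward-difference derivative stands in for the paper's derivative measurement technique, the non-linear control mode and the critical path monitor are omitted, and the first-order plant, gains and names are assumptions chosen only so that the loop settles.

```python
class DigitalPID:
    """Textbook discrete PID with output clamping (illustrative only)."""

    def __init__(self, kp, ki, kd, dt, u_min, u_max):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.dt = dt
        self.u_min, self.u_max = u_min, u_max
        self.integral = 0.0
        self.prev_err = None

    def step(self, setpoint, measured):
        err = setpoint - measured
        self.integral += err * self.dt
        # Backward-difference derivative; zero on the first sample to avoid a kick.
        deriv = 0.0 if self.prev_err is None else (err - self.prev_err) / self.dt
        self.prev_err = err
        u = self.kp * err + self.ki * self.integral + self.kd * deriv
        return min(max(u, self.u_min), self.u_max)


def simulate(steps=500, dt=0.01, tau=0.1, setpoint=1.0):
    """Regulate a hypothetical first-order plant v' = (u - v) / tau to the setpoint."""
    pid = DigitalPID(kp=2.0, ki=20.0, kd=0.01, dt=dt, u_min=0.0, u_max=2.0)
    v = 0.0
    trace = []
    for _ in range(steps):
        u = pid.step(setpoint, v)
        v += (dt / tau) * (u - v)  # forward-Euler plant update
        trace.append(v)
    return trace


if __name__ == "__main__":
    trace = simulate()
    print("final output:", trace[-1])
```

A real buck converter has second-order LC dynamics and load transients, so the gains here would not carry over; the sketch only shows the structure of the proportional, integral and derivative terms.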

Download Paper (PDF; Only available from the DATE venue WiFi)

UB05 Session 5

Date: Wednesday 16 March 2016
Time: 10:00 - 12:00
Location / Room: Booth 15, Exhibition Area

LabelPresentation Title
Authors
UB05.1VISUALNOC: VISUALIZATION NETWORK-ON-CHIP DESIGN FRAMEWORK
Presenter:
Junshi Wang, University of Electronic Science and Technology of China, CN
Authors:
Junshi Wang1, Letian Huang1, Guangjun Li1 and Axel Jantsch2
1University of Electronic Science and Technology of China, CN; 2Technology University of Vienna, AT
Abstract
Simulations are the most common approach to evaluating Network-on-Chip (NoC) designs, and many simulators at different abstraction levels have been developed. However, developers have to spend a considerable amount of time and energy to extract meaningful information from the simulator reports. Visualization of simulation is a sensible approach in the study of NoC design. We introduce a Visualization Network-on-Chip Design Framework (VisualNoC) that can connect with any NoC simulator. It tracks the event trace files of the simulator, recording the behavior of routers and packets in the network based on an event-based model. VisualNoC operates with cycle accuracy in analyzing the status of the network and can complement traditional tools to facilitate efficient debugging and analysis, helping to reduce the number of design iterations.

Download Paper (PDF)
UB05.2D-VASIM: TIMING ANALYSIS OF GENETIC LOGIC CIRCUITS USING D-VASIM
Presenter:
Hasan Baig, Technical University of Denmark, DK
Authors:
Hasan Baig and Jan Madsen, Technical University of Denmark, DK
Abstract
A genetic logic circuit is a gene regulatory network implemented by re-engineering the DNA of a cell, in order to control gene expression or metabolic pathways through a logic combination of external signals, such as chemicals or proteins. As for electronic logic circuits, timing and propagation delay analysis may also play a very significant role in the design of genetic logic circuits. In this demonstration, we present the capability of D-VASim (Dynamic Virtual Analyzer and Simulator) to perform the timing and propagation delay analysis of single as well as cascaded genetic logic circuits. D-VASim allows the user to change the circuit parameters during runtime simulation to observe their effects on the circuit's timing behavior. The results obtained from D-VASim can be used not only to characterize the timing behavior of genetic logic circuits but also to analyze the timing constraints of cascaded genetic logic circuits.

Download Paper (PDF)
UB05.3COSSIM: A NOVEL, COMPREHENSIBLE, ULTRA-FAST, SECURITY-AWARE CPS SIMULATOR
Presenter:
Antonios Nikitakis, Technical University of Crete, GR
Authors:
Antonios Nikitakis and Andreas Brokalakis, Technical University of Crete, GR
Abstract
Nowadays, Cyber Physical Systems (CPS) are growing in capability at an extraordinary rate, promoted by the increased presence and capabilities of electronic control units as well as of the sensors, actuators and interconnecting networks. One of the main problems CPS designers face is the lack of simulation tools and models for system design and analysis. This is mainly because the majority of the existing simulation tools for complex CPS efficiently handle only parts of a system (only the processing or the network), while none of them supports the notion of security. The presented system is a "Novel, Comprehensible, Ultra-Fast, Security-Aware CPS Simulator" (COSSIM). COSSIM is the first known simulation framework that allows for the simulation of a complete CPS utilizing complex SoCs interconnected with sophisticated networks. Finally, the COSSIM system supports accurate power estimation and is the first such tool supporting security as a feature of the design process.

Download Paper (PDF)
UB05.4AGAMID: A TLM FRAMEWORK FOR EVALUATION OF HARDWARE-ENHANCED MANY-CORE RUN-TIME MANAGEMENT
Presenter:
Daniel Gregorek, University of Bremen, DE
Authors:
Daniel Gregorek and Alberto Garcia-Ortiz, University of Bremen, DE
Abstract
The advent of many-core processors raises novel demands on system design. Power limitations and abundant parallelism require efficient and scalable run-time management. But the design of a many-core run-time manager generally suffers from exhaustive evaluation time. AGAMID is a novel research framework for design space exploration of hardware-enhanced many-core run-time management. In this demo, we use AGAMID for the interactive analysis of many-core architectures and run-time management systems. We perform a hands-on comparison of RTM architectures, RTM algorithms and HW/SW partitionings. We also give insights into the design and architecture of the framework itself.

Download Paper (PDF)
UB05.5A-LOOP: AMP SYSTEM WITH A DUAL-CORE ARM CORTEX A9 PROCESSOR WITH LINUX OPERATING SYSTEM AND A QUAD-CORE LEON3 PROCESSOR WITH LINUX OPERATING SYSTEM, OPENMP LIBRARY AND HARDWARE PROFILING SYSTEM
Presenter:
Giacomo Valente, Università Degli Studi Dell'Aquila, IT
Authors:
Giacomo Valente and Vittoriano Muttillo, Università Degli Studi Dell'Aquila, IT
Abstract
Isles of computational elements with different characteristics can be exploited for separate tasks with different non-functional requirements. This can lead to the realization of smart Systems on Module (SoM). In such a context, SoCs with FPGAs can be viewed as platforms useful for prototyping these architectures. This demo shows a SoM prototype for aerospace applications developed on a Zynq7000 SoC, composed of a dual-core ARM Cortex A9 with Linux operating system (isle#1), able to interface with external data, and a quad-core Leon3 with SMP Linux operating system (isle#2), able to execute parallel applications based on the OpenMP library. These two computational isles share an external DDR memory, so that isle#1 can provide data to and collect results from isle#2. Moreover, isle#1 is able to monitor the performance of isle#2 without introducing software overhead (i.e. no SW instrumentation) by using a hardware profiling system. The whole system, executing a MANET localization algorithm, will be presented.

Download Paper (PDF)
UB05.6RC3E: DESIGN AND TEST AUTOMATIZATION IN THE CLOUD
Presenter:
Patrick Lehmann, Technische Universität Dresden, DE
Authors:
Patrick Lehmann, Oliver Knodel, Martin Zabel and Rainer G. Spallek, Technische Universität Dresden, DE
Abstract
Cloud computing is becoming increasingly interesting for companies because of its flexibility in providing apparently endless resources and new services while reducing the total cost of ownership for the user. Fields of application range from web technologies and storage solutions to complex business processes. The domain of chip and system design is well known for offloading resource-intensive and long-running synthesis or simulation tasks onto centralized servers. As hardware designs grow exponentially and verification requirements become stricter, cloud services are being investigated to meet these needs. In the end, however, real hardware tests cannot be avoided. Our RC3E ecosystem brings close-to-hardware prototype development and automated hardware testing into the cloud, continuing the principle of "test often and test early". The architecture offers virtualized and shared FPGA resources for prototyping, with automated remote debugging capabilities.

Download Paper (PDF)
UB05.8DIGITALLY DRIVEN TOP-DOWN METHODOLOGY FOR MIXED SIGNAL CIRCUIT DESIGN
Presenter:
Markus Mueller, University of Heidelberg, DE
Authors:
Markus Mueller, Maximilian Thuermer and Ulrich Bruening, University of Heidelberg, DE
Abstract
In this methodology, synthesizable modules and full custom blocks are first described in an HDL in a top-down approach. For analog cells, real number based models are created. Once the complete mixed signal model is done, each cell in the design is completely described with respect to interface and behavior. The models then serve as the specification for the full custom cell development. Schematics which don't include any primitives are automatically generated from the HDL description by a scripted flow to ensure consistency. Design space exploration can be done quickly and very efficiently this way. Cells which can be reused at different places in the design are identified, and problems arising from interactions on the system level are found early in the design phase. This methodology accelerates the design process significantly, avoids errors and provides higher flexibility for design changes. A digital centric design example of a High Speed SerDes IP is demonstrated using the described methodology.

Download Paper (PDF)
UB05.10DAC GENERATOR: A DAC STAGE ANALOG CIRCUIT GENERATOR FOR UDSM AND FD-SOI TECHNOLOGIES
Presenter:
Benjamin Prautsch, Fraunhofer Institute for Integrated Circuits IIS, Design Automation Division EAS, DE
Authors:
Benjamin Prautsch, Sunil Rao, Uwe Eichler, Ajith Puppala and Torsten Reich, Fraunhofer Institute for Integrated Circuits IIS, Design Automation Division EAS, DE
Abstract
The design of analog integrated circuits requires extensive manual work which is error-prone and inefficient. With advanced ultra-deep sub-micron (UDSM) technologies, the manual design effort increases dramatically. This work presents the application of a rethought generator approach for the efficient reusable design of a 12 bit current steering DAC. The current mirror stage of the DAC, which is arranged in the complex Q² random walk scheme for high intrinsic matching [1], is realized by a circuit generator which automatically creates the schematic, symbol, and layout of the required cells within a few minutes. Originally focused on a 28 nm bulk technology, the generator code was also executed in a 28 nm FD-SOI technology with minor migration effort due to the generic nature of our tool. In addition, the fast circuit generation enables efficient layout optimization, showcasing the benefit of analog circuit generators for "bottom-up" design [2] in advanced technology nodes.

Download Paper (PDF)
12:00End of session
12:30Lunch Break in Großer Saal + Saal 1
Keynote Lecture in "Saal 2" 14:00 - 14:30

6.1 SPECIAL DAY Hot Topic: Formal Methods for Automotive Software

Date: Wednesday 16 March 2016
Time: 11:00 - 12:30
Location / Room: Saal 2

Chair:
Marc Geilen, Eindhoven University of Technology, NL

Co-Chair:
Wolfgang Ecker, Infineon Technologies, DE

The growing complexity of automotive software has led to an increasing focus on the use of formal methods for automotive software development and validation. This session will feature three invited talks giving different perspectives on the use of formal methods for automotive software development. These will range from techniques for the verification of control software code to timing analysis of automotive software.

TimeLabelPresentation Title
Authors
11:006.1.1REQUIREMENTS ENGINEERING FOR SOFTWARE-INTENSIVE AUTOMOTIVE EMBEDDED SYSTEMS
Speaker and Author:
Manfred Broy, Technische Universität München (TUM), DE
11:306.1.2FORMAL SPECIFICATION AND VERIFICATION OF AUTOMOTIVE SOFTWARE IN PRACTICE
Speaker and Author:
Ravindra Metta, TCS Innovation Labs, IN
12:006.1.3TIMING ANALYSIS OF AUTOMOTIVE ARCHITECTURES AND SOFTWARE
Speaker and Author:
Nicolas Navet, University of Luxembourg and RealTime-at-Work, LU
12:30End of session
Lunch Break in Großer Saal + Saal 1
Keynote Lecture in "Saal 2" 14:00 - 14:30

6.2 Panel: Looking Backwards and Forwards

Date: Wednesday 16 March 2016
Time: 11:00 - 12:30
Location / Room: Konferenz 6

Organiser:
Marco Casale-Rossi, Synopsys, US

Chair:
Marco Casale-Rossi, Synopsys, US

Co-Chair:
Giovanni De Micheli, École Polytechnique Fédérale de Lausanne (EPFL), CH

Ten years ago, at 90 nanometers EDA was challenged, and deemed inadequate in dealing with increasing complexity, power consumption, and sub-wavelength lithography, thus harming the progress of mobile phones. Today, at 10 nanometers integration capacity has increased by two orders of magnitude, power consumption has been successfully "defeated", and 193 nanometer immersion lithography is still relied upon… *also* thanks to EDA; tools, methodologies, and flows that were originally devised for design enablement at the emerging technology nodes, have been successfully re-deployed at the established technology nodes, where they represent a critical design differentiation factor. However, the battleground is changing again: after the billions of phones, trillions of "things" lie ahead; moving forward, emerging and established technology nodes, digital and analog, hardware and software will be equally critical. What is EDA doing and, more important, what should EDA do - and is not doing - in order for the next decade to be as great as the past one? This panel session, moderated by EPFL Professor Giovanni De Micheli, gathers academia, semiconductor, and EDA industry to discuss the challenges and the requirements of the new era.


Download Paper (PDF; Only available from the DATE venue WiFi)

Moderator:

  • Giovanni De Micheli, École Polytechnique Fédérale de Lausanne (EPFL), CH

Panelists:

  • Antun Domic, Synopsys, US
  • Enrico Macii, Politecnico di Torino, IT
  • Domenico Rossi, STMicroelectronics, IT
  • Joseph Sawicki, Mentor, US
12:30  End of session
Lunch Break in Großer Saal + Saal 1
Keynote Lecture in "Saal 2" 14:00 - 14:30

6.3 Anti-aging and Error Protection using Checkpointing and DVFS

Date: Wednesday 16 March 2016
Time: 11:00 - 12:30
Location / Room: Konferenz 1

Chair:
Antonio Rosario Miele, Polimi, IT

Co-Chair:
Jose L. Ayala, Complutense University of Madrid, ES

As reliability becomes a major concern for both designers and technologists, techniques such as error protection are needed to keep the best known state and preserve it for subsequent operations. In this session various methods of checkpointing at register level and at memory level are presented that relieve systems from aging. Various combinations of DVFS and checkpointing techniques are presented in this session, including techniques that exploit application-level tolerability to errors.

Time  Label  Presentation Title
Authors
11:00  6.3.1  AGING-AWARE VOLTAGE SCALING
Speaker:
Victor M. van Santen, Karlsruhe Institute of Technology (KIT), DE
Authors:
Victor M. van Santen1, Hussam Amrouch1, Narendra Parihar2, Souvik Mahapatra2 and Jörg Henkel1
1Karlsruhe Institute of Technology (KIT), DE; 2Indian Institute of Technology Bombay, IN
Abstract
As feature sizes of transistors have begun to approach atomic levels, aging effects have become one of the major concerns when it comes to reliability. Recently, aging effects have become subject to voltage scaling as the latter entered the sub-micron regime. Hence, aging shifted from a solely long-term challenge (as treated by the state of the art) to a short- and long-term reliability challenge. This paper interrelates both aging and voltage scaling to explore and quantify for the first time the short-term effects of aging. We propose "aging-awareness" with respect to voltage scaling, which is indispensable to sustain runtime reliability. Otherwise, transient errors, caused by the short-term effects of aging, may occur. Compared to the state of the art, our aging-aware voltage scaling optimizes for both short-term and long-term aging effects at marginal guardband overhead.

11:30  6.3.2  RECORD: REDUCING REGISTER TRAFFIC FOR CHECKPOINTING IN RELIABLE EMBEDDED PROCESSORS
Speaker:
Sri Parameswaran, University of New South Wales, AU
Authors:
Tuo Li1, Jude Angelo Ambrose2 and Sri Parameswaran1
1University of New South Wales, AU; 2Canon Information Systems Research Australia, AU
Abstract
Checkpoint/recovery, as a classic method, has been widely used for overcoming transient faults in computing systems. The basic function of checkpoint/recovery is to save the system states periodically and to restore the system states by using the saved states if a fault occurs. With the hardware-implemented checkpointing mechanism executing at runtime, a processor will have substantially increased register-file reads. For embedded processors, which typically have restricted design constraints on area, power, and performance, such increases might compromise the quality of the application greatly. In this paper, we present a checkpointing method, ReCoRD, aimed at reducing the resultant register traffic at runtime, by leveraging register data dependencies. The proposed checkpointing method can reduce redundant executions of register-file checkpointing. The experiments show that ReCoRD achieves improved register traffic reduction (20%) along with reduced dynamic power consumption (approximately 20%) in comparison to the state of the art with minimal area overhead. The leakage power increases marginally (about 2%), but is more than compensated by the decrease in dynamic power.

12:00  6.3.3  ERROR RESILIENCE AND ENERGY EFFICIENCY: AN LDPC DECODER DESIGN STUDY
Speaker:
Philipp Schläfer, University of Kaiserslautern, DE
Authors:
Philipp Schläfer1, Chu-Hsiang Huang2, Clayton Schoeny2, Christian Weis1, Yao Li3, Norbert Wehn1 and Lara Dolecek2
1University of Kaiserslautern, DE; 2University of California, Los Angeles, US; 3Akamai Inc., US
Abstract
Iterative decoding algorithms for low-density parity check (LDPC) codes have an inherent fault tolerance. In this paper, we exploit this robustness and optimize an LDPC decoder for high energy efficiency: we reduce energy consumption by opportunistically increasing error rates in decoder memories, while still achieving successful decoding in the final iteration. We develop a theory-guided unequal error protection (UEP) technique. UEP is implemented using dynamic voltage scaling that controls the error probability in the decoder memories on a per iteration basis. Specifically, via a density evolution analysis of an LDPC decoder, we first formulate the optimization problem of choosing an appropriate error rate for the decoder memories to achieve successful decoding under minimal energy consumption. We then propose a low complexity greedy algorithm to solve this optimization problem and map the resulting error rates to the corresponding supply voltage levels of the decoder memories in each iteration of the decoding algorithm. We demonstrate the effectiveness of our approach via ASIC synthesis results of a decoder for the LDPC code in the IEEE 802.11ad standard, implemented in 28 nm FD-SOI technology. The proposed scheme achieves an increase in energy efficiency of up to 40% compared to the state-of-the-art solution.

12:15  6.3.4  RUNTIME INTERVAL OPTIMIZATION AND DEPENDABLE PERFORMANCE FOR APPLICATION-LEVEL CHECKPOINTING
Speaker:
Dimitrios Rodopoulos, ICCS/NTUA, GR
Authors:
Apostolos Kokolis1, Alexandros Mavrogiannis1, Dimitrios Rodopoulos2, Christos Strydis3 and Dimitrios Soudris1
1NTUA, GR; 2ICCS/NTUA, GR; 3Erasmus MC, NL
Abstract
As aggressive integration paves the way for performance enhancement of many-core chips and technology nodes go below deca-nanometer dimensions, system-wide failure rates are becoming noticeable. Inevitably, system designers need to properly account for such failures. Checkpoint/Restart (C/R) can be deployed to prolong dependable operation of such systems. However, it introduces additional overheads that lead to performance variability. We present a versatile dependability manager (DepMan) that orchestrates a many-core application-level C/R scheme, while being able to follow time-varying error rates. DepMan also contains a dedicated module that ensures on-the-fly performance dependability for the executing application. We evaluate the performance of our scheme using an error injection module both on the experimental Intel Single-Chip Cloud Computer (SCC) and on a commercial Intel i7 general purpose computer. Runtime checkpoint interval optimization adapts to a variety of failure rates without extra performance or energy costs. The inevitable timing overhead of C/R is reclaimed systematically with Dynamic Voltage and Frequency Scaling (DVFS), so that dependable application performance is ensured.
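
As a rough illustration of what adapting the checkpoint interval to a time-varying failure rate can look like, the sketch below uses Young's classic first-order approximation (interval = sqrt(2 x checkpoint cost x mean time between failures)). This is a textbook rule swapped in for illustration; the abstract does not state which optimization DepMan uses, and the function names, the observation window and the fallback value are hypothetical.

```python
import math


def young_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order approximation of the checkpoint interval (seconds)."""
    if checkpoint_cost_s <= 0 or mtbf_s <= 0:
        raise ValueError("checkpoint cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def retune_interval(checkpoint_cost_s: float,
                    observed_failures: int,
                    window_s: float,
                    fallback_mtbf_s: float = 3600.0) -> float:
    """Re-estimate the MTBF from a recent observation window and retune the interval.

    If no failure was seen in the window, a hypothetical fallback MTBF is used.
    """
    if observed_failures > 0:
        mtbf_s = window_s / observed_failures
    else:
        mtbf_s = fallback_mtbf_s
    return young_interval(checkpoint_cost_s, mtbf_s)
```

With a 2 s checkpoint cost, four failures in a 400 s window give an estimated MTBF of 100 s and an interval of 20 s; a higher observed failure rate shortens the interval.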

12:30  IP3-1, 631  A FLEXIBLE INEXACT TMR TECHNIQUE FOR SRAM-BASED FPGAS
Speaker:
Akash Kumar, Technische Universität Dresden, DE
Authors:
Shyamsundar Venkataraman1, Rui Santos1 and Akash Kumar2
1National University of Singapore, SG; 2Technische Universität Dresden, DE
Abstract
Single Event Upsets (SEUs) inadvertently change the logic memory and thereby the configuration of the Field Programmable Gate Arrays (FPGAs), leading to their incorrect functioning. Traditional methods to tolerate such faults include Triple Modular Redundancy (TMR). However, such a method has a high overhead in terms of power and area. Moreover, the inexact methods used in ASICs to overcome this problem are not efficient when applied in FPGAs. Therefore, this paper proposes a novel heuristic-based technique to tolerate faults in SRAM-based FPGAs by using inexact modules in conjunction with TMR, thus reducing the area and power overhead of the design. Experiments run on various MCNC benchmark circuits show the accuracy of the proposed technique. They also show that the design solutions found through this technique differ by only 0.52% on average from the optimal ones and that savings of up to 84.4% in computation time can be reached on average.
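
For readers unfamiliar with TMR, the sketch below shows only the classic scheme: three replicas of a module feed a bitwise majority voter, so an upset in any single replica is masked. It does not reproduce the inexact-module heuristic of the paper; `run_tmr` and `upset_mask` are hypothetical names used to inject a fault into one replica.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority of three replica outputs."""
    return (a & b) | (a & c) | (b & c)


def run_tmr(module, inputs: int, upset_mask: int = 0) -> int:
    """Evaluate three replicas of `module`, flipping the bits in `upset_mask`
    in the second replica's output to emulate a single upset, then vote."""
    golden = module(inputs)
    replicas = [golden, golden ^ upset_mask, golden]
    return majority_vote(*replicas)
```

Because only one replica is corrupted, the voted output equals the fault-free output regardless of which bits `upset_mask` flips.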

12:30  End of session
Lunch Break in Großer Saal + Saal 1
Keynote Lecture in "Saal 2" 14:00 - 14:30

6.4 Power Modeling and Power Aware Synthesis

Date: Wednesday 16 March 2016
Time: 11:00 - 12:30
Location / Room: Konferenz 2

Chair:
Alberto Garcia Ortiz, University of Bremen, DE

Co-Chair:
Qi Zhu, UCR, US

Papers in this session address methods for power efficient design of digital systems. The first paper presents an FPGA emulation for design trade-offs. The second paper proposes a methodology to automatically generate power state machine models for SoCs. The third paper presents an automatic method to place isolation gates trading off precision and power dissipation. The IP paper investigates circuit verification of power grids.

Time  Label  Presentation Title
Authors
11:00  6.4.1  A SYSTEMATIC APPROACH TO AUTOMATED CONSTRUCTION OF POWER EMULATION MODELS
Speaker:
Benjamin Andreassen Bjørnseth, Norwegian University of Science and Technology, NO
Authors:
Benjamin Andreassen Bjørnseth, Asbjørn Djupdal and Lasse Natvig, Norwegian University of Science and Technology, NO
Abstract
Efficient estimation of power consumption is vital when designing large digital systems. The technique called power emulation can speed up estimation by implementing power models alongside a design on an FPGA. Current state-of-the-art power emulation methods construct models using various custom techniques, but there is no study on how the existing methods relate to each other or how their differences impact the final quality of the model. We propose a methodology which describes the breadth of current approaches to automated construction of power emulation models. We also evaluate the current methods, finding that there is significant variation in accuracy and complexity. In 32.8% of all tests, the average accuracy of the least complex method is better than that of the most advanced method at less than 0.3% of the hardware overhead. This result fuels the hope that further innovation may yield models with high accuracy at low implementation cost. Our software frameworks and experimental data are made available to promote continued work in the field.

11:30  6.4.2  AUTOMATIC GENERATION OF POWER STATE MACHINES THROUGH DYNAMIC MINING OF TEMPORAL ASSERTIONS
Speaker:
Graziano Pravadelli, University of Verona, IT
Authors:
Alessandro Danese, Ivan Zandonà and Graziano Pravadelli, University of Verona, IT
Abstract
Several papers propose approaches based on power state machines (PSMs) for modelling and simulating the power consumption of systems-on-chip (SoCs). However, while they focus on the use of PSMs as the underlying formalism for implementing dynamic power management techniques, they generally do not deal with the basic problem of generating PSMs. In most of these papers, PSMs just exist; in some cases they are manually defined, and only a few approaches give a hint of semi-automatic generation, but no fully-automatic approach exists in the literature. Indeed, without an automatic procedure, an accurate power characterization of complex SoCs by using PSMs is almost impossible. Thus, in this paper, first a methodology for the automatic generation of PSMs is proposed, and then a statistical approach based on a Hidden Markov Model is presented for their simulation. The core of the approach is based on a mining procedure whose role consists of extracting temporal assertions describing the functional behaviours of the IP, which are then automatically mapped on states of the PSMs and characterized from the energy consumption point of view.

12:00  6.4.3  APPROXIMATION THROUGH LOGIC ISOLATION FOR THE DESIGN OF QUALITY CONFIGURABLE CIRCUITS
Speaker:
Shubham Jain, Purdue University, US
Authors:
Shubham Jain, Swagath Venkataramani and Anand Raghunathan, Purdue University, US
Abstract
Intrinsic application resilience, a property exhibited by many emerging application domains, allows designers to optimize computing platforms by approximating selected computations within an application without any perceivable loss in its output quality. At the circuit level, this is often achieved by designing circuits that are more efficient but realize slightly modified functionality. Most prior efforts on approximate circuit design hardwire the degree of approximation into the implementation. This severely limits their applicability, as intrinsic resilience significantly varies both across and within applications, and often the same computation needs to be executed at different levels of accuracy when the application processes a different input or is used in a different context. To address this limitation, in this work, we propose a new approach to design quality configurable circuits that are equipped to modulate their output accuracy and energy at runtime. Our approach, approximation through logic isolation, identifies portions of logic in the circuit that consume significant power, but contribute only minimally to output accuracy. One or more approximate modes of circuit operation are then enabled by isolating the identified logic (using muxes, latches or power gating cells) to benefit power while satisfying the desired output accuracy. We propose a systematic methodology to transform a given circuit into a quality-configurable circuit by applying the proposed technique. Our methodology generates a favorable energy-quality trade-off by deliberately creating opportunities for error compensation between multiple logic islands that are simultaneously isolated. This enables more aggressive approximation for a given output quality, leading to superior power benefits. We evaluate the proposed methodology using a wide range of arithmetic circuits, complex modules and datapaths. The synthesized quality configurable circuits support three quality modes, viz. accurate, <0.1% average error, and <0.2% average error. Power improvements achieved in the approximate modes are 8.4%-34.5% and 17%-51.5%, respectively.

12:30  IP3-2, 297  ACCURATE VERIFICATION OF RC POWER GRIDS
Speaker:
Mohammad Fawaz, University of Toronto, CA
Authors:
Mohammad Fawaz and Farid N. Najm, University of Toronto, CA
Abstract
The power distribution network (PDN) of an integrated circuit (IC) must undergo various checks throughout the design flow, in order to guarantee that the voltage fluctuations are within certain user-specified safety thresholds. Vectorless verification of the PDN is one approach for verification that requires little information about the on-die logic. This verification problem has been studied extensively over the past few years and has been generally solved by first discretizing time using a particular user-defined time-step. We investigate the effect of this time-step on the quality of the solutions produced (both exact and estimates). We also propose an efficient method to specify the time-step in a way to minimize the errors introduced by the voltage drop estimates.

12:30  End of session
Lunch Break in Großer Saal + Saal 1
Keynote Lecture in "Saal 2" 14:00 - 14:30

6.5 Biochips

Date: Wednesday 16 March 2016
Time: 11:00 - 12:30
Location / Room: Konferenz 3

Chair:
Robert Wille, JKU, AT

Co-Chair:
Ian O'Connor, Ecole Centrale de Lyon, FR

This session focuses on design methods for biochips. The first paper presents a methodology for synthesizing fault-tolerant biochips. The second paper proposes a synthesis method considering sieve valves, a key component in flow-based microfluidic biochips. Finally the third paper proposes a design automation framework for quantitative gene expression on cyberphysical digital microfluidic biochips.

Time  Label  Presentation Title
Authors
11:00  6.5.1  ARCHITECTURE SYNTHESIS FOR COST-CONSTRAINED FAULT-TOLERANT FLOW-BASED BIOCHIPS
Speaker:
Seetal Potluri, Technical University of Denmark, DK
Authors:
Morten Chabert Eskesen, Paul Pop and Seetal Potluri, Technical University of Denmark, DK
Abstract
In this paper, we are interested in the synthesis of fault-tolerant architectures for flow-based microfluidic biochips, which use microvalves and channels to run biochemical applications. Device integration in flow-based microfluidic biochips is scaling faster than Moore's law. This increase in fabrication complexity has led to an increase in defect rates during manufacturing, thereby motivating the need to improve the yield by designing these biochips such that they are fault tolerant. We propose an approach based on a Greedy Randomized Adaptive Search Procedure (GRASP) for the synthesis of fault-tolerant biochip architectures. Our approach optimizes the introduction of redundancy within a given unit cost budget, such that the biochemical application can successfully complete its execution within its deadline, even in the presence of faults, and the yield is maximized. The proposed algorithm has been evaluated using several benchmarks and compared to the results of a Simulated Annealing metaheuristic.

11:30  6.5.2  SIEVE-VALVE-AWARE SYNTHESIS OF FLOW-BASED MICROFLUIDIC BIOCHIPS CONSIDERING SPECIFIC BIOLOGICAL EXECUTION LIMITATIONS
Speaker:
Mengchu Li, Technische Universität München (TUM), DE
Authors:
Mengchu Li1, Tsun-Ming Tseng1, Bing Li1, Tsung-Yi Ho2 and Ulf Schlichtmann1
1Technische Universität München (TUM), DE; 2National Tsing Hua University, TW
Abstract
Microfluidic biochips are being used to perform ever more complex and error-prone bioassays. This results in increasing demand for design automation for such biochips, as these sophisticated designs are beyond the scope of manual design. So far, much research in the field of design automation has been devoted to satisfying this demand from biology, but the gap between design automation and biology is still huge. To narrow this gap, we propose a synthesis method in which sieve valves, which are key components in flow-based microfluidic biochips, are considered for the first time. In addition, we integrate three more constraints into our synthesis that are commonly seen in bioassays but have so far been neglected by design automation: immediate execution, mutual exclusion, and parallel execution. Experiments show that, compared with traditional synthesis, this new method achieves significant improvements, and the gap between design automation and biology is being bridged.

12:00  6.5.3  INTEGRATED AND REAL-TIME QUANTITATIVE ANALYSIS USING CYBERPHYSICAL DIGITAL-MICROFLUIDIC BIOCHIPS
Speaker:
Mohamed Ibrahim, Duke University, US
Authors:
Mohamed Ibrahim, Krishnendu Chakrabarty and Kristin Scott, Duke University, US
Abstract
Considerable effort has recently been directed towards the implementation of molecular bioassays on digital-microfluidic biochips. However, today's solutions suffer from the drawback that multiple sample pathways are not supported and on-chip reconfigurable devices are not efficiently exploited. To overcome this problem, we present a spatial-reconfiguration technique that incorporates resource-sharing specifications into the synthesis flow. This technique is combined with cyberphysical integration to develop the first design-automation framework for quantitative gene expression. The proposed framework is based on a real-time resource-allocation algorithm that responds promptly to decisions about the protocol flow received from a firmware layer. Simulation results show that our adaptive framework efficiently utilizes on-chip resources to reduce time-to-result without sacrificing the chip's lifetime.

12:30  End of session
Lunch Break in Großer Saal + Saal 1
Keynote Lecture in "Saal 2" 14:00 - 14:30

6.6 Modelling and Control of Cyber-Physical Systems

Date: Wednesday 16 March 2016
Time: 11:00 - 12:30
Location / Room: Konferenz 4

Chair:
Donatella Sciuto, Politecnico di Milano, IT

Co-Chair:
Paul Pop, Technical University of Denmark, DK

The session has two papers on improving the quality-of-control for cyber-physical systems, targeting the timing analysis of self-triggered controllers and the optimization of resources in a partitioned architecture. Two other papers are on modeling aspects of the human body for cyber-physical medical applications: modeling the brain-machine-body interface and a model for the electrical conduction of the human heart. One of the interactive presentations is on security aspects of vehicular systems, and the second interactive presentation is on the online control of jobs in production systems.

Time  Label  Presentation Title
Authors
11:00  6.6.1  SELF-TRIGGERED CONTROLLERS AND HARD REAL-TIME GUARANTEES
Speaker:
Amir Aminifar, Linköping University, SE
Authors:
Amir Aminifar1, Paulo Tabuada2, Petru Eles1 and Zebo Peng1
1Linköping University, SE; 2University of California at Los Angeles, US
Abstract
It is well known that event-triggered and self-triggered controllers implemented on dedicated platforms can provide the same performance as the traditional periodic controllers, while consuming considerably less bandwidth. However, since the majority of controllers are implemented by software tasks on shared platforms, on one hand, it might no longer be possible to grant access to the event-triggered controller upon request. On the other hand, due to the seemingly irregular requests from self-triggered controllers, other applications, while in reality schedulable, may be declared unschedulable, if not carefully analyzed. The schedulability and response-time analysis in the presence of self-triggered controllers is still an open problem and the topic of this paper.

11:30  6.6.2  A SPATIO-TEMPORAL FRACTAL MODEL FOR A CPS APPROACH TO BRAIN-MACHINE-BODY INTERFACES
Speaker:
Yuankun Xue, University of Southern California, US
Authors:
Yuankun Xue, Saul Rodriguez and Paul Bogdan, University of Southern California, US
Abstract
Capturing the mathematical features of physical and cyber processes is essential for endowing the CPS with built-in intelligence. In this paper, we develop a compact yet accurate mathematical model able to capture the spatio-temporal fractal cross-dependencies between coupled processes and illustrate its benefits within the context of a brain-machine-body interface. Our generalized mathematical model improves the modeling accuracy of the dynamics of biological processes and is validated against medical observations.

12:00  6.6.3  MODULAR CODE GENERATION FOR EMULATING THE ELECTRICAL CONDUCTION SYSTEM OF THE HUMAN HEART
Speaker:
Nathan Allen, University of Auckland, NZ
Authors:
Nathan Allen1, Sidharta Andalam1, Partha Roop1, Avinash Malik1, Mark Trew2 and Nitish Patel1
1University of Auckland, NZ; 2Auckland Bioengineering Institute, NZ
Abstract
We study the problem of modular code generation for emulating the electrical conduction system of the heart, which is essential for the validation of implantable devices such as pacemakers. In order to develop high fidelity models, it is essential to consider the operation of hundreds, if not millions of conduction elements, called nodes of the heart. Published results so far, however, have considered a maximum of 33 nodes modelled as Hybrid Input Output Automata (HIOA). The behaviour of this model is captured using the well known commercial tool Simulink. These approaches are limiting due to the lack of model fidelity of the conduction system. In this paper, we first develop a semantic preserving modular compilation approach for a network of HIOA, by proposing to translate them to a network of FSMs. We then demonstrate that a delayed synchronous composition of the cardiac nodes enables modular code generation that is both semantic preserving and efficient. In addition to the above example, we have developed several examples from other domains to compare Simulink and the developed tool called Piha. The results show that we are able to generate code which, for the cardiac model, is 60% smaller in binary size while executing 20 times faster when compared to Simulink.

12:15  6.6.4  RESOURCE UTILIZATION AND QUALITY-OF-CONTROL TRADE-OFF FOR A COMPOSABLE PLATFORM
Speaker:
Juan Valencia, Eindhoven University of Technology, NL
Authors:
Juan Valencia, Eelco van Horssen, Dip Goswami, Maurice Heemels and Kees Goossens, Eindhoven University of Technology, NL
Abstract
This paper deals with implementation of feedback controllers on embedded platforms and investigates the trade-off between Quality-of-Control (QoC) and resource utilization. In particular, we consider a setting where the embedded platform executes multiple applications including the control application under consideration. Such a setting is common in domains like automotive where consolidation of several applications is desirable for cost reasons. While tackling inter-application interference is a challenge, our platform offers composability using resource virtualization allowing for interference-free application development and cycle-accurate timing behavior. In this work, from the feedback control perspective, we show that platform timing behavior can be characterized by a finite, known and periodic set of sampling intervals for a given resource allocation. Utilizing the platform timing, we show that the control design problem can be transformed into a classical discrete-time Linear Quadratic Regulator (LQR) problem which can be efficiently solved to obtain optimal QoC for a given resource allocation. Our method is validated both in simulation and experiments, considering a Multiple-Input and Multiple-Output (MIMO) control application.
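
To make the reference to a discrete-time LQR problem concrete, here is a minimal scalar sketch that iterates the discrete algebraic Riccati equation to a fixed point and returns the state-feedback gain. It assumes a single state, a single input and one fixed sampling interval, which is far simpler than the periodic, MIMO setting in the abstract; the function and parameter names are hypothetical.

```python
def scalar_dlqr(a: float, b: float, q: float, r: float,
                max_iterations: int = 1000, tol: float = 1e-12):
    """Solve the scalar discrete-time LQR problem for x[k+1] = a*x[k] + b*u[k]
    with stage cost q*x^2 + r*u^2. Returns (gain k, Riccati solution p),
    where the control law is u[k] = -k * x[k]."""
    if q < 0 or r <= 0:
        raise ValueError("q must be non-negative and r positive")
    p = q
    for _ in range(max_iterations):
        # Riccati recursion: p <- q + a^2 p - (a b p)^2 / (r + b^2 p)
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        converged = abs(p_next - p) < tol
        p = p_next
        if converged:
            break
    k = (a * b * p) / (r + b * b * p)
    return k, p
```

For a = b = q = r = 1 the fixed point satisfies p^2 - p - 1 = 0, so p is the golden ratio and the closed-loop pole a - b*k lies inside the unit circle.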

12:30  IP3-3, 756  SECURITY ANALYSIS OF CYBER-PHYSICAL SYSTEMS ILLUSTRATED WITH AUTOMOTIVE CASE STUDY
Speaker:
Viacheslav Izosimov, KTH Royal Institute of Technology, SE
Authors:
Viacheslav Izosimov1, Alexandros Asvestopoulos2, Oscar Blomkvist2 and Martin Törngren3
1Semcon, SE; 2Scania CV, SE; 3KTH Royal Institute of Technology, SE
Abstract
We present a method for systematic consideration of security attributes in the development of cyber-physical systems. We evaluate our method in the development of commercial vehicles, which have so far been unreasonably excluded from automotive security studies (despite the great importance of commercial vehicles for society). We have conducted an analysis of a known zero-cost non-physical attack, fine-tuned to our commercial vehicle (a truck), and considered countermeasures within the development flow.

12:31  IP3-4, 953  ONLINE HEURISTIC FOR THE MULTI-OBJECTIVE GENERALIZED TRAVELING SALESMAN PROBLEM
Speaker:
Joost van Pinxten, Eindhoven University of Technology, NL
Authors:
Joost van Pinxten1, Marc Geilen1, Twan Basten1, Umar Waqas1 and Lou Somers2
1Eindhoven University of Technology, NL; 2Océ Technologies, NL
Abstract
Today's manufacturing systems are typically complex cyber-physical systems where the physical and control aspects interact with the scheduling decisions. Optimizing such facilities requires ordering jobs and configuring the manufacturing system for each job. This optimization problem can be described as a Multi-Objective Generalized TSP where conflicting objectives lead to a trade-off space. This is the first work to address this TSP variant, introducing a compositional heuristic suitable for online application.

12:30  End of session
Lunch Break in Großer Saal + Saal 1
Keynote Lecture in "Saal 2" 14:00 - 14:30

6.7 Fault Tolerant Systems and Methods

Date: Wednesday 16 March 2016
Time: 11:00 - 12:30
Location / Room: Konferenz 5

Chair:
Viacheslav Izosimov, Semcon Sweden AB, SE

Co-Chair:
Zebo Peng, Linköping University, SE

The papers in this session present arithmetic components for approximate and fault-tolerant computing, self-checking methodologies, and tools for the implementation and evaluation of reliable systems.

Time  Label  Presentation Title
Authors
11:00  6.7.1  (Best Paper Award Candidate)
INEXACT DESIGNS FOR APPROXIMATE LOW POWER ADDITION BY CELL REPLACEMENT
Speaker:
Nandha Kumar Thulasiraman, The University of Nottingham, MY
Authors:
Haider A.F. Almurib1, Nandha Kumar Thulasiraman1 and Fabrizio Lombardi2
1The University of Nottingham, MY; 2Northeastern University, US
Abstract
This paper proposes three designs of an inexact adder cell for approximate computing. These cells require a substantially smaller number of transistors compared to an exact full adder cell as well as known inexact designs. These inexact cells are simulated at 45 nm and compared with respect to circuit-based metrics (such as energy consumption, delay, complexity and energy delay product) as well as error metrics (such as error rate). The replacement of exact cells with inexact cells, such as the ones proposed in this manuscript, in a ripple carry adder is evaluated by exhaustive simulation to assess different metrics for approximate computing; image addition is then pursued as an application. These results show that among existing inexact cells found in the technical literature, the proposed designs consume the least power and have superior performance in terms of delay, switching capacitance and error measures for image quality and processing.

11:30  6.7.2  A GENERAL APPROACH FOR HIGHLY DEFECT TOLERANT PARALLEL PREFIX ADDER DESIGN
Speaker:
Wenjing Rao, University of Illinois at Chicago, US
Authors:
Soumya Banerjee and Wenjing Rao, University of Illinois at Chicago, US
Abstract
This paper proposes a highly defect tolerant Parallel Prefix Adder (PPA) design. Motivated by the inherent defect tolerance capability displayed in a Kogge Stone Adder (KSA), this paper identifies the key elements that can be applied to make general PPAs defect tolerant: 1) the Generate and Propagate computing hardware is divided into disjoint groups, such that defects in one group will not "contaminate" the computation carried out by the other groups; 2) redundant copies of the results for each group can be derived cost-effectively from the other disjoint groups. This approach provides flexibility for a defect tolerant PPA design in both the number of groups and the type of Sub-Adder structure to be adopted. As is verified by the simulation results, the proposed scheme not only offers a general way of constructing highly defect tolerant PPAs, but also opens up a large number of pareto-front design choices, considering the objectives of reliability, hardware and performance.
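
As background for the Generate and Propagate terminology, the sketch below models a fault-free Kogge-Stone prefix network with bitwise integer operations. It shows only the standard prefix computation, not the disjoint-group redundancy scheme of the paper; the function name and the `width` parameter are hypothetical.

```python
def kogge_stone_add(a: int, b: int, width: int = 8) -> int:
    """Add two unsigned integers modulo 2**width using a Kogge-Stone prefix network."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    generate = a & b       # bit i creates a carry on its own
    propagate = a ^ b      # bit i passes an incoming carry along
    half_sum = propagate   # saved for the final sum stage
    distance = 1
    while distance < width:
        # Merge each position with the group that ends `distance` bits below it.
        # The generate update uses the propagate value from before this stage.
        generate |= propagate & (generate << distance)
        propagate &= propagate << distance
        generate &= mask
        propagate &= mask
        distance *= 2
    # The carry into bit i is the group generate of bits 0..i-1.
    carries = (generate << 1) & mask
    return half_sum ^ carries
```

Each pass of the loop corresponds to one level of the prefix tree, so a `width`-bit adder needs about log2(`width`) levels.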

12:00  6.7.3  INVERTERS' SELF-CHECKING MONITORS FOR RELIABLE PHOTOVOLTAIC SYSTEMS
Speaker:
Cecilia Metra, Università di Bologna, IT
Authors:
Martin Omana, Alessandro Fiore and Cecilia Metra, Università di Bologna, IT
Abstract
Photovoltaic systems are a widespread form of green energy that is increasingly also considered a form of economic investment. Their reliability is consequently becoming a concern. In this paper we focus on the reliability of the DC-AC converters (inverters) of photovoltaic (PV) systems. We analyze the effects of the faults likely to affect their operation in the field. We show that such faults can catastrophically impact the power delivered to the load. We then propose a self-checking monitor that is able to detect the occurrence of such faults in the field, as well as faults possibly affecting itself. Our monitor can therefore be adopted to guarantee the concurrent on-line test of faults affecting the inverters of PV systems. Moreover, if used together with suitable recovery strategies, for instance based on proper hardware reconfiguration, it can provide PV systems with fault tolerance ability, thus meeting the increasing demand for reliable PV systems.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30IP3-5, 327TOWARDS LOW OVERHEAD CONTROL FLOW CHECKING USING REGULAR STRUCTURED CONTROL
Speaker:
Zhiqi Zhu, The University of Texas at Dallas, US
Authors:
Zhiqi Zhu and Joseph Callenes-Sloan, The University of Texas at Dallas, US
Abstract
With process scaling and the adoption of post-CMOS technologies, reliability has been brought to the forefront of modern computer system design. Among the different ways that hardware faults can manifest in a system, errors related to the control flow of a program tend to be the most difficult to handle when ensuring reliable computing. Errors in the sequencing of executed instructions are usually catastrophic, resulting in system hangs, crashes, and/or corrupted data. For this reason, conventional approaches rely on some form of general redundancy for detecting or recovering from a control flow error. Due to the power constraints of emerging systems, however, these types of conservative approaches are quickly becoming infeasible. Control Flow Checking by Software Signatures (CFCSS) is a software-based technique for detecting control flow errors [1] that uses assigned signatures rather than general redundancy. Unfortunately, the performance overhead for CFCSS can still be as high as 80%-90% for many applications. In this paper, we propose a novel method for reducing the overhead of control flow checking by exploiting the regular control structure found in many applications. Specifically, we observe that the alternating sequence of conditional and unconditional control allows the full control signatures to be computed at alternating basic blocks. Based on experimental results of the proposed approach, we observe that the overheads of the traditional methods are reduced on average by 25.9%.
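As background, the signature idea behind baseline CFCSS can be sketched as follows. This is a simplified illustration, not the reduced-overhead method proposed in the paper: every block here has exactly one predecessor, so the run-time adjusting signature that CFCSS uses for blocks with several predecessors is omitted. The block names, signature values and function names are hypothetical.

```python
# Hypothetical control flow graph: A -> B -> C and A -> D.
SIGNATURES = {"A": 0b0001, "B": 0b0010, "C": 0b0100, "D": 0b1000}
PREDECESSOR = {"B": "A", "C": "B", "D": "A"}  # one legal predecessor per block


def signature_differences(signatures, predecessor):
    """Compile-time step: d = s_predecessor XOR s_block for every block."""
    return {blk: signatures[pred] ^ signatures[blk]
            for blk, pred in predecessor.items()}


def run_with_checks(path, signatures, diffs):
    """Run-time step: update the signature register G on every block entry.

    Returns the first block whose check fails, or None if the path is legal.
    """
    g = signatures[path[0]]
    for block in path[1:]:
        g ^= diffs[block]           # G = G XOR d_block
        if g != signatures[block]:  # mismatch means an illegal transfer
            return block
    return None


DIFFS = signature_differences(SIGNATURES, PREDECESSOR)
```

With these values, the legal path `["A", "B", "C"]` passes every check, while an erroneous jump from A directly to C leaves `G = 0b0111`, which differs from C's signature and is flagged at C. The update and comparison executed in every basic block are the source of the overhead that the abstract aims to reduce.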

Download Paper (PDF; Only available from the DATE venue WiFi)
12:31IP3-6, 460EMULATION-BASED HIERARCHICAL FAULT-INJECTION FRAMEWORK FOR COARSE-TO-FINE VULNERABILITY ANALYSIS OF HARDWARE-ACCELERATED APPROXIMATE ALGORITHMS
Speaker:
Theocharis Theocharides, University of Cyprus, CY
Authors:
Ioannis Chadjiminas, Ioannis Savva, Christos Kyrkou, Maria K. Michael and Theocharis Theocharides, University of Cyprus, CY
Abstract
This paper proposes a hierarchical fault injection emulation framework tailored to the structure of complex and large application-specific circuits, that performs vulnerability analysis of the system for single event upsets (SEUs) at different design granularities in real-time. In particular, the framework allows for efficient probabilistic modelling of the SEU impact, making it particularly applicable for hardware-accelerated approximate applications such as multimedia, computer vision and image/signal processing, due to its high processing speed and real-time capabilities. The framework is emulated on an FPGA-based platform and evaluated using a depth computation kernel, both in standalone manner as well as within a robotic obstacle avoidance application.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30End of session
Lunch Break in Großer Saal + Saal 1
Keynote Lecture in "Saal 2" 14:00 - 14:30

6.8 Presentations from 5G-Campus and European Projects Booths: 5G for the Connected World, Optimizing Computing Everywhere

Date: Wednesday 16 March 2016
Time: 11:00 - 12:30
Location / Room: Exhibition Theatre

Organiser:
Hans-Jürgen Brand, IDT/ZMDI, DE

This session presents how 5G technology will enable the connected world of the future. Attendees are invited to also visit the campus booths for further details and discussions.

TimeLabelPresentation Title
Authors
11:006.8.15G FOR THE CONNECTED WORLD
Speaker:
Rainer Liebhart, Nokia Networks, DE
11:456.8.2COLLECTING AND SHARING KNOWLEDGE TO OPTIMIZE THE EFFICIENCY AND COST OF COMPUTING EVERYWHERE
Speaker:
Anton Lokhmotov, dividiti, GB
Abstract

Designing faster, more energy efficient and reliable computer systems requires effective collaboration between hardware designers, system programmers and performance analysts, as well as feedback from system users. Supported by a grant from the EU TETRACOM Coordination Action, we have developed Collective Knowledge (CK), an open framework for reproducible and collaborative design and optimization. CK enables systematic and reproducible experimentation, combined with leading edge predictive analytics to gain valuable insights into system performance. The modular architecture of CK helps engineers create and share entire experimental workflows involving modules such as tools, programs, data sets, experimental results, predictive models and so on. We encourage a wide community, including system engineers and users, to share and reuse CK modules to fuel R&D on increasing the efficiency and decreasing the costs of computing everywhere.

12:30End of session
Lunch Break in Großer Saal + Saal 1
Keynote Lecture in "Saal 2" 14:00 - 14:30

UB06 Session 6

Date: Wednesday 16 March 2016
Time: 12:00 - 14:00
Location / Room: Booth 15, Exhibition Area

LabelPresentation Title
Authors
UB06.1LOOPINVADER: A COMPILER FOR TIGHTLY COUPLED PROCESSOR ARRAYS
Presenter:
Alexandru Tanase, Friedrich-Alexander-Universität Erlangen-Nürnberg, DE
Authors:
Alexandru Tanase, Michael Witterauf, Ericles Sousa, Vahid Lari, Frank Hannig and Jürgen Teich, Friedrich-Alexander-Universität Erlangen-Nürnberg, DE
Abstract
In today's coarse-grained reconfigurable architectures (CGRAs), application performance depends mostly on exploiting loop-level and instruction-level parallelism. However, it is tedious and error-prone to program such architectures manually in machine language. Here, only a compiler can make such architectures feasible. To solve this problem, we present a compiler for programming massively parallel processor arrays, in particular so-called tightly coupled processor arrays (TCPAs). Using a domain-specific language as design entry, our compiler symbolically parallelizes the code by using symbolic loop tiling techniques in the polyhedron model. Then, by replacing the parameters, e.g., with the desired number of processor elements (PEs), the compiler generates assembly code and interconnect configuration for the different PEs, which are combined into one binary. Finally, we demonstrate our tool flow for several selected examples.

Download Paper (PDF)
UB06.2INVADESIM: A SIMULATOR FOR HETEROGENEOUS MULTI-PROCESSOR SYSTEMS-ON-CHIP
Presenter:
Sascha Roloff, Friedrich-Alexander-Universität Erlangen-Nürnberg, DE
Authors:
Sascha Roloff, Frank Hannig and Jürgen Teich, Friedrich-Alexander-Universität Erlangen-Nürnberg, DE
Abstract
Innovative simulation mechanisms at system level are key for embedded hardware designers and parallel software developers to predict performance. This is especially important in a very early development phase, where design space exploration (DSE) helps to guide design decisions in the proper direction. In the case of modern MPSoCs, DSE can be very costly and time consuming, depending on the underlying simulation techniques. We present InvadeSIM, a parallel execution-driven simulator for fast functional and timing simulation of heterogeneous NoC-based MPSoCs. For this purpose, InvadeSIM combines a fast direct-execution simulation approach with different parallelization strategies. We will showcase our work by simulating a stream processing application from the computer vision domain on a tiled MPSoC architecture in real-time. In particular, we present an object tracking chain that continuously captures frames from a robot camera, followed by object detection, and a control loop back to the camera.

Download Paper (PDF)
UB06.3COSSIM: A NOVEL, COMPREHENSIBLE, ULTRA-FAST, SECURITY-AWARE CPS SIMULATOR
Presenter:
Antonios Nikitakis, Technical University of Crete, GR
Authors:
Antonios Nikitakis and Andreas Brokalakis, Technical University of Crete, GR
Abstract
Nowadays, Cyber Physical Systems (CPS) are growing in capability at an extraordinary rate, promoted by the increased presence and capabilities of electronic control units as well as of the sensors and actuators and the interconnecting networks. One of the main problems CPS designers face is the lack of simulation tools and models for system design and analysis. This is mainly because the majority of the existing simulation tools for complex CPS efficiently handle only parts of a system (only the processing or the network), while none of them supports the notion of security. The presented system is a "Novel, Comprehensible, Ultra-Fast, Security-Aware CPS Simulator" (COSSIM). COSSIM is the first known simulation framework that allows for the simulation of a complete CPS utilizing complex SoCs interconnected with sophisticated networks. Finally, the COSSIM system supports accurate power estimations, and it is the first such tool supporting security as a feature of the design process.

Download Paper (PDF)
UB06.4RT-POWMODS: RUN-TIME CPU POWER MODELS FROM REAL DATA
Presenter:
Matthew Walker, University of Southampton, GB
Authors:
Matthew Walker1, Stephan Diestelhorst2, Andreas Hansson2, Geoff Merrett1 and Bashir Al-Hashimi1
1University of Southampton, GB; 2ARM Ltd., GB
Abstract
Being able to accurately estimate CPU power consumption is a key requirement for both controlling online CPU energy-saving techniques and design-space exploration. Models built and validated using measured data from an actual device are valuable as their accuracy is known and trusted. We present our techniques and freely available software tools for running experiments on mobile development boards and using the recorded data to build accurate run-time power models. Our novel methodology uniquely considers the stability of the model and we demonstrate how it allows the models to achieve a higher accuracy on a wider range of workloads. We show how our tools are able to predict run-time power of an ARM Cortex-A15 CPU with an average error of less than 3% when validated with over 50 workloads.

Download Paper (PDF)
UB06.5T-RIDE: A MOBILE-HEALTH NEURODIAGNOSTIC SYSTEM BASED ON SPATIO-TEMPORAL P300 MONITORING: DESIGN, DEVELOPMENT AND TEST IN VIVO
Presenter:
Valerio Francesco Annese, Politecnico di Bari, IT
Authors:
Valerio Francesco Annese, Giovanni Mezzina and Daniela De Venuto, Politecnico di Bari, IT
Abstract
A mobile health solution for neuro-cognitive impairment monitoring based on P300 spatio-temporal characterization achieved by tuned Residue Iteration Decomposition (t-RIDE) is presented. The proposed m-health service allows remote monitoring of neuro-cognitive impairment through a 'plug and play' application, while doctor customization and data collection are enabled by cloud bridging. The developed t-RIDE method overcomes the limitations of the previous approaches (ICA; PCA; grand average; etc.). It has been tested on 8 subjects performing three different cognitive tasks of increasing difficulty. P300 amplitude ranges (3.6 µV - 11 µV), latencies (280 ms - 390 ms) and frontal-cortex spatial evidence (Pz, Fz, Cz) fully match medical references. t-RIDE convergence is reached in 148 iterations, ensuring an 80% accuracy in P300 amplitude using only 13 trials (worst case) on a single channel.

Download Paper (PDF)
UB06.6BIOVIZ: AN INTERACTIVE VISUALIZATION ENGINE FOR MICROFLUIDIC BIOCHIPS
Presenter:
Oliver Keszöcze, University of Bremen, DE
Authors:
Oliver Keszöcze1, Jannis Stoppe2, Robert Wille3 and Rolf Drechsler2
1University of Bremen, DE; 2DFKI and University of Bremen, DE; 3Johannes Kepler University, AT, DFKI and University of Bremen, DE
Abstract
In order to shorten the time required for the analysis of medical substances, digital microfluidic biochips (DMFBs) have been suggested. Issues such as routing and layout are complex and currently being investigated. Although first automatic solutions assist the designers, the results are usually provided in a complex and non-intuitive fashion. Creating solutions requires testing different setups, comparing the results and debugging algorithms. Solutions, while technically correct, often include negative aspects such as unnecessary cell usage. These aspects are difficult to spot without being able to visually inspect the design. Still, while designers would benefit from visualization tools, no dedicated tools have been built yet. We present BioViz, an interactive visualization tool for DMFBs that explicitly addresses these problems.

Download Paper (PDF)
UB06.7MCC: CONTRACT-BASED AUTOMATED INTEGRATION FOR COMPONENT-BASED CRITICAL SYSTEMS
Presenter:
Johannes Schlatow, TU Braunschweig, DE
Authors:
Johannes Schlatow, Marcus Nolte, Rolf Ernst and Markus Maurer, TU Braunschweig, DE
Abstract
In the scope of the research unit Controlling Concurrent Change, we developed a contract-based middleware to autonomously manage and ensure the safety, availability and security properties of a component-based run-time environment. It guarantees that any change to the system is formally analysed beforehand and only applied if it does not violate any of the contracts, thereby enabling in-field updateability of complex critical systems. For this purpose, a Multi-Change Controller (MCC) aggregates component contracts and invokes viewpoint-specific analysis engines to evaluate change requests and find feasible system configurations. The MCC is specifically designed for extensibility so that analysis engines can be added and combined dependent on the application domain. We show a demonstrator that showcases and illustrates this contract-based process for an automated integration of an automotive system. Our demonstrator is built upon the Genode OS Framework and Xilinx Zynq-7000 SoCs.

Download Paper (PDF)
UB06.8SRAM-BASED PHYSICAL UNCLONABLE KEYS FOR BLE SMART LOCK SYSTEMS
Presenters:
Iluminada Baturone and Miguel Ángel Prada-Delgado, University of Seville, ES
Authors:
Iluminada Baturone, Miguel Ángel Prada-Delgado, Alfredo Vázquez-Reyes, Laurentiu Acasandrei, Diego Fernández-Barrera and Javier Prada-Delgado,
Abstract
Nowadays, several smart lock systems use Bluetooth Low Energy (BLE) to recognize when a smartphone, conveniently authenticated by a digital key, is near. The keys can be shared and are managed by web apps, so that system security depends on how the software prevents an attacker from discovering the keys. In order to increase security by a two-factor method ('something you have' in addition to 'something you know'), the BLE smart lock system prototype shown in this demonstrator recognizes when a user wearing an authenticated BLE chip (in a key fob, wristband, etc.) is near. The digital keys are not stored but they are regenerated on the fly by only the trusted chip. This is possible by using the start-up values of the SRAM in the BLE chip, which act as a physical unclonable function (PUF), so that the chip cannot be cloned. The SRAM start-up values of the BLE chip are also exploited as true random numbers to derive fresh keys for each transaction with the lock.

Download Paper (PDF)
UB06.9NEURODSP: A MULTI-PURPOSE ENERGY-OPTIMIZED ACCELERATOR FOR NEURAL NETWORKS
Presenter:
Jean-Marc PHILIPPE, CEA LIST, FR
Authors:
Jean-Marc PHILIPPE, Alexandre CARBON and Renaud SCHMIT, CEA LIST, FR
Abstract
Deep Neural Networks (e.g. Convolutional Neural Networks) are a promising approach to design smart machines for a wide range of application domains (automotive, home automation, industry, etc.). Due to their structure, these processing chains are compute intensive and difficult to embed into low-power systems. To tackle this challenge, CEA LIST investigated the NeuroDSP hardware accelerator IP, which can be embedded into FPGA- or ASIC-based systems. Providing the system with a dramatic performance/watt ratio improvement, the IP can sustain 450GMACS/W in FDSOI 28nm technology, meeting the requirements of high-end embedded applications. The proposed demonstration features a comparison between three implementations of a CNN processing chain used to detect faces in a large image database. It shows that a single-cluster FPGA-based implementation of the NeuroDSP IP at 100MHz is able to outperform both a Raspberry Pi 2 and an Odroid-XU3 board by factors of 10 and 6, respectively, in performance.

Download Paper (PDF)
UB06.10A CIRCUIT EXTRACTION TOOL FOR FULL CUSTOM DESIGNED MEMS SENSORS
Presenter:
Axel Hald, Robert Bosch GmbH, DE
Authors:
Axel Hald1, Johannes Seelhorst1, Mathias Reimann1, Juergen Scheible2 and Jens Lienig3
1Robert Bosch GmbH, DE; 2Reutlingen University, DE; 3Technische Universität Dresden, DE
Abstract
In contrast to IC design, MEMS design still lacks sophisticated component libraries. Therefore, the physical design of today's MEMS sensors is mostly done by simply drawing polygons. Hence, the sensor structure is only given as plain graphic data which hinders the identification and investigation of topology elements. The growing complexity of future MEMS designs demands a deep and detailed analysis of the sensor structures and the topology elements in order to get a better understanding of the coupling capacitances and parasitics. Our tool is able to extract a circuit out of a MEMS sensor designed in a polygon based design flow. The key feature of this tool is a rule based structure recognition algorithm which identifies the topology elements of the sensor. Thereafter, the electrostatic RC-extraction is performed by a commercial field solver. The extracted lumped elements can be used for further simulation and optimization tasks during the design phase.

Download Paper (PDF)
14:00End of session
16:00Coffee Break in Exhibition Area

7.0 LUNCH TIME KEYNOTE SESSION

Date: Wednesday 16 March 2016
Time: 14:00 - 14:30
Location / Room: Saal 2

Chair:
Luca Fanucci, University of Pisa, IT

Co-Chair:
Wolfgang Ecker, Infineon Technologies, DE

The lunch keynote presentation will be given by Dr. Patrick Leteinturier, Fellow of Automotive Systems at Infineon Technologies. He will present his vision on how cars of the future will impact and dramatically change personal mobility.

TimeLabelPresentation Title
Authors
14:007.0.1THE CAR OF THE FUTURE WILL REINVENT PERSONAL MOBILITY
Speaker and Author:
Patrick Leteinturier, Infineon Technologies, DE
Abstract
The regulations for CO2 and pollutant reduction have pushed the automotive industry towards more electrification. Internal combustion engines will continue to power our vehicles for decades but will be assisted by electric traction in various xEV architectures. The race for efficiency, environmental friendliness, and safety will not end here. Automated and autonomous driving cars are opening a new field of benefits, but also a new field of challenges. Engineers will have to reinvent the EE vehicle architecture for new domain control and fail-operational systems. Cars will be connected to other cars and to the infrastructure, with software updates over the air. The new vehicles will be real cyber physical systems. This keynote will explore the potential of electronic technologies to meet the new requirements in sensing, controlling, powering, and energizing the car of the future. Key items:

  • Car Electronic System Design & Test
  • Car EE Architecture
  • Car Electrification
  • Connected Car
  • Car Safety and Security
  • Software Update over the Air
  • From Advanced Driver Assistance Systems to Autonomous Driving
14:30End of session
16:00Coffee Break in Exhibition Area

UB07 Session 7

Date: Wednesday 16 March 2016
Time: 14:00 - 16:00
Location / Room: Booth 15, Exhibition Area

LabelPresentation Title
Authors
UB07.1LOOPINVADER: A COMPILER FOR TIGHTLY COUPLED PROCESSOR ARRAYS
Presenter:
Alexandru Tanase, Friedrich-Alexander-Universität Erlangen-Nürnberg, DE
Authors:
Alexandru Tanase, Michael Witterauf, Ericles Sousa, Vahid Lari, Frank Hannig and Jürgen Teich, Friedrich-Alexander-Universität Erlangen-Nürnberg, DE
Abstract
In today's coarse-grained reconfigurable architectures (CGRAs), application performance depends mostly on exploiting loop-level and instruction-level parallelism. However, it is tedious and error-prone to program such architectures manually in machine language. Here, only a compiler can make such architectures feasible. To solve this problem, we present a compiler for programming massively parallel processor arrays, in particular so-called tightly coupled processor arrays (TCPAs). Using a domain-specific language as design entry, our compiler symbolically parallelizes the code by using symbolic loop tiling techniques in the polyhedron model. Then, by replacing the parameters, e.g., with the desired number of processor elements (PEs), the compiler generates assembly code and interconnect configuration for the different PEs, which are combined into one binary. Finally, we demonstrate our tool flow for several selected examples.

Download Paper (PDF)
UB07.2INVADESIM: A SIMULATOR FOR HETEROGENEOUS MULTI-PROCESSOR SYSTEMS-ON-CHIP
Presenter:
Sascha Roloff, Friedrich-Alexander-Universität Erlangen-Nürnberg, DE
Authors:
Sascha Roloff, Frank Hannig and Jürgen Teich, Friedrich-Alexander-Universität Erlangen-Nürnberg, DE
Abstract
Innovative simulation mechanisms at system level are key for embedded hardware designers and parallel software developers to predict performance. This is especially important in a very early development phase, where design space exploration (DSE) helps to guide design decisions in the proper direction. In the case of modern MPSoCs, DSE can be very costly and time consuming, depending on the underlying simulation techniques. We present InvadeSIM, a parallel execution-driven simulator for fast functional and timing simulation of heterogeneous NoC-based MPSoCs. For this purpose, InvadeSIM combines a fast direct-execution simulation approach with different parallelization strategies. We will showcase our work by simulating a stream processing application from the computer vision domain on a tiled MPSoC architecture in real-time. In particular, we present an object tracking chain that continuously captures frames from a robot camera, followed by object detection, and a control loop back to the camera.

Download Paper (PDF)
UB07.3ETEAK: ASYNCHRONOUS DATAFLOWS SYNTHESIS ONTO FPGAS USING THE ETEAK FRAMEWORK
Presenter:
Mahdi Jelodari Mamaghani, The University of Manchester, GB
Authors:
Mahdi Jelodari Mamaghani, Jim Garside and Steve Furber, The University of Manchester, GB
Abstract
We exploit eTeak (De-Elastisation [DATE'15] enabled) to synthesise asynchronous dataflow descriptions in Balsa into a synchronous structure loadable onto an FPGA. We will also be able to demonstrate the software realisation of the same architecture running on a laptop and let the audience compare hardware and software concurrency. In a brief experiment conducted in our recent study, a prime number generator (the sieve of Eratosthenes) was implemented both in software using the CSP compiler and in hardware using eTeak: on average, the hardware implementation runs 90-120x faster than its software counterpart, while the processor clock speed is almost the same as the hardware clock speed (1.2GHz). This allows us to plan ahead and exploit eTeak toward energy-efficient synthesis. According to EPSRC's research portfolio, this work falls under the fastest-growing research subject, "Energy Efficiency", which aims to achieve an energy reduction of 26-43% by exploiting ICT.

Download Paper (PDF)
UB07.4RC3E: DESIGN AND TEST AUTOMATIZATION IN THE CLOUD
Presenter:
Patrick Lehmann, Technische Universität Dresden, DE
Authors:
Patrick Lehmann, Oliver Knodel, Martin Zabel and Rainer G. Spallek, Technische Universität Dresden, DE
Abstract
Cloud computing is becoming more and more interesting for companies because of its flexibility to provide apparently endless resources and new services while reducing the total cost of ownership for the user. Fields of application range from web technologies and storage solutions to complex business processes. The domain of chip and system design is well known for offloading resource-intensive and long-running synthesis or simulation tasks onto centralized servers. As hardware designs grow exponentially and verification requirements are strengthened, cloud services are being investigated to meet these needs. In the end, however, real hardware tests cannot be avoided. Our RC3E ecosystem brings close-to-hardware prototype development and automated hardware testing into the cloud, continuing the principle of "test often and test early". The architecture offers virtualized and shared FPGA resources for prototyping, with automated remote debugging capabilities.

Download Paper (PDF)
UB07.5T-RIDE: A MOBILE-HEALTH NEURODIAGNOSTIC SYSTEM BASED ON SPATIO-TEMPORAL P300 MONITORING: DESIGN, DEVELOPMENT AND TEST IN VIVO
Presenter:
Valerio Francesco Annese, Politecnico di Bari, IT
Authors:
Valerio Francesco Annese, Giovanni Mezzina and Daniela De Venuto, Politecnico di Bari, IT
Abstract
A mobile health solution for neuro-cognitive impairment monitoring based on P300 spatio-temporal characterization achieved by tuned Residue Iteration Decomposition (t-RIDE) is presented. The proposed m-health service allows remote monitoring of neuro-cognitive impairment through a 'plug and play' application, while doctor customization and data collection are enabled by cloud bridging. The developed t-RIDE method overcomes the limitations of the previous approaches (ICA; PCA; grand average; etc.). It has been tested on 8 subjects performing three different cognitive tasks of increasing difficulty. P300 amplitude ranges (3.6 µV - 11 µV), latencies (280 ms - 390 ms) and frontal-cortex spatial evidence (Pz, Fz, Cz) fully match medical references. t-RIDE convergence is reached in 148 iterations, ensuring an 80% accuracy in P300 amplitude using only 13 trials (worst case) on a single channel.

Download Paper (PDF)
UB07.6HYPERDIMENSIONAL COMPUTING FOR TEXT CLASSIFICATION: AN EFFICIENT SOFTWARE IMPLEMENTATION
Presenter:
Fateme Rasti Najafabadi, Sharif University of Technology, IR
Authors:
Fateme Rasti Najafabadi1, Abbas Rahimi2, Pentti Kanerva2 and Jan Rabaey2
1Sharif University of Technology, IR; 2University of California, Berkeley, US
Abstract
The mathematical properties of high-dimensional spaces show remarkable agreement with behaviors controlled by the brain. Hyperdimensional computing explores the emulation of cognition by computing with hypervectors as an alternative to computing with numbers. Hypervectors are high-dimensional (e.g., 10,000 dimensions) and holographic, and they appear random. These properties provide an opportunity for efficient computing, while aligning well with undesirable hardware variations in nanoscale fabrics. We focused on an application of hyperdimensional computing to text classification. Accordingly, we developed an algorithm to classify news stories from a stream of letters. Using pentagrams, the algorithm achieved a classification accuracy above 95% for eight news topics, surpassing other techniques reported in the literature, including Bayes, K-NN, and SVM. We demonstrate a full software framework that enables execution of such algorithms on contemporary hardware fabrics.
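The general idea of classifying text with hypervectors can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' framework: it uses random bipolar vectors, rotation plus elementwise multiplication to encode letter n-grams, summation to build a class profile, and cosine similarity for the decision. The constants `DIM` and `NGRAM`, the two tiny training texts and all function names are hypothetical, and trigrams replace the pentagrams of the abstract to keep the example small.

```python
import math
import random

DIM = 2000   # hypervector dimensionality (the abstract mentions e.g. 10,000)
NGRAM = 3    # n-gram length; the abstract uses pentagrams (n = 5)
ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def make_item_memory(alphabet, seed=0):
    """Assign one random bipolar (+1/-1) hypervector to every letter."""
    rng = random.Random(seed)
    return {ch: [rng.choice((-1, 1)) for _ in range(DIM)] for ch in alphabet}


def rotate(vec, k):
    """Cyclically shift a vector by k positions (encodes letter position)."""
    k %= len(vec)
    return vec[-k:] + vec[:-k] if k else vec[:]


def encode_text(text, items):
    """Sum the hypervectors of all letter n-grams in the text."""
    profile = [0] * DIM
    for start in range(len(text) - NGRAM + 1):
        gram = [1] * DIM
        for offset, ch in enumerate(text[start:start + NGRAM]):
            shifted = rotate(items[ch], NGRAM - 1 - offset)
            gram = [x * y for x, y in zip(gram, shifted)]
        profile = [x + y for x, y in zip(profile, gram)]
    return profile


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def classify(text, class_profiles, items):
    """Return the label whose profile is most similar to the encoded text."""
    query = encode_text(text, items)
    return max(class_profiles,
               key=lambda label: cosine(query, class_profiles[label]))


ITEMS = make_item_memory(ALPHABET)
PROFILES = {
    "fox": encode_text("the quick brown fox jumps over the lazy dog", ITEMS),
    "lorem": encode_text(
        "lorem ipsum dolor sit amet consectetur adipiscing elit", ITEMS),
}
```

Calling `classify("the lazy dog jumps", PROFILES, ITEMS)` selects the `"fox"` profile because most of the query's trigrams also occur in that training text, while unrelated n-gram vectors are nearly orthogonal in a high-dimensional space.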

Download Paper (PDF)
UB07.7Q27: PUTTING QUEENS IN CARRY CHAINS
Presenter:
Thomas Preußer, Technische Universität Dresden, DE
Author:
Thomas Preußer, Technische Universität Dresden, DE
Abstract
The N-Queens Puzzle is a fascinating combinatorial problem. Up to now, the number of distinct valid placements of N non-attacking queens on a generalized NxN chessboard cannot be computed by a formula. Solution counts obtained from extensive explorations of the solution space are currently known for all N up to 26. The parallelization of this exploration is embarrassingly simple and is achieved by pre-placing the queens of a certain board region. This very flexible partitioning approach makes the N-Queens Puzzle a great showcase for massively parallel computation approaches. This demo illustrates an approach to compute the next, yet unknown solution count for the 27-Queens Puzzle that is based on a coronal pre-placement, which not only partitions the overall computation but also cuts the size of the search space significantly by exploiting inherent symmetries. It presents highly effective hardware solvers that back an ongoing massively parallel computation.
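The partitioning by pre-placement described above can be illustrated with a small software sketch. It pre-places only the queen of the first row, which is a much simpler scheme than the coronal pre-placement of the demo and exploits no symmetries; each resulting subproblem is independent and could be handed to a separate solver. The function names are illustrative assumptions.

```python
def count_completions(n, cols, left_diag, right_diag):
    """Count the ways to complete a partial placement, row by row.

    The three bitmasks mark the columns of the current row that are attacked
    by already placed queens (vertically and along both diagonals).
    """
    full = (1 << n) - 1
    if cols == full:              # every column holds a queen: one solution
        return 1
    free = full & ~(cols | left_diag | right_diag)
    total = 0
    while free:
        bit = free & -free        # lowest free column in this row
        free ^= bit
        total += count_completions(
            n,
            cols | bit,
            ((left_diag | bit) << 1) & full,
            (right_diag | bit) >> 1,
        )
    return total


def first_row_partitions(n):
    """Solve one independent subproblem per column of the first-row queen."""
    full = (1 << n) - 1
    counts = []
    for col in range(n):
        bit = 1 << col
        counts.append(count_completions(n, bit, (bit << 1) & full, bit >> 1))
    return counts


def count_solutions(n):
    """Total solution count as the sum over all pre-placement partitions."""
    return sum(first_row_partitions(n))
```

For the classic 8x8 board, `count_solutions(8)` returns 92. The per-partition counts are mirror-symmetric, which hints at the kind of symmetry a more elaborate pre-placement can exploit to shrink the search space.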

Download Paper (PDF)
UB07.8CONTREP: A SINGLE-SOURCE FRAMEWORK FOR UML-BASED MODELLING AND DESIGN OF MIXED-CRITICALITY SYSTEMS
Presenter:
Fernando Herrera, University of Cantabria, ES
Authors:
Fernando Herrera and Eugenio Villar, University of Cantabria, ES
Abstract
Mixed-criticality systems integrate applications, platform resources and requirements with different criticality. A criticality reflects the impact of either a failure of a component or a violation of a requirement, which can range from irrelevant to catastrophic effects. This booth presents the CONTREP framework, which supports UML/MARTE based modeling, analysis and design of mixed-criticality embedded systems. The booth shows a model of a quadcopter control system which integrates safety critical (e.g. flight control), mission-critical (e.g., a video processing payload), and non-critical (e.g., monitoring) functions. The booth shows how mixed-criticality is captured, together with the description of the functional architecture, and of the multi-core embedded platform where the system is implemented; how CONTREP automates different design activities, i.e. model validation, performance assessment and design space exploration, exploiting mixed-criticality information in every case.

Download Paper (PDF)
UB07.9DAC GENERATOR: A DAC STAGE ANALOG CIRCUIT GENERATOR FOR UDSM AND FD-SOI TECHNOLOGIES
Presenter:
Benjamin Prautsch, Fraunhofer Institute for Integrated Circuits IIS, Design Automation Division EAS, DE
Authors:
Benjamin Prautsch, Sunil Rao, Uwe Eichler, Ajith Puppala and Torsten Reich, Fraunhofer Institute for Integrated Circuits IIS, Design Automation Division EAS, DE
Abstract
The design of analog integrated circuits requires extensive manual work, which is error-prone and inefficient. With advanced ultra-deep sub-micron (UDSM) technologies, the manual design effort increases dramatically. This work presents the application of a rethought generator approach for the efficient, reusable design of a 12-bit current-steering DAC. The current mirror stage of the DAC, which is arranged in the complex Q² random walk scheme for high intrinsic matching [1], is realized by a circuit generator which automatically creates the schematic, symbol, and layout of the required cells within a few minutes. Originally focused on a 28 nm bulk technology, the generator code was also executed in a 28 nm FD-SOI technology with minor migration effort due to the generic nature of our tool. In addition, the fast circuit generation enables efficient layout optimization, showcasing the benefit of analog circuit generators for "bottom-up" design [2] in advanced technology nodes.

Download Paper (PDF)
UB07.10LLBMC / QPR-VERIFY: HIGH-PRECISION BOUNDED MODEL CHECKING FOR AUTOMOTIVE SOFTWARE
Presenter:
Carsten Sinz, Karlsruhe Institute of Technology (KIT), DE
Authors:
Carsten Sinz, David Farago, Florian Merz and Reimo Schaupp, Karlsruhe Institute of Technology (KIT), DE
Abstract
LLBMC (the low-level bounded model checker) is a static software analysis tool for finding bugs in C (and, to some extent, in C++) programs. It is mainly intended for checking low-level system code and is based on the technique of Bounded Model Checking. LLBMC is fully automatic and requires minimal preparation effort and user interaction. It supports all C constructs, including less common features such as bitfields. LLBMC models memory accesses (heap, stack, global variables) with high precision and is thus able to find hard-to-detect memory access errors like heap or stack buffer overflows. LLBMC can also uncover errors due to uninitialized variables or other sources of non-deterministic behavior. Due to its precise analysis, LLBMC produces almost no false alarms (false positives). LLBMC is developed at Karlsruhe Institute of Technology and will soon be commercially available via a university spin-off, QPR Technologies.

Download Paper (PDF)
16:00End of session
Coffee Break in Exhibition Area

7.1 SPECIAL DAY Panel: Which EDA Solutions can the Automotive Domain Reuse? Very Few or All?

Date: Wednesday 16 March 2016
Time: 14:30 - 16:00
Location / Room: Saal 2

Chair:
Adam Morawiec, European Chips & Systems Design Initiative (ECSI), FR

This panel will debate whether, and to what extent, existing EDA solutions can be extended to the automotive domain. Given the unique features of the automotive domain, such as cost pressures, safety criticality, and complexity, it is not clear whether the automotive domain needs completely new and custom-made EDA techniques or whether existing techniques may be largely reused.

Moderator:

  • Oliver Bringmann, Universität Tübingen / FZI, DE

Panelists:

  • Rainer Kress, Infineon Technologies, DE
  • Gabriele Ernst, Robert Bosch GmbH, DE
  • Jean-Marie Saint-Paul, Mentor, FR
  • Riccardo Mariani, Yogitech, IT
  • Christoph Störmer, ETAS, DE
16:00End of session
Coffee Break in Exhibition Area

7.2 EU Projects Special Session: Energy Efficiency drives Design

Date: Wednesday 16 March 2016
Time: 14:30 - 16:00
Location / Room: Konferenz 6

Organiser:
Roberto Giorgi, University of Siena, IT

Chair:
Martin Schoeberl, Technical University of Denmark, DK

Co-Chair:
Roberto Giorgi, University of Siena, IT

Energy efficiency is a key non-functional property and is currently the number one goal of many designs. This session presents several projects focused on future datacenters. The solutions and technologies they adopt are also driving the design of the next generation of energy-efficient smart embedded systems.

TimeLabelPresentation Title
Authors
14:307.2.1EUROSERVER: SHARE-ANYTHING SCALE-OUT MICRO-SERVER DESIGN
Speaker:
Manolis Marazakis, FORTH, GR
Authors:
Manolis Marazakis1, John Goodacre2, Didier Fuin3, Paul Carpenter4, John Thomson5, Emil Matus6, Antimo Bruno7, Per Stenström8, Jerome Martin9, Yves Durand9 and Isabelle Dor9
1FORTH, GR; 2ARM Ltd, GB; 3STMicroelectronics, FR; 4Barcelona Supercomputing Center, ES; 5ONAPP Ltd, GB; 6Technische Universität Dresden, DE; 7NEAT Srl, IT; 8Chalmers University of Technology, SE; 9CEA, FR
Abstract
This paper provides a snapshot summary of the trends in the area of micro-server development and their application in the broader enterprise and cloud markets. Focusing on the technology aspects, we provide an understanding of these trends and specifically the differentiation and uniqueness of the approach being adopted by the EUROSERVER FP7 project. The unique technical contributions of EUROSERVER range from the fundamental system compute unit design architecture, through to the implementation approach both at the chiplet nanotechnological integration, and the everything-close physical form factor. Furthermore, we offer optimizations at the virtualisation layer to exploit the unique hardware features, and other framework optimizations, including exploiting the hardware capabilities at the run-time system and application layers.

Download Paper (PDF; Only available from the DATE venue WiFi)
14:457.2.2ENERGY MINIMIZATION AT ALL LAYERS OF THE DATA CENTER: THE PARADIME PROJECT
Speaker:
Christof Fetzer, Technische Universität Dresden, DE
Authors:
Oscar Palomar1, Santhosh Kumar Rethinagiri2, Gulay Yalcin1, Rubén Titos-Gil1, Pablo Prieto1, Emma Torrella1, Osman Unsal1, Adrián Cristal1, Pascal Felber3, Anita Sobe3, Yaroslav Hayduk3, Mascha Kurpicz3, Christof Fetzer4, Thomas Knauth4, Malte Schneegaß5, Jens Struckmeier5 and Dragomir Milojevic6
1Barcelona Supercomputing Center, ES; 2BSC-Microsoft Research Center, ES; 3University of Neuchâtel, CH; 4Technische Universität Dresden, DE; 5Cloud & Heat, DE; 6IMEC, BE
Abstract
The main objective of the ParaDIME project has been to minimize energy consumption at all levels of the data center. On the one hand, we have considered what can be achieved on currently existing systems, via improvements of the programming model and the runtime system. On the other hand, we investigated which techniques and design decisions can make future computing nodes more energy efficient. We have successfully proposed and developed several methodologies that enable up to 60% energy savings in data center components.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:007.2.3RACK-SCALE DISAGGREGATED CLOUD DATA CENTERS: THE DREDBOX PROJECT VISION
Speaker:
Dimitris Syrivelis, University of Thessaly, GR
Authors:
Kostas Katrinis1, Dimitris Syrivelis2, Dionisios Pnevmatikatos3, Georgios Zervas4, Dimitris Theodoropoulos5, Iordanis Koutsopoulos6, Kobi Hasharoni7, Daniel Raho8, Christian Pinto8, Felix Espina9, Sergio Lopez-Buedo9, Qianqiao Chen4, Mario Nemirovsky10, Damian Roca10, Hans Klos11 and Tom Berends11
1IBM Research Ireland, IE; 2University of Thessaly, GR; 3ECE Department, Technical University of Crete & FORTH-ICS, GR; 4HPN group, University of Bristol, GB; 5Foundation for Research and Technology Hellas (FORTH), GR; 6Athens University of Economics and Business, GR; 7Compass-EOS, IL; 8Virtual Open Systems, FR; 9NAUDIT HPCN, ES; 10Barcelona Supercomputing Center, ES; 11SINTECS, NL
Abstract
For quite some time now, server designs, whether low-power or high-end, have been created around a common design principle: the main-board and its hardware components form a baseline, monolithic building block that the rest of the hardware/software stack design builds upon. This proportionality of compute/memory/network/storage resources is fixed during design time and remains static throughout machine lifetime, with known ramifications in terms of low system resource utilization, costly upgrade cycles and degraded energy proportionality. dReDBox takes on the challenge of revolutionizing the low-power computing market by breaking server boundaries through materialization of the concept of disaggregation. Besides proposing a highly modular software-defined architecture for the next generation datacentre, dReDBox will specify, design and prototype a novel hardware architecture where SoC-based microservers, memory modules and accelerators will be placed in separate modular server trays interconnected via a high-speed, low-latency opto-electronic system fabric, and be allocated in arbitrary sets, as driven by fit-for-purpose resource/power management software. These blocks will employ state-of-the-art low-power components and be amenable to deployment in various integration form factors and target scenarios. dReDBox aims to deliver a full-fledged, vertically integrated datacentre-in-a-box prototype to showcase the superiority of disaggregation in terms of scalability, efficiency, reliability, performance and energy reduction, which will be demonstrated in three pilot use-cases.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:157.2.4ECOSCALE: RECONFIGURABLE COMPUTING AND RUNTIME SYSTEM FOR FUTURE EXASCALE SYSTEMS
Speaker:
Iakovos Mavroidis, Telecommunication Systems Institute, GR
Authors:
Iakovos Mavroidis1, Ioannis Papaefstathiou2, Luciano Lavagno3, Dimitrios Nikolopoulos4, Dirk Koch5, John Goodacre5, Ioannis Sourdis6, Vassilis Papaefstathiou6, Marcello Coppola7 and Manuel Palomino8
1Telecommunication Systems Institute, GR; 2Synelixis, GR; 3Politecnico di Torino, IT; 4Queen's University of Belfast, GB; 5University of Manchester, GB; 6Chalmers University of Technology, SE; 7STMicroelectronics, FR; 8Acciona Infraestructuras S.A., ES
Abstract
In order to reach exascale performance, current HPC systems need to be improved. Simple hardware scaling is not a feasible solution due to the increasing utility costs and power consumption limitations. Apart from improvements in implementation technology, what is needed is to refine the HPC application development flow as well as the system architecture of future HPC systems. ECOSCALE tackles these challenges by proposing a scalable programming environment and architecture, aiming to substantially reduce energy consumption as well as data traffic and latency. ECOSCALE introduces a novel heterogeneous energy-efficient hierarchical architecture, as well as a hybrid many-core+OpenCL programming environment and runtime system. The ECOSCALE approach is hierarchical and is expected to scale well by partitioning the physical system into multiple independent Workers (i.e. compute nodes). Workers are interconnected in a tree-like fashion and define a contiguous global address space that can be viewed either as a set of Partitioned Global Address Space (PGAS) partitions, or as a set of nodes hierarchically interconnected via an MPI protocol. To further increase energy efficiency, as well as to provide resilience, the Workers employ reconfigurable accelerators mapped into the virtual address space utilizing a dual stage System Memory Management Unit with coherent memory access. The architecture supports shared partitioned reconfigurable resources accessed by any Worker in a PGAS partition, as well as automated hardware synthesis of these resources from an OpenCL-based programming model.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:307.2.5ENABLING HPC FOR QOS-SENSITIVE APPLICATIONS: THE MANGO APPROACH
Speaker:
José Flich, Universitat Politècnica de València, ES
Authors:
José Flich1, Giovanni Agosta2, Philipp Ampletzer3, David Atienza4, Alessandro Cilardo5, William Fornaciari2, Ynse Hoornengorb6, Mario Kovac7, Bruno Maitre8, Giuseppe Massari2, Ermis Papastefanakis8, Fabrice Roudet9, Rafael Tornero1 and Davide Zoni2
1Universitat Politècnica de València, ES; 2Politecnico di Milano, IT; 3ProDesign, DE; 4École Polytechnique Fédérale de Lausanne (EPFL), CH; 5University of Naples Federico II, IT; 6Philips Medical, NL; 7Zagreb University, HR; 8Thales Communication, FR; 9EATON, FR
Abstract
In this paper, we provide an overview of the MANGO project and its goal. The MANGO project aims at addressing power, performance and predictability (the PPP space) in future High-Performance Computing systems. It starts from the fundamental intuition that effective techniques for all three goals ultimately rely on customization to adapt the computing resources to reach the desired Quality of Service (QoS). From this starting point, MANGO will explore different but interrelated mechanisms at various architectural levels, as well as at the level of the system software. In particular, to explore a new positioning across the PPP space, MANGO will investigate system-wide, holistic, proactive thermal and power management aimed at extreme-scale energy efficiency.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:457.2.6AUTOTUNING AND ADAPTIVITY APPROACH FOR ENERGY EFFICIENT EXASCALE HPC SYSTEMS: THE ANTAREX APPROACH
Speaker:
Cristina Silvano, Politecnico di Milano, IT
Authors:
Cristina Silvano1, Giovanni Agosta1, Andrea Bartolini2, Andrea Beccari3, Luca Benini2, João Bispo4, João M. P. Cardoso5, Carlo Cavazzoni6, Jan Martinovic7, Gianluca Palermo1, Martin Palkovic7, Pedro Pinto8, Erven Rohou9, Nico Sanna6 and Katerina Slaninova7
1Politecnico di Milano, IT; 2Università di Bologna, IT; 3Dompe' Farmaceutici SpA, IT; 4Faculty of Engineering (FEUP), University of Porto, PT; 5University of Porto, PT; 6CINECA, IT; 7IT4Innovation National Supercomputing Center, CZ; 8Faculty of Engineering, University of Porto, PT; 9INRIA Rennes, FR
Abstract
The main goal of the ANTAREX project is to express application self-adaptivity through a Domain Specific Language (DSL) and to manage and autotune applications at runtime for green and heterogeneous High Performance Computing (HPC) systems up to the Exascale level. Key innovations of the project include the introduction of a separation of concerns between self-adaptivity strategies and application functionalities. The DSL approach will allow the definition of energy-efficiency, performance, and adaptivity strategies as well as their enforcement at runtime through application autotuning and resource and power management.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00IP3-7, 1023TECHNOLOGY TRANSFER IN COMPUTING SYSTEMS: THE TETRACOM APPROACH
Speaker and Author:
Rainer Leupers, RWTH Aachen University, DE
Abstract
TETRACOM is an ongoing EU FP7 Coordination Action with the ambition to boost small to medium scale academia-to-industry technology transfer in all domains of computing systems. The project primarily operates via competitive open calls for individual Technology Transfer Projects (TTPs). Each TTP performs a well-defined bilateral transfer activity between one European academic partner and one industry partner. TETRACOM coordinates all TTPs and provides technology transfer advice and co-funding. This paper describes TETRACOM's experimental concept and project structure. It summarizes preliminary lessons learned after more than two project years and successful management of 30+ individual TTPs.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00End of session
Coffee Break in Exhibition Area

7.3 Low Power Devices and Methods for Healthcare and Assisted Living

Date: Wednesday 16 March 2016
Time: 14:30 - 16:00
Location / Room: Konferenz 1

Chair:
José M. Moya, Technical University of Madrid, ES

Co-Chair:
Giovanni Ansaloni, University of Lugano, CH

This session addresses energy efficiency for ambient intelligence and healthcare. The first part focuses on systems for fall detection and indoor localization in the context of ambient assisted living. The second part is dedicated to methods for cardiovascular monitoring, including low-power real-time diagnosis and efficient communication.

TimeLabelPresentation Title
Authors
14:307.3.1A DIGITAL PROCESSOR ARCHITECTURE FOR COMBINED EEG/EMG FALLING RISK PREDICTION
Speaker:
Valerio Annese, Politecnico di Bari, IT
Authors:
Valerio F. Annese1, Sabino Loconte1, Marco Crepaldi2, Danilo Demarchi3 and Daniela De Venuto1
1Politecnico di Bari, IT; 2Center for Space Human Robotics (CSHR), Istituto Italiano di Tecnologia, IT; 3Politecnico di Torino, IT
Abstract
The brain signal anticipates voluntary movement with patterns that can be detected as early as 500 ms before the movement occurs. This paper presents a digital signal processing unit which implements a real-time algorithm for falling risk prediction. The system architecture is designed to operate with digitized data samples from 8 EMG (limbs) and 8 EEG (motor-cortex) channels and, by combining them, provides 1-bit outputs for the early detection of unintentional movement. The digital architecture is validated on an FPGA to determine resource utilization, related timing constraints and performance figures of a dedicated real-time ASIC implementation for wearable applications. The system occupies 85.95% ALMs, 43283 ALUTs, 73.0% registers, 9.9% block memory of an Altera Cyclone V FPGA for a processing latency lower than 1 ms. Outputs are available in 56 ms, within the time limit of 300 ms, enabling decision making for active control. Comparisons between Matlab (used as golden reference) and measured FPGA outputs show a very low residual numerical error of about 0.012% (worst case) despite the higher floating-point precision of Matlab simulations and losses due to mandatory dataset conversion for validation.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:007.3.2DISTRIBUTED-NEURON-NETWORK BASED MACHINE LEARNING ON SMART-GATEWAY NETWORK TOWARDS REAL-TIME INDOOR DATA ANALYTICS
Speaker:
Hantao Huang, Nanyang Technological University, SG
Authors:
Hantao Huang, Yuehua Cai and Hao Yu, Nanyang Technological University, SG
Abstract
Indoor data analytics is one typical example of ambient intelligence, with behavior or feature extraction from positioning, power, and lighting data. It can be utilized to help improve the comfort level of occupants in buildings and rooms. To address dynamic ambient change in a large-scale space, real-time and distributed data analytics is required on the sensor (or gateway) network, which however has limited computing resources. This paper proposes computationally efficient data analytics by distributed-neuron-network (DNN) based machine learning, with application to indoor positioning. It is based on an incremental L2-norm based solver for learning collected WiFi data at each gateway, and the results are further fused across all gateways in the network to determine the location. Experimental results show that with multiple distributed gateways running in parallel, the proposed algorithm can achieve 50x and 38x speedup in testing and training time respectively, with comparable positioning accuracy, when compared to the traditional support vector machine (SVM) method.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:307.3.3TOUCH-BASED SYSTEM FOR BEAT-TO-BEAT IMPEDANCE CARDIOGRAM ACQUISITION AND HEMODYNAMIC PARAMETERS ESTIMATION
Speaker:
Dionisije Sopic, École Polytechnique Fédérale de Lausanne (EPFL), CH
Authors:
Dionisije Sopic1, Srinivasan Murali2, Francisco Rincón2 and David Atienza1
1École Polytechnique Fédérale de Lausanne (EPFL), CH; 2SmartCardia Inc., Ltd, CH
Abstract
Among all cardiovascular diseases, congestive heart failure (CHF) has a very high rate of hospitalization and mortality. In order to prevent hospitalization, there is a strong need to identify patients at risk of a CHF event by estimating a set of relevant hemodynamic parameters that will allow physicians to detect its early onset. Today, one of the most popular non-invasive methods to obtain these parameters is through the acquisition of electrocardiogram (ECG) and impedance cardiogram (ICG) by using large hospital systems with electrodes placed on the chest and thorax region. In order to be useful in an ambulatory setting, it is important to obtain an ultra-low power wearable system for acquiring the ICG and ECG, and to detect CHF. In this paper, we present a touch-based ultra-low power device for real-time ICG and ECG signal acquisition and hemodynamic parameter estimation. We also propose methods for noise cancellation and for calculating the hemodynamic parameters. In addition, a comparative evaluation of susceptibility to different measuring positions is presented. Our proposed design is highly correlated with traditional systems (> 80%), yet works within very low power budgets, thus allowing long duration of operation of over four days on a single battery charge.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:457.3.4QUANTIFYING THE BENEFITS OF COMPRESSED SENSING ON A WBSN-BASED REAL-TIME BIOSIGNAL MONITOR
Speaker:
Daniele Bortolotti, Università di Bologna, IT
Authors:
Daniele Bortolotti1, Bojan Milosevic2, Andrea Bartolini3, Elisabetta Farella2 and Luca Benini3
1Università di Bologna, IT; 2Fondazione Bruno Kessler, IT; 3ETH Zurich, CH
Abstract
Technology scaling today enables the design of ultra-low power wearable biosensors for continuous vital signal monitoring or wellness applications. Wireless Body Sensor Networks (WBSN) integrate wearable sensing nodes for an accurate measurement of the desired physiological parameter, e.g. Electrocardiogram (ECG), and a personal gateway for the collection and processing of the data. The diffusion of smartphones enables their use as advanced personal gateways, with the ability to provide user interaction features, connectivity and real-time feedback to the user. Both the sensing node(s) and the smartphone are battery powered and resource-constrained devices, hence energy efficiency is one of the key design goals. In this work, we explore the use of compression techniques to improve the efficiency of a wireless ECG wearable monitor. In the presented system, the wearable node is used for bio-signal acquisition, pre-processing and compression, while a smartphone is used for real-time signal reconstruction. The system aims at medical-grade signal quality; the impact of lossy compression is tested on real signals acquired by the node and its effects are evaluated on system-level energy consumption. We analyze performance/energy trade-offs considering online data compression on the wearable device and real-time reconstruction on the smartphone. Our results show that Compressed Sensing pays off only when the SNR requirement is below 20 dB, due to the non-ideal sparsity of the ECG signal. We propose a hybrid compression scheme based on CS and under-quantization to address these limitations.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00IP3-8, 53ENERGY VS. RELIABILITY TRADE-OFFS EXPLORATION IN BIOMEDICAL ULTRA-LOW POWER DEVICES
Speaker:
Loris Duch, École Polytechnique Fédérale de Lausanne (EPFL), CH
Authors:
Loris Duch, Pablo Garcia del Valle, David Atienza, Shrikanth Ganapathy and Andreas Burg, École Polytechnique Fédérale de Lausanne (EPFL), CH
Abstract
State-of-the-art wearable devices such as embedded biomedical monitoring systems apply voltage scaling to lower their energy consumption as much as possible and achieve longer battery lifetimes. While embedded memories often rely on Error Correction Codes (ECC) for error protection, in this paper we explore how the characteristics of biomedical applications can be exploited to develop new techniques with lower power overhead. We then introduce the Dynamic eRror compEnsation And Masking (DREAM) technique, which provides partial memory protection with less area and power overhead than ECC. Different tradeoffs between the error correction ability of the techniques and their energy consumption are examined to conclude that, when properly applied, DREAM consumes 21% less energy than a traditional ECC with Single Error Correction and Double Error Detection (SEC/DED) capabilities.
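As background for the SEC/DED baseline mentioned in the abstract, the following minimal sketch computes the number of check bits that a textbook extended Hamming SEC/DED code adds to a memory word. It illustrates only the storage overhead of conventional ECC; it is not the DREAM technique, and the function name and the word widths are assumptions chosen for illustration.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits of an extended Hamming SEC/DED code for a data word."""
    r = 1
    # Single error correction needs the smallest r with 2^r >= data_bits + r + 1.
    while (1 << r) < data_bits + r + 1:
        r += 1
    # One additional overall parity bit provides double error detection.
    return r + 1


if __name__ == "__main__":
    for width in (16, 32, 64):  # illustrative word widths, not from the paper
        extra = secded_check_bits(width)
        overhead = 100 * extra / width
        print(f"{width}-bit word: {extra} check bits ({overhead:.1f}% storage overhead)")
```

The relative overhead grows as words get narrower, which is one reason lighter-weight protection can be attractive for small embedded memories.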

Download Paper (PDF; Only available from the DATE venue WiFi)
16:01IP3-9, 883A MACHINE LEARNING APPROACH FOR MEDICATION ADHERENCE MONITORING USING BODY-WORN SENSORS
Speaker:
Hassan Ghasemzadeh, Washington State University, US
Authors:
Niloofar Hezar Jaribi, Ramin Fallahzadeh and Hassan Ghasemzadeh, Washington State University, US
Abstract
One of the most important challenges in current healthcare systems is medication non-adherence, which has irrevocable outcomes. Although many technologies have been developed for medication adherence monitoring, the reliability and cost-effectiveness of these technologies are not well understood to date. This paper presents a medication adherence monitoring system by user-activity tracking based on wrist-band wearable sensors. We develop machine learning algorithms that track wrist motions in real-time and identify medication intake activities. We propose a novel data analysis pipeline to reliably detect medication adherence by examining single-wrist motions. Our system achieves an accuracy of 78.3% in adherence detection without need for medication pillboxes and with only one sensor worn on either of the wrists. The accuracy of our algorithm is only 7.9% lower than a system with two sensors that track motions of both wrists.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:02IP3-10, 190REQUIREMENTS-CENTRIC CLOSED-LOOP VALIDATION OF IMPLANTABLE CARDIAC DEVICES
Speaker:
Partha Roop, The University of Auckland, NZ
Authors:
Weiwei Ai, Nitish Patel and Partha Roop, The University of Auckland, NZ
Abstract
Implantable medical devices are recommended by physicians to sustain life while improving the overall quality of life of the patients. In spite of the rigorous testing, there have been numerous failures and associated recalls which suggest that completeness of the testing is elusive. We propose a new validation framework based on formal methods for real-time closed-loop validation of medical devices. The proposed approach includes a synchronous observer acting both as an automated oracle and also as a requirements coverage monitor. The observer combines an on-line testing adequacy evaluation module together with a heuristic learning module. This methodology was applied to validate a pacemaker over a virtual heart model. A subset of the requirements was used to test its efficacy. The results show that the proposed methodology can, in real-time, evaluate the test adequacy and hence guide the on-line test case generation to maximize the requirements coverage.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00End of session
Coffee Break in Exhibition Area

7.4 System-Level Synthesis

Date: Wednesday 16 March 2016
Time: 14:30 - 16:00
Location / Room: Konferenz 2

Chair:
Cathal McCabe, Xilinx, Inc. Ireland, IE

Co-Chair:
Yuichi Nakamura, NEC Japan, JP

This session is centered around topics in System-Level Synthesis, with specific focus on hardware threads, composable templates, and evaluation of fixed-point systems. The session concludes with a short IP presentation on asynchronous circuit synthesis for cryptographic applications.

TimeLabelPresentation Title
Authors
14:307.4.1SYSTEM LEVEL SYNTHESIS FOR VIRTUAL MEMORY ENABLED HARDWARE THREADS
Speaker:
Nicolas Estibals, IRISA, FR
Authors:
Nicolas Estibals1, Gaël Deest2, Ali Hassan El Moussawi2 and Steven Derrien3
1University of Rennes 1/IRISA, FR; 2University of Rennes 1, FR; 3IRISA, FR
Abstract
Newly introduced ARM-based FPGA platforms enable transparent hardware/software multithreading by providing cache-coherent memory accesses to hardware accelerators. However, the lack of support for virtual memory on the accelerator side impedes the acceleration of legacy applications. To address this problem, we propose a fully automated High Level Synthesis based source-to-source flow to efficiently support virtual memory in hardware accelerators.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:007.4.2COMPOSABLE, PARAMETERIZABLE TEMPLATES FOR HIGH-LEVEL SYNTHESIS
Speaker:
Dajung Lee, University of California, San Diego, US
Authors:
Janarbek Matai, Dajung Lee, Alric Althoff and Ryan Kastner, University of California, San Diego, US
Abstract
High-level synthesis tools aim to make FPGA programming easier by raising the level of programming abstraction. Yet in order to get an efficient hardware design from HLS tools, the designer must know how to write HLS code that results in an efficient low level hardware architecture. Unfortunately, this requires substantial hardware knowledge, which limits wide adoption of HLS tools outside of hardware designers. In this work, we develop an approach based upon parameterizable templates that can be composed using common data access patterns. This creates a methodology for efficient hardware implementations. Our results demonstrate that a small number of optimized templates can be hierarchically composed to develop highly optimized hardware implementations for large applications.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:307.4.3LEVERAGING POWER SPECTRAL DENSITY FOR SCALABLE SYSTEM-LEVEL ACCURACY EVALUATION
Speaker:
Benjamin Barrois, University of Rennes, INRIA, FR
Authors:
Benjamin Barrois1, Karthick Parashar2 and Olivier Sentieys3
1University of Rennes, INRIA, FR; 2IMEC, BE; 3INRIA, FR
Abstract
The choice of fixed-point word-lengths critically impacts system performance by affecting the quality of computation, its energy, speed and area. Making a good choice of fixed-point word-length generally requires solving an NP-hard problem by exploring a vast search space. Therefore, the entire fixed-point refinement process becomes critically dependent on evaluating the effects of accuracy degradation. In this paper, a novel technique for the system-level evaluation of fixed-point systems, which is more scalable and yields better accuracy, is proposed. This technique makes use of the information hidden in the power spectral density of the quantization noise. It is found to be very effective in systems consisting of more than one frequency-sensitive component. Compared to state-of-the-art hierarchical methods that are agnostic of the hidden information in the quantization noise spectrum, we show that the proposed technique is 5X to 500X more accurate on some representative signal processing kernels.
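To make the notion of quantization noise concrete, the sketch below compares the measured error power of rounding to a fixed-point grid against the classical white-noise model, in which a rounding step q contributes a total noise power of q²/12. This is only the standard single-quantizer model, not the PSD-based method of the paper; the function names, the uniform test signal and the sample count are assumptions made for illustration.

```python
import random


def quantize(x: float, frac_bits: int) -> float:
    """Round x to the nearest multiple of 2**-frac_bits (fixed-point rounding)."""
    step = 2.0 ** -frac_bits
    return round(x / step) * step


def measured_noise_power(frac_bits: int, n: int = 100_000, seed: int = 0) -> float:
    """Mean squared rounding error over n uniformly distributed samples in [-1, 1]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        err = quantize(x, frac_bits) - x
        total += err * err
    return total / n


def model_noise_power(frac_bits: int) -> float:
    """White-noise model: rounding with step q has total noise power q^2 / 12."""
    step = 2.0 ** -frac_bits
    return step * step / 12.0


if __name__ == "__main__":
    for bits in (4, 8, 12):
        print(bits, measured_noise_power(bits), model_noise_power(bits))
```

The model gives only the total power of a single noise source. How that power is distributed over frequency after passing through the system is exactly the information a spectrum-agnostic evaluation discards.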

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00IP3-11, 132LOW NORMALIZED ENERGY DERIVATION ASYNCHRONOUS CIRCUIT SYNTHESIS FLOW THROUGH FORK-JOIN SLACK MATCHING FOR CRYPTOGRAPHIC APPLICATIONS
Speaker:
Nan Liu, Nanyang Technological University, SG
Authors:
Nan Liu, Kwen-Siong Chong, Weng-Geng Ho, Bah-Hwee Gwee and Joseph S. Chang, Nanyang Technological University, SG
Abstract
In this paper, an automatic synthesis flow of asynchronous (async) Quasi-Delay-Insensitive (QDI) circuits for cryptographic applications is presented. The synthesis flow accepts Verilog netlists as primary inputs, in part leverages commercial electronic design automation tools for synthesis and verification, and relies heavily on the proposed translation processes for async netlist conversion and optimization. In particular, a three-step synchronous-to-asynchronous-direct-translation (SADT) process is proposed. The first step is to translate a Verilog netlist into a direct circuit graph, allowing us to model QDI pipelines for performance analysis based on the same netlist function. Second, graph coarsening in combination with dynamic programming is adopted to analyze the fork-join slack matching of the QDI pipelines, aiming to balance the pipeline depths in any fork-join pipelines to optimize the system performance, and to reduce energy variations of the overall pipelines to counter power-analysis attacks. The last step is to insert async local controllers/gates to ensure that the async circuits are consistent with the QDI protocol, hence enhancing their timing robustness to accommodate Process-Voltage-Temperature variations. We show that, on the basis of simulations on the ISCAS benchmark circuits, the QDI circuits based on our proposed automatic synthesis flow are on average 20% faster and feature 30% less normalized energy derivations than un-optimized circuits.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00End of session
Coffee Break in Exhibition Area

7.5 Emerging Memory Architectures

Date: Wednesday 16 March 2016
Time: 14:30 - 16:00
Location / Room: Konferenz 3

Chair:
Costin Anghel, ISEP, FR

Co-Chair:
Fabian Oboril, Karlsruhe Institute of Technology, DE

The first paper presents a method to utilize the variations in RRAM access latency due to IR drop in a given array. The second paper exploits the spatial and temporal locality of cache access, and proposes an ECC scheme wherein write operations with potentially different error rates are mapped to regions with different ECC strengths. The third paper proposes a write scheme for phase change memory to minimize the number of write units.

TimeLabelPresentation Title
Authors
14:307.5.1LEADER: ACCELERATING RERAM-BASED MAIN MEMORY BY LEVERAGING ACCESS LATENCY DISCREPANCY IN CROSSBAR ARRAYS
Speaker:
Hang Zhang, National University of Defense Technology, CN
Authors:
Hang Zhang, Nong Xiao, Fang Liu and Zhiguang Chen, National University of Defense Technology, CN
Abstract
Emerging Resistive Memory (ReRAM) technology is a promising candidate to replace DRAM due to its low leakage power consumption, good scalability, and high density. By employing crossbar structures, the density of ReRAM can be further improved for capacity benefits. However, such a structure also causes an IR drop issue due to wire resistance and sneak currents, which leads to an access latency discrepancy in ReRAM memory banks. Existing designs conservatively utilize the worst-case latency of ReRAM arrays, and thus fail to exploit the potential of the fast access speed of ReRAM, resulting in sub-optimal performance. In this work, we present an asymmetric ReRAM memory design, which separates a crossbar array into multiple logical regions according to their access latency, and further groups logical regions across different crossbars into virtual regions. Based on the observation of access hotspots inside memory banks, we design a table structure to remap memory requests to different virtual regions with non-uniform access latency, so as to match these access hotspots with the underlying asymmetric bank design. We then introduce both static mapping and dynamic mapping schemes to prioritize memory requests from critical applications to the fast regions for better performance. Experimental results show that our design can improve the 4-core system performance by 13.3% and reduce the memory latency by 21.6% on average for a ReRAM-based memory system across memory intensive applications.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:00 7.5.2 SLIDING BASKET: AN ADAPTIVE ECC SCHEME FOR RUNTIME WRITE FAILURE SUPPRESSION OF STT-RAM CACHE
Speaker:
Yiran Chen, University of Pittsburgh, US
Authors:
Xue Wang1, Mengjie Mao1, Wujie Wen2, Enes Eken1, Hai Li1 and Yiran Chen1
1University of Pittsburgh, US; 2Florida International University, US
Abstract
Write reliability is one of the major challenges in the design of spin-transfer torque random access memory (STT-RAM) caches. To ensure design quality, an error correction code (ECC) scheme is usually adopted in STT-RAM caches. However, it incurs significant hardware overhead. In view of the dynamic error correcting requirements, we propose in this work Sliding Basket, an adaptive ECC scheme that suppresses the runtime write failures of STT-RAM caches with minimized hardware cost. Our simulation results show that, compared to STT-RAM caches with a conventional ECC scheme, applying Sliding Basket can achieve up to 80.2% saving in ECC bit overhead, comparable write reliability and even better system performance.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30 7.5.3 EXPLOITING MORE PARALLELISM FROM WRITE OPERATIONS ON PCM
Speaker:
Zheng Li, Huazhong University of Science and Technology, CN
Authors:
Zheng Li, Fang Wang, Yu Hua, Wei Tong, Jingning Liu, Yu Chen and Dan Feng, Huazhong University of Science and Technology, CN
Abstract
The number of bits that can be written concurrently to PCM, called the write unit, is restricted due to heavy write energy consumption, so many serially executed write units are needed to finish a cache line service, which results in long write time and poor write performance of PCM. To address this problem, we propose a novel PCM write scheme called IZV. The key idea behind IZV is to reduce the number of write unit executions in a cache line service. The IZV design includes sFPC (simplified FPC data coding), RW (Reordering Write operations) and WP (Write Parallelism circuits). By means of sFPC, RW and WP, the zero parts of write units can be indicated with predefined prefix bits and the residues can be reordered and written concurrently under power constraints. IZV is highly effective and efficient in improving the performance and reducing the energy consumption. Experimental results on 4-core PARSEC 2.0 workloads show that IZV improves performance by 32.5% and reduces energy by 48% and latency by 44% compared with the conventional write scheme. When combined with partial data flip, the IZV variant (IZV-PF) yields 12% performance improvement, 23% energy saving and 22% latency reduction compared with the state-of-the-art FNW.
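The effect of skipping zero parts and packing the residues can be pictured with a small counting sketch. It is not the IZV design itself: the per-slot bit budget standing in for the power constraint, the first-fit-decreasing packing, and the name `count_write_slots` are all assumptions made for illustration.

```python
def count_write_slots(unit_bits, budget_bits):
    """Compare serial writes with a zero-skipping, packed schedule.

    unit_bits[i] is the number of bits of write unit i that still need to
    be written (0 for an all-zero unit). budget_bits is a hypothetical
    per-slot limit standing in for the power constraint.
    """
    serial_slots = len(unit_bits)  # conventional scheme: one slot per unit
    # Drop all-zero units and pack the rest, largest residue first.
    residues = sorted((b for b in unit_bits if b > 0), reverse=True)
    free_per_slot = []  # remaining budget of each open slot
    for bits in residues:
        if bits > budget_bits:
            raise ValueError("residue exceeds the per-slot budget")
        for i, free in enumerate(free_per_slot):
            if bits <= free:
                free_per_slot[i] = free - bits
                break
        else:
            free_per_slot.append(budget_bits - bits)
    return serial_slots, len(free_per_slot)

# Eight write units of a hypothetical cache line, four of them all-zero.
serial, packed = count_write_slots([0, 12, 0, 4, 16, 0, 8, 0], budget_bits=16)
```

With these made-up numbers the serial scheme needs 8 slots while the packed schedule needs 3, which is the kind of reduction in write unit executions the abstract describes.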

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00 End of session
Coffee Break in Exhibition Area

7.6 Statistical and Symbolic Techniques for the Analysis and Testing of Embedded Software

Date: Wednesday 16 March 2016
Time: 14:30 - 16:00
Location / Room: Konferenz 4

Chair:
Jian-Jia Chen, Technische Universität Dortmund, DE

Co-Chair:
Petru Eles, Linköping University, SE

This session presents new approaches that enable efficient analysis and testing of embedded software. The first paper presents an interesting dynamic partitioning strategy to reduce the complexity of symbolic execution based software testing. An extension to UML activity diagrams towards stochastic modeling is proposed in the second paper; this allows for quantitative reasoning based on statistical model checking techniques. The final paper presents a new testing methodology that combines symbolic decision procedures with statistical hypothesis testing to study the correctness of intelligent embedded systems.

Time Label Presentation Title
Authors
14:30 7.6.1 DYNAMIC PARTITIONING STRATEGY TO ENHANCE SYMBOLIC EXECUTION
Speaker:
Brendan Marcellino, Virginia Tech, US
Authors:
Brendan Marcellino and Michael Hsiao, Virginia Tech, US
Abstract
Software testing is a fundamental part of the software development process. In the context of embedded-software applications, testing can find defects which cause unprecedented risks. The path explosion problem often requires considering an extremely large number of paths in order to reach a specific target. Symbolic execution can reduce this cost by using symbolic values and heuristic exploration strategies. Although various exploration strategies have been proposed in the past, the number of SMT solver calls for reaching a target is still large, resulting in long execution times for programs containing many paths. In this paper, we present a dynamic partitioning strategy to mitigate this problem, consequently reducing unnecessary SMT solver calls as well. Using this strategy on SSA-applied code, the code sections are analyzed in a nonconsecutive order guided by data dependency metrics within the sections. Experimental results show that our dynamic strategy can achieve significant speedups by reducing the number of unnecessary solver calls in large programs. More than 1000x speedup can be achieved in large programs over conflict-driven learning techniques.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:00 7.6.2 QUANTITATIVE TIMING ANALYSIS OF UML ACTIVITY DIAGRAMS USING STATISTICAL MODEL CHECKING
Speaker:
Mingsong Chen, East China Normal University, CN
Authors:
Fan Gu1, Xinqian Zhang1, Mingsong Chen1, Daniel Grosse2 and Rolf Drechsler2
1East China Normal University, CN; 2University of Bremen, DE
Abstract
Unified Modeling Language (UML) activity diagrams are widely used in modeling the dynamic aspects of system designs. However, due to frequent interactions between systems and uncertain external environments, the current version of UML activity diagrams cannot be used to accurately capture and quantify the overall timing behaviors of complex systems. To address this issue, this paper extends the UML activity diagrams to enable the stochastic modeling of user inputs and action executions, which strongly affect the overall timing behaviors of systems. Based on the statistical model checker UPPAAL-SMC, this paper proposes an automated framework that can perform quantitative reasoning under various functional and non-functional queries. Experimental results demonstrate the effectiveness of our proposed approach.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30 7.6.3 INTEGRATING SYMBOLIC AND STATISTICAL METHODS FOR TESTING INTELLIGENT SYSTEMS: APPLICATIONS TO MACHINE LEARNING AND COMPUTER VISION
Speaker:
Sumit Kumar Jha, University of Central Florida, US
Authors:
Arvind Ramanathan1, Laura Pullum1, Faraz Hussain2, Dwaipayan Chakraborty2 and Sumit Kumar Jha2
1Oak Ridge National Laboratory, US; 2University of Central Florida, US
Abstract
Embedded intelligent systems ranging from tiny implantable biomedical devices to large swarms of autonomous unmanned aerial systems are becoming pervasive in our daily lives. While we depend on the flawless functioning of such intelligent systems, and often take their behavioral correctness and safety for granted, it is notoriously difficult to generate test cases that expose subtle errors in the implementations of machine learning algorithms. Hence, the validation of intelligent systems is usually achieved by studying their behavior on representative data sets, using methods such as cross-validation and bootstrapping. In this paper, we present a new testing methodology for studying the correctness of intelligent systems. Our approach uses symbolic decision procedures coupled with statistical hypothesis testing to validate machine learning algorithms. We show how we have employed our technique to successfully identify subtle bugs (such as bit flips) in implementations of the k-means algorithm. Such errors are not readily detected by standard validation methods such as randomized testing. We also use our algorithm to analyze the robustness of a human detection algorithm built using the OpenCV open-source computer vision library. We show that the human detection implementation can fail to detect humans in perturbed video frames even when the perturbations are so small that the corresponding frames look identical to the naked eye.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00 End of session
Coffee Break in Exhibition Area

7.7 Aging Mitigation to Improve System Robustness

Date: Wednesday 16 March 2016
Time: 14:30 - 16:00
Location / Room: Konferenz 5

Chair:
Maria Michael, University of Cyprus, CY

Co-Chair:
Carles Hernandez, Barcelona Supercomputing Center, ES

This session presents methodologies for monitoring aging effects in FPGAs and task mapping strategies for prolonging lifetime in robust multi/many-core systems.

Time Label Presentation Title
Authors
14:30 7.7.1 PATH SELECTION AND SENSOR INSERTION FLOW FOR AGE MONITORING IN FPGAS
Speaker:
Mohammad Ebrahimi, University of Tehran, IR
Authors:
Mohammad Ebrahimi1, Zana Ghaderi2, Eli Bozorgzadeh2 and Zainalabedin Navabi1
1University of Tehran, IR; 2University of California, Irvine, US
Abstract
This paper presents a two-step aging-aware methodology for selecting Representative Critical Paths (RCPs) from a large number of Critical Paths (CPs) in programmable logic devices. First, CPs are nominated by filtering on delay, temperature, and a lexicographic function of duty cycle and switching activity, which are the major causes of the Bias Temperature Instability (BTI) and Hot Carrier Injection (HCI) aging mechanisms. Second, RCPs are selected based on the Fan-out (FO) and physical location of Logic Blocks (LBs) along a CP, to decrease aging propagation and improve sensor distribution fairness, respectively. We then present a sensor insertion algorithm that is used during design placement to avoid sensor inaccuracy. The implementation steps of sensor insertion are performed automatically with limited human interaction. The higher aging rate of RCPs compared with unselected CPs in our experiments demonstrates the effectiveness of the proposed methodology.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:00 7.7.2 DESIGN AND EVALUATION OF RELIABILITY-ORIENTED TASK RE-MAPPING IN MPSOCS USING TIME-SERIES ANALYSIS OF INTERMITTENT FAULTS
Speaker:
Siva Satyendra Sahoo, National University of Singapore, SG
Authors:
Siva Satyendra Sahoo1, Akash Kumar2 and Bharadwaj Veeravalli1
1National University of Singapore, SG; 2Technische Universität Dresden, DE
Abstract
A large number of hardware faults are being caused by an increasing number of manufacturing defects and physical interactions during operation. This poses major challenges for the design and testing of modern Multiprocessor System-on-Chips (MPSoCs). Intermittent faults constitute a major part of hardware faults and their fault rates can be used as an indicator of the wear-out in a Processing Element (PE). We propose a run-time task re-mapping method that uses this information to improve the useful lifetime of MPSoCs. We also propose a scenario-aware system-level fault injection technique for intermittent faults to evaluate system-level design techniques in MPSoCs. Our performance results conclusively show that our strategy significantly scales on reliability metrics with respect to number of PEs. Specifically, we show that our method can achieve an increase in lifetime of up to 16% and tolerate up to 30% more faults than state-of-the-art techniques.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30 7.7.3 LIFETIME-AWARE LOAD DISTRIBUTION POLICIES IN MULTI-CORE SYSTEMS: AN IN-DEPTH ANALYSIS
Speaker:
Antonio Miele, Politecnico di Milano, IT
Authors:
Cristiana Bolchini, Luca Cassano and Antonio Miele, Politecnico di Milano, IT
Abstract
Dynamic Reliability Management solutions are often adopted in multi-core systems to mitigate aging and wear-out effects, by opportunely distributing the workload on the available cores. The efficiency of such solutions is generally evaluated by considering only the occurrence of the first core failure due to the computational complexity. In this paper we propose an in-depth analysis of such approaches by considering the occurrence of multiple subsequent core failures, thus offering a more precise estimation of the lifetime reliability. In particular, we analyzed two classical load distribution approaches: a load balancing strategy versus a strategy based on spare resources. Experimental results show benefits and limitations of the considered solutions in terms of lifetime reliability while fulfilling system performance.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00 IP3-12, 219 A LIFETIME-AWARE RUNTIME MAPPING APPROACH FOR MANYCORE SYSTEMS IN THE DARK SILICON ERA
Speaker:
Mohammad-Hashem Haghbayan, University of Turku, FI
Authors:
Mohammad-Hashem Haghbayan1, Antonio Miele2, Amir-Mohammad Rahmani3, Pasi Liljeberg1 and Hannu Tenhunen3
1University of Turku, FI; 2Politecnico di Milano, IT; 3KTH Royal Institute of Technology and University of Turku, FI
Abstract
In this paper, we propose a novel lifetime reliability-aware resource management approach for many-core architectures. The approach is based on a hierarchical architecture, composed of a long-term runtime reliability analysis unit and a short-term runtime mapping unit. The former periodically analyses the aging status of the various processing units with respect to a target value specified by the designer, and performs recovery actions on highly stressed cores. The calculated reliability metrics are utilized in runtime mapping of the newly arrived applications to maximize the performance of the system while fulfilling reliability requirements and the available power budget. Our extensive experimental results reveal that the proposed reliability-aware approach can efficiently select the processing cores to be used over time in order to enhance the reliability at the end of the operational life (up to 62%) while offering a performance level comparable to that of the state-of-the-art runtime mapping approach.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00 End of session
Coffee Break in Exhibition Area

7.8 Presentations from IoT-Campus (II): IoT Survival Guide and Big Data Challenges

Date: Wednesday 16 March 2016
Time: 14:30 - 16:00
Location / Room: Exhibition Theatre

Organiser:
Hans-Jürgen Brand, IDT/ZMDI, DE

This session features presentations given by exhibitors from the Campus on IoT and Secure Systems and from the projects booths, with a special focus on how IoT will change our life and how to design IoT devices. A second session (4.8) will highlight ASIC and sensor solutions for IoT applications. Attendees are invited to also visit the campus and projects booths for further details and discussions.

Time Label Presentation Title
Authors
14:30 7.8.1 DIGITAL TRANSFORMATION: THE SURVIVAL GUIDE FOR THE AGE OF BIG DATA, INDUSTRY 4.0 AND THE INTERNET OF THINGS
Speaker:
Christoph Kögler, T-Systems Multimedia Solutions GmbH, DE
15:00 7.8.2 DESIGNING IOT DEVICES WITH X-FAB'S OPEN-PLATFORM FOUNDRY TECHNOLOGIES
Speaker:
Ulrich Bretthauer, X-FAB, DE
15:30 7.8.3 BIG DATA CHALLENGES IN HIGH ENERGY PHYSICS EXPERIMENTS: THE ATLAS (CERN) FAST TRACKER APPROACH
Speaker:
Calliope-Louisa Sotiropoulou, Università di Pisa and INFN Pisa, IT
Abstract

We live in the era of "Big Data" problems. Massive amounts of data are produced and captured, data that require significant amounts of filtering to be processed in a realistically useful form. An excellent example of a "Big Data" problem is the data processing flow in High Energy Physics experiments, in our case the ATLAS detector at CERN. In the Large Hadron Collider (LHC), 40 million collisions of bunches of protons take place every second, which is about 15 trillion collisions per year. For the ATLAS detector alone, 1 Mbyte of data is produced for every collision, or 2000 Tbytes of data per year. Therefore, what is needed is a very efficient real-time trigger system to filter the collisions (events) and identify the ones that contain "interesting" physics for processing.

One of the upgrades of the ATLAS Trigger system is the Fast TracKer real-time pattern matching machine, able to reconstruct the tracks of the particles in the inner silicon detector of the ATLAS experiment in less than 100 μsec. To achieve this performance, the Fast TracKer is made of 8 different types of custom designed boards with 8000 ASICs and 2000 FPGAs. Pattern matching and reconstruction is a common data processing problem, and therefore the hardware and algorithms developed for the Fast TracKer can be exploited in applications outside High Energy Physics. This is one of the targets of the Marie Curie IAPP Fast TracKer project: to explore the potential of the Fast TracKer hardware in applications that are beyond its initial design purpose (e.g. biomedical applications, cognitive image processing and security applications).
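The need for a real-time trigger can be made concrete with a back-of-the-envelope calculation using only the figures quoted in the first paragraph of this abstract. The sketch assumes decimal units (1 Mbyte = 10**6 bytes, 1 Tbyte = 10**12 bytes) and takes the quoted numbers at face value; the variable names are illustrative.

```python
# Figures quoted in the abstract.
COLLISIONS_PER_SECOND = 40_000_000
BYTES_PER_COLLISION = 10**6              # 1 Mbyte (decimal units assumed)
YEARLY_VOLUME_BYTES = 2000 * 10**12      # 2000 Tbytes (decimal units assumed)

# Raw rate if every collision produced 1 Mbyte of data, in Tbytes per second.
raw_rate_tbytes_per_s = COLLISIONS_PER_SECOND * BYTES_PER_COLLISION / 10**12

# Number of 1-Mbyte events that fit in the quoted yearly data volume.
events_in_yearly_volume = YEARLY_VOLUME_BYTES // BYTES_PER_COLLISION
```

Under these assumptions the unfiltered rate would be 40 Tbytes per second, while the quoted yearly volume holds about 2 billion events, so only a small fraction of collisions can be kept.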

16:00 End of session
Coffee Break in Exhibition Area

IP3 Interactive Presentations

Date: Wednesday 16 March 2016
Time: 16:00 - 16:30
Location / Room: Conference Level, Foyer

Interactive Presentations run simultaneously during a 30-minute slot. A poster associated with the IP paper is on display throughout the afternoon. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session, prior to the actual Interactive Presentation. Moreover, one "Best Interactive Presentation Award" will be given.

Label Presentation Title
Authors
IP3-1 A FLEXIBLE INEXACT TMR TECHNIQUE FOR SRAM-BASED FPGAS
Speaker:
Akash Kumar, Technische Universität Dresden, DE
Authors:
Shyamsundar Venkataraman1, Rui Santos1 and Akash Kumar2
1National University of Singapore, SG; 2Technische Universität Dresden, DE
Abstract
Single Event Upsets (SEUs) inadvertently change the logic memory and thereby the configuration of Field Programmable Gate Arrays (FPGAs), leading to their incorrect functioning. Traditional methods to tolerate such faults include Triple Modular Redundancy (TMR). However, such a method has a high overhead in terms of power and area. Moreover, the inexact methods used in ASICs to overcome this problem are not efficient when applied in FPGAs. Therefore, this paper proposes a novel heuristic-based technique to tolerate faults in SRAM-based FPGAs by using inexact modules in conjunction with TMR, thus reducing the area and power overhead of the design. Experiments run on various MCNC benchmark circuits show the accuracy of the proposed technique. They also show that the design solutions found through this technique differ by only 0.52% on average from the optimal ones and that savings of up to 84.4% in terms of computation time can be reached on average.
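For readers unfamiliar with TMR, the sketch below shows the classic bitwise 2-of-3 majority voter that masks a single faulty replica. It illustrates conventional TMR only, not the inexact technique proposed in the paper; the function name and the bit patterns are made up for the example.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three replicated module outputs."""
    return (a & b) | (a & c) | (b & c)

golden = 0b1011_0010
upset = golden ^ 0b0000_1000  # one replica with a single flipped bit
voted = majority_vote(golden, upset, golden)
```

Because two of the three replicas still agree on every bit, `voted` equals `golden` and the upset is masked.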

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-2 ACCURATE VERIFICATION OF RC POWER GRIDS
Speaker:
Mohammad Fawaz, University of Toronto, CA
Authors:
Mohammad Fawaz and Farid N. Najm, University of Toronto, CA
Abstract
The power distribution network (PDN) of an integrated circuit (IC) must undergo various checks throughout the design flow, in order to guarantee that the voltage fluctuations are within certain user-specified safety thresholds. Vectorless verification of the PDN is one approach for verification that requires little information about the on-die logic. This verification problem has been studied extensively over the past few years and has been generally solved by first discretizing time using a particular user-defined time-step. We investigate the effect of this time-step on the quality of the solutions produced (both exact and estimates). We also propose an efficient method to specify the time-step in a way to minimize the errors introduced by the voltage drop estimates.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-3 SECURITY ANALYSIS OF CYBER-PHYSICAL SYSTEMS ILLUSTRATED WITH AUTOMOTIVE CASE STUDY
Speaker:
Viacheslav Izosimov, KTH Royal Institute of Technology, SE
Authors:
Viacheslav Izosimov1, Alexandros Asvestopoulos2, Oscar Blomkvist2 and Martin Törngren3
1Semcon, SE; 2Scania CV, SE; 3KTH Royal Institute of Technology, SE
Abstract
We present a method for systematic consideration of security attributes in development of cyber-physical systems. We evaluate our method in development of commercial vehicles that were so far unreasonably excluded from automotive security studies (despite the great importance of commercial vehicles for the society). We have conducted analysis of a known zero-cost non-physical attack, fine-tuned to our commercial vehicle (a truck), and considered countermeasures within the development flow.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-4 ONLINE HEURISTIC FOR THE MULTI-OBJECTIVE GENERALIZED TRAVELING SALESMAN PROBLEM
Speaker:
Joost van Pinxten, Eindhoven University of Technology, NL
Authors:
Joost van Pinxten1, Marc Geilen1, Twan Basten1, Umar Waqas1 and Lou Somers2
1Eindhoven University of Technology, NL; 2Océ Technologies, NL
Abstract
Today's manufacturing systems are typically complex cyber-physical systems where the physical and control aspects interact with the scheduling decisions. Optimizing such facilities requires ordering jobs and configuring the manufacturing system for each job. This optimization problem can be described as a Multi-Objective Generalized TSP where conflicting objectives lead to a trade-off space. This is the first work to address this TSP variant, introducing a compositional heuristic suitable for online application.
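The trade-off space mentioned in this abstract is commonly represented as a set of Pareto-optimal solutions. The sketch below filters candidate solutions down to that set, assuming every objective is to be minimized; it is not the paper's compositional heuristic, and the two objective values per candidate are invented for illustration.

```python
def pareto_front(points):
    """Return the points not dominated by any other (all objectives minimized)."""
    def dominates(p, q):
        # p dominates q if it is no worse in every objective and not identical.
        return all(a <= b for a, b in zip(p, q)) and p != q
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (objective 1, objective 2) values of five candidate tours.
candidates = [(10, 5), (8, 7), (12, 4), (9, 9), (8, 6)]
front = pareto_front(candidates)
```

Here `(8, 7)` and `(9, 9)` are dominated and drop out, leaving three candidates that each represent a different trade-off.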

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-5 TOWARDS LOW OVERHEAD CONTROL FLOW CHECKING USING REGULAR STRUCTURED CONTROL
Speaker:
Zhiqi Zhu, The University of Texas at Dallas, US
Authors:
Zhiqi Zhu and Joseph Callenes-Sloan, The University of Texas at Dallas, US
Abstract
With process scaling and the adoption of post-CMOS technologies, reliability has been brought to the forefront of modern computer system design. Among the different ways that hardware faults can manifest in a system, errors related to the control flow of a program tend to be the most difficult to handle when ensuring reliable computing. Errors in the sequencing of instructions executed are usually catastrophic, resulting in system hangs, crashes, and/or corrupted data. For this reason, conventional approaches rely on some form of general redundancy for detecting or recovering from a control flow error. Due to the power constraints of emerging systems, however, these types of conservative approaches are quickly becoming infeasible. Control Flow Checking by Software Signatures (CFCSS) is a software-based technique for detecting control flow errors [1] that uses assigned signatures rather than general redundancy. Unfortunately, the performance overhead for CFCSS can still be as high as 80%-90% for many applications. In this paper, we propose a novel method for reducing the overhead of control flow checking by exploiting the regular control structure found in many applications. Specifically, we observe that the alternating sequence of conditional and unconditional based control allows for the full control signatures to be computed at alternating basic blocks. Based on experimental results of the proposed approach, we observe that the overheads of the traditional methods are reduced on average by 25.9%.
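The basic signature check behind CFCSS can be sketched as follows for the simple case in which every basic block has a single predecessor. This illustrates the baseline technique only, not the reduced-overhead method proposed in the paper; the block names, signature values, and the function `run_with_cfcss` are hypothetical, and blocks with several predecessors would need an additional adjusting signature that is omitted here.

```python
def run_with_cfcss(path, signatures, pred):
    """Check a sequence of executed blocks with CFCSS-style signatures.

    signatures[b] is the signature assigned to block b and pred[b] its single
    legal predecessor (None for the entry block). Returns the index in `path`
    of the first detected illegal transfer, or None if the path is legal.
    """
    g = signatures[path[0]]  # runtime signature register, set at the entry block
    for i, block in enumerate(path[1:], start=1):
        # Signature difference, computed at compile time in a real system.
        d = signatures[pred[block]] ^ signatures[block]
        g ^= d
        if g != signatures[block]:
            return i  # control arrived from a block other than pred[block]
    return None

sigs = {"A": 0b0001, "B": 0b0010, "C": 0b0100, "D": 0b1000}
pred = {"A": None, "B": "A", "C": "B", "D": "C"}

legal = run_with_cfcss(["A", "B", "C", "D"], sigs, pred)
illegal = run_with_cfcss(["A", "C", "D"], sigs, pred)  # erroneous jump A -> C
```

The legal path passes every check, while the erroneous jump from A to C leaves the register holding a value that does not match C's signature and is flagged at index 1.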

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-6 EMULATION-BASED HIERARCHICAL FAULT-INJECTION FRAMEWORK FOR COARSE-TO-FINE VULNERABILITY ANALYSIS OF HARDWARE-ACCELERATED APPROXIMATE ALGORITHMS
Speaker:
Theocharis Theocharides, University of Cyprus, CY
Authors:
Ioannis Chadjiminas, Ioannis Savva, Christos Kyrkou, Maria K. Michael and Theocharis Theocharides, University of Cyprus, CY
Abstract
This paper proposes a hierarchical fault injection emulation framework tailored to the structure of complex and large application-specific circuits, that performs vulnerability analysis of the system for single event upsets (SEUs) at different design granularities in real-time. In particular, the framework allows for efficient probabilistic modelling of the SEU impact, making it particularly applicable for hardware-accelerated approximate applications such as multimedia, computer vision and image/signal processing, due to its high processing speed and real-time capabilities. The framework is emulated on an FPGA-based platform and evaluated using a depth computation kernel, both in standalone manner as well as within a robotic obstacle avoidance application.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-7 TECHNOLOGY TRANSFER IN COMPUTING SYSTEMS: THE TETRACOM APPROACH
Speaker and Author:
Rainer Leupers, RWTH Aachen University, DE
Abstract
TETRACOM is an ongoing EU FP7 Coordination Action with the ambition to boost small to medium scale academia-to-industry technology transfer in all domains of computing systems. The project primarily operates via competitive open calls for individual Technology Transfer Projects (TTPs). Each TTP performs a well-defined bilateral transfer activity between one European academic partner and one industry partner. TETRACOM coordinates all TTPs and provides technology transfer advice and co-funding. This paper describes TETRACOM's experimental concept and project structure. It summarizes preliminary lessons learned after more than two project years and successful management of 30+ individual TTPs.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-8 ENERGY VS. RELIABILITY TRADE-OFFS EXPLORATION IN BIOMEDICAL ULTRA-LOW POWER DEVICES
Speaker:
Loris Duch, École Polytechnique Fédérale de Lausanne (EPFL), CH
Authors:
Loris Duch, Pablo Garcia del Valle, David Atienza, Shrikanth Ganapathy and Andreas Burg, École Polytechnique Fédérale de Lausanne (EPFL), CH
Abstract
State-of-the-art wearable devices such as embedded biomedical monitoring systems apply voltage scaling to lower their energy consumption as much as possible and achieve longer battery lifetimes. While embedded memories often rely on Error Correction Codes (ECC) for error protection, in this paper we explore how the characteristics of biomedical applications can be exploited to develop new techniques with lower power overhead. We then introduce the Dynamic eRror compEnsation And Masking (DREAM) technique, which provides partial memory protection with lower area and power overheads than ECC. Different tradeoffs between the error correction ability of the techniques and their energy consumption are examined to conclude that, when properly applied, DREAM consumes 21% less energy than a traditional ECC with Single Error Correction and Double Error Detection (SEC/DED) capabilities.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-9 A MACHINE LEARNING APPROACH FOR MEDICATION ADHERENCE MONITORING USING BODY-WORN SENSORS
Speaker:
Hassan Ghasemzadeh, Washington State University, US
Authors:
Niloofar Hezar Jaribi, Ramin Fallahzadeh and Hassan Ghasemzadeh, Washington State University, US
Abstract
One of the most important challenges in current healthcare systems is medication non-adherence, which has irrevocable outcomes. Although many technologies have been developed for medication adherence monitoring, the reliability and cost-effectiveness of these technologies are not well understood to date. This paper presents a medication adherence monitoring system by user-activity tracking based on wrist-band wearable sensors. We develop machine learning algorithms that track wrist motions in real-time and identify medication intake activities. We propose a novel data analysis pipeline to reliably detect medication adherence by examining single-wrist motions. Our system achieves an accuracy of 78.3% in adherence detection without need for medication pillboxes and with only one sensor worn on either of the wrists. The accuracy of our algorithm is only 7.9% lower than a system with two sensors that track motions of both wrists.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-10 REQUIREMENTS-CENTRIC CLOSED-LOOP VALIDATION OF IMPLANTABLE CARDIAC DEVICES
Speaker:
Partha Roop, The University of Auckland, NZ
Authors:
Weiwei Ai, Nitish Patel and Partha Roop, The University of Auckland, NZ
Abstract
Implantable medical devices are recommended by physicians to sustain life while improving the overall quality of life of the patients. In spite of the rigorous testing, there have been numerous failures and associated recalls which suggest that completeness of the testing is elusive. We propose a new validation framework based on formal methods for real-time closed-loop validation of medical devices. The proposed approach includes a synchronous observer acting both as an automated oracle and also as a requirements coverage monitor. The observer combines an on-line testing adequacy evaluation module together with a heuristic learning module. This methodology was applied to validate a pacemaker over a virtual heart model. A subset of the requirements was used to test its efficacy. The results show that the proposed methodology can, in real-time, evaluate the test adequacy and hence guide the on-line test case generation to maximize the requirements coverage.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-11 LOW NORMALIZED ENERGY DERIVATION ASYNCHRONOUS CIRCUIT SYNTHESIS FLOW THROUGH FORK-JOIN SLACK MATCHING FOR CRYPTOGRAPHIC APPLICATIONS
Speaker:
Nan Liu, Nanyang Technological University, SG
Authors:
Nan Liu, Kwen-Siong Chong, Weng-Geng Ho, Bah-Hwee Gwee and Joseph S. Chang, Nanyang Technological University, SG
Abstract
In this paper, an automatic synthesis flow of asynchronous (async) Quasi-Delay-Insensitive (QDI) circuits for cryptographic applications is presented. The synthesis flow accepts Verilog netlists as primary inputs, in part leverages commercial electronic design automation tools for synthesis and verification, and relies heavily on the proposed translation processes for async netlist conversion and optimization. In particular, a three-step synchronous-to-asynchronous-direct-translation (SADT) process is proposed. The first step is to translate a Verilog netlist into a direct circuit graph, allowing us to model QDI pipelines for performance analysis based on the same netlist function. Second, graph coarsening in combination with dynamic programming is adopted to analyze the fork-join slack matching of the QDI pipelines, aiming to balance the pipeline depths in any fork-join pipelines to optimize the system performance, and to reduce energy variations of the overall pipelines to counter power-analysis attacks. The last step is to insert async local controllers/gates to ensure that the async circuits are consistent with the QDI protocol, hence enhancing their timing robustness to accommodate Process-Voltage-Temperature variations. We show that, on the basis of simulations on the ISCAS benchmark circuits, the QDI circuits based on our proposed automatic synthesis flow are on average 20% faster and feature 30% less normalized energy derivations than un-optimized circuits.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-12 A LIFETIME-AWARE RUNTIME MAPPING APPROACH FOR MANYCORE SYSTEMS IN THE DARK SILICON ERA
Speaker:
Mohammad-Hashem Haghbayan, University of Turku, FI
Authors:
Mohammad-Hashem Haghbayan1, Antonio Miele2, Amir-Mohammad Rahmani3, Pasi Liljeberg1 and Hannu Tenhunen3
1University of Turku, FI; 2Politecnico di Milano, IT; 3KTH Royal Institute of Technology and University of Turku, FI
Abstract
In this paper, we propose a novel lifetime reliability-aware resource management approach for many-core architectures. The approach is based on a hierarchical architecture, composed of a long-term runtime reliability analysis unit and a short-term runtime mapping unit. The former periodically analyses the aging status of the various processing units with respect to a target value specified by the designer, and performs recovery actions on highly stressed cores. The calculated reliability metrics are utilized in runtime mapping of the newly arrived applications to maximize the performance of the system while fulfilling reliability requirements and the available power budget. Our extensive experimental results reveal that the proposed reliability-aware approach can efficiently select the processing cores to be used over time in order to enhance the reliability at the end of the operational life (up to 62%) while offering a performance level comparable to that of the state-of-the-art runtime mapping approach.

Download Paper (PDF; Only available from the DATE venue WiFi)

UB08 Session 8

Date: Wednesday 16 March 2016
Time: 16:00 - 18:00
Location / Room: Booth 15, Exhibition Area

Label Presentation Title
Authors
UB08.1 AHLS_DESYNC: DESYNCHRONIZATION TOOL FOR HIGH-LEVEL SYNTHESIS OF ASYNCHRONOUS CIRCUITS
Presenter:
Jean Simatic, TIMA Laboratory, FR
Authors:
Jean Simatic, Rodrigo Possamai Bastos and Laurent Fesquet, TIMA Laboratory, FR
Abstract
We present a tool for the high-level synthesis (HLS) of event-driven (asynchronous) circuits. Our approach first uses an existing HLS tool, AUGH, to generate a synchronous finite state machine (FSM) and a data-path. Then, the presented tool desynchronizes solely the FSM in 5 steps: 1. Parse the FSM to build a state graph containing the control signal assignments. 2. Separate multiplexer control and register control signals by analyzing the data-path. 3. Generate an event-driven FSM netlist by mapping the state graph on a dedicated set of asynchronous controllers. 4. Synthesize the data-path using a commercial synthesis tool (Design Compiler). 5. Estimate the delays in the data-path with a static timing analysis tool (PrimeTime) and insert delays in the controller accordingly. Our demonstration will exhibit two testbenches: a GCD algorithm to expose the basic concepts and a non-uniform sampling FIR filter more representative of real-life applications.

Download Paper (PDF)
UB08.2 GRIP: GRAPH-REWRITING-BASED IP-INTEGRATION (GRIP) - AN EDA TOOL FOR SOFTWARE DEFINED SOC DESIGN
Presenter:
Munish Jassi, Technische Universität München, DE
Authors:
Munish Jassi, Yong Hu, Jian Lyu, Daniel Mueller-Gritschneder and Ulf Schlichtmann, Technische Universität München, DE
Abstract
The GRIP tool - Graph-Rewriting-Based IP-Integration - provides system engineers with a comprehensive platform that takes care of their IP-integration concerns for IP-centric SoC designs, also referred to as SW-defined SoCs. The tool uses the standardized meta-data IP-XACT format for HW descriptions and encodes the design IP-integration knowledge as a set of integration rules based on graph rewriting and grammar theory. The tool automates and encodes the step-by-step integration of IPs to build a desired system architecture. Multiple sequential IP-integration steps can be compiled to iteratively generate new architectures. For design space exploration (DSE), constraints can be given to generate a desired subset of candidate SoCs. Code generation produces the design files for each architecture. This is demonstrated as DSE for an OpenCV computer-vision application running on a Xilinx Zynq-based Zedboard. GRIP additionally generates the HW-drivers for both non-OS and Linux-based systems.

Download Paper (PDF)
UB08.3 CLASH: DIGITAL CIRCUITS IN CλASH
Presenter:
Christiaan Baaij, University of Twente, NL
Authors:
Christiaan Baaij and Jan Kuper, University of Twente, NL
Abstract
CλaSH is a novel compiler system for generating digital circuits as described by a mathematical/functional specification of the architecture. We will demonstrate several applications written in CλaSH:
* Tunneling ball device: With a minimal amount of acceleration, a fast spinning metal disk is either sped up or slowed down so that a falling ball can fall through one of the metal disk's two holes.
* Music synthesizer and spectrum analyser: An audio CODEC samples music being played from either an MP3 player or a computer. We can apply several digital filters which affect the music. The effects of these filters can be both seen on a monitor and heard through speakers connected to the FPGA board.
* Multi-processor system: The system is used in a compiler construction course, where the compiler is written in Haskell. Because CλaSH is a proper subset of Haskell, students can build and experiment with the compiler and the multi-processor system in the same environment.

Download Paper (PDF)
UB08.5 LISA: ENABLING LAYERED INTEROPERABILITY FOR INTERNET OF THINGS THROUGH LISA
Presenter:
Behailu Shiferaw Negash, University of Turku, FI
Authors:
Behailu Shiferaw Negash1, Amir-Mohammad Rahmani1, Tomi Westerlund1, Pasi Liljeberg1 and Hannu Tenhunen2
1University of Turku, FI; 2University of Turku, FI and Royal Institute of Technology (KTH), SE
Abstract
There are high expectations for the changes that come with the implementation of the Internet of Things (IoT). However, this vision is limited by the heterogeneous nature of IoT devices, which has led to vertical application silos that are incapable of working together. To ease this problem of heterogeneity, we have developed a lightweight interoperability framework, LISA, to hide variations in communication technology and data formats and provide a uniform API for programmers. LISA is inspired by Network on Terminal Architecture (NoTA), an open framework from Nokia Research Center. There are a few frameworks for interoperability of IoT. However, these frameworks fail to address the resource limitations of the majority of IoT devices. To the best of our knowledge, LISA is the first framework designed for resource-constrained devices. This demonstration shows LISA in action, helping heterogeneous devices interoperate through a gateway in the fog layer between the devices and the cloud.

Download Paper (PDF)
UB08.6 MCC: CONTRACT-BASED AUTOMATED INTEGRATION FOR COMPONENT-BASED CRITICAL SYSTEMS
Presenter:
Johannes Schlatow, TU Braunschweig, DE
Authors:
Johannes Schlatow, Marcus Nolte, Rolf Ernst and Markus Maurer, TU Braunschweig, DE
Abstract
In the scope of the research unit Controlling Concurrent Change, we developed a contract-based middleware to autonomously manage and ensure the safety, availability and security properties of a component-based run-time environment. It guarantees that any change to the system is formally analysed beforehand and only applied if it does not violate any of the contracts, thereby enabling in-field updateability of complex critical systems. For this purpose, a Multi-Change Controller (MCC) aggregates component contracts and invokes viewpoint-specific analysis engines to evaluate change requests and find feasible system configurations. The MCC is specifically designed for extensibility so that analysis engines can be added and combined dependent on the application domain. We show a demonstrator that showcases and illustrates this contract-based process for an automated integration of an automotive system. Our demonstrator is built upon the Genode OS Framework and Xilinx Zynq-7000 SoCs.

Download Paper (PDF)
UB08.7 SRAM-BASED PHYSICAL UNCLONABLE KEYS FOR BLE SMART LOCK SYSTEMS
Presenters:
Iluminada Baturone and Miguel Ángel Prada-Delgado, University of Seville, ES
Authors:
Iluminada Baturone, Miguel Ángel Prada-Delgado, Alfredo Vázquez-Reyes, Laurentiu Acasandrei, Diego Fernández-Barrera and Javier Prada-Delgado,
Abstract
Nowadays, several smart lock systems use Bluetooth Low Energy (BLE) to recognize when a smartphone, conveniently authenticated by a digital key, is near. The keys can be shared and are managed by web apps, so that system security depends on how the software prevents an attacker from discovering the keys. In order to increase security by a two-factor method ('something you have' in addition to 'something you know'), the BLE smart lock system prototype shown in this demonstrator recognizes when a user wearing an authenticated BLE chip (in a key fob, wristband, etc.) is near. The digital keys are not stored but they are regenerated on the fly by only the trusted chip. This is possible by using the start-up values of the SRAM in the BLE chip, which act as a physical unclonable function (PUF), so that the chip cannot be cloned. The SRAM start-up values of the BLE chip are also exploited as true random numbers to derive fresh keys for each transaction with the lock.

Download Paper (PDF)
UB08.8 CONTREP: A SINGLE-SOURCE FRAMEWORK FOR UML-BASED MODELLING AND DESIGN OF MIXED-CRITICALITY SYSTEMS
Presenter:
Fernando Herrera, University of Cantabria, ES
Authors:
Fernando Herrera and Eugenio Villar, University of Cantabria, ES
Abstract
Mixed-criticality systems integrate applications, platform resources and requirements with different criticality. A criticality reflects the impact of either a failure of a component or a violation of a requirement, which can range from irrelevant to catastrophic effects. This booth presents the CONTREP framework, which supports UML/MARTE based modeling, analysis and design of mixed-criticality embedded systems. The booth shows a model of a quadcopter control system which integrates safety-critical (e.g., flight control), mission-critical (e.g., a video processing payload), and non-critical (e.g., monitoring) functions. The booth shows how mixed-criticality is captured, together with the description of the functional architecture and of the multi-core embedded platform where the system is implemented, and how CONTREP automates different design activities, i.e. model validation, performance assessment and design space exploration, exploiting mixed-criticality information in every case.

Download Paper (PDF)
UB08.9 CHIMPANC: CHANGE MANAGEMENT USING CHIMPANC
Presenter:
Jannis Stoppe, DFKI and University of Bremen, DE
Authors:
Jannis Stoppe, Martin Ring and Rolf Drechsler, DFKI and University of Bremen, DE
Abstract
One approach to remedy the issue of increasing complexity in the hardware design process is to provide designers with more abstract languages that allow systems to be designed top-down, starting with an abstract model of the system and its requirements. Several of these languages such as SysML and SystemC are being used today. We propose the Change Impact Analysis and Control Tool (ChImpAnC) to handle these challenges. ChImpAnC extracts the relevant information from the models on the different levels and constructs mappings between them, making it possible to check consistency and refinements and to calculate the impact of changes. Thus, ChImpAnC ensures that, e.g., a written specification or documentation is not made obsolete by changes in the implementation without the designer being warned about it.

Download Paper (PDF)
UB08.10 LLBMC / QPR-VERIFY: HIGH-PRECISION BOUNDED MODEL CHECKING FOR AUTOMOTIVE SOFTWARE
Presenter:
Carsten Sinz, Karlsruhe Institute of Technology (KIT), DE
Authors:
Carsten Sinz, David Farago, Florian Merz and Reimo Schaupp, Karlsruhe Institute of Technology (KIT), DE
Abstract
LLBMC (the low-level bounded model checker) is a static software analysis tool for finding bugs in C (and, to some extent, in C++) programs. It is mainly intended for checking low-level system code and is based on the technique of Bounded Model Checking. LLBMC is fully automatic and requires minimal preparation effort and user interaction. It supports all C constructs, including less common features such as bitfields. LLBMC models memory accesses (heap, stack, global variables) with high precision and is thus able to find hard-to-detect memory access errors like heap or stack buffer overflows. LLBMC can also uncover errors due to uninitialized variables or other sources of non-deterministic behavior. Due to its precise analysis, LLBMC produces almost no false alarms (false positives). LLBMC is developed at Karlsruhe Institute of Technology, and will soon be commercially available via a university spin-off, QPR Technologies.

Download Paper (PDF)
18:00 End of session

8.1 SPECIAL DAY Hot Topic: Connectivity in the automotive domain: From micro to macro

Date: Wednesday 16 March 2016
Time: 17:00 - 18:30
Location / Room: Saal 2

Chair:
Henk Corporaal, Eindhoven University of Technology, NL

Co-Chair:
Samarjit Chakraborty, Technische Universität München (TUM), DE

The goal of this session is to discuss issues related to connectivity or communications at various scales in the automotive domain. On the one hand, there are in-vehicle connectivity issues like cabling, communication buses and their timing analysis; on the other hand, vehicle-to-vehicle and vehicle-to-infrastructure communication issues are becoming increasingly important. The session discusses the challenges, potential solutions, and emerging trends.

Time Label Presentation Title
Authors
17:00 8.1.1 AUTOMOTIVE V2X ON PHONES: ENABLING NEXT-GENERATION MOBILE ITS APPS
Speaker:
Li-Shiuan Peh, Massachusetts Institute of Technology (MIT), US
Authors:
Jason Gao and Li-Shiuan Peh, Massachusetts Institute of Technology (MIT), US

Download Paper (PDF; Only available from the DATE venue WiFi)
17:30 8.1.2 EDA FOR AUTOMOTIVE CABLING
Speaker and Author:
Thomas Heurung, Mentor, DE
18:00 8.1.3 DETERMINISTIC ETHERNET IN AUTOMOTIVE APPLICATIONS
Speaker and Author:
Astrit Ademaj, TTTech Computertechnik AG, AT
18:30 End of session

8.2 EU Projects Special Session: Towards better EU-projects - Success Stories

Date: Wednesday 16 March 2016
Time: 17:00 - 18:30
Location / Room: Konferenz 6

Organiser:
Roberto Giorgi, University of Siena, IT

Chair:
Cristina Silvano, Politecnico of Milan, IT

Co-Chair:
Roberto Giorgi, University of Siena, IT

From lessons learned to best practices and correct scientific methodologies: this session considers several cases showing successful strategies to solve research and industry problems in a European dimension.

Time Label Presentation Title
Authors
17:00 8.2.1 COLLECTIVE KNOWLEDGE: TOWARDS R&D SUSTAINABILITY
Speaker:
Anton Lokhmotov, dividiti, GB
Authors:
Grigori Fursin1, Anton Lokhmotov1 and Ed Plowman2
1dividiti, GB; 2ARM, GB
Abstract
Research funding bodies strongly encourage research projects to disseminate discovered knowledge and transfer developed technology to industry. Unfortunately, capturing, sharing, reproducing and building upon experimental results has become close to impossible in computer systems' R&D. The main challenges include the ever changing hardware and software technologies, lack of standard experimental methodology and lack of robust knowledge exchange mechanisms apart from publications where reproducibility is still rarely considered. Supported by the EU FP7 TETRACOM Coordination Action, we have developed Collective Knowledge (CK), an open-source framework and methodology that involves the R&D community to solve the above problems collaboratively. CK helps researchers gradually convert their code and data into reusable components and share them via repositories such as GitHub, design and evolve over time experimental scenarios, replay experiments under the same or similar conditions, apply state-of-the-art statistical techniques, crowdsource experiments across different platforms, and enable interactive publications. Importantly, CK encourages the continuity and sustainability of R&D efforts: researchers and engineers can build upon the work of others and make their own work available for others to build upon. We believe that R&D sustainability will lead to better research and faster commercialization, thus increasing return-on-investment.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:15 8.2.2 LESSONS LEARNED FROM THE EU PROJECT T-CREST
Speaker and Author:
Martin Schoeberl, Technical University of Denmark, DK
Abstract
A three-year EU project, such as T-CREST, with partners from all over Europe and with backgrounds from different domains is a challenging endeavor. Successful execution of such a project depends on more factors than simply performing excellent research. Within the three-year project T-CREST, eight partners from academia and industry developed and evaluated a time-predictable multiprocessor with an accompanying compiler and a worst-case execution time analysis tool. The tight cooperation of the partners and the shared vision of the need for new computer architectures for future real-time systems enabled the successful completion of the T-CREST project. The T-CREST platform is now available, with most components in open source, to be used for future real-time systems and as a platform for further research.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:30 8.2.3 MULTIPOS: MARIE CURIE NETWORK IN MULTI-TECHNOLOGY POSITIONING
Speaker:
Jari Nurmi, Tampere University of Technology, FI
Authors:
Jari Nurmi and Elena-Simona Lohan, Tampere University of Technology, FI
Abstract
The global navigation market (products and services) is expected to exceed 160 billion EUR revenue in 2015 with significant growth being driven by mobile terminals. Future wireless society needs trustworthiness of the wireless positioning device and eco-friendliness of the transmission-reception process. These are triggered by the user requirements, preferences and targeted applications, and by the type of the environment where navigation takes place. A link has been missing between these user needs/environment awareness (or application layer) and the physical layer where the wireless device is actually designed. The missing link can be created by cognitive approaches, borrowed on one hand from cognitive human behavior, and on the other hand from cognitive computing. Building a cognition stage between the application and physical layers creates a myriad of new possibilities for flexible location-based services and positioning-based applications. The MULTI-POS training network bridges the gap between the lower technology layer and the upper application layer involved in wireless mobile location. In addition, MULTI-POS offers comprehensive training to young fellows in the broad field of wireless location, creates novel technologies and business models for future location-enabled wireless devices, promotes the exchange of fellows in mixed academic-industrial R&D trajectories and in multiple European cultures, and will initiate an educational and research framework that unifies the currently fragmented research activities on technological and application aspects of wireless navigation. There is strong involvement of industrial partners in the network to accomplish all this.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:45 8.2.4 PROGRAM TRANSFORMATIONS IN THE POLCA PROJECT
Speaker:
Jan Kuper, University of Twente, NL
Authors:
Jan Kuper1, Lutz Schubert2, Kilian Kempf3, Colin Glass4, Daniel Rubio Bonilla4 and Manuel Carro5
1University of Twente, NL; 2University of Ulm, DE; 3Ulm University, DE; 4High Performance Computing Centre Stuttgart, DE; 5Imdea Software Institute Madrid, ES
Abstract
The POLCA project develops annotations on fragments of imperative code to guide program transformations for better utilization of resources. These annotations express the computational essence of the code fragments without referring to memory usage or execution time. That makes the annotations mathematical in nature, such that provably correct transformations can be applied to them and the corresponding code fragment can be transformed accordingly for better resource usage, for example on a multi-core platform or on an FPGA.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:00 8.2.5 COMPUTATION AND COMMUNICATION CHALLENGES TO DEPLOY ROBOTS IN ASSISTED LIVING ENVIRONMENTS
Speaker:
Michael Hübner, Ruhr University Bochum, DE
Authors:
Georgios Keramidas1, Christos Antonopoulos1, Nikolaos S. Voros1, Fynn Schwiegelshohn2, Philipp Wehner2, Jens Rettkowski2, Diana Göhringer2, Michael Hübner2, Stasinos Konstantopoulos3, Theodore Giannakopoulos4, Vangelis Karkaletsis4 and Vaggelis Mariatos5
1Technological Educational Institute of Western Greece, GR; 2Ruhr University Bochum, DE; 3NCSR Demokritos, GR; 4Institute of Informatics and Telecommunications, NCSR "Demokritos", GR; 5AVN Innovative Technology Solutions, CY
Abstract
Demographic and epidemiologic transitions have brought forward a new health care paradigm, marked by both a growing elderly population and the prevalence of chronic diseases. Recent technological advances can support elderly people in their domestic environment, provided that several ethical and clinical requirements can be met. This paper presents an architecture that is able to meet these requirements and investigates the technical challenges introduced by our approach.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:15 8.2.6 ATHENIS 3D: AUTOMOTIVE TESTED HIGH-VOLTAGE AND EMBEDDED NON-VOLATILE INTEGRATED SOC PLATFORM WITH 3D TECHNOLOGY
Speaker:
Ewald Wachmann, ams AG, AT
Authors:
Ewald Wachmann1, Sergio Saponara2, Cristian Zambelli3, Pierre Tisserand4, Jean Charbonnier5, Tobias Erlbacher6, Saeideh Gruenler6, Christian Hartler1, Joerg Siegert1, Pierre Chassard4, Dieu-My Ton4, Lorenzo Ferrari2 and Luca Fanucci2
1ams AG, AT; 2University of Pisa, IT; 3University of Ferrara, IT; 4Valeo Electrical System, FR; 5CEA-Leti, FR; 6Fraunhofer IISB, DE
Abstract
The ATHENIS_3D FP7 EU project aims at providing new enabling technologies (analog, digital and power components) for high-voltage and high-temperature applications, tested for power systems of new hybrid/electrical vehicles. Innovation is exploited at process/device level (3D chip stacking, wafer level packaging, trench capacitors and TSV-inductors integrated in the interposer, highly reliable non-volatile Magnetic RAM), circuit-level (inductorless high-voltage DC-DC converter, high-temperature 28nm System-on-Chip platform) and system-level (compact 3D embedded power mechatronic system). Enabling high integration levels of complex systems, operating in harsh environments, in a single packaged 3D device, ATHENIS_3D allows for one order of magnitude area reduction vs. today's PCB-based power and control systems. Integration costs will consequently be reduced in key industrial sectors for Europe where high-voltage/temperature operations are mandatory (vehicles, avionics, space/defence, industrial automation, energy).

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30 End of session

8.3 Hot Topic: Managing Heterogeneous Computing Resources at Runtime

Date: Wednesday 16 March 2016
Time: 17:00 - 18:30
Location / Room: Konferenz 1

Organisers:
David Andrews, University of Arkansas, US
Christian Plessl, University of Paderborn, DE

Chair:
Daniel Ziener, Hamburg University of Technology, DE

Co-Chair:
José L. Ayala, Complutense University of Madrid, ES

Embedded systems have been using different, specialized computing resources for optimizing the performance, energy consumption and/or real-time constraints of critical application parts. In recent years, we have witnessed an increasing trend towards heterogeneous computing, ranging from embedded systems to high performance computing systems. Today, a wide variety of heterogeneous computing architectures are available as off-the-shelf components, such as heterogeneous SoCs for embedded applications or PCIe-based accelerator cards with FPGAs, GPUs, or many-cores for HPC systems. The programming models, languages and design environments for creating software or hardware configurations for the heterogeneous computing resources are also maturing and increasingly standardized, e.g., OpenCL, OpenACC, and OpenMP. In contrast, the software stack for effectively managing heterogeneous computing resources at runtime is still largely undeveloped. Hence, the decision at what time and on which computing resource a particular function is executed is explicitly managed at the application level. The constrained view of the application makes it difficult to operate a system to meet global objectives, for example, mapping tasks to available heterogeneous resources such that the performance requirements of all applications are met while minimizing energy consumption. In this hot topic session we focus on run-time systems that strive to overcome this application-centric view and enable an automated use of heterogeneous computing by dynamically mapping computations to different resources such that global goals are optimized.

Time Label Presentation Title
Authors
17:00 8.3.1 RUN TIME INTERPRETATION FOR CREATING CUSTOM ACCELERATORS
Speaker:
David Andrews, University of Arkansas, US
Authors:
Sen Ma, Zeyad Aklah and David Andrews, University of Arkansas, US
Abstract
Despite the significant advancements that have been made in High Level Synthesis, the reconfigurable computing community has not yet managed to achieve widespread use of Field Programmable Gate Arrays (FPGAs) by programmers. Existing barriers that prevent programmers from using FPGAs include the need to work within vendor-specific CAD tools, knowledge of hardware programming models, and the requirement to pass each design through a very time-consuming synthesis, place and route process. In this paper we present a new approach that takes these barriers out of the design flows for programmers. We move synthesis out of the programmer's path by composing pre-synthesized building blocks using a domain-specific language that supports programming patterns tailored to FPGA accelerators. Our results show that the achieved performance of accelerators assembled at run time is equivalent to synthesizing a custom block of hardware using automated HLS tools.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:30 8.3.2 A SELF-ADAPTIVE APPROACH TO EFFICIENTLY MANAGE ENERGY AND PERFORMANCE IN TOMORROW'S HETEROGENEOUS COMPUTING SYSTEMS
Speaker:
Marco Domenico Santambrogio, Politecnico di Milano, IT
Authors:
Ettore Trainiti, Gianluca Durelli, Antonio Miele, Cristiana Bolchini and Marco Domenico Santambrogio, Politecnico di Milano, IT
Abstract
ICT adoption has boomed over the last decades, and so has the power consumption footprint of these technologies. This footprint is expected to more than triple by 2020. Moreover, we are moving towards an on-demand computing scenario, characterized by varying workloads, constituted of diverse applications with different performance requirements and criticality. A promising approach to address the challenges posed by this scenario is to better exploit specialized computing resources integrated in a heterogeneous system architecture (HSA) by taking advantage of their individual characteristics to optimize the performance/energy trade-off of the overall system. Better exploitation, however, comes with higher complexity. System architects need to take into account the efficiency of system units, i.e. GPP(s) either alone or with a single family of accelerators (e.g., GPUs or FPGAs), as well as the application workload, which often leads to inefficiency in their exploitation, and therefore in performance/energy. The work presented in this paper will address these limitations by exploiting self-adaptivity to allow the system to autonomously decide which specialized resource to exploit for a carbon footprint reduction, due to a more effective execution of the application, optimizing goals that the user can set (e.g., performance, energy, reliability).

Download Paper (PDF; Only available from the DATE venue WiFi)
18:00 8.3.3 PERFORMANCE-CENTRIC SCHEDULING WITH TASK MIGRATION FOR A HETEROGENEOUS COMPUTE NODE IN THE DATA CENTER
Speaker:
Christian Plessl, Paderborn University, DE
Authors:
Achim Lösch, Tobias Beisel, Tobias Kenter, Christian Plessl and Marco Platzner, Paderborn University, DE
Abstract
The use of heterogeneous computing resources, such as Graphics Processing Units or other specialized coprocessors, has become widespread in recent years because of their performance and energy efficiency advantages. Approaches for managing and scheduling tasks to heterogeneous resources are still subject to research. Although queuing systems have recently been extended to support accelerator resources, a general solution that manages heterogeneous resources at the operating system level to exploit a global view of the system state is still missing. In this paper we present a user space scheduler that enables task scheduling and migration on heterogeneous processing resources in Linux. Using run queues for available resources, we perform scheduling decisions based on the system state and on task characterization from earlier measurements. With a programming pattern that supports the integration of checkpoints into applications, we preempt tasks and migrate them between three very different compute resources. Considering static and dynamic workload scenarios, we show that this approach can gain up to 17% performance, on average 7%, by effectively avoiding idle resources. We demonstrate that a work-conserving strategy without migration is not a suitable alternative.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30 End of session

8.4 Advanced Methods in High-Level Design

Date: Wednesday 16 March 2016
Time: 17:00 - 18:30
Location / Room: Konferenz 2

Chair:
Fabian Oboril, KIT Germany, DE

Co-Chair:
Luciano Lavagno, Politecnico di Torino, IT

Techniques such as machine learning, spiking neural networks, and probabilistic analysis are being adopted in advanced high-level design methods. This session presents a sampling of these topics, and concludes with a short IP presentation on a new approach to predicting reusable hardware.

Time Label Presentation Title
Authors
17:00 8.4.1 ADAPTIVE THRESHOLD NON-PARETO ELIMINATION: RE-THINKING MACHINE LEARNING FOR SYSTEM LEVEL DESIGN SPACE EXPLORATION ON FPGAS
Speaker:
Pingfan Meng, University of California, San Diego, US
Authors:
Pingfan Meng, Alric Althoff, Quentin Gautier and Ryan Kastner, University of California, San Diego, US
Abstract
One major bottleneck of the system level OpenCL-to-FPGA design tools is their extremely time-consuming synthesis process (including place and route). The design space for a typical OpenCL application contains thousands of possible designs even when considering a small number of design space parameters. It costs months of compute time to synthesize all these possible designs into end-to-end FPGA implementations. Thus, brute-force design space exploration (DSE) is impractical for these design tools. Machine learning is one solution that identifies the valuable Pareto designs by sampling only a small portion of the entire design space. However, most of the existing machine learning frameworks focus on improving the design objective regression accuracy, which is not necessarily suitable for the FPGA DSE task. To address this issue, we propose a novel strategy - Adaptive Threshold Non-Pareto Elimination (ATNE). Instead of focusing on regression accuracy improvement, ATNE focuses on understanding and estimating the inaccuracy. ATNE provides a Pareto identification threshold that adapts to the estimated inaccuracy of the regressor. This adaptive threshold results in a more efficient DSE. For the same prediction quality, ATNE reduces the synthesis complexity by 1.6 - 2.89X (hundreds of synthesis hours) against the other state-of-the-art frameworks for FPGA DSE. In addition, ATNE is capable of identifying the Pareto designs for certain difficult design spaces which the other existing frameworks are incapable of exploring effectively.
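
As an illustration of threshold-based elimination, the following minimal Python sketch discards a design only when another design is predicted to be better by more than a margin in every objective. It is not the authors' ATNE implementation: the function name `eliminate_non_pareto`, the fixed margin `delta`, and the sample values are assumptions for illustration, and the abstract does not specify how the threshold is adapted to the regressor's estimated inaccuracy.

```python
def eliminate_non_pareto(predicted, delta):
    """Return the indices of designs that survive elimination.

    predicted: list of tuples of predicted objective values (all minimized).
    delta:     hypothetical safety margin; in this sketch it is a fixed
               number supplied by the caller, not an adaptive estimate.

    Design i is eliminated only if some other design j is predicted to be
    better than i by more than delta in every objective.
    """
    survivors = []
    for i, p in enumerate(predicted):
        clearly_dominated = any(
            j != i and all(qk + delta < pk for qk, pk in zip(q, p))
            for j, q in enumerate(predicted)
        )
        if not clearly_dominated:
            survivors.append(i)
    return survivors


# Hypothetical predictions: (latency, area), both to be minimized.
designs = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (6.0, 6.0)]
strict = eliminate_non_pareto(designs, delta=0.0)
cautious = eliminate_non_pareto(designs, delta=1.5)
```

With `delta = 0.0` the function keeps only designs that no other design beats in every objective; a larger `delta` keeps borderline designs as candidates, which is the general effect one would expect from a threshold that grows with prediction uncertainty.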

Download Paper (PDF; Only available from the DATE venue WiFi)
17:30 8.4.2 MONITORING OF MTL SPECIFICATIONS WITH IBM'S SPIKING-NEURON MODEL
Speaker:
Konstantin Selyunin, Vienna University of Technology, AT
Authors:
Konstantin Selyunin1, Thang Nguyen2, Ezio Bartocci1, Dejan Nickovic3 and Radu Grosu1
1Vienna University of Technology, AT; 2Infineon Technologies Austria AG, AT; 3AIT Austrian Institute of Technology, AT
Abstract
This paper shows how to use IBM's TrueNorth spiking neuron model to monitor whether a digital signal satisfies a metric temporal-logic (MTL) specification. TrueNorth spiking neurons are universal computation blocks, which can perform a variety of deterministic or stochastic tasks (e.g., Boolean/arithmetic operations, filtering, and convolution) depending on the configuration of their parameters. We show how to set these parameters for the deterministic TrueNorth neural-model in order to recognize MTL operators. A TrueNorth circuit then behaves as a runtime MTL monitor. We demonstrate how to translate the neural monitor to synthesizable HDL-code on Xilinx's Zedboard using high-level synthesis. To the best of our knowledge, this is the first application of IBM's TrueNorth model for runtime monitoring. It also demonstrates the complete flow from a high-level specification to the implementation of a neural monitor in FPGA. As a byproduct, the paper also introduces the first open-source FPGA implementation of the deterministic TrueNorth model. We demonstrate the usefulness of our approach on a case study, the launching of a missile from a battleship.
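
To make the kind of property being monitored concrete, here is a minimal software sketch of two bounded MTL operators evaluated offline over a finite, discrete-time Boolean trace. It is plain Python rather than the TrueNorth-based monitor described in the paper; the function names, the truncation of time windows at the end of the trace, and the sample trace are assumptions for illustration.

```python
def eventually(signal, a, b):
    """Bounded 'eventually' F[a,b]: result[t] is True if the signal holds at
    some step in [t + a, t + b]. Windows are truncated at the end of the
    trace (an assumption of this sketch); an empty window yields False.
    Expects 0 <= a <= b."""
    n = len(signal)
    return [any(signal[t + a:min(t + b, n - 1) + 1]) for t in range(n)]


def always(signal, a, b):
    """Bounded 'always' G[a,b]: result[t] is True if the signal holds at
    every step in [t + a, t + b]. Windows are truncated at the end of the
    trace; an empty window is vacuously True. Expects 0 <= a <= b."""
    n = len(signal)
    return [all(signal[t + a:min(t + b, n - 1) + 1]) for t in range(n)]


# Hypothetical digital signal sampled at five time steps.
trace = [False, False, True, False, False]
within_two = eventually(trace, 0, 2)
```

How a monitor should treat windows that run past the end of a finite trace is a design choice; this sketch simply truncates them, and other finite-trace semantics are possible.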

Download Paper (PDF; Only available from the DATE venue WiFi)
18:00 8.4.3 FORMAL PROBABILISTIC ANALYSIS OF DISTRIBUTED RESOURCE MANAGEMENT SCHEMES IN ON-CHIP SYSTEMS
Speaker:
Osman Hasan, School of Electrical Engineering and Computer Science (SEECS), NUST, PK
Authors:
Shafaq Iqtedar1, Osman Hasan1, Muhammad Shafique2 and Jörg Henkel2
1National University of Sciences and Technology (NUST), PK; 2Karlsruhe Institute of Technology (KIT), DE
Abstract
New paradigms for managing resources in on-chip many-core systems come with various issues, among them the key demand for robust verification of (distributed) resource management schemes before deployment. Moreover, it is important to have a unified framework where different resource management schemes can be formally analyzed and compared for their performance efficiency and robustness. Traditional techniques, like simulation or emulation, are inherently non-exhaustive and thus compromise the completeness and accuracy of the analysis results. In this work, we present a formal approach, based on probabilistic model checking, for evaluating and comparing the performance of different distributed resource management schemes. To illustrate the benefits and applicability of our formal verification and comparative analysis approach, we perform a case study on the comparison of two recent state-of-the-art distributed resource management schemes using the PRISM model checker.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30 IP4-1, 927 A Q-GRAM BIRTHMARKING APPROACH TO PREDICTING REUSABLE HARDWARE
Speaker:
Kevin Zeng, Virginia Tech, US
Authors:
Kevin Zeng and Peter Athanas, Virginia Tech, US
Abstract
Designer productivity is a growing concern as overall hardware complexity rises. Design reuse, a key component in productivity, is underutilized. Not only can existing designs be reused, but also the patterns and information contained within them. With the increase in the number of circuits available, there is a need to analyze and retrieve designs with ease in order to accelerate design entry. In this paper, a birthmarking approach using q-grams is presented. Using this technique, design patterns regarding existing circuits can be captured and used to suggest not only similar and reusable designs, but also functional blocks throughout the design phase, with little to no effort from the user. Preliminary experiments and case studies of the q-gram birthmarking technique were performed on over 250 circuits from various sources in order to show the feasibility of the proposed methods.
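
As a rough illustration of q-gram matching, the sketch below extracts all length-q windows from two sequences and compares the resulting sets with the Jaccard index. How the paper derives a birthmark from a circuit is not described in the abstract, so the operator-name sequences, the function names, and the choice of Jaccard similarity are assumptions.

```python
def qgrams(seq, q):
    """Set of all contiguous length-q windows (q-grams) of a sequence."""
    return {tuple(seq[i:i + q]) for i in range(len(seq) - q + 1)}


def similarity(a, b, q=2):
    """Jaccard similarity between the q-gram sets of two sequences."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    if not ga and not gb:
        return 1.0  # two sequences too short to yield q-grams are treated as identical
    return len(ga & gb) / len(ga | gb)


# Hypothetical operator sequences extracted from two circuits.
circuit_a = ["AND", "XOR", "ADD", "REG"]
circuit_b = ["AND", "XOR", "ADD", "MUX"]
score = similarity(circuit_a, circuit_b, q=2)
```

Here the two sequences share two of four distinct 2-grams, so the score is 0.5.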

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30 End of session

8.5 Non-volatile Memory Design Methodologies

Date: Wednesday 16 March 2016
Time: 17:00 - 18:30
Location / Room: Konferenz 3

Chair:
Ian O'Connor, Ecole Centrale de Lyon, FR

Co-Chair:
Michael Niemier, University of Notre Dame, US

The first two papers consider hybrid main memories consisting of DRAM and emerging non-volatile memories, and examine system-level optimizations. The last paper considers performance in-memory computing using properties of emerging resistive RAM.

Time Label Presentation Title
Authors
17:00 8.5.1 AN OPERATING SYSTEM LEVEL DATA MIGRATION SCHEME IN HYBRID DRAM-NVM MEMORY ARCHITECTURE
Speaker:
Reza Salkhordeh, Sharif University of Technology, IR
Authors:
Reza Salkhordeh and Hossein Asadi, Sharif University of Technology, IR
Abstract
With the emergence of Non-Volatile Memories (NVMs) and their shortcomings such as limited endurance and high power consumption in write requests, several studies have suggested hybrid memory architectures employing both Dynamic Random Access Memory (DRAM) and NVM in a memory system. By conducting comprehensive experiments, we have observed that such studies fail to consider very important aspects of hybrid memories, including the effect of: a) data migrations on performance, b) data migrations on power, and c) the granularity of data migration. This paper presents an efficient data migration scheme at the Operating System level in a hybrid DRAM-NVM memory architecture. In the proposed scheme, two Least Recently Used (LRU) queues, one for the DRAM section and one for the NVM section, are used for the sake of data migration. With careful characterization of the workloads obtained from the PARSEC benchmark suite, the proposed scheme prevents unnecessary migrations and only allows migrations which benefit the system in terms of power and performance. The experimental results show that the proposed scheme can reduce the power consumption by up to 79% compared to DRAM-only memory and by up to 48% compared to the state-of-the-art techniques.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:30 8.5.2 UNIFIED DRAM AND NVM HYBRID BUFFER CACHE ARCHITECTURE FOR REDUCING JOURNALING OVERHEAD
Speaker:
Lei Ju, Shandong University, CN
Authors:
Zhiyong Zhang, Lei Ju and Zhiping Jia, Shandong University, CN
Abstract
Journaling techniques play an important role in addressing the reliability issue of filesystems caused by the volatile DRAM-based buffer cache. However, journaling techniques introduce a large number of extra storage writes, which greatly degrade the performance of the filesystem. Emerging Non-Volatile Memory (NVM) technologies bring a new perspective on solving the write amplification issue caused by journaling. By adopting NVM as the buffer cache, the committed data can be maintained in NVM before being written back to the storage, thus eliminating the journaling overhead. However, simply replacing DRAM with NVM as the buffer cache suffers from the limited lifetime and relatively slow writes of NVM. In this paper, we present a hybrid buffer cache architecture that combines NVM with DRAM to reduce the journaling overhead and overcome the constraints of NVM. In order to better utilize this novel architecture, we first propose a Journaling-Aware Page Management (JAPM) policy. JAPM puts infrequently updated data in NVM to reduce the journaling overhead and frequently updated data in DRAM to improve the write performance and lifetime of the hybrid buffer cache. In addition, data in one transaction may be dispersed in NVM and DRAM simultaneously, and different committing policies are required for the different storage media, NVM or DRAM. In order to guarantee the atomicity of the transactional execution in the hybrid cache architecture, a Partial In-Place Commit (PIPC) journaling scheme is proposed to coordinate the different committing patterns. We implement the proposed techniques on Linux 3.14.52 and measure the performance with representative I/O-intensive benchmarks. The experimental results show that our scheme effectively improves the I/O performance compared with the ext4 filesystem and prolongs the lifetime of the hybrid buffer cache compared with the Union of Buffer cache and Journaling (UBJ) scheme.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:00 8.5.3 FAST LOGIC SYNTHESIS FOR RRAM-BASED IN-MEMORY COMPUTING USING MAJORITY-INVERTER GRAPHS
Speaker:
Saeideh Shirinzadeh, University of Bremen, DE
Authors:
Saeideh Shirinzadeh1, Mathias Soeken1, Pierre-Emmanuel Gaillardon2 and Rolf Drechsler3
1University of Bremen, DE; 2University of Utah, US; 3University of Bremen and DFKI, DE
Abstract
Resistive Random Access Memories (RRAMs) have gained considerable attention for a variety of promising applications, especially the design of non-volatile in-memory computing devices. In this paper, we present an approach for the synthesis of RRAM-based logic circuits using the recently proposed Majority-Inverter Graphs (MIGs). We propose a bi-objective algorithm to optimize MIGs with respect to the number of required RRAMs and computational steps in both MAJ-based and IMP-based realizations. Since the number of computational steps is recognized as the main drawback of RRAM-based logic, we also present an effective algorithm to reduce the number of required steps. Experimental results show that the proposed algorithms achieve higher efficiency compared to the general-purpose MIG optimization algorithms, either in finding a good trade-off between both cost metrics or in reducing the number of steps. In comparison with the RRAM-based circuits implemented by the state-of-the-art approaches using other well-known data structures, the number of required computational steps obtained by our proposed MIG-oriented synthesis approach for large benchmark circuits is reduced by up to a factor of 26. This strong gain comes from the use of MIGs, which provide an efficient and intrinsic representation for RRAM-based computing, particularly in MAJ-based realizations, and from the use of the techniques proposed for optimization.
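
For readers unfamiliar with Majority-Inverter Graphs, the sketch below shows the single primitive they are built from: a three-input majority function, with inversion available on inputs. Fixing one input to a constant yields AND or OR, which is why any Boolean function can be expressed this way. The helper names are illustrative, and the code says nothing about the RRAM mapping or the optimization algorithms in the paper.

```python
from itertools import product


def maj(a, b, c):
    """Three-input majority on single bits: 1 when at least two inputs are 1."""
    return (a & b) | (a & c) | (b & c)


def and_via_maj(a, b):
    """AND expressed as a majority node with one input tied to 0."""
    return maj(a, b, 0)


def or_via_maj(a, b):
    """OR expressed as a majority node with one input tied to 1."""
    return maj(a, b, 1)


# Exhaustively check the two identities over all single-bit inputs.
identities_hold = all(
    and_via_maj(a, b) == (a & b) and or_via_maj(a, b) == (a | b)
    for a, b in product((0, 1), repeat=2)
)
```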

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30 IP4-2, 32 CAPTOPRIL: REDUCING THE PRESSURE OF BIT FLIPS ON HOT LOCATIONS IN NON-VOLATILE MAIN MEMORIES
Speaker:
Majid Jalili, Sharif University of Technology, IR
Authors:
Majid Jalili and Hamid Sarbazi-Azad, Sharif University of Technology, IR
Abstract
High static power consumption and insufficient scalability of the commonly used DRAM main memory technology appear to be tough challenges in upcoming years. Hence, adopting new technologies, i.e. non-volatile memories (NVMs), is a proper choice. NVMs tolerate a low number of write operations while having good scalability and low static power consumption. Due to the non-destructive nature of a read operation and the long latency of a write operation in NVMs, designers use a read-before-write (RBW) mechanism to mask the unchanged bits during a write operation in order to reduce bit flips. Based on the observation that some specific locations of blocks are responsible for the majority of bit flips, we extend RBW to further reduce the number of bit flips per write in the memory system. The results taken from full-system simulations reveal that our proposal, called Captopril, can reduce the number of bit flips by 21% and 9%, on average, compared to the baseline and state-of-the-art designs, respectively.
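
The baseline read-before-write idea mentioned above can be sketched in a few lines: read the stored word, XOR it with the incoming word, and program only the positions that differ. This illustrates plain RBW only, not the Captopril extension; the function name, the 64-bit default width, and the example values are assumptions.

```python
def rbw_write(old, new, width=64):
    """Return (flip_mask, flip_count) for writing `new` over `old`.

    flip_mask has a 1 in every bit position whose stored value must change;
    positions with a 0 are masked out and left untouched by the write.
    """
    word_mask = (1 << width) - 1
    flip_mask = (old ^ new) & word_mask
    return flip_mask, bin(flip_mask).count("1")


# Writing 0b10011010 over 0b10110010 only needs to change two cells.
flip_mask, flips = rbw_write(0b10110010, 0b10011010, width=8)
```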

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30 End of session

8.6 Dataflow Modeling and Natural Language Processing

Date: Wednesday 16 March 2016
Time: 17:00 - 18:30
Location / Room: Konferenz 4

Chair:
Dominique Borrione, Laboratoire TIMA, FR

Co-Chair:
Marc Geilen, Eindhoven University of Technology, NL

The first two papers present advances in modeling parallelism and dynamism in dataflow applications. The third paper presents a novel method to extract verification properties from a natural language specification.

Time Label Presentation Title
Authors
17:00 8.6.1 EXPLOITING RESOURCE-CONSTRAINED PARALLELISM IN HARD REAL-TIME STREAMING APPLICATIONS
Speaker:
Jelena Spasic, Leiden University, NL
Authors:
Jelena Spasic, Di Liu and Todor Stefanov, Leiden University, NL
Abstract
In this paper, we study the problem of exploiting parallelism when a hard real-time streaming application modeled as a Synchronous Data Flow (SDF) graph is mapped onto a Multi-Processor System-on-Chip (MPSoC) platform. We propose a new unfolding graph transformation and an algorithm that adapts the parallelism in the application according to the resources in an MPSoC by using the unfolding transformation. We evaluate the efficiency of our unfolding graph transformation and the performance and time complexity of our algorithm in comparison to the existing approaches. Experiments on a set of real-life streaming applications demonstrate that: 1) our unfolding transformation gives shorter latency and smaller buffer sizes when compared to the related approaches; and 2) our algorithm finds a solution with smaller code size, smaller buffer sizes and shorter latency in 98% of the experiments, while meeting the same performance and timing requirements when compared to an existing approach.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:30 8.6.2 TRANSACTION PARAMETERIZED DATAFLOW: A MODEL FOR CONTEXT-DEPENDENT STREAMING APPLICATIONS
Speaker:
Xuan Khanh Do, CEA LIST, FR
Authors:
Xuan Khanh Do1, Stéphane Louise1 and Albert Cohen2
1CEA LIST, FR; 2Inria, FR
Abstract
Static dataflow programming models are well suited to the development of embedded many-core systems. However, complex signal and media processing applications often display dynamic behavior that does not fit the classical static restrictions. We propose Transaction Parameterized Dataflow (TPDF), a new model of computation combining integer parameters (to express dynamic rates) and a new type of control actor (to allow topology changes and enforcement of time constraints). We present static analyses for liveness and bounded memory usage. We also introduce a static scheduling heuristic to map TPDF to massively parallel embedded platforms. We validate the model and associated methods using a cognitive radio application, demonstrating significant buffer size and performance improvements compared to state-of-the-art models including Cyclo-Static Dataflow (CSDF).

Download Paper (PDF; Only available from the DATE venue WiFi)
18:00 8.6.3 GLAST: LEARNING FORMAL GRAMMARS TO TRANSLATE NATURAL LANGUAGE SPECIFICATIONS INTO HARDWARE ASSERTIONS
Speaker:
Christopher Harris, University of California, Irvine, US
Authors:
Christopher Harris and Ian Harris, University of California, Irvine, US
Abstract
The purpose of functional verification is to ensure that a design conforms to its specification. However, large written specifications can contain hundreds of statements describing correct operation which an engineer must use to create sets of correctness properties. This laborious manual process increases both verification time and cost. In this work we present GLAsT, a new learning algorithm which accepts a small set of sentences describing correctness properties and corresponding SystemVerilog Assertions (SVAs). GLAsT creates a custom formal grammar which captures the writing style and sentence structure of a specification and facilitates the automatic translation of English specification sentences into formal SystemVerilog Assertions. We evaluate GLAsT on English sentences from two ARM AMBA bus protocols. Results show that a translation system using the formal grammar generated by GLAsT automatically generates correctly formed SVAs from the targeted AMBA specification as well as from a second, different AMBA bus specification.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30 IP4-3, 292 HANDLING COMPLEX DEPENDENCIES IN SYSTEM DESIGN
Speaker:
Mischa Möstl, Technische Universität Braunschweig, DE
Authors:
Mischa Möstl and Rolf Ernst, Technische Universität Braunschweig, DE
Abstract
In this paper we describe a novel strategy to reveal and handle complex dependencies in incremental and distributed design processes, even under the ubiquitous presence of uncertainties concerning model and design. We demonstrate in a case study how to handle epistemic design uncertainty in an iterative process, and we show how dependency paths can be selectively excluded under certain concerns, such as timing, by including third-party analysis results based on the used models in the dependency analysis. Since the implementation of our approach relies on modern graph analysis libraries, it can scale to realistic problem instances.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30 End of session

8.7 Test Methods Handling Unknowns, 2.5D Integration and Realistic Memory Defects

Date: Wednesday 16 March 2016
Time: 17:00 - 18:30
Location / Room: Konferenz 5

Chair:
Friedrich Hapke, Mentor Graphics Hamburg, DE

Improving the quality of the test and analysis process is crucial from a technical and economic point of view. Novel methods are presented to improve ATPG in the presence of unknowns, to allow pre-bond interposer testing and to exploit the memory infrastructure to improve defect coverage. The session is complemented by an ATPG approach based on behavioral fault models, a hierarchical DFT methodology, and a safety analysis process combining simulation-based fault injection with graph-based guidance.

Time Label Presentation Title
Authors
17:00 8.7.1 ACCURATE CEGAR-BASED ATPG IN PRESENCE OF UNKNOWN VALUES FOR LARGE INDUSTRIAL DESIGNS
Speaker:
Karsten Scheibler, University of Freiburg, DE
Authors:
Karsten Scheibler, Dominik Erb and Bernd Becker, University of Freiburg, DE
Abstract
Unknown values emerge during the design and test generation process as well as during later test application and system operation. They adversely affect the test quality by reducing the controllability and observability of internal circuit structures, resulting in a loss of fault coverage. To handle unknown values, conventional test generation algorithms, as used in state-of-the-art commercial tools, rely on n-valued algebras. However, n-valued algebras introduce pessimism as soon as X-values reconverge. Consequently, these algorithms fail to compute the accurate result. Therefore, this paper focuses on a new highly incremental CEGAR-based algorithm that overcomes these limitations and hence is completely accurate in the presence of unknown values. It relies on a modified SAT solver tailored to this specific problem. The experimental results for circuits with up to 2,400,000 gates show that this combination allows high accuracy and high scalability at the same time. Compared to a state-of-the-art commercial tool, the fault coverage could be increased significantly. Furthermore, the runtime is reduced remarkably compared to a QBF-based encoding of the problem.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:30 8.7.2 (Best Paper Award Candidate)
PRE-BOND TESTING OF THE SILICON INTERPOSER IN 2.5D ICS
Speaker:
Ran Wang, Duke University, US
Authors:
Ran Wang1, Zipeng Li1, Sukeshwar Kannan2 and Krishnendu Chakrabarty1
1Duke University, US; 2Global Foundries Inc., US
Abstract
In interposer-based 2.5D integrated circuits, the silicon interposer is the least expensive component in the chip. Thus, it is desirable to test the interposer before bonding to ensure that more expensive and defect-free dies are not stacked on a faulty interposer. We present an efficient method to locate defects in the interposer before stacking. The proposed test architecture uses e-fuses that can be programmed to connect or disconnect functional paths inside the interposer. The concept of die footprint is utilized for interconnect testing, and the overall assembly and test flow is described. In order to reduce test time, the concept of weighted critical area is defined and utilized. We present HSPICE simulation results to demonstrate the effectiveness of the pre-bond test solution. The benefit of using weighted critical area is demonstrated using a commercial interposer from industry.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:00 8.7.3 IMPROVING SRAM TEST QUALITY BY LEVERAGING SELF-TIMED CIRCUITS
Speaker:
Josef Kinseher, Intel Mobile Communications, DE
Authors:
Josef Kinseher1, Leonardo B. Zordan2, Ilia Polian3 and Andreas Leininger1
1Intel Mobile Communications, DE; 2Intel Mobile Communications, FR; 3University of Passau, DE
Abstract
As process technology continues to scale, SRAM test quality has become a growing concern in modern Systems-on-a-Chip. Ensuring high test quality while keeping costs low requires increasingly effective memory test solutions. This paper proposes the reuse of self-timing mechanisms that are integrated in many state-of-the-art SRAMs as a programmable DFT solution to improve the defect coverage of memory test algorithms. Its effectiveness is analyzed based on the injection of resistive-open defects inside SRAM core-cells. Simulation results of an industrial 28nm memory design show that the proposed test solution increases the coverage of the studied defects by up to 30%, depending on their location, while not requiring extra circuitry inside the SRAM.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30 IP4-4, 260 A SYNTHESIS-AGNOSTIC BEHAVIORAL FAULT MODEL FOR HIGH GATE-LEVEL FAULT COVERAGE
Speaker:
Anton Karputkin, Tallinn University of Technology, EE
Authors:
Anton Karputkin and Jaan Raik, Tallinn University of Technology, EE
Abstract
Early design space exploration is a practice for avoiding issues that manifest themselves at late design phases. Nevertheless, test development has traditionally been postponed to the final stages of the design process. At the same time, more and more IP designs are sold at the RTL, where details of the exact gate-level implementation are unknown. While a range of RTL ATPG methods has been developed over the past decades, the fault models are too inaccurate to guarantee full coverage for the gate-level faults. This paper fills the gap by proposing a synthesis-agnostic ATPG based on extending behavioral fault models in order to allow targeting stuck-at faults in the gate-level implementations of RTL designs regardless of the synthesis decisions made. Moreover, the approach does not require adding scan paths, and therefore the obtained test sequences serve as at-speed, functional mode tests. Experiments on a set of benchmarks and an industrial design show that the proposed fault models are superior to the previous approaches in terms of stuck-at fault coverage. Comparison with a state-of-the-art gate-level sequential ATPG shows higher or equal coverage for the proposed technique, achieved at shorter runtimes.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:32 IP4-6, 672 (Best Paper Award Candidate)
COMBINING GRAPH-BASED GUIDANCE AND ERROR EFFECT SIMULATION FOR EFFICIENT SAFETY ANALYSIS
Speaker:
Jo Laufenberg, Universität Tübingen, DE
Authors:
Jo Laufenberg1, Sebastian Reiter2, Alexander Viehl2, Thomas Kropf1, Wolfgang Rosenstiel1 and Oliver Bringmann1
1Universität Tübingen, DE; 2FZI Forschungszentrum Informatik, DE
Abstract
The increasing number of complex embedded systems used in safety-relevant tasks poses a major challenge in the field of safety analysis. This paper presents a simulation-based safety analysis that addresses the challenges resulting from this development. The presented approach consists of two parts: an Error Effect Simulation (EES) and a graph-based specification. The EES is composed of a system simulation with fault injection capability and a generic fault specification. The graph-based specification approach systematically guides the EES and enables a very efficient exploration of the analysis space. Inherent in the graph-based specification is the documentation of the safety analysis and a coverage approach to assess the executed safety analysis. Combining these parts leads to an efficient and automatable framework for safety analysis. A use case of an interconnected electronic control system shows the application of the approach and highlights the benefits for a safety analysis, for example a failure mode and effect analysis.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30 End of session

8.8 Model Based Design and Verification Day - Tutorial: An Industry Approach to FPGA/ARM System Development and Verification

Date: Wednesday 16 March 2016
Time: 17:00 - 18:30
Location / Room: Exhibition Theatre

Moderator:
John Zhao, MathWorks Inc., US

With its special "Model Based Design and Verification Day", DATE 2016 for the first time combines a visionary keynote from an industrial leader, application talks by experienced users and an industrial tutorial with two sessions of the DATE conference Technical Program on the latest research results in the field. This gives attendees the opportunity to get a comprehensive overview of the state of the art in model based design and test, ranging from industrial application to academic research.

This session concludes the day with an industrial tutorial on FPGA/ARM System Development and Verification. The previous sessions of this day were Exhibition Theatre session 5.8 with an Exhibition Keynote given by Jim Tung, MathWorks Fellow at MathWorks Inc., and an Application Talk given by Robert Stewart, MathWorks Professor at University of Strathclyde, followed by the Technical Program sessions 6.6 and 7.6 covering research work on modelling and control of cyber-physical systems and techniques for the analysis and testing of embedded software, respectively.

Click here to download The MathWorks "Model Based Design and Verification Day" flyer.

Time Label Presentation Title
Authors
17:00 8.8.1 TUTORIAL: AN INDUSTRY APPROACH TO FPGA/ARM SYSTEM DEVELOPMENT AND VERIFICATION
Speaker:
John Zhao, MathWorks Inc., US
Abstract

MATLAB and Simulink provide a rich environment for embedded-system development, with libraries of proven, specialized algorithms ready to use for specific applications.  The environment enables a model-based design workflow for fast prototyping and implementation of the algorithms on heterogeneous embedded targets, such as MPSoC.  A system-level design approach enables architectural exploration and partitioning, as well as coordination between SW and HW development workflows.  Functional verification throughout the design process improves coverage and test-case generation while reducing the time and resources required.

In this set of tutorial sessions, you will learn

  • How to implement an application that leverages the FPGA and ARM core of a Zynq SOC
  • The flexibility and diversity of the approach through examples that include prototyping a motor control algorithm and a video-processing algorithm.
  • A HW/SW co-design workflow that combines system level design and simulation with automatic code generation
  • Successful use of the HW/SW co-design workflow in commercial development
  • Functional verification using MATLAB and Simulink in a SystemVerilog workflow illustrated by a detailed example

Subsessions:

"A HARDWARE / SOFTWARE CO-DESIGN APPROACH FOR MPSOC"

"PROTOTYPING MATLAB AND SIMULINK DESIGN ON FPGA"

"CONNECTING SIMULINK WITH SYSTEMVERILOG FOR FUNCTIONAL VERIFICATION"

18:30 End of session

DATE-Party

Date: Wednesday 16 March 2016
Time: 19:30 - 23:00
Location / Room:

The DATE Party is traditionally one of the highlights of the DATE week. As one of the main networking opportunities during the DATE week, it is a perfect occasion to meet friends and colleagues in a relaxed atmosphere while enjoying local amenities. It is scheduled on March 16, 2016, from 19:00 to 23:00. This year, it will take place in one of Dresden's most outstanding museum locations, the Albertinum Dresden. This museum with its spectacular architecture was reopened in June 2010 and houses the art gallery "Neue Meister" and the "Skulpturensammlung" (sculpture collection). You may continue the talks and discussions in a relaxed atmosphere while enjoying culinary delights. Please kindly note that it is not a seated dinner. All delegates, exhibitors and their guests are invited to attend the party. Please be aware that entrance is only possible with a valid party ticket. Each full conference registration includes a ticket for the DATE Party (which needs to be booked during the online registration process though). Additional tickets can be purchased on-site at the registration desk (subject to availability of tickets). Price for extra ticket: 60 € per person. How to get there: A joint walk from the congress centre to the Albertinum Dresden will be organized, starting at 19:00 from the main entrance of the ICC Dresden.

Time Label Presentation Title
Authors
23:00 End of session

9.1 SPECIAL DAY Embedded Tutorial: Embedded Systems Security

Date: Thursday 17 March 2016
Time: 08:30 - 10:00
Location / Room: Saal 2

Chair:
Matthias Schunter, Intel, DE

Co-Chair:
Wieland Fischer, Infineon Technologies, DE

HW designers need to understand SW attacks. SW designers need to understand the HW platform. In this first session of the special day on secure systems, we present an embedded tutorial on low-level software attacks. This is essential to understand the HW architecture modifications that are being made to make embedded HW/SW platforms more secure.

Time Label Presentation Title
Authors
08:30 9.1.1 SOFTWARE SECURITY: VULNERABILITIES AND COUNTERMEASURES FOR TWO ATTACKER MODELS
Speaker:
Frank Piessens, Katholieke Universiteit Leuven, BE
Authors:
Frank Piessens and Ingrid Verbauwhede, Katholieke Universiteit Leuven, BE
Abstract
History has shown that attacks against network-connected software-based systems are common and dangerous. An important fraction of these attacks exploit implementation details of the software-based system. These attacks, sometimes called low-level attacks, rely on characteristics of the hardware, compiler or operating system used to execute software programs to make these programs misbehave, or to extract sensitive information from them. With the increased Internet connectivity of embedded devices, including industrial control systems, sensors as well as consumer devices, there is a substantial risk that similar attacks will target these devices. This tutorial paper explains the vulnerabilities, attacks and countermeasures relevant for low-level software security. The paper discusses software security for two different attacker models: the classic model of an attacker that can only interact with the program by providing input and reading output, and the more recent and challenging model of an attacker that controls part of the execution platform on which the software runs, for instance because the attacker has compromised the operating system, or some of the libraries that the software under attack relies on.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00 End of session
Coffee Break in Exhibition Area

9.2 Managing the Traffic Jam in NoC

Date: Thursday 17 March 2016
Time: 08:30 - 10:00
Location / Room: Konferenz 6

Chair:
Nader Bagherzadeh, University of California Irvine, US

Co-Chair:
Massoud Daneshtalab, KTH, SE

Multi-core systems-on-chip integrate a growing number of heterogeneous components, leading to increasingly complex traffic patterns. This session presents three contributions to manage the growing traffic challenges in NoCs: the first paper proposes a traffic splitting model for application-specific NoCs, the second presents an MCAPI-compliant hardware buffer manager to support communication among heterogeneous components, and the third employs an overlay network and scheduling unit to provide latency guarantees for hard real-time transmissions.

Time Label Presentation Title
Authors
08:30 9.2.1 OLITS: AN OHM'S LAW-LIKE TRAFFIC SPLITTING MODEL BASED ON CONGESTION PREDICTION
Speaker:
Gaoming Du, Hefei University of Technology, CN
Authors:
Gaoming Du1, Yanghao Ou1, Xiangyang Li1, Ping Song1, Zhonghai Lu2 and Minglun Gao1
1Hefei University of Technology, CN; 2KTH Royal Institute of Technology, SE
Abstract
Through traffic splitting, multi-path routing in Network-on-Chip (NoC) outperforms single-path routing in terms of load balance and resource utilization. However, uncontrolled traffic splitting may aggravate network congestion and thus worsen communication delay. We propose OLITS, an Ohm's Law-like traffic splitting model, for application-specific NoC. We first redefine the contention matrix to characterize the flow congestion state, which contains flow parameters such as average flow rate and burstiness. We then define flow resistance as the flow congestion factor extracted from the contention matrix, and use the parallel resistance theory to predict the congestion state for every target sub-flow. Finally, the traffic splitting proportions of the parallel sub-flows are assigned according to the equivalent flow resistance. Experiments are conducted on both 2D and 3D multi-path routing NoCs. The results show that the worst-case delay bound of the target flow is significantly improved, and network congestion can be effectively balanced.
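
The Ohm's law analogy can be made concrete with the textbook current-divider rule: parallel branches carry shares inversely proportional to their resistance. The sketch below applies that rule to hypothetical per-path "flow resistance" values. It shows the electrical analogy only; how OLITS derives resistances from the contention matrix and assigns the final proportions is not given in the abstract, and the function names are assumptions.

```python
def equivalent_resistance(resistances):
    """Equivalent resistance of parallel branches: 1 / sum(1 / R_i)."""
    return 1.0 / sum(1.0 / r for r in resistances)


def split_proportions(resistances):
    """Share of traffic per path, inversely proportional to its resistance.

    By the current-divider rule, share_i = R_eq / R_i, so the shares sum to 1
    and a more congested (higher-resistance) path receives less traffic.
    """
    r_eq = equivalent_resistance(resistances)
    return [r_eq / r for r in resistances]


# Two hypothetical sub-paths; the second is three times as congested as the first.
shares = split_proportions([1.0, 3.0])
```

In this example the less congested path receives about three quarters of the traffic.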

Download Paper (PDF; Only available from the DATE venue WiFi)
09:00 9.2.2 MCAPI-COMPLIANT HARDWARE BUFFER MANAGER MECHANISM TO SUPPORT COMMUNICATION IN MULTI-CORE ARCHITECTURES
Speaker:
Romain Lemaire, CEA-Leti, FR
Authors:
Thiago Raupp da Rosa, Thomas Mesquida, Romain Lemaire and Fabien Clermidy, CEA-Leti, FR
Abstract
High performance and high power efficiency are two mandatory constraints for multi-core systems in order to successfully handle the most recent applications in several fields, e.g. image processing and communication standards. Nowadays, hardware accelerators are often used along with several processing cores to achieve the desired performance while keeping high power efficiency. However, such systems impose an increased programming complexity due to the lack of software standards that support heterogeneity, frequently leading to custom solutions. On the other hand, implementing a standard software solution for embedded systems might induce significant overheads. This work presents a hardware mechanism in co-design with a standard programming interface (API) for embedded systems, aiming to decrease the overheads imposed by software implementation while increasing programmability and communication performance. The results show gains of up to 97% in latency and an increase of 40 times in throughput for synthetic traffic and an average decrease of 95% in communication time for an image processing application.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:309.2.3SLACK-BASED RESOURCE ARBITRATION FOR REAL-TIME NETWORKS-ON-CHIP
Speaker:
Adam Kostrzewa, Technische Universität Braunschweig, DE
Authors:
Adam Kostrzewa, Selma Saidi and Rolf Ernst, Technische Universität Braunschweig, DE
Abstract
Networks-on-Chip (NoCs) designed for real-time systems must efficiently deal with a broad diversity of traffic requirements. This demands providing latency guarantees for hard real-time transmissions with minimum impact on performance-sensitive best-effort traffic. In this work, we present a novel mechanism which achieves this goal through a slack-based, global and dynamic prioritization of data streams. This is performed using an overlay network and scheduling unit combining local arbitration performed in routers with global scheduling of entire logical transmissions for end-to-end guarantees. Consequently, our approach decreases both hardware and temporal overhead when compared with existing solutions and achieves a performance improvement of up to 60%.
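
As a rough illustration of slack-based prioritization, the sketch below lets best-effort traffic proceed while every hard real-time stream still has slack, and grants a hard real-time stream only once its slack is used up. This policy, the `Stream` type, the `threshold` parameter and the cycle-based timing are all assumptions made for the example; the abstract does not specify the paper's arbitration rules or its overlay network protocol.

```python
from dataclasses import dataclass


@dataclass
class Stream:
    name: str
    deadline: int         # absolute deadline in cycles (ignored for best effort)
    remaining: int        # cycles still needed to finish the transmission
    hard_real_time: bool


def slack(stream, now):
    """Cycles the stream can still wait without missing its deadline."""
    return stream.deadline - now - stream.remaining


def pick_next(streams, now, threshold=0):
    """Choose the stream to grant next (hypothetical policy).

    Hard real-time streams are served only when their slack has dropped to
    the threshold; until then best-effort streams go first.
    """
    urgent = [s for s in streams
              if s.hard_real_time and slack(s, now) <= threshold]
    if urgent:
        return min(urgent, key=lambda s: slack(s, now))
    best_effort = [s for s in streams if not s.hard_real_time]
    if best_effort:
        return best_effort[0]
    hard = [s for s in streams if s.hard_real_time]
    return min(hard, key=lambda s: slack(s, now)) if hard else None


if __name__ == "__main__":
    streams = [Stream("rt", 100, 20, True), Stream("be", 0, 5, False)]
    print(pick_next(streams, now=0).name)
    print(pick_next(streams, now=80).name)
```

Deferring hard real-time streams until their slack is exhausted is one simple way to limit the impact on best-effort traffic while still meeting deadlines.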

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00IP4-7, 180PACKET SECURITY WITH PATH SENSITIZATION FOR NOCS
Speaker:
Travis Boraten, Ohio University, US
Authors:
Travis Boraten and Avinash Kodi, Ohio University, US
Abstract
Hardware security is becoming a major concern as integrated circuits (ICs) are growing exponentially thanks to technology scaling. With ICs reaching upwards of billions of transistors, detecting hardware trojans (HTs) is like finding a needle in a haystack. Therefore, it becomes imperative to protect critical computing infrastructure from malicious attackers attempting to unearth vital information. Security enhancements should offer resiliency to limit their impact on overall chip performance as HTs are likely to slip through detection mechanisms. In this paper, we propose packet-security (P-Sec), a packet validation technique to protect compromised network-on-chip (NoC) architectures from fault injection side channel attacks and covert HT communication by merging two robust error detection schemes, namely algebraic manipulation detection (AMD) and cyclic redundancy check (CRC) codes. With P-Sec, applications containing sensitive and encrypted data can be protected from an ideal attacker using AMD codes at the cost of marginal area and power overhead in the network interface but with enhanced security on demand.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00End of session
Coffee Break in Exhibition Area

9.3 Industrial Experiences

Date: Thursday 17 March 2016
Time: 08:30 - 10:00
Location / Room: Konferenz 1

Chair:
Geoff Merrett, University of Southampton, GB

Co-Chair:
Stephan Diestelhorst, ARM, US

This session presents various industrially relevant experiences on automotive electronics, industrial automation, sensor fusion, energy-efficient design, and reliability in advanced process nodes.

TimeLabelPresentation Title
Authors
08:309.3.1CHALLENGES OF USING ON-CHIP PERFORMANCE MONITORS FOR PROCESS AND ENVIRONMENTAL VARIATION COMPENSATION
Speaker:
Mahroo Zandrahimi, Delft University of Technology, NL
Authors:
Mahroo Zandrahimi1, Zaid Al-Ars1, Philippe Debaud2 and Armand Castillejo2
1Delft University of Technology, NL; 2STMicroelectronics, FR
Abstract
Circuit monitoring techniques have been adopted widely to compensate for process, voltage, and temperature variations as well as power optimization of integrated circuits. For cost and complexity reasons, these techniques are usually implemented by means of performance monitors allowing fast performance evaluation during production. In this paper, we demonstrate the limitations of performance monitoring methodologies in terms of accuracy and effectiveness. Silicon measurements of a nanometric FD-SOI device show that the required design margin is above 10% of the clock cycle, which leads to unacceptable waste of power.

Download Paper (PDF; Only available from the DATE venue WiFi)
08:459.3.2STUDY OF WORKLOAD IMPACT ON BTI HCI INDUCED AGING OF DIGITAL CIRCUITS
Speaker:
Ajith Sivadasan, ST Microelectronics and TIMA, FR
Authors:
Ajith Sivadasan1, Florian Cacho2, Sidi-Ahmed Benhassain1, Vincent Huard2 and Lorena Anghel3
1ST Microelectronics and TIMA, FR; 2STMicroelectronics, FR; 3TIMA, FR
Abstract
Workload characterization of digital circuits using industry standard benchmarks gives an insight into the performance and energy characteristics of processor designs. Aging studies of digital circuits due to BTI and HCI are gaining importance since a higher impact on the performance of circuits can be observed as we scale down gate dimensions. For embedded system applications, the workload may very well dictate the lifetime of a system. This article aims to study the influence of different workloads on the degradation of the critical path, which determines the reliability of a system. A top-down circuit activity and probability analysis is carried out, leading to an accurate estimation of aging due to HCI and BTI of critical path elements at the design stage. A dedicated simulation flow has been set up, from RTL simulation down to gate-level cell timing analysis mapped onto 28nm FDSOI technology from STMicroelectronics. The objective is to correlate path delay timing with aging of critical path cells. Simulation results indicate that the higher complexity of an execution program may not necessarily lead to a higher rate of degradation of the critical path, considering that aging is primarily driven by the workload-dependent activity and the probability of critical path combinational logic elements. Keywords: Workload, Aging, Critical Path, Reliability

Download Paper (PDF; Only available from the DATE venue WiFi)
09:009.3.3FAST PROTOTYPING PLATFORM FOR NAVIGATION SYSTEMS WITH SENSORS FUSION
Speaker:
Karim Ben Chehida, CEA LIST, FR
Authors:
Charly Bechara1, Karim Ben Chehida1, Mickael Guibert1, Renaud Schmit1, Maria Lepecq1, Laurent Soulier1, Thomas Dombek1 and Yann Leclerc2
1CEA LIST, FR; 2M3Systems, FR
Abstract
With the increasing demand for robust and precise navigation systems, sensor fusion algorithms have become the only solution exploited that meets the requirements of these systems. However, these algorithms are computation-intensive and require hybrid processing resources. In this paper, we present a unified fast prototyping platform for navigation systems with sensor fusion. The platform is designed based on an analysis of representative sensor and navigation algorithm requirements and consists of a complete hardware/software framework, as well as FPGA hardware accelerators for the identified compute-intensive parts (vision and GNSS). For instance, the hardware point of interest tracker used in vision-based localization algorithms accelerates the performance by 10 with respect to the software implementation. The prototyping platform can be used by algorithm designers to rapidly implement and test their sensor fusion algorithms.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:159.3.4PRECISION TIMED INDUSTRIAL AUTOMATION SYSTEMS
Speaker:
Partha Roop, University of Auckland, NZ
Authors:
Matthew Kuo, Sidharta Andalam and Partha Roop, University of Auckland, NZ
Abstract
For Programmable Logic Controllers (PLCs) that implement safety-critical industrial automation systems, timing correctness is as important as functional correctness. Modern PLCs employ run-time environments and/or general purpose processors designed by ARM, Intel and Freescale to implement real-time systems. However, general purpose processors are designed to improve the average case performance and ignore the worst case performance. This makes it nearly impossible to guarantee the timing correctness of safety-critical applications. In this paper, we apply the recently developed PRET philosophy to propose Precision Timed Industrial Automation (PTIA) Systems for the design of precision timed industrial automation systems.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:309.3.5AUTOSAR-BASED COMMUNICATION COPROCESSOR FOR AUTOMOTIVE ECUS
Speaker:
Ahmed Hamed, Mentor Graphics Corporation, EG
Authors:
Ahmed Hamed1, Mona Safar2, M. Watheq El-Kharashi2 and Ashraf Salem1
1Mentor Graphics Corporation, EG; 2Ain Shams University, EG
Abstract
In this paper, we present a novel approach to enhance the performance of AUTOSAR-based Electronic Control Units. The execution time consumed by main functions, called from the application used in an Engine Control Management AUTOSAR-based Electronic Control Unit, has been analyzed. The analysis shows that the operations done by the AUTOSAR communication module are the most time-consuming operations on the Electronic Control Unit. Our approach modifies the design model of the AUTOSAR Layered Software Architecture by adding the communication coprocessor component. This model-based hardware/software codesign expedites the AUTOSAR communication operations while keeping the interfaces with the upper and lower layers unchanged. The coprocessor covers two communication-based operations. It consists of six building blocks. It communicates with the original Electronic Control Unit through the External Peripheral Interface module, which is a high speed parallel bus for external peripherals. The implemented coprocessor achieves up to 140x speedup over the software communication module solution. This leaves room to extend the automotive applications and increase the amount of information exchanged by these applications without affecting the performance.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:459.3.6MANTISSA-MASKING FOR ENERGY-EFFICIENT FLOATING-POINT LTE UPLINK MIMO BASEBAND PROCESSING
Speaker:
Tomas Henriksson, Huawei Sweden, SE
Authors:
Daniel Guenther1, Tomas Henriksson2, Rainer Leupers1 and Gerd Ascheid1
1RWTH Aachen University, DE; 2Huawei Sweden, SE
Abstract
The increasingly diverse wireless communication ecosystem has given rise to flexible, programmable platforms for wireless baseband processing. This industry case study presents advance development results of a fully programmable, flexible floating-point DSP architecture for uplink (UL) multiple-input, multiple-output (MIMO) baseband processing with runtime-adaptive precision. By tuning the floating-point precision to the application needs, energy consumption can be reduced by up to 23% per task.
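
To give a feel for what reduced floating-point precision means, the sketch below zeroes the low-order mantissa bits of an IEEE 754 single-precision value. It assumes that "mantissa masking" amounts to truncating the mantissa to a chosen number of bits; the function name `mask_mantissa` and the `keep_bits` parameter are hypothetical, and the actual hardware mechanism of the presented DSP architecture may differ.

```python
import struct

MANTISSA_BITS = 23  # explicit mantissa bits in IEEE 754 single precision


def mask_mantissa(value, keep_bits):
    """Return value with only the top keep_bits mantissa bits retained.

    Assumption: the masked bits are simply forced to zero (truncation);
    sign and exponent are left untouched.
    """
    if not 0 <= keep_bits <= MANTISSA_BITS:
        raise ValueError("keep_bits must be between 0 and 23")
    # Reinterpret the float32 encoding as an unsigned 32-bit integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    drop = MANTISSA_BITS - keep_bits
    mask = (0xFFFFFFFF << drop) & 0xFFFFFFFF
    (masked,) = struct.unpack("<f", struct.pack("<I", bits & mask))
    return masked


if __name__ == "__main__":
    for keep in (23, 8, 4, 1):
        print(keep, mask_mantissa(3.14159, keep))
```

Fewer active mantissa bits mean less switching activity in the arithmetic units, which is the kind of precision-versus-energy trade-off the abstract refers to.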

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00End of session
Coffee Break in Exhibition Area

9.4 Optimization for Logic and Physical Design

Date: Thursday 17 March 2016
Time: 08:30 - 10:00
Location / Room: Konferenz 2

Chair:
Valeria Bertacco, Univ. of Michigan, US

Co-Chair:
Sven Peyer, IBM, DE

The first paper proposes minimization techniques for Majority-Inverter Graphs. The second paper presents functional rectification taking into account placement information. The third paper combines slack matching, gate sizing and repeater insertion to optimize leakage power in asynchronous circuits.

TimeLabelPresentation Title
Authors
08:309.4.1OPTIMIZING MAJORITY-INVERTER GRAPHS WITH FUNCTIONAL HASHING
Speaker:
Mathias Soeken, École Polytechnique Fédérale de Lausanne (EPFL), CH
Authors:
Mathias Soeken1, Pierre-Emmanuel Gaillardon2, Luca Amaru2 and Giovanni De Micheli2
1University of Bremen, DE; 2École Polytechnique Fédérale de Lausanne (EPFL), CH
Abstract
A Majority-Inverter Graph (MIG) is a recently introduced logic representation form whose algebraic and Boolean properties allow for efficient logic optimization. In particular, when considering logic depth reduction, MIG algorithms obtained significantly superior synthesis results as compared to the state-of-the-art approaches based on AND-inverter graphs and commercial tools. In this paper, we present a new MIG optimization algorithm targeting size minimization based on functional hashing. The proposed algorithm makes use of minimum MIG representations which are precomputed for functions of up to 4 variables using an approach based on Satisfiability Modulo Theories (SMT). Experimental results show that heavily optimized MIGs can be further minimized in size as well, thanks to our proposed methodology. When using the optimized MIGs as a starting point for technology mapping, we were able to improve both depth and area for the arithmetic instances of the EPFL benchmarks beyond the current results achievable by state-of-the-art logic synthesis algorithms.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:009.4.2RESOURCE-AWARE FUNCTIONAL ECO PATCH GENERATION
Speaker:
Iris Hui-Ru Jiang, National Chiao Tung University, TW
Authors:
An-Che Cheng1, Iris Hui-Ru Jiang1 and Jing-Yang Jou2
1National Chiao Tung University, TW; 2National Central University, TW
Abstract
Functional Engineering Change Order (ECO) is necessary for logic rectification at late design stages. Existing works mainly focus on identifying a minimal logic difference between the original netlist and the revised netlist, which is called a patch. The patch is then implemented by technology mapping using spare cells. However, there may be insufficient spare cells around the physical location of the patch, or the wires connecting spare cells are too long, thus causing timing violations and routing congestion. In this paper, we propose a resource-aware functional patch generation approach by gate count and wiring cost estimations. In particular, we estimate the number of spare cells required by a patch and define a cost of wire length on it, which considers the physical location of the patch and a set of nearby spare cells. As a result, the patch with minimal wiring cost instead of minimal size is produced. The experiments are conducted on nine industrial testcases. These testcases reflect real problems faced by designers, and the results show our method is promising.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:309.4.3SIMULTANEOUS SLACK MATCHING, GATE SIZING AND REPEATER INSERTION FOR ASYNCHRONOUS CIRCUITS
Speaker:
Gang Wu, Iowa State University, US
Authors:
Gang Wu and Chris Chu, Iowa State University, US
Abstract
Slack matching, gate sizing and repeater insertion are well-known techniques applied to asynchronous circuits to improve their power and performance. Existing asynchronous optimization flows typically perform these optimizations sequentially, which may result in sub-optimal solutions as all these techniques are interdependent and affect one another. In this paper, we present a unified leakage power optimization framework by performing simultaneous slack matching, gate sizing and repeater insertion. In particular, we apply Lagrangian relaxation to integrate all these techniques into a single optimization step. A methodology to handle slack matching under the Lagrangian relaxation framework is proposed. Also, an effective look-up table based repeater insertion technique is developed to speed up the algorithm. Our approach is evaluated using a set of asynchronous designs and compared with both a sequential approach and a commercial asynchronous optimization flow. The experimental results show significant savings in leakage power and demonstrate the effectiveness of our approach.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00IP4-8, 287SYNTHESIS OF APPROXIMATE CODERS FOR ON-CHIP INTERCONNECTS USING REVERSIBLE LOGIC
Speaker:
Robert Wille, Johannes Kepler University Linz, AT
Authors:
Robert Wille1, Oliver Keszocze2, Stefan Hillmich2, Marcel Walter2 and Alberto Garcia-Ortiz3
1Johannes Kepler University Linz, AT; 2University of Bremen, DE; 3ITEM (U.Bremen), DE
Abstract
On-chip coding provides a remarkable potential to improve the energy efficiency of on-chip interconnects. However, the logic design of the encoder/decoder faces a main challenge: the area and power overhead should be minimal while, at the same time, decodability has to be guaranteed. To address these problems, we propose the concept of approximate coding, where the coding function is partially specified and the synthesis algorithm has a higher flexibility to simplify the circuit. Since conventional synthesis methods are unsuitable here, we propose an alternative synthesis approach based on reversible logic. Experimental evaluations confirm the benefits of both the proposed concept of approximate coding and the proposed design method.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:01IP4-9, 310DESIGN-SYNTHESIS CO-OPTIMISATION USING SKEWED AND TAPERED GATES
Speaker:
Ankur Shukla, India Systems Development Lab, IBM India, IN
Authors:
Ayan Datta1, James D. Warnock2, Ankur Shukla1, Saurabh Gupta1, Yiu Hing Chan2, Karthik Mohan1 and Charudhattan Nagarajan1
1India Systems Development Lab, IBM India, IN; 2IBM US, US
Abstract
This paper presents a novel technique to optimize the design of non-conventional tapered and skewed standard cell gates, and the synthesis algorithms for efficient usage of such gates in IBM's high-performance 22nm CMOS SOI technology. The focus is on design considerations to ensure that synthesis can use these gates efficiently, leveraging the resulting timing improvements for faster timing closure of high-performance microprocessor designs. A detailed analysis is presented in which, by exposing these gates to synthesis at different points in the process, the optimal point of insertion is identified. An efficient algorithm is also proposed to handle decisions regarding the conversion of conventional gates to non-conventional gates, after taking into account multiple factors including delay and slew. Results show a 25-30% improvement in the total negative slack of designs and a 20-25% reduction in the total number of negative paths, without any major impact on the total power of the designs.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:02IP4-10, 690(Best Paper Award Candidate)
A SYNTHESIS-PARAMETER TUNING SYSTEM FOR AUTONOMOUS DESIGN-SPACE EXPLORATION
Speaker:
Matthew Ziegler, IBM T. J. Watson Research Center, US
Authors:
Matthew Ziegler1, Hung-Yi Liu2, George Gristede1, Bruce Owens3, Ricardo Nigaglioni3 and Luca Carloni2
1IBM T. J. Watson Research Center, US; 2Columbia University, US; 3IBM Systems and Technology Group, US
Abstract
Advanced logic and physical synthesis tools provide a vast number of tunable parameters that can significantly impact physical design quality, but the complexity of the parameter design space requires intelligent search algorithms. To fully utilize the optimization potential of these tools, we propose SynTunSys, a system that adds a new level of abstraction between designers and design tools for managing the design space exploration process. SynTunSys takes control of the synthesis-parameter tuning process, i.e., job submission, results analysis, and next-step decision making, by automating a key portion of a human designer's decision process. We present the overall organization of SynTunSys, describe its main components, and provide results from employing it for the design of an industrial chip, the IBM z13 22nm high-performance server chip. During this major design, SynTunSys provided significant savings in human design effort and achieved a quality of results beyond what human designers alone could achieve, yielding on average a 36% improvement in total negative slack and a 7% power reduction.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00End of session
Coffee Break in Exhibition Area

9.5 Formal Bit Precise Reasoning

Date: Thursday 17 March 2016
Time: 08:30 - 10:00
Location / Room: Konferenz 3

Chair:
Markus Wedler, Synopsys GmbH, DE

Co-Chair:
Julien Schmaltz, Eindhoven University of Technology, NL

The session presents advancements in formal reasoning at the bit-level. The first paper makes significant improvements to the verification of multipliers. The second and third papers show progress in the analysis of memory-locked errors and clock domain crossing. The session closes with two IP presentations about using software analyzers for hardware verification and generating word-level models from bit-level designs.

TimeLabelPresentation Title
Authors
08:309.5.1(Best Paper Award Candidate)
FORMAL VERIFICATION OF INTEGER MULTIPLIERS BY COMBINING GRÖBNER BASIS WITH LOGIC REDUCTION
Speaker:
Amr Sayed-Ahmed, University of Bremen, DE
Authors:
Amr Sayed Ahmed1, Daniel Grosse1, Ulrich Kühne1, Mathias Soeken1 and Rolf Drechsler2
1University of Bremen, DE; 2University of Bremen and DFKI, DE
Abstract
Formal verification utilizing symbolic computer algebra has demonstrated the ability to formally verify large Galois field arithmetic circuits and basic architectures of integer arithmetic circuits. The technique models the circuit as Gröbner basis polynomials and reduces the polynomial equation of the circuit specification with respect to the polynomial model. However, during the Gröbner basis reduction, the technique suffers from exponential blow-up in the size of the polynomials if it is applied to parallel adders and recoded multipliers. In this paper, we address the reasons for this blow-up and present an approach that allows the technique to be applied to basic and complex parallel architectures of multipliers. The approach is based on applying a logic reduction rule during Gröbner basis rewriting. The rule uses structural circuit information to remove terms that evaluate to zero before their blow-up. The experiments show that the approach is applicable to multipliers of up to 128 bits.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:009.5.2ROOT-CAUSE ANALYSIS FOR MEMORY-LOCKED ERRORS
Speaker:
John Adler, University of Toronto, CA
Authors:
John Adler, Djordje Maksimovic and Andreas Veneris, University of Toronto, CA
Abstract
Half of the time in the design cycle today is spent on verifying and debugging the correctness of a design. Although some debugging tasks have been automated, determining the root-cause of errors that have been locked in memory for a number of clock cycles before they propagate to an observation point remains a time consuming effort. This is because the error traces exposing such behavior can be excessively long, a fact that requires modeling the circuit for many time-frames. This paper introduces a performance-driven debugging methodology for pinpointing the root-cause of memory-locked errors. The technique models only a sliding time window and a final time window explicitly at any one time, while interstitial time-frames are linked with a lightweight memory model. This technique is later extended to a complete methodology that diagnoses errors that may be missed. Experiments on industrial designs with memory-locked errors demonstrate a 72% reduction in peak memory usage with a comparable runtime to existing methodologies.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:309.5.3FORMAL VERIFICATION OF CLOCK DOMAIN CROSSING USING GATE-LEVEL MODELS OF METASTABLE FLIP-FLOPS
Speaker:
Ghaith Tarawneh, Newcastle University, GB
Authors:
Ghaith Tarawneh, Andrey Mokhov and Alex Yakovlev, Newcastle University, GB
Abstract
Verifying clock domain boundary logic is a major challenge to the design of modern multi-clock systems. We present a novel verification approach that addresses the issue of domain crossing failures at a fundamental level. The approach relies on substituting flip-flops with model circuits and applying topological transformations to simulate the transfer of timing violations in gate-level netlists. This makes timing violations and their effects reproducible in discrete cycle-based simulation and amenable to identification and debugging similar to typical synchronous design failures. We show that this approach, when combined with formal verification, is inherently capable of reproducing many of the problematic issues at clock domain boundaries and outperforms the structural and functional heuristics used by state-of-the-art commercial tools in several respects. It reports fewer false positives, can be applied to non-stereotypical designs, can determine failure consequences, can demonstrate failures in signal waveforms and requires no input from the designer about what design patterns are used. Case examples and verification results of several multi-clock testbench designs are presented.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00IP4-11, 274UNBOUNDED SAFETY VERIFICATION FOR HARDWARE USING SOFTWARE ANALYZERS
Speaker:
Rajdeep Mukherjee, University of Oxford, GB
Authors:
Rajdeep Mukherjee, Peter Schrammel, Daniel Kroening and Tom Melham, University of Oxford, GB
Abstract
Demand for scalable hardware verification is ever increasing. We propose an unbounded safety verification framework for hardware, at the heart of which is a software verifier. To this end, we synthesize Verilog at register transfer level into a software-netlist, represented as a word-level ANSI-C program. The proposed tool flow allows us to leverage the precision and scalability of state-of-the-art software verification techniques. In particular, we evaluate unbounded proof techniques, such as predicate abstraction, k-induction, interpolation, and IC3/PDR; and we compare the performance of verification tools from the hardware and software domains that use these techniques. To the best of our knowledge, this is the first attempt to perform unbounded verification of hardware using software analyzers.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:01IP4-12, 765VERILOG2SMV: A TOOL FOR WORD-LEVEL VERIFICATION
Speaker:
Ahmed Irfan, Fondazione Bruno Kessler and University of Trento, IT
Authors:
Ahmed Irfan1, Alessandro Cimatti2, Alberto Griggio2, Marco Roveri2 and Roberto Sebastiani3
1Fondazione Bruno Kessler and University of Trento, IT; 2Fondazione Bruno Kessler, IT; 3University of Trento, IT
Abstract
Verification is an essential step of the hardware design lifecycle. Usually verification is done at the gate level (Boolean level). We present verilog2smv, a tool that generates word-level model checking problems from Verilog designs augmented with assertions. A key aspect of our tool is that memories in the designs are treated without any form of abstraction. verilog2smv can be used for RTL verification by chaining with a word-level model checker like nuXmv. To this end, we also present some experimental results over Verilog verification benchmarks, using verilog2smv + nuXmv as a tool-chain.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:02IP4-13, 717TOWARDS FORMAL VERIFICATION OF REAL-WORLD SYSTEMC TLM PERIPHERAL MODELS - A CASE STUDY
Speaker:
Vladimir Herdt, University of Bremen, DE
Authors:
Hoang M. Le1, Vladimir Herdt1, Daniel Grosse1 and Rolf Drechsler2
1University of Bremen, DE; 2University of Bremen and DFKI, DE
Abstract
SystemC-based Virtual Prototypes (VPs) serve as reference models for various activities in the modern design flow and, therefore, the functional correctness of each individual component and of the VPs as a whole should be subjected to rigorous formal verification. In the last few years, notable progress on SystemC formal verification has been made. This paper presents a case study on applying a recent approach to formally verify TLM peripheral models. To the best of our knowledge, this is the first formal verification case study targeting this important class of VP components. First, we show how to bridge the gap between the industry-accepted modeling pattern for TLM peripheral models and the semantics currently supported by SystemC formal verification approaches. Then, we report verification results for the interrupt controller of the LEON3-based SoCRocket VP used by the European Space Agency and reflect on our experiences and lessons learned in the process.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00End of session
Coffee Break in Exhibition Area

9.6 Real-Time Scheduling

Date: Thursday 17 March 2016
Time: 08:30 - 10:00
Location / Room: Konferenz 4

Chair:
Frank Slomka, Universität Ulm, DE

Co-Chair:
Kai Lampka, Uppsala University, SE

The papers in this session introduce new scheduling algorithms and schedulability analyses for modern real-time systems, including systems with parallel and self-suspending tasks, and memory-constrained systems.

TimeLabelPresentation Title
Authors
08:309.6.1RESPONSE-TIME ANALYSIS OF DAG TASKS UNDER FIXED PRIORITY SCHEDULING WITH LIMITED PREEMPTIONS
Speaker:
Maria A. Serrano, Barcelona Supercomputing Center and Technical University of Catalonia, ES
Authors:
Maria A. Serrano1, Alessandra Melani2, Marko Bertogna3 and Eduardo Quinones4
1Barcelona Supercomputing Center and Technical University of Catalonia, ES; 2Scuola Superiore Sant'Anna, IT; 3University of Modena, IT; 4Barcelona Supercomputing Center, ES
Abstract
Limited preemptive (LP) scheduling has been demonstrated to effectively improve the schedulability of fully preemptive (FP) and fully non-preemptive (FNP) paradigms. On one side, LP reduces the preemption related overheads of FP; on the other side, it restricts the blocking effects of FNP. However, LP has been applied to multi-core scenarios only when completely sequential task systems are considered. This paper extends the current state-of-the-art response time analysis for global fixed priority scheduling with fixed preemption points by deriving a new response time analysis for DAG-based task-sets.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:009.6.2SPEED OPTIMIZATION FOR TASKS WITH TWO RESOURCES
Speaker:
Alessandra Melani, Scuola Superiore Sant'Anna, IT
Authors:
Alessandra Melani1, Renato Mancuso2, Daniel Cullina2, Marco Caccamo2 and Lothar Thiele3
1Scuola Superiore Sant'Anna, IT; 2University of Illinois at Urbana-Champaign, US; 3Swiss Federal Institute of Technology (ETH), CH
Abstract
Multiple resource co-scheduling algorithms and pipelined execution models are becoming increasingly popular, as they better capture the heterogeneous nature of modern architectures. The problem of scheduling tasks composed of multiple stages tied to different resources goes under the name of "flow-shop scheduling". This problem, studied since the '50s to optimize production plants, is known to be NP-hard in the general case. In this paper, we consider a specific instance of the flow-shop task model that captures the behavior of a two-resource (DMA-CPU) system. In this setting, we study the problem of selecting the optimal operating speed of either resource with the goal of minimizing power consumption while meeting schedulability constraints. We derive an algorithm that finds an exact solution to the problem in polynomial time, hence it is suitable for online operation even in the presence of variable real-time workload.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:30 9.6.3 SELF-SUSPENSION REAL-TIME TASKS UNDER FIXED-RELATIVE-DEADLINE FIXED-PRIORITY SCHEDULING
Speaker:
Wen-Hung Huang, TU Dortmund, DE
Authors:
Wen-Hung Huang and Jian-Jia Chen, TU Dortmund, DE
Abstract
Self-suspension is becoming a prominent characteristic in real-time systems such as (i) I/O-intensive systems, (ii) multi-core processors, and (iii) computation offloading systems with coprocessors, like Graphics Processing Units (GPUs). In this work, we study self-suspension systems under the fixed-priority (FP) fixed-relative-deadline (FRD) algorithm by using release enforcement to control self-suspension tasks' behavior. Specifically, we use equal-deadline assignment (EDA) to assign the release phases of computations and suspensions. We provide analysis for deriving the speedup factor of the FP FRD scheduler using suspension-laxity-monotonic (SLM) priority assignment. This is the first positive result to provide bounded speedup factor guarantees for general multi-segment self-suspending task systems.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00 End of session
Coffee Break in Exhibition Area

9.7 Temperature Awareness in Computing Systems

Date: Thursday 17 March 2016
Time: 08:30 - 10:00
Location / Room: Konferenz 5

Chair:
Muhammad Shafique, Karlsruhe Institute of Technology (KIT), DE

Co-Chair:
Marina Zapater, Complutense University of Madrid, ES

This session covers different hardware and software approaches for thermal optimization in computing systems, from hybrid memory cubes to pipelined real-time systems and multiprocessors.

Time Label Presentation Title
Authors
08:30 9.7.1 THERMAL-AWARE DYNAMIC PAGE ALLOCATION POLICY BY FUTURE ACCESS PATTERNS FOR HYBRID MEMORY CUBE (HMC)
Speaker:
Wei-Hen Lo, National Tsing Hua University, TW
Authors:
Wei-Hen Lo, Kai-zen Liang and TingTing Hwang, National Tsing Hua University, TW
Abstract
The Hybrid Memory Cube (HMC) is a promising solution to overcome the memory wall by stacking DRAM chips on top of a logic die and connecting them with dense and fast Through Silicon Vias (TSVs). However, the 3D stacking technique brings another problem: high temperature and temperature variations between the DRAM dies. The thermal problem may lead to chip failure of 3D stacked DRAMs since the temperature may exceed the maximum operating temperature. Dynamic thermal management (DTM) schemes such as bandwidth throttling can effectively decrease the temperature, but at the cost of performance. To maximize the performance of a system with HMC, the memory mapping should consider the thermal characteristics of the HMC, memory interference and bandwidth variations among processes, and the current temperature conditions of each memory channel. This paper proposes a thermal-aware dynamic OS page allocation policy that uses future access patterns to find the best performance-oriented setting of the above factors. An analytical model is proposed to estimate the system performance considering the memory interference, the bandwidth variation, and the throttling impact. Our method improves system performance by 12.7% compared to the best performance-oriented allocation method (MCP) [1]. The average error rate of our analytical model in predicting the trend of performance variations is only 0.86%.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:00 9.7.2 MINIMIZING PEAK TEMPERATURE FOR PIPELINED HARD REAL-TIME SYSTEMS
Speaker:
Long Cheng, Technische Universität München (TUM), DE
Authors:
Long Cheng1, Kai Huang2, Gang Chen1, Biao Hu1 and Alois Knoll1
1Technische Universität München (TUM), DE; 2Sun Yat-sen University, CN
Abstract
This paper addresses the problem of minimizing the peak temperature for pipelined multi-core systems under hard end-to-end deadline constraints by adversely using the Pay-Burst-Only-Once principle. Periodic Thermal Management is adopted to control the temperature, and every core is periodically switched between two power modes. With the peak temperature representation, we first formulate the problem of finding the thermally optimal periodic schemes that satisfy the deadline constraints and then present a fast heuristic algorithm to solve it. Using real-life processor platforms and applications, our simulation demonstrates that our approach reduces the peak temperature by up to 15 °C on the 4-stage ARM platform compared to the sub-deadline partition approach. Moreover, our algorithm is shown to be scalable w.r.t. the number of pipelined stages, and its effectiveness is validated against a brute-force search approach.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:30 9.7.3 THERMAL AWARE SCHEDULING AND MAPPING OF MULTIPHASE APPLICATIONS ONTO CHIP MULTIPROCESSOR
Speaker and Author:
Aryabartta Sahu, IIT Guwahati, IN
Abstract
Thermal hot spots and high temperature gradients degrade the reliability and performance of chip multiprocessors. This is an important issue in today's high-transistor-density chip multiprocessors. In this paper, we explore the benefits of different temperature-aware approaches for scheduling and mapping applications onto a chip multiprocessor to reduce the peak temperature. As the run time of most applications exhibits phase-wise behavior, we exploit the run-time phase-wise power consumption behavior of the applications to schedule and map them onto a multicore chip to reduce the peak temperature. We have evaluated five scheduling approaches (critical path, modified critical path, energy capped critical path, naive load balancing, and task partitioning and scheduling (TPS)) and five mapping approaches (random, greedy, row-col, checker board and boundary fix checker board) for both synthetic data and real benchmarks on an assumed 8 × 8 chip multiprocessor. We take advantage of both (a) optimal scheduling of a tree or chain of unit-time tasks on a multiprocessor using critical path heuristics and (b) the phase-wise behavior of applications. Results show that the greedy mapping approach performs badly compared to simple, low-overhead location-exchange-based approaches (which incur no extra cost for temperature sensing or prediction) when the effect of the temperature of neighboring processors is significant. The boundary fix checker board mapping approach achieves up to 40% reduction in peak temperature compared to the costly greedy mapping approach. Our results also show that critical-path-based scheduling in combination with location-based mapping can significantly reduce the peak temperature of the chip without much increase in execution time when executing phase-wise applications on a chip multiprocessor.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00 IP4-14, 207 FREQUENCY SCHEDULING FOR RESILIENT CHIP MULTI-PROCESSORS OPERATING AT NEAR THRESHOLD VOLTAGE
Speaker:
Huawei Li, Chinese Academy of Sciences, CN
Authors:
Ying Wang, Huawei Li and Xiaowei Li, Chinese Academy of Sciences, CN
Abstract
With the recently proposed redundancy-based core salvaging technology, resilient processors can survive the threat of severe timing violations induced by near-threshold Vdd and function correctly at aggressive clock rates. In our observation, proactively disabling the weakest components that limit the core frequency can still maintain a higher throughput at Near Threshold Voltage (NTV) supply if the cores with defective components are salvaged at a low cost. In this work, a resilience-aware frequency scaling and mapping strategy that considers defective processor states in scheduling is proposed to exploit the fault-tolerant architectures for higher energy efficiency. Our evaluation shows that typical resilient multi-core processors can achieve significantly higher performance per watt than under a conventional scheduling policy.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:01 IP4-15, 324 (Best Paper Award Candidate)
A LOW OVERHEAD ERROR CONFINEMENT METHOD BASED ON APPLICATION STATISTICAL CHARACTERISTICS
Speaker:
Anupam Chattopadhyay, Nanyang Technological University, SG
Authors:
Zheng Wang1, Georgios Karakonstantis2 and Anupam Chattopadhyay3
1RWTH-Aachen University, DE; 2Queen's University, GB; 3Nanyang Technological University, SG
Abstract
Reliability has emerged as a critical design constraint, especially in memories. Designers have spent great effort to guarantee fault-free operation of the underlying silicon by adopting redundancy-based techniques, which essentially try to detect and correct every single error. However, such techniques come at the cost of large area, power and performance overheads, which lead many to doubt their efficiency, especially for error-resilient systems where 100% accuracy is not always required. In this paper, we present an alternative method focusing on the confinement of the resulting output error induced by any reliability issues. Focusing on memory faults, rather than correcting every single error, the proposed method exploits the statistical characteristics of the target application and replaces any erroneous data with the best available estimate of that data. To realize the proposed method, a RISC processor is augmented with custom instructions and special-purpose functional units. We apply the method on the proposed enhanced processor by studying the statistical characteristics of the various algorithms involved in a popular multimedia application. Our experimental results show that, in contrast to state-of-the-art fault tolerance approaches, we are able to reduce runtime and area overhead by 71.3% and 83.3%, respectively.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00 End of session
Coffee Break in Exhibition Area

9.8 Embedded Tutorial: Analog-/Mixed-Signal Verification Methods for AMS Coverage Analysis

Date: Thursday 17 March 2016
Time: 08:30 - 10:00
Location / Room: Exhibition Theatre

Organiser:
Gregor Nitsche, OFFIS, DE

Chair:
Lars Hedrich, Johann Wolfgang Goethe-Universität, DE

Co-Chair:
Christoph Grimm, University of Kaiserslautern, DE

Analog-/Mixed-Signal (AMS) design verification is one of the most challenging and time consuming tasks of today's complex system on chip (SoC) designs. Hence, to optimize time to market while ensuring safety and quality of the design, measuring the verification quality has become crucial in deciding whether the system under consideration is sufficiently tested or verified. Especially in the area of safety-critical design - e.g. automotive hardware and software applications - coverage metrics are commonly used to evaluate the amount of verification effort already invested by comparing the number of analyzed verification or test scenarios with the overall number of scenarios. Due to the finite and discrete nature of digital systems, the overall number can be obtained either from the model of the design (structural coverage) or from its specification (functional coverage). In contrast to digital system design, AMS designers have to deal with a continuous state space of conservative quantities, highly nonlinear relationships, differential equations etc., impeding compositionality and enlarging the number of possible states and behaviors to infinity. In addition to these functional properties, non-functional effects like crosstalk over supply or parasitic coupling have to be investigated in industrial-size designs. Moreover, several levels of abstraction have to be considered, requiring methods for system-level as well as transistor-level circuits. Since digital-domain coverage metrics are not directly applicable to AMS circuits and systems, industrial use cases demand novel coverage-oriented modeling and verification strategies to tackle this challenge, making the quality and quantity of AMS verification measurable. Within this embedded tutorial we present methods and concepts to improve the AMS verification process and to allow for the evaluation of the coverage, proposing different metrics of AMS coverage.

Time Label Presentation Title
Authors
08:30 9.8.0 EMBEDDED TUTORIAL: ANALOG-/MIXED-SIGNAL VERIFICATION METHODS FOR AMS COVERAGE ANALYSIS
Speaker:
Gregor Nitsche, OFFIS - Institute for Information Technology, DE
Authors:
Gregor Nitsche1, Carna Radojicic2 and Georg Gläser3
1OFFIS Institute for Information Technology, DE; 2University of Kaiserslautern, DE; 3IMMS Institut für Mikroelektronik- und Mechatronik-Systeme gemeinnützige GmbH, DE
Abstract
Analog-/Mixed-Signal (AMS) design verification is one of the most challenging and time consuming tasks of today's complex system on chip (SoC) designs. In contrast to digital system design, AMS designers have to deal with a continuous state space of conservative quantities, highly nonlinear relationships, non-functional influences, etc. enlarging the number of possibly critical scenarios to infinity. In this special session we demonstrate the verification of functional properties using simulative and formal methods. We combine different approaches including automated abstraction and refinement of mixed-level models, state-space discretization as well as affine arithmetic. To reach sufficient verification coverage with reasonable time and effort, we use enhanced simulation schemes to avoid conventional simulation drawbacks.

Download Paper (PDF; Only available from the DATE venue WiFi)
08:30 9.8.1 TOWARDS MORE DEPENDABLE VERIFICATION USING SYMBOLIC SIMULATION
Speaker:
Carna Radojicic, University of Kaiserslautern, DE
Authors:
Carna Radojicic1, Christoph Grimm1, Fabian Speicher2 and Stefan Heinen2
1University of Kaiserslautern, DE; 2RWTH Aachen, DE
09:00 9.8.2 IDENTIFICATION OF CRITICAL SCENARIOS IN AMS VERIFICATION: METHODOLOGY FOR FINDING THE SAFE OPERATING AREA OF AMS SYSTEMS
Speaker:
Georg Gläser, IMMS Institut für Mikroelektronik- und Mechatronik-Systeme gemeinnützige GmbH, DE
Authors:
Georg Gläser1, Hyun-Sek Lukas Lee2, Markus Olbrich2, Erich Barke2 and Eckhard Hennig3
1IMMS Institut für Mikroelektronik- und Mechatronik-Systeme gemeinnützige GmbH, DE; 2Leibniz Universität Hannover, DE; 3Reutlingen University, DE
09:30 9.8.3 AMS LEAF-COMPONENT CHARACTERIZATION WITH CONTRACTS AND SATISFACTION CHECKING VS. ELECTRONIC CIRCUIT SCHEMATICS
Speaker:
Gregor Nitsche, OFFIS - Institute for Information Technology, DE
Authors:
Gregor Nitsche1, Andreas Fürtig2, Lars Hedrich2 and Wolfgang Nebel3
1OFFIS Institute for Information Technology, DE; 2Goethe University, DE; 3University of Oldenburg and OFFIS, DE
10:00 End of session
Coffee Break in Exhibition Area

IP4 Interactive Presentations

Date: Thursday 17 March 2016
Time: 10:00 - 10:30
Location / Room: Conference Level, Foyer

Interactive Presentations run simultaneously during a 30-minute slot. A poster associated with each IP paper is on display throughout the afternoon. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session, prior to the actual Interactive Presentation. Moreover, one "Best Interactive Presentation Award" will be given.

Label Presentation Title
Authors
IP4-1 A Q-GRAM BIRTHMARKING APPROACH TO PREDICTING REUSABLE HARDWARE
Speaker:
Kevin Zeng, Virginia Tech, US
Authors:
Kevin Zeng and Peter Athanas, Virginia Tech, US
Abstract
Designer productivity is a growing concern as overall hardware complexity rises. Design reuse, a key component in productivity, is underutilized. Not only can existing designs be reused, but so can the patterns and information contained within them. With the increase in the number of circuits available, there is a need to analyze and retrieve designs with ease in order to accelerate design entry. In this paper, a birthmarking approach using q-grams is presented. Using this technique, design patterns in existing circuits can be captured and used to suggest not only similar and reusable designs, but also functional blocks throughout the design phase, with little to no effort from the user. Preliminary experiments and case studies of the q-gram birthmarking technique were performed on over 250 circuits from various sources in order to show the feasibility of the proposed methods.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-2 CAPTOPRIL: REDUCING THE PRESSURE OF BIT FLIPS ON HOT LOCATIONS IN NON-VOLATILE MAIN MEMORIES
Speaker:
Majid Jalili, Sharif University of Technology, IR
Authors:
Majid Jalili and Hamid Sarbazi-Azad, Sharif University of Technology, IR
Abstract
High static power consumption and insufficient scalability of the commonly used DRAM main memory technology appear to be tough challenges in upcoming years. Hence, adopting new technologies, i.e. non-volatile memories (NVMs), is a proper choice. NVMs tolerate a low number of write operations while having good scalability and low static power consumption. Due to the non-destructive nature of a read operation and the long latency of a write operation in NVMs, designers use a read-before-write (RBW) mechanism to mask the unchanged bits during a write operation in order to reduce bit flips. Based on the observation that some specific locations of blocks are responsible for the majority of bit flips, we extend RBW to further reduce the number of bit flips per write in the memory system. The results taken from full-system simulations reveal that our proposal, called Captopril, can reduce the number of bit flips by 21% and 9%, on average, compared to the baseline and state-of-the-art designs, respectively.

Download Paper (PDF; Only available from the DATE venue WiFi)
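
As a rough illustration of the read-before-write idea mentioned in the Captopril abstract above, the following Python sketch compares the stored word with the incoming word and counts only the bits that differ. The function name `rbw_write`, the word width and the sample values are hypothetical and chosen purely for illustration; this is not the authors' implementation, and it does not model Captopril's handling of hot locations.

```python
def rbw_write(old: int, new: int, width: int = 64):
    """Sketch of a read-before-write update (illustrative assumption).

    Returns (write_mask, flips): write_mask has a 1 wherever the stored
    bit differs from the incoming bit, and flips is the number of such bits.
    Bits outside the mask are left untouched by the write.
    """
    limit = (1 << width) - 1
    write_mask = (old ^ new) & limit      # XOR marks the bits that must change
    flips = bin(write_mask).count("1")    # population count of changed bits
    return write_mask, flips


# Hypothetical 8-bit example: only two bit positions differ.
old_word = 0b10110010
new_word = 0b10010110
mask, flips = rbw_write(old_word, new_word, width=8)
print(f"write mask = {mask:08b}, bit flips = {flips}")
```

Without the read step, all eight cells of this word would be rewritten; with it, only the cells selected by the mask are programmed.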
IP4-3 HANDLING COMPLEX DEPENDENCIES IN SYSTEM DESIGN
Speaker:
Mischa Möstl, Technische Universität Braunschweig, DE
Authors:
Mischa Möstl and Rolf Ernst, Technische Universität Braunschweig, DE
Abstract
In this paper we describe a novel strategy to reveal and handle complex dependencies in incremental and distributed design processes, even under the ubiquitous presence of uncertainties concerning model and design. We demonstrate in a case study how to handle epistemic design uncertainty in an iterative process and show how dependency paths can be selectively excluded under certain concerns, such as timing, by including third-party analysis results based on the used models in the dependency analysis. Since the implementation of our approach relies on modern graph analysis libraries, it can scale to realistic problem instances.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-4 A SYNTHESIS-AGNOSTIC BEHAVIORAL FAULT MODEL FOR HIGH GATE-LEVEL FAULT COVERAGE
Speaker:
Anton Karputkin, Tallinn University of Technology, EE
Authors:
Anton Karputkin and Jaan Raik, Tallinn University of Technology, EE
Abstract
Early design space exploration is a practice for avoiding issues that manifest themselves at late design phases. Nevertheless, test development has traditionally been postponed to the final stages of the design process. At the same time, more and more IP designs are sold at the RTL, where details of the exact gate-level implementation are unknown. While a range of RTL ATPG methods has been developed over the past decades, the fault models are too inaccurate to guarantee full coverage of the gate-level faults. This paper fills the gap by proposing a synthesis-agnostic ATPG based on extending behavioral fault models in order to allow targeting stuck-at faults in the gate-level implementations of RTL designs regardless of the synthesis decisions made. Moreover, the approach does not require adding scan paths and therefore the obtained test sequences serve as at-speed, functional mode tests. Experiments on a set of benchmarks and an industrial design show that the proposed fault models are superior to the previous approaches in terms of stuck-at fault coverage. Comparison with a state-of-the-art gate-level sequential ATPG shows higher or equal coverage for the proposed technique, achieved at shorter runtimes.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-6 (Best Paper Award Candidate)
COMBINING GRAPH-BASED GUIDANCE AND ERROR EFFECT SIMULATION FOR EFFICIENT SAFETY ANALYSIS
Speaker:
Jo Laufenberg, Universität Tübingen, DE
Authors:
Jo Laufenberg1, Sebastian Reiter2, Alexander Viehl2, Thomas Kropf1, Wolfgang Rosenstiel1 and Oliver Bringmann1
1Universität Tübingen, DE; 2FZI Forschungszentrum Informatik, DE
Abstract
The increasing number of complex embedded systems used in safety-relevant tasks poses a major challenge in the field of safety analysis. This paper presents a simulation-based safety analysis that will overcome the challenges resulting from this development. The presented approach consists of two parts: an Error Effect Simulation (EES) and a graph-based specification. The EES is composed of a system simulation with fault injection capability and a generic fault specification. The graph-based specification approach systematically guides the EES and enables a very efficient exploration of the analysis space. Inherent in the graph-based specification is the documentation of the safety analysis and a coverage approach to assess the executed safety analysis. Combining these parts leads to an efficient and automatable framework for safety analysis. A use case of an interconnected electronic control system shows the application of the approach and highlights the benefits for a safety analysis, for example a failure mode and effect analysis.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-7 PACKET SECURITY WITH PATH SENSITIZATION FOR NOCS
Speaker:
Travis Boraten, Ohio University, US
Authors:
Travis Boraten and Avinash Kodi, Ohio University, US
Abstract
Hardware security is becoming a major concern as integrated circuits (ICs) are exponentially growing thanks to technology scaling. With ICs reaching upwards of billions of transistors, detecting hardware trojans (HTs) is like finding a needle in a haystack. Therefore, it becomes imperative to protect critical computing infrastructure from malicious attackers attempting to unearth vital information. Security enhancements should offer resiliency to limit their impact on overall chip performance, as HTs are likely to slip through detection mechanisms. In this paper, we propose packet-security (P-Sec), a packet validation technique to protect compromised network-on-chip (NoC) architectures from fault injection side channel attacks and covert HT communication by merging two robust error detection schemes, namely algebraic manipulation detection (AMD) and cyclic redundancy check (CRC) codes. With P-Sec, applications containing sensitive and encrypted data can be protected from an ideal attacker using AMD codes at the cost of marginal area and power overhead in the network interface but with enhanced security on demand.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-8 SYNTHESIS OF APPROXIMATE CODERS FOR ON-CHIP INTERCONNECTS USING REVERSIBLE LOGIC
Speaker:
Robert Wille, Johannes Kepler University Linz, AT
Authors:
Robert Wille1, Oliver Keszocze2, Stefan Hillmich2, Marcel Walter2 and Alberto Garcia-Ortiz3
1Johannes Kepler University Linz, AT; 2University of Bremen, DE; 3ITEM (U.Bremen), DE
Abstract
On-chip coding provides a remarkable potential to improve the energy efficiency of on-chip interconnects. However, the logic design of the encoder/decoder faces a main challenge: the area and power overhead should be minimal while, at the same time, decodability has to be guaranteed. To address these problems, we propose the concept of approximate coding, where the coding function is partially specified and the synthesis algorithm has a higher flexibility to simplify the circuit. Since conventional synthesis methods are unsuitable here, we propose an alternative synthesis approach based on reversible logic. Experimental evaluations confirm the benefits of both the proposed concept of approximate coding and the proposed design method.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-9 DESIGN-SYNTHESIS CO-OPTIMISATION USING SKEWED AND TAPERED GATES
Speaker:
Ankur Shukla, India Systems Development Lab, IBM India, IN
Authors:
Ayan Datta1, James D. Warnock2, Ankur Shukla1, Saurabh Gupta1, Yiu Hing Chan2, Karthik Mohan1 and Charudhattan Nagarajan1
1India Systems Development Lab, IBM India, IN; 2IBM US, US
Abstract
This paper presents a novel technique to optimize the design of non-conventional tapered and skewed standard cell gates, and the synthesis algorithms for efficient usage of such gates in IBM's high-performance 22nm CMOS SOI technology. The focus is on design considerations to ensure that synthesis can use these gates efficiently, leveraging the resulting timing improvements for faster timing closure of high-performance microprocessor designs. A detailed analysis is presented, in which the optimal point of insertion is identified by exposing these gates to synthesis at different points in the process. An efficient algorithm is also proposed to handle decisions regarding the conversion of conventional gates to non-conventional gates, taking into account multiple factors including delay and slew. Results show a 25-30% improvement in total negative slack of designs and a 20-25% reduction in the total number of negative paths, without any major impact on the total power of the designs.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-10 (Best Paper Award Candidate)
A SYNTHESIS-PARAMETER TUNING SYSTEM FOR AUTONOMOUS DESIGN-SPACE EXPLORATION
Speaker:
Matthew Ziegler, IBM T. J. Watson Research Center, US
Authors:
Matthew Ziegler1, Hung-Yi Liu2, George Gristede1, Bruce Owens3, Ricardo Nigaglioni3 and Luca Carloni2
1IBM T. J. Watson Research Center, US; 2Columbia University, US; 3IBM Systems and Technology Group, US
Abstract
Advanced logic and physical synthesis tools provide a vast number of tunable parameters that can significantly impact physical design quality, but the complexity of the parameter design space requires intelligent search algorithms. To fully utilize the optimization potential of these tools, we propose SynTunSys, a system that adds a new level of abstraction between designers and design tools for managing the design space exploration process. SynTunSys takes control of the synthesis-parameter tuning process, i.e., job submission, results analysis, and next-step decision making, by automating a key portion of a human designer's decision process. We present the overall organization of SynTunSys, describe its main components, and provide results from employing it for the design of an industrial chip, the IBM z13 22nm high-performance server chip. During this major design, SynTunSys provided significant savings in human design effort and achieved a quality of results beyond what human designers alone could achieve, yielding on average a 36% improvement in total negative slack and a 7% power reduction.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-11 UNBOUNDED SAFETY VERIFICATION FOR HARDWARE USING SOFTWARE ANALYZERS
Speaker:
Rajdeep Mukherjee, University of Oxford, GB
Authors:
Rajdeep Mukherjee, Peter Schrammel, Daniel Kroening and Tom Melham, University of Oxford, GB
Abstract
Demand for scalable hardware verification is ever increasing. We propose an unbounded safety verification framework for hardware, at the heart of which is a software verifier. To this end, we synthesize Verilog at register transfer level into a software-netlist, represented as a word-level ANSI-C program. The proposed tool flow allows us to leverage the precision and scalability of state-of-the-art software verification techniques. In particular, we evaluate unbounded proof techniques, such as predicate abstraction, k-induction, interpolation, and IC3/PDR; and we compare the performance of verification tools from the hardware and software domains that use these techniques. To the best of our knowledge, this is the first attempt to perform unbounded verification of hardware using software analyzers.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-12 VERILOG2SMV: A TOOL FOR WORD-LEVEL VERIFICATION
Speaker:
Ahmed Irfan, Fondazione Bruno Kessler and University of Trento, IT
Authors:
Ahmed Irfan1, Alessandro Cimatti2, Alberto Griggio2, Marco Roveri2 and Roberto Sebastiani3
1Fondazione Bruno Kessler and University of Trento, IT; 2Fondazione Bruno Kessler, IT; 3University of Trento, IT
Abstract
Verification is an essential step of the hardware design lifecycle. Usually verification is done at the gate level (Boolean level). We present verilog2smv, a tool that generates word-level model checking problems from Verilog designs augmented with assertions. A key aspect of our tool is that memories in the designs are treated without any form of abstraction. verilog2smv can be used for RTL verification by chaining it with a word-level model checker like nuXmv. To this end, we also present some experimental results on Verilog verification benchmarks, using verilog2smv + nuXmv as a tool-chain.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-13 TOWARDS FORMAL VERIFICATION OF REAL-WORLD SYSTEMC TLM PERIPHERAL MODELS - A CASE STUDY
Speaker:
Vladimir Herdt, University of Bremen, DE
Authors:
Hoang M. Le1, Vladimir Herdt1, Daniel Grosse1 and Rolf Drechsler2
1University of Bremen, DE; 2University of Bremen and DFKI, DE
Abstract
SystemC-based Virtual Prototypes (VPs) serve as reference models for various activities in the modern design flow; therefore, the functional correctness of each individual component and of the VPs as a whole should be subjected to rigorous formal verification. In the last few years, notable progress on SystemC formal verification has been made. This paper presents a case study on applying a recent approach to formally verify TLM peripheral models. To the best of our knowledge, this is the first formal verification case study targeting this important class of VP components. First, we show how to bridge the gap between the industry-accepted modeling pattern for TLM peripheral models and the semantics currently supported by SystemC formal verification approaches. Then, we report verification results for the interrupt controller of the LEON3-based SoCRocket VP used by the European Space Agency and reflect on our experiences and lessons learned in the process.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-14 FREQUENCY SCHEDULING FOR RESILIENT CHIP MULTI-PROCESSORS OPERATING AT NEAR THRESHOLD VOLTAGE
Speaker:
Huawei Li, Chinese Academy of Sciences, CN
Authors:
Ying Wang, Huawei Li and Xiaowei Li, Chinese Academy of Sciences, CN
Abstract
With the recently proposed redundancy-based core salvaging technology, resilient processors can survive the threat of severe timing violations induced by near-threshold Vdd and function correctly at aggressive clock rates. In our observation, proactively disabling the weakest components that limit the core frequency can still maintain a higher throughput at Near Threshold Voltage (NTV) supply if the cores with defective components are salvaged at a low cost. In this work, a resilience-aware frequency scaling and mapping strategy that considers defective processor states in scheduling is proposed to exploit the fault-tolerant architectures for higher energy efficiency. Our evaluation shows that typical resilient multi-core processors can achieve significantly higher performance per watt than under a conventional scheduling policy.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-15 (Best Paper Award Candidate)
A LOW OVERHEAD ERROR CONFINEMENT METHOD BASED ON APPLICATION STATISTICAL CHARACTERISTICS
Speaker:
Anupam Chattopadhyay, Nanyang Technological University, SG
Authors:
Zheng Wang1, Georgios Karakonstantis2 and Anupam Chattopadhyay3
1RWTH-Aachen University, DE; 2Queen's University, GB; 3Nanyang Technological University, SG
Abstract
Reliability has emerged as a critical design constraint, especially in memories. Designers have spent great effort to guarantee fault-free operation of the underlying silicon by adopting redundancy-based techniques, which essentially try to detect and correct every single error. However, such techniques come at the cost of large area, power and performance overheads, which lead many to doubt their efficiency, especially for error-resilient systems where 100% accuracy is not always required. In this paper, we present an alternative method focusing on the confinement of the resulting output error induced by any reliability issues. Focusing on memory faults, rather than correcting every single error, the proposed method exploits the statistical characteristics of the target application and replaces any erroneous data with the best available estimate of that data. To realize the proposed method, a RISC processor is augmented with custom instructions and special-purpose functional units. We apply the method on the proposed enhanced processor by studying the statistical characteristics of the various algorithms involved in a popular multimedia application. Our experimental results show that, in contrast to state-of-the-art fault tolerance approaches, we are able to reduce runtime and area overhead by 71.3% and 83.3%, respectively.

Download Paper (PDF; Only available from the DATE venue WiFi)

UB09 Session 9

Date: Thursday 17 March 2016
Time: 10:00 - 12:00
Location / Room: Booth 15, Exhibition Area

Label Presentation Title
Authors
UB09.1 AHLS_DESYNC: DESYNCHRONIZATION TOOL FOR HIGH-LEVEL SYNTHESIS OF ASYNCHRONOUS CIRCUITS
Presenter:
Jean Simatic, TIMA Laboratory, FR
Authors:
Jean Simatic, Rodrigo Possamai Bastos and Laurent Fesquet, TIMA Laboratory, FR
Abstract
We present a tool for the high-level synthesis (HLS) of event-driven (asynchronous) circuits. Our approach first uses an existing HLS tool, AUGH, to generate a synchronous finite state machine (FSM) and a data-path. Then, the presented tool desynchronizes solely the FSM in 5 steps: 1. Parse the FSM to build a state graph containing the control signal assignments. 2. Separate multiplexer control and register control signals by analyzing the data-path. 3. Generate an event-driven FSM netlist by mapping the state graph onto a dedicated set of asynchronous controllers. 4. Synthesize the data-path with a commercial synthesis tool (Design Compiler). 5. Estimate the delays in the data-path with a static timing analysis tool (PrimeTime) and insert delays in the controller accordingly. Our demonstration will exhibit two testbenches: a GCD algorithm to expose the basic concepts and a non-uniform sampling FIR filter more representative of real-life applications.

Download Paper (PDF)
UB09.2 D-VASIM: TIMING ANALYSIS OF GENETIC LOGIC CIRCUITS USING D-VASIM
Presenter:
Hasan Baig, Technical University of Denmark, DK
Authors:
Hasan Baig and Jan Madsen, Technical University of Denmark, DK
Abstract
A genetic logic circuit is a gene regulatory network implemented by re-engineering the DNA of a cell in order to control gene expression or metabolic pathways through a logic combination of external signals, such as chemicals or proteins. As with electronic logic circuits, timing and propagation delay analysis may play a very significant role in the design of genetic logic circuits. In this demonstration, we present the capability of D-VASim (Dynamic Virtual Analyzer and Simulator) to perform timing and propagation delay analysis of single as well as cascaded genetic logic circuits. D-VASim allows the user to change the circuit parameters during runtime simulation to observe their effects on the circuit's timing behavior. The results obtained from D-VASim can be used not only to characterize the timing behavior of genetic logic circuits but also to analyze the timing constraints of cascaded genetic logic circuits.

Download Paper (PDF)
UB09.3 IN-NODE PROCESSING: MODELLING FRAMEWORK FOR IN-NODE PROCESSING IN INDUSTRIAL SENSOR AND ACTUATOR NETWORKS
Presenter:
Qaiser Anwar, Mid Sweden University, SE
Authors:
Qaiser Anwar, Muhammad Imran and Mattias O'Nils, Mid Sweden University, SE
Abstract
Architecting efficient systems with on-board sensing capabilities and a growing number of sensing devices is a challenging task, in particular because of the breadth of the technological field as well as the diversity and complexity of the requirements. We present a novel modelling framework that can describe different implementation strategies for computing data locally. In this framework, we first describe the system in the Architecture Analysis and Design Language (AADL); the described system is then exported to XML, which is given as input to a Java-based software program. This program automatically generates different implementation options and illustrates parameters such as processing energy, communication energy, latency and design complexity. As a proof of concept, we have modelled a real-life system in the framework, which shows that the framework can be of use in automated design space and architecture exploration for in-node processing.

Download Paper (PDF)
UB09.4 FORMAL VERIFICATION OF CLOCK DOMAIN CROSSING USING GATE-LEVEL MODELS OF METASTABLE FLIP-FLOPS
Presenter:
Ghaith Tarawneh, Newcastle University, GB
Authors:
Ghaith Tarawneh, Andrey Mokhov and Alex Yakovlev, Newcastle University, GB
Abstract
We present a first prototype of a gate-level tool that enables simple and intuitive verification of multi-clock designs. The tool's underlying methodology (described in the paper "Formal Verification of Clock Domain Crossing using Gate-level Models of Metastable Flip-Flops", to be presented at the conference) relies on transforming gate-level netlists so that they can reproduce problematic CDC behaviour digitally. Processed netlists can then be passed to formal verification tools to identify and debug CDC faults. The tool is at an early development stage but consists of a functional Verilog parser and CDC transformation functions that can be invoked from the command line. The demo will showcase the tool using simple sender-receiver circuits. Synthesized netlists will be processed by the tool and then fed to a formal verification tool to identify CDC issues (e.g. missing synchronizers, path convergence). Verification output from source and processed netlists will be compared.

Download Paper (PDF)
UB09.5 LISA: ENABLING LAYERED INTEROPERABILITY FOR INTERNET OF THINGS THROUGH LISA
Presenter:
Behailu Shiferaw Negash, University of Turku, FI
Authors:
Behailu Shiferaw Negash1, Amir-Mohammad Rahmani1, Tomi Westerlund1, Pasi Liljeberg1 and Hannu Tenhunen2
1University of Turku, FI; 2University of Turku, FI and Royal Institute of Technology (KTH), SE
Abstract
There are high expectations of the changes that will come with the implementation of the Internet of Things (IoT). However, this vision is limited by the heterogeneous nature of IoT devices, which has led to vertical application silos that are incapable of working together. To ease this problem of heterogeneity, we have developed a lightweight interoperability framework, LISA, which hides variations in communication technology and data formats and provides a uniform API for programmers. LISA is inspired by Network on Terminal Architecture (NoTA), an open framework from Nokia Research Center. A few frameworks for IoT interoperability exist; however, they fail to address the resource limitations of the majority of IoT devices. To the best of our knowledge, LISA is the first framework designed for resource-constrained devices. This demonstration shows LISA in action, helping heterogeneous devices interoperate through a gateway in the fog layer between the devices and the cloud.

Download Paper (PDF)
UB09.6 AUTOMATED REFINEMENT OF ANALOG/MIXED-SIGNAL SYSTEMC MODELS BY NON-FUNCTIONAL EFFECTS
Presenter:
Georg Gläser, IMMS, DE
Authors:
Georg Gläser1, Hyun-Sek Lukas Lee2, Eckhard Hennig3, Markus Olbrich2 and Erich Barke2
1IMMS, DE; 2Leibniz Universität Hannover, DE; 3Reutlingen University, DE
Abstract
Virtual prototyping of analog/mixed-signal (A/MS) systems is a key concern in the modern design process. The main challenge is performing the verification of functional properties with respect to non-functional effects, e.g. signal and power integrity. System architects are challenged by identifying critical scenarios where these effects possibly degrade or even destroy the system's functionality. We demonstrate a method to automatically extend an existing functional model by non-functional effects. Combined with an accelerated, piecewise-linear (PWL) simulation scheme (PRAISE), we explore the resulting system acceptance regions and identify critical scenarios.

Download Paper (PDF)
UB09.7 A CIRCUIT EXTRACTION TOOL FOR FULL CUSTOM DESIGNED MEMS SENSORS
Presenter:
Axel Hald, Robert Bosch GmbH, DE
Authors:
Axel Hald1, Johannes Seelhorst1, Mathias Reimann1, Juergen Scheible2 and Jens Lienig3
1Robert Bosch GmbH, DE; 2Reutlingen University, DE; 3Technische Universität Dresden, DE
Abstract
In contrast to IC design, MEMS design still lacks sophisticated component libraries. Therefore, the physical design of today's MEMS sensors is mostly done by simply drawing polygons. Hence, the sensor structure is only given as plain graphic data which hinders the identification and investigation of topology elements. The growing complexity of future MEMS designs demands a deep and detailed analysis of the sensor structures and the topology elements in order to get a better understanding of the coupling capacitances and parasitics. Our tool is able to extract a circuit out of a MEMS sensor designed in a polygon based design flow. The key feature of this tool is a rule based structure recognition algorithm which identifies the topology elements of the sensor. Thereafter, the electrostatic RC-extraction is performed by a commercial field solver. The extracted lumped elements can be used for further simulation and optimization tasks during the design phase.

Download Paper (PDF)
UB09.8 PSMGEN: AUTOMATIC GENERATION OF POWER STATE MACHINES
Presenter:
Alessandro Danese, University of Verona, IT
Authors:
Alessandro Danese1, Graziano Pravadelli1 and Daniel Lorenz2
1University of Verona, IT; 2OFFIS - Institute for Information Technology, DE
Abstract
Power State Machines (PSMs) are a well-known approach to model and simulate the time-based energy consumption of IP cores for early virtual prototyping of SoCs. However, most works either assume the presence of PSMs or define them manually, starting from a more or less precise knowledge of the functional blocks composing the target IP. To allow a tighter definition of PSMs, we present PSMGen, a tool implementing an automatic methodology for PSM generation and an efficient statistical approach for their simulation. The tool requires as input a set of functional traces exposing the IP's behaviours and the corresponding set of power traces over time that represent the golden model of the IP's energy consumption. It then generates the PSM's states and transitions through a mining procedure that extracts the IP behaviours from the functional traces, analyses power changes on the power traces and annotates each PSM state with the corresponding power characterization.

Download Paper (PDF)
UB09.9 6CH-SDR-PLATFORM: 6 CHANNEL SDR PROTOTYPING PLATFORM FOR VEHICLE SELF-LOCALIZATION
Presenter:
Marko Rößler, Technische Universität Chemnitz, DE
Authors:
Marko Rößler1, Ulrich Heinkel1, Daniel Fross1 and Ahmad El-Assaad2
1Technische Universität Chemnitz, DE; 2Novero GmbH, DE
Abstract
Many modern applications depend on location information. Its precision and availability, both outdoors and indoors, are becoming increasingly crucial. Acquiring this information from radio links used for wireless data transfer is a logical step. Link availability, RSSI, timing or phase shifts are byproducts that carry knowledge about the distance between communication endpoints. Extensive signal processing, advanced receiver setups and statistical algorithms allow the extraction of reliable position information. We present a high-performance multichannel SDR platform based on an FPGA that allows quick development of the respective technology parts. It is based on a KC705 board connected to a Linux PC via PCIe. Featuring three RF frontends (AD-FMCOMM-S3), it lets us control six independent paths time-synchronously. With 50 MSa/s at 12 bit resolution, a data stream of 7.2 Gbit/s can be processed. We target radio-frequency-based vehicle self-localization using smart array antennas.
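
The quoted 7.2 Gbit/s can be reproduced with a short calculation. The sample rate, resolution and number of paths come from the abstract; the factor of two for complex baseband samples (one I and one Q word per sample) is an assumption, since the abstract does not say how the figure was obtained.

```python
sample_rate = 50e6      # samples per second per path (from the abstract)
bits_per_sample = 12    # resolution in bits (from the abstract)
paths = 6               # independent paths (from the abstract)
components = 2          # assumption: complex samples, I and Q words per sample

# Aggregate raw data rate over all paths
rate_bits_per_s = sample_rate * bits_per_sample * paths * components
rate_gbit_per_s = rate_bits_per_s / 1e9
print(f"{rate_gbit_per_s:.1f} Gbit/s")
```

Without the assumed I/Q factor, the same numbers would give 3.6 Gbit/s.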

Download Paper (PDF)
UB09.10 UCAF TOOL: AN OPTIMIZATION-BASED DESIGN METHODOLOGY FOR ULTRA-LOW VOLTAGE ANALOG INTEGRATED CIRCUITS
Presenter:
Lucas Severo, Federal University of Pampa, BR
Authors:
Lucas Severo1 and Wilhelmus Noije2
1Federal University of Pampa, BR; 2University of São Paulo, BR
Abstract
This work presents an Ultra-Low Voltage (ULV) analog integrated circuit design methodology. The methodology sizes analog circuits by exploring the design space with the Simulated Annealing optimization heuristic and an electrical simulator for estimating the specifications. This exploration includes the analysis of Process, Voltage and Temperature variations in order to reduce the effect of these variations on the circuit specifications. The methodology implementation is optimized for ULV circuits and has several testbenches, making it possible to design a large number of circuit topologies. Parallel simulations are used to decrease the execution time. As an application of this methodology, a 0.6 V fully differential Operational Transconductance Amplifier (OTA) is designed. In a second step, using a bottom-up approach, an active low-pass filter is designed using the previously designed OTA. The filter results are in accordance with the IEEE 802.15.4 standard requirements.
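
A generic Simulated Annealing loop of the kind used for such design space exploration might look like the sketch below. It is not the UCAF tool: the toy cost function stands in for the electrical simulator, and all names, parameters and the cooling schedule are hypothetical choices for illustration.

```python
import math
import random

def simulated_annealing(cost, start, step, iters=2000, t0=1.0, alpha=0.995, seed=0):
    """Minimise cost(x) over a list of parameters; returns (best_x, best_cost)."""
    rng = random.Random(seed)
    x, c = list(start), cost(start)
    best_x, best_c = list(x), c
    t = t0
    for _ in range(iters):
        # Perturb one randomly chosen parameter
        cand = list(x)
        i = rng.randrange(len(cand))
        cand[i] += rng.uniform(-step, step)
        cc = cost(cand)
        # Always accept improvements; accept worse moves with Boltzmann probability
        if cc <= c or rng.random() < math.exp((c - cc) / t):
            x, c = cand, cc
            if c < best_c:
                best_x, best_c = list(x), c
        t *= alpha  # geometric cooling schedule
    return best_x, best_c

# Toy stand-in for the electrical simulator (hypothetical): squared distance of
# two "device sizes" from a target, instead of simulated circuit specifications
def toy_cost(p):
    return (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2

start = [0.0, 0.0]
best, best_cost = simulated_annealing(toy_cost, start, step=0.5)
```

In a real sizing flow, each call to the cost function would launch one or more circuit simulations, which is why the abstract mentions running simulations in parallel.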

Download Paper (PDF)
12:00 End of session
12:30 Lunch Break in Großer Saal + Saal 1
Keynote Lecture in "Saal 2" 13:30 - 14:00

10.1 SPECIAL DAY Hot Topic: Lightweight Security for Embedded Processors

Date: Thursday 17 March 2016
Time: 11:00 - 12:30
Location / Room: Saal 2

Chair:
Tilo Müller, Friedrich-Alexander-Universität Erlangen-Nürnberg, DE

Co-Chair:
Patrick Schaumont, Virginia Tech, US

Past research has shown that SW-only solutions cannot provide guarantees about SW security. A minimum HW root of trust is required. In the embedded context, the research challenge is to find and demonstrate the 'minimum' root of trust. The first two papers search for these minimum requirements: "Scaling Down: Lightweight Approaches to IoT Security" and "SOFIA: Software and Control Flow Integrity Architecture." The third paper addresses the fundamental question of how to build and verify trust in embedded devices.

Time Label Presentation Title
Authors
11:00 10.1.1 SCALING DOWN: LIGHTWEIGHT APPROACHES TO IOT SECURITY
Speaker and Author:
Matthias Schunter, Intel Labs, DE
Abstract
The Internet of Things promises to grow to billions of connected devices over the next decade. From the perspective of security, the IoT presents unique challenges which, if left unmitigated, have the potential to significantly hinder its growth and widespread adoption. In particular, traditional security solutions do not scale down to typical IoT endpoints, which are highly constrained in terms of power, performance and cost. Research conducted at the Intel Collaborative Research Institute for Secure Computing in Darmstadt, Germany is focused on enabling capabilities such as Trusted Execution and Control-flow Integrity Enforcement at the IoT endpoint level. We will survey this work and show how architectural support for security offers significant advantages over current software solutions in terms of efficiency and security. The design of proof-of-concept implementations on an Intel research architecture will be described, together with an evaluation of the solutions from a performance, resource usage and security perspective.
11:30 10.1.2 SOFIA: SOFTWARE AND CONTROL FLOW INTEGRITY ARCHITECTURE
Speaker:
Ruan de Clercq, Katholieke Universiteit Leuven, BE
Authors:
Ruan de Clercq1, Ronald De Keulenaer2, Bart Coppens2, Bohan Yang1, Pieter Maene1, Koen De Bosschere2, Bart Preneel1, Bjorn De Sutter2 and Ingrid Verbauwhede1
1Katholieke Universiteit Leuven, BE; 2Ghent University, BE
Abstract
Microprocessors used in safety-critical systems are extremely sensitive to software vulnerabilities, as their failure can lead to injury, damage to equipment, or environmental catastrophe. This paper proposes a hardware-based security architecture for microprocessors used in safety-critical systems. The proposed architecture provides protection against code injection and code reuse attacks. It has mechanisms to protect software integrity, perform control flow integrity, prevent execution of tampered code, and enforce copyright protection. We are the first to propose a mechanism to enforce control flow integrity at the finest possible granularity. The proposed architectural features were added to the LEON3 open source soft microprocessor and evaluated on an FPGA running a software benchmark. The results show a 28.2% larger hardware area and an 84.6% slower clock, while the software benchmark has a cycle overhead of 13.7% and a total execution time overhead of 110% when compared to an unmodified processor.
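
The reported total execution time overhead is consistent with the other two timing figures if "84.6% slower clock" is read as an 84.6% longer clock period. That reading is an assumption; the short check below only illustrates how the numbers combine.

```python
clock_period_increase = 0.846  # assumption: the clock period is 84.6% longer
cycle_overhead = 0.137         # extra clock cycles (from the abstract)

# Execution time = cycle count x clock period, so the two factors multiply
time_factor = (1 + cycle_overhead) * (1 + clock_period_increase)
time_overhead = time_factor - 1
print(f"total execution time overhead: {time_overhead:.1%}")
```

Under this reading the product comes out within a fraction of a percentage point of the 110% stated in the abstract.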

Download Paper (PDF; Only available from the DATE venue WiFi)
12:00 10.1.3 TRUST, BUT VERIFY: WHY AND HOW TO ESTABLISH TRUST IN EMBEDDED DEVICES
Speaker and Author:
Aurélien Francillon, EURECOM, FR
Abstract
A lot of research effort has been put into constructing secure systems. However, experience has shown that, while many products have a good level of security, others are really insecure. Some are security devices, with security at the core of their purpose, while others are not. We nevertheless often rely on their security in our daily lives, and their failure can have serious consequences. In this paper, we discuss why we are in this situation and what we can do to improve it. In particular, we defend the thesis that more transparency and more openness in embedded systems hardware and software will foster a more secure ecosystem. First, there is an economic problem. Besides being difficult to get right, security is most of the time expensive. Second, trust is something that is not blindly granted but earned through verification. Currently, trusted computing mechanisms often rely on unconditional trust in the system's manufacturer. However, users have too few ways to verify that the systems are trustworthy other than blindly trusting the manufacturer. We should design systems where the users, i.e., the device owners, can decide whom and what to trust. We call this Design For User Trust, where users are in control of the system. Finally, one can only trust a system fully if one can inspect it. Unfortunately, the first security measures that are implemented in embedded systems often prevent such independent analysis (e.g., deactivation of a debug port, secure boot, encrypted file system, obfuscation). Such measures hide the problems (making it difficult to discover software vulnerabilities) more than they solve them. They are often useful in securing a system (slowing down an attacker) but should not jeopardize our ability to analyze it. We call this Design For Security Testing.
We conclude that more research is needed to make it easier to build secure systems, in particular, in the areas of concrete architectures for Design For User Trust and Design For Security Testing.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30 End of session
Lunch Break in Großer Saal + Saal 1
Keynote Lecture in "Saal 2" 13:30 - 14:00

10.2 Does it Work or NoC?

Date: Thursday 17 March 2016
Time: 11:00 - 12:30
Location / Room: Konferenz 6

Chair:
Davide Bertozzi, University of Ferrara, IT

Co-Chair:
Kees Goossens, Eindhoven University of Technology, NL

Reliable operation of NoCs is crucial for the correct operation of the complete system. Errors can occur at runtime, arising from ill-defined clock domain crossing interfaces, partial design-time verification scenarios that did not cover all functional errors, and technology-scaling-related side effects that increase a circuit's susceptibility to permanent and intermittent faults. The first paper addresses the implications of asynchronous clock domain crossing in virtual channel flow control. The second paper proposes a NoC architecture that enables the detection of functional bugs at runtime. The third paper employs dynamic link sharing to achieve fault-tolerant 3D NoC designs.

Time Label Presentation Title
Authors
11:00 10.2.1 CROSSOVER: CLOCK DOMAIN CROSSING UNDER VIRTUAL-CHANNEL FLOW CONTROL
Speaker:
Anastasios Psarras, Democritus University of Thrace, GR
Authors:
Michalis Paschou1, Anastasios Psarras1, Chrysostomos Nicopoulos2 and Giorgos Dimitrakopoulos1
1Democritus University of Thrace, GR; 2University of Cyprus, CY
Abstract
Technology scaling, process variations, and/or 3D integration make the design of fully synchronous Systems-on-Chip (SoC) a challenging task. Partitioning the SoC into Globally Asynchronous, Locally Synchronous (GALS) islands - aka clock domains - partially alleviates the difficulties in clock distribution. Such partitioning of the SoC is also necessary when supporting Dynamic Voltage and Frequency Scaling (DVFS) across parts of the system to minimize power consumption. The Network-on-Chip (NoC) is an inherently distributed architecture that is physically spread over the entire chip; thus, it should readily support communication across multiple asynchronous clock domains. In this paper, we generalize the fundamental properties of Virtual-Channel (VC) flow control across asynchronous clock domains. A new set of flow control rules is presented, which lead to efficient and deadlock-free communication, while still respecting the properties of traditional (synchronous) VC-based flow control. The derived flow control policy, called CrossOver, opens up a new design space, which is quantitatively explored in this paper. The goal of this investigation is to identify the configuration that maximizes throughput with the least cost, in terms of buffering requirements.

Download Paper (PDF; Only available from the DATE venue WiFi)
11:30 10.2.2 CORRECT RUNTIME OPERATION FOR NOCS THROUGH ADAPTIVE REGION PROTECTION
Speaker:
Rawan Abdel-Khalek, University of Michigan, US
Authors:
Rawan Abdel-Khalek and Valeria Bertacco, University of Michigan, US
Abstract
Networks-on-chip (NoCs) are increasingly being adopted as the interconnect model for systems-on-chip and chip-multiprocessors. As the only communication medium in these designs, the NoC's functional correctness is critical. In practice, design-time verification of NoCs is always partial, due to their large scale and the challenges that hinder verification efforts. As a result, functional design bugs are bound to escape and potentially manifest at runtime, compromising system functionality. We propose REPAIR, a runtime solution to detect and recover from functional design errors that have escaped in NoCs. Existing runtime verification techniques incur significant area and performance overheads to monitor and check the correctness of every packet traversing the network. However, REPAIR relies on a retransmission-based technique that adaptively determines the subset of packets requiring protection by identifying dynamic network regions where the specific runtime execution is likely to expose functional design bugs. We achieve runtime correctness at lower performance and area costs, relative to a traditional solution: on average, we are able to achieve more than 50% better overall performance with 2-3x fewer retransmission buffers.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:00 10.2.3 FAULT-TOLERANT 3-D NETWORK-ON-CHIP DESIGN USING DYNAMIC LINK SHARING
Speaker:
Mehdi Modarressi, University of Tehran, IR
Authors:
Seyyed Hossein Seyyedaghaei Rezaei1, Mehdi Modarressi1, Reza Yazdani1 and Masoud Daneshtalab2
1University of Tehran, IR; 2KTH Royal Institute of Technology, SE
Abstract
The most important challenge in the emerging 3D integration technology is the higher temperature, particularly in the layers that are more distant from the heat sink, compared to planar 2D chips. High temperature, in turn, increases a circuit's susceptibility to permanent and intermittent faults. On the other hand, the fast and high-bandwidth vertical links in the 3D integration technology have opened new horizons for network-on-chip (NoC) design innovations. In this paper, we leverage these ultra-low-latency vertical links to design a fault-tolerant 3D NoC architecture. In this architecture, permanent and intermittent defects on links and crossbars are bypassed by borrowing the idle bandwidth from vertically adjacent links and crossbars. Evaluation results under synthetic and realistic workloads show that the proposed fault-tolerance mechanism offers higher reliability and lower performance loss when compared with state-of-the-art fault-tolerant 3D NoC designs.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30 IP5-1, 288 RELIABILITY AND PERFORMANCE TRADE-OFFS FOR 3D NOC-ENABLED MULTICORE CHIPS
Speaker:
Partha Pande, Washington State University, US
Authors:
Sourav Das1, Janardhan Rao Doppa1, Partha Pande1 and Krishnendu Chakrabarty2
1Washington State University, US; 2Duke University, US
Abstract
Three-dimensional (3D) integration, a breakthrough technology to achieve "More Moore and More Than Moore," provides the benefits of better performance, lower power consumption, and increased bandwidth through the use of vertical interconnects and 3D stacking. The vertical interconnects enable the design of a high-bandwidth and energy-efficient small-world (SW) network-based 3D network-on-chip (3D SWNoC) for massive multicore platforms. However, the anticipated performance gain of a 3D SWNoC-enabled multicore chip may be compromised due to the potential failures of through-silicon-vias (TSVs) that are predominantly used as vertical interconnects. In particular, due to the non-homogeneous traffic patterns, heavily used TSVs may wear out quickly and can also contribute to the wear-out of neighboring TSVs. As a result, the mean-time-to-failure (MTTF) of those TSVs will decrease, which will adversely affect the overall lifetime of the chip. In this paper, we address this traffic-dependent TSV wear-out problem in 3D SWNoC. We demonstrate that by employing an adaptive routing mechanism, we can improve the MTTF of 3D SWNoC significantly while still providing 21% lower energy-delay-product (EDP) compared to a conventional 3D MESH.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:31 IP5-2, 455 MEMORY-ACCESS AWARE DVFS FOR NETWORK-ON-CHIP IN CMPS
Speaker:
Yuan Yao, KTH Royal Institute of Technology, SE
Authors:
Yuan Yao and Zhonghai Lu, KTH Royal Institute of Technology, SE
Abstract
We present a new DVFS technique for network-on-chip (NoC) that adjusts the voltage/frequency scales of routers according to the memory-access characteristics of the applications running on the CMP. The memory characteristics are periodically profiled, reflecting both resource-access density in the network and memory-access criticality for application performance. The network conducts per-router voltage/frequency tuning using the memory-access density information, while it performs priority-based switch allocation to speed up critical packets and avoid starvation using the memory-criticality information. Compared to a recent per-router DVFS approach, benchmark experiments demonstrate that our memory-access-characteristics-aware DVFS technique achieves not only better power savings and energy-delay product but also enhanced network and application performance.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30 End of session
Lunch Break in Großer Saal + Saal 1
Keynote Lecture in "Saal 2" 13:30 - 14:00

10.3 Design Experiences for Multimedia and Communication Applications

Date: Thursday 17 March 2016
Time: 11:00 - 12:30
Location / Room: Konferenz 1

Chair:
Theocharis Theocharides, University of Cyprus, CY

Co-Chair:
Steffen Paul, University Bremen, DE

This session presents new design experiences for multimedia and communication applications. The first presentation demonstrates the feasibility of having a heterogeneous system for speeding up computation-intensive algorithms at an ultra-low-power sub-10 mW budget. Two contributions provide new ideas on approximate computing and its application in real design cases. Novel architectures related to channel decoding are presented in two papers. One paper demonstrates the feasibility of high-performance and high-quality depth and colour sensor fusion targeting mobile devices. The session also includes an approach for designing an integrated prototype for a portable telepresence robot.

Time Label Presentation Title
Authors
11:00 10.3.1 ENABLING THE HETEROGENEOUS ACCELERATOR MODEL ON ULTRA-LOW POWER MICROCONTROLLER PLATFORMS
Speaker:
Francesco Conti, Università di Bologna, IT
Authors:
Francesco Conti1, Daniele Palossi2, Andrea Marongiu1, Davide Rossi1 and Luca Benini1
1Università di Bologna, IT; 2ETH Zurich, CH
Abstract
The stringent power constraints of complex microcontroller-based devices (e.g. smart sensors for the IoT) represent an obstacle to the introduction of sophisticated functionality. Programmable accelerators would be extremely beneficial to provide the flexibility and energy efficiency required by fast-evolving IoT applications; however, the integration complexity and sub-10mW power budgets have been considered insurmountable obstacles so far. In this paper we demonstrate the feasibility of coupling a low power microcontroller unit (MCU) with a heterogeneous programmable accelerator for speeding up computation-intensive algorithms at an ultra-low power (ULP) sub-10mW budget. Specifically, we develop a heterogeneous architecture coupling a Cortex-M series MCU with PULP, a programmable accelerator for ULP parallel computing. Complex functionality is enabled by the support for offloading parallel computational kernels from the MCU to the accelerator using the OpenMP programming model. We prototype this platform using an STM Nucleo board and a PULP FPGA emulator. We show that our methodology can deliver up to 60x gains in performance and energy efficiency on a diverse set of applications, opening the way for a new class of ULP heterogeneous architectures.

Download Paper (PDF; Only available from the DATE venue WiFi)
11:30 10.3.2 THERMAL OPTIMIZATION USING ADAPTIVE APPROXIMATE COMPUTING FOR VIDEO CODING
Speaker:
Muhammad Shafique, Karlsruhe Institute of Technology (KIT), DE
Authors:
Daniel Palomino1, Muhammad Shafique2, Altamiro Susin1 and Jörg Henkel2
1Universidade Federal do Rio Grande do Sul (UFRGS), BR; 2Karlsruhe Institute of Technology (KIT), DE
Abstract
This paper presents a thermal optimization technique that adaptively employs varying degrees of approximation at both algorithm and data levels in order to reduce the temperature associated with the high efficiency video coding process while maintaining good quality results. The technique evaluates, at run-time, the regions of a video sequence, frame-by-frame, in terms of tolerance to imprecise computations. It adapts the amount of approximation errors based on the video sequence properties and application-specific knowledge. The proposed technique adaptively controls the strength of approximations (at both algorithm and data levels) depending upon the varying resilience properties of coding different regions with different texture/motion properties. Our content-driven approximate computing technique demonstrates the potential to improve the thermal profile of a chip. Experimental results show that our technique improves temperature profiles by reducing the on-chip temperature by about 10 °C on average, while maintaining good quality results.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:00 10.3.3 HIGH PERFORMANCE TIME-OF-FLIGHT AND COLOR SENSOR FUSION WITH IMAGE-GUIDED DEPTH SUPER RESOLUTION
Speaker:
Hannes Plank, Infineon Technologies Austria AG, AT
Authors:
Hannes Plank, Gerald Holweg, Thomas Herndl and Norbert Druml, Infineon Technologies Austria AG, AT
Abstract
In recent years, depth sensing systems have gained popularity and have begun to appear on the consumer market. Of these systems, PMD-based Time-of-Flight cameras are the smallest available and will soon be integrated into mobile devices such as smart phones and tablets. Like all other available depth sensing systems, PMD-based Time-of-Flight cameras do not produce perfect depth data. Because of the sensor's characteristics, the data is noisy and the resolution is limited. Fast movements cause motion artifacts, which are undefined depth values due to corrupted measurements. Combining the data of a Time-of-Flight camera and a color camera can compensate for these flaws and vastly improve depth image quality. This work uses color edge information as a guide so that the depth image is upscaled with resolution gain and lossless noise reduction. A novel depth upscaling method is introduced, combining the creation of high quality depth data with fast execution. A high-end smart phone development board, a color camera and a Time-of-Flight camera are used to create a sensor fusion prototype. The complete processing pipeline is efficiently implemented on the graphics processing unit in order to maximize performance. The prototype proves the feasibility of our proposed fusion method on mobile devices. The result is a system capable of fusing color and depth data at interactive frame rates. When there is depth information available for every color pixel, new possibilities in computer vision, augmented reality and computational photography arise. The evaluation shows that our sensor fusion solution provides depth images with upscaled resolution, increased sharpness, less noise and fewer motion artifacts while achieving high frame rates, and thus significantly outperforms state-of-the-art solutions.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:15 10.3.4 SATURATED MIN-SUM DECODING: AN "AFTERBURNER" FOR LDPC DECODER HARDWARE
Speaker:
Stefan Scholl, University of Kaiserslautern, DE
Authors:
Stefan Scholl, Philipp Schläfer and Norbert Wehn, University of Kaiserslautern, DE
Abstract
LDPC codes are usually decoded by iterative belief propagation. However, especially for small block lengths, conventional belief propagation exhibits significant losses in signal-to-noise ratio compared to maximum likelihood decoding. In this paper, we propose combining a conventional min-sum decoder with an advanced decoding scheme that acts as a kind of "afterburner" to improve the frame error rate. We present hardware architectures and implementation results for a 28nm ASIC technology. The new decoder has a slightly higher complexity, but provides a gain of up to 1.6 dB in signal-to-noise ratio over conventional belief propagation decoding for short block lengths. In addition, we show that the new decoder implementation can decrease the amount of dark silicon.
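The abstract does not describe the proposed "afterburner" scheme in detail, so the following is only a minimal sketch of the conventional min-sum check-node update that such decoders build on. The function name and the list-based message layout are illustrative assumptions, not taken from the paper.

```python
def min_sum_check_node(llrs):
    """Conventional min-sum update for one check node (illustrative sketch).

    llrs: incoming log-likelihood ratios, one per connected variable node
          (at least two entries).
    Returns one outgoing message per edge: the product of the signs and the
    minimum magnitude of all *other* incoming messages.
    """
    out = []
    for i in range(len(llrs)):
        others = llrs[:i] + llrs[i + 1:]
        sign = 1.0
        for value in others:
            if value < 0:
                sign = -sign
        out.append(sign * min(abs(value) for value in others))
    return out


messages = min_sum_check_node([2.0, -0.5, 1.5])
```

Each outgoing message excludes the input arriving on the same edge, which is what makes the update usable inside iterative belief propagation.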

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30 IP5-3, 196 A DYNAMICALLY RECONFIGURABLE ECC DECODER ARCHITECTURE
Speaker:
Philippe Coussy, Universite Bretagne Sud / Lab-STICC, FR
Authors:
Awais Sani1, Philippe Coussy2 and Cyrille Chavet3
1Universite de Bretagne-Sud, FR; 2Universite de Bretagne-Sud / Lab-STICC, FR; 3Lab-STICC / Université de Bretagne Sud, FR
Abstract
Due to their impressive error correction performance, Error Correcting Codes (ECC) are now widely used in communication systems. In order to achieve high throughput requirements, ECC decoders are based on parallel architectures, which results in a major issue: memory access conflicts. In this paper, we introduce a new class of ECC decoder architectures that reconfigure dynamically by executing a memory mapping approach on-chip. For that purpose, a dedicated algorithm that takes network constraints into account is presented. A smart architecture based on a butterfly network and a reconfiguration unit is also proposed. Experimental results show that real-time reconfiguration at reasonable hardware cost is possible.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:31 IP5-4, 530 RESISTIVE BLOOM FILTERS: FROM APPROXIMATE MEMBERSHIP TO APPROXIMATE COMPUTING WITH BOUNDED ERRORS
Speaker:
Abbas Rahimi, University of California, Berkeley, US
Authors:
Vahideh Akhlaghi1, Abbas Rahimi2 and Rajesh K. Gupta1
1University of California, San Diego, US; 2University of California, Berkeley, US
Abstract
Approximate computing provides an opportunity for exploiting application characteristics to trade accuracy for gains in energy efficiency. However, the error that the system designer exposes to the application developer must be bounded. A space-efficient probabilistic data structure such as a Bloom filter can provide one such means. A Bloom filter supports approximate set membership queries with a tunable rate of false positives (i.e., errors) and no false negatives. We propose a resistive Bloom filter (ReBF) to approximate a function by tightly integrating it with a functional unit (FU) implementing the function. ReBF approximately mimics partial functionality of the FU by recalling its frequent input patterns for computational reuse. The accuracy of the target FU is guaranteed by bounding the ReBF error behavior at design time. We further lower the energy consumption of an FU by designing its ReBF using low-power memristor arrays. The experimental results show that function approximation using ReBF for five image processing kernels running on the AMD Southern Islands GPU yields on average 24.1% energy saving in 45 nm technology compared to the exact computation.
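To illustrate the membership property the abstract relies on (false positives possible, no false negatives), here is a minimal software Bloom filter sketch. It is not the resistive ReBF design; the class name, the salted SHA-256 hashing, and the default sizes are assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Approximate set membership: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a SHA-256 digest (illustrative choice).
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))


# Hypothetical frequent input patterns recorded for later reuse.
frequent_patterns = ["0x1f", "0x2a", "0x3c"]
bf = BloomFilter()
for pattern in frequent_patterns:
    bf.add(pattern)
```

Raising `num_bits` or tuning `num_hashes` lowers the false positive rate, which is the tunable error knob the abstract refers to.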

Download Paper (PDF; Only available from the DATE venue WiFi)
12:32 IP5-5, 353 REAL-TIME SYSTEM-LEVEL IMPLEMENTATION OF A TELEPRESENCE ROBOT USING AN EMBEDDED GPU PLATFORM
Speaker:
Swathi Gurumani, Advanced Digital Sciences Center, SG
Authors:
Muhammad Teguh Satria1, Swathi Gurumani1, Wang Zheng2, Keng Peng Tee2, Augustine Koh1, Pan Yu2, Kyle Rupnow1 and Deming Chen3
1Advanced Digital Sciences Center, SG; 2Institute for Infocomm Research, SG; 3UIUC, US
Abstract
Real-time applications such as telepresence systems present an opportunity to use embedded GPUs for compute acceleration to meet platform goals. In this paper, we develop a prototype of a portable, standalone telepresence robot that performs real-time attention-directed control using an NVIDIA Jetson TK1 embedded platform. We perform platform-specific optimizations to improve thread occupancy, optimize the computation workload and improve the accuracy of face detection on the embedded GPU, and achieve real-time performance of 30 frames per second on the Jetson TK1 and an overall speedup of 10x compared to the ARM CPU version.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30 End of session
Lunch Break in Großer Saal + Saal 1
Keynote Lecture in "Saal 2" 13:30 - 14:00

10.4 Stochastic Methods for Circuit Analysis & Synthesis

Date: Thursday 17 March 2016
Time: 11:00 - 12:30
Location / Room: Konferenz 2

Chair:
Ibrahim Elfadel, Masdar Institute of Technology, AE

Co-Chair:
L. Miguel Silveira, INESC-ID, IST, U Lisboa, PT

Stochastic methods continue to play a fundamental role in circuit analysis and synthesis, handling both the growing complexity of integrated circuits and the effects of process variations. The first paper combines a Markov Chain Monte Carlo method with a Floating Random Walk method in order to speed up capacitance extraction and handle circuits containing IP-protected substructures. The second paper builds a parameterized surrogate model of node voltages in power grids which can be used for efficient evaluation of multiple variation settings. The third paper uses an iterative variation-aware circuit synthesis flow to improve performance and energy efficiency.

Time Label Presentation Title
Authors
11:00 10.4.1 (Best Paper Award Candidate)
UTILIZING MACROMODELS IN FLOATING RANDOM WALK BASED CAPACITANCE EXTRACTION
Speaker:
Wenjian Yu, Tsinghua University, CN
Authors:
Wenjian Yu1, Bolong Zhang1, Chao Zhang1, Haiquan Wang1 and Luca Daniel2
1Tsinghua University, CN; 2Massachusetts Institute of Technology (MIT), US
Abstract
This paper presents techniques that use macromodels in order to extend and improve the floating random walk (FRW) method for capacitance extraction. A macromodel is built for each sub-structure whose geometry details must or should be hidden during capacitance extraction. Then, a macromodel-aware random walk scheme connects the Markov-chain random walk inside the macromodels and the FRW outside through scalable blank patch regions. This method can be used, for instance, to extract capacitances for structures with encrypted sub-structures, and to extend the FRW method's capability to structures with complex geometry or repeated layout patterns. Numerical results validate the merits of the proposed method with structures including encrypted FinFET layout, complex geometry features, and cyclic layout patterns.

Download Paper (PDF; Only available from the DATE venue WiFi)
11:30 10.4.2 VARIABILITY AND STATISTICAL ANALYSIS FLOW FOR DYNAMIC LINEAR SYSTEMS WITH LARGE NUMBER OF INPUTS
Speaker:
L. Miguel Silveira, INESC-ID, Instituto Superior Técnico, PT
Authors:
António Lucas Martins1, Jorge Fernandez Villena2 and L. Miguel Silveira1
1INESC-ID, Instituto Superior Técnico, PT; 2Cadence Design Systems, DE
Abstract
Fast analysis of the dynamics of large linear systems with a large number of inputs, such as power grid (PG) nets, is a required component of system verification platforms. Such analysis, exhibiting a considerable memory footprint and requiring intensive computations and advanced numerical techniques, has been the framework of recent approaches. However, analyzing the effect of design variability, which can have a critical impact on the power distribution across the chip, especially when considering its dynamic performance, poses an unmet challenge. Existing approaches collect information about the voltage and current fluctuations in key nodes that may lead to erroneous behavior or relevant performance changes. This is achieved through repetitive extraction and/or simulation of the large linear RC network for a very broad number of parameter settings. Unfortunately, the network size and the plethora of different settings that require investigation imply that such an approach can be exceedingly time consuming, even if parallel architectures are used. In order to address such a challenge, this paper introduces an alternative analysis flow that builds a parameterized model of the time domain node voltages on the fly, using the nominal time domain simulation as a starting point. Once such a model is generated, the effect of variability on the time response can be efficiently evaluated for multiple settings, allowing collection of relevant variation and statistical information on the impact of a large number of parameters in the current design. The performance of the methodology is evaluated on a set of standard PG extracted netlists, showing large improvements in terms of speed with modest memory requirements while maintaining an acceptable degree of accuracy.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:00 10.4.3 VARIATION-AWARE NEAR THRESHOLD CIRCUIT SYNTHESIS
Speaker:
Mohammad Saber Golanbari, Karlsruhe Institute of Technology (KIT), DE
Authors:
Mohammad Saber Golanbari, Saman Kiamehr, Mojtaba Ebrahimi and Mehdi Tahoori, Karlsruhe Institute of Technology (KIT), DE
Abstract
Near-Threshold Computing (NTC) has been shown to be a promising approach for improving the energy efficiency of VLSI circuits. Nevertheless, reducing the supply voltage significantly increases the delay impact of process variation, leading to up to 20x performance variation compared to the nominal voltage. As a result, it is wasteful of energy and performance to deal with such variation by increasing the timing margins, as is common at nominal voltage. Therefore, considering the impact of process variation during the near-threshold circuit design phase is of decisive importance. In this paper, we propose a variation-aware synthesis flow for NTC to address this problem. The objective is to improve the performance and energy efficiency of a circuit during design time by considering statistical variation information. This is done by providing variation information to the synthesis tool, evaluating the performance of the synthesized circuit by Statistical Static Timing Analysis (SSTA), and adjusting the timing constraints accordingly in an iterative manner. Simulation results for a set of benchmark circuits show that our proposed flow reduces the variation by 86.6% and improves the performance and energy by 24.9% and 7.4%, respectively, at the expense of 4.8% area overhead.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30 End of session
Lunch Break in Großer Saal + Saal 1
Keynote Lecture in "Saal 2" 13:30 - 14:00

10.5 Enhancing Memory in Next-Generation Platforms

Date: Thursday 17 March 2016
Time: 11:00 - 12:30
Location / Room: Konferenz 3

Chair:
Francisco Cazorla, Barcelona Supercomputing Center, ES

Co-Chair:
Jeronimo Castrillon, Technische Universität Dresden, DE

This session presents three papers describing different approaches to enhancing memory in order to obtain significant performance and energy improvements over standard processor-centric architectures. The first paper introduces a Near-Data Processing solution compatible with existing processor memory interfaces such as DDR3/4 with minimal changes. The second paper introduces the HIVE architecture, which allows performing common vector operations directly inside the HMC, avoiding contention on the interconnections as well as cache pollution. The third paper proposes a minimalistic clustered flash array which exposes a simple, stable, error-free, shared-memory flash interface that enables flexible cross-layer flash management optimizations and scalable distributed storage coordination.

Time Label Presentation Title
Authors
11:00 10.5.1 (Best Paper Award Candidate)
BUFFERED COMPARES: EXCAVATING THE HIDDEN PARALLELISM INSIDE DRAM ARCHITECTURES WITH LIGHTWEIGHT LOGIC
Speaker:
Kiyoung Choi, Seoul National University, KR
Authors:
Jinho Lee, Jung Ho Ahn and Kiyoung Choi, Seoul National University, KR
Abstract
We propose an approach called buffered compares, a less-invasive processing-in-memory solution that can be used with existing processor memory interfaces such as DDR3/4 with minimal changes. The approach is based on the observation that multi-bank architecture, a key feature of modern main memory DRAM devices, can be used to provide huge internal bandwidth without any major modification. We place a small buffer and a simple ALU per bank, define a set of new DRAM commands to fill the buffer and feed data to the ALU, and return the result for a set of commands (not for each command) to the host memory controller. By exploiting the under-utilized internal bandwidth using 'compare-n-op' operations, which are frequently used in many applications, we not only reduce the amount of energy-inefficient processor-memory communication, but also accelerate the computation of big data processing applications by utilizing parallelism of the buffered compare units in DRAM banks. Experimental results show that our solution significantly improves the performance and efficiency of the system on the tested workloads.

Download Paper (PDF; Only available from the DATE venue WiFi)
11:30 10.5.2 LARGE VECTOR OPERATIONS INSIDE HMC
Speaker:
Luigi Carro, Universidade Federal do Rio Grande do Sul (UFRGS), BR
Authors:
Marco Antonio Zanata Alves, Matthias Diener, Paulo Santos and Luigi Carro, Universidade Federal do Rio Grande do Sul (UFRGS), BR
Abstract
One of the main challenges for embedded systems is the transfer of data between memory and processor. In this context, Hybrid Memory Cubes (HMCs) can provide substantial energy and bandwidth improvements compared to traditional memory organizations, while also allowing the execution of simple atomic instructions in the memory. However, the complex memory hierarchy still remains a bottleneck, especially for applications with low data reuse, limiting the usable parallelism of the HMC vaults and banks. In this paper, we introduce the HIVE architecture, which allows performing common vector operations directly inside the HMC, avoiding contention on the interconnections as well as cache pollution. Our mechanism achieves substantial speedups of up to 17.3x (9.4x on average) compared to a baseline system that performs vector operations on an 8-core processor. We show that the simple instructions provided by HMC actually hurt performance for streaming applications.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:00 10.5.3 MINFLASH: A MINIMALISTIC CLUSTERED FLASH ARRAY
Speaker:
Ming Liu, Massachusetts Institute of Technology (MIT), US
Authors:
Ming Liu1, Sang-Woo Jun1, Sungjin Lee1, Jamey Hicks2 and Arvind1
1Massachusetts Institute of Technology (MIT), US; 2Quanta Research Cambridge, US
Abstract
NAND flash is seeing increasing adoption in the data center because of its orders of magnitude lower latency and higher bandwidth compared to hard disks. However, flash performance is often degraded by (i) an inefficient storage I/O stack that hides flash characteristics under the Flash Translation Layer (FTL), and (ii) long-latency network protocols for distributed storage. In this paper, we propose a minimalistic clustered flash array (minFlash). First, minFlash exposes a simple, stable, error-free, shared-memory flash interface that enables the host to perform cross-layer flash management optimizations in file systems, databases and other user applications. Second, minFlash uses a controller-to-controller network to connect multiple flash drives with very little overhead. We envision minFlash being used within a rack cluster of servers to provide fast, scalable distributed flash storage. We show through benchmarks that minFlash can access both local and remote flash devices with negligible latency overhead, and that it can expose near the theoretical maximum performance of the NAND chips in a distributed setting.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30 IP5-6, 681 EXPLORING SPECIALIZED NEAR-MEMORY PROCESSING FOR DATA INTENSIVE OPERATIONS
Speaker:
Salessawi Ferede Yitbarek, University of Michigan, US
Authors:
Salessawi Ferede Yitbarek1, Tao Yang2, Reetuparna Das1 and Todd Austin1
1University of Michigan, US; 2University of California, San Diego, US
Abstract
Emerging 3D stacked memory systems provide significantly more bandwidth than current DDR modules. However, general purpose processors do not take full advantage of these resources offered by the memory modules. Taking advantage of the increased bandwidth requires the use of specialized processing units. In this paper, we evaluate the benefits of placing hardware accelerators at the bottom layer of a 3D stacked memory system compared to accelerators that are placed external to the memory stack. Our evaluation of the design using cycle-accurate simulation and RTL synthesis shows that, for important data intensive kernels, near-memory accelerators inside a single 3D memory package provide 3x-13x speedup over a Quad-core Xeon processor. Most of the benefits are from the application of accelerators, as the near-memory configurations provide marginal benefits compared to the same number of accelerators placed on a die external to the memory package. This comparable performance for external accelerators is due to the high bandwidth afforded by the high-speed off-chip links. On the other hand, near-memory accelerators consume 7%-39% less energy than the external accelerators.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30 End of session
Lunch Break in Großer Saal + Saal 1
Keynote Lecture in "Saal 2" 13:30 - 14:00

10.6 Compilers and Tools for GPUs and MPSoCs

Date: Thursday 17 March 2016
Time: 11:00 - 12:30
Location / Room: Konferenz 4

Chair:
Frank Hannig, Friedrich-Alexander-Universität Erlangen-Nürnberg, DE

Co-Chair:
Lars Bauer, Karlsruhe Institute of Technology, DE

This session covers compiler optimisations and tools for efficient execution on GPUs and MPSoCs. The first paper presents a lightweight OpenMP implementation for parallel accelerators. The next two papers in this session focus on GPU performance modelling and tuning. The final paper leverages approximation to improve the throughput of OpenCL programs on FPGAs. In addition, an interactive presentation deals with Matlab to ASIP compilation.

Time Label Presentation Title
Authors
11:00 10.6.1 AN OPTIMIZED TASK-BASED RUNTIME SYSTEM FOR RESOURCE-CONSTRAINED PARALLEL ACCELERATORS
Speaker:
Daniele Cesarini, Università di Bologna, IT
Authors:
Daniele Cesarini, Andrea Marongiu and Luca Benini, Università di Bologna, IT
Abstract
Manycore accelerators have recently proven a promising solution for increasingly powerful and energy efficient computing systems. This raises the need for parallel programming models capable of effectively leveraging hundreds to thousands of processors. Programming approaches that put the burden of handling the complexity of performance scalability on application developers are bound to fail at a wide scale. Distributing parallel work in an efficient manner to the available hardware resources should be controlled by system software libraries and runtime environments, while the programmers should focus on expressing parallelism at the application level. Task-based parallelism has the potential to provide such features, offering flexible support for fine-grained and irregular parallelism. However, efficiently supporting this programming paradigm on resource-constrained parallel accelerators is a challenging task. In this paper, we present an optimized implementation of the OpenMP tasking model for embedded parallel accelerators, discussing the key design solutions that guarantee a small memory footprint and minimize performance overheads. We validate our design by comparing it to several state-of-the-art tasking implementations, using the most representative parallelization patterns. The experimental results confirm that our proposal enables efficient tasking on embedded parallel accelerators.

Download Paper (PDF; Only available from the DATE venue WiFi)
11:30 10.6.2 A FINE-GRAINED PERFORMANCE MODEL FOR GPU ARCHITECTURES
Speaker:
Federico Busato, University of Verona, IT
Authors:
Nicola Bombieri, Federico Busato and Franco Fummi, University of Verona, IT
Abstract
The increasing programmability, performance, and cost-effectiveness of GPUs have led to a widespread use of such many-core architectures to accelerate general purpose applications. Nevertheless, tuning applications to efficiently exploit the GPU's potential is a very challenging task, especially for inexperienced programmers. This is due to the difficulty of developing a SW application for the specific GPU architectural configuration, which includes managing the memory hierarchy and optimizing the execution of thousands of concurrent threads while maintaining the semantic correctness of the application. Even though several profiling tools exist that provide programmers with a large number of metrics and measurements, it is often difficult to interpret such information for effectively tuning the application. This paper presents a performance model that accurately estimates the potential performance of the application under tuning on a given GPU device and, at the same time, provides programmers with interpretable profiling hints. The paper shows the results obtained by applying the proposed model for profiling commonly used primitives and real codes.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:00 10.6.3 CRITICAL POINTS BASED REGISTER-CONCURRENCY AUTOTUNING FOR GPUS
Speaker:
Ang Li, Eindhoven University of Technology, NL
Authors:
Ang Li1, Shuaiwen Leon Song2, Akash Kumar3, Eddy Z. Zhang4, Daniel Chavarria2 and Henk Corporaal1
1Eindhoven University of Technology, NL; 2Pacific Northwest National Lab, US; 3Technische Universität Dresden, DE; 4Rutgers University, US
Abstract
The unprecedented prevalence of GPGPU is largely attributed to its abundant on-chip register resources, which allow massively concurrent threads and extremely fast context switching. However, due to internal memory size constraints, there is a tradeoff between the per-thread register usage and the overall thread concurrency. This becomes a design problem in terms of performance tuning, since the performance "sweet spot", which can be significantly affected by these two factors, is generally unknown beforehand. In this paper, we propose an effective autotuning solution to quickly and efficiently select the optimal number of registers per thread for delivering the best GPU performance. Experiments on three generations of GPUs (Nvidia Fermi, Kepler and Maxwell) demonstrate that our simple strategy can achieve an average performance improvement of 10%, and a maximum of 50%, over the original version without modifying the user code. Additionally, to reduce local cache misses due to register spilling and further improve performance, we explore three optimization schemes (i.e. bypassing L1 for global memory access, enlarging the local L1 cache and spilling into shared memory) and discuss their impact on performance on a Kepler GPU.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:15 10.6.4 GRATER: AN APPROXIMATION WORKFLOW FOR EXPLOITING DATA-LEVEL PARALLELISM IN FPGA ACCELERATION
Speaker:
Abbas Rahimi, UC Berkeley, US
Authors:
Atieh Lotfi1, Abbas Rahimi2, Amir Yazdanbakhsh3, Hadi Esmaeilzadeh3 and Rajesh Gupta1
1UC San Diego, US; 2UC Berkeley, US; 3Georgia Institute of Technology, US
Abstract
Modern applications including graphics, multimedia, web search, and data analytics not only can benefit from acceleration, but also exhibit significant degrees of tolerance to imprecise computation. This amenability to approximation provides an opportunity to trade quality of the results for higher performance and better resource utilization. Exploiting this opportunity is particularly important for FPGA accelerators that are inherently subject to many resource constraints. To better utilize the FPGA resources, we devise GRATER, an automated design workflow for FPGA accelerators that leverages imprecise computation to increase data-level parallelism and achieve higher computational throughput. The core of our workflow is a source-to-source compiler that takes in an input kernel and applies a novel optimization technique that selectively reduces the precision of the kernel's data and operations. With the precision of the data and operations selectively reduced, the area required to synthesize the kernels on the FPGA decreases, allowing a larger number of operations and parallel kernels to be integrated in the fixed area of the FPGA. The larger number of integrated kernels provides more hardware context to better exploit data-level parallelism in the target applications. To effectively explore the possible design space of approximate kernels, we exploit a genetic algorithm to find a subset of safe-to-approximate operations and data elements and then tune their precision levels until the desired output quality is achieved. GRATER is a fully software technique and does not require any changes to the underlying FPGA hardware. We evaluate GRATER on a diverse set of data-intensive OpenCL benchmarks from the AMD SDK. The synthesis result on a modern Altera FPGA shows that our approximation workflow yields 1.4×-3.0× higher throughput with less than 1% quality loss.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30 IP5-7, 426 MATLAB TO C COMPILATION TARGETING APPLICATION SPECIFIC INSTRUCTION SET PROCESSORS
Speaker:
Francky Catthoor, Interuniversity Microelectronics Centre (IMEC), BE
Authors:
Ioannis Latifis1, Karthick Parashar2, Grigoris Dimitroulakos1, Hans Cappelle2, Christakis Lezos1, Konstantinos Masselos1 and Francky Catthoor2
1University of Peloponnese, GR; 2Interuniversity Microelectronics Centre (IMEC), BE
Abstract
This paper discusses a MATLAB to C compiler exploiting custom instructions, such as instructions for SIMD processing and instructions for complex arithmetic, present in Application Specific Instruction Set Processors (ASIPs). The compiler generates ANSI C code in which the processor's special instructions are represented via specialized intrinsic functions. As a result, the generated code can be used as input to any C/C++ compiler. The proposed compiler thus allows the specialized instruction set of the target processor to be described in a parameterized way, allowing the support of any processor. The proposed compiler has been used for the generation of application code for an ASIP targeting DSP applications. The code generated by the proposed compiler achieves a speedup of 2x-30x on the targeted ASIP for six DSP benchmarks compared to the code generated by the MathWorks MATLAB to C compiler. The proposed compiler can therefore be employed to reduce development time, effort, cost and time to market by raising the abstraction of application design in an embedded systems / system-on-chip development context while still improving implementation efficiency.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30 End of session
Lunch Break in Großer Saal + Saal 1
Keynote Lecture in "Saal 2" 13:30 - 14:00

10.7 Reliable System Design

Date: Thursday 17 March 2016
Time: 11:00 - 12:30
Location / Room: Konferenz 5

Chair:
Mohamed Sabry Aly, Stanford University, US

Co-Chair:
Semeen Rehman, Technische Universität Dresden, DE

This session explores several approaches for the analysis, simulation, and repair of integrated systems, from 3D ICs to STT-RAMs.

Time Label Presentation Title
Authors
11:00 10.7.1 (Best Paper Award Candidate)
A HOLISTIC TRI-REGION MLC STT-RAM DESIGN WITH COMBINED PERFORMANCE, ENERGY, AND RELIABILITY OPTIMIZATIONS
Speaker:
Yiran Chen, University of Pittsburgh, US
Authors:
Wujie Wen1, Mengjie Mao2, Hai Li2, Yiran Chen2, Yukui Pei3 and Ning Ge3
1Florida International University, US; 2University of Pittsburgh, US; 3Tsinghua University, CN
Abstract
Multi-level cell spin-transfer torque random access memory (MLC STT-RAM) demonstrates great potential in on-chip cache design for its high storage density and non-volatility, but also suffers from degraded access time, reliability and energy efficiency. The existing MLC STT-RAM cache designs primarily focus on performance and energy optimizations but often ignore the crucial demand for reliability. In this work, we propose a tri-region MLC STT-RAM cache design (TMSC) to simultaneously meet the requirements of performance, energy, and reliability. The tri-region MLC STT-RAM cache is optimally partitioned into fast, mixed, and slow ways according to their different access performance, energy and reliability. A new error correction code (ECC) scheme, namely non-uniform strength ECC (NUS-ECC), is also developed to tolerate the different bit failure rates in these ways. Compared to the latest performance-driven MLC STT-RAM cache design with a pessimistic ECC scheme, our TMSC technique can improve the system performance and energy by 9.3% and 9.4% on average, respectively, for various applications. The additional area cost associated with NUS-ECC is limited to 3.2% compared to the pessimistic ECC scheme.

Download Paper (PDF; Only available from the DATE venue WiFi)
11:30 10.7.2 THERMAL-AWARE TSV REPAIR FOR ELECTROMIGRATION IN 3D ICS
Speaker:
Shengcheng Wang, Karlsruhe Institute of Technology (KIT), DE
Authors:
Shengcheng Wang1, Krishnendu Chakrabarty2 and Mehdi Tahoori1
1Karlsruhe Institute of Technology (KIT), DE; 2Duke University, US
Abstract
Electromigration (EM) occurrence on through-silicon-vias (TSVs) is a major reliability concern for Three-Dimensional Integrated-Circuits (3D ICs), and EM can severely reduce the mean-time-to-failure (MTTF). In this work, a novel fault tolerant technique is proposed to increase the MTTF of the functional TSV network through the assignment of spare TSVs to EM-vulnerable functional TSVs. The objective is to meet the target MTTF with minimum spare TSVs and minimal impact on the circuit timing. By considering the impact of temperature variation, the proposed technique provides a more robust repair solution for EM-induced TSV defects with minimum delay overhead, compared to previous thermal-unaware methods.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:00 10.7.3 ELECTROTHERMAL SIMULATION OF BONDING WIRE DEGRADATION UNDER UNCERTAIN GEOMETRIES
Speaker:
Thorben Casper, Technische Universität Darmstadt, DE
Authors:
Thorben Casper1, Herbert De Gersem1, Renaud Gillon2, Tomas Gotthans3, Tomáš Kratochvíl3, Peter Meuris4 and Sebastian Schöps1
1Technische Universität Darmstadt, DE; 2ON Semiconductor, BE; 3Brno University of Technology, CZ; 4Magwel NV, Leuven, BE
Abstract
In this paper, electrothermal field phenomena in electronic components are considered. This coupling is tackled by multiphysical field simulations using the Finite Integration Technique (FIT). In particular, the design of bonding wires with respect to thermal degradation is investigated. Instead of resolving the wires by the computational grid, lumped element representations are introduced as point-to-point connections in the spatially distributed model. Fabrication tolerances lead to uncertainties of the wires' parameters and influence the operation and reliability of the final product. Based on geometric measurements, the resulting variability of the wire temperatures is determined using the stochastic electrothermal field-circuit model.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30 IP5-8, 250 SAMPLING-BASED BUFFER INSERTION FOR POST-SILICON YIELD IMPROVEMENT UNDER PROCESS VARIABILITY
Speaker:
Grace Li Zhang, Technische Universität München (TUM), DE
Authors:
Grace Li Zhang, Bing Li and Ulf Schlichtmann, Technische Universität München (TUM), DE
Abstract
At submicron manufacturing technology nodes, process variations affect circuit performance significantly. This trend leads to a large timing margin and thus overdesign to maintain yield. To combat this pessimism, post-silicon clock tuning buffers can be inserted into circuits to balance timing budgets of critical paths with their neighbors. After manufacturing, these clock buffers can be configured for each chip individually so that chips with timing failures may be rescued to improve yield. In this paper, we propose a sampling-based method to determine the proper locations of these buffers. The goal of this buffer insertion is to reduce the number of buffers and their ranges, while still maintaining a good yield improvement. Experimental results demonstrate that our algorithm can achieve a significant yield improvement (up to 35%) with only a small number of buffers.
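The rescue effect of a tuning buffer can be illustrated with a small Monte Carlo sketch. It is not the authors' insertion algorithm: the two-stage pipeline, the normal delay distributions, the clock period and the tuning range are all hypothetical choices made for illustration.

```python
import random


def estimate_yield(clock_period, tuning_range, n_chips=20000, seed=0):
    """Monte Carlo yield for a toy two-stage pipeline.

    A tunable buffer on the middle flip-flop shifts its clock by some x in
    [-tuning_range, +tuning_range]: stage 1 then has clock_period + x
    available and stage 2 has clock_period - x.
    """
    rng = random.Random(seed)
    passed = 0
    for _ in range(n_chips):
        d1 = rng.gauss(1.0, 0.08)  # stage-1 delay (assumed distribution)
        d2 = rng.gauss(1.0, 0.08)  # stage-2 delay (assumed distribution)
        # Feasible shifts satisfy d1 - clock_period <= x <= clock_period - d2,
        # clipped to the range the buffer can actually provide.
        low = max(d1 - clock_period, -tuning_range)
        high = min(clock_period - d2, tuning_range)
        if low <= high:
            passed += 1
    return passed / n_chips


# Same seed, so both runs see the same sampled chips.
baseline = estimate_yield(clock_period=1.05, tuning_range=0.0)
tuned = estimate_yield(clock_period=1.05, tuning_range=0.1)
```

Because a shift of zero is always allowed, every chip that passes without tuning also passes with it, so `tuned` can never fall below `baseline`; the difference is the share of sampled chips rescued by borrowing time from the faster stage.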

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30 End of session
Lunch Break in Großer Saal + Saal 1
Keynote Lecture in "Saal 2" 13:30 - 14:00

10.8 Presentations from Campus 3D-IC Integration: Opportunities for SMEs and Outlook 2020+

Date: Thursday 17 March 2016
Time: 11:00 - 12:30
Location / Room: Exhibition Theatre

Organiser:
Hans-Jürgen Brand, IDT/ZMDI, DE

This session features presentations given by exhibitors from the Campus on 3D-IC Integration, highlighting special opportunities for SMEs and giving an outlook to 2020 and beyond. Attendees are invited to also visit the campus booths for further details and discussions.

Time Label Presentation Title
Authors
11:00 10.8.1 HIGH PERFORMANCE CENTER FUNCTIONAL INTEGRATION IN MICRO AND NANOELECTRONICS - OPPORTUNITIES FOR SMES IN PRODUCT AND TECHNOLOGY DEVELOPMENT
Speaker:
Mario Walter, Fraunhofer Institute for Photonic Microsystems IPMS, DE
12:00 10.8.2 SOME THOUGHTS ON IC INTEGRATION IN 2020 & BEYOND
Speaker:
Anna Fontanelli, Monozukuri S.p.A., IT
12:30 End of session
Lunch Break in Großer Saal + Saal 1
Keynote Lecture in "Saal 2" 13:30 - 14:00

UB10 Session 10

Date: Thursday 17 March 2016
Time: 12:00 - 14:30
Location / Room: Booth 15, Exhibition Area

Label Presentation Title
Authors
UB10.1 HIGH-END 122GHZ MINIATURE RADAR SENSOR FOR AUTONOMOUS AIRCRAFTS
Presenter:
Federico Nava, Heinz Nixdorf Institute - Universität Paderborn, DE
Authors:
Federico Nava1 and Christoph Scheytt2
1Heinz Nixdorf Institute - Universität Paderborn, DE; 2Heinz Nixdorf Institute - Paderborn, DE
Abstract
High-precision sensors, sensor arrays and the concept of sensor fusion are attracting growing interest in research on autonomous vehicles. For this reason, the System and Circuit Technology group at the Heinz Nixdorf Institute is currently developing a highly integrated radar module as a sensor for Unmanned Aerial Vehicle applications. The presented system is composed of a radar IC (130nm SiGe) with in-package antennas and an operating frequency of 122GHz, mounted on a FLEX-PCB that includes a CORTEX M4 MCU, for a total size of 30x30mm.
The presentation will show the FMCW/CW radar functions of the device, which allow the velocity and distance of multiple objects to be tracked. The results of the radar measurements will be presented on a screen showing the raw data acquired in the time domain and an FFT representation. Different objects will move simultaneously within the reception area of the sensor. The tracked distances will then be plotted on screen.
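For orientation, the textbook relations behind FMCW distance and Doppler velocity measurement can be sketched as follows. Only the 122GHz carrier comes from the abstract; the sweep bandwidth, chirp duration and example frequencies are assumed values, and the demonstrator's actual signal processing is not described in the source.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def fmcw_range(beat_hz, sweep_bandwidth_hz, chirp_duration_s):
    """Distance from the beat frequency of a linear chirp:
    R = c * f_beat * T_chirp / (2 * B)."""
    return (SPEED_OF_LIGHT * beat_hz * chirp_duration_s
            / (2.0 * sweep_bandwidth_hz))


def doppler_velocity(doppler_hz, carrier_hz=122e9):
    """Radial velocity from the Doppler shift: v = f_d * c / (2 * f_c)."""
    return doppler_hz * SPEED_OF_LIGHT / (2.0 * carrier_hz)


def range_resolution(sweep_bandwidth_hz):
    """Smallest separable distance between two targets: c / (2 * B)."""
    return SPEED_OF_LIGHT / (2.0 * sweep_bandwidth_hz)


# Hypothetical chirp: 1 GHz sweep over 1 ms, with a 10 kHz beat tone.
distance_m = fmcw_range(beat_hz=10e3, sweep_bandwidth_hz=1e9,
                        chirp_duration_s=1e-3)
velocity_m_s = doppler_velocity(doppler_hz=1000.0)
```

Each peak in the FFT of the time-domain signal corresponds to one beat frequency, which is why several objects can be tracked at once.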

Download Paper (PDF)
UB10.2 AGAMID: A TLM FRAMEWORK FOR EVALUATION OF HARDWARE-ENHANCED MANY-CORE RUN-TIME MANAGEMENT
Presenter:
Daniel Gregorek, University of Bremen, DE
Authors:
Daniel Gregorek and Alberto Garcia-Ortiz, University of Bremen, DE
Abstract
The advent of many-core processors places novel demands on system design. Power limitations and abundant parallelism require efficient and scalable run-time management. However, the design of a many-core run-time manager generally suffers from excessive evaluation time. AGAMID is a novel research framework for design space exploration of hardware-enhanced many-core run-time management. In this demo, we use AGAMID for the interactive analysis of many-core architectures and run-time management systems. We perform hands-on comparisons of RTM architectures, RTM algorithms and HW/SW partitionings. We also give insights into the design and architecture of the framework itself.

Download Paper (PDF)
UB10.3 COMPSOC: VIRTUALISING CONTROL APPLICATIONS ON A DISTRIBUTED COMPSOC PLATFORM
Presenter:
Kees Goossens, Eindhoven University of Technology, NL
Author:
Kees Goossens, Eindhoven University of Technology, NL
Abstract
In our University Booth we will demonstrate that multiple real-time control applications can be developed independently even though they share platform resources. We show that they can run together with other applications on a wireless network of multiple CompSOC platforms, where each platform has multiple processors, NOC, and a complete microkernel, streaming software, and resource management stack. We will also show that (control) applications can be quickly and safely loaded and started without interference to other (real-time control) applications, thus implementing a network of MPSOCs for distributed mixed time-criticality applications.

Download Paper (PDF)
UB10.4 CLASH: DIGITAL CIRCUITS IN CλASH
Presenter:
Christiaan Baaij, University of Twente, NL
Authors:
Christiaan Baaij and Jan Kuper, University of Twente, NL
Abstract
CλaSH is a novel compiler system for generating digital circuits as described by a mathematical/functional specification of the architecture. We will demonstrate several applications written in CλaSH:
* Tunneling ball device: With a minimal amount of acceleration, a fast spinning metal disk is either sped up or slowed down so that a falling ball can fall through one of the metal disk's two holes.
* Music synthesizer and spectrum analyser: An audio CODEC samples music being played from either an MP3 player or a computer. We can apply several digital filters which affect the music. The effects of these filters can be both seen on a monitor and heard through speakers connected to the FPGA board.
* Multi-processor system: The system is used in a compiler construction course, where the compiler is written in Haskell. Because CλaSH is a proper subset of Haskell, students can build and experiment with the compiler and the multi-processor system in the same environment.

Download Paper (PDF)
UB10.5 CONTREP: A SINGLE-SOURCE FRAMEWORK FOR UML-BASED MODELLING AND DESIGN OF MIXED-CRITICALITY SYSTEMS
Presenter:
Fernando Herrera, University of Cantabria, ES
Authors:
Fernando Herrera and Eugenio Villar, University of Cantabria, ES
Abstract
Mixed-criticality systems integrate applications, platform resources and requirements with different criticality. A criticality reflects the impact of either a failure of a component or a violation of a requirement, which can range from irrelevant to catastrophic effects. This booth presents the CONTREP framework, which supports UML/MARTE-based modeling, analysis and design of mixed-criticality embedded systems. The booth shows a model of a quadcopter control system which integrates safety-critical (e.g., flight control), mission-critical (e.g., a video processing payload), and non-critical (e.g., monitoring) functions. The booth shows how mixed-criticality is captured, together with the description of the functional architecture and of the multi-core embedded platform where the system is implemented, and how CONTREP automates different design activities, i.e. model validation, performance assessment and design space exploration, exploiting mixed-criticality information in every case.

Download Paper (PDF)
UB10.6 BIOVIZ: AN INTERACTIVE VISUALIZATION ENGINE FOR MICROFLUIDIC BIOCHIPS
Presenter:
Oliver Keszöcze, University of Bremen, DE
Authors:
Oliver Keszöcze1, Jannis Stoppe2, Robert Wille3 and Rolf Drechsler2
1University of Bremen, DE; 2DFKI and University of Bremen, DE; 3Johannes Kepler University, AT, DFKI and University of Bremen, DE
Abstract
In order to shorten the time required for the analysis of medical substances, digital microfluidic biochips (DMFBs) have been suggested. Issues such as routing and layout are complex and currently being investigated. Although first automatic solutions assist the designers, the results are usually provided in a complex and non-intuitive fashion. Creating solutions requires testing of different setups, comparing the results and debugging of algorithms. Solutions, while being technically correct, often include negative aspects such as unnecessary cell usage. These aspects are difficult to spot without being able to visually inspect the design. Still, while designers would benefit from visualization tools, no dedicated tools have been built yet. We present BioViz, an interactive visualization tool for DMFBs that explicitly addresses these problems.

Download Paper (PDF)
UB10.7 WORKCRAFT: FRAMEWORK FOR INTERPRETED GRAPHS
Presenter:
Danil Sokolov, Newcastle University, GB
Author:
Danil Sokolov, Newcastle University, GB
Abstract
A large number of models employed in the design of concurrent systems, such as Petri nets, gate-level circuits and dataflow structures, have an underlying static graph structure. Their semantics, however, is defined using additional entities, e.g. tokens or node/arc states, which in turn form the overall state of the system. We jointly refer to such formalisms as interpreted graph models (IGMs). Workcraft is designed to provide a flexible common framework for development of IGMs, including visual editing, (co)simulation and analysis. The similarities between the IGMs allow for links between different formalisms to be created, either by means of adapter interfaces or by conversion from one model type into another. This greatly extends the range of applicable modelling and analysis techniques.

Download Paper (PDF)
UB10.8 CHIMPANC: CHANGE MANAGEMENT USING CHIMPANC
Presenter:
Jannis Stoppe, DFKI and University of Bremen, DE
Authors:
Jannis Stoppe, Martin Ring and Rolf Drechsler, DFKI and University of Bremen, DE
Abstract
One approach to remedy the issue of increasing complexity in the hardware design process is to provide designers with more abstract languages that allow systems to be designed top-down, starting with an abstract model of the system and its requirements. Several of these languages, such as SysML and SystemC, are being used today. We propose the Change Impact Analysis and Control Tool (ChImpAnC) to handle these challenges. ChImpAnC extracts the relevant information from the models on the different levels and constructs mappings between them, which allows consistency and refinements to be checked and the impact of changes to be calculated. ChImpAnC thus ensures that, for example, a written specification or documentation is not made obsolete by changes in the implementation without the designer being warned about it.

Download Paper (PDF)
UB10.9 IDDD: AN INTERACTIVE DEPENDABILITY DRIVEN DESIGN SPACE EXPLORATION
Presenter:
Stefan Scharoba, Brandenburg University of Technology Cottbus-Senftenberg, DE
Authors:
Stefan Scharoba, Jacob Lorenz and Heinrich T. Vierhaus, Brandenburg University of Technology Cottbus-Senftenberg, DE
Abstract
Due to the downscaling of transistor feature sizes, today's integrated circuits are much more likely to be affected by transient or permanent faults. In order to still meet certain dependability requirements, many different fault tolerance techniques have been developed, which can handle these faults in the field. Each of these techniques is associated with distinct costs and benefits. As a consequence, finding the fault tolerant implementation of the system that best meets the actual requirements is a challenging task. We propose a tool that supports this process. It offers a set of hardware-based fault tolerance techniques that can be applied to a given VHDL model. Afterwards, the costs and benefits of the respective design choice are estimated automatically. Thus, several fault tolerant versions of the design can be evaluated and compared with each other without implementing them manually. Finally, the VHDL code of the preferred design candidate can be generated by the tool.

Download Paper (PDF)
UB10.10 RESECU_4_AMBRAMS: TOWARDS INCREASED RELIABILITY AND HARDWARE SECURITY USING AMBRAMS
Presenter:
Petr Pfeifer, TU Liberec, CZ
Author:
Petr Pfeifer, TU Liberec, CZ
Abstract
AmBRAMs is a new method and an advanced analysis tool and framework for advanced measurements and reliability assessments on modern nanoscale FPGAs. It provides a new set of tools that enables complex lab-on-chip solutions in nanoscale FPGAs. AmBRAMs has been enhanced with advanced measurements and data processing that support platform identification and security functionality, including tampering detection, preferably in modern nanoscale programmable devices. It will be presented on VLIW soft processor cores equipped with a security IP, also showing POF solutions and related functionality. Detection of power supply voltage variation using AmBRAMs technology is incorporated in the processor application and demonstrated on a complex processor system. The demonstration runs on 28nm LP or 20nm HP UltraScale Xilinx FPGA devices. The 28nm FPGA solution will also show simple HW adjustments that support the power supply changes required by the demonstrator and by the adaptive control, which is presented as well.

Download Paper (PDF)
14:30 End of session
15:30 Coffee Break in Exhibition Area

Best-IP Best IP Award Presentation

Date: Thursday 17 March 2016
Time: 13:15 - 13:30
Location / Room: Saal 2

Organiser:
Gianluca Dini, University of Pisa, IT

Time Label Presentation Title
Authors
13:30 End of session
15:30 Coffee Break in Exhibition Area

11.0 LUNCH TIME KEYNOTE SESSION

Date: Thursday 17 March 2016
Time: 13:30 - 14:00
Location / Room: Saal 2

Chair:
Luca Fanucci, University of Pisa, IT

Co-Chair:
Matthias Schunter, Intel Corporation, DE

The lunch keynote presentation will be given by Dr. Rhines, the 2015 recipient of the Phil Kaufman Award. He will present a vision of design for security from an EDA perspective.

Time Label Presentation Title
Authors
13:30 11.0.1 SECURE SILICON: ENABLER FOR THE INTERNET OF THINGS
Speaker and Author:
Walden C. Rhines, Mentor, US
Abstract
As electronic system hackers penetrate deeper—from applications to embedded software to OS to silicon—the impact of security threats is growing exponentially. Viruses and malware in the operating system, or application layer, are major concerns, but only affect a portion of users. In contrast, even small malicious modifications or compromised performance in the underlying silicon can devastate system security for all users. Growth of the Internet of Things magnifies the impact of the security problem by orders of magnitude. Since hardware is the root of trust in an electronic product, EDA companies will be increasingly pressured to solve the silicon security problems for their customers. This requires a new paradigm in silicon design creation and verification. The traditional EDA role is to design and then verify that the silicon does what it is supposed to do. Creating secure silicon, however, requires that verification ensure that the chip does nothing that it is NOT supposed to do. The industry is at the first stage of Secure Silicon awareness; it's going to become big business as future events unfold. Join Wally Rhines as he examines the growing threats to silicon security and EDA's possible solutions.

Download Paper (PDF; Only available from the DATE venue WiFi)
14:00 End of session
15:30 Coffee Break in Exhibition Area

11.1 SPECIAL DAY Hot Topic: Embedded Security Applications

Date: Thursday 17 March 2016
Time: 14:00 - 15:30
Location / Room: Saal 2

Chair:
Tim Güneysu, University of Bremen, DE

Co-Chair:
X. Sharon Hu, University of Notre Dame, US

Embedded security devices end up in a wide range of applications. In this third session of the special day on secure systems, three industrial application fields are selected to illustrate the need for security and trust. The first application area is smart grids and smart electricity distribution. The second is taken from Industry 4.0, the goal of which is to develop the next-generation smart factory. Automotive is a third exemplary field for which ICT security is essential: from the entertainment system to the SW controlling the engine or brakes.

Time Label Presentation Title
Authors
14:00 11.1.1 EMBEDDED SECURITY - FREQUENT PITFALLS AND SOLUTIONS
Speaker and Author:
Johann Heyszl, Fraunhofer Institute for Applied and Integrated Security AISEC, DE
Abstract
Based on the experience from many projects in embedded security for customers from the automotive, industrial control, and other domains, the talk will highlight the most common issues and pitfalls when trying to achieve acceptable information security in such embedded systems, including hardware-based attacks. Alongside, the talk will discuss methods and solutions which are ready for use to counteract those.
14:30 11.1.2 SECURITY IN INDUSTRIE 4.0 - CHALLENGES AND SOLUTIONS FOR THE FOURTH INDUSTRIAL REVOLUTION
Speaker:
Michael Kasper, Fraunhofer SIT, DE
Authors:
Michael Waidner1 and Michael Kasper2
1Technische Universität Darmstadt and Fraunhofer SIT, DE; 2Fraunhofer SIT, DE
Abstract
Information technology (IT) is one of the most important drivers of innovation in production and automation. In Germany, the term Industrie 4.0 summarizes various activities and developments involved in the evolution of industrial processes in production, logistics, automation, etc. Many research and development projects work on different aspects of these developments. In the view of politics, industry, and IT enterprises, sufficient IT security is considered an essential prerequisite for the future of production. Although many current IT security solutions can be applied in the Industrie 4.0 context, they do not satisfy the requirements of Industrie 4.0 processes. Work needs to be done on underlying security mechanisms as well as on security architectures.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:00 11.1.3 SECURITY FOR AUTOMOTIVE AND THE INTERNET OF THINGS
Speaker:
Hans Löhr, Robert Bosch GmbH, DE
Authors:
Paul Duplys, Hans Löhr, Herve Seudie and Robert Szerwinski, Robert Bosch GmbH, DE
Abstract
Increasing connectivity of vehicles and devices from various domains such as building automation or home appliances leads to higher attention to security topics. In particular, several attacks on connected vehicles have been demonstrated at recent security conferences and gained high media attention. But connected sensors and devices in other areas are also increasingly exposed to risks. Bosch, as a leading global automotive supplier and manufacturer of sensors and devices, is carrying out a number of activities to analyse the security and privacy risks in a connected world, develop innovative security solutions, and advance research for security and privacy in our connected future. In this talk, we outline practical use cases and application scenarios from the automotive and other domains, as well as the current state of the art / state of the industry. Moreover, we present some of our efforts towards a secure Internet of Things.
15:30 End of session
Coffee Break in Exhibition Area

11.2 Beating New Technology Paths for NoC

Date: Thursday 17 March 2016
Time: 14:00 - 15:30
Location / Room: Konferenz 6

Chair:
Fabien Clermidy, CEA-Leti, FR

Co-Chair:
Sébastien Le Beux, Le Beux, FR

Silicon photonics and wireless links are among the most interesting emerging technologies for on-chip communication. The first paper of this session presents a comprehensive approach for floorplanning a silicon-photonic NoC that accounts for cross-layer effects spanning the optical and electrical boundaries. The second and third papers propose new solutions for power and energy management of wireless NoCs.

Time Label Presentation Title
Authors
14:00 11.2.1 CROSS-LAYER FLOORPLAN OPTIMIZATION FOR SILICON PHOTONIC NOCS IN MANY-CORE SYSTEMS
Speaker:
Ayse Coskun, Boston University, US
Authors:
Ayse Coskun1, Anjun Gu2, Warren Jin3, Ajay Jayant Joshi1, Andrew B. Kahng2, Jonathan Klamkin3, Yenai Ma1, John Recchio2, Vaishnav Srinivas2 and Tiansheng Zhang1
1Boston University, US; 2University of California, San Diego, US; 3UC Santa Barbara, US
Abstract
Many-core chip architectures are now feasible, but the power consumption of electrical networks-on-chip does not scale well. Silicon photonic NoCs (PNoCs) are more scalable and power efficient, but floorplan optimization is challenging. Prior work optimizes PNoC floorplans through simultaneous place and route, but does not address cross-layer effects that span optical and electrical boundaries, chip thermal profiles, or effects of job scheduling policies. This paper proposes a more comprehensive, cross-layer optimization of the silicon PNoC and core cluster floorplan. Our simultaneous placement (locations of router groups and core clusters) and routing (waveguide layout) considers scheduling policy, thermal tuning, and heterogeneity in chip power profiles. The core of our optimizer is a mixed-integer linear programming formulation that minimizes NoC power, including (1) laser source power due to propagation, bend and crossing losses; (2) electrical and electrical-optical-electrical conversion power; and (3) thermal tuning power. Our experiments vary the number of cores, optical data rate per wavelength, number of waveguides and other parameters to investigate scalability and tradeoffs through a large design space. We demonstrate how the optimal floorplan changes with cross-layer awareness: metrics of interest such as optimal waveguide length or thermal tuning power change significantly (up to 4X) based on power and utilization levels of cores, chip and cluster aspect ratio, and laser source sharing mechanism. Exploration of a large solution space is achieved with reasonable runtimes, and is perfectly parallelizable. Our optimizer thus provides designers with more accurate, cross-layer chip planning decision support to accelerate adoption of PNoC-based solutions.

Download Paper (PDF; Only available from the DATE venue WiFi)
14:30 11.2.2 ADAPTIVE MULTI-VOLTAGE SCALING IN WIRELESS NOC FOR HIGH PERFORMANCE LOW POWER APPLICATIONS
Speaker:
Sujay Deb, IIIT Delhi, IN
Authors:
Hemanta Kumar Mondal, Sri Harsha Gade, Raghav Kishore and Sujay Deb, IIIT Delhi, IN
Abstract
Networks-on-Chip (NoCs) have garnered significant interest as the communication backbone for multicore processors used across a wide range of fields that demand higher computation capability. Wireless NoCs (WNoCs), which combine single-hop, long-range wireless links with wired interconnects, offer the most promising solution to reduce multi-hop, long-distance communication bottlenecks and open up innumerable possibilities for topological innovations that are not possible otherwise. However, energy consumption in routers and Wireless Interface (WI) components still remains considerably high. Specifically, for large systems with many nodes in the network, a significant amount of energy is consumed by the communication infrastructure (routers, links, WIs). The usage of the routers and WIs is application dependent, and in most cases performance requirements can be met without operating the whole communication infrastructure at its maximum limit. Dynamically reconfigurable systems that can switch between high-performance and low-power modes can cater to a wide range of applications. In this paper, we propose a novel design methodology for energy-efficient WNoCs using Adaptive Multi-voltage Scaling (AMS) that reduces dynamic power consumption, along with power gating to prevent static power dissipation in routers and WIs. We evaluate our proposed design in the presence of real and synthetic traffic patterns. This approach saves up to 62.50% of static power with less than 1% area overhead. In different traffic scenarios, the proposed WNoC reduces overall packet energy dissipation by 35% on average compared to a regular WNoC, without significant performance degradation. Design considerations for augmenting existing WNoCs with these routers and corresponding overheads are also presented.
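The reason voltage scaling reduces dynamic power can be seen from the classic first-order CMOS estimate, sketched below. This is general background rather than the authors' model: the capacitance, activity factor and the two operating points are hypothetical, and the abstract does not list the actual voltage levels.

```python
def dynamic_power(voltage, frequency_hz, switched_capacitance_f=1e-9,
                  activity=0.2):
    """First-order CMOS dynamic power: P = activity * C * V**2 * f.

    The capacitance and activity defaults are placeholders, not measurements.
    """
    return activity * switched_capacitance_f * voltage ** 2 * frequency_hz


def dynamic_saving(v_high, f_high, v_low, f_low):
    """Fraction of dynamic power saved by moving from the high to the low mode."""
    return 1.0 - dynamic_power(v_low, f_low) / dynamic_power(v_high, f_high)


# Hypothetical router operating points for a high-performance and a low-power mode.
saving = dynamic_saving(v_high=1.0, f_high=2.0e9, v_low=0.8, f_low=1.0e9)
```

Because voltage enters quadratically, lowering it together with frequency during light traffic saves more than frequency scaling alone; static (leakage) power is not captured by this formula, which is why the abstract pairs voltage scaling with power gating.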

Download Paper (PDF; Only available from the DATE venue WiFi)
15:00 11.2.3 ENERGY EFFICIENT TRANSCEIVER IN WIRELESS NETWORK ON CHIP ARCHITECTURES
Speaker:
Davide Patti, University of Catania, IT
Authors:
Vincenzo Catania1, Andrea Mineo1, Salvatore Monteleone1, Maurizio Palesi2 and Davide Patti1
1University of Catania, IT; 2Kore University, IT
Abstract
The emergent wireless Network-on-Chip (WiNoC) design paradigm has been proposed as a viable solution for addressing the scalability issues affecting the on-chip communication system in future manycore architectures. Within this scenario, the buffers (both of the routers and of the radio-hubs) and the transceivers of the radio-hubs account for a significant fraction of the total communication energy budget. In this paper, we propose a novel energy management scheme that improves the energy efficiency of a WiNoC architecture by selectively disabling the power-hungry modules that are predicted to be unused during the forthcoming clock cycles.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30 IP5-9, 115 PRADA: COMBATING VOLTAGE NOISE IN THE NOC POWER SUPPLY THROUGH FLOW-CONTROL AND ROUTING ALGORITHMS
Speaker:
Prabal Basu, Utah State University, US
Authors:
Prabal Basu, Rajesh JayashankaraShridevi, Koushik Chakraborty and Sanghamitra Roy, Utah State University, US
Abstract
Network-on-Chip (NoC) has become the de-facto standard for on-chip communication in MPSoCs. The growing NoC power footprint, the increase in transistor current, and the high switching speed of the logic devices exacerbate the peak power supply noise (PSN) in the NoC power delivery network (PDN). Hence, preserving power supply integrity in the NoC PDN is critical. In this work, we propose PRADA (PSN-aware Runtime Adaptation), a collection of a novel flow-control protocol (PAF) and an adaptive routing algorithm (PAR), to mitigate PSN in NoCs. Our best scheme achieves 14% and 12% improvements in the regional peak PSN and energy efficiency, with an average of 4.6% performance overhead and marginal area and power footprints.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30 End of session
Coffee Break in Exhibition Area

11.3 Microarchitectures and Workload Allocation for Energy Efficiency

Date: Thursday 17 March 2016
Time: 14:00 - 15:30
Location / Room: Konferenz 1

Chair:
Daniele Bortolotti, Univ. of Bologna, IT

Co-Chair:
Andreas Burg, École Polytechnique Fédérale de Lausanne (EPFL), CH

The session discusses novel power modeling, workload allocation, and microarchitectural techniques for improving energy efficiency in data centers and processors.

Time Label Presentation Title
Authors
14:00 11.3.1 RESISTIVE CONFIGURABLE ASSOCIATIVE MEMORY FOR APPROXIMATE COMPUTING
Speaker:
Abbas Rahimi, University of California, Berkeley, US
Authors:
Mohsen Imani1, Abbas Rahimi2 and Tajana Rosing3
1UC San Diego, US; 2University of California, Berkeley, US; 3University of California, San Diego, US
Abstract
Modern computing machines are increasingly characterized by large-scale parallelism in hardware (such as GP-GPUs) and the advent of large-scale and innovative memory blocks. Parallelism enables expanded performance tradeoffs whereas memories enable reuse of computational work. To be effective, however, one needs to ensure energy efficiency with minimal reuse overheads. In this paper, we describe a resistive configurable associative memory (ReCAM) that enables selective approximation and asymmetric voltage overscaling to manage delivered efficiency. The ReCAM structure matches an input pattern with pre-stored ones by applying an approximate search on selected bit indices (bitline-configurable) or selected pre-stored patterns (row-configurable). To further reduce energy, we explore proper ReCAM sizing, various configurable search operations with low-overhead voltage overscaling, and different ReCAM update policies. Experimental results on AMD Southern Islands GPUs for eight applications show that bitline-configurable and row-configurable ReCAM achieve on average 43.6% and 44.5% energy savings, respectively, with an acceptable quality loss of 10%.

Download Paper (PDF; Only available from the DATE venue WiFi)
14:30 11.3.2 EXPLOITING CPU-LOAD AND DATA CORRELATIONS IN MULTI-OBJECTIVE VM PLACEMENT FOR GEO-DISTRIBUTED DATA CENTERS
Speaker:
Ali Pahlevan, École Polytechnique Fédérale de Lausanne (EPFL), CH
Authors:
Ali Pahlevan, Pablo Garcia del Valle and David Atienza, École Polytechnique Fédérale de Lausanne (EPFL), CH
Abstract
Cloud computing has been proposed as a new paradigm to deliver services over the internet. The proliferation of cloud services and users' increasing demands for computing resources have led to the appearance of geo-distributed data centers (DCs). These DCs host heterogeneous applications with changing characteristics. Examples are the CPU-load correlation, which provides significant potential for energy savings when the utilization peaks of two virtual machines (VMs) do not occur at the same time, and the amount of data exchanged between VMs, which directly impacts performance, i.e. response time. This paper presents a two-phase multi-objective VM placement, clustering and allocation algorithm, along with a dynamic migration technique, for geo-distributed DCs coupled with renewable and battery energy sources. It exploits holistic knowledge of VM characteristics, namely CPU-load and data correlations, to tackle the challenges of operational cost optimization and the energy-performance trade-off. Experimental results demonstrate that the proposed method provides up to 55% operational cost savings, 15% lower energy consumption, and 12% better performance (response time) when compared to state-of-the-art schemes.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:00 11.3.3 ENERGY EFFICIENCY IN CLOUD-BASED MAPREDUCE APPLICATIONS THROUGH BETTER PERFORMANCE ESTIMATION
Speaker:
Seyed Morteza Nabavinejad, Sharif University of Technology, IR
Authors:
Seyed Morteza Nabavinejad and Maziar Goudarzi, Sharif University of Technology, IR
Abstract
An important issue for efficient execution of MapReduce jobs on a cloud platform is selecting the best-fitting virtual machine (VM) configuration(s) among the miscellany of choices that cloud providers offer. Wise selection of VM configurations can lead to better performance, cost and energy consumption. Therefore, it is crucial to explore the available configurations and choose the best one for each given MapReduce application. Executing the given application on all the configurations for comparison is a costly, time- and energy-consuming process. An alternative is to run the application on a subset of configurations (sample configurations) and estimate its performance on other configurations based on the values obtained on those sample configurations. We show that the choice of these sample configurations strongly affects the accuracy of later estimations. Our Smart Configuration Selection (SCS) scheme chooses better representatives from among all configurations by a one-off analysis of given performance figures of the benchmarks so as to increase the accuracy of estimations of missing values, and consequently, to more accurately choose the configuration providing the highest performance. The results show that the SCS choice of sample configurations is very close to the best choice, and can reduce estimation error to 7.11% from the original 16.02% of random configuration selection. Furthermore, this more accurate performance estimation saves 24.3% energy on average.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:15 11.3.4 UNSUPERVISED POWER MODELING OF CO-ALLOCATED WORKLOADS FOR ENERGY EFFICIENCY IN DATA CENTERS
Speaker:
Juan Carlos Salinas-Hilburg, Universidad Politécnica de Madrid, ES
Authors:
Juan Carlos Salinas-Hilburg1, Marina Zapater2, José L. Risco-Martín3, Jose Manuel Moya1 and Jose L. Ayala3
1Universidad Politécnica de Madrid, ES; 2CEI Campus Moncloa, UCM-UPM, ES; 3Universidad Complutense de Madrid, ES
Abstract
Data centers are huge power consumers and their energy consumption keeps on rising despite the efforts to increase energy efficiency. A great body of research is devoted to the reduction of the computational power of these facilities, applying techniques such as power budgeting and power capping in servers. Such techniques rely on models to predict the power consumption of servers. However, estimating overall server power for arbitrary applications when running co-allocated in multithreaded servers is not a trivial task. In this paper, we use Grammatical Evolution techniques to predict the dynamic power of the CPU and memory subsystems of an enterprise server using the hardware counters of each application. On top of our dynamic power models, we use fan and temperature-dependent leakage power models to obtain the overall server power. To train and test our models we use real traces from a presently shipping enterprise server under a wide set of sequential and parallel workloads running at various frequencies. We prove that our model is able to predict the power consumption of two different tasks co-allocated in the same server, keeping error below 8W. For the first time in the literature, we develop a methodology able to combine the hardware counters of two individual applications, and estimate overall server power consumption without running the co-allocated application. Our results show a prediction error below 12W, which represents 7.3% of the overall server power, outperforming previous approaches in the state of the art.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30 IP5-10, 205 A POWER-EFFICIENT 3-D ON-CHIP INTERCONNECT FOR MULTI-CORE ACCELERATORS WITH STACKED L2 CACHE
Speaker:
Kyungsu Kang, Samsung, KR
Authors:
Kyungsu Kang1, Luca Benini2, Giovanni De Micheli3, Sangho Park1 and Jong-Bae Lee1
1Samsung, KR; 2Università di Bologna, IT; 3École Polytechnique Fédérale de Lausanne (EPFL), CH
Abstract
The use of multi-core clusters is a promising option for data-intensive embedded applications such as multimodal sensor fusion, image understanding, and mobile augmented reality. In this paper, we propose a power-efficient 3-D on-chip interconnect for multi-core clusters with stacked L2 cache memory. A new switch design makes a circuit-switched Mesh-of-Tree (MoT) interconnect reconfigurable to support power-gating of processing cores, memory blocks, and unnecessary interconnect resources (routing switch, arbitration switch, inverters placed along the on-chip wires). The proposed 3-D MoT improves the power efficiency by up to 77% in terms of energy-delay product (EDP).

Download Paper (PDF; Only available from the DATE venue WiFi)
15:31 IP5-11, 898 POWER-EFFICIENT LOAD-BALANCING ON HETEROGENEOUS COMPUTING PLATFORMS
Speaker:
Muhammad Shafique, Karlsruhe Institute of Technology (KIT), DE
Authors:
Muhammad Usman Karim Khan1, Muhammad Shafique1, Apratim Gupta2, Thomas Schumann2 and Jörg Henkel1
1Karlsruhe Institute of Technology (KIT), DE; 2University of Applied Sciences, Darmstadt, DE
Abstract
In order to address the throughput constraints of the system at minimal power consumption, the workload of computing nodes should be balanced. This requires accounting for the underlying hardware characteristics (e.g., power vs. frequency profiles) and the throughput sustainable by these nodes. This work provides a workload distribution and balancing methodology for a divisible load under a throughput constraint on heterogeneous nodes. The power efficiency of each node is considered during load distribution. For load balancing, we determine the node frequency that just fulfills the job requirements of the nodes. We functionally verify our methodology by implementing it on an FPGA-based system, with heterogeneous multi-cores and hardware accelerators, and report results for different image processing benchmarks. Compared to a state-of-the-art approach, our approach results in up to 64% performance improvement for the benchmarks evaluated in this paper.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30 End of session
Coffee Break in Exhibition Area

11.4 Automating Test Generation, Assertions and Diagnosis

Date: Thursday 17 March 2016
Time: 14:00 - 15:30
Location / Room: Konferenz 2

Chair:
Pablo Sanchez, University of Cantabria, ES

Co-Chair:
Ronny Morad, IBM, IL

The session presents methodologies for automatic test generation of memory controllers and arithmetic circuits. They are complemented with techniques that improve assertion simulation and post-silicon debugging. The interactive presentations will introduce new ideas about generating properties and tests.

Time Label Presentation Title
Authors
14:00 11.4.1 AUTOMATED TEST GENERATION FOR DEBUGGING ARITHMETIC CIRCUITS
Speaker:
Prabhat Mishra, University of Florida, US
Authors:
Farimah Farahmandi and Prabhat Mishra, University of Florida, US
Abstract
Optimized and custom arithmetic circuits are widely used in embedded systems such as multimedia applications, cryptography systems, signal processing and console games. Debugging of arithmetic circuits is a challenge due to increasing complexity coupled with non-standard implementations. Existing equivalence checking techniques produce a remainder to indicate the presence of a potential bug. However, bug localization remains a major bottleneck. Simulation-based validation using random or constrained-random tests is not effective and can be infeasible for complex arithmetic circuits. In this paper, we present an automated test generation and bug localization technique for debugging arithmetic circuits. This paper makes two important contributions. We propose an automated approach for generating directed tests by suitable assignments of input variables to make the remainder non-zero. The generated tests are guaranteed to activate the unknown bug(s). We also propose a bug detection/correction technique using scanning of the remainder's terms as well as the intersection of regions activated by the generated tests. Our experimental results demonstrate that the proposed approach can be used for automated debugging of complex arithmetic circuits.
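
The idea of deriving directed tests from a non-zero remainder can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' method: the remainder polynomial, the input names and the exhaustive search over assignments are all hypothetical, and an exhaustive search would not scale to real circuits.

```python
from itertools import product

def evaluate(poly, assignment):
    """Evaluate a polynomial given as a list of (coefficient, variables) terms."""
    total = 0
    for coeff, variables in poly:
        term = coeff
        for v in variables:
            term *= assignment[v]
        total += term
    return total

def directed_tests(poly, inputs):
    """Return every 0/1 input assignment that makes the remainder non-zero."""
    tests = []
    for bits in product((0, 1), repeat=len(inputs)):
        assignment = dict(zip(inputs, bits))
        if evaluate(poly, assignment) != 0:
            tests.append(assignment)
    return tests

# Hypothetical remainder 2*a1*b0 - 2*a0*a1*b0 over primary inputs a0, a1, b0, b1.
remainder = [(2, ("a1", "b0")), (-2, ("a0", "a1", "b0"))]
tests = directed_tests(remainder, ["a0", "a1", "b0", "b1"])
for t in tests:
    print(t)
```

For this made-up remainder, every returned assignment has a0 = 0, a1 = 1 and b0 = 1, so any such test would expose the mismatch between specification and implementation.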

Download Paper (PDF; Only available from the DATE venue WiFi)
14:30 11.4.2 MCXPLORE: AN AUTOMATED FRAMEWORK FOR VALIDATING MEMORY CONTROLLER DESIGNS
Speaker:
Mohamed Hassan, University of Waterloo, CA
Authors:
Mohamed Hassan and Hiren Patel, University of Waterloo, CA
Abstract
This work presents an automated framework for the validation of dynamic random access memory controllers (DRAM MCs) called MCXplore. In developing this framework, we construct formal models for memory requests interrelation and DRAM command interaction. The framework enables validation engineers to define their test plans precisely as temporal logic specifications. We use the NuSMV model-checker to generate counter-examples that serve as test templates; hence, MCXplore uses these test templates to generate memory tests to validate the correctness properties of the memory controller. We show the effectiveness of MCXplore by validating various state-of-the-art MC features as well as hard-to-detect timing violations that often occur. We also provide a set of predefined test plans, and regression tests that validate essential properties of modern DRAM MCs. We release MCXplore as an open-source framework to allow validation engineers and researchers to extend and use.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:00 11.4.3 EAST: EFFICIENT ASSERTION SIMULATION TECHNIQUES
Speaker:
Ansuman Banerjee, Indian Statistical Institute, IN
Authors:
Debjyoti Bhattacharjee, Soumi Chattopadhyay and Ansuman Banerjee, Indian Statistical Institute, IN
Abstract
In the context of simulation-based verification, the Assertion-based Verification (ABV) methodology has become the technology of choice, with increasing proliferation of Verification / Assertion IPs for most commonly used protocols. To support the ABV flow, current generation simulators typically create threads for the assertions and evaluate each assertion separately by converting them into finite state automata and monitoring their states during simulation. In this paper, we propose a different technique for assertion evaluation in a simulation-based verification flow. The proposed technique, EAST (Efficient Assertion Simulation Techniques), handles assertions in groups, instead of examining them in isolation, and achieves significant performance benefits. To this effect, our algorithm has a pre-processing phase (prior to simulation) which creates a shared data structure from the set of assertions using some simple rules, based on the assertion language operators. This is attached to the simulator and during simulation, at each evaluation cycle, EAST infers the decision of the assertions by a combination of lookup and substitution. We present our proposal using Linear Temporal Logic (LTL) assertions in this paper. Our prototype, EAST, achieves promising performance numbers in terms of both runtime and peak memory for both random and standard benchmark protocol designs.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:15 11.4.4 COMBINATIONAL TRACE SIGNAL SELECTION WITH IMPROVED STATE RESTORATION FOR POST-SILICON DEBUG
Speaker:
Bijan Alizadeh, University of Tehran, IR
Authors:
Siamack BeigMohammadi and Bijan Alizadeh, University of Tehran, IR
Abstract
Signal selection is the pre-silicon evaluation step which maximizes the overall reconstruction rate of state elements. Due to its high complexity, recent efforts on signal selection have focused on sequential nodes in the circuit under debug. In this paper we propose a combinational signal selection algorithm which may also select combinational nodes of the circuit to maximize state restoration capability and achieve significant improvements compared to the state-of-the-art signal selection algorithms. To compensate for the increase in problem complexity, we also propose a fast state restoration algorithm which offers significant improvement in simulation time over the state-of-the-art state restoration algorithms.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30 IP5-12, 529 TOPAZ: MINING HIGH-LEVEL SAFETY PROPERTIES FROM LOGIC SIMULATION TRACES
Speaker:
Fadi Kurdahi, University of California, Irvine, US
Authors:
Ahmed Nassar1, Fadi Kurdahi1 and Salam Zantout2
1University of California, Irvine, US; 2American University of Beirut, LB
Abstract
Formal specifications are hard to formulate and maintain for evolving complex digital hardware designs. Specification mining offers a (partially) automated route to discovering specifications from large simulation traces. In this paper, we embark on a novel and rigorous mining methodology (data preparation, mining algorithms, selection criteria, etc.) for finite-state automata checkers using an iterative and interactive mining tool, called Topaz. Topaz is evaluated using an open-source 32-bit RISC CPU design as a case study to demonstrate extraction of complex temporal properties cross-cutting through all CPU pipeline stages, guided by the CPU instruction set specification.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:31 IP5-13, 808 EXPLOITING TRANSACTION LEVEL MODELS FOR OBSERVABILITY-AWARE POST-SILICON TEST GENERATION
Speaker:
Prabhat Mishra, University of Florida, US
Authors:
Farimah Farahmandi1, Prabhat Mishra1 and Sandip Ray2
1University of Florida, US; 2Intel Corporation, US
Abstract
A critical problem in post-silicon debug is to generate efficient tests that both activate requisite coverage goals on the target hardware and produce results that are observable through a given on-chip design-for-debug architecture. Unfortunately, such tests cannot be generated directly from RTL models, both due to design complexity and due to bugs in the design itself. In this paper, we propose an approach to address this problem by exploiting transaction-level models (TLM). Our approach involves mapping test and observability requirements between TLM and RTL, enabling TLM analysis to generate post-silicon tests. We provide case studies from a number of different design classes to demonstrate the flexibility and effectiveness of the approach.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30 End of session
Coffee Break in Exhibition Area

11.5 Design of Efficient Microarchitectures

Date: Thursday 17 March 2016
Time: 14:00 - 15:30
Location / Room: Konferenz 3

Chair:
Dionisios Pnevmatikatos, Technical University of Crete, GR

Co-Chair:
Todd Austin, University of Michigan, US

The microarchitecture session presents innovative ideas for the efficient design of computing components. The first paper presents a viable prediction technique to deactivate cache ways in order to save energy without compromising performance. The second paper proposes a micro-architectural extension for approximate computing that reduces bit-error-rate while providing the power benefits of extreme voltage scaling techniques. The third paper presents a faster and accurate version of logarithmic number unit (LNU) design and implementation using a co-transformation scheme.

Time Label Presentation Title
Authors
14:00 11.5.1 PRACTICAL WAY HALTING BY SPECULATIVELY ACCESSING HALT TAGS
Speaker:
Daniel Moreau, Chalmers University of Technology, SE
Authors:
Daniel Moreau1, Alen Bardizbanyan1, Magnus Själander2, Dave Whalley3 and Per Larsson-Edefors1
1Chalmers University of Technology, SE; 2Uppsala University, SE; 3Florida State University, US
Abstract
Conventional set-associative data cache accesses waste energy since tag and data arrays of several ways are simultaneously accessed to sustain pipeline speed. Different access techniques to avoid activating all cache ways have been previously proposed in an effort to reduce energy usage. However, a problem that many of these access techniques have in common is that they need to access different cache memory portions in a sequential manner, which is difficult to support with standard synchronous SRAM memory. We propose the speculative halt-tag access (SHA) approach, which accesses low-order tag bits, i.e., the halt tag, in the address generation stage instead of the SRAM access stage to eliminate accesses to cache ways that cannot possibly contain the data. The key feature of our SHA approach is that it determines which tag and data arrays need to be accessed early enough for conventional SRAMs to be used. We evaluate the SHA approach using a 65-nm processor implementation running MiBench benchmarks and find that it on average reduces data access energy by 25.6%.
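
The filtering effect of halt tags can be sketched with a small behavioural model in Python. It is a simplification under stated assumptions: the 4-bit halt-tag width, the example tag values and the function names are hypothetical, and the model ignores pipeline timing, so it does not capture the speculative access in the address generation stage that is the paper's contribution.

```python
HALT_BITS = 4  # assumed halt-tag width (low-order tag bits)

def halt_tag(tag):
    """Low-order bits of the tag, used as a cheap early filter."""
    return tag & ((1 << HALT_BITS) - 1)

def ways_to_access(set_tags, tag):
    """Indices of ways whose halt tag matches; all other ways are halted."""
    wanted = halt_tag(tag)
    return [w for w, stored in enumerate(set_tags)
            if stored is not None and halt_tag(stored) == wanted]

def lookup(set_tags, tag):
    """Full tag comparison, restricted to the ways that were not halted."""
    for w in ways_to_access(set_tags, tag):
        if set_tags[w] == tag:
            return w
    return None

# One 4-way set with hypothetical stored tags; None marks an invalid way.
cache_set = [0x1A3, 0x2B7, 0x0C3, None]
print(ways_to_access(cache_set, 0x0C3), lookup(cache_set, 0x0C3))
print(ways_to_access(cache_set, 0x555), lookup(cache_set, 0x555))
```

In the first lookup only two of the four ways would need their tag and data arrays activated; in the second, no way matches the halt tag, so the miss is known without accessing any of them.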

Download Paper (PDF; Only available from the DATE venue WiFi)
14:30 11.5.2 LAZY PIPELINES: ENHANCING QUALITY IN APPROXIMATE COMPUTING
Speaker:
Georgios Tziantzioulis, Northwestern University, US
Authors:
Georgios Tziantzioulis1, Ali Murat Gok1, S M Faisal2, Nikos Hardavellas1, Seda Ogrenci-Memik1 and Srinivasan Parthasarathy2
1Northwestern University, US; 2The Ohio State University, US
Abstract
Approximate computing techniques based on Voltage Over-Scaling (VOS) can provide quadratic improvements in power efficiency. However, voltage scaling is limited by the inherent fault-tolerance of an application, thus preventing VOS schemes from realizing their full potential. To gain further power efficiency, a reduction of the error rate experienced at a given voltage level is required. We propose Lazy Pipelines, a micro-architectural technique that utilizes vacant cycles in a VOS functional unit to extend execution and reduce the error rate.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:00 11.5.3 HIGH-EFFICIENCY LOGARITHMIC NUMBER UNIT DESIGN BASED ON AN IMPROVED CO-TRANSFORMATION SCHEME
Speaker:
Youri Popoff, ETH Zürich, CH
Authors:
Youri Popoff, Florian Scheidegger, Michael Schaffner, Michael Gautschi, Frank K. Gürkaynak and Luca Benini, ETH Zürich, CH
Abstract
The logarithmic number system (LNS) has always been an interesting alternative to floating-point calculations since the implementation of several arithmetic operations such as divisions, exponentiations and square-roots, which are required for computationally intensive nonlinear functions, is greatly simplified in the logarithmic space. However, additions and subtractions become nonlinear operations that have to be approximated using polynomials for area-efficient realizations. A particular challenge is the accuracy within the so-called critical region, which is encountered for subtractions where the difference between the operands is close to zero. In the literature, several arithmetic co-transformations that reduce the overhead of approximating these operations have been presented. Even so, the main problem with practical LNS realizations is the area overhead when compared to standard FPUs with comparable accuracy. In this paper, we propose a highly hardware-efficient novel co-transformation concept that not only reduces the area requirements by up to 35% when compared to the state-of-the-art, but also allows the LNU to calculate single-cycle logarithms and exponentiations within the same datapath. We present comprehensive results for a complete processing system that includes the LNU and an OpenRISC-based core in 65nm and 28nm technologies. We compare this implementation with a system using a standard IEEE-compliant FPU and show that the LNS-based system can outperform its FP counterpart by up to 4.35x in speed. The final, pipelined LNU system when implemented in 65nm occupies an area of 54.3 kGE, allows 89 MFLOP per second and consumes 15.9-136.7 pJ per operation at 1.2V under typical conditions and 25°C.
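
The trade-off described in this abstract follows from standard logarithm identities, which a short Python sketch can make concrete. It is not the proposed LNU: it handles positive values only, uses exact `math.log2` where hardware would use polynomial or table approximations, and all function names are hypothetical.

```python
import math

def to_lns(x):
    """Encode a positive real number as its base-2 logarithm."""
    return math.log2(x)

def from_lns(lx):
    """Decode an LNS value back to a real number."""
    return 2.0 ** lx

def lns_mul(la, lb):
    return la + lb   # multiplication becomes addition

def lns_div(la, lb):
    return la - lb   # division becomes subtraction

def lns_sqrt(la):
    return la / 2    # square root becomes a halving (a shift in fixed point)

def lns_add(la, lb):
    """a + b: needs the nonlinear function log2(1 + 2^d) with d <= 0."""
    hi, lo = max(la, lb), min(la, lb)
    return hi + math.log2(1.0 + 2.0 ** (lo - hi))

def lns_sub(la, lb):
    """a - b for a > b: log2(1 - 2^d) diverges as d approaches 0 (critical region)."""
    d = lb - la
    return la + math.log2(1.0 - 2.0 ** d)

a, b = to_lns(8.0), to_lns(4.0)
print(from_lns(lns_mul(a, b)), from_lns(lns_add(a, b)), from_lns(lns_sub(a, b)))
```

The steep slope of log2(1 - 2^d) near d = 0 is what makes subtraction of nearly equal operands hard to approximate accurately, which is the problem co-transformations address.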

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30 IP5-14, 521 SEERAD: A HIGH SPEED YET ENERGY-EFFICIENT ROUNDING-BASED APPROXIMATE DIVIDER
Speaker:
Ali Afzali-Kusha, University of Tehran, IR
Authors:
Reza Zendegani1, Mehdi Kamal1, Arash Fayyazi1, Ali Afzali-Kusha1, Saeed Safari1 and Massoud Pedram2
1University of Tehran, IR; 2University of Southern California, US
Abstract
In this paper, a high speed yet energy-efficient approximate divider for error resilient applications is proposed. For the division operation, the divisor is rounded to a value with a specific form, which transforms the division into a multiplication. The proposed approximate divider enjoys the flexibility of increasing the accuracy at the price of higher delay and hardware usage. The efficacy of the proposed approximate divider is evaluated in comparison to three different implementations of the SRT divider. The results show that the delay and energy consumption of the proposed approximate divider are, on average, 14 and 300 times smaller than those of the Radix-2 SRT with the carry-save remainder computation. Additionally, the effectiveness of the proposed approximate divider is studied in an image division operation performed in image processing applications. The results suggest the appropriateness of the proposed approximate divider for digital signal processing applications.
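
The general principle of rounding the divisor so that division collapses into a cheaper operation can be sketched in Python. The abstract does not state the rounding form used by SEERAD, so this sketch substitutes the simplest possible choice, rounding to the nearest power of two, which turns the division into a right shift (a multiplication by 2^-k). The function names are hypothetical and the accuracy is far coarser than what the paper targets.

```python
def round_to_power_of_two(b):
    """Exponent k such that 2**k is the power of two nearest to b (b >= 1)."""
    k = b.bit_length() - 1  # floor(log2(b))
    # Round up when b is at or above the midpoint between 2**k and 2**(k+1).
    if b - (1 << k) >= (1 << (k + 1)) - b:
        k += 1
    return k

def approx_div(a, b):
    """Approximate a // b by dividing by the rounded divisor, i.e. a shift."""
    return a >> round_to_power_of_two(b)

for a, b in [(100, 8), (100, 7), (100, 5)]:
    print(a, b, approx_div(a, b), a // b)
```

The result is exact when the divisor is already a power of two and drifts as the divisor moves away from one, which illustrates the accuracy versus cost trade-off the abstract mentions.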

Download Paper (PDF; Only available from the DATE venue WiFi)
15:31 IP5-15, 140 IMPROVING PERFORMANCE GUARANTEES IN WORMHOLE MESH NOC DESIGNS
Speaker:
Milos Panic, Barcelona Supercomputing Center and Universitat Politècnica de Catalunya, ES
Authors:
Milos Panic1, Carles Hernandez2, Jaume Abella2, Antoni Roca Perez3, Eduardo Quinones2 and Francisco Cazorla4
1Barcelona Supercomputing Center and Universitat Politècnica de Catalunya, ES; 2Barcelona Supercomputing Center, ES; 3Universitat Politècnica de Catalunya, ES; 4Barcelona Supercomputing Center and IIIA-CSIC, ES
Abstract
Wormhole-based mesh Networks-on-Chip (wNoC) are deployed in high-performance many-core processors due to their physical scalability and low cost. Delivering tight and time-composable Worst-Case Execution Time (WCET) estimates for applications, as needed in safety-critical real-time embedded systems, is challenged by wNoCs due to their distributed nature. We propose a bandwidth control mechanism for wNoCs that enables the computation of tight time-composable WCET estimates with low average performance degradation and high scalability. Our evaluation with the EEMBC automotive suite and an industrial real-time parallel avionics application confirms this.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:32 IP5-16, 906 A DATA LAYOUT TRANSFORMATION (DLT) ACCELERATOR: ARCHITECTURAL SUPPORT FOR DATA MOVEMENT OPTIMIZATION IN ACCELERATED-CENTRIC HETEROGENEOUS SYSTEMS
Speaker:
Tung Hoang, University of Chicago, US
Authors:
Tung Hoang, Amirali Shambayati and Andrew A. Chien, University of Chicago, US
Abstract
Technology scaling and growing use of accelerators make optimization of data movement of increasing importance in all computing systems. Further, growing diversity in memory structures makes embedding such optimization in software non-portable. We propose a novel architectural solution called Data Layout Transformation (DLT), associated with a simple set of instructions that enable software to describe the required data movement compactly and free the implementation to optimize the movement based on knowledge of the memory hierarchy and system structure. The DLT architecture ideas are applicable to both general-purpose and accelerator-based heterogeneous systems. Experimental results show that the proposed DLT architecture can make use of the full bandwidth (>97%) of a wide range of memory systems (DDR3 and HMC) while its implementation cost, in 32nm, is low (only 0.246 mm² and 75mW at 1GHz). Our evaluation of the DLT accelerator in an accelerator-based heterogeneous system across DDR3 and HMC memory shows that the DLT can enhance system performance by 4.6x-99x (DDR3) and 4.4x-115x (HMC), which translates to a 2.8x-48x (DDR3) and 1.4x-39x (HMC) improvement in energy efficiency.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:33 IP5-17, 203 OUESSANT: FLEXIBLE INTEGRATION OF DEDICATED COPROCESSORS IN SYSTEMS ON CHIP
Speaker:
Pierre-Henri Horrein, Lab-STICC/Télécom Bretagne, FR
Authors:
Pierre-Henri Horrein, Philip-Dylan Gleonec, Erwan Libessart, André Lalevée and Matthieu Arzel, Lab-STICC/Télécom Bretagne, FR
Abstract
Integration of hardware accelerators in Systems on Chip is often complex. When dealing with reconfigurable hardware, this greatly limits the attainable flexibility. In this paper, we propose an alternative approach to the Molen paradigm [1]. This approach, named Ouessant, is based on a very simple general purpose instruction set designed for close interaction with dedicated hardware accelerators. This instruction set is used to program a dedicated controller, which commands the accelerator's execution and data transfer with minimal CPU intervention. The resulting architecture is flexible, extensible, and can be easily integrated in Systems on Chip. Adding new accelerators is also made easier. Implementation of the architecture on different FPGA resources shows a very low footprint and a very small impact on attainable performance. Ouessant is freely available under an open-source license.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30 End of session
Coffee Break in Exhibition Area

11.6 Applications of Reconfigurable Computing

Date: Thursday 17 March 2016
Time: 14:00 - 15:30
Location / Room: Konferenz 4

Chair:
Alessandro Cilardo, University of Naples Federico II, IT

Co-Chair:
Koen Bertels, Delft University of Technology, NL

FPGAs and other reconfigurable architectures are becoming prolific as a platform for implementing a broad domain of applications. In this session, we have three papers and an interactive presentation focused on the design of computer vision, machine learning, and video processing on reconfigurable architectures.

Time Label Presentation Title
Authors
14:00 11.6.1 EFFICIENT FPGA ACCELERATION OF CONVOLUTIONAL NEURAL NETWORKS USING LOGICAL-3D COMPUTE ARRAY
Speaker:
Atul Rahman, UNIST, KR
Authors:
Atul Rahman1, Jongeun Lee1 and Kiyoung Choi2
1UNIST, KR; 2Seoul National University, KR
Abstract
Convolutional Deep Neural Networks (DNNs) are reported to show outstanding recognition performance in many image-related machine learning tasks. DNNs have a very high computational requirement, making accelerators a very attractive option. These DNNs have many convolutional layers with different parameters in terms of input/output/kernel sizes as well as input stride. Design constraints usually require a single design for all layers of a given DNN. Thus a key challenge is how to design a common architecture that can perform well for all convolutional layers of a DNN, which can be quite diverse and complex. In this paper we present a flexible yet highly efficient 3D neuron array architecture that is a natural fit for convolutional layers. We also present our technique to optimize its parameters, including on-chip buffer sizes, for a given set of resource constraints for modern FPGAs. Our experimental results targeting a Virtex-7 FPGA demonstrate that our proposed technique can generate DNN accelerators that can outperform the state-of-the-art solutions, by 22% for 32-bit floating-point MAC implementations, and are far more scalable in terms of compute resources and DNN size.

Download Paper (PDF; Only available from the DATE venue WiFi)
14:30 11.6.2 ENERGY EFFICIENT VIDEO FUSION WITH HETEROGENEOUS CPU-FPGA DEVICES
Speaker:
Peng Sun, University of Bristol, GB
Authors:
Peng Sun1, Alin Achim1, Ian Hasler2, Paul Hill1 and Jose Nunez-Yanez1
1University of Bristol, GB; 2Qioptiq LTD, GB
Abstract
This paper presents a complete video fusion system with hardware acceleration and investigates the energy trade-offs between computing in the CPU or the FPGA device. The video fusion application is based on the Dual-Tree Complex Wavelet Transforms (DT-CWT). Video fusion combines information from different spectral bands into a single representation, and advanced algorithms based on wavelet transforms are compute- and energy-intensive. In this work the transforms are mapped to a hardware accelerator using high-level synthesis tools for the FPGA and also to vectorized code for the single instruction multiple data (SIMD) engine available in the CPU. The accelerated system reduces computation time and energy by a factor of 2. Moreover, the results show a key finding that the FPGA is not always the best choice for acceleration, and the SIMD engine should be selected when the wavelet decomposition reduces the frame size below a certain threshold. This dependency on workload size means that an adaptive system that intelligently selects between the SIMD engine and the FPGA achieves the best energy and performance efficiency.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:00 11.6.3 HIGHLY EFFICIENT RECONFIGURABLE PARALLEL GRAPH CUTS FOR EMBEDDED VISION
Speaker:
Antonis Nikitakis, Technical University of Crete, GR
Authors:
Antonis Nikitakis1 and Ioannis Papaefstathiou2
1Technical University of Crete, GR; 2Synelixis Solutions Ltd, GR
Abstract
Graph cuts are very popular combinatorial optimization methods, mainly utilized in vision schemes such as image segmentation and stereo correspondence, where they are also the most computationally intensive part; their advantage is that they are very efficient as they provide guarantees about the optimality of the reported solution. Moreover, when those vision schemes are executed on mobile devices there is a strong need, not only for real-time processing, but also for low power/energy consumption. In this paper, we present a novel architecture for the implementation, in reconfigurable hardware, of one of the most widely used graph cuts algorithms, which is also the fastest sequential one, called BK. Our novelty comes from the fact that we use a 2-level hierarchical decomposition method to parallelize it in a very modular way, allowing it to be efficiently implemented in FPGAs with different numbers of logic cells and/or memory resources. We fast-prototyped the architecture, using a high-level synthesis workflow, in a state-of-the-art FPGA device; our implementation outperforms an optimized reference software solution by more than 6x, while consuming 35 times less energy. To the best of our knowledge this is the first parallel implementation of this very widely used algorithm in reconfigurable hardware.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30 IP5-18, 92 A NOVEL BACKGROUND SUBTRACTION SCHEME FOR IN-CAMERA ACCELERATION IN THERMAL IMAGERY
Speaker:
Konstantinos Makantasis, Institute of Communication and Computer Systems, GR
Authors:
Antonis Nikitakis1, Ioannis Papaefstathiou2, Konstantinos Makantasis3 and Anastasios Doulamis4
1Technical University of Crete, GR; 2Synelixis Solutions Ltd, GR; 3Institute of Communication and Computer Systems, GR; 4National Technical University of Athens, GR
Abstract
Real-time segmentation of moving regions in image sequences is a very important task in numerous surveillance and monitoring applications. A common approach for such tasks is the "background subtraction" which tries to extract regions of interest from the image background for further processing or action; as a result its accuracy as well as its real-time performance is of great significance. In this work we utilize a novel scheme, designed and optimized for FPGA-based implementations, which models the intensities of each pixel as a mixture of Gaussian components; following a Bayesian approach, our method automatically estimates the number of Gaussian components as well as their parameters. Our novel system is based on an efficient and highly accurate on-line updating mechanism, which permits our system to be automatically adapted to dynamically changing operation conditions, while it avoids over/under fitting. We also present two reference implementations of our Background Subtraction Parallel System (BSPS) in Reconfigurable Hardware achieving both high performance as well as low power consumption; the presented FPGA-based systems significantly outperform a multi-core ARM and two multi-core low power Intel CPUs in terms of energy consumed per processed pixel as well as frames per second. Moreover, our low-cost, low-power devices allow for the implementation, for the first time, of a highly distributed surveillance system which will alleviate the main problems of the existing centralized approaches.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:31 IP5-19, 213 RADIATION-HARDENED DSP CONFIGURATIONS FOR IMPLEMENTING ARITHMETIC FUNCTIONS ON FPGA
Speaker:
Felipe Serrano, Universidad Complutense de Madrid, ES
Authors:
Marcos Sanchez-Elez, Inmaculada Pardines, Felipe Serrano and Hortensia Mecha, Universidad Complutense de Madrid, ES
Abstract
This paper presents a study of different implementations of arithmetic operations on FPGAs. Radiation vulnerability has been analyzed for each implementation using the fault injection platform NESSY. Results in terms of area, delay and reliability are presented. Taking into account the performed tests we propose to build a library of HDL templates. This library is used during the design process with a synthesis tool that implements digital circuits as reliable as possible. Experimental results show that those implementations using DSP slices are the ones which achieve better results.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:32 IP5-20, 486 CONFIGURATION PREFETCHING AND REUSE FOR PREEMPTIVE HARDWARE MULTITASKING ON PARTIALLY RECONFIGURABLE FPGAS
Speaker:
Ann Gordon-Ross, University of Florida, US
Authors:
Aurelio Morales-Villanueva, Rohit Kumar and Ann Gordon-Ross, University of Florida, US
Abstract
Partially reconfigurable (PR) FPGAs enable preemptive hardware (HW) multitasking using PR regions (PRRs). To enable this multitasking, the HW task's partial bitstream is downloaded to only the task's PRR, and only that PRR is reconfigured. Since only a small portion of the FPGA fabric is reconfigured, reconfiguration time is significantly reduced as compared to reconfiguring the entire fabric; however, this time is not negligible. Reconfiguration time can be reduced/hidden using two techniques: configuration prefetching and configuration reuse. Even though these techniques can effectively reduce/hide reconfiguration overhead, prior works in preemptive HW multitasking did not use these techniques. To the best of our knowledge, no prior work evaluated physical implementations of these techniques on PR FPGAs, which precludes consideration of physical-implementation-specific details, such as delays in accessing bitstreams, speed limitations during reconfiguration, etc. In this work, we present a novel implementation of configuration prefetching and reuse for preemptive HW multitasking on a Virtex-5 FPGA; however, our established fundamentals are device-family independent.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30End of session
Coffee Break in Exhibition Area

11.7 Naked Analog Synthesis

Date: Thursday 17 March 2016
Time: 14:00 - 15:30
Location / Room: Konferenz 5

Chair:
Árpád Bürmen, University of Ljubljana, SI

Co-Chair:
Francisco Fernandez, IMSE-CNM, ES

The first paper introduces exciting Boolean methods into analog layout design. The second paper shows how to boost circuit optimization by data mining. The third paper presents a cocktail of model-based and simulation-based optimization. Two IPs complete the session with parallelization and learning in synthesis.

TimeLabelPresentation Title
Authors
14:0011.7.1(Best Paper Award Candidate)
PARETO FRONT ANALOG LAYOUT PLACEMENT USING SATISFIABILITY MODULO THEORIES
Speaker:
Sherif Saif, Electronics Research Institute, EG
Authors:
Sherif Saif1, Mohamed Dessouky2, M. Watheq El-Kharashi3, Hazem Abbas4 and Salwa Nassar1
1Electronics Research Institute, EG; 2Mentor Graphics Corporation, EG; 3Faculty of Engineering, Ain Shams University, EG; 4Faculty of Media Engineering & Technology, GUC, EG
Abstract
This paper presents an analog layout placement tool with emphasis on Pareto front generation. In order to handle the exploding number of analog physical constraints, a new approach based on the use of a Satisfiability Modulo Theories (SMT) solver is suggested. SMT is an area concerned with checking the satisfiability of logical formulas over one or more theories. SMT is usually well-tuned to solve specific problems. To our knowledge, this is the first effort to use SMT to tackle analog placement. The proposed tool implicitly generates multiple layouts that fulfill the given constraints. Therefore, it gives the user the option to choose from the feasible solutions through specifying an aspect ratio or by selecting the optimum solution from the Pareto front of the generated shape function. In contrast to most of the existing techniques, as the number of physical constraints increases the SMT solver processing time decreases. The proposed system yielded layouts with a competitive area and run time compared to other techniques.

Download Paper (PDF; Only available from the DATE venue WiFi)
14:3011.7.2EFFICIENT MULTIPLE STARTING POINT OPTIMIZATION FOR AUTOMATED ANALOG CIRCUIT OPTIMIZATION VIA RECYCLING SIMULATION DATA
Speaker:
Bo Peng, Fudan University, CN
Authors:
Bo Peng, Fan Yang, Changhao Yan, Xuan Zeng and Dian Zhou, Fudan University, CN
Abstract
Multiple starting point optimization is an efficient approach for automated analog circuit optimization. Starting from a set of starting points, the corresponding local optima are reached by the local optimization method Sequential Quadratic Programming (SQP). The global optimum is then selected from these local optima. If one starting point is located in a valley, it converges rapidly to the local optimum by the local search. Such a region-hit property makes the multiple starting point optimization approach more likely to reach the global optimum. However, the SQP method needs the gradients to drive the optimization. In the traditional method, the gradients are approximated by finite differences. A large number of simulations are needed to obtain the gradients, which becomes the bottleneck of the circuit optimization. We find that a new point is usually surrounded by several neighboring points which have been evaluated in the previous SQP steps. In this paper, we propose an efficient method to calculate the gradient by recycling the previously evaluated points. It is based on the relationship between gradients and the directional derivatives along the directions of the neighboring points. If the neighboring points are not enough for gradient calculation, we sample additional neighboring points. Furthermore, since the performances of the circuits are not sensitive to some design parameters, the gradients are usually sparse. We can thus further employ the idea of sparse recovery to recover the sparse gradients with fewer simulations. Our experimental results demonstrate that with these strategies, the number of simulations can be reduced by up to 63% without significantly sacrificing the accuracy of the optimization results.
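The gradient-recycling idea in this abstract can be illustrated with a minimal sketch. It assumes a first-order model, f(x_i) - f(x) ≈ g · (x_i - x), for each already-simulated neighbor x_i, and solves the stacked system for g in the least-squares sense. The function name `estimate_gradient` and the plain normal-equation solver are illustrative assumptions, not the authors' implementation, and the sparse-recovery step mentioned in the abstract is not covered.

```python
def estimate_gradient(x, fx, neighbors):
    """Estimate the gradient at x from previously evaluated points.

    neighbors is a list of (x_i, f(x_i)) pairs. Hypothetical helper: it fits
    f(x_i) - f(x) ~= g . (x_i - x) in the least-squares sense.
    """
    n = len(x)
    # Each row is a direction d_i = x_i - x; rhs holds the matching change in f.
    rows = [[xi[k] - x[k] for k in range(n)] for xi, _ in neighbors]
    rhs = [fi - fx for _, fi in neighbors]

    # Normal equations A g = b with A = D^T D and b = D^T df.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * d for r, d in zip(rows, rhs)) for i in range(n)]

    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            # Neighbors do not span every direction: more samples are needed.
            raise ValueError("not enough independent neighboring points")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]

    # Back substitution.
    g = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * g[j] for j in range(i + 1, n))
        g[i] = (b[i] - tail) / a[i][i]
    return g
```

When the function raises `ValueError`, the caller would fall back to simulating extra points around x, which corresponds to the "sample additional neighboring points" step described in the abstract.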

Download Paper (PDF; Only available from the DATE venue WiFi)
15:0011.7.3POLYGP: IMPROVING GP-BASED ANALOG OPTIMIZATION THROUGH ACCURATE HIGH-ORDER MONOMIALS AND SEMIDEFINITE RELAXATION
Speaker:
Ye Wang, The University of Texas at Austin, US
Authors:
Ye Wang, Michael Orshansky and Constantine Caramanis, The University of Texas at Austin, US
Abstract
Geometric programming (GP) is popular for use in equation-based optimization of analog circuits thanks to GP-compatible analog performance functions, and its convexity, hence computational tractability. The main challenge in using GP, and thus a roadblock to wider use and adoption, is the mismatch between what GP can accurately fit, and the behavior of many common device/circuit functions. In this paper, we leverage recent tools from sums-of-squares, moment optimization, and semidefinite optimization (SDP), in order to present a novel and powerful extension to address the monomial inaccuracy: fitting device models as higher-order monomials, defined as the exponential functions of polynomials in the logarithmic variables. By the introduction of high-order monomials, the original GP problems become polynomial geometric programming (PolyGP) problems with non-linear and non-convex objective and constraints. Our PolyGP framework allows significant improvements in model accuracy when symbolic performance functions in terms of device models are present. Via SDP-relaxations inspired by polynomial optimization (POP), we can obtain efficient near-optimal global solutions to the resulting PolyGP. Experimental results on established circuits show that compared to GP, we are able to reduce the fitting error of device models to 3.5% from 10.5% on average. Hence, the fitting error of performance functions decreases from 12% for GP and 9% for POP to 3%. This translates to the ability to identify superior solution points and a dramatic decrease in constraint violation in contrast to both GP and POP.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30IP5-21, 923ANALOG CIRCUIT TOPOLOGICAL FEATURE EXTRACTION WITH UNSUPERVISED LEARNING OF NEW SUB-STRUCTURES
Speaker:
Alex Doboli, Stony Brook University, US
Authors:
Hao Li, Fanshu Jiao and Alex Doboli, Stony Brook University, US
Abstract
This paper presents novel techniques to automatically extract the topological (structural) features in analog circuits. The extracted features include basic building blocks, structural templates and hierarchical structures. Finding structural features is important for tasks like circuit synthesis and sizing, design verification, design reuse, and design knowledge description, summarization and management. The paper presents algorithms for supervised feature extraction and unsupervised learning of new block connections. Experiments discuss feature extraction for a set of 34 state-of-the-art analog circuits.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:31IP5-22, 860DESIGN AUTOMATION TASKS SCHEDULING FOR ENHANCED PARALLEL EXECUTION OF A STATE-OF-THE-ART LAYOUT-AWARE SIZING APPROACH
Speaker:
Nuno Horta, Instituto de Telecomunicações/Instituto Superior Técnico, PT
Authors:
David Neves, Ricardo Martins, Nuno Lourenço and Nuno Horta, Instituto de Telecomunicações/Instituto Superior Técnico, PT
Abstract
This paper presents an innovative methodology to efficiently schedule design automation tasks during the execution of an analog IC layout-aware sizing process. The referred synthesis process includes several sub-tasks such as DC simulation, floorplanning, placement, global routing, parasitic extraction, and circuit simulations in multiple worst case corners. The schedule of the design tasks is here optimized taking into account standard multi-core architectures, tasks dependencies, accurate time estimations for each task and a limited number of licenses for using commercial tools, e.g., number of simulator licenses. The proposed methodology, first, considers a directed acyclic graph for representing the design flow and task dependencies, then, an evolutionary kernel is used to implement a single-objective multi-constraint optimization. The efficiency and impact of the proposed approach is validated by using a state-of-the-art Analog IC design automation environment.
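The scheduling problem described in this abstract (a task graph, a fixed number of cores, and a limited pool of tool licenses) can be sketched with a simple greedy list scheduler. This is a swapped-in baseline for illustration only: the paper uses an evolutionary kernel, which is not reproduced here. The function name `list_schedule`, the task names and all durations are hypothetical.

```python
import heapq


def list_schedule(durations, deps, licensed, cores, licenses):
    """Greedy list scheduling of a task DAG (illustrative baseline only).

    durations: task -> run time; deps: task -> iterable of prerequisite tasks;
    licensed: set of tasks that hold one tool license while they run.
    Returns task -> (start, end).
    """
    done, schedule, running = set(), {}, []  # running is a heap of (end, task)
    now = 0
    while len(done) < len(durations):
        # Tasks whose prerequisites have all finished and that have not started.
        ready = sorted(t for t in durations
                       if t not in schedule and set(deps.get(t, ())) <= done)
        for t in ready:
            busy_licenses = sum(1 for _, r in running if r in licensed)
            needs = 1 if t in licensed else 0
            if len(running) < cores and busy_licenses + needs <= licenses:
                schedule[t] = (now, now + durations[t])
                heapq.heappush(running, (now + durations[t], t))
        if not running:
            raise ValueError("cyclic dependencies or insufficient resources")
        # Advance time to the next task completion.
        now, finished = heapq.heappop(running)
        done.add(finished)
    return schedule


# Hypothetical layout-aware sizing flow with made-up durations.
durations = {"dc_sim": 2, "floorplan": 1, "placement": 2, "routing": 1,
             "extraction": 1, "corner_sim_1": 3, "corner_sim_2": 3}
deps = {"placement": ["floorplan"], "routing": ["placement"],
        "extraction": ["routing"], "corner_sim_1": ["extraction"],
        "corner_sim_2": ["extraction"]}
licensed = {"dc_sim", "corner_sim_1", "corner_sim_2"}

one_license = list_schedule(durations, deps, licensed, cores=2, licenses=1)
two_licenses = list_schedule(durations, deps, licensed, cores=2, licenses=2)
```

In this toy flow the two corner simulations run back to back when only one simulator license is available and in parallel when there are two, which is the kind of license-induced serialization the abstract's optimization has to account for.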

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30End of session
Coffee Break in Exhibition Area

11.8 Launch of the Worldwide MEMS Design Contest

Date: Thursday 17 March 2016
Time: 14:00 - 17:30
Location / Room: Exhibition Theatre

Organiser:
Anton Klotz, Cadence Design Systems, DE

The presentations at DATE Exhibition Theatre sessions 11.8 and 12.8 on 17th March from 14:00-17:30 are dedicated to the worldwide MEMS Design Contest, which is organized by Cadence Design Systems, Coventor, X-FAB and Reutlingen University. The aim of the contest is to motivate design teams to start designing chips with MEMS and mixed-signal blocks using the X-FAB PDK and tools from Coventor and Cadence, to spread knowledge about the co-design of MEMS and mixed-signal blocks, and to gather new ideas on how MEMS can be used and what kind of MEMS can be designed using the X-FAB PDK. The winning team gets the opportunity to manufacture a demonstrator in silicon in order to prove the functionality of the flow.

The session will start with a motivational talk about MEMS research in academia provided by Prof. Ibrahim Elfadel from MASDAR Institute. Then Anton Klotz, University Program Manager EMEA from Cadence Design Systems, will explain the rules of the contest. Jörg Doblaski, Director Design Support from X-FAB, will present the mixed-signal and MEMS PDKs, which will be used in the contest. After that, Christopher Welham, Director Applications Engineering from Coventor, will explain front-end modeling of MEMS using Coventor tools. Finally, Ahmed Osman, Principal Application Engineer from Cadence Design Systems, will explain the design flow, which was developed based on the research work done in the BMBF-funded MEMS2015 project and will be used for the creation of mixed-signal logic and MEMS designs in the contest.

TimeLabelPresentation Title
Authors
14:0011.8.1ACADEMIC MEMS GOES FABLESS: THE MASDAR INSTITUTE PERSPECTIVE
Speaker:
Ibrahim Elfadel, MASDAR Institute of Science and Technology, AE
14:2011.8.2MEMS DESIGN CONTEST RULES
Speaker:
Anton Klotz, Cadence Design Systems, DE
15:0011.8.3PDK-BASED DESIGN AUTOMATION ENABLEMENT FOR MEMS- AND CMOS PROCESSES
Speaker:
Joerg Doblaski, X-FAB, DE
16:0011.8.4COVENTOR TOOLS FOR MODELLING AND SIMULATION OF MEMS
Speaker:
Christopher Welham, Coventor, FR
16:4511.8.5A MEMS-ASIC CO-DESIGN FLOW: AN EDA PERSPECTIVE
Speaker:
Ahmed Hussein Osman, Cadence Design Systems, DE
17:30End of session

UB11 Session 11

Date: Thursday 17 March 2016
Time: 14:30 - 16:30
Location / Room: Booth 15, Exhibition Area

LabelPresentation Title
Authors
UB11.1MICROTESK ARMV8 EDITION: SPECIFICATION-BASED TEST PROGRAM GENERATOR
Presenter:
Andrei Tatarnikov, Russian Academy of Sciences (RAS), RU
Authors:
Andrei Tatarnikov, Alexander Kamkin and Artem Kotsynyak, Russian Academy of Sciences (RAS), RU
Abstract
This work presents a test program generation tool for ARMv8 microprocessors. The tool consists of two parts: an architecture-independent test program generation core and ARMv8 specifications. The specifications provide information on the instruction set architecture and the memory management unit of an ARMv8 microprocessor. Test programs are generated on the basis of test templates provided by users and testing knowledge extracted from the specifications. Test templates describe scenarios to be covered in terms of test situations, while testing knowledge specifies constraints that should be satisfied in order for these situations to occur. The architecture-independent test program generation core implements a wide range of test generation techniques including random generation, combinatorial generation, constraint solving and symbolic execution. The flexible architecture of the tool allows integrating different generation methods and extending the test generation core with new engines.

Download Paper (PDF)
UB11.2AIPHS: ADAPTIVE PROFILING HARDWARE SUB-SYSTEM
Presenter:
Luigi Pomante, Università degli Studi dell'Aquila, IT
Authors:
Luigi Pomante1, Giacomo Valente2 and Vittoriano Muttillo2
1Università degli Studi dell'Aquila, IT; 2Università Degli Studi Dell'Aquila, IT
Abstract
Run-time monitoring systems on reconfigurable logic have the advantage that they can be customized with respect to specific applications: in the context of automated testing, this can lead to powerful scenarios. This demo presents a smart monitoring system by showing both a customization for stall identification in a message-passing scenario (based on four MicroBlaze processors that execute a bare-metal FFT application) and a customization for bus utilization monitoring in a symmetric multi-processing scenario (based on four Leon3 processors running a custom Linux kernel). The whole development flow (and the related prototype EDA tools) will be shown: it starts from a library of elements used to compose the desired hardware profiler, continues with the introduction of the profiler into the target architecture, and ends with profiling data collection and analysis. Moreover, a comparison among different functionalities will be illustrated. Both systems will be demonstrated on a Zynq-7000 SoC.

Download Paper (PDF)
UB11.3COMPSOC: VIRTUALISING CONTROL APPLICATIONS ON A DISTRIBUTED COMPSOC PLATFORM
Presenter:
Kees Goossens, Eindhoven University of Technology, NL
Author:
Kees Goossens, Eindhoven University of Technology, NL
Abstract
In our University Booth we will demonstrate that multiple real-time control applications can be developed independently even though they share platform resources. We show that they can run together with other applications on a wireless network of multiple CompSOC platforms, where each platform has multiple processors, NOC, and a complete microkernel, streaming software, and resource management stack. We will also show that (control) applications can be quickly and safely loaded and started without interference to other (real-time control) applications, thus implementing a network of MPSOCs for distributed mixed time-criticality applications.

Download Paper (PDF)
UB11.4CLASH: DIGITAL CIRCUITS IN CλASH
Presenter:
Christiaan Baaij, University of Twente, NL
Authors:
Christiaan Baaij and Jan Kuper, University of Twente, NL
Abstract
CλaSH is a novel compiler system for generating digital circuits as described by a mathematical/functional specification of the architecture. We will demonstrate several applications written in CλaSH:
* Tunneling ball device: With a minimal amount of acceleration, a fast spinning metal disk is either sped up or slowed down so that a falling ball can fall through one of the metal disk's two holes.
* Music synthesizer and spectrum analyser: An audio CODEC samples music being played from either an MP3 player or a computer. We can apply several digital filters which affect the music. The effects of these filters can be both seen on a monitor and heard through speakers connected to the FPGA board.
* Multi-processor system: The system is used in a compiler construction course, where the compiler is written in Haskell. Because CλaSH is a proper subset of Haskell, students can build and experiment with the compiler and the multi-processor system in the same environment.

Download Paper (PDF)
UB11.5CONTREP: A SINGLE-SOURCE FRAMEWORK FOR UML-BASED MODELLING AND DESIGN OF MIXED-CRITICALITY SYSTEMS
Presenter:
Fernando Herrera, University of Cantabria, ES
Authors:
Fernando Herrera and Eugenio Villar, University of Cantabria, ES
Abstract
Mixed-criticality systems integrate applications, platform resources and requirements with different criticality. A criticality reflects the impact of either a failure of a component or a violation of a requirement, which can range from irrelevant to catastrophic effects. This booth presents the CONTREP framework, which supports UML/MARTE-based modeling, analysis and design of mixed-criticality embedded systems. The booth shows a model of a quadcopter control system which integrates safety-critical (e.g., flight control), mission-critical (e.g., a video processing payload), and non-critical (e.g., monitoring) functions. The booth shows how mixed-criticality is captured, together with the description of the functional architecture and of the multi-core embedded platform where the system is implemented, and how CONTREP automates different design activities, i.e., model validation, performance assessment and design space exploration, exploiting mixed-criticality information in every case.

Download Paper (PDF)
UB11.6RETRASCOPE: TOOLKIT FOR ANALYSIS AND VERIFICATION OF HDL DESIGNS
Presenter:
Sergey Smolov, Russian Academy of Sciences (RAS), RU
Authors:
Sergey Smolov, Alexander Kamkin and Mikhail Lebedev, Russian Academy of Sciences (RAS), RU
Abstract
Retrascope is an open-source toolkit for Reverse Engineering and TRAnsformation of digital hardware designs described in such hardware description languages as Verilog and VHDL. The toolkit allows analyzing HDL descriptions, reconstructing the underlying models (guarded actions, extended finite state machines, high-level decision diagrams etc.) and using the derived models for test generation, property checking and other tasks. Retrascope is organized as an extendible framework with the ability to add new types of models as well as tools for their analysis and transformation. The primary application domain of the toolkit is functional verification of hardware at the unit level.

Download Paper (PDF)
UB11.7ELECTRO-, STRESS- AND THERMOMIGRATION: THREE FORCES, ONE PROBLEM
Presenter:
Steve Bigalke, Technische Universität Dresden, DE
Authors:
Steve Bigalke and Jens Lienig, Technische Universität Dresden, DE
Abstract
It is well known that the downscaling of microelectronic structures ("Moore's Law") reduces reliability due to an increase in potential material migration. Electro-, stress- and thermomigration have been identified as the main causes of material dislocation in integrated circuits (ICs). They are driven by current densities, stress and temperature gradients, respectively, but they also depend on common parameters like material constants. While each of these three driving forces causes migration, they can compensate or amplify each other, resulting in various overall material dislocations. These interactions are poorly understood, which complicates the prevention of migration processes in ICs. Our software demonstrator presents a basic approach to identify the predominant migration within various circuit conditions including the interaction of all three forces. Our approach can also be adjusted to three-dimensional circuits (3D ICs) and alternating conditions.

Download Paper (PDF)
UB11.8CHIMPANC: CHANGE MANAGEMENT USING CHIMPANC
Presenter:
Jannis Stoppe, DFKI and University of Bremen, DE
Authors:
Jannis Stoppe, Martin Ring and Rolf Drechsler, DFKI and University of Bremen, DE
Abstract
One approach to remedy the issue of increasing complexity in the hardware design process is to provide designers with more abstract languages that allow systems to be designed top-down, starting with an abstract model of the system and its requirements. Several of these languages such as SysML and SystemC are being used today. We propose the Change Impact Analysis and Control Tool (ChImpAnC) to handle these challenges. ChImpAnC extracts the relevant information from the models on the different levels and constructs mappings between them, which makes it possible to check consistency and refinements and to calculate the impact of changes. Thus, ChImpAnC ensures that, e.g., a written specification or documentation is not made obsolete by changes in the implementation without the designer being warned about it.

Download Paper (PDF)
UB11.9IDDD: AN INTERACTIVE DEPENDABILITY DRIVEN DESIGN SPACE EXPLORATION
Presenter:
Stefan Scharoba, Brandenburg University of Technology Cottbus-Senftenberg, DE
Authors:
Stefan Scharoba, Jacob Lorenz and Heinrich T. Vierhaus, Brandenburg University of Technology Cottbus-Senftenberg, DE
Abstract
Due to the downscaling of transistor feature sizes, today's integrated circuits are much more likely to be affected by transient or permanent faults. In order to still meet certain dependability requirements, many different fault tolerance techniques have been developed, which can handle these faults in the field. Each of these techniques is associated with distinct costs and benefits. As a consequence, finding the fault tolerant implementation of the system that meets the actual requirements best represents a challenging task. We propose a tool that supports this process. It offers a set of hardware based fault tolerance techniques that can be applied to a given VHDL model. Afterwards, costs and benefits of the respective design choice are estimated automatically. Thus several fault tolerant versions of the design can be evaluated and compared with each other without implementing them manually. Finally, the VHDL code of the preferred design candidate can be generated by the tool.

Download Paper (PDF)
UB11.10RESECU_4_AMBRAMS: TOWARDS INCREASED RELIABILITY AND HARDWARE SECURITY USING AMBRAMS
Presenter:
Petr Pfeifer, TU Liberec, CZ
Author:
Petr Pfeifer, TU Liberec, CZ
Abstract
AmBRAMs, a new method together with an advanced analysis tool and framework for advanced measurements and reliability assessments on modern nanoscale FPGAs, provides a new set of tools that enables complex lab-on-chip solutions in nanoscale FPGAs. AmBRAMs has been enhanced with advanced measurement and data processing functions that support platform identification and security functionality, including tampering detection, preferably in modern nanoscale programmable devices. It will be presented on VLIW soft processor cores equipped with a security IP, also showing POF solutions and related functionality. Detection of power voltage variation using AmBRAMs technology is incorporated in the processor application and demonstrated on a complex processor system. The demonstration runs on 28nm LP or 20nm HP UltraScale Xilinx FPGA devices. The 28nm FPGA solution will also show simple HW adjustments that enable the power supply changes required by the demonstrator and by the adaptive control, which is presented as well.

Download Paper (PDF)
16:30End of session

IP5 Interactive Presentations

Date: Thursday 17 March 2016
Time: 15:30 - 16:00
Location / Room: Conference Level, Foyer

Interactive Presentations run simultaneously during a 30-minute slot. A poster associated to the IP paper is on display throughout the afternoon. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session, prior to the actual Interactive Presentation. Moreover, one "Best Interactive Presentation Award" will be given.

LabelPresentation Title
Authors
IP5-1RELIABILITY AND PERFORMANCE TRADE-OFFS FOR 3D NOC-ENABLED MULTICORE CHIPS
Speaker:
Partha Pande, Washington State University, US
Authors:
Sourav Das1, Janardhan Rao Doppa1, Partha Pande1 and Krishnendu Chakrabarty2
1Washington State University, US; 2Duke University, US
Abstract
Three-dimensional (3D) integration, a breakthrough technology to achieve "More Moore and More Than Moore," provides the benefits of better performance, lower power consumption, and increased bandwidth through the use of vertical interconnects and 3D stacking. The vertical interconnects enable the design of a high-bandwidth and energy-efficient small-world (SW) network-based 3D network-on-chip (3D SWNoC) for massive multicore platforms. However, the anticipated performance gain of a 3D SWNoC-enabled multicore chip may be compromised due to the potential failures of through-silicon vias (TSVs) that are predominantly used as vertical interconnects. In particular, due to the non-homogeneous traffic patterns, heavily used TSVs may wear out quickly and can also contribute to the wear-out of neighboring TSVs. As a result, the mean-time-to-failure (MTTF) of those TSVs will decrease, which will adversely affect the overall lifetime of the chip. In this paper, we address this traffic-dependent TSV wear-out problem in 3D SWNoC. We demonstrate that by employing an adaptive routing mechanism, we can improve the MTTF of 3D SWNoC significantly while still providing 21% lower energy-delay-product (EDP) compared to a conventional 3D MESH.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-2MEMORY-ACCESS AWARE DVFS FOR NETWORK-ON-CHIP IN CMPS
Speaker:
Yuan Yao, KTH Royal Institute of Technology, SE
Authors:
Yuan Yao and Zhonghai Lu, KTH Royal Institute of Technology, SE
Abstract
We present a new DVFS technique for network-on-chip (NoC) that adjusts the voltage/frequency scales of routers according to the memory-access characteristics of the applications running on the CMP. The memory characteristics are periodically profiled, reflecting both resource-access density in the network and memory-access criticality for application performance. The network conducts per-router voltage/frequency tuning using the memory-access density information while it performs priority-based switch allocation to speed up critical packets and avoid starvation using the memory-criticality information. Compared to a recent per-router DVFS approach, benchmark experiments demonstrate that our memory-access-characteristics-aware DVFS technique achieves not only better power savings and energy-delay product, but also enhanced network and application performance.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-3A DYNAMICALLY RECONFIGURABLE ECC DECODER ARCHITECTURE
Speaker:
Philippe Coussy, Universite Bretagne Sud / Lab-STICC, FR
Authors:
Awais Sani1, Philippe Coussy2 and Cyrille Chavet3
1Universite de Bretagne-Sud, FR; 2Universite de Bretagne-Sud / Lab-STICC, FR; 3Lab-STICC / Université de Bretagne Sud, FR
Abstract
Due to their impressive error correction performances, Error Correcting Codes (ECC) are now widely used in communication systems. In order to achieve high throughput requirements ECC decoders are based on parallel architectures, which results in a major issue: memory access conflicts. In this paper, we introduce a new class of ECC decoder architectures that dynamically reconfigures by executing on-chip a memory mapping approach. For that purpose, a dedicated algorithm taking into account network constraint is presented. A smart architecture based on a butterfly network and a reconfiguration unit is also proposed. Experimental results show that real-time reconfiguration at reasonable hardware cost is possible.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-4RESISTIVE BLOOM FILTERS: FROM APPROXIMATE MEMBERSHIP TO APPROXIMATE COMPUTING WITH BOUNDED ERRORS
Speaker:
Abbas Rahimi, University of California, Berkeley, US
Authors:
Vahideh Akhlaghi1, Abbas Rahimi2 and Rajesh K. Gupta1
1University of California, San Diego, US; 2University of California, Berkeley, US
Abstract
Approximate computing provides an opportunity for exploiting application characteristics to trade accuracy for gains in energy efficiency. However, exploiting this opportunity requires that the error the system designer exposes to the application developer can be bounded. A space-efficient probabilistic data structure such as a Bloom filter can provide one such means. A Bloom filter supports approximate set membership queries with a tunable rate of false positives (i.e., errors) and no false negatives. We propose a resistive Bloom filter (ReBF) to approximate a function by tightly integrating it with a functional unit (FU) implementing the function. ReBF approximately mimics partial functionality of the FU by recalling its frequent input patterns for computational reuse. The accuracy of the target FU is guaranteed by bounding the ReBF error behavior at design time. We further lower the energy consumption of an FU by designing its ReBF using low-power memristor arrays. The experimental results show that function approximation using ReBF for five image processing kernels running on the AMD Southern Islands GPU yields on average 24.1% energy saving in 45 nm technology compared to the exact computation.
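The Bloom filter property this abstract relies on (possible false positives, never false negatives) can be seen in a minimal software sketch. The class below is a generic textbook Bloom filter, not the resistive (memristor-based) ReBF design; the class name, the method names and the choice of SHA-256 as the hash are illustrative assumptions.

```python
import hashlib


class BloomFilter:
    """Generic software Bloom filter (illustrative; not the paper's ReBF)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, kept simple

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))
```

Raising `num_bits` or tuning `num_hashes` lowers the false-positive rate at the cost of more storage or more lookups, which is the "tunable rate" the abstract mentions.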

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-5REAL-TIME SYSTEM-LEVEL IMPLEMENTATION OF A TELEPRESENCE ROBOT USING AN EMBEDDED GPU PLATFORM
Speaker:
Swathi Gurumani, Advanced Digital Sciences Center, SG
Authors:
Muhammad Teguh Satria1, Swathi Gurumani1, Wang Zheng2, Keng Peng Tee2, Augustine Koh1, Pan Yu2, Kyle Rupnow1 and Deming Chen3
1Advanced Digital Sciences Center, SG; 2Institute for Infocomm Research, SG; 3UIUC, US
Abstract
Real-time applications such as telepresence systems present an opportunity to use embedded GPUs for compute acceleration to meet platform goals. In this paper, we develop a prototype of a portable, standalone telepresence robot that performs real-time attention-directed control using an NVIDIA Jetson TK1 embedded platform. We perform platform-specific optimizations to improve thread occupancy, optimize computation workload and improve accuracy of face detection on the embedded GPU and achieve real-time performance of 30 frames per second on the Jetson TK1 and an overall speedup of 10x compared to the ARM CPU version.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-6EXPLORING SPECIALIZED NEAR-MEMORY PROCESSING FOR DATA INTENSIVE OPERATIONS
Speaker:
Salessawi Ferede Yitbarek, University of Michigan, US
Authors:
Salessawi Ferede Yitbarek1, Tao Yang2, Reetuparna Das1 and Todd Austin1
1University of Michigan, US; 2University of California, San Diego, US
Abstract
Emerging 3D stacked memory systems provide significantly more bandwidth than current DDR modules. However, general purpose processors do not take full advantage of these resources offered by the memory modules. Taking advantage of the increased bandwidth requires the use of specialized processing units. In this paper, we evaluate the benefits of placing hardware accelerators at the bottom layer of a 3D stacked memory system compared to accelerators that are placed external to the memory stack. Our evaluation of the design using cycle-accurate simulation and RTL synthesis shows that, for important data intensive kernels, near-memory accelerators inside a single 3D memory package provide 3x-13x speedup over a Quad-core Xeon processor. Most of the benefits are from the application of accelerators, as the near-memory configurations provide marginal benefits compared to the same number of accelerators placed on a die external to the memory package. This comparable performance for external accelerators is due to the high bandwidth afforded by the high-speed off-chip links. On the other hand, near-memory accelerators consume 7%-39% less energy than the external accelerators.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-7 MATLAB TO C COMPILATION TARGETING APPLICATION SPECIFIC INSTRUCTION SET PROCESSORS
Speaker:
Francky Catthoor, Interuniversity Microelectronics Centre (IMEC), BE
Authors:
Ioannis Latifis1, Karthick Parashar2, Grigoris Dimitroulakos1, Hans Cappelle2, Christakis Lezos1, Konstantinos Masselos1 and Francky Catthoor2
1University of Peloponnese, GR; 2Interuniversity Microelectronics Centre (IMEC), BE
Abstract
This paper discusses a MATLAB to C compiler exploiting custom instructions such as instructions for SIMD processing and instructions for complex arithmetic present in Application Specific Instruction Set Processors (ASIPs). The compiler generates ANSI C code in which the processor's special instructions are represented via specialized intrinsic functions. By doing this the generated code can be used as input to any C/C++ compiler. Thus the proposed compiler allows the description of the specialized instruction set of the target processor in a parameterized way allowing the support of any processor. The proposed compiler has been used for the generation of application code for an ASIP targeting DSP applications. The code generated by the proposed compiler achieves a speed up between 2x-30x on the targeted ASIP for six DSP benchmarks compared to the code generated by Mathworks MATLAB to C compiler. Thus the proposed compiler can be employed to reduce the development time/effort/cost and time to market by raising the abstraction of application design in an embedded systems / system-on-chip development context while still improving implementation efficiency.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-8 SAMPLING-BASED BUFFER INSERTION FOR POST-SILICON YIELD IMPROVEMENT UNDER PROCESS VARIABILITY
Speaker:
Grace Li Zhang, Technische Universität München (TUM), DE
Authors:
Grace Li Zhang, Bing Li and Ulf Schlichtmann, Technische Universität München (TUM), DE
Abstract
At submicron manufacturing technology nodes process variations affect circuit performance significantly. This trend leads to a large timing margin and thus overdesign to maintain yield. To combat this pessimism, post-silicon clock tuning buffers can be inserted into circuits to balance timing budgets of critical paths with their neighbors. After manufacturing, these clock buffers can be configured for each chip individually so that chips with timing failures may be rescued to improve yield. In this paper, we propose a sampling-based method to determine the proper locations of these buffers. The goal of this buffer insertion is to reduce the number of buffers and their ranges, while still maintaining a good yield improvement. Experimental results demonstrate that our algorithm can achieve a significant yield improvement (up to 35%) with only a small number of buffers.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-9 PRADA: COMBATING VOLTAGE NOISE IN THE NOC POWER SUPPLY THROUGH FLOW-CONTROL AND ROUTING ALGORITHMS
Speaker:
Prabal Basu, Utah State University, US
Authors:
Prabal Basu, Rajesh JayashankaraShridevi, Koushik Chakraborty and Sanghamitra Roy, Utah State University, US
Abstract
Network-on-Chip (NoC) has become the de-facto standard for on-chip communication in MPSoCs. The growing NoC power footprint, increase in the transistor current, and high switching speed of the logic devices exacerbate the peak power supply noise (PSN) in the NoC power delivery network (PDN). Hence, preserving power supply integrity in the NoC PDN is critical. In this work, we propose PRADA (PSN-aware Runtime Adaptation), a collection of a novel flow-control protocol (PAF) and an adaptive routing algorithm (PAR), to mitigate PSN in NoCs. Our best scheme achieves 14% and 12% improvements in the regional peak PSN and energy efficiency, respectively, with an average of 4.6% performance overhead and marginal area and power footprints.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-10 A POWER-EFFICIENT 3-D ON-CHIP INTERCONNECT FOR MULTI-CORE ACCELERATORS WITH STACKED L2 CACHE
Speaker:
Kyungsu Kang, Samsung, KR
Authors:
Kyungsu Kang1, Luca Benini2, Giovanni De Micheli3, Sangho Park1 and Jong-Bae Lee1
1Samsung, KR; 2Università di Bologna, IT; 3École Polytechnique Fédérale de Lausanne (EPFL), CH
Abstract
The use of multi-core clusters is a promising option for data-intensive embedded applications such as multimodal sensor fusion, image understanding, and mobile augmented reality. In this paper, we propose a power-efficient 3-D on-chip interconnect for multi-core clusters with stacked L2 cache memory. A new switch design makes a circuit-switched Mesh-of-Tree (MoT) interconnect reconfigurable to support power-gating of processing cores, memory blocks, and unnecessary interconnect resources (routing switch, arbitration switch, inverters placed along the on-chip wires). The proposed 3-D MoT improves the power efficiency by up to 77% in terms of energy-delay product (EDP).

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-11 POWER-EFFICIENT LOAD-BALANCING ON HETEROGENEOUS COMPUTING PLATFORMS
Speaker:
Muhammad Shafique, Karlsruhe Institute of Technology (KIT), DE
Authors:
Muhammad Usman Karim Khan1, Muhammad Shafique1, Apratim Gupta2, Thomas Schumann2 and Jörg Henkel1
1Karlsruhe Institute of Technology (KIT), DE; 2University of Applied Sciences, Darmstadt, DE
Abstract
To meet the throughput constraints of the system at minimal power consumption, the workload of computing nodes should be balanced. This requires accounting for the underlying hardware characteristics (e.g., power vs. frequency profiles) and the throughput sustainable by these nodes. This work provides a methodology for distributing and balancing a divisible load on heterogeneous nodes under a throughput constraint. The power efficiency of each node is considered during load distribution. For load balancing, each node is assigned the frequency that just fulfills its job requirements. We functionally verify our methodology by implementing it on an FPGA-based system, with heterogeneous multi-cores and hardware accelerators, and report results for different image processing benchmarks. Compared to a state-of-the-art approach, our approach results in up to 64% performance improvement for the benchmarks evaluated in this paper.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-12 TOPAZ: MINING HIGH-LEVEL SAFETY PROPERTIES FROM LOGIC SIMULATION TRACES
Speaker:
Fadi Kurdahi, University of California, Irvine, US
Authors:
Ahmed Nassar1, Fadi Kurdahi1 and Salam Zantout2
1University of California, Irvine, US; 2American University of Beirut, LB
Abstract
Formal specifications are hard to formulate and maintain for evolving complex digital hardware designs. Specification mining offers a (partially) automated route to discovering specifications from large simulation traces. In this paper, we embark on a novel and rigorous mining methodology (data preparation, mining algorithms, selection criteria, etc.) for finite-state automata checkers using an iterative and interactive mining tool, called Topaz. Topaz is evaluated using an open-source 32-bit RISC CPU design as a case study to demonstrate extraction of complex temporal properties cross-cutting through all CPU pipeline stages, guided by the CPU instruction set specification.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-13 EXPLOITING TRANSACTION LEVEL MODELS FOR OBSERVABILITY-AWARE POST-SILICON TEST GENERATION
Speaker:
Prabhat Mishra, University of Florida, US
Authors:
Farimah Farahmandi1, Prabhat Mishra1 and Sandip Ray2
1University of Florida, US; 2Intel Corporation, US
Abstract
A critical problem in post-silicon debug is to generate efficient tests that both activate requisite coverage goals on the target hardware as well as produce results that are observable through a given on-chip design-for-debug architecture. Unfortunately, such tests cannot be generated directly from RTL models, both due to design complexity and due to bugs in the design itself. In this paper, we propose an approach to address this problem by exploiting transaction-level models (TLM). Our approach involves mapping test and observability requirements between TLM and RTL, enabling TLM analysis to generate post-silicon tests. We provide case studies from a number of different design classes to demonstrate the flexibility and effectiveness of the approach.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-14 SEERAD: A HIGH SPEED YET ENERGY-EFFICIENT ROUNDING-BASED APPROXIMATE DIVIDER
Speaker:
Ali Afzali-Kusha, University of Tehran, IR
Authors:
Reza Zendegani1, Mehdi Kamal1, Arash Fayyazi1, Ali Afzali-Kusha1, Saeed Safari1 and Massoud Pedram2
1University of Tehran, IR; 2University of Southern California, US
Abstract
In this paper, a high speed yet energy-efficient approximate divider for error resilient applications is proposed. For the division operation, the divisor is rounded to a value with a specific form, which transforms the division into a multiplication. The proposed approximate divider offers the flexibility of increasing the accuracy at the price of higher delay and hardware usage. The efficacy of the proposed approximate divider is evaluated in comparison to three different implementations of the SRT divider. The results show that the delay and energy consumption of the proposed approximate divider are, on average, 14 and 300 times smaller than those of the Radix-2 SRT with the carry-save remainder computation. Additionally, the effectiveness of the proposed approximate divider is studied in an image division operation performed in image processing applications. The results suggest the appropriateness of the proposed approximate divider for digital signal processing applications.
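
The abstract does not give the exact rounded form of the divisor, so the following is only a minimal sketch of the general idea under stated assumptions: the divisor b is approximated as 2^(k+L) / d with a small integer d, so that a / b becomes a multiplication by d followed by a right shift. The function name `approx_divide` and the accuracy parameter `L` are hypothetical and are not taken from the paper.

```python
def approx_divide(a: int, b: int, L: int = 3) -> int:
    """Approximate a // b with one multiplication and one shift.

    Assumption (not from the paper): b is rounded to the form
    2**(k + L) / d, where k = floor(log2(b)) and d is a small integer.
    A larger L gives a wider d and a smaller error.
    """
    if a < 0 or b <= 0:
        raise ValueError("expects a >= 0 and b > 0")
    k = b.bit_length() - 1          # position of the leading one of b
    shift = k + L
    # d = round(2**shift / b). Software computes it with a division here;
    # a hardware design would more plausibly read it from a small table.
    d = ((1 << shift) + b // 2) // b
    return (a * d) >> shift         # multiply, then shift instead of divide


if __name__ == "__main__":
    for a, b in [(100, 7), (100, 8), (1000, 37)]:
        print(a, b, approx_divide(a, b), a / b)
```

In this sketch, divisors that are powers of two are reproduced exactly, and for other divisors the relative error introduced by rounding stays below 2^-L before the final truncation.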

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-15 IMPROVING PERFORMANCE GUARANTEES IN WORMHOLE MESH NOC DESIGNS
Speaker:
Milos Panic, Barcelona Supercomputing Center and Universitat Politècnica de Catalunya, ES
Authors:
Milos Panic1, Carles Hernandez2, Jaume Abella2, Antoni Roca Perez3, Eduardo Quinones2 and Francisco Cazorla4
1Barcelona Supercomputing Center and Universitat Politècnica de Catalunya, ES; 2Barcelona Supercomputing Center, ES; 3Universitat Politècnica de Catalunya, ES; 4Barcelona Supercomputing Center and IIIA-CSIC, ES
Abstract
Wormhole-based mesh Networks-on-Chip (wNoC) are deployed in high-performance many-core processors due to their physical scalability and low cost. The distributed nature of wNoCs makes it challenging to deliver the tight and time-composable Worst-Case Execution Time (WCET) estimates that applications in safety-critical real-time embedded systems need. We propose a bandwidth control mechanism for wNoCs that enables the computation of tight time-composable WCET estimates with low average performance degradation and high scalability. Our evaluation with the EEMBC automotive suite and an industrial real-time parallel avionics application confirms this.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-16 A DATA LAYOUT TRANSFORMATION (DLT) ACCELERATOR: ARCHITECTURAL SUPPORT FOR DATA MOVEMENT OPTIMIZATION IN ACCELERATED-CENTRIC HETEROGENEOUS SYSTEMS
Speaker:
Tung Hoang, University of Chicago, US
Authors:
Tung Hoang, Amirali Shambayati and Andrew A. Chien, University of Chicago, US
Abstract
Technology scaling and the growing use of accelerators make optimization of data movement increasingly important in all computing systems. Further, growing diversity in memory structures makes embedding such optimization in software non-portable. We propose a novel architectural solution called Data Layout Transformation (DLT), associated with a simple set of instructions that enable software to describe the required data movement compactly and free the implementation to optimize the movement based on knowledge of the memory hierarchy and system structure. The DLT architecture ideas are applicable to both general-purpose and accelerator-based heterogeneous systems. Experimental results first show that the proposed DLT architecture can make use of the full bandwidth (>97%) of a wide range of memory systems (DDR3 and HMC) while its implementation cost, in 32nm, is low (only 0.246 mm² and 75mW at 1GHz). Our evaluation of the DLT accelerator in an accelerator-based heterogeneous system across DDR3 and HMC memory shows that the DLT can enhance system performance by 4.6x-99x (DDR3) and 4.4x-115x (HMC), which translates into a 2.8x-48x (DDR3) and 1.4x-39x (HMC) improvement in energy efficiency.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-17 OUESSANT: FLEXIBLE INTEGRATION OF DEDICATED COPROCESSORS IN SYSTEMS ON CHIP
Speaker:
Pierre-Henri Horrein, Lab-STICC/Télécom Bretagne, FR
Authors:
Pierre-Henri Horrein, Philip-Dylan Gleonec, Erwan Libessart, André Lalevée and Matthieu Arzel, Lab-STICC/Télécom Bretagne, FR
Abstract
Integration of hardware accelerators in Systems on Chip is often complex. When dealing with reconfigurable hardware, this greatly limits the attainable flexibility. In this paper, we propose an alternative approach to the Molen paradigm [1]. This approach, named Ouessant, is based on a very simple general purpose instruction set designed for close interaction with dedicated hardware accelerators. This instruction set is used to program a dedicated controller, which commands the accelerator's execution and data transfer with minimal CPU intervention. The resulting architecture is flexible, extensible, and can be easily integrated in Systems on Chip. Adding new accelerators is also made easier. Implementation of the architecture on different FPGA resources shows a very low footprint and a very small impact on attainable performance. Ouessant is freely available under an open-source license.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-18 A NOVEL BACKGROUND SUBTRACTION SCHEME FOR IN-CAMERA ACCELERATION IN THERMAL IMAGERY
Speaker:
Konstantinos Makantasis, Institute of Communication and Computer Systems, GR
Authors:
Antonis Nikitakis1, Ioannis Papaefstathiou2, Konstantinos Makantasis3 and Anastasios Doulamis4
1Technical University of Crete, GR; 2Synelixis Solutions Ltd, GR; 3Institute of Communication and Computer Systems, GR; 4National Technical University of Athens, GR
Abstract
Real-time segmentation of moving regions in image sequences is a very important task in numerous surveillance and monitoring applications. A common approach for such tasks is the "background subtraction" which tries to extract regions of interest from the image background for further processing or action; as a result its accuracy as well as its real-time performance is of great significance. In this work we utilize a novel scheme, designed and optimized for FPGA-based implementations, which models the intensities of each pixel as a mixture of Gaussian components; following a Bayesian approach, our method automatically estimates the number of Gaussian components as well as their parameters. Our novel system is based on an efficient and highly accurate on-line updating mechanism, which permits our system to be automatically adapted to dynamically changing operation conditions, while it avoids over/under fitting. We also present two reference implementations of our Background Subtraction Parallel System (BSPS) in Reconfigurable Hardware achieving both high performance as well as low power consumption; the presented FPGA-based systems significantly outperform a multi-core ARM and two multi-core low power Intel CPUs in terms of energy consumed per processed pixel as well as frames per second. Moreover, our low-cost, low-power devices allow for the implementation, for the first time, of a highly distributed surveillance system which will alleviate the main problems of the existing centralized approaches.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-19 RADIATION-HARDENED DSP CONFIGURATIONS FOR IMPLEMENTING ARITHMETIC FUNCTIONS ON FPGA
Speaker:
Felipe Serrano, Universidad Complutense de Madrid, ES
Authors:
Marcos Sanchez-Elez, Inmaculada Pardines, Felipe Serrano and Hortensia Mecha, Universidad Complutense de Madrid, ES
Abstract
This paper presents a study of different implementations of arithmetic operations on FPGAs. Radiation vulnerability has been analyzed for each implementation using the fault injection platform NESSY. Results in terms of area, delay and reliability are presented. Taking into account the performed tests we propose to build a library of HDL templates. This library is used during the design process with a synthesis tool that implements digital circuits as reliable as possible. Experimental results show that those implementations using DSP slices are the ones which achieve better results.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-20 CONFIGURATION PREFETCHING AND REUSE FOR PREEMPTIVE HARDWARE MULTITASKING ON PARTIALLY RECONFIGURABLE FPGAS
Speaker:
Ann Gordon-Ross, University of Florida, US
Authors:
Aurelio Morales-Villanueva, Rohit Kumar and Ann Gordon-Ross, University of Florida, US
Abstract
Partially reconfigurable (PR) FPGAs enable preemptive hardware (HW) multitasking using PR regions (PRRs). To enable this multitasking, the HW task's partial bitstream is downloaded to only the task's PRR, and only that PRR is reconfigured. Since only a small portion of the FPGA fabric is reconfigured, reconfiguration time is significantly reduced as compared to reconfiguring the entire fabric; however, this time is not negligible. Reconfiguration time can be reduced/hidden using two techniques: configuration prefetching and configuration reuse. Even though these techniques can effectively reduce/hide reconfiguration overhead, prior works in preemptive HW multitasking did not use these techniques. To the best of our knowledge, no prior work evaluated physical implementations of these techniques on PR FPGAs, which precludes consideration of physical-implementation-specific details, such as delays in accessing bitstreams, speed limitations during reconfiguration, etc. In this work, we present a novel implementation of configuration prefetching and reuse for preemptive HW multitasking on a Virtex-5 FPGA; however, our established fundamentals are device-family independent.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-21 ANALOG CIRCUIT TOPOLOGICAL FEATURE EXTRACTION WITH UNSUPERVISED LEARNING OF NEW SUB-STRUCTURES
Speaker:
Alex Doboli, Stony Brook University, US
Authors:
Hao Li, Fanshu Jiao and Alex Doboli, Stony Brook University, US
Abstract
This paper presents novel techniques to automatically extract the topological (structural) features in analog circuits. The extracted features include basic building blocks, structural templates and hierarchical structures. Finding structural features is important for tasks like circuit synthesis and sizing, design verification, design reuse, and design knowledge description, summarization and management. The paper presents algorithms for supervised feature extraction and unsupervised learning of new block connections. Experiments discuss feature extraction for a set of 34 state-of-the-art analog circuits.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-22 DESIGN AUTOMATION TASKS SCHEDULING FOR ENHANCED PARALLEL EXECUTION OF A STATE-OF-THE-ART LAYOUT-AWARE SIZING APPROACH
Speaker:
Nuno Horta, Instituto de Telecomunicações/Instituto Superior Técnico, PT
Authors:
David Neves, Ricardo Martins, Nuno Lourenço and Nuno Horta, Instituto de Telecomunicações/Instituto Superior Técnico, PT
Abstract
This paper presents an innovative methodology to efficiently schedule design automation tasks during the execution of an analog IC layout-aware sizing process. This synthesis process includes several sub-tasks such as DC simulation, floorplanning, placement, global routing, parasitic extraction, and circuit simulations in multiple worst case corners. The schedule of the design tasks is optimized taking into account standard multi-core architectures, task dependencies, accurate time estimations for each task and a limited number of licenses for using commercial tools, e.g., number of simulator licenses. The proposed methodology first represents the design flow and task dependencies as a directed acyclic graph, and then uses an evolutionary kernel to implement a single-objective multi-constraint optimization. The efficiency and impact of the proposed approach are validated by using a state-of-the-art Analog IC design automation environment.

Download Paper (PDF; Only available from the DATE venue WiFi)

12.1 SPECIAL DAY Hot Topic: Design Methods for Security and Trust

Date: Thursday 17 March 2016
Time: 16:00 - 17:30
Location / Room: Saal 2

Chair:
Jean-Luc Danger, Télécom ParisTech, FR

Co-Chair:
Ilia Polian, Universität Passau, DE

The last session of the special day on secure systems focuses on novel technologies to support the previous HW and SW architecture modifications. The first paper provides a design method for the remote integrity checking of complex PCBs based on Physically Unclonable Functions (PUFs). The second paper focuses on metrics to quantify and measure actual hardware attack resistance. The last paper focuses on design methods to support instruction set extensions on embedded micro-controllers.

Time Label Presentation Title
Authors
16:00 12.1.1 A DESIGN METHOD FOR REMOTE INTEGRITY CHECKING OF COMPLEX PCBS
Speaker:
Patrick Schaumont, Virginia Tech, US
Authors:
Aydin Aysu, Shravya Gaddam, Harsha Mandadi, Carol Pinto, Luke Wegryn and Patrick Schaumont, Virginia Tech, US
Abstract
Modern, complex printed circuit boards contain high-end commercial off-the-shelf components such as high-capacity FPGAs and expensive peripherals. This paper describes a strategy to build a hardware attestation protocol for such a board. The owner or operator of the PCB wants assurance that the board installed in the field is physically the same as the one that was originally deployed. Our methodology builds a unique identifier for the PCB by cryptographically linking individual component-level identifiers from the board. The component-level identifiers are implemented using Physical Unclonable Functions (PUFs) within the components of the board. We discuss a generic methodology for design and dimensioning of the critical post-processing parameters of the PUF, and we present several strategies to combine multiple PUFs into a combined Fusion PUF. We present a prototype of the proposed technique on an FPGA board running uClinux, and we characterize the performance of the proposed protocol on a population of 22 PCBs.
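
As a rough illustration of what "cryptographically linking" component-level identifiers could look like, the sketch below hashes a list of per-component identifiers into one board-level value. It assumes the identifiers are already stable byte strings (for example after PUF post-processing) and uses SHA-256 purely as a stand-in; the function name `board_identifier` and the sample identifiers are hypothetical, and the paper's actual Fusion PUF constructions may differ.

```python
import hashlib


def board_identifier(component_ids: list[bytes]) -> str:
    """Derive one board-level identifier from per-component identifiers.

    Assumption (not from the paper): each identifier is a stable byte
    string, and a plain SHA-256 over the ordered list is an acceptable
    way to bind them together for illustration.
    """
    h = hashlib.sha256()
    for cid in component_ids:
        # Length-prefix every identifier so that different splits of the
        # same bytes cannot produce the same hash input.
        h.update(len(cid).to_bytes(4, "big"))
        h.update(cid)
    return h.hexdigest()


if __name__ == "__main__":
    deployed = [b"fpga-puf-response", b"flash-puf-response"]
    in_field = [b"fpga-puf-response", b"replaced-flash-part"]
    print(board_identifier(deployed) == board_identifier(deployed))  # same board
    print(board_identifier(deployed) == board_identifier(in_field))  # swapped part
```

Because every component identifier feeds the hash, replacing any single component changes the board-level value, which is the property an attestation check of this kind relies on.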

Download Paper (PDF; Only available from the DATE venue WiFi)
16:30 12.1.2 QUANTIFYING HARDWARE SECURITY USING JOINT INFORMATION FLOW ANALYSIS
Speaker:
Ryan Kastner, University of California, San Diego, US
Authors:
Ryan Kastner, Wei Hu and Alric Althoff, University of California, San Diego, US
Abstract
Existing hardware design methodologies provide limited methods to detect security flaws or derive a measure of how well a mitigation technique protects the system. Information flow analysis provides a powerful method to test and verify a design against security properties that are typically expressed using the notion of noninterference. While this is useful in many scenarios, it does have drawbacks primarily related to its strict enforcement of limiting all information flows -- even those that could only occur in rare circumstances. Quantitative metrics based upon information theoretic measures provide an approach to loosen such restrictions. Furthermore, they are useful in understanding the effectiveness of security mitigation techniques. In this work, we discuss information flow analysis using noninterference and qualitative metrics. We describe how to use them in a synergistic manner to perform joint information flow analysis, and we use this novel technique to analyze security properties across several different hardware cryptographic cores.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:00 12.1.3 INSTRUCTION SET EXTENSIONS FOR SECURE APPLICATIONS
Speaker:
Francesco Regazzoni, ALaRI, CH
Authors:
Francesco Regazzoni1 and Paolo Ienne2
1ALaRI, CH; 2École Polytechnique Fédérale de Lausanne (EPFL), CH
Abstract
The main goal of this paper is to expose the community to past achievements and possible future uses of Instruction Set Extension (ISE) in security applications. Processor customization has proven to be an effective way of achieving high performance with limited area and energy overhead for several applications, ranging from signal processing to graphical computation. Concerning cryptographic algorithms, a large body of work exists on speeding up block ciphers and asymmetric cryptography with specific ISEs. These algorithms often mix non-standard operations with regular ones, making them an ideal target for acceleration with dedicated instructions. Tools supporting automatic generation of ISEs have proven useful for algorithm exploration, while secure instructions can increase the robustness of software routines against side-channel attacks. In this paper, we discuss how processor customization and the related tool chains can be used by designers to address security problems and we highlight possible research directions.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:30 End of session

12.2 Hot Topic: Exploiting New Transistor Technologies to Enhance Hardware Security (without PUFs!)

Date: Thursday 17 March 2016
Time: 16:00 - 17:30
Location / Room: Konferenz 6

Organiser:
Michael Niemier, University of Notre Dame, South Bend, US

Chair:
Sri Parameswaran, The University of New South Wales, AU

Like performance, power, and reliability, security is becoming a critical design consideration. As a representative example, hardware security threats in the integrated circuit (IC) supply chain, including hardware counterfeiting, IP piracy, and reverse engineering, cost the US economy more than $200 billion annually. Problems are further exacerbated by the rapid growth in the "Internet of Things" (IoT). This session will highlight how emerging transistor technologies can enhance existing hardware security primitives, and also lead to new hardware security primitives. We begin by addressing security threats that are enabled by insecure hardware. A special emphasis will be placed on the need for standards to address hardware security across all aspects of the supply chain. We then highlight how emerging transistor technologies could impact encryption engines. We consider not only how new devices could lead to more sophisticated/robust encryption ciphers in resource constrained environments, but also how new devices may make said ciphers more resilient to attacks such as differential power analysis (DPA). We conclude with a discussion of how unique I-V characteristics offered by beyond CMOS transistors can enable new hardware security primitives that could facilitate IC supply chain protection, help prevent/stop side-channel attacks, etc. Presently, most emerging technologies being studied in the context of hardware security are related to designing physically unclonable functions (PUFs) and random number generators (RNGs). However, most PUF and RNG designs leverage larger device-to-device variations in emerging technologies. Ironically, said variations often represent shortcomings when viewed through the lens of an original device target - i.e., reliable digital logic or memory.
In contrast, we will discuss emerging transistor technologies for hardware security related applications that are not RNGs or PUFs, and do not inherently rely on device variations as a means to an end. This session will provide important insight into the following questions: Can new devices lead to more efficient hardware primitives than CMOS in countering hardware attacks? What properties should an emerging technology-based hardware infrastructure provide to better support software level protection schemes? Can such properties be reliably demonstrated by a given device? This session is especially timely as the 2015 International Technology Roadmap for Semiconductors (ITRS) chapter on Emerging Research Devices (ERD) will include the first section on how new devices might be employed to enhance hardware security. As such, the time has come to engage the design automation community in this new and important research vector.

Time Label Presentation Title
Authors
16:00 12.2.1 MITIGATING HARDWARE THREATS TO ENABLE THE INTERNET OF SECURE THINGS
Speaker:
Yaw Obeng, National Institute of Standards and Technology, US
Authors:
Yaw Obeng1, Colm Nolan2 and David Brown3
1National Institute of Standards and Technology, US; 2IBM, IE; 3Intel Corporation, US
Abstract
This paper examines the current issues pertaining to hardware security and how they could affect the overall security of applications such as the internet of things. Specifically, we review the ongoing industry-led activities aimed at mitigating the hardware threats through supply chain assurance. The impact of emerging technologies on hardware-based needs, and the need for technical standards are discussed from brand owners' perspectives. The paper is illustrated with the ongoing work of the International Technology Roadmap for Semiconductors (ITRS) Emerging Research Devices (ERD) hardware security working group, the counterfeit risk mitigation efforts from iNEMI, and the High-Density Package User Group (HDPUG), as well as published standards from SEMI and the Open Group. All these efforts are aimed at mitigating counterfeits in the electronics supply chain through product traceability and authentication. Finally, we will discuss how existing and emerging technologies can be used for product authentication throughout the supply chain.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:30 12.2.2 LEVERAGE EMERGING TECHNOLOGIES FOR DPA-RESILIENT BLOCK CIPHER DESIGN
Speaker:
Michael Niemier, University of Notre Dame, US
Authors:
Yu Bi1, Kaveh Shamsi1, Jiann-Shiun Yuan1, Francois-Xavier Standaert2 and Yier Jin1
1University of Central Florida, US; 2Université Catholique de Louvain, BE
Abstract
Emerging devices have been designed and fabricated to extend Moore's Law. While the benefits over traditional metrics such as power, energy, delay, and area certainly apply to emerging device technologies, new devices may offer further benefits beyond improvements in the aforementioned metrics. In this sense, we consider how new transistor technologies could also have a positive impact on hardware security. More specifically, we consider how tunneling FETs (TFETs) and silicon nanowire FETs (SiNW FETs) could offer superior protection to integrated circuits and embedded systems that are subject to hardware-level attacks -- e.g., differential power analysis (DPA). Experimental results on SiNW FET and TFET CML gates are presented. In addition, simulation results of utilizing TFET CML on a light-weight cryptographic circuit, KATAN32, show that TFET-based current mode logic (CML) can both improve DPA resilience and preserve low power consumption in the target design. Compared to the CMOS-based CML designs, the TFET CML circuit consumes 15 times less power while achieving a similar level of DPA resistance.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:00 12.2.3 USING EMERGING TECHNOLOGIES FOR HARDWARE SECURITY BEYOND PUFS
Speaker:
X. Sharon Hu, University of Notre Dame, US
Authors:
An Chen1, X. Sharon Hu2, Yier Jin3, Michael Niemier2 and Xunzhao Yin2
1ITRS ERD working group chair, US; 2University of Notre Dame, US; 3University of Central Florida, US
Abstract
We discuss how the unique I-V characteristics offered by emerging, post-CMOS transistors can be used to enhance hardware security. Different from most existing work that exploits emerging technologies for hardware security, we (i) focus on transistor characteristics that either do not exist in, or are difficult to duplicate with MOSFETs, and (ii) aim to move beyond hardware implementations of physically unclonable functions (PUFs) and random number generators (RNGs).

17:30 End of session

12.3 System Support for Resilience and Robustness

Date: Thursday 17 March 2016
Time: 16:00 - 17:30
Location / Room: Konferenz 1

Chair:
Oliver Bringmann, University of Tuebingen, DE

Co-Chair:
Dirk Stroobandt, Ghent University, BE

This session discusses a wide range of innovative techniques from instruction scheduling to mobile virtualization to characterize and improve system resilience and robustness.

Time Label Presentation Title
Authors
16:00 12.3.1 EFFECT OF LFSR SEEDING, SCRAMBLING AND FEEDBACK POLYNOMIAL ON STOCHASTIC COMPUTING ACCURACY
Speaker:
Jason H. Anderson, University of Toronto, CA
Authors:
Jason H. Anderson1, Yuko Hara-Azumi2 and Shigeru Yamashita3
1University of Toronto, CA; 2Tokyo Institute of Technology, JP; 3Ritsumeikan University, JP
Abstract
Stochastic computing (SC) has received attention recently as a paradigm to improve energy efficiency and fault tolerance. SC uses hardware-generated random bitstreams to represent numbers in the [0, 1] range: the number represented is the probability of a bit in the stream being logic-1. The generation of random bitstreams is typically done using linear-feedback shift register (LFSR)-based random number generators. In this paper, we consider how best to design such LFSR-based stochastic bitstream generators, as a means of improving the accuracy of stochastic computing. Three design criteria are evaluated: 1) LFSR seed selection, 2) the utility of scrambling LFSR output bits, and 3) the LFSR polynomials (i.e., locations of the feedback taps) and whether they should be unique vs. uniform across stream generators. For a recently proposed multiplexer-based stochastic logic architecture, we demonstrate that careful seed selection can improve accuracy results vs. the use of arbitrarily selected seeds. For example, we show that stochastic logic with seed-optimized 255-bit stream lengths achieves accuracy better than that of using 1023-bit stream lengths with arbitrary seeds: an improvement of over 4X in energy for equivalent accuracy.
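As a rough illustration of the LFSR-based bitstream generation described in this abstract, the following Python sketch encodes a number as a 255-bit stream by comparing successive LFSR states against a threshold. It is not the authors' design: the 8-bit width, the feedback taps, the comparator-based encoding, and the AND-gate multiplier are textbook stochastic-computing constructions assumed here for illustration, and all function names are hypothetical.

```python
def lfsr8_states(seed):
    """Yield 255 successive states of an 8-bit Fibonacci LFSR.

    Assumption: taps 8, 6, 5, 4 (x^8 + x^6 + x^5 + x^4 + 1), a commonly
    tabulated maximal-length choice, so every non-zero state occurs once.
    """
    if not 1 <= seed <= 255:
        raise ValueError("seed must be a non-zero 8-bit value")
    state = seed
    for _ in range(255):
        yield state
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 4)) & 1
        state = (state >> 1) | (bit << 7)


def bitstream(level, seed):
    """Encode level/255 as a 255-bit stream: emit 1 when state <= level."""
    return [1 if s <= level else 0 for s in lfsr8_states(seed)]


def value(stream):
    """Decode a stream: the fraction of bits that are logic-1."""
    return sum(stream) / len(stream)


def multiply(stream_a, stream_b):
    """Approximate multiplication of two streams with a bitwise AND."""
    return [a & b for a, b in zip(stream_a, stream_b)]


a = bitstream(128, seed=0x01)   # represents 128/255
b = bitstream(64, seed=0x5A)    # represents 64/255, different seed
product = multiply(a, b)        # approximates (128/255) * (64/255)
print(value(a), value(b), value(product))
```

Because a full period visits every non-zero state exactly once, the encoded value is exact regardless of seed. The accuracy of the product, by contrast, depends on how the two streams are correlated, which is where seed choice matters.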

16:30 12.3.2 EFFICIENT PROGRAM TRACING AND MONITORING THROUGH POWER CONSUMPTION - WITH A LITTLE HELP FROM THE COMPILER
Speaker:
Carlos Moreno, University of Waterloo, CA
Authors:
Carlos Moreno, Sean Kauffman and Sebastian Fischmeister, University of Waterloo, CA
Abstract
Ensuring correctness and enforcing security are growing concerns given the complexity of modern connected devices and safety-critical systems. A promising approach is non-intrusive runtime monitoring through reconstruction of program execution traces from power consumption measurements. This can be used for verification, validation, debugging, and security purposes. In this paper, we propose a framework for increasing the effectiveness of power-based program tracing techniques. These systems determine the most likely block of source code that produced an observed power trace (CPU power consumption as a function of time). Our framework maximizes distinguishability between power traces for different code blocks. To this end, we provide a special compiler optimization stage that reorders intermediate representation (IR) and determines the reorderings that lead to power traces with highest distances between each other, thus reducing the probability of misclassification. Our work includes an experimental evaluation, using LLVM for an ARM architecture. Experimental results confirm the effectiveness of our technique.
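The classification step mentioned in this abstract (picking the most likely code block for an observed power trace) can be pictured with a minimal nearest-reference sketch in Python. This is an assumption-laden illustration, not the framework from the paper: the Euclidean distance metric, the reference traces, and the function names are all hypothetical.

```python
import math
from itertools import combinations


def distance(trace_a, trace_b):
    """Euclidean distance between two equally long power traces."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(trace_a, trace_b)))


def classify(trace, references):
    """Return the name of the reference trace closest to the observation."""
    return min(references, key=lambda name: distance(trace, references[name]))


def min_pairwise_distance(references):
    """Smallest distance between any two reference traces.

    A larger value means the code blocks are easier to tell apart,
    which is the quantity a reordering pass would try to increase.
    """
    return min(distance(a, b) for a, b in combinations(references.values(), 2))


# Hypothetical reference traces (power samples over time) for two blocks.
references = {
    "block_a": [1.0, 2.0, 1.0],
    "block_b": [3.0, 1.0, 2.0],
}
observed = [1.1, 1.9, 1.2]
print(classify(observed, references))
print(min_pairwise_distance(references))
```

Under this view, a compiler stage that compares candidate instruction orderings would prefer the ordering whose reference traces have the largest minimum pairwise distance.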

17:00 12.3.3 FLIC: FAST, LIGHTWEIGHT CHECKPOINTING FOR MOBILE VIRTUALIZATION USING NVRAM
Speaker:
Kan Zhong, Chongqing University, CN
Authors:
Kan Zhong1, Duo Liu1, Liang Liang1, Linbo Long1, Yi Lin1 and Zili Shao2
1Chongqing University, CN; 2The Hong Kong Polytechnic University, HK
Abstract
Checkpointing is a key enabler of hibernation, live migration and fault-tolerance for virtual machines (VMs) in mobile devices. However, checkpointing a VM is usually heavyweight: the VM's entire memory needs to be dumped to storage, which induces a significant amount of (slow) I/O operations, degrading system performance and user experience. In this paper, we propose FLIC, a fast and lightweight checkpointing machinery for virtualized mobile devices that takes advantage of recent byte-addressable, non-volatile memory (NVRAM). Instead of saving the VM's entire memory to storage, we store its working set pages in NVRAM, avoiding accesses to flash memory, which is slow compared to server-grade SSDs. To cope with the energy constraint of mobile systems, we further deduplicate VM snapshots, reducing the VM's image size and saving storage space. Experimental results based on an Exynos 5250 SoC show that our approach can effectively improve the performance of checkpointing in mobile virtualization and save energy.

17:15 12.3.4 PAIS: PARALLELIZATION AWARE INSTRUCTION SCHEDULING FOR IMPROVING SOFT-ERROR RELIABILITY OF GPU-BASED SYSTEMS
Speaker:
Mohammad Abdullah Al Faruque, University of California, Irvine, US
Authors:
Haeseung Lee, Hsinchung Chen and Mohammad Abdullah Al Faruque, University of California, Irvine, US
Abstract
For decades the semiconductor industry has been driven by Moore's Law and has performed aggressive technology scaling to achieve low power and high performance. Meanwhile, the semiconductor industry has faced severe reliability challenges such as soft errors. Many methodologies (such as redundancy methodologies) have been proposed to improve the soft-error reliability of GPU-based systems. However, the GPU compiler has yet to be considered for improving the soft-error reliability of the GPU. In this paper, we propose a novel GPU architecture-aware compilation methodology to further improve the soft-error reliability. The proposed methodology jointly considers the parallel behavior of the GPU and the applications and minimizes the vulnerability of the GPU applications during instruction scheduling. The experimental results show that our methodology is able to perform the scheduling within 5.88 seconds on average and achieves soft-error reliability improvement of up to 40% compared to the state-of-the-art compilation techniques. The results show that the performance and power overheads of our methodology are less than 10% in most cases.

17:30 End of session

12.4 Simulating Everything: From Timing to Instructions

Date: Thursday 17 March 2016
Time: 16:00 - 17:30
Location / Room: Konferenz 2

Chair:
Graziano Pravadelli, Universita degli Studi di Verona, IT

Co-Chair:
Valeria Bertacco, University of Michigan, US

The session deals with several facets of simulation optimization, ranging from timing to circuit simulation to instruction decoding.

Time Label Presentation Title
Authors
16:00 12.4.1 ACCELERATING SOURCE-LEVEL TIMING SIMULATION
Speaker:
Oliver Bringmann, Universität Tübingen, DE
Authors:
Simon Schulz1 and Oliver Bringmann2
1Universität Tübingen, DE; 2Universität Tübingen / FZI, DE
Abstract
Source-level timing simulation (SLTS) is a promising method to overcome one major challenge in early and rapid prototyping: fast and accurate simulation of timing behavior. However, most existing SLTS approaches are still coupled with a considerable simulation overhead. We present a method to reduce source-level timing simulation overhead by removing superfluous instrumentation based on instrumentation dependency graphs. We show in experiments that our optimizations decrease simulation overhead significantly (by up to a factor of 7.7) without losing accuracy. Our detailed experiments are based on benchmarks as well as real-life production code that is simulated in a virtual environment.

16:30 12.4.2 SPARSITY-ORIENTED SPARSE SOLVER DESIGN FOR CIRCUIT SIMULATION
Speaker:
Xiaoming Chen, Tsinghua University, CN
Authors:
Xiaoming Chen, Lixue Xia, Yu Wang and Huazhong Yang, Tsinghua University, CN
Abstract
The sparse solver is a critical component in circuit simulators. The widely used solver KLU is based on a pure column-level algorithm. In this paper, we show by experiments that KLU is not always the best algorithm for circuit matrices. We also demonstrate that the optimal algorithm strongly depends on the sparsity of the matrix. Two sparse LU factorization algorithms are proposed for extremely sparse matrices and dense matrices. A simple but effective strategy is proposed to select the optimal algorithm according to the sparsity. By combining the two new algorithms with the selection method, the proposed solver achieves much higher performance than both KLU and PARDISO.

17:00 12.4.3 INTEGRATION OF MIXED-SIGNAL COMPONENTS INTO VIRTUAL PLATFORMS FOR HOLISTIC SIMULATION OF SMART SYSTEMS
Speaker:
Davide Quaglia, University of Verona, IT
Authors:
Enrico Fraccaroli1, Michele Lora1, Sara Vinco2, Davide Quaglia1 and Franco Fummi1
1University of Verona, IT; 2Politecnico di Torino, IT
Abstract
Nowadays, the design of applications based on smart systems requires the joint simulation of both digital and analog aspects. Even if analog-mixed-signal (AMS) extensions of hardware description languages are an enabling factor, they do not provide a general methodology for the integration of AMS models into digital virtual platforms. This paper defines the problem and provides two main contributions: 1) the automatic conversion of analog models from Verilog-AMS to C++/SystemC, to remove the overhead of co-simulation with traditional virtual platform tools, and 2) the automatic abstraction of analog conservative models, with the goal of increasing simulation speed. Experimental results show that the virtual platform with automatically integrated analog components is 40 times faster than co-simulation with Verilog-AMS, and the increase of speed due to abstraction is more than 100%.

17:15 12.4.4 DECISION TREE GENERATION FOR DECODING IRREGULAR INSTRUCTIONS
Speaker:
Katsumi Okuda, Mitsubishi Electric Corporation, JP
Authors:
Katsumi Okuda and Haruhiko Takeyama, Mitsubishi Electric Corporation, JP
Abstract
Instruction set simulators (ISS) are indispensable tools for the development of new architectures and embedded software. One essential part of any ISS is its instruction decoder. Since manual implementation of an instruction decoder for a complex instruction set is tedious and error-prone, automatic generation of an instruction decoder is required. However, generating efficient instruction decoders is complicated by the increasing irregularity of instruction encodings, which results from the incremental addition of instructions. In this paper, we propose an algorithm that generates a decision tree for decoding irregular instructions. Our algorithm can generate decision trees by using not only significant bits of opcode patterns but also exclusion conditions in decoding entries. Our results on ARM, Thumb-2, MIPS64, RH850, and TriCore show that our algorithm generates efficient instruction decoders in terms of both depth and memory consumption regardless of whether the target instruction set is irregular or not.
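To make the idea of decoding via significant opcode bits concrete, here is a minimal Python sketch that builds a decision tree by repeatedly switching on the bits that are fixed in every remaining pattern. It illustrates only the basic significant-bit approach, not the algorithm proposed in the paper (in particular, exclusion conditions are not modeled), and the 8-bit instruction set, the `Entry` type, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One decoding entry: word matches when (word & mask) == value."""
    name: str
    mask: int
    value: int


def build_tree(entries, width, used=0):
    """Build a tree of ("switch", mask, children) and ("leaf", entries)."""
    # Bits that are significant in every entry and not yet tested.
    common = (1 << width) - 1
    for e in entries:
        common &= e.mask
    common &= ~used
    if len(entries) <= 1 or common == 0:
        return ("leaf", list(entries))
    groups = {}
    for e in entries:
        groups.setdefault(e.value & common, []).append(e)
    children = {
        key: build_tree(group, width, used | common)
        for key, group in groups.items()
    }
    return ("switch", common, children)


def decode(tree, word):
    """Walk the tree, then confirm the match against the leaf entries."""
    node = tree
    while node[0] == "switch":
        _, mask, children = node
        node = children.get(word & mask)
        if node is None:
            return None
    for e in node[1]:
        if (word & e.mask) == e.value:
            return e.name
    return None


# Hypothetical 8-bit instruction set with mixed pattern widths.
isa = [
    Entry("ADD", 0xF0, 0x10),
    Entry("SUB", 0xF0, 0x20),
    Entry("NOP", 0xFF, 0x00),
    Entry("HALT", 0xFF, 0x0F),
]
tree = build_tree(isa, width=8)
print(decode(tree, 0x15), decode(tree, 0x0F), decode(tree, 0x30))
```

In this toy instruction set the root switches on the upper four bits, and only the group sharing the prefix 0x0 needs a second switch on the lower four bits.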

17:30 End of session

12.5 Accelerator Design and Heterogeneous Architectures

Date: Thursday 17 March 2016
Time: 16:00 - 17:30
Location / Room: Konferenz 3

Chair:
Cristina Silvano, Politecnico di Milano, IT

Co-Chair:
Todd Austin, University of Michigan, US

This session presents papers on heterogeneous systems with a focus on hardware acceleration. The first two papers propose acceleration for general-purpose and domain-specific computing, respectively. The third paper addresses the issue of system interconnect for many-accelerator systems. The last paper introduces a data-oriented accelerator design for sparse matrix operations.

Time Label Presentation Title
Authors
16:00 12.5.1 (Best Paper Award Candidate)
A RECONFIGURABLE HETEROGENEOUS MULTICORE WITH A HOMOGENEOUS ISA
Speaker:
Antonio Carlos Schneider Beck, Universidade Federal do Rio Grande do Sul (UFRGS), BR
Authors:
Jeckson Dellagostin Souza1, Luigi Carro1, Mateus Beck Rutzig2 and Antonio Carlos Schneider Beck Filho1
1Universidade Federal do Rio Grande do Sul (UFRGS), BR; 2Universidade Federal de Santa Maria, BR
Abstract
Given the large diversity of embedded applications one can find in current portable devices, for energy and performance reasons one must exploit both Thread- and Instruction-Level Parallelism. While MPSoCs are largely used for this purpose, they fail when one considers software productivity, since they comprise different ISAs that must be programmed separately. On the other hand, general-purpose multicores implement the same ISA, but are composed of a homogeneous set of very power-consuming superscalar processors. In this paper we show how one can effectively use a regular fabric to provide a number of different possible heterogeneous configurations while still sustaining the same ISA. This is done by leveraging the intrinsic regularity of a reconfigurable fabric, so several different organizations can be easily built with little effort. To ensure ISA compatibility, we use a binary translation mechanism that transforms code to be executed on the fabric at run-time. Using representative benchmarks, we show that one version of the heterogeneous system can outperform its homogeneous counterpart on average by 59% in performance and 10% in energy, with EDP improvements in almost every scenario.

16:30 12.5.2 THE NEURO VECTOR ENGINE: FLEXIBILITY TO IMPROVE CONVOLUTIONAL NETWORK EFFICIENCY FOR WEARABLE VISION
Speaker:
Maurice Peemen, Eindhoven University of Technology, NL
Authors:
Maurice Peemen1, Bart Mesman1, Henk Corporaal1, Runbin Shi2, Sohan Lal3 and Ben Juurlink3
1Eindhoven University of Technology, NL; 2Soochow University, CN; 3TU Berlin, DE
Abstract
Deep Convolutional Networks (ConvNets) are currently superior in benchmark performance, but the associated demands on computation and data transfer prohibit straightforward mapping on energy-constrained wearable platforms. The computational burden can be overcome by dedicated hardware accelerators, but it is the sheer amount of data transfer and the level of utilization that determine the energy efficiency of these implementations. This paper presents the Neuro Vector Engine (NVE), a SIMD accelerator for ConvNets for visual object classification, targeting portable and wearable devices. Our accelerator is very flexible due to the use of a VLIW ISA, at the cost of instruction fetch overhead. We show that this overhead is insignificant when the extra flexibility enables advanced data locality optimizations and improves HW utilization over ConvNet vision applications. By co-optimizing accelerator architecture and algorithm loop structure, 30Gops is achieved with a power envelope of 54mW and only 0.26mm^2 silicon footprint at TSMC 40nm technology, enabling high-end visual object recognition by portable and even wearable devices.

17:00 12.5.3 IMPROVING SCALABILITY OF CMPS WITH DENSE ACCS COVERAGE
Speaker:
Gunar Schirner, Northeastern University, US
Authors:
Nasibeh Teimouri, Hamed Tabkhi and Gunar Schirner, Northeastern University, US
Abstract
This article opens a path toward efficient integration of many hardware Accelerators (ACCs) on a single chip. To this end, the article first identifies 4 major semantic aspects of ACC communication: data access model, data granularity, marshalling and synchronization. Based on the identified semantics, the article proposes the Transparent Self-Synchronizing (TSS) architecture as an extensible architecture template to efficiently integrate many ACCs. In principle, TSS proposes a shift from the current processor-centric view to a more equal, peer view between ACCs and the host processors. It offers a programmable MUX-based interconnect with fine-tuned local buffers per ACC as well as an autonomous control to reduce the synchronization load on the host processor. TSS is mainly suitable for the class of streaming applications. Our results using 8 streaming applications demonstrate significant benefits of TSS, including a 3x speedup over current ACC-based architectures.

17:15 12.5.4 HARDWARE ACCELERATOR FOR ANALYTICS OF SPARSE DATA
Speaker:
Eriko Nurvitadhi, Intel Corporation, US
Authors:
Eriko Nurvitadhi, Asit Mishra, Yu Wang, Ganesh Venkatesh and Debbie Marr, Intel Corporation, US
Abstract
The rapid growth of the Internet has led to web applications that produce large unstructured sparse datasets (e.g., texts, ratings). Machine learning (ML) algorithms are the basis for many important analytics workloads that extract knowledge from these datasets. This paper characterizes such workloads on a high-end server for real-world datasets and shows that a set of sparse matrix operations dominates runtime. Further, they run inefficiently due to low compute-per-byte and challenging thread scaling behavior. As such, we propose a hardware accelerator to perform these operations with extreme efficiency. Simulations and RTL synthesis to 14nm ASIC demonstrate significant performance and performance/Watt improvements over conventional processors, with only a small area overhead.

17:30 End of session

12.6 Reconfigurable Computing Platforms and Architectures

Date: Thursday 17 March 2016
Time: 16:00 - 17:30
Location / Room: Konferenz 4

Chair:
Dirk Stroobandt, Ghent University, BE

Co-Chair:
Jürgen Becker, Karlsruhe Institute of Technology (KIT), DE

In this session, we have three papers focused on the design of platforms and architectures for reconfigurable computing. The first paper describes a dedicated hardware accelerator addressing the prohibitive computing demand of homomorphic encryption. The second paper develops larger, more efficient overlays using multiple DSP blocks and then maximises their utilisation. The third paper proposes a novel scheme to dynamically optimize a reconfigurable VLIW processor by predicting and matching the number of active data-paths for each application phase.

Time Label Presentation Title
Authors
16:00 12.6.1 SECURING THE CLOUD WITH RECONFIGURABLE COMPUTING: AN FPGA ACCELERATOR FOR HOMOMORPHIC ENCRYPTION
Speaker:
Alessandro Cilardo, University of Naples Federico II, IT
Authors:
Alessandro Cilardo and Domenico Argenziano, University of Naples Federico II, IT
Abstract
A hot topic in current cloud security research, homomorphic encryption is a recently introduced technique allowing computation to take place on encrypted data. This work presents the architecture and implementation of a dedicated FPGA-based accelerator addressing the prohibitive computing demand of homomorphic encryption. In particular, the accelerator targets the most time-consuming operation used by the encryption primitive, large integer multiplication. Based on an Altera Stratix V FPGA platform, the prototype implementation achieves significant improvements in terms of execution time (under a comparable hardware cost) against alternative solutions previously presented in the technical literature.

16:30 12.6.2 THROUGHPUT ORIENTED FPGA OVERLAYS USING DSP BLOCKS
Speaker:
Douglas L. Maskell, Nanyang Technological University, SG
Authors:
Abhishek K. Jain1, Douglas L. Maskell1 and Suhaib A. Fahmy2
1Nanyang Technological University, SG; 2University of Warwick, GB
Abstract
Design productivity is a major concern preventing the mainstream adoption of FPGAs. Overlay architectures have emerged as one possible solution to this challenge, offering fast compilation and software-like programmability. However, overlays typically suffer from area and performance overheads due to limited consideration for the underlying FPGA architecture. These overlays have often been of limited size, supporting only relatively small compute kernels. This paper examines the possibility of developing larger, more efficient, overlays using multiple DSP blocks and then maximising utilisation by mapping multiple instances of kernels simultaneously onto the overlay to exploit kernel level parallelism. We show a significant improvement in achievable overlay size and overlay utilisation, with a reduction of almost 70% in the overlay tile requirement compared to existing overlay architectures, an operating frequency in excess of 300 MHz, and kernel throughputs of almost 60 GOPS.

17:00 12.6.3 RUN-TIME PHASE PREDICTION FOR A RECONFIGURABLE VLIW PROCESSOR
Speaker:
Stephan Wong, TUDelft, NL
Authors:
Qi Guo1, Anderson Sartor2, Anthony Brandon3, Xuehai Zhou1 and Stephan Wong3
1University of Science and Technology of China, CN; 2Universidade Federal do Rio Grande do Sul (UFRGS), BR; 3TUDelft, NL
Abstract
It is well-known that different applications exhibit varying amounts of ILP. Execution of these applications on the same fixed-width VLIW processor will result (1) in wasted energy due to underutilized resources if the issue-width of the processor is larger than the inherent ILP; or alternatively, (2) in lower performance if the issue-width is smaller than the inherent ILP. Moreover, even within a single application distinct phases can be observed with varying ILP and therefore changing resource requirements. With this in mind, we designed the rVEX processor, which is a VLIW processor that can change its issue-width at run-time. In this paper, we propose a novel scheme to dynamically (i.e., at run-time) optimize the resource utilization by predicting and matching the number of active data-paths for each application phase. The purpose is to achieve low energy consumption for applications with low ILP, and high performance for applications with high ILP, on a single VLIW processor design. We prototyped the rVEX processor on an FPGA and obtained the dynamic traces of applications running on top of a Linux port. Our results show that it is possible in some cases to achieve the performance of an 8-issue core with 10% lower energy consumption, while in others we achieve the energy consumption of a 2-issue core with close to 20% lower execution time.

17:30 End of session

12.7 Formal System Level Verification

Date: Thursday 17 March 2016
Time: 16:00 - 17:30
Location / Room: Konferenz 5

Chair:
Mathias Soeken, École Polytechnique Fédérale de Lausanne (EPFL), CH

Co-Chair:
Gianpiero Cabodi, Politecnico di Torino, IT

The session considers verification at the system level. The first paper deals with the combination of protocols and networks. The second paper refines real-time analysis from task level to the level of runnable entities within tasks. The third one focuses on the correctness of synthesis from high level models.

Time Label Presentation Title
Authors
16:00 12.7.1 ADVOCAT: AUTOMATED DEADLOCK VERIFICATION FOR ON-CHIP CACHE COHERENCE AND INTERCONNECTS
Speaker:
Freek Verbeek, Open University of The Netherlands, NL
Authors:
Freek Verbeek1, Pooria Yaghini2, Ashkan Eghbal2 and Nader Bagherzadeh2
1Open University of The Netherlands, NL; 2University of California, Irvine, US
Abstract
Cache coherence plays a major role in manycore systems. The verification of deadlocks is a particular challenge, because deadlock freedom is an emergent property. Formal methods often decouple verification of the protocol from verification of the communication interconnect. Modern communication fabrics, however, are becoming more advanced and include a network topology, routing, arbitration, synchronization, and more. In this paper, an integrated approach called ADVOCAT is proposed that allows cross-layer verification of both the cache coherence protocol and the communication fabric all at once. An automated methodology for deriving cross-layer invariants is proposed. These invariants relate the state of the application-layer protocols to en route packets in the communication fabric. We apply this methodology in a case study where cross-layer deadlocks occur if queues are wrongly sized. Our methodology is generally applicable and shows promising scalability.

16:30 12.7.2 GUARANTEES FOR RUNNABLE ENTITIES WITH HETEROGENEOUS REAL-TIME REQUIREMENTS
Speaker:
Leonie Ahrendts, Technische Universität Braunschweig, DE
Authors:
Leonie Ahrendts, Zain A. H. Hammadeh and Rolf Ernst, Technische Universität Braunschweig, DE
Abstract
Classical real-time (RT) analysis proves temporal properties of tasks. In industrial practice, however, tasks are often composed of runnable entities with heterogeneous RT requirements. If RT guarantees are only available at task granularity, the strictest RT requirement of a runnable entity determines the RT requirement of the entire task. However, by giving RT guarantees for each runnable entity, this over-provisioning can be avoided. We provide an analysis which is fine-grained enough to provide hard and weakly-hard response time guarantees for runnable entities and show the improvement in an industrial case study.

17:00 12.7.3 VALIDATING SCHEDULING TRANSFORMATION FOR BEHAVIORAL SYNTHESIS
Speaker:
Sandip Ray, Intel Corporation, US
Authors:
Zhenkun Yang1, Kecheng Hao2, Kai Cong3, Li Lei1, Sandip Ray3 and Fei Xie1
1Portland State University, US; 2Xilinx Inc., US; 3Intel Corporation, US
Abstract
Behavioral synthesis automatically compiles an electronic system-level description of a hardware design into an RTL implementation. Scheduling in behavioral synthesis is an important, sophisticated, and error-prone transformation which converts the untimed or partially timed description into a fully timed implementation. We present a scalable equivalence checking algorithm for validating scheduling transformations. The equivalence checking accounts for control/data dependency, scheduling modes, and subtle interface protocols. Our experimental results demonstrate that our approach scales to industrial benchmarks. Furthermore, our checker found bugs in industrial synthesis tool implementations.

17:30 End of session