Booklet Proof Reading

1.1 Opening Session: Plenary, Awards Ceremony & Keynote Addresses

Date: Tuesday, March 26, 2019
Time: 08:30 - 10:30
Location / Room: Palazzo dei Congressi

Chair:
Jürgen Teich, DATE 2019 General Chair, Friedrich-Alexander-Universität Erlangen-Nürnberg, DE

Co-Chair:
Franco Fummi, DATE 2019 Programme Chair, Università di Verona, IT

Time | Label | Presentation Title / Authors
08:30 1.1.1 WELCOME ADDRESSES
Speakers:
Jürgen Teich1 and Franco Fummi2
1Friedrich-Alexander-Universität Erlangen-Nürnberg, DE; 2Università di Verona, IT
08:45 1.1.2 PRESENTATION OF AWARDS
09:15 1.1.3 KEYNOTE: WORKING WITH SAFE, DETERMINISTIC AND SECURE INTELLIGENCE FROM CLOUD TO EDGE
Speaker:
Astrid Elbe, Intel, DE
Abstract
The Internet of Things (IoT) will be the largest revolution in the data economy. At Intel, we understand the exponential power of data, and we're making it practical and economical to put it to work from the edge to the cloud. Intel® technologies purpose-built for IoT deliver optimized performance at every point, practical ways to use artificial intelligence, broad connectivity support, and a built-in foundation of functional safety, time determinism and security to help protect your data and systems and make them dependable. By harnessing the massive flood of data generated by connected things—and using it to gain actionable insights—we'll accelerate business transformation to a degree never seen before. Managing services and infrastructure at the edge is a complex balancing act that has to meet much more demanding timing and dependability constraints, and requires vastly more speed and precision, than a conventional cloud data center. Satisfying the competing objectives of stringent Quality of Service (QoS) and workload consolidation in this complex IoT environment requires new approaches and advancements. Virtualization alone does not deliver the full potential of this IoT transformation: challenging industrial workloads, for example, will need an automatic and self-managing approach.
1.1.4 KEYNOTE: ASSISTED AND AUTOMATED DRIVING
Speaker:
Jürgen Bortolazzi, Porsche, DE
Abstract

Since the introduction of Park Distance Control and Adaptive Cruise Control in the mid-2000s, PORSCHE has followed a systematic strategy for bringing driver assistance and automated driving to its product lines. There is no contradiction with the philosophy of a sports car: customers who enjoy driving on their own when traffic conditions permit expect significant ease of driving in stressful, time-consuming situations such as traffic jams or heavily occupied parking spaces. Furthermore, new functionalities such as the predictive InnoDrive system, which enables efficient cruise control based on sophisticated planning algorithms, provide a perfect contribution to the PORSCHE Intelligent Performance strategy.
Although the common discussion focuses on the higher levels of automation, SAE Level 3 and Level 4, Level 1 and 2 systems will play a significant role for at least the next decade, remaining the technological state of the art for the majority of cars. Therefore, PORSCHE focuses on increasing the performance and functionality of Level 1/2 driver assistance systems in parallel to participating in development programmes that enable Level 3/4 automated driving. This offers the opportunity to systematically build the necessary competency both in the technological fields of sensing, sensor fusion, planning and control, and in the processes, methods and tools that are mandatory to develop, approve and release higher-level automated systems. Systems engineering has to be combined with approaches for processing very large amounts of data, and traditional random road-based testing has to be replaced by a combination of virtual and systematic real-world testing. Last but not least, a new end-to-end EE architecture is necessary to provide seamless integration of the vehicle into an IT-based service infrastructure.

The keynote will address the following topics:

  • Benefits and challenges of assisted and automated driving
  • Status of L1/2 assisted driving
  • Challenges and technology assets for L3/4 automated driving
  • Data driven development methodologies
  • End-to-End Electronic Architecture (E³)
10:30 End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Breaks (Lunch Area)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the Lunch Area to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 26, 2019

Wednesday, March 27, 2019

Thursday, March 28, 2019


2.1 Executive Session 1: Panel "Life After CMOS"

Date: Tuesday, March 26, 2019
Time: 11:30 - 13:00
Location / Room: Room 1

Organisers:
Marco Casale-Rossi, Synopsys, IT
Jamil Kawa, Synopsys, US

Moderator:
G. Dan Hutcheson, VLSI Research, US

Sixty years ago, Robert Noyce filed U.S. Patent 2,981,877, which marked the birth of the monolithic integrated circuit. Roughly thirty years later, the IC broke the 1-micron barrier — a 100X improvement over Noyce's IC. Today, 7-nanometer is in early manufacturing, and 5-nanometer is under development, marking another 100X improvement. This cannot continue forever: the silicon atom's diameter is 2.92 Ångstroms, approximately 0.3 nanometers. Even if we envision the use of atomic layer epitaxy — where we take the cross section of a FET to be that of a single-silicon-atom source/drain separated by a channel of a single silicon-atom vacancy, leaving a handful of available carriers — will it be possible to design and manufacture an IC made of trillions of those transistors in volume? And even if progress under Moore's law continues relentlessly by going vertical, we have already infringed the second clause of Moore's law: "at the same cost." Yet the computing and memory requirements of artificial intelligence (AI), biochemistry, medicine, pharmacology, and physics applications greatly exceed the capabilities of current electronics and are unlikely to be met by evolutionary improvements in devices, channel and interconnect materials, or integrated circuit architectures alone. Meeting them means thinking outside the CMOS box. Many are already doing so with circuits based on superconducting electronics (SCE) and architectures based on quantum computing (QC), where researchers have made significant advances in recent years. In the short term, Josephson-junction-based SCE promises to reinvigorate high-performance computing (HPC) by delivering at least an order of magnitude more performance while using 100-1,000X less power. In the long term, today's foundations for new classes of computers based on the laws of quantum physics — quantum computers — may dramatically change the landscape of HPC; they already bring the promise of solving today's intractable problems.
This panel, moderated by Dan Hutcheson, brings together some of the industry's greatest thinkers to explore these questions, to color an image of our industry's future, and to go beyond the CMOS box.

Panelists:

  • Alessandro Curioni, IBM, CH
  • Antun Domic, Synopsys, US
  • Mark Heiligman, IARPA, US
  • Buvna Ayyagari-Sangamalli, AMAT, US
13:00 End of session
Lunch Break in Lunch Area





2.2 Physical Attacks

Date: Tuesday, March 26, 2019
Time: 11:30 - 13:00
Location / Room: Room 2

Chair:
Lejla Batina, Radboud University, NL

Co-Chair:
Elif Kavun, University of Sheffield, GB

This session covers state-of-the-art fault analysis techniques such as persistent fault analysis, electromagnetic fault injection, and glitching. In addition, a practical attack on a very popular platform is described together with its corresponding countermeasure. Other topics in this session include exploiting the reconfigurability of FPGAs to defend against side-channel attacks and spying on an IoT device's temperature via DRAM.

Time | Label | Presentation Title / Authors
11:30 2.2.1 ONE FAULT IS ALL IT NEEDS: BREAKING HIGHER-ORDER MASKING WITH PERSISTENT FAULT ANALYSIS
Speaker:
Shivam Bhasin, Nanyang Technological University, SG
Authors:
Jingyu Pan1, Shivam Bhasin2, Fan Zhang3 and Kui Ren3
1Nanyang Technological University, Zhejiang University, CN; 2Nanyang Technological University, SG; 3Zhejiang University, CN
Abstract
Persistent fault analysis (PFA) was proposed at CHES 2018 as a novel fault analysis technique. It was shown to completely defeat standard redundancy-based countermeasures against fault analysis. In this work, we investigate the security of masking schemes against PFA. We show that with only one fault injection, masking countermeasures can be broken at any masking order. The study is performed on publicly available implementations of masking.

Download Paper (PDF; Only available from the DATE venue WiFi)
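The core trick behind persistent fault analysis can be illustrated with a toy example. The sketch below is only an illustration of the general PFA idea, not the authors' attack on masking: the 4-bit "cipher", the key value, and the fault location are all invented. A single persistently faulted S-box entry makes one S-box output value impossible, so one ciphertext value never occurs, and its absence leaks the key:

```python
import random

# Toy persistent-fault-analysis (PFA) sketch on a reduced 4-bit "cipher".
SBOX = list(range(16))
random.seed(1)
random.shuffle(SBOX)          # a random 4-bit S-box permutation

KEY = 0xB                     # secret last-round key nibble (made up)

def faulty_sbox(sbox):
    # A persistent fault overwrites one table entry; the original output
    # value of that entry can then never appear at the S-box output.
    faulted = list(sbox)
    faulted[3] = faulted[5]   # entry 3 now duplicates entry 5
    return faulted, sbox[3]   # also return the value that vanished

def last_round(x, sbox):
    return sbox[x] ^ KEY      # ciphertext nibble = S(x) xor key

def recover_key(ciphertexts, missing_value):
    # `missing_value` never leaves the S-box, so the ciphertext value
    # missing_value ^ KEY never occurs; its absence reveals the key.
    seen = set(ciphertexts)
    absent = [c for c in range(16) if c not in seen]
    return [a ^ missing_value for a in absent]

fsbox, vanished = faulty_sbox(SBOX)
cts = [last_round(random.randrange(16), fsbox) for _ in range(2000)]
candidates = recover_key(cts, vanished)
print(candidates)  # with enough ciphertexts, only the true key remains
```

With enough ciphertexts every reachable value appears, so exactly one ciphertext value stays absent and the key candidate list collapses to the true key.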
12:00 2.2.2 MULTI-TENANT FPGA-BASED RECONFIGURABLE SYSTEMS: ATTACKS AND DEFENSES
Speaker:
Rana Elnaggar, Duke University, US
Authors:
Rana Elnaggar1, Ramesh Karri2 and Krishnendu Chakrabarty1
1Duke University, US; 2NYU, US
Abstract
Partial reconfiguration of FPGAs improves system performance, increases utilization of hardware resources, and enables run-time update of system capabilities. However, the sharing of FPGA resources among various tenants presents security risks that affect the privacy and reliability of tenant applications running in the FPGA-based system. In this study, we examine the security ramifications of co-tenancy with a focus on address-redirection and task-hiding attacks. We design a countermeasure that protects FPGA-based systems against such attacks and prove that it resists these attacks. We present simulation results and an experimental demonstration using a Xilinx FPGA board to highlight the effectiveness of the countermeasure. The proposed countermeasure incurs negligible cost in terms of the area utilization of FPGAs currently used in the cloud.

12:30 2.2.3 SPYING ON TEMPERATURE USING DRAM
Speaker:
Nikolaos Athanasios Anagnostopoulos, TU Darmstadt, DE
Authors:
Wenjie Xiong1, Nikolaos Athanasios Anagnostopoulos2, André Schaller2, Stefan Katzenbeisser2 and Jakub Szefer1
1Yale University, US; 2Technische Universität Darmstadt, DE
Abstract
Today's ubiquitous IoT devices make spying on, and collecting data from, unsuspecting users possible. This paper shows a new attack in which DRAM modules, widely used in IoT devices, can be abused to measure the temperature in the vicinity of the device in order to spy on a user's behavior. Specifically, the temperature dependency of DRAM decay is used as a proxy for the user's behavior in the vicinity of the device. The attack can be performed remotely by only changing the software of an IoT device, without requiring hardware changes, and with a resolution reaching 0.5 °C. Potential defenses to the temperature-spying attack are presented in this paper as well.

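The attack rests on the fact that DRAM cell decay accelerates with temperature, so a decayed-bit count can be inverted into a temperature estimate once the decay curve is calibrated. A minimal sketch, assuming a simple exponential model with made-up constants (the paper characterises real modules rather than using this formula):

```python
import math

# Hypothetical calibration: number of decayed bits grows exponentially
# with temperature, flips = a * exp(b * T). Constants a, b are invented.
def decayed_bits(temp_c, a=2.0, b=0.08):
    return a * math.exp(b * temp_c)

def estimate_temp(flips, a=2.0, b=0.08):
    # Invert the calibrated model to recover temperature from a count.
    return math.log(flips / a) / b

# Sanity check: a reading synthesised at 30 °C maps back to 30 °C.
reading = decayed_bits(30.0)
print(round(estimate_temp(reading), 1))  # -> 30.0
```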
12:45 2.2.4 MITIGATING POWER SUPPLY GLITCH BASED FAULT ATTACKS WITH FAST ALL-DIGITAL CLOCK MODULATION CIRCUIT
Speaker:
Nikhil Chawla, Georgia Institute of Technology, US
Authors:
Arvind Singh1, Monodeep Kar2, Nikhil Chawla1 and Saibal Mukhopadhyay1
1Georgia Institute of Technology, US; 2Intel Corporation, US
Abstract
This paper experimentally demonstrates that an on-chip integrated fast all-digital clock modulation (F-ADCM) circuit can be used as a countermeasure against supply-glitch and temperature-variation-based fault injection attacks (FIA). The F-ADCM circuit modulates clock edges in the presence of DC/transient supply glitches and temperature variations to ensure correct operation of the underlying cryptographic circuit. With a test chip manufactured in a 130 nm CMOS process, we first demonstrate an inexpensive methodology to conduct a fault attack on a hardware implementation of a 128-bit Advanced Encryption Standard (AES) engine using externally controlled supply glitches. Next, we show that with the F-ADCM circuit it is no longer possible to inject supply/temperature glitch-based faults, even after 10 million encryptions across varying operating conditions. Moreover, in extreme operating conditions, the F-ADCM circuit does not generate any clock edges, leading to complete failure of the AES encryption and indicating that no exploitable faults are present.

13:00 IP1-1, 384 FAULT INJECTION ON HIDDEN REGISTERS IN A RISC-V ROCKET PROCESSOR AND SOFTWARE COUNTERMEASURES
Speaker:
Johan Laurent, Univ. Grenoble Alpes, Grenoble INP, LCIS, FR
Authors:
Johan Laurent1, Vincent Beroulle1, Christophe Deleuze1 and Florian Pebay-Peyroula2
1LCIS - Grenoble Institute of Technology - Univ. Grenoble Alpes, FR; 2CEA-Leti, FR
Abstract
To protect against hardware fault attacks, developers can use software countermeasures. They are generally designed to thwart software fault models such as instruction skip or memory corruption. However, these typical models do not take into account the actual implementation of a processor. By analyzing the processor microarchitecture, it is possible to bypass typical software countermeasures. In this paper, we analyze the vulnerability of a secure code from FISSC (Fault Injection and Simulation Secure Collection), by simulating fault injections in a RISC-V Rocket processor RTL description. We highlight the importance of hidden registers in the processor pipeline, which temporarily hold data during code execution. Secret data can be leaked by attacking these hidden registers. Software countermeasures against such attacks are also proposed.

13:01 IP1-2, 476 METHODOLOGY FOR EM FAULT INJECTION: CHARGE-BASED FAULT MODEL
Speaker:
Haohao Liao, University of Waterloo, CA
Authors:
Haohao Liao and Catherine Gebotys, University of Waterloo, CA
Abstract
Recently, electromagnetic fault injection (EMFI) techniques have been found to have significant implications for the security of embedded devices. Unfortunately, there is still a lack of understanding of EM fault models and countermeasures for embedded processors. For the first time, this paper proposes an extended fault model based on the concept of critical charge and a new EMFI backside methodology based on over-clocking. Results show that exact timing of EM pulses can provide reliable, repeatable instruction-replacement faults for specific programs. An attack on AES is demonstrated showing that the EM fault injection requires on average fewer than 222 EM pulses and 5.3 plaintexts to retrieve the full AES key. This research is critical for ensuring embedded processors and their instruction set architectures are secure and resistant to fault injection attacks.

13:02 IP1-3, 807 SECURING CRYPTOGRAPHIC CIRCUITS BY EXPLOITING IMPLEMENTATION DIVERSITY AND PARTIAL RECONFIGURATION ON FPGAS
Speaker:
Benjamin Hettwer, Robert Bosch GmbH, DE
Authors:
Benjamin Hettwer1, Johannes Petersen2, Stefan Gehrer1, Heike Neumann2 and Tim Güneysu3
1Robert Bosch GmbH, Corporate Sector Research, DE; 2Hamburg University of Applied Sciences, DE; 3Horst Görtz Institute for IT Security, Ruhr-University Bochum, DE
Abstract
Adaptive and reconfigurable systems such as Field Programmable Gate Arrays (FPGAs) play an integral part in many complex embedded platforms. This implies the capability to perform runtime changes to hardware circuits on demand. In this work, we make use of this feature to propose a novel countermeasure against physical attacks on cryptographic implementations. In particular, we leverage exploration of the implementation space on FPGAs to create various circuits with different hardware layouts from a single design of the Advanced Encryption Standard (AES), which are dynamically exchanged during device operation. We provide evidence from practical experiments based on a modern Xilinx ZYNQ UltraScale+ FPGA that our approach increases the resistance against physical attacks by a factor of at least two. Furthermore, the generic nature of our approach allows easy adaptation to other algorithms and combination with other countermeasures.

13:03 IP1-4, 367 STT-ANGIE: ASYNCHRONOUS TRUE RANDOM NUMBER GENERATOR USING STT-MTJ
Speaker:
Ben Perach, Faculty of Electrical Engineering, Technion - Israel Institute of Technology, IL
Authors:
Ben Perach and Shahar Kvatinsky, Technion, IL
Abstract
The Spin Transfer Torque Magnetic Tunnel Junction (STT-MTJ) is an emerging memory technology whose interesting stochastic behavior might benefit security applications. In this paper, we leverage this stochastic behavior to construct a true random number generator (TRNG), the basic module in the process of encryption key generation. Our proposed TRNG operates asynchronously and thus can use small and fast STT-MTJ devices. As such, it can be embedded in low-power and low-frequency devices without loss of entropy. We evaluate the proposed TRNG using a numerical simulation, solving the Landau-Lifshitz-Gilbert (LLG) equation system of the STT-MTJ devices. Design considerations, attack analysis, and process variation are discussed and evaluated. The evaluation shows that our solution is robust to process variation, achieving a Shannon-entropy generation rate between 99.7 Mbps and 127.8 Mbps for 90% of the instances.

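The Shannon-entropy rate quoted above can be estimated from a raw bitstream with a simple frequency test. The sketch below is a generic single-bit entropy estimator (not the paper's LLG-based evaluation), applied to a fair and a deliberately biased synthetic stream:

```python
import math
import random

def shannon_entropy_per_bit(bits):
    # Empirical single-bit Shannon entropy: H = -p0*log2(p0) - p1*log2(p1).
    n = len(bits)
    p1 = sum(bits) / n
    if p1 in (0.0, 1.0):
        return 0.0
    p0 = 1.0 - p1
    return -(p0 * math.log2(p0) + p1 * math.log2(p1))

random.seed(0)
fair = [random.getrandbits(1) for _ in range(100_000)]
biased = [1 if random.random() < 0.9 else 0 for _ in range(100_000)]
print(round(shannon_entropy_per_bit(fair), 3))    # close to 1.0 bit/bit
print(round(shannon_entropy_per_bit(biased), 3))  # well below 1.0
```

A real TRNG assessment would also test higher-order statistics (e.g. bit correlations), which a per-bit frequency estimate alone cannot catch.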
13:00 End of session
Lunch Break in Lunch Area





2.3 Special Session: Circuit Design and Design Automation for Flexible Electronics

Date: Tuesday, March 26, 2019
Time: 11:30 - 13:00
Location / Room: Room 3

Organisers:
Tsung-Ching (Jim) Huang, Hewlett Packard Labs, US
Mehdi Tahoori, Karlsruhe Institute of Technology (KIT), DE

Chair:
Jamil Kawa, Synopsys, US

Flexible electronics is an emerging and fast-growing field with many demanding applications in domains such as wearables, smart sensors, and the Internet of Things (IoT). Several technologies, processes and paradigms can be used to design and fabricate flexible circuits. Unlike the traditional computing and electronics domain, which is mostly driven by performance characteristics, flexible electronics is mainly associated with low fabrication cost (as such circuits are used even in the consumer market) and low energy consumption (as they may be used in energy-harvested systems). While the main advances in this field have focused on fabrication and process aspects, design, and in particular the design automation flow, has had limited exposure. The purpose of this special session is to bring to the attention of the design automation community some of the key advances in the field of flexible electronics, as well as some of the design (automation) aspects, which will hopefully inspire further attention from the design automation community to this fast-growing field.

Time | Label | Presentation Title / Authors
11:30 2.3.1 DUAL-GATE SELF-ALIGNED A-INGAZNO TRANSISTOR MODEL FOR FLEXIBLE CIRCUIT APPLICATIONS
Speaker:
Kris Myny, imec, BE
Authors:
Florian De Roose, Hikmet Çeliker, Jan Genoe, Wim Dehaene and Kris Myny, imec, BE
Abstract
This work elaborates on an amorphous Indium- Gallium-Zinc Oxide thin-film transistor model for a dual-gate self-aligned transistor configuration, enabling the design and realization of complex integrated circuits. The model originates from a mobility-enhanced transistor behavior model, whereby the additional backgate impacts key parameters, such as threshold voltage, mobility and subthreshold slope. The model has been validated for the full design flow and compared to measurement results, from single transistors, to inverters, ring oscillators and RFID transponder chips.

11:52 2.3.2 PREDICTIVE MODELING AND DESIGN AUTOMATION OF INORGANIC PRINTED ELECTRONICS
Speaker:
Jasmin Aghassi-Hagmann, Offenburg University of Applied Sciences / Institute of Nanotechnology at Karlsruhe Institute of Technology, DE
Authors:
Farhan Rasheed1, Michael Hefenbrock1, Rajendra Bishnoi1, Michael Beigl1, Jasmin Aghassi-Hagmann2 and Mehdi B. Tahoori1
1Karlsruhe Institute of Technology (KIT), DE; 2Offenburg University of Applied Sciences / Institute of Nanotechnology at Karlsruhe Institute of Technology, DE
Abstract
Printed electronics is perceived to have a major impact in the fields of smart sensors, the Internet of Things and wearables. Especially low-power printed technologies such as electrolyte-gated field effect transistors (EGFETs) using solution-processed inorganic materials and inkjet printing are very promising in such application domains. In this paper, we discuss a modeling approach to describe the variations of printed devices. Incorporating these models and design flows into our previously developed printed design system allows for robust circuit design. Additionally, we propose a reliability-aware routing solution for printed electronics technology based on the technology constraints in printing crossovers. The proposed methodology was validated on multiple benchmark circuits and can be easily integrated with the design automation tool-set.

12:14 2.3.3 PROCESS DESIGN KIT AND DESIGN AUTOMATION FOR FLEXIBLE HYBRID ELECTRONICS
Speaker:
Tsung-Ching Jim Huang, Hewlett-Packard Labs, US
Authors:
Tsung-Ching Jim Huang1, Ting Lei2, Leilai Shao3, Sridhar Sivapurapu4, Madhavan Swaminathan4, Sicheng Li5, Zhenan Bao2, Kwang-Ting Cheng3 and Raymond Beausoleil5
1Hewlett-Packard Labs, US; 2Department of Chemical Engineering, Stanford University, US; 3Department of Electrical and Computer Engineering, University of California, US; 4School of Electrical and Computer Engineering, Georgia Institute of Technology, US; 5Hewlett Packard Labs, Palo Alto, US
Abstract
High-performance, low-cost flexible hybrid electronics (FHE) are desirable for the internet of things (IoT). The carbon-nanotube (CNT) thin-film transistor (TFT) is a promising candidate for high-performance FHE because of its high carrier mobility (25 cm²/V·s), superior mechanical flexibility/stretchability, and material compatibility with low-cost printing and solution processes. Flexible sensors and peripheral CNT-TFT circuits, such as decoders, drivers and sense amplifiers, can be printed and integrated with thinned (<50 µm) silicon chips on soft, thin, and flexible substrates for appealing product designs and form factors. Here we report: 1) a process design kit (PDK) to enable FHE design automation, from device modeling to physical verification, and 2) open-source and solution-process-proven intellectual property (IP) blocks, including Pseudo-CMOS digital logic and analog amplifiers on flexible substrates, as shown in Figure 1. The proposed FHE-PDK and circuit design IP are fully compatible with silicon design EDA tools and can be readily used for co-design with both CNT-TFT circuits and silicon chips.

12:36 2.3.4 CIRCUIT DESIGN AND DESIGN AUTOMATION FOR PRINTED ELECTRONICS
Speaker:
Eugenio Cantatore, Eindhoven University of Technology, NL
Authors:
M. Fattori1, J.A. Fijn1, L. Hu1, Eugenio Cantatore1, Fabrizio Torricelli2 and Micael Charbonneau3
1Eindhoven University of Technology, NL; 2University of Brescia, IT; 3CEA-LITEN, FR
Abstract
A Process Design Kit (PDK) for gravure-printed Organic Thin-Film Transistor (OTFT) technology is presented in this paper. The transistor model developed in the PDK enables an accurate prediction of static, dynamic and noise performance of complex organic circuits. The developed Electronic Design Automation (EDA) tools exploit an adaptive strategy to improve the versatility of the PDK in relation to the advancements of the manufacturing process. The design and experimental characterization of a Charge Sensitive Amplifier is used to demonstrate the effectiveness of the PDK. The availability of a versatile and accurate Process Design Kit is expected to enable a reliable design process for complex circuits based on an organic printed technology.

13:00 End of session
Lunch Break in Lunch Area





2.4 Temperature and Variability Driven Modeling and Runtime Management

Date: Tuesday, March 26, 2019
Time: 11:30 - 13:00
Location / Room: Room 4

Chair:
Marco Domenico Santambrogio, Polytechnic University of Milan, IT

Co-Chair:
Ronald G. Dreslinski Jr, University of Michigan, US

Thermal modelling, hot-spot prediction and optimization, and managing temperature variability at run time are key challenges to be addressed during system design. This session consists of four regular papers and two IP papers that address these challenges using novel techniques ranging from manufacturing and hardware all the way up to computational models. Considerations such as lithographic variations, cooling system design, and run-time adaptivity are discussed in these papers.

Time | Label | Presentation Title / Authors
11:30 2.4.1 HOT SPOT IDENTIFICATION AND SYSTEM PARAMETERIZED THERMAL MODELING FOR MULTI-CORE PROCESSORS THROUGH INFRARED THERMAL IMAGING
Speaker:
Sheldon Tan, University of California, Riverside, US
Authors:
Sheriff Sadiqbatcha1, Hengyang Zhao1, Hussam Amrouch2, Joerg Henkel2 and Sheldon Tan1
1University of California, Riverside, US; 2Karlsruhe Institute of Technology, DE
Abstract
Accurate thermal models suitable for system-level dynamic thermal, power and reliability regulation and management are vital for many commercial multi-core processors. However, developing such accurate thermal models and identifying the related thermal-power-relevant spatial locations for commercial processors is a challenging task due to the lack of information and available tools. Existing tools such as HotSpot-like thermal models may suffer from inaccuracy or inefficiency for online applications, primarily because most rely on parameters that cannot be precisely quantified, such as power traces, while others are numerical methods not suitable for runtime use. In this work, we propose a novel approach to automatically detecting the major heat sources on a commercial multi-core microprocessor using an infrared thermal imaging setup. Our approach involves a number of steps, including a 2D discrete cosine transform filter for noise reduction on the measured thermal maps, and a Laplacian transformation followed by K-means clustering for heat-source identification. Since the identified heat sources are the thermally vulnerable areas of the die, we propose a novel approach to deriving a thermal model capable of predicting their temperatures during runtime. We apply Long Short-Term Memory (LSTM) networks to build a dynamic thermal model which uses system-level variables such as chip frequency, voltage and instruction count as inputs. The model is trained and tested exclusively using measured thermal data from a commercial multi-core processor. Experimental results show that the proposed thermal model achieves very high accuracy (root-mean-square error: 2.04 °C to 2.57 °C) in predicting the temperature of all the identified heat sources on the chip.

12:00 2.4.2 LITHO-GPA: GAUSSIAN PROCESS ASSURANCE FOR LITHOGRAPHY HOTSPOT DETECTION
Speaker:
David Z. Pan, University of Texas, Austin, US
Authors:
Wei Ye, Mohamed Baker Alawieh, Meng Li, Yibo Lin and David Z. Pan, University of Texas, Austin, US
Abstract
Lithography hotspot detection is one of the fundamental steps in physical verification. Due to increasingly complicated design patterns, early and quick feedback on lithography hotspots is desired to guide design closure in its early stages. Machine learning approaches have been successfully applied to hotspot detection while demonstrating a remarkable capability of generalization to unseen hotspot patterns. However, most of the proposed machine learning approaches are not yet able to answer one critical question: how much can a hotspot predicted by a trained model be trusted? In this work, we present Litho-GPA, a lithography hotspot detection framework with Gaussian Process assurance to provide confidence in each prediction. The framework also incorporates a data selection scheme with a sequence of weak classifiers to sample representative data and eventually reduce the amount of training data and lithography simulations needed. Experimental results demonstrate that our Litho-GPA is able to achieve state-of-the-art accuracy while obtaining on average a 28% reduction in false alarms.

12:30 2.4.3 PINT: POLYNOMIAL IN TEMPERATURE DECODE WEIGHTS IN A NEUROMORPHIC ARCHITECTURE
Speaker:
Scott Reid, Stanford University, US
Authors:
Scott Reid, Antonio Montoya and Kwabena Boahen, Stanford University, US
Abstract
We present Polynomial in Temperature (PinT) decode weights, a novel approach to approximating functions with an ensemble of silicon neurons that increases thermal robustness. In mixed-signal neuromorphics, computing accurately across a wide range of temperatures is challenging because of individual silicon neurons' thermal sensitivity. To compensate for the resulting changes in the neuron's tuning-curves in the PinT framework, weights change continuously as a polynomial function of temperature. We validate PinT across a 38 °C range by applying it to tuning curves measured for ensembles of 64 to 1936 neurons on Braindrop, a mixed-signal neuromorphic chip fabricated in 28-nm FDSOI CMOS. LinT, the Linear in Temperature version of PinT, reduces error by a small margin on test data, relative to an ensemble with temperature-independent weights. LinT and higher-order models show much greater promise on training data, suggesting that performance can be further improved. When implemented on-chip, LinT's performance is very similar to the performance with temperature-independent decode weights. SpLinT and SpLSAT, the Sparse variants of LinT and LSAT, are promising avenues for efficiently reducing error. In the SpLSAT model, up to 90% of neurons on chip can be deactivated while maintaining the same function-approximation error.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:45 2.4.4 ENHANCING TWO-PHASE COOLING EFFICIENCY THROUGH THERMAL-AWARE WORKLOAD MAPPING FOR POWER-HUNGRY SERVERS
Speaker:
Arman Iranfar, EPFL, CH
Authors:
Arman Iranfar, Ali Pahlevan, Marina Zapater and David Atienza, EPFL, CH
Abstract
The power density and, consequently, the power hungriness of server processors are growing by the day. Traditional air cooling systems fail to cope with such high heat densities, whereas single-phase liquid cooling still requires a high mass flow rate, high pumping power, and a large facility size. On the contrary, in a micro-scale gravity-driven thermosyphon attached on top of a processor, the refrigerant absorbs the heat and turns into a two-phase mixture. The vapor-liquid mixture exchanges heat with a coolant at the condenser side, turns back into liquid, and descends thanks to gravity, eliminating the need for pumping power. However, similar to other cooling technologies, thermosyphon efficiency can vary considerably with workload performance requirements and thermal profile, in addition to platform features such as packaging and die floorplan. In this work, we first address the workload- and platform-aware design of a two-phase thermosyphon. Then, we propose a thermal-aware workload mapping strategy that considers the potential and limitations of a two-phase thermosyphon to further minimize hot spots and spatial thermal gradients. Our experiments, performed on an 8-core Intel Xeon E5 CPU, reveal on average up to a 10 °C reduction in thermal hot spots and a 45% reduction in the maximum spatial thermal gradient on the die. Moreover, our design and mapping strategy are able to decrease the chiller cooling power by at least 45%.

Download Paper (PDF; Only available from the DATE venue WiFi)
13:00 IP1-5, 711 ADAPTIVE TRANSIENT LEAKAGE-AWARE LINEARISED MODEL FOR THERMAL ANALYSIS OF 3-D ICS
Speaker:
Milan Mihajlovic, University of Manchester, GB
Authors:
Chao Zhang, Milan Mihajlovic and Vasilis Pavlidis, The University of Manchester, GB
Abstract
Physics-based models for thermal simulation that involve the numerical solution of the heat equation are well placed to accurately capture the heterogeneity of materials and structures in modern 3-D integrated circuits (ICs). Introducing non-linear effects in thermal coefficients and leakage power significantly improves the accuracy of thermal models. However, this non-linearity also significantly increases the complexity and computational time of the analysis. In this paper, we introduce a linearised thermal model by demonstrating that the weak temperature dependence of the specific heat and the thermal conductivity of silicon-based materials has only a minor effect on computed temperature profiles. Thus, these parameters can be considered constant in the working temperature ranges of modern ICs. The non-linearity in leakage power is approximated by a piecewise-linear least-squares fit, and the resulting model is linearised by an exact Newton method, contrary to previous works that employ either a simple iterative or an inexact Newton method. The method is implemented in the context of transient thermal analysis with adaptive time-step selection, where we demonstrate that it is essential to apply Newton corrections to obtain the right time-step size. The resulting method is up to 2x faster than a full non-linear method, typically introducing a global relative error of less than 1%.
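The role of the Newton corrections can be seen on a deliberately tiny stand-in for the non-linear problem: a single-node steady-state heat balance G(T - T_amb) = P_dyn + P_leak(T) with an exponential leakage model, solved by exact Newton iteration. All coefficients are invented for illustration, and the smooth leakage model stands in for the paper's piecewise-linear fit.

```python
G, T_amb, P_dyn = 0.5, 45.0, 10.0            # W/K, degC, W (illustrative values)

def p_leak(T):
    """Toy exponential leakage; the paper fits such a curve piecewise-linearly."""
    return 2.0 * (2.718281828459045 ** (0.02 * (T - 45.0)))

def dp_leak(T):
    """Exact derivative of p_leak, needed for the exact Newton step."""
    return 0.04 * (2.718281828459045 ** (0.02 * (T - 45.0)))

def residual(T):
    """Heat-balance residual; the root is the steady-state temperature."""
    return G * (T - T_amb) - P_dyn - p_leak(T)

T, iterations = 60.0, 0                      # initial guess
for _ in range(50):
    step = residual(T) / (G - dp_leak(T))    # exact Newton correction
    if abs(step) < 1e-12:
        break
    T -= step
    iterations += 1
```

Because leakage grows with temperature, omitting the derivative term (an inexact linearisation) would converge more slowly or misjudge the step, which is the paper's point about time-step selection.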

Download Paper (PDF; Only available from the DATE venue WiFi)
13:01 IP1-6, 363 FASTCOOL: LEAKAGE AWARE DYNAMIC THERMAL MANAGEMENT OF 3D MEMORIES
Speaker:
Lokesh Siddhu, IIT Delhi, IN
Authors:
Lokesh Siddhu1 and Preeti Ranjan Panda2
1Indian Institute of Technology, Delhi, IN; 2IIT Delhi, IN
Abstract
3D memory systems offer several advantages in terms of area, bandwidth, and energy efficiency. However, thermal issues arising from higher power densities have limited their widespread use. While prior works have looked at reducing dynamic power through reduced memory accesses, in these memories leakage and dynamic power consumption are comparable. Furthermore, as the temperature rises, the leakage power increases, creating a thermal-leakage loop. We study the impact of leakage power on 3D memory temperature and propose turning OFF hot channels to meet thermal constraints. Data is migrated to a 2D memory before closing a 3D channel. We introduce an analytical model to assess the 2D memory delay and use the model to guide data migration decisions. Our experiments show that the proposed optimization improves performance by 27% on average (up to 66%) over state-of-the-art strategies.

Download Paper (PDF; Only available from the DATE venue WiFi)
13:00 End of session
Lunch Break in Lunch Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Breaks (Lunch Area)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the Lunch Area to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 26, 2019

Wednesday, March 27, 2019

Thursday, March 28, 2019


2.5 Solutions for reliability and security of mixed-signal circuits

Date: Tuesday, March 26, 2019
Time: 11:30 - 13:00
Location / Room: Room 5

Chair:
Georges Gielen, KU Leuven, BE, Contact Georges Gielen

Co-Chair:
Manuel Barragan, TIMA, FR, Contact Manuel Barragan

The session presents techniques to analyse and optimize analog/mixed-signal circuits towards high reliability and security, addressing IR-aware routing and lifetime-aware optimization, as well as securing mixed-signal circuits via logic locking.

Time Label Presentation Title
Authors
11:30 2.5.1 IR-AWARE POWER NET ROUTING FOR MULTI-VOLTAGE MIXED-SIGNAL DESIGN
Speaker:
Mark Po-Hung Lin, National Chung Cheng University, TW
Authors:
Shuo-Hui Wang, Yen-Yu Su, Guan-Hong Liou and Mark Po-Hung Lin, National Chung Cheng University, TW
Abstract
Modern mixed-signal designs usually contain multiple power signals with different supply voltages driving different sets of mixed-signal circuit blocks. As process technology advances into the nanometer era, IR drop becomes very significant and may have a great impact on circuit performance and reliability. Insufficient power supply to a circuit block will lead to performance degradation or even functional failure. Although the IR-drop problem can be mitigated by widening metal wires or applying mesh routing structures to the power network, the extra metal usage of power nets with different supply voltages will significantly increase both chip area and cost. This paper presents a new IR-aware routing method that routes multiple power nets simultaneously while considering routing congestion, routing-tree splitting, wire tapering, and metal-layer optimization. Experimental results show that the presented method can effectively reduce total metal usage and satisfy IR-drop constraints.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:00 2.5.2 GENERATION OF LIFETIME-AWARE PARETO-OPTIMAL FRONTS USING A STOCHASTIC RELIABILITY SIMULATOR
Authors:
Antonio Toro-Frias1, Pablo Saraza-Canflanca1, Fabio Passos1, Pablo Martin-Lloret1, Rafael Castro-Lopez1, Elisenda Roca1, Javier Martin-Martinez2, Rosana Rodriguez2, Montserrat Nafria2 and Francisco Vidal Fernandez1
1Instituto de Microelectrónica de Sevilla, ES; 2Universitat Autonoma de Barcelona, ES
Abstract
Process variability and time-dependent variability have become major concerns in deeply-scaled technologies. Two of the most important time-dependent variability phenomena are Bias Temperature Instability (BTI) and Hot-Carrier Injection (HCI), which can critically shorten the lifetime of circuits. Both BTI and HCI reveal a discrete and stochastic behavior at the nanometer scale, and, while process variability has been extensively treated, there is a lack of design methodologies that address the joint impact of these two phenomena on circuits. In this work, an automated and time-efficient design methodology that takes into account both process and time-dependent variability is presented. The methodology is based on the use of lifetime-aware Pareto-Optimal Fronts (POFs). The POFs are generated with a multi-objective optimization algorithm linked to a stochastic simulator. Both the optimization algorithm and the simulator have been specifically tailored to reduce the computational cost of accurately evaluating the impact of both sources of variability on a circuit.
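At its core, a Pareto-Optimal Front is the non-dominated subset of evaluated design points. A generic two-objective sketch (both objectives to be minimised, e.g. power and a degradation measure; the design points are invented, not from the paper's simulator):

```python
import numpy as np

def pareto_front(points):
    """Return the non-dominated rows of an (n, 2) array of objectives,
    assuming both objectives are to be minimised."""
    pts = points[np.argsort(points[:, 0])]   # sort by the first objective
    front, best = [], np.inf
    for p in pts:
        if p[1] < best:                      # strictly improves objective 2
            front.append(p)
            best = p[1]
    return np.array(front)

# Hypothetical evaluated designs: (power, degradation) pairs.
designs = np.array([[1.0, 5.0], [2.0, 3.0], [2.5, 4.0],
                    [3.0, 1.0], [4.0, 2.0]])
front = pareto_front(designs)
```

Here [2.5, 4.0] and [4.0, 2.0] are dominated (another design is at least as good in both objectives and better in one), so the front keeps only the remaining three points.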

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30 2.5.3 MIXLOCK: SECURING MIXED-SIGNAL CIRCUITS VIA LOGIC LOCKING
Speaker:
Julian Leonhard, Sorbonne Université, CNRS, LIP6, FR
Authors:
Julian Leonhard1, Muhammad Yasin2, Shadi Turk3, Mohammed Thari Nabeel4, Marie-Minerve Louërat1, Roselyne Chotin-Avot1, Hassan Aboushady1, Ozgur Sinanoglu4 and Haralampos-G. Stratigopoulos1
1Sorbonne Université, CNRS, LIP6, FR; 2New York University, US; 3Seamless Waves, FR; 4New York University Abu Dhabi, AE
Abstract
In this paper, we propose a hardware security methodology for mixed-signal Integrated Circuits (ICs). The proposed methodology can be used as a countermeasure for IC piracy, including counterfeiting and reverse engineering. It relies on logic locking of the digital section of the mixed-signal IC, such that unless the correct key is provided, the mixed-signal performance will be pushed outside of the acceptable specification range. We employ a state-of-the-art logic locking technique, called Stripped Functionality Logic Locking (SFLL). We show that strong security levels are achieved in both mixed-signal and digital domains. In addition, the proposed methodology presents several appealing properties. It is non-intrusive for the analog section, it incurs reasonable area and power overhead, it can be fully automated, and it is virtually applicable to a wide range of mixed-signal ICs. We demonstrate it on a Sigma-Delta Analog-to-Digital Converter (ADC).

Download Paper (PDF; Only available from the DATE venue WiFi)
13:00 IP1-7, 364 ON THE USE OF CAUSAL FEATURE SELECTION IN THE CONTEXT OF MACHINE-LEARNING INDIRECT TEST
Speaker:
Manuel Barragan, TIMA Laboratory, FR
Authors:
Manuel Barragan1, Gildas Leger2, Florent Cilici3, Estelle Lauga-Larroze4, Sylvain Bourdel4 and Salvador Mir3
1TIMA Laboratory, FR; 2Instituto de Microelectronica de Sevilla, IMSE-CNM, (CSIC - Universidad de Sevilla), ES; 3TIMA, FR; 4RFICLab, FR
Abstract
The test of analog, mixed-signal and RF (AMS-RF) circuits is still considered a matter of human creativity, and although many attempts have been made towards its automation, no accepted and complete solution is yet available. Indeed, capturing the design knowledge of an experienced analog designer is one of the key challenges faced by the Electronic Design Automation (EDA) community. In this paper we explore the use of causal inference tools in the context of AMS-RF design and test, with the goal of defining a methodology for uncovering the root causes of performance variation in these systems. We believe that such an analysis can be a promising first step towards future EDA algorithms for AMS-RF systems.

Download Paper (PDF; Only available from the DATE venue WiFi)
13:00 End of session
Lunch Break in Lunch Area





2.6 Computational and resource-efficiency in quantum and approximate computing

Date: Tuesday, March 26, 2019
Time: 11:30 - 13:00
Location / Room: Room 6

Chair:
Martin Trefzer, University of York, GB, Contact Martin Albrecht Trefzer

Co-Chair:
Lukas Sekanina, Brno University of Technology, CZ, Contact Lukas Sekanina

Achieving computational and resource efficiency is often promised by emerging technologies. This session addresses various aspects of efficiency in the context of quantum and approximate computing. Simulating quantum computations is computationally very expensive; the first paper shows how smart utilization of decision diagrams can significantly accelerate this process. Approximate computing is often advocated as an approach that enables more resource-efficient computing systems. The second paper deals with an automated circuit approximation method capable of exploiting the data distributions observed in the target application. The third paper presents a new application for approximate circuits in the area of wireless sensor networks. Finally, the fourth paper proposes an efficient approximate random dropout technique for accelerating the training of neural networks on GPUs.

Time Label Presentation Title
Authors
11:30 2.6.1 MATRIX-VECTOR VS. MATRIX-MATRIX MULTIPLICATION: POTENTIAL IN DD-BASED SIMULATION OF QUANTUM COMPUTATIONS
Speaker:
Alwin Zulehner, Johannes Kepler University Linz, AT
Authors:
Alwin Zulehner and Robert Wille, Johannes Kepler University Linz, AT
Abstract
The simulation of quantum computations basically boils down to the multiplication of vectors (describing the respective quantum state) and matrices (describing the respective quantum operations). However, since those matrices/vectors are exponential in size, most existing solutions (relying on arrays for their representation) are either limited to rather small quantum systems or require substantial hardware resources. To overcome these shortcomings, solutions based on decision diagrams (DD-based simulation) have been proposed recently. They exploit redundancies in quantum states as well as matrices and thereby allow for a compact representation and manipulation. This offers further (unexpected) potential. In fact, simulation has thus far been conducted by applying one operation (i.e., one matrix-vector multiplication) after another. Besides that, there is the possibility to combine several operations (requiring a matrix-matrix multiplication) before applying them to a vector. But since, from a theoretical perspective, matrix-vector multiplication is significantly cheaper than matrix-matrix multiplication, the potential of this direction was rather limited thus far. In this work, we show that this changes when decision diagrams are employed. In fact, their more compact representation frequently makes matrix-matrix multiplication more beneficial, leading to substantial improvements from combining operations. Experimental results confirm that the proposed strategies for combining operations lead to speed-ups of several factors or, when additionally exploiting further knowledge about the considered instance, even of several orders of magnitude.
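The two simulation styles being compared can be shown on a tiny two-qubit example with explicit arrays (the paper works on decision diagrams, where the cost trade-off differs; this sketch only illustrates that the two orders of evaluation agree):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard gate
I2 = np.eye(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

state = np.zeros(4)
state[0] = 1.0                                  # |00>

# (a) One operation after another: two matrix-vector multiplications.
U1 = np.kron(H, I2)                             # H on the first qubit
step_by_step = CNOT @ (U1 @ state)

# (b) Combine the gates first: one matrix-matrix, then one matrix-vector.
combined = CNOT @ U1
fused = combined @ state

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)      # expected Bell state
```

With plain arrays, style (b) costs an extra O(N^3) matrix-matrix product; the paper's observation is that on compact decision-diagram representations this pre-combination can nonetheless pay off.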

Download Paper (PDF; Only available from the DATE venue WiFi)
12:00 2.6.2 AUTOMATED CIRCUIT APPROXIMATION METHOD DRIVEN BY DATA DISTRIBUTION
Speaker:
Lukas Sekanina, Brno University of Technology, CZ
Authors:
Zdenek Vasicek, Vojtech Mrazek and Lukas Sekanina, Brno University of Technology, CZ
Abstract
We propose an application-tailored, data-driven, fully automated method for the functional approximation of combinational circuits. We demonstrate how an application-level error metric such as classification accuracy can be translated to the component-level error metric needed for an efficient and fast search in the space of approximate low-level components used in the application. This is made possible by employing a weighted mean error distance (WMED) metric to steer the circuit approximation process, which is conducted by means of genetic programming. WMED introduces a set of weights (calculated from the data distribution measured on a selected signal in a given application) that determine the importance of each input vector for the approximation process. The method is evaluated using synthetic benchmarks and application-specific approximate MAC (multiply-and-accumulate) units designed to provide the best trade-offs between classification accuracy and power consumption for two image classifiers based on neural networks.
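The WMED idea, weighting each input's error distance by its frequency in the application, can be sketched exhaustively for a toy 8-bit component. Both the "approximate circuit" (dropping the result's LSB) and the data histogram are invented for illustration, not taken from the paper:

```python
import numpy as np

xs = np.arange(256)                       # all 8-bit input vectors
golden = (xs * 3) & 0xFF                  # exact 8-bit multiply-by-3
approx = golden & 0xFE                    # toy approximation: drop the LSB

# Assumed application data histogram: small input values occur more often.
counts = np.exp(-xs / 64.0)
weights = counts / counts.sum()           # normalise to a distribution

dist = np.abs(golden.astype(int) - approx.astype(int))
wmed = float(np.sum(weights * dist))      # weighted mean error distance
med = float(np.mean(dist))                # plain (unweighted) mean error distance
```

Here WMED differs from the unweighted metric because erroneous inputs (odd values) carry slightly less probability mass under the assumed distribution; in the paper the weights come from measured signal statistics, so the search favours approximations that are accurate where it matters.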

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30 2.6.3 TRADING DIGITAL ACCURACY FOR POWER IN AN RSSI COMPUTATION OF A SENSOR NETWORK TRANSCEIVER
Speaker:
Paul Detterer, Eindhoven University of Technology, NL
Authors:
Paul Detterer, Cumhur Erdin, Majid Nabi, Jose Pineda de Gyvez, Twan Basten and Hailong Jiao, Eindhoven University of Technology, NL
Abstract
Emerging Wireless Sensor Network (WSN) applications require more energy efficiency in wireless transceivers, yet conventional energy-efficient design techniques are reaching their limits. To handle the rigid power and energy constraints in the Digital BaseBand (DBB) of WSNs, we introduce approximate computing into DBB processing as a new power reduction method. The Received Signal Strength Indicator (RSSI) computation is a key element of DBB processing. We evaluate the trade-off between Quality-of-Service (QoS) and power consumption in the RSSI computation through circuit-level approximation. RSSI elements are approximated in such a way that error propagation is minimized. In an industrial 40-nm CMOS technology, substantial power savings are achieved with limited accuracy loss, both at the circuit level and at the network level in a low-power-listening WSN scenario.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:45 2.6.4 APPROXIMATE RANDOM DROPOUT FOR DNN TRAINING ACCELERATION IN GPGPU
Speaker:
Li Jiang, Shanghai Jiao Tong University, CN
Authors:
Zhuoran Song, Ru Wang, Dongyu Ru, Zhenghao Peng, Hongru Huang, Hai Zhao, Xiaoyao Liang and Li Jiang, Shanghai Jiao Tong University, CN
Abstract
The training phase of a deep neural network (DNN) consumes enormous processing time and energy. Compression techniques utilizing the sparsity of DNNs can effectively accelerate the inference phase, but they can hardly be used in the training phase, because training involves dense matrix multiplication on General-Purpose Graphics Processing Units (GPGPUs), which favor a regular and structured data layout. In this paper, we propose Approximate Random Dropout, which replaces the conventional random dropout of neurons and synapses with regular, online-generated patterns to eliminate unnecessary computation and data access. We develop an SGD-based search algorithm that produces the distribution of dropout patterns to compensate for the potential accuracy loss, and we prove our approach is statistically equivalent to the previous dropout method. Experimental results on multilayer perceptrons (MLPs) and long short-term memory (LSTM) networks using well-known benchmarks show that the speedup brought by the proposed Approximate Random Dropout ranges from 1.18-2.16× (1.24-1.85×) when the dropout rate is 0.3-0.7 on MLP (LSTM), with negligible accuracy drop.
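The contrast the abstract draws can be sketched on a single layer: element-wise random dropout leaves the matrix multiply dense, while a regular dropout pattern lets whole weight columns be skipped so the multiply genuinely shrinks. The sizes, rate, and "keep every second unit" pattern are illustrative assumptions, not the paper's searched patterns:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(64)               # layer input
W = rng.standard_normal((64, 64))         # weight matrix

# Conventional dropout: an irregular mask, so the GEMM stays dense.
mask = (rng.random(64) > 0.5).astype(float)
dense_out = W @ (x * mask)

# A regular pattern: keep every second unit, so whole columns of W
# (and the corresponding inputs) can simply be skipped.
keep = np.arange(0, 64, 2)
small_out = W[:, keep] @ x[keep]          # half the MACs, regular layout

pattern_mask = np.zeros(64)
pattern_mask[keep] = 1.0                  # the same pattern as a dense mask
```

The shrunk multiply is exactly equivalent to dense dropout with the structured mask, which is why a GPU can exploit it while the irregular mask yields no savings.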

Download Paper (PDF; Only available from the DATE venue WiFi)
13:00 IP1-8, 229 ACCURACY AND COMPACTNESS IN DECISION DIAGRAMS FOR QUANTUM COMPUTATION
Speaker:
Alwin Zulehner, Johannes Kepler University Linz, AT
Authors:
Alwin Zulehner1, Philipp Niemann2, Rolf Drechsler3 and Robert Wille1
1Johannes Kepler University Linz, AT; 2Cyber-Physical Systems, DFKI GmbH, DE; 3University of Bremen, DE
Abstract
Quantum computation is a promising research field since it allows certain tasks to be conducted exponentially faster than on conventional machines. As in the conventional domain, decision diagrams are heavily used in different design tasks for quantum computation, such as synthesis, verification, or simulation. However, unlike decision diagrams for the conventional domain, decision diagrams for quantum computation as of now suffer from a trade-off between accuracy and compactness that requires parameter fine-tuning on a case-by-case basis. In this work, we describe and evaluate the effects of this trade-off for the first time. Moreover, we propose an alternative approach that utilizes an algebraic representation of the occurring irrational numbers and outline how this can be incorporated into a decision diagram in order to overcome the trade-off.

Download Paper (PDF; Only available from the DATE venue WiFi)
13:01 IP1-9, 458 ONE METHOD - ALL ERROR-METRICS: A THREE-STAGE APPROACH FOR ERROR-METRIC EVALUATION IN APPROXIMATE COMPUTING
Speaker:
Saman Fröhlich, University of Bremen/DFKI GmbH, DE
Authors:
Saman Fröhlich1, Daniel Grosse2 and Rolf Drechsler2
1University of Bremen/DFKI GmbH, DE; 2University of Bremen, DE
Abstract
Approximate Computing is a design paradigm that makes use of the error tolerance inherent in many applications, such as machine learning, media processing and data mining. The goal of Approximate Computing is to trade accuracy for performance in terms of computation time, energy consumption and/or hardware complexity. In the field of circuit design for Approximate Computing, error-metrics are used to express the degree of approximation, and evaluating these error-metrics is a key challenge. Several approaches exist; however, to this day not all relevant metrics can be evaluated with formal methods. Recently, Symbolic Computer Algebra (SCA) has been used to evaluate error-metrics during approximate hardware generation. In this paper, we generalize the idea of using SCA and propose a methodology suitable for the formal evaluation of all established error-metrics. The approach has three stages: (i) determine the remainder of the AC circuit w.r.t. the specification using SCA, (ii) build an Algebraic Decision Diagram (ADD) to represent the remainder, and (iii) evaluate each error-metric by a tailored ADD traversal algorithm. Besides being the first to propose a closed formal method for the evaluation of all relevant error-metrics, we are the first to propose formal algorithms for the evaluation of the worst-case-relative and average-case-relative error-metrics. In the experiments, we apply our algorithms to a large and well-known benchmark set.
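The error-metrics in question can be made concrete on a toy circuit. This sketch evaluates them exhaustively over all inputs (the paper's contribution is doing this formally via SCA and ADD traversal instead); the 4-bit approximate adder that truncates both LSBs is an invented example:

```python
import numpy as np

W = 4
xs = np.arange(2 ** W)
a, b = np.meshgrid(xs, xs, indexing="ij")       # all 4-bit input pairs
exact = a + b                                   # full-precision sum
approx = ((a >> 1) + (b >> 1)) << 1             # toy adder ignoring both LSBs

dist = np.abs(exact - approx)                   # error distance per input pair
error_rate = float(np.mean(dist > 0))           # fraction of erroneous inputs
worst_case = int(dist.max())                    # worst-case error distance
mean_ed = float(dist.mean())                    # average-case error distance
worst_rel = float(np.max(dist / np.maximum(exact, 1)))  # worst-case relative error
```

For this adder the error is simply the sum of the two dropped LSBs, so the metrics come out exactly: an error occurs unless both inputs are even, the worst case is 2, and the average error distance is 1.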

Download Paper (PDF; Only available from the DATE venue WiFi)
13:02 IP1-10, 657 REVERSIBLE PEBBLING GAME FOR QUANTUM MEMORY MANAGEMENT
Speaker:
Giulia Meuli, EPFL, CH
Authors:
Giulia Meuli1, Mathias Soeken1, Martin Roetteler2, Nikolaj Bjorner2 and Giovanni De Micheli1
1EPFL, CH; 2Microsoft, US
Abstract
Quantum memory management is becoming a pressing problem, especially given the recent research effort to develop new and more complex quantum algorithms. The only existing automatic method for quantum states clean-up relies on the availability of many extra resources. In this work, we propose an automatic tool for quantum memory management. We show how this problem exactly matches the reversible pebbling game. Based on that, we develop a SAT-based algorithm that returns a valid clean-up strategy, taking the limitations of the quantum hardware into account. The developed tool empowers the designer with the flexibility required to explore the trade-off between memory resources and number of operations. We present two show-cases to prove the validity of our approach. First, we apply the algorithm to straight-line programs, widely used in cryptographic applications. Second, we perform a comparison with the existing approach, showing an average improvement of 52.77%.

Download Paper (PDF; Only available from the DATE venue WiFi)
13:00 End of session
Lunch Break in Lunch Area





2.7 Analysis and optimization techniques for neural networks

Date: Tuesday, March 26, 2019
Time: 11:30 - 13:00
Location / Room: Room 7

Chair:
Mohamed Sabry, NTU, SG, Contact Mohamed M. Sabry

Co-Chair:
Francesco Setti, Università di Verona, IT, Contact Francesco Setti

This session presents three papers with new approaches to characterize the neural network behavior on edge devices in order to optimize their performance and energy consumption according to the target application.

Time Label Presentation Title
Authors
11:30 2.7.1 LOW-COMPLEXITY DYNAMIC CHANNEL SCALING OF NOISE-RESILIENT CNN FOR INTELLIGENT EDGE DEVICES
Speaker:
Younghoon Byun, Pohang University of Science and Technology (POSTECH), KR
Authors:
Younghoon Byun, Minho Ha, Jeonghun Kim, Sunggu Lee and Youngjoo Lee, Pohang University of Science and Technology (POSTECH), KR
Abstract
In this paper, we present a novel channel scaling scheme for convolutional neural networks (CNNs) that improves recognition accuracy on practically distorted images without increasing network complexity. During the training phase, the proposed work first prepares multiple filters under the same CNN architecture, taking into account different noise models and strengths. We then introduce an FFT-based noise classifier, which determines the noise property of the received input image by calculating partial sums of the frequency-domain values. Based on the detected noise class, we dynamically change the filters of each CNN layer to provide dedicated recognition. Furthermore, we propose a channel scaling technique that reduces the number of active filter parameters when the input data is relatively clean. Experimental results show that the proposed dynamic channel scaling reduces the computational complexity as well as the energy consumption while still providing acceptable accuracy for intelligent edge devices.
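The FFT-based noise indicator can be sketched in one dimension: additive noise spreads energy into high-frequency bins, so a partial sum over those bins separates clean from noisy inputs. The 1-D "image", noise level, cutoff, and threshold are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 256
t = np.arange(n)
clean = np.sin(2 * np.pi * 3 * t / n)            # smooth, low-frequency signal
noisy = clean + 0.5 * rng.standard_normal(n)     # additive Gaussian noise

def high_freq_ratio(x):
    """Partial sum of frequency-domain magnitudes over the high bins,
    normalised by the total spectral magnitude."""
    mag = np.abs(np.fft.rfft(x))
    return float(mag[n // 8:].sum() / mag.sum())

r_clean = high_freq_ratio(clean)
r_noisy = high_freq_ratio(noisy)

is_noisy = r_noisy > 0.2                          # assumed decision threshold
```

A classifier built on such partial sums is cheap (one FFT and a few additions) compared with running a dedicated noise-estimation network, which is the scheme's appeal for edge devices.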

Download Paper (PDF; Only available from the DATE venue WiFi)
12:00 2.7.2 DATA LOCALITY OPTIMIZATION OF DEPTHWISE SEPARABLE CONVOLUTIONS FOR CNN INFERENCE ACCELERATORS
Speaker:
Hao-Ning Wu, National Tsing Hua University, TW
Authors:
Hao-Ning Wu and Chih-Tsun Huang, National Tsing Hua University, TW
Abstract
This paper presents a novel framework to maximize the data reusability in the depthwise separable convolutional layers with the Scan execution order of the tiled matrix multiplications. In addition, the fusion scheme across layers is proposed to minimize the data transfer of the intermediate activations, improving both the latency and energy consumption from the external memory accesses. The experimental results are validated against DRAMSim2 for the accurate timing and energy estimation. With a 64K-entry on-chip buffer, our approach can achieve the DRAM energy reduction of 67% on MobileNet V2.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30 2.7.3 A BINARY LEARNING FRAMEWORK FOR HYPERDIMENSIONAL COMPUTING
Speaker:
Mohsen Imani, University of California, San Diego, US
Authors:
Mohsen Imani1, John Messerly1, Fan Wu2, Wang Pi3 and Tajana Rosing1
1University of California San Diego, US; 2University of California Riverside, US; 3Peking University, CN
Abstract
Brain-inspired Hyperdimensional (HD) computing is a computing paradigm emulating neuronal activity in high-dimensional space. In practice, HD first encodes all data points to high-dimensional vectors, called hypervectors, and then performs the classification task in an efficient way using a well-defined set of operations. In order to provide acceptable classification accuracy, current HD computing algorithms need to map data points to hypervectors with non-binary elements. However, working with non-binary vectors significantly increases the computation cost and memory requirements of both training and inference, making HD computing less desirable for embedded devices, which often have limited resources and battery. In this paper, we propose BinHD, a novel binarization framework which enables HD computing to be trained and tested using binarized hypervectors. BinHD encodes data points to binarized hypervectors and provides a framework which enables HD to perform the training task with significantly lower resources and memory footprint. In inference, BinHD binarizes the model and simplifies the costly cosine similarity used in existing HD computing algorithms to a hardware-friendly Hamming-distance metric. In addition, for the first time, BinHD introduces the concept of a learning rate in HD computing, which gives HD an extra knob to control training efficiency and accuracy. We accordingly design digital hardware to accelerate BinHD computation. Our evaluations on four practical classification applications show that BinHD in training (inference) can achieve 12.4× and 6.3× (13.8× and 9.9×) energy efficiency and speedup compared to the state-of-the-art HD computing algorithm while providing similar classification accuracy.
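The simplification from cosine similarity to Hamming distance is exact for binarized hypervectors: mapping bits {0,1} to bipolar {-1,+1}, the cosine of two random hypervectors is 1 - 2h/d, where h is their Hamming distance and d the dimensionality. A minimal check (dimensionality and vectors are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 10_000                                 # hypervector dimensionality
a = rng.integers(0, 2, d)                  # two random binary hypervectors
b = rng.integers(0, 2, d)

hamming = int(np.count_nonzero(a != b))    # cheap, hardware-friendly metric

# Bipolar view: agreements contribute +1 to the dot product, disagreements -1,
# so dot = d - 2*hamming and (with unit-norm factors sqrt(d)) cosine = 1 - 2h/d.
ab, bb = 2 * a - 1, 2 * b - 1
cos = float(ab @ bb) / d
```

Since the mapping is monotone, ranking class hypervectors by minimum Hamming distance gives the same prediction as ranking by maximum cosine, which is why the binarized inference loses no decisions, only precision in the encoding.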

Download Paper (PDF; Only available from the DATE venue WiFi)
13:00 IP1-11, 247 TYPECNN: CNN DEVELOPMENT FRAMEWORK WITH FLEXIBLE DATA TYPES
Speaker:
Lukas Sekanina, Brno University of Technology, CZ
Authors:
Petr Rek and Lukas Sekanina, Brno University of Technology, CZ
Abstract
The rapid progress in artificial intelligence technologies based on deep and convolutional neural networks (CNNs) has led to enormous interest in efficient implementations of neural networks in embedded devices and hardware. We present a new software framework for the development of (approximate) convolutional neural networks in which the user can define and use various data types for the forward (inference) procedure, the backward (training) procedure, and the weights. Moreover, non-standard arithmetic operations such as approximate multipliers can easily be integrated into the CNN under design. This flexibility makes it possible to analyze the impact of the chosen data types and non-standard arithmetic operations on CNN training and inference efficiency. The framework was implemented in C++ and evaluated using several case studies.

Download Paper (PDF; Only available from the DATE venue WiFi)
13:01 IP1-12, 963 GUARANTEED COMPRESSION RATE FOR ACTIVATIONS IN CNNS USING A FREQUENCY PRUNING APPROACH
Speaker:
Sebastian Vogel, Robert Bosch GmbH, DE
Authors:
Sebastian Vogel1, Christoph Schorn1, Andre Guntoro1 and Gerd Ascheid2
1Robert Bosch GmbH, DE; 2RWTH Aachen University, DE
Abstract
Convolutional Neural Networks have become the state of the art for many computer vision tasks. However, the size of neural networks prevents their application in resource-constrained systems. In this work, we present a lossy compression technique for the intermediate results of Convolutional Neural Networks. The proposed method offers guaranteed compression rates and additionally adapts to performance requirements. Our experiments with networks for classification and semantic segmentation show that our method outperforms state-of-the-art compression techniques used in CNN accelerators.

Download Paper (PDF; Only available from the DATE venue WiFi)
13:02 IP1-13, 290 RUNTIME MONITORING NEURON ACTIVATION PATTERNS
Speaker:
Chih-Hong Cheng, fortiss, DE
Authors:
Chih-Hong Cheng1, Georg Nührenberg1 and Hirotoshi Yasuoka2
1fortiss - Landesforschungsinstitut des Freistaats Bayern, DE; 2DENSO Corporation, JP
Abstract
For using neural networks in safety-critical domains such as automated driving, it is important to know whether a decision made by a neural network is supported by prior similarities in training. We propose runtime neuron activation pattern monitoring: after the standard training process, one creates a monitor by feeding the training data to the network again in order to store the neuron activation patterns in abstract form. In operation, a classification decision over an input is further supplemented by examining whether a pattern similar (as measured by Hamming distance) to the generated pattern is contained in the monitor. If the monitor does not contain any similar pattern, it raises a warning that the decision is not based on the training data. Our experiments show that, by adjusting the similarity threshold for activation patterns, the monitors can report a significant portion of misclassifications as not supported by training, with a small false-positive rate, when evaluated on a test set.
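The monitoring mechanism can be sketched with binarized activation patterns stored as bit-vectors: at runtime, warn when no stored pattern lies within a Hamming-distance threshold of the observed one. The pattern width, count, and threshold are illustrative assumptions, and real monitors store patterns in abstract form rather than as a raw matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
n_stored, width = 100, 32
stored = rng.integers(0, 2, (n_stored, width))   # patterns seen during training

def min_hamming(pattern):
    """Hamming distance from a runtime pattern to its nearest stored pattern."""
    return int(np.count_nonzero(stored != pattern, axis=1).min())

THRESHOLD = 2                                    # assumed similarity threshold

def warn(pattern):
    """True when the decision is not supported by any training-time pattern."""
    return min_hamming(pattern) > THRESHOLD

seen = stored[0]                                 # a pattern from training
unseen = 1 - stored[0]                           # bitwise complement: very novel
```

Raising the threshold trades fewer false alarms for fewer detected unsupported decisions, which is the tuning knob the abstract describes.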

Download Paper (PDF; Only available from the DATE venue WiFi)
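The monitoring idea in the abstract above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' tool (which stores patterns in a more compact abstract form); the layer width, threshold, and input vectors are made-up values.

```python
# Sketch of runtime neuron activation pattern monitoring: after training,
# binarized ("on/off") activation patterns of a hidden layer are stored;
# at inference time, an input whose pattern is farther than a Hamming
# threshold from every stored pattern triggers a warning.
import numpy as np

class ActivationMonitor:
    def __init__(self, hamming_threshold=2):
        self.patterns = set()            # stored patterns, one bool tuple each
        self.threshold = hamming_threshold

    @staticmethod
    def to_pattern(activations):
        # on/off abstraction: a neuron counts as active iff its output is > 0
        return tuple(bool(a > 0) for a in activations)

    def record(self, activations):
        # build the monitor by replaying the training set through the network
        self.patterns.add(self.to_pattern(activations))

    def supported(self, activations):
        # True if some stored pattern lies within the Hamming threshold
        p = self.to_pattern(activations)
        for q in self.patterns:
            if sum(a != b for a, b in zip(p, q)) <= self.threshold:
                return True
        return False

# toy usage: two training patterns, one near-miss query, one outlier
mon = ActivationMonitor(hamming_threshold=1)
mon.record(np.array([1.2, -0.3, 0.7, 0.0]))    # pattern (1, 0, 1, 0)
mon.record(np.array([0.5, 0.9, -0.1, -2.0]))   # pattern (1, 1, 0, 0)
print(mon.supported(np.array([0.8, -0.1, 0.6, 0.4])))   # distance 1 -> True
print(mon.supported(np.array([-1.0, 0.2, -0.5, 0.9])))  # distance > 1 -> False
```

Raising the threshold trades fewer false alarms for weaker outlier detection, which is the knob the paper's evaluation sweeps.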
13:00End of session
Lunch Break in Lunch Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Breaks (Lunch Area)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the Lunch Area to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 26, 2019

Wednesday, March 27, 2019

Thursday, March 28, 2019


2.8 How Electronic Systems can benefit from Machine Learning and from ESD Alliance

Date: Tuesday, March 26, 2019
Time: 11:30 - 13:00
Location / Room: Exhibition Theatre

Organiser:
Jürgen Haase, edacentrum, DE, Contact Jürgen Haase

In this session, the Electronic System Design Alliance will present its newest initiatives and results. Mentor, a Siemens Business, will discuss approaches for applying Machine Learning to designing and producing microelectronics products. IngeniArs will analyze scenarios for realizing smart edge devices by using accelerators to execute Machine Learning and Deep Learning algorithms.

TimeLabelPresentation Title
Authors
11:302.8.1MACHINE LEARNING IS CHANGING THE GAME FOR VARIABILITY AND CHARACTERIZATION AND WILL SOON HELP ANALOG AND DIGITAL VERIFICATION
Speaker:
Amit Gupta, Mentor, a Siemens Business, US
Abstract

The Golden Age of machine learning is upon EDA. Over the past four years, we have seen large EDA suppliers and customers grow their internal ML teams and strategies, and ML research projects are emerging in all areas of EDA. But we have not yet seen much of this investment convert into real production flows and work. This talk reviews a set of challenges that make it difficult to bring ML solutions to production for semiconductor design, and discusses approaches for solving them. We will discuss how these approaches are already benefiting variation-aware design and characterization flows, and their broader applicability to analog and digital verification.

12:002.8.2MACHINE LEARNING AT THE EDGE FOR EMBEDDED AND LOW POWER PLATFORMS: EXPLOITING THE INTEL MOVIDIUS NEURAL COMPUTING STICK
Speaker:
Gionata Benelli, IngeniArs, IT
Abstract

Machine Learning, Deep Learning and AI are technologies that many enterprises use to provide smarter services to their customers. Usually, data are acquired by sensors and then sent to a cloud or a remote server to perform inference and get the results. This is no longer the only way: commercial products are already available to offload the execution of Machine Learning and Deep Learning algorithms from the CPU of small devices. In this talk we will analyse some scenarios in which these accelerators, like the Neural Compute Stick, can help meet design goals and allow the realization of smarter edge devices.

12:302.8.3THE ESD ALLIANCE - AT THE CENTER OF THE SEMICONDUCTOR UNIVERSE
Speaker:
Paul Cohen, ESDA, US
13:00End of session
Lunch Break in Lunch Area





3.0 LUNCH TIME KEYNOTE SESSION

Date: Tuesday, March 26, 2019
Time: 13:50 - 14:20
Location / Room: Room 1

Chair:
Marco Casale‐Rossi, Synopsys, IT, Contact Marco Casale-Rossi

Co-Chair:
Giovanni De Micheli, EPFL, CH, Contact Giovanni De Micheli

TimeLabelPresentation Title
Authors
13:503.0.1LEONARDO DA VINCI, HUMANISM AND ENGINEERING BETWEEN FLORENCE AND MILAN
Author:
Claudio Giorgione, Museo Nazionale della Scienza e della Tecnologia Leonardo da Vinci, IT
Abstract
The machines and mechanical elements drawn by Leonardo through the course of his itinerary as engineer and technologist belong to the most disparate fields, highlighting his curiosity about the technological culture of his times. Just as for the other sectors of his activity, the first machines depicted by Leonardo follow in the tradition of the Renaissance Florentine workshop and are characterized by a practical, empirical approach aimed at resolving problems progressively as they arose. During his first Milanese period (1482-1499), Leonardo was experimenting with, and refining, ever more effective graphical systems of representation, which he would go on to apply to other sectors, like anatomy, architecture, and military engineering. Sections, prospect views, and transparent views were used to decompose machines into their constituent elements, finding solutions for automating and rendering more efficient the existing traditional mechanisms, or for conceiving completely new mechanisms. Leonardo moved, particularly in the 1490s, from documentation of practical problems to a more theoretical analysis of the principles regulating the functioning of machines, from the study of mechanical elements to their inter-relation. The studies on friction and on motion in general are to be inserted into this perspective, which led him to the idea of compiling a treatise on mechanics, based on the analysis of mechanisms and gears, the so-called "elementi macchinali".
14:20End of session
16:00Coffee Break in Exhibition Area





3.1 Executive Session 2: Panel "Semiconductor IP, Surfing the Next Big Wave"

Date: Tuesday, March 26, 2019
Time: 14:30 - 16:00
Location / Room: Room 1

Organisers:
Giovanni De Micheli, EPFL, CH, Contact Giovanni De Micheli
Jamil Kawa, Synopsys, US, Contact Jamil Kawa

Moderator:
Raul Camposano, Sage Design Automation, US, Contact Raul Camposano

Semiconductor IP has made a great deal of progress since ARM was incorporated in 1990, almost thirty years ago, while the IC was breaking the 1-micron barrier, and power was becoming designers' biggest concern. Back then, semiconductor IP was "hard", physical IP, which required complex porting to each and every different process technology. Over the last thirty years, and thanks to the transition from "hard" to "soft", synthesizable IP, it has dramatically expanded, and now spans processors, interconnect, interface, FPGA, and complete sub-systems, and has become a critical enabler of modern systems-on-a-chip. Our industry is now moving to the 7/5 nanometer nodes: power remains a concern, but it is the lagging processor frequency, the latency across the processor, memory, and storage stacks, as well as the signal losses in electrical transmission lines that prevent breakthrough improvements. After decades of dominance by general purpose CPU and GPU, innovation is disrupting computing architectures: massively parallel Tensor Processing Units (TPU) are emerging that have demonstrated unprecedented performance; new memories are emerging that may complement 3D DRAM and NAND; new technologies are emerging such as super-conducting electronics and silicon photonics, which require an unprecedented level of collaboration to rapidly achieve the maturity levels required for the design and manufacturing of VLSI systems. This panel, moderated by EDA industry veteran Raul Camposano, will explore the challenges and the opportunities of semiconductor IP for the next decade.

Panelists:

  • K. Charles Janac, Arteris, US
  • Joachim Kunkel, Synopsys, US
  • Andrei Vladimirescu, Berkeley, US
  • Greg Yeric, ARM, US
16:00End of session
Coffee Break in Exhibition Area





3.2 Special Session: Smart Resource Management and Design Space Exploration for Heterogeneous Processors

Date: Tuesday, March 26, 2019
Time: 14:30 - 16:00
Location / Room: Room 2

Organisers:
Partha Pande, Washington State University, US, Contact Partha Pratim Pande
Jörg Henkel, Karlsruhe Institute of Technology, DE, Contact Jörg Henkel

Chair:
Petru Eles, Linköping University, SE, Contact Petru Eles

Co-Chair:
Sudeep Pasricha, Colorado State University, US, Contact Sudeep Pasricha

We are experiencing phenomenal growth in exciting, yet demanding, application areas such as deep learning, graph analytics, and scientific computing. These application areas have driven a demand for new devices that package high-performance computing into smaller form-factors that operate in heavily constrained application scenarios (e.g., deep learning inference in embedded systems). Naturally, this presents new design challenges to meet ever increasing performance, cost, and energy efficiency requirements. This special session will consider a holistic approach to the broad topic of heterogeneous architectures. Towards this end, it consists of three forward-looking talks addressing the fundamental challenges, existing proposals, and new approaches for designing and exploring heterogeneous systems. The first talk will focus on utilizing various learning techniques to achieve thermal efficiency in a heterogeneous system. The second talk will shift the discussion toward the problems of designing these heterogeneous systems to accelerate applications. We will present innovative machine learning techniques that can be used to make efficient application-specific hardware design as easy and inexpensive as developing the corresponding application software. Finally, achieving stringent performance requirements under tight thermal constraints requires a systematic stability analysis due to the positive feedback between leakage power and temperature. The third talk will present a power-temperature stability and safety analysis technique that reveals the sufficient conditions under which the power-temperature trajectory converges to a stable fixed point. The following paragraphs briefly outline each topic that will be covered in this special session.

TimeLabelPresentation Title
Authors
14:303.2.1SMART THERMAL MANAGEMENT FOR HETEROGENEOUS MULTICORES
Speaker:
Joerg Henkel, Chair for Embedded Systems (CES), Karlsruhe Institute of Technology (KIT), DE
Authors:
Joerg Henkel, Heba Khdr and Martin Rapp, Karlsruhe Institute of Technology, DE
Abstract
Due to the discontinuation of Dennard scaling, on-chip power densities are continuously increasing along with technology scaling, and hence on-chip temperatures are elevated. Therefore, several thermal management techniques have emerged to keep the temperature of the chip within safe limits. These techniques, however, lead to performance losses which become quite significant when heterogeneous multicore architectures are considered. This might ultimately erase a big portion of the expected performance gains from heterogeneous architectures. Thus, it is indispensable to deploy thermal management techniques that are able to make efficient decisions that satisfy temperature constraints while at the same time maximizing performance. This talk presents smart thermal management techniques for heterogeneous multicores that exploit relevant information from diverse application scenarios to maximize performance under temperature constraints.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:003.2.2DESIGN AND OPTIMIZATION OF HETEROGENEOUS MANYCORE SYSTEMS ENABLED BY EMERGING INTERCONNECT TECHNOLOGIES: PROMISES AND CHALLENGES
Speaker:
Ryan Kim, Colorado State University, US
Authors:
Biresh Kumar Joardar1, Ryan Kim2, Janardhan Rao Doppa1 and Partha Pratim Pande1
1Washington State University, US; 2Colorado State University, US
Abstract
Due to the growing needs of Big Data computing applications (e.g., deep learning, graph analytics, and scientific computing) and the ending of Moore's law, there is a great need for low-cost, high-performance, and energy-efficient commodity many-core systems. However, with more stringent design objectives, application specialization, and more cores on a single chip, design-time optimization decisions become more complex. Moreover, the advent of emerging interconnect technologies, like 3D integration, makes the design optimization process more challenging. This increases the need for a holistically optimized design process that makes design decisions across multiple layers of the system, e.g., memory, compute, interconnect technology and network infrastructure. In this paper, we will present various challenges of designing heterogeneous manycore architectures using emerging interconnect technologies and associated optimization techniques.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:303.2.3POWER AND THERMAL ANALYSIS OF COMMERCIAL MOBILE PLATFORMS: EXPERIMENTS AND CASE STUDIES
Speaker:
Umit Ogras, Arizona State University, US
Authors:
Ganapati Bhat1, Suat Gumussoy2 and Umit Ogras1
1Arizona State University, US; 2Mathworks, US
Abstract
State-of-the-art mobile processors can deliver fast response time and high throughput to maximize the user experience. However, high performance also comes at the expense of increasing power consumption and chip temperature which severely limit applications from utilizing the full capabilities of the system. Higher operating temperatures also drive up the skin temperature, further deteriorating the user experience. Therefore, there is a strong need for analysis of power consumption and thermal behavior in mobile processors. In this paper, we present power and thermal models to analyze the power and thermal dynamics. We illustrate our models with experiments and case studies on two commercial system-on-chips used in Galaxy S4 and Nexus 6P smartphones.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00End of session
Coffee Break in Exhibition Area





3.3 Methods and Characterisation techniques for Reliability

Date: Tuesday, March 26, 2019
Time: 14:30 - 16:00
Location / Room: Room 3

Chair:
Said Hamdioui, TU Delft, NL, Contact Said Hamdioui

Co-Chair:
Arnaud Virazel, LIRMM, FR, Contact Arnaud Virazel

This section discusses the characterisation of BTI and ESD as well as a methodology to analyse the aging of SRAMs.

TimeLabelPresentation Title
Authors
14:303.3.1NEW METHOD FOR THE AUTOMATED MASSIVE CHARACTERIZATION OF BIAS TEMPERATURE INSTABILITY IN CMOS TRANSISTORS
Speaker:
Pablo Sarazá Canflanca, Universidad de Sevilla, ES
Authors:
Pablo Saraza-Canflanca1, Javier Diaz-Fortuny2, Rafael Castro-Lopez1, Elisenda Roca1, Javier Martin-Martinez2, Rosana Rodriguez2, Montserrat Nafria2 and Francisco Vidal Fernandez1
1Instituto de Microelectrónica de Sevilla, ES; 2Universitat Autonoma de Barcelona UAB, ES
Abstract
Bias Temperature Instability has become a critical issue for circuit reliability. This phenomenon has been found to have a stochastic and discrete nature in nanometer-scale CMOS technologies. To account for this random nature, massive experimental characterization is necessary so that the extracted model parameters are accurate enough. However, there is a lack of automated analysis tools for the extraction of the BTI parameters from the extensive amount of data generated in those massive characterization tests. In this paper, a novel algorithm that allows the precise and fully automated parameter extraction from experimental BTI recovery current traces is presented. This algorithm is based on Maximum Likelihood Estimation principles, and is able to extract, in a robust and exact manner, the threshold voltage shifts and emission times associated with oxide trap emissions during BTI recovery, required to properly model the phenomenon.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:003.3.2GUILTY AS CHARGED: COMPUTATIONAL RELIABILITY THREATS POSED BY ELECTROSTATIC DISCHARGE-INDUCED SOFT ERRORS
Speaker:
Keven Feng, University of Illinois at Urbana Champaign, US
Authors:
Keven Feng, Sandeep Vora, Rui Jiang, Elyse Rosenbaum and Shobha Vasudevan, ECE at Univ. of Illinois at Urbana-Champaign, US
Abstract
Electrostatic discharge (ESD) has been shown to cause severe reliability hazards at the physical level, resulting in permanent and transient failures. We present the first analysis of the effects of ESD-induced errors on instruction-level computation. Our data was measured on a microcontroller test chip fabricated for this study, with discharges from a controlled ESD gun. Cosmic-ray-induced soft errors have been widely researched, and modeled as single event upsets (SEUs). Our observations across multiple trials on 3 test chips show that, in contrast to radiation-induced errors, ESD can cause much more widespread errors than SEUs. In our trials, we observe system hangs and clock glitches, which are serious errors. We also observe errors in the following categories. Category A: multiple bit corruptions across multiple registers, Category B: multiple bit corruptions in the same register, and Category C: single bit corruptions across multiple registers. At the instruction level, these errors manifest as system hangs and serious malfunctioning of I/O operations, interrupt operations, data and program memory. We demonstrate that ESD-induced errors form a significant reliability threat to higher level functionality, warranting modeling and mitigation techniques.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:303.3.3METHODOLOGY FOR APPLICATION-DEPENDENT DEGRADATION ANALYSIS OF MEMORY TIMING
Speaker:
Daniel Kraak, Delft University of Technology, NL
Authors:
Daniel Kraak1, Innocent Agbo1, Mottaqiallah Taouil1, Said Hamdioui1, Pieter Weckx2, Stefan Cosemans2 and Francky Catthoor2
1Delft University of Technology, NL; 2imec vzw., BE
Abstract
Memory designs typically contain design margins to compensate for aging. As aging impact becomes more severe with technology scaling, it is crucial to accurately predict such impact to prevent overestimation or underestimation of the margins. This paper proposes a methodology to accurately and efficiently analyze the impact of aging on the memory's digital logic (e.g., timing circuit and address decoder) while considering realistic workloads extracted from applications. To demonstrate the superiority of the methodology, we analyzed the degradation of the L1 data and instruction caches for an ARM v8-a processor using both our methodology as well as the state-of-the-art methods. The results show that the existing methods may significantly over- or underestimate the impact (e.g., the decoder margin up to 221% and the access time up to 20%) as compared with the proposed scheme. In addition, the results show that in general the instruction cache has the highest degradation. For example, its access time degrades up to 9% and its decoder margin up to 44%.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00IP1-14, 303CHIP HEALTH TRACKING USING DYNAMIC IN-SITU DELAY MONITORING
Speaker:
Hadi Ahmadi Balef, Eindhoven University of Technology, NL
Authors:
Hadi Ahmadi Balef1, Kees Goossens2 and José Pineda de Gyvez1
1Eindhoven University of Technology, NL; 2Eindhoven University of Technology, NL
Abstract
Tracking the gradual effect of silicon aging on circuit delays requires fine-grain slack monitoring. The conventional slack monitoring techniques intend to measure the worst-case static slack, i.e. the slack of longest timing path. In sharp contrast to the conventional techniques, we propose a novel technique that is based on dynamic excitation of in-situ delay monitors (i.e. the dynamic excitation of timing paths that are monitored). As delays degrade, path delays increase and the monitors are excited more frequently. With the proposed technique, a fine-grained signature of delay degradation is extracted from the excitation rate of monitors. The in-situ monitors are inserted at intermediate points along timing paths to increase the sensitivity of signature to delay degradation. A new efficient monitor insertion algorithm is also proposed that reduces the number of monitors by ~2.1X compared to other works for an ARM Cortex M0 processor.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:01IP1-15, 541PCFI: PROGRAM COUNTER GUIDED FAULT INJECTION FOR ACCELERATING GPU RELIABILITY ASSESSMENT
Speaker:
Paolo Rech, UFRGS, BR
Authors:
Fritz Previlon, Charu Kalra, Devesh Tiwari and David Kaeli, Northeastern University, US
Abstract
Reliability has become a first-class design objective for GPU devices due to increasing soft-error rates. To assess the reliability of GPU programs, researchers rely on software fault-injection methods. Unfortunately, the software fault-injection process is prohibitively expensive, requiring multiple days to complete a statistically sound fault-injection campaign. To address this challenge, this paper proposes a novel fault-injection method, PCFI, that reduces the number of fault injections by exploiting the predictability of fault-injection outcomes based on the program counter of the soft-error-affected instruction. Evaluation on a variety of GPU programs covering a wide range of application domains shows that PCFI reduces the time to complete fault-injection campaigns by 22% on average without sacrificing accuracy.

Download Paper (PDF; Only available from the DATE venue WiFi)
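The core idea of grouping injection sites by program counter can be sketched as follows. This is an illustrative reconstruction, not the PCFI tool: the sampling policy, the `run_with_fault` injector, and the outcome labels are all made-up stand-ins.

```python
# Sketch of program-counter-guided fault injection: candidate injection
# sites that share a program counter tend to produce the same outcome, so
# only a small sample per PC is actually injected and the majority outcome
# is reused to predict the remaining sites.
import random
from collections import Counter, defaultdict

def pc_guided_campaign(sites, run_with_fault, sample_per_pc=3, seed=0):
    """Inject into a few sites per program counter; predict the rest."""
    rng = random.Random(seed)
    by_pc = defaultdict(list)
    for site in sites:                       # site = (pc, dynamic instance)
        by_pc[site[0]].append(site)
    outcomes, injections = {}, 0
    for pc, group in by_pc.items():
        sample = rng.sample(group, min(sample_per_pc, len(group)))
        observed = {s: run_with_fault(s) for s in sample}
        injections += len(sample)
        majority = Counter(observed.values()).most_common(1)[0][0]
        for s in group:
            # keep the real result where we injected, predict elsewhere
            outcomes[s] = observed.get(s, majority)
    return outcomes, injections

# toy campaign: 4 static instructions x 100 dynamic instances each; the
# injector's outcome depends only on the PC -- the best case for this idea
sites = [(pc, i) for pc in range(4) for i in range(100)]
outcomes, n = pc_guided_campaign(sites, lambda s: "SDC" if s[0] == 2 else "masked")
print(n)   # 12 real injections stand in for a 400-injection campaign
```

In the real setting outcomes are only mostly, not perfectly, determined by the PC, which is why the paper reports a time saving (22% on average) rather than a free lunch.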
16:02IP1-16, 696CHARACTERIZING THE RELIABILITY AND THRESHOLD VOLTAGE SHIFTING OF 3D CHARGE TRAP NAND FLASH
Speaker:
Weihua Liu, Huazhong University of Science and Technology, CN
Authors:
Weihua Liu1, Fei Wu1, Meng Zhang1, Yifei Wang1, Zhonghai Lu2, Xiangfeng Lu3 and Changsheng Xie1
1Huazhong University of Science and Technology, CN; 2KTH Royal Institute of Technology, SE; 3Beijing Memblaze Technology Co., Ltd., CN
Abstract
3D charge trap (CT) triple-level cell (TLC) NAND flash is gradually becoming a mainstream storage component due to its high storage capacity and performance, but it introduces reliability concerns. Fault tolerance and data management schemes are capable of improving reliability. Designing a more efficient solution, however, requires understanding the reliability characteristics of 3D CT TLC NAND flash. To facilitate such understanding, by exploiting a real-world testing platform, we investigate the reliability characteristics including the raw bit error rate (RBER) and the threshold voltage (Vth) shifting features after suffering from variable disturbances. We give an analysis of why these characteristics exist in 3D CT TLC NAND flash. We hope these observations can guide designers to propose efficient solutions to the reliability problem.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:03IP1-17, 882HIDDEN DELAY FAULT SENSOR FOR TEST, RELIABILITY AND SECURITY
Speaker:
Giorgio Di Natale, CNRS - TIMA, FR
Authors:
Giorgio Di Natale1, Elena Ioana Vatajelu2, Kalpana Senthamarai Kannan2 and Lorena Anghel3
1LIRMM, FR; 2TIMA, FR; 3Grenoble-Alpes University, FR
Abstract
In this paper we present a novel hidden-delay-fault sensor design and a preliminary analysis of its circuit integration and applicability. In our proposed method, the delay sensing is achieved by sampling data on both rising and falling clock edges and using a variable duty cycle to control the range of the sensed delay fault. The main advantage of our proposed method is that it works at nominal frequency, can cover a wide range of delay faults and it is versatile in its applicability. It can be used (i) during testing to perform user-defined hidden-delay-fault test, (ii) for reliability degradation estimation due to process, environmental variations and ageing, and (iii) in security to detect the insertion of Trojan horses that alter the path delay.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:03IP1-18, 219EFFECT OF DEVICE VARIATION ON MAPPING BINARY NEURAL NETWORK TO MEMRISTOR CROSSBAR ARRAY
Speaker:
Wooseok Yi, POSTECH, KR
Authors:
Wooseok Yi, Yulhwa Kim and Jae-Joon Kim, Pohang University of Science and Technology, KR
Abstract
In memristor crossbar array (MCA)-based neural network hardware, it is generally assumed that entire wordlines (WLs) are simultaneously enabled for parallel matrix-vector multiplication (MxV) operation. However, the error probability of MxV in a memristor crossbar array (MCA) increases as the resistance ratio (R-ratio) of a memristor decreases and the resistance variation and the number of simultaneously activated WLs increase. In this paper, we analyze the effect of R-ratio and variation of memristor devices on read sense margin and inference accuracy of MCA-based Binary Neural Network (BNN) hardware. We first show that only a limited number of WLs should be enabled to ensure correct MxV output when the R-ratio is small. On the other hand, we also show that, if the resistance variation becomes higher than a certain level, simultaneous activation of large number of WLs produces the higher accuracy even when R-ratio is small. Based on the analysis, we propose the Accuracy Estimation (AE) factor to find the optimal number of word lines that are simultaneously activated.

Download Paper (PDF; Only available from the DATE venue WiFi)
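The trade-off described above, where low R-ratio and resistance variation limit how many wordlines can be safely enabled at once, can be illustrated with a back-of-the-envelope Monte Carlo experiment. This is not the paper's model; the lognormal variation, R-ratio, and sigma below are made-up illustrative values.

```python
# Rough sketch: with a small R-ratio and lognormal conductance variation,
# the probability that the column current for k active LRS cells overlaps
# the current for k+1 cells grows as more wordlines are enabled together,
# i.e. the matrix-vector product becomes more error-prone.
import numpy as np

rng = np.random.default_rng(0)

def overlap_rate(n_wl, r_ratio, sigma, trials=20000):
    """Probability a k-LRS column draws at least as much current as a (k+1)-LRS one."""
    g_on, g_off = 1.0, 1.0 / r_ratio         # normalized LRS/HRS conductances
    k = n_wl // 2
    def column_current(n_on):
        g = np.concatenate([np.full(n_on, g_on), np.full(n_wl - n_on, g_off)])
        return (g * rng.lognormal(0.0, sigma, size=(trials, n_wl))).sum(axis=1)
    return float(np.mean(column_current(k) >= column_current(k + 1)))

rates = {n: overlap_rate(n, r_ratio=10, sigma=0.3) for n in (4, 16, 64)}
print(rates)   # overlap probability rises as more wordlines are enabled
```

The paper's Accuracy Estimation factor addresses exactly this: picking the number of simultaneously activated wordlines that balances parallelism against such sensing errors, including the regime where high variation changes the answer.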
16:00End of session
Coffee Break in Exhibition Area





3.4 Physical Design, Extraction and Timing Analysis

Date: Tuesday, March 26, 2019
Time: 14:30 - 16:00
Location / Room: Room 4

Chair:
Patrick Groeneveld, Cadence Design Systems, US, Contact Patrick Groeneveld

Co-Chair:
Po-Hung Lin Mark, National Chung Cheng University, TW, Contact Mark Po-Hung Lin

The first paper uses multivariate linear regression to increase the efficiency of corner-based timing analysis. The second paper proposes an approach for zero-skew clock tree construction yielding superior wirelength. The following two papers present macro placement algorithms: one adopting a dataflow-driven approach, the other using a routability-driven convolutional neural network predictor. The last paper addresses reusability and reproducibility issues in parallelized random-walk-based capacitance extraction.

TimeLabelPresentation Title
Authors
14:303.4.1"UNOBSERVED CORNER" PREDICTION: REDUCING TIMING ANALYSIS EFFORT FOR FASTER DESIGN CONVERGENCE IN ADVANCED-NODE DESIGN
Speaker:
Uday Mallappa, University of California San Diego, US
Authors:
Andrew Kahng, Uday Mallappa, Lawrence Saul and Shangyuan Tong, University of California San Diego, US
Abstract
With diminishing margins for leading-edge products in advanced technology nodes, design closure and accuracy of timing analysis have emerged as serious concerns. A significant portion of design turnaround time is spent on timing analysis at combinations of process, voltage and temperature (PVT) corners. At the same time, accurate, signoff-quality timing analysis is desired during place-and-route and optimization steps, to avoid loops in the flow as well as overdesign that wastes area and power. We observe that timing results for a given path at different corners will have strong correlations, if only as a consequence of physics of devices and interconnects. We investigate a data-driven approach, based on multivariate linear regression, to predict the timing analysis at unobserved corners from analysis results at observed corners. We use a simple backward stepwise selection strategy to choose which corners to observe and which to predict. In order to accelerate convergence of the design process, the model must yield predicted values (from analysis at a limited number of observed corners) that are sufficiently accurate to substitute for unobserved ones. Our empirical results indicate that this is likely the case. With a 1M-instance example in foundry 16nm enablement, we obtain a model based on 10 observed corners that predicts timing results at the remaining 48 unobserved corners with less than 0.5% relative root mean squared error, and 99% of the model's relative prediction errors are less than 0.6%.

Download Paper (PDF; Only available from the DATE venue WiFi)
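The paper's premise, that path delays at different PVT corners are strongly correlated, so a linear model fitted at a few observed corners can predict the rest, can be demonstrated on synthetic data. The corner scaling factors and noise levels below are invented for illustration and are not from the paper.

```python
# Toy sketch of unobserved-corner prediction via multivariate linear
# regression: fit y ~ X w + b, where X holds a path's delays at a few
# observed corners and y is its delay at an unobserved corner.
import numpy as np

rng = np.random.default_rng(1)
n_paths, n_observed = 200, 4
base = rng.uniform(1.0, 5.0, n_paths)                  # nominal path delays (ns)
observed = base[:, None] * rng.uniform(0.9, 1.3, n_observed) \
           + rng.normal(0, 0.01, (n_paths, n_observed))
unobserved = base * 1.45 + rng.normal(0, 0.01, n_paths)  # corner we want to skip

# fit on half the paths, evaluate on the other half
X = np.hstack([observed, np.ones((n_paths, 1))])       # intercept column
w, *_ = np.linalg.lstsq(X[:100], unobserved[:100], rcond=None)
pred = X[100:] @ w
rel_rmse = np.sqrt(np.mean((pred - unobserved[100:]) ** 2)) / unobserved[100:].mean()
print(f"relative RMSE: {rel_rmse:.4f}")                # small, since corners correlate
```

The hard part the paper actually tackles, which corners to observe, is a backward stepwise selection over models like this one rather than the fit itself.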
15:003.4.2DIM SUM: LIGHT CLOCK TREE BY SMALL DIAMETER SUM
Speaker:
Gengjie Chen, The Chinese University of Hong Kong, HK
Authors:
Gengjie Chen and Evangeline Young, The Chinese University of Hong Kong, HK
Abstract
Revisiting the classical deferred-merge embedding (DME) algorithm, we find an intrinsic relationship between the zero-skew tree (ZST) problem and the hierarchical clustering (HC) problem. More specifically, the wire length of a ZST is proved to be a linear function of the sum of diameters of its corresponding HC. With this new insight, an effective O(n log n)-time O(1)-approximation algorithm and an optimal dynamic programming for ZST are designed. Using the ZST construction black box and a linear-time optimal tree decomposition algorithm, an improved algorithm for constructing the bounded-skew tree (BST) is derived. In the experiments, our approach shows superior wire length compared with previous methods for both ZST and BST.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:153.4.3ROUTABILITY-DRIVEN MACRO PLACEMENT WITH EMBEDDED CNN-BASED PREDICTION MODEL
Speaker:
Yu-Hung Huang, National Taiwan University of Science and Technology, TW
Authors:
Yu-Hung Huang1, Zhiyao Xie2, Guan-Qi Fang1, Tao-Chun Yu1, Haoxing Ren3, Shao-Yun Fang1, Yiran Chen2 and Jiang Hu4
1National Taiwan University of Science and Technology, TW; 2Duke University, US; 3NVIDIA Corporation, US; 4Texas A&M University, US
Abstract
With the dramatic shrink of feature size and the advance of semiconductor technology nodes, numerous and complicated design rules need to be followed, and a chip design can only be taped out after passing design rule check (DRC). The high design complexity seriously deteriorates design routability, which can be measured by the number of DRC violations after the detailed routing stage. In addition, a modern large-scale design typically consists of many huge macros due to the wide use of intellectual properties (IPs). Empirically, the placement of these macros greatly determines routability, while there exists no effective cost metric to directly evaluate a macro placement because of the extremely high complexity and unpredictability of cell placement and routing. In this paper, we propose the first work on routability-driven macro placement with deep learning. A convolutional neural network (CNN)-based routability prediction model is proposed and embedded into a macro placer such that a good macro placement with minimized DRC violations can be derived through a simulated annealing (SA) optimization process. Experimental results show the accuracy of the predictor and the effectiveness of the macro placer.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30 3.4.4 RTL-AWARE DATAFLOW-DRIVEN MACRO PLACEMENT
Speaker:
Alexandre Vidal Obiols, Polytechnic University of Catalonia, ES
Authors:
Alex Vidal-Obiols1, Jordi Cortadella1, Jordi Petit1, Marc Galceran-Oms2 and Ferran Martorell2
1UPC, ES; 2eSilicon EMEA, Barcelona, ES
Abstract
When RTL designers define the hierarchy of a system, they exploit their knowledge about the conceptual abstractions devised during the design and the functional interactions between the logical components. This valuable information is often lost during physical synthesis. This paper proposes a novel multi-level approach for the macro placement problem of complex designs dominated by macro blocks, typically memories. By taking advantage of the hierarchy tree, the netlist is divided into blocks containing macros and standard cells, and their dataflow affinity is inferred considering the latency and flow width of their interaction. The layout is represented using slicing structures and generated with a top-down algorithm capable of handling blocks with both hard and soft components, aimed at wirelength minimization. These techniques have been applied to a set of large industrial circuits and compared against both a commercial floorplanner and handcrafted floorplans by expert back-end engineers. The proposed approach outperforms the commercial tool and produces solutions with similar quality to the best handcrafted floorplans. Therefore, the generated floorplans provide an excellent starting point for the physical design iterations and help to reduce turn-around time significantly.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:45 3.4.5 REALIZING REPRODUCIBLE AND REUSABLE PARALLEL FLOATING RANDOM WALK SOLVERS FOR PRACTICAL USAGE
Speaker:
Mingye Song, Tsinghua University, CN
Authors:
Mingye Song1, Zhezhao Xu1, Wenjian Yu1 and Lei Yin2
1Tsinghua University, CN; 2ANSYS Inc., US
Abstract
Capacitance extraction or simulation has become a challenging problem in the computer-aided design of integrated circuits (ICs), flat panel displays, etc. Due to its scalability and reliability, the parallel floating random walk (FRW) based capacitance solver is widely used. In practice, the parallel FRW algorithms involve an issue of reproducibility and may spend a lot of time in scenarios requesting high accuracy. To alleviate these issues, techniques are developed in this paper to enhance the reproducibility and reusability of the parallel FRW based simulation. With them, we ensure that the same result is reproduced when rerunning the parallel FRW solver with the same settings. Besides, a "jump start" feature is implemented to reduce the total runtime of simulating the same structure with multiple accuracy criteria. Experiments on shared-memory and distributed-memory platforms have validated the effectiveness of the presented techniques. Compared with a synchronization-based approach ensuring reproducibility, the proposed technique with static workload allocation brings up to 4.8X more parallel speedup while sacrificing nothing.

Download Paper (PDF; Only available from the DATE venue WiFi)
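The reproducibility goal — the same result regardless of how walks are scheduled across workers — can be sketched by deriving every walk's random stream from its index rather than from execution order (a generic illustration, not the paper's implementation; the seeding formula is our own):

```python
import random

def walk(global_seed, walk_index):
    # every walk derives a private random stream from its index, so its
    # result does not depend on which worker runs it, or in what order
    rng = random.Random(global_seed * 1_000_003 + walk_index)
    return rng.random()        # stand-in for one floating-random-walk sample

def frw_estimate(global_seed, n_walks):
    return sum(walk(global_seed, i) for i in range(n_walks)) / n_walks

print(frw_estimate(42, 1000) == frw_estimate(42, 1000))  # True
```

Because seeds depend only on walk indices, a "jump start" then reduces to caching the first n samples and generating only walks n..m when a tighter accuracy criterion is later requested.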
16:01 IP1-19, 136 ACCURATE WIRELENGTH PREDICTION FOR PLACEMENT-AWARE SYNTHESIS THROUGH MACHINE LEARNING
Speaker:
Daijoon Hyun, KAIST, KR
Authors:
Daijoon Hyun, Yuepeng Fan and Youngsoo Shin, KAIST, KR
Abstract
Placement-aware synthesis, which combines logic synthesis with virtual placement and routing (P&R) to better account for wiring, has been popular for timing closure. The wirelength after virtual placement is correlated with actual wirelength, but the correlation is not strong enough for some chosen paths. An algorithm to predict the actual wirelength from placement-aware synthesis is presented. It extracts a number of parameters from a given virtual path. A handful of synthetic parameters are compiled through linear discriminant analysis (LDA), and they are submitted to a few machine learning models. The final prediction of actual wirelength is given by the weighted sum of the predictions from these machine learning models, in which the weights are determined by the population of neighbors in parameter space. Experiments indicate that the predicted wirelength is 93% accurate compared to actual wirelength; this can be compared to conventional virtual placement, in which wirelength is predicted with only 79% accuracy.

Download Paper (PDF; Only available from the DATE venue WiFi)
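The final weighting step — blending model predictions by the population of neighbors in parameter space — can be sketched as follows (hypothetical 1-D parameter and linear models; the paper's features and learners are richer):

```python
def neighbor_weights(x, model_points, radius):
    # one weight per model: how many of that model's training points
    # fall within `radius` of the query point x in parameter space
    counts = [sum(1 for p in pts if abs(p - x) <= radius)
              for pts in model_points]
    total = sum(counts)
    if total == 0:
        return [1.0 / len(counts)] * len(counts)
    return [c / total for c in counts]

def predict(x, models, model_points, radius=1.0):
    # final wirelength prediction: weighted sum of per-model predictions
    w = neighbor_weights(x, model_points, radius)
    return sum(wi * m(x) for wi, m in zip(w, models))

def model_a(x):        # hypothetical model trusted for short paths
    return 2.0 * x

def model_b(x):        # hypothetical model trusted for long paths
    return 2.0 * x + 1.0

pts = ([0.0, 0.5, 1.0], [5.0, 5.5, 6.0])
print(predict(0.6, [model_a, model_b], pts))  # 1.2
```

Near x = 0.6 all neighbors belong to the first model, so its prediction dominates the blend.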
16:02 IP1-20, 602 A MIXED-HEIGHT STANDARD CELL PLACEMENT FLOW FOR DIGITAL CIRCUIT BLOCKS
Speaker:
Yi-Cheng Zhao, National Tsing Hua University, TW
Authors:
Yi-Cheng Zhao1, Yu-Chieh Lin1, Ting-Chi Wang1, Ting-Hsiung Wang2, Yun-Ru Wu2, Hsin-Chang Lin2 and Shu-Yi Kao2
1National Tsing Hua University, TW; 2Realtek Semiconductor Corp., TW
Abstract
In this paper, we present a mixed-height standard cell placement flow for digital circuit blocks. To the best of our knowledge, commercial tools currently do not support this type of flow in a fully automated manner. In our placement flow, we leverage a commercial placement tool and integrate it with several new point tools. Promising experimental results are reported to demonstrate the efficacy of our placement flow.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00 End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Breaks (Lunch Area)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the Lunch Area to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 26, 2019

Wednesday, March 27, 2019

Thursday, March 28, 2019


3.5 Hardware authentication and attack prevention

Date: Tuesday, March 26, 2019
Time: 14:30 - 16:00
Location / Room: Room 5

Chair:
Johanna Sepulveda, TUM, DE, Contact Johanna Sepulveda

Co-Chair:
Ilia Polian, University of Stuttgart, DE, Contact Ilia Polian

The electronics industry involves considerable investment, which makes the protection of its Intellectual Property a main concern; the development of new technologies will depend on it. In this session, solutions based on obfuscated microfluidic biochips and PUF-like Quantum Dot (QD) devices are shown. Moreover, an attack that challenges PUF-based identification techniques using machine learning is presented.

Time | Label | Presentation Title / Authors
14:30 3.5.1 OPTICALLY INTERROGATED UNIQUE OBJECT WITH SIMULATION ATTACK PREVENTION
Speaker:
Povilas Marcinkevicius, Lancaster University, GB
Authors:
Povilas Marcinkevicius, Ibrahim Ethem Bagci, Nema M. Abdelazim, Christopher S. Woodhead, Robert J. Young and Utz Roedig, Lancaster University, GB
Abstract
A Unique Object (UNO) is a physical object with unique characteristics that can be measured externally. The usually analogue measurement can be converted into a digital representation - a fingerprint - which uniquely identifies the object. For practical applications it is necessary that measurements can be performed without the need for specialist equipment or a complex measurement setup. Furthermore, a UNO should be able to defeat simulation attacks; an attacker may replace the UNO with a device or system that produces the expected measurement. Recently a novel type of UNO based on Quantum Dots (QDs) and exhibiting unique photo-luminescence properties has been proposed. The uniqueness of these UNOs is based on quantum effects that can be interrogated using a light source and a camera. The so-called Quantum Confinement UNO (QCUNO) responds uniquely to different light excitation levels, which is exploited for simulation attack protection, as opposed to focusing on features too small to reproduce and therefore difficult to measure. In this paper we describe methods for extraction of fingerprints from the QCUNO. We evaluate our proposed methods using 46 UNOs in a controlled setup. The focus of the evaluation is entropy, error resilience and the ability to detect simulation attacks.

Download Paper (PDF; Only available from the DATE venue WiFi)
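The fingerprint-extraction step described above can be illustrated with a minimal sketch (thresholds and readings are hypothetical; the paper's extraction methods are more elaborate):

```python
def fingerprint(responses, threshold):
    # quantise analogue photo-luminescence readings into a bit string
    return [1 if r > threshold else 0 for r in responses]

def hamming(a, b):
    # fractional Hamming distance: ~0 when re-measuring the same UNO,
    # ~0.5 when comparing unrelated objects
    return sum(x != y for x, y in zip(a, b)) / len(a)

ref = fingerprint([0.9, 0.2, 0.7, 0.1], threshold=0.5)
noisy = fingerprint([0.85, 0.25, 0.65, 0.15], threshold=0.5)
print(hamming(ref, noisy))  # 0.0
```

Error resilience then means keeping the intra-object distance well below the inter-object distance despite measurement noise.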
15:00 3.5.2 PUFS DEEP ATTACKS: ENHANCED MODELING ATTACKS USING DEEP LEARNING TECHNIQUES TO BREAK THE SECURITY OF DOUBLE ARBITER PUFS
Speaker:
Mahmoud Khalafalla, University Of Waterloo, CA
Authors:
Mahmoud Khalafalla and Catherine Gebotys, University of Waterloo, CA
Abstract
In the past decade and a half, physically unclonable functions (PUFs) have been introduced as a promising cryptographic primitive for hardware security applications. Since then, the race between proposing new complex PUF architectures and new attack schemes to break their security has been ongoing. Although modeling attacks using conventional machine learning techniques were successful against many PUFs, there are still some delay-based PUF architectures which remain unbroken against such attacks, such as the double arbiter PUFs. These stronger complex PUFs have the potential to be a promising candidate for key generation and authentication applications. This paper presents an in-depth analysis of modeling attacks using deep learning (DL) techniques against double arbiter PUFs (DAPUFs). Unlike more conventional machine learning techniques such as logistic regression and support vector machines, DL results show enhanced prediction accuracy of the attacked PUFs, thus pushing up the boundaries of modeling attacks to break more complex architectures. The attack on 3-1 DAPUFs has improved accuracy of over 86% (compared to previous research achieving a maximum of 76%) and the 4-1 DAPUFs accuracy ranges between 71%-81.5% (compared to previous research of maximum 63%). This research is crucial for analyzing the security of existing and future PUF architectures, confirming that as DL computations become more widely accessible, designers will need to hide the PUF's CRP relationship from attackers.

Download Paper (PDF; Only available from the DATE venue WiFi)
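Why such modeling attacks work can be seen from the standard additive delay model of a single arbiter chain, whose response is linear in the parity-transformed challenge; DAPUFs combine several such chains precisely to break this linearity, which is what the deep-learning attack overcomes. A sketch with hypothetical stage weights:

```python
def parity_features(challenge):
    # standard arbiter-PUF feature map: phi[i] is the product of
    # (1 - 2*c[j]) for j >= i, plus a constant final feature
    n = len(challenge)
    phi = [1.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        phi[i] = phi[i + 1] * (1 - 2 * challenge[i])
    return phi

def arbiter_response(weights, challenge):
    # a single arbiter chain: the response is the sign of a delay
    # difference that is *linear* in the parity features
    s = sum(w * f for w, f in zip(weights, parity_features(challenge)))
    return 1 if s > 0 else 0

w = [0.3, -0.5, 0.2, 0.1, -0.4]   # hypothetical stage delay differences
print(arbiter_response(w, [0, 1, 1, 0]))  # 0
```

A linear learner recovers `w` from challenge-response pairs of one chain; a DAPUF XORs several chains, so its decision boundary is no longer linear in these features.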
15:30 3.5.3 DESIEVE THE ATTACKER: THWARTING IP THEFT IN SIEVE-VALVE-BASED BIOCHIPS
Speaker:
Shayan Mohammed, New York University, US
Authors:
Mohammed Shayan1, Sukanta Bhattacharjee2, Yong Rafael Song2, Krishnendu Chakrabarty3 and Ramesh Karri4
1New York University, US; 2New York University Abu Dhabi, AE; 3Duke University, US; 4NYU, US
Abstract
Researchers develop bioassays following rigorous experimentation in the lab that involves considerable fiscal and highly-skilled-person-hour investment. Previous work shows that a bioassay implementation can be reverse engineered by using images or video and control signals of the biochip. Hence, techniques must be devised to protect the intellectual property (IP) rights of the bioassay developer. This study is the first step in this direction and it makes the following contributions: (1) it introduces a sieve-valve as the security primitive to obfuscate bioassay implementations; (2) it shows how sieve-valves can be used to obscure biochip building blocks such as multiplexers and mixers; (3) rules and metrics are presented for designing obfuscated biochips. We assess the cost-security trade-offs associated with this solution and demonstrate practical sieve-valve based obfuscation on real-life biochips.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00 End of session
Coffee Break in Exhibition Area



3.6 Software Solutions for Reliable Memories

Date: Tuesday, March 26, 2019
Time: 14:30 - 16:00
Location / Room: Room 6

Chair:
Valentin Gherman, CEA-Leti, FR, Contact Valentin Gherman

Co-Chair:
Borzoo Bonakdarpour, Iowa State University, US, Contact Borzoo Bonakdarpour

This session explores solutions for reliable memories at different levels. The first paper introduces a process-variation-resilient space allocation scheme for open-channel SSD with 3D charge-trap flash memory. The second paper presents an architecture-independent framework to mitigate read disturbance errors in STT-RAM. Finally, the third paper proposes a wear leveling aware memory allocator for PCM memories. The IP presentation deals with memory dependency speculation and how to take advantage of it during the Dynamic Binary Translation process by using VLIW cores.

Time | Label | Presentation Title / Authors
14:30 3.6.1 PATCH: PROCESS-VARIATION-RESILIENT SPACE ALLOCATION FOR OPEN-CHANNEL SSD WITH 3D FLASH
Speaker:
Yi Wang, Shenzhen University, CN
Authors:
Jing Chen1, Yi Wang1, Amelie Chi Zhou1, Rui Mao1 and Tao Li2
1Shenzhen University, CN; 2University of Florida, US
Abstract
Advanced three-dimensional (3D) flash memory adopts charge-trap technology that can effectively improve the bit density and reduce the coupling effect. Despite these advantages, 3D charge-trap flash brings a number of new challenges. First, the current etching process is unable to manufacture perfect channels with identical feature size. Second, the cell current in 3D charge-trap flash is only 20% of that in planar flash memory, making it difficult to give a reliable sensing margin. These issues are affected by process variation, and they pose threats to the integrity of data stored in 3D charge-trap flash. This paper presents PATCH, a process-variation-resilient space allocation scheme for open-channel SSD with 3D charge-trap flash memory. PATCH is a novel hardware and file system interface that can transparently allocate physical space in the presence of process variation. PATCH utilizes the rich functionalities provided by the system infrastructure of open-channel SSD to reduce the uncorrectable bit errors. We demonstrate the viability of the proposed technique using a set of extensive experiments. Experimental results show that PATCH can effectively enhance the reliability with negligible extra erase operations in comparison with representative schemes.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:00 3.6.2 COMPILER-DIRECTED AND ARCHITECTURE-INDEPENDENT MITIGATION OF READ DISTURBANCE ERRORS IN STT-RAM
Speaker:
Chengmo Yang, University of Delaware, US
Authors:
Fateme Sadat Hosseini and Chengmo Yang, University of Delaware, US
Abstract
High density, negligible leakage power, and fast read speed have made Spin-Transfer Torque Random Access Memory (STT-RAM) one of the most promising candidates for next generation on-chip memories. However, STT-RAM suffers from read-disturbance errors, that is, read operations might accidentally change the value of the accessed memory location. Although these errors could be mitigated by applying a restore-after-read operation, the energy overhead would be significant. This paper presents an architecture-independent framework to mitigate read disturbance errors while reducing the energy overhead, by selectively inserting restore operations under the guidance of the compiler. For that purpose, the vulnerability of load operations to read disturbance errors is evaluated using a specifically designed fault model; a code transformation technique is developed to reduce the number of vulnerable loads; and, an algorithm is proposed to selectively insert restore operations. The evaluation results show that the proposed technique can effectively reduce up to 97% of restore operations and 66% of the energy overhead while maintaining 99.8% coverage of read disturbance errors.

Download Paper (PDF; Only available from the DATE venue WiFi)
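The selective-insertion idea — restore only after loads whose estimated vulnerability exceeds a threshold — can be sketched as a toy compiler pass (the vulnerability numbers are hypothetical; the paper derives them from a dedicated fault model):

```python
def insert_restores(loads, vulnerability, threshold):
    # toy compiler pass: emit a restore-after-read only for loads whose
    # estimated read-disturbance vulnerability exceeds the threshold,
    # instead of restoring after every load
    prog = []
    for addr in loads:
        prog.append(("load", addr))
        if vulnerability.get(addr, 0.0) > threshold:
            prog.append(("restore", addr))
    return prog

prog = insert_restores(["a", "b", "a"], {"a": 0.9, "b": 0.1}, threshold=0.5)
print(prog)
```

Only the vulnerable location `a` pays the restore energy; the benign load of `b` is left untouched.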
15:30 3.6.3 A WEAR LEVELING AWARE MEMORY ALLOCATOR FOR BOTH STACK AND HEAP MANAGEMENT IN PCM-BASED MAIN MEMORY SYSTEMS
Speaker:
Qingan Li, Wuhan University, CN
Authors:
Wei Li1, Ziqi Shuai1, Chun Xue2, Mengting Yuan1 and Qingan Li1
1Wuhan University, CN; 2City University of Hong Kong, HK
Abstract
Phase change memory (PCM) has been considered as a replacement of DRAM, due to its potential for high storage density and low leakage power. However, the limited write endurance presents critical challenges. Various wear leveling techniques have been proposed to mitigate this issue from different perspectives, including both hardware and software levels. This paper proposes a wear leveling aware memory allocator, which (1) always prefers allocating memory blocks with fewer writes upon memory requests, and (2) temporarily makes blocks allocated more than a threshold value unallocable. Furthermore, for the first time, this allocator provides a uniform management scheme for both stack and heap areas, and thus can better balance writes across the stack and heap areas. Experimental evaluations show that, compared to state-of-the-art memory allocators (i.e., glibc malloc, NVMalloc and Walloc), the proposed memory allocator improves PCM wear leveling, in terms of CoV (a wear leveling indicator), by 41.9%, 30.3%, and 35.8%, respectively.

Download Paper (PDF; Only available from the DATE venue WiFi)
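The allocator's two rules can be sketched in a few lines (our toy model, not the authors' implementation; block counts and thresholds are hypothetical):

```python
class WearAwareAllocator:
    # toy sketch: (1) always hand out the least-written free block;
    # (2) temporarily retire blocks written more than `threshold` times
    def __init__(self, num_blocks, threshold):
        self.writes = [0] * num_blocks
        self.free = set(range(num_blocks))
        self.retired = set()
        self.threshold = threshold

    def alloc(self):
        candidates = self.free - self.retired
        if not candidates:            # every free block is resting: un-retire
            self.retired.clear()
            candidates = self.free
        b = min(candidates, key=lambda i: (self.writes[i], i))
        self.free.discard(b)
        self.writes[b] += 1           # an allocation implies at least one write
        if self.writes[b] > self.threshold:
            self.retired.add(b)
        return b

    def dealloc(self, b):
        self.free.add(b)

alloc = WearAwareAllocator(num_blocks=2, threshold=2)
order = []
for _ in range(4):
    b = alloc.alloc()
    order.append(b)
    alloc.dealloc(b)
print(order)  # [0, 1, 0, 1] -- writes stay balanced across blocks
```

Routing both stack and heap requests through one such allocator is what lets the paper balance wear across the two areas jointly.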
16:00 IP1-21, 233 AGGRESSIVE MEMORY SPECULATION IN HW/SW CO-DESIGNED MACHINES
Speaker:
Simon Rokicki, INRIA, FR
Authors:
Simon Rokicki, Erven Rohou and Steven Derrien, IRISA, Rennes, FR
Abstract
Single-ISA heterogeneous systems (such as ARM big.LITTLE) are an attractive solution for embedded platforms as they expose many performance and energy consumption trade-offs directly to the operating system. Recent works have demonstrated the ability to increase their efficiency by using VLIW cores, supported through Dynamic Binary Translation (DBT). Such an approach exposes even more heterogeneity while maintaining the illusion of a single-ISA system. However, VLIW cores cannot rival Out-of-Order (OoO) cores when it comes to performance. One of the reasons is that OoO cores heavily rely on speculative execution. In this work, we study how it is possible to take advantage of memory dependency speculation during the DBT process. More specifically, our approach builds on a hardware accelerated DBT framework, which enables fine-grained dynamic iterative optimizations. This is achieved through a combination of hardware and software, following the principles of co-designed machines. The experimental study conducted demonstrates that our approach leads to a geo-mean speed-up of 20% while keeping the hardware overhead low.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00 End of session
Coffee Break in Exhibition Area



3.7 Design Automation of Cyber-Physical Systems

Date: Tuesday, March 26, 2019
Time: 14:30 - 16:00
Location / Room: Room 7

Chair:
Lei Bu, Nanjing University, CN, Contact Lei Bu

Co-Chair:
Stefano Centomo, University of Verona, IT, Contact Stefano Centomo

The session addresses design techniques for modern cyber-physical systems, e.g., design of the computation/communication platform according to system dynamics, design of variable-delay controllers and, last but not least, assume-guarantee contract optimization.

Time | Label | Presentation Title / Authors
14:30 3.7.1 EXPLOITING SYSTEM DYNAMICS FOR RESOURCE-EFFICIENT AUTOMOTIVE CPS DESIGN
Speaker:
Wanli Chang, University of York, GB
Authors:
Leslie Maldonado1, Wanli Chang2, Debayan Roy3, Anuradha Annaswamy1, Dip Goswami4 and Samarjit Chakraborty3
1MIT, US; 2University of York, GB; 3TUM, DE; 4Eindhoven University of Technology, NL
Abstract
Automotive embedded systems are safety-critical, while being highly cost-sensitive at the same time. The former requires resource dimensioning that accounts for the worst case, even if such a case occurs infrequently, while this is in conflict with the latter requirement. In order to manage both of these aspects at the same time, one research direction being explored is to dynamically assign a mixture of resources based on needs and priorities of different tasks. Along this direction, in this paper we show that by properly modeling the physical dynamics of the systems that an automotive control software interacts with, it is possible to better save resources while still guaranteeing safety properties. Towards this, we focus on a distributed controller implementation that uses an automotive FlexRay bus. Our approach combines techniques from timing/schedulability analysis and control theory and shows the significance of synergistically combining the cyber component and physical processes in the cyber-physical systems (CPS) design paradigm.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:00 3.7.2 IMPLEMENTATION-AWARE DESIGN OF IMAGE-BASED CONTROL WITH ON-LINE MEASURABLE VARIABLE-DELAY
Speaker:
Robinson Medina, Eindhoven University of Technology & TNO Powertrains, NL
Authors:
Róbinson Alberto Medina Sánchez, Sander Stuijk, Dip Goswami and Twan Basten, Eindhoven University of Technology, NL
Abstract
Image-based control uses image-processing algorithms to acquire sensing information. The sensing delay associated with the image-processing algorithm is typically platform-dependent and time-varying. Modern embedded platforms allow to characterize the sensing delay at design-time obtaining a delay histogram, and at run-time measuring its precise value. We exploit this knowledge to design variable-delay controllers. This design also takes into account the resource configuration of the image processing algorithm: sequential (with one processing resource) or pipelined (with multiprocessing capabilities). Since the control performance strongly depends on the model quality, we present a simulation benchmark that uses the model uncertainty and the delay histogram to obtain bounds on control performance. Our benchmark is used to select a variable-delay controller and a resource configuration that outperform a constant worst-case delay controller.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30 3.7.3 OPTIMIZING ASSUME-GUARANTEE CONTRACTS FOR THE DESIGN OF CYBER-PHYSICAL SYSTEMS
Speaker:
Chanwook Oh, University of Southern California, US
Authors:
Chanwook Oh1, Eunsuk Kang2, Shinichi Shiraishi2 and Pierluigi Nuzzo1
1University of Southern California, US; 2Toyota InfoTechnology Center, US
Abstract
Assume-guarantee (A/G) contracts are mathematical models enabling modular and hierarchical design and verification of complex systems by rigorous decomposition of system-level specifications into component-level specifications. Existing A/G contract frameworks, however, are not designed to effectively capture the behaviors of cyber-physical systems where multiple agents aim to maximize one or more objectives, and may interact with each other and the environment in a cooperative or non-cooperative way toward achieving their goals. This paper proposes an extension of the A/G contract framework, namely optimizing A/G contracts, that can be used to specify and reason about properties of component interactions that involve optimizing objectives. The proposed framework includes methods for constructing new contracts via conjunction and composition, along with algorithms to verify system properties via contract refinement. We illustrate its effectiveness on a set of case studies from connected and autonomous vehicles.

Download Paper (PDF; Only available from the DATE venue WiFi)
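The basic refinement check underlying such frameworks can be sketched over a finite set of behaviours (a textbook A/G formulation, not the optimizing extension proposed in the paper; the example contracts are hypothetical):

```python
class Contract:
    # A/G contract over a finite behaviour universe:
    # assumptions A on the environment, guarantees G on the component
    def __init__(self, A, G):
        self.A, self.G = set(A), set(G)

def refines(c1, c2, universe):
    # C1 refines C2 iff it accepts weaker assumptions (A2 subset of A1)
    # and offers stronger guarantees in saturated form:
    # G1 | not-A1  subset of  G2 | not-A2
    u = set(universe)
    return c2.A <= c1.A and (c1.G | (u - c1.A)) <= (c2.G | (u - c2.A))

U = {1, 2, 3, 4}
C_spec = Contract(A={1, 2}, G={1, 2, 3})       # system-level specification
C_impl = Contract(A={1, 2, 3}, G={1, 2})       # candidate component contract
print(refines(C_impl, C_spec, U))  # True
```

Verifying a decomposition then reduces to checking that the composition of component contracts refines the system-level contract.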
16:00 End of session
Coffee Break in Exhibition Area



3.8 DFG Collaborative Funding Instruments

Date: Tuesday, March 26, 2019
Time: 14:30 - 16:00
Location / Room: Exhibition Theatre

Organiser:
Jürgen Teich, Friedrich-Alexander-Universität Erlangen-Nürnberg, DE, Contact Jürgen Teich

Moderator:
Andreas Raabe, DFG, DE, Contact Andreas Raabe

Collaborative interdisciplinary research is considered of paramount importance today for achieving breakthroughs and leaps in technical innovation. In this session, program director Dr. Andreas Raabe introduces the types of collaborative funding instruments offered by the Deutsche Forschungsgemeinschaft (DFG) in Germany, as well as funding opportunities for international cooperation. After an introduction to the different funding instruments for short-, medium- and long-term collaborative research, concrete example initiatives within the scope of DATE topics will be briefly introduced and summarized by their representatives, with a majority of these initiatives also exhibiting during the conference week.

Time | Label | Presentation Title / Authors
14:30 3.8.1 DFG COLLABORATIVE FUNDING INSTRUMENTS - AN OVERVIEW
Speaker:
Andreas Raabe, DFG, DE
14:45 3.8.2 PRIORITY PROGRAM: SPP1648 SOFTWARE FOR EXASCALE COMPUTING
Speaker:
Hans-Joachim Bungartz, TUM, DE
14:52 3.8.3 PRIORITY PROGRAM: SPP2037 SCALABLE DATA MANAGEMENT FOR FUTURE HARDWARE
Speaker:
Kai-Uwe Sattler, TU Ilmenau, DE
15:00 3.8.4 RESEARCH UNIT: FOR1800 CONTROLLING CONCURRENT CHANGE - TOWARDS SELF-AWARE AUTOMOTIVE AND SPACE VEHICLES
Speaker:
Rolf Ernst, TU Braunschweig, IDA, DE
15:07 3.8.5 COLLABORATIVE RESEARCH CENTRE: SFB 901 ON-THE-FLY COMPUTING
Speaker:
Marco Platzner, University of Paderborn, DE
15:15 3.8.6 COLLABORATIVE RESEARCH CENTRE: SFB 876 PROVIDING INFORMATION BY RESOURCE-CONSTRAINED DATA ANALYSIS
Speaker:
Jian-Jia Chen, TU Dortmund, DE
15:22 3.8.7 TRANSREGIONAL RESEARCH CENTRE: TR89 INVASIVE COMPUTING
Speaker:
Jürgen Teich, Friedrich-Alexander-Universität Erlangen-Nürnberg, DE
15:30 3.8.8 COLLABORATIVE RESEARCH CENTRE: SFB 912 HIGHLY ADAPTIVE ENERGY EFFICIENT COMPUTING
Speaker:
Gerhard Fettweis, Technische Universität Dresden, DE
15:37 3.8.9 BI-NATIONAL RESEARCH PROJECT: CONQUERING MPSOC COMPLEXITY WITH PRINCIPLES OF A SELF-AWARE INFORMATION
Speaker:
Andreas Herkersdorf, TUM, DE
Author:
Rolf Ernst, TU Braunschweig, IDA, DE
16:00 End of session
Coffee Break in Exhibition Area



IP1 Interactive Presentations

Date: Tuesday, March 26, 2019
Time: 16:00 - 16:30
Location / Room: Poster Area

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

Label | Presentation Title / Authors
IP1-1 FAULT INJECTION ON HIDDEN REGISTERS IN A RISC-V ROCKET PROCESSOR AND SOFTWARE COUNTERMEASURES
Speaker:
Johan Laurent, Univ. Grenoble Alpes, Grenoble INP, LCIS, FR
Authors:
Johan Laurent1, Vincent Beroulle1, Christophe Deleuze1 and Florian Pebay-Peyroula2
1LCIS - Grenoble Institute of Technology - Univ. Grenoble Alpes, FR; 2CEA-Leti, FR
Abstract
To protect against hardware fault attacks, developers can use software countermeasures. They are generally designed to thwart software fault models such as instruction skip or memory corruption. However, these typical models do not take into account the actual implementation of a processor. By analyzing the processor microarchitecture, it is possible to bypass typical software countermeasures. In this paper, we analyze the vulnerability of a secure code from FISSC (Fault Injection and Simulation Secure Collection), by simulating fault injections in a RISC-V Rocket processor RTL description. We highlight the importance of hidden registers in the processor pipeline, which temporarily hold data during code execution. Secret data can be leaked by attacking these hidden registers. Software countermeasures against such attacks are also proposed.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-2 METHODOLOGY FOR EM FAULT INJECTION: CHARGE-BASED FAULT MODEL
Speaker:
Haohao Liao, University of Waterloo, CA
Authors:
Haohao Liao and Catherine Gebotys, University of Waterloo, CA
Abstract
Recently electromagnetic fault injection (EMFI) techniques have been found to have significant implications on the security of embedded devices. Unfortunately there is still a lack of understanding of EM fault models and countermeasures for embedded processors. For the first time, this paper proposes an extended fault model based on the concept of critical charge and a new EMFI backside methodology based on over-clocking. Results show that exact timing of EM pulses can provide reliable repeatable instruction replacement faults for specific programs. An attack on AES is demonstrated showing that the EM fault injection requires on average less than 222 EM pulses and 5.3 plaintexts to retrieve the full AES key. This research is critical for ensuring embedded processors and their instruction set architectures are secure and resistant to fault injection attacks.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-3 SECURING CRYPTOGRAPHIC CIRCUITS BY EXPLOITING IMPLEMENTATION DIVERSITY AND PARTIAL RECONFIGURATION ON FPGAS
Speaker:
Benjamin Hettwer, Robert Bosch GmbH, DE
Authors:
Benjamin Hettwer1, Johannes Petersen2, Stefan Gehrer1, Heike Neumann2 and Tim Güneysu3
1Robert Bosch GmbH, Corporate Sector Research, DE; 2Hamburg University of Applied Sciences, DE; 3Horst Görtz Institute for IT Security, Ruhr-University Bochum, DE
Abstract
Adaptive and reconfigurable systems such as Field Programmable Gate Arrays (FPGAs) play an integral part of many complex embedded platforms. This implies the capability to perform runtime changes to hardware circuits on demand. In this work, we make use of this feature to propose a novel countermeasure against physical attacks of cryptographic implementations. In particular, we leverage exploration of the implementation space on FPGAs to create various circuits with different hardware layouts from a single design of the Advanced Encryption Standard (AES), which are dynamically exchanged during device operation. We provide evidence from practical experiments based on a modern Xilinx ZYNQ UltraScale+ FPGA that our approach increases the resistance against physical attacks by at least a factor of two. Furthermore, the genericness of our approach allows an easy adaptation to other algorithms and combination with other countermeasures.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-4 STT-ANGIE: ASYNCHRONOUS TRUE RANDOM NUMBER GENERATOR USING STT-MTJ
Speaker:
Ben Perach, Faculty of Electrical Engineering. Technion - Israel Institute of Technology, IL
Authors:
Ben Perach and Shahar Kvatinsky, Technion, IL
Abstract
The Spin Transfer Torque Magnetic Tunnel Junction (STT-MTJ) is an emerging memory technology whose interesting stochastic behavior might benefit security applications. In this paper, we leverage this stochastic behavior to construct a true random number generator (TRNG), the basic module in the process of encryption key generation. Our proposed TRNG operates asynchronously and thus can use small and fast STT-MTJ devices. As such, it can be embedded in low-power and low-frequency devices without loss of entropy. We evaluate the proposed TRNG using a numerical simulation, solving the Landau-Lifshitz-Gilbert (LLG) equation system of the STT-MTJ devices. Design considerations, attack analysis, and process variation are discussed and evaluated. The evaluation shows that our solution is robust to process variation, achieving a Shannon-entropy generating rate between 99.7Mbps and 127.8Mbps for 90% of the instances.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-5ADAPTIVE TRANSIENT LEAKAGE-AWARE LINEARISED MODEL FOR THERMAL ANALYSIS OF 3-D ICS
Speaker:
Milan Mihajlovic, University of Manchester, GB
Authors:
Chao Zhang, Milan Mihajlovic and Vasilis Pavlidis, The University of Manchester, GB
Abstract
Physics-based models for thermal simulation that involve numerical solution of the heat equation are well placed to accurately capture the heterogeneity of materials and structures in modern 3-D integrated circuits (ICs). The introduction of non-linear effects in thermal coefficients and leakage power significantly improves the accuracy of thermal models. However, this non-linearity also significantly increases the complexity and computational time of the analysis. In this paper, we introduce a linearised thermal model by demonstrating that the weak temperature dependence of the specific heat and the thermal conductivity of silicon-based materials has only a minor effect on computed temperature profiles. Thus, these parameters can be considered constant in the working temperature ranges of modern ICs. The non-linearity in leakage power is approximated by a piecewise linear least-squares fit, and the resulting model is linearised by an exact Newton method, contrary to previous works that employ either simple iteration or an inexact Newton method. The method is implemented in the context of transient thermal analysis with adaptive time step selection, where we demonstrate that it is essential to apply Newton corrections to obtain the right time step size. The resulting method is up to 2x faster than a full non-linear method, typically introducing a global relative error of less than 1%.

Download Paper (PDF; Only available from the DATE venue WiFi)
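The role of the exact Newton correction is easiest to see on a miniature example. The sketch below solves a toy steady-state heat balance with temperature-dependent leakage using Newton's method with the analytic derivative; all coefficients are invented for illustration, and the paper's transient 3-D model is far more detailed:

```python
import math

# Hypothetical parameters, invented for illustration only.
K_TH = 0.05        # effective thermal conductance to ambient (W/K)
T_AMB = 300.0      # ambient temperature (K)
P_DYN = 1.0        # dynamic power (W)

def p_leak(t):
    """Non-linear leakage model: grows exponentially with temperature."""
    return 0.2 * math.exp(0.02 * (t - T_AMB))

def residual(t):
    # Steady-state heat balance: conducted heat minus dissipated power.
    return K_TH * (t - T_AMB) - (P_DYN + p_leak(t))

def d_residual(t):
    # Analytic derivative: this is what makes the Newton method "exact".
    return K_TH - 0.2 * 0.02 * math.exp(0.02 * (t - T_AMB))

def newton(t0, tol=1e-9, max_iter=50):
    t = t0
    for _ in range(max_iter):
        step = residual(t) / d_residual(t)
        t -= step
        if abs(step) < tol:
            break
    return t

t_star = newton(T_AMB + 10.0)   # converges to roughly 327 K
```

With these numbers the loop settles in a handful of iterations; an inexact method would need a derivative approximation at each step instead of `d_residual`.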
IP1-6FASTCOOL: LEAKAGE AWARE DYNAMIC THERMAL MANAGEMENT OF 3D MEMORIES
Speaker:
Lokesh Siddhu, IIT Delhi, IN
Authors:
Lokesh Siddhu1 and Preeti Ranjan Panda2
1Indian Institute of Technology, Delhi, IN; 2IIT Delhi, IN
Abstract
3D memory systems offer several advantages in terms of area, bandwidth, and energy efficiency. However, thermal issues arising out of higher power densities have limited their widespread use. While prior works have looked at reducing dynamic power through reduced memory accesses, in these memories leakage and dynamic power consumption are comparable. Furthermore, as the temperature rises, the leakage power increases, creating a thermal-leakage loop. We study the impact of leakage power on 3D memory temperature and propose turning OFF hot channels to meet thermal constraints. Data is migrated to a 2D memory before closing a 3D channel. We introduce an analytical model to assess the 2D memory delay and use the model to guide data migration decisions. Our experiments show that the proposed optimization improves performance by 27% on average (up to 66%) over state-of-the-art strategies.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-7ON THE USE OF CAUSAL FEATURE SELECTION IN THE CONTEXT OF MACHINE-LEARNING INDIRECT TEST
Speaker:
Manuel Barragán, TIMA Laboratory, FR
Authors:
Manuel Barragan1, Gildas Leger2, Florent Cilici3, Estelle Lauga-Larroze4, Sylvain Bourdel4 and Salvador Mir3
1TIMA Laboratory, FR; 2Instituto de Microelectronica de Sevilla, IMSE-CNM, (CSIC - Universidad de Sevilla), ES; 3TIMA, FR; 4RFICLab, FR
Abstract
The test of analog, mixed-signal and RF (AMS-RF) circuits is still considered a matter of human creativity, and although many attempts have been made towards its automation, no accepted and complete solution is yet available. Indeed, capturing the design knowledge of an experienced analog designer is one of the key challenges faced by the Electronic Design Automation (EDA) community. In this paper we explore the use of causal inference tools in the context of AMS-RF design and test, with the goal of defining a methodology for uncovering the root causes of performance variation in these systems. We believe that such an analysis can be a promising first step towards future EDA algorithms for AMS-RF systems.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-8ACCURACY AND COMPACTNESS IN DECISION DIAGRAMS FOR QUANTUM COMPUTATION
Speaker:
Alwin Zulehner, Johannes Kepler University Linz, AT
Authors:
Alwin Zulehner1, Philipp Niemann2, Rolf Drechsler3 and Robert Wille1
1Johannes Kepler University Linz, AT; 2Cyber-Physical Systems, DFKI GmbH, DE; 3University of Bremen, DE
Abstract
Quantum computation is a promising research field since it allows certain tasks to be conducted exponentially faster than on conventional machines. As in the conventional domain, decision diagrams are heavily used in different design tasks for quantum computation such as synthesis, verification, or simulation. However, unlike decision diagrams for the conventional domain, decision diagrams for quantum computation as of now suffer from a trade-off between accuracy and compactness that requires parameter fine-tuning on a case-by-case basis. In this work, we—for the first time—describe and evaluate the effects of this trade-off. Moreover, we propose an alternative approach that utilizes an algebraic representation of the occurring irrational numbers and outline how this can be incorporated in a decision diagram in order to overcome this trade-off.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-9ONE METHOD - ALL ERROR-METRICS: A THREE-STAGE APPROACH FOR ERROR-METRIC EVALUATION IN APPROXIMATE COMPUTING
Speaker:
Saman Fröhlich, University of Bremen/DFKI GmbH, DE
Authors:
Saman Fröhlich1, Daniel Grosse2 and Rolf Drechsler2
1University of Bremen/DFKI GmbH, DE; 2University of Bremen, DE
Abstract
Approximate Computing is a design paradigm that makes use of the error tolerance inherent in many applications, such as machine learning, media processing and data mining. The goal of Approximate Computing is to trade off accuracy for performance in terms of computation time, energy consumption and/or hardware complexity. In the field of circuit design for Approximate Computing, error-metrics are used to express the degree of approximation. Evaluating these error-metrics is a key challenge. Several approaches exist; however, to this day not all relevant metrics can be evaluated with formal methods. Recently, Symbolic Computer Algebra (SCA) has been used to evaluate error-metrics during approximate hardware generation. In this paper, we generalize the idea of using SCA and propose a methodology which is suitable for formal evaluation of all established error-metrics. The approach can be divided into three stages: (i) determine the remainder of the AC circuit w.r.t. the specification using SCA, (ii) build an Algebraic Decision Diagram (ADD) to represent the remainder, and (iii) evaluate each error-metric by a tailored ADD traversal algorithm. Besides being the first to propose a closed formal method for the evaluation of all relevant error-metrics, we are the first ever to propose formal algorithms for the evaluation of the worst-case-relative and the average-case-relative error-metrics. In the experiments, we apply our algorithms to a large and well-known benchmark set.

Download Paper (PDF; Only available from the DATE venue WiFi)
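For readers unfamiliar with the error-metrics involved, a brute-force baseline makes them concrete. The sketch below uses a toy approximate adder (invented for illustration; the paper's point is to evaluate such metrics formally via SCA and ADDs rather than by exhaustive simulation) and computes the worst-case, average-case, worst-case-relative and error-rate metrics over all inputs:

```python
def exact_add(a, b):
    return a + b

def approx_add(a, b, cut=2):
    # Toy approximation: force the lowest `cut` bits of both operands to zero.
    mask = ~((1 << cut) - 1)
    return (a & mask) + (b & mask)

def error_metrics(bits=4, cut=2):
    """Exhaustively evaluate common error-metrics for the toy adder."""
    worst_abs = 0
    total_abs = 0
    worst_rel = 0.0
    errors = 0
    n = 0
    for a in range(1 << bits):
        for b in range(1 << bits):
            ex = exact_add(a, b)
            ap = approx_add(a, b, cut)
            err = abs(ex - ap)
            n += 1
            total_abs += err
            worst_abs = max(worst_abs, err)
            if ex != 0:
                worst_rel = max(worst_rel, err / ex)
            if err:
                errors += 1
    return {"worst_case": worst_abs,
            "average_case": total_abs / n,
            "worst_case_relative": worst_rel,
            "error_rate": errors / n}

m = error_metrics()   # worst_case 6, average_case 3.0 for 4-bit operands
```

This exhaustive loop is exponential in the operand width, which is exactly why the formal ADD-based evaluation the paper proposes matters.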
IP1-10REVERSIBLE PEBBLING GAME FOR QUANTUM MEMORY MANAGEMENT
Speaker:
Giulia Meuli, EPFL, CH
Authors:
Giulia Meuli1, Mathias Soeken1, Martin Roetteler2, Nikolaj Bjorner2 and Giovanni De Micheli1
1EPFL, CH; 2Microsoft, US
Abstract
Quantum memory management is becoming a pressing problem, especially given the recent research effort to develop new and more complex quantum algorithms. The only existing automatic method for quantum state clean-up relies on the availability of many extra resources. In this work, we propose an automatic tool for quantum memory management. We show how this problem exactly matches the reversible pebbling game. Based on that, we develop a SAT-based algorithm that returns a valid clean-up strategy, taking the limitations of the quantum hardware into account. The developed tool empowers the designer with the flexibility required to explore the trade-off between memory resources and number of operations. We present two showcases to prove the validity of our approach. First, we apply the algorithm to straight-line programs, widely used in cryptographic applications. Second, we perform a comparison with the existing approach, showing an average improvement of 52.77%.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-11TYPECNN: CNN DEVELOPMENT FRAMEWORK WITH FLEXIBLE DATA TYPES
Speaker:
Lukas Sekanina, Brno University of Technology, CZ
Authors:
Petr Rek and Lukas Sekanina, Brno University of Technology, CZ
Abstract
The rapid progress in artificial intelligence technologies based on deep and convolutional neural networks (CNN) has led to an enormous interest in efficient implementations of neural networks in embedded devices and hardware. We present a new software framework for the development of (approximate) convolutional neural networks in which the user can define and use various data types for the forward (inference) procedure, backward (training) procedure and weights. Moreover, non-standard arithmetic operations such as approximate multipliers can easily be integrated into the CNN under design. This flexibility enables analysis of the impact of the chosen data types and non-standard arithmetic operations on CNN training and inference efficiency. The framework was implemented in C++ and evaluated using several case studies.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-12GUARANTEED COMPRESSION RATE FOR ACTIVATIONS IN CNNS USING A FREQUENCY PRUNING APPROACH
Speaker:
Sebastian Vogel, Robert Bosch GmbH, DE
Authors:
Sebastian Vogel1, Christoph Schorn1, Andre Guntoro1 and Gerd Ascheid2
1Robert Bosch GmbH, DE; 2RWTH Aachen University, DE
Abstract
Convolutional Neural Networks have become state of the art for many computer vision tasks. However, the size of Neural Networks prevents their application in resource-constrained systems. In this work, we present a lossy compression technique for intermediate results of Convolutional Neural Networks. The proposed method offers guaranteed compression rates and additionally adapts to performance requirements. Our experiments with networks for classification and semantic segmentation show that our method outperforms state-of-the-art compression techniques used in CNN accelerators.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-13RUNTIME MONITORING NEURON ACTIVATION PATTERNS
Speaker:
Chih-Hong Cheng, fortiss, DE
Authors:
Chih-Hong Cheng1, Georg Nührenberg1 and Hirotoshi Yasuoka2
1fortiss - Landesforschungsinstitut des Freistaats Bayern, DE; 2DENSO Corporation, JP
Abstract
For using neural networks in safety-critical domains such as automated driving, it is important to know if a decision made by a neural network is supported by prior similarities in training. We propose runtime neuron activation pattern monitoring: after the standard training process, one creates a monitor by feeding the training data to the network again in order to store the neuron activation patterns in abstract form. In operation, a classification decision over an input is further supplemented by examining if a pattern similar (measured by Hamming distance) to the generated pattern is contained in the monitor. If the monitor does not contain any pattern similar to the generated pattern, it raises a warning that the decision is not based on the training data. Our experiments show that, by adjusting the similarity threshold for activation patterns, the monitors can report a significant portion of misclassifications as not supported by training, with a small false-positive rate, when evaluated on a test set.

Download Paper (PDF; Only available from the DATE venue WiFi)
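The monitoring idea can be sketched in a few lines. Below, patterns are on/off bit-vectors of a layer's activations, recorded per class during a replay of the training set and matched at run time by Hamming distance; class names, thresholds and the abstraction to sign bits are invented for illustration:

```python
def pattern(activations):
    """Abstract a layer's activations to an on/off bit-vector."""
    return tuple(1 if a > 0 else 0 for a in activations)

def hamming(p, q):
    return sum(x != y for x, y in zip(p, q))

class ActivationMonitor:
    def __init__(self, threshold=1):
        self.threshold = threshold   # max allowed Hamming distance
        self.patterns = {}           # class label -> set of seen patterns

    def record(self, label, activations):
        # Build the monitor by replaying the training set.
        self.patterns.setdefault(label, set()).add(pattern(activations))

    def supported(self, label, activations):
        # At run time: is the decision near any pattern seen in training?
        p = pattern(activations)
        return any(hamming(p, q) <= self.threshold
                   for q in self.patterns.get(label, ()))

mon = ActivationMonitor(threshold=1)
mon.record("cat", [0.9, -0.2, 0.4, 0.0])   # stores pattern (1, 0, 1, 0)
```

A query such as `mon.supported("cat", [0.5, -0.1, 0.2, 0.3])` yields `True` (distance 1), while an activation vector far from everything recorded triggers the warning path.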
IP1-14CHIP HEALTH TRACKING USING DYNAMIC IN-SITU DELAY MONITORING
Speaker:
Hadi Ahmadi Balef, Eindhoven University of Technology, NL
Authors:
Hadi Ahmadi Balef1, Kees Goossens2 and José Pineda de Gyvez1
1Eindhoven University of Technology, NL; 2Eindhoven University of Technology, NL
Abstract
Tracking the gradual effect of silicon aging on circuit delays requires fine-grain slack monitoring. Conventional slack monitoring techniques intend to measure the worst-case static slack, i.e., the slack of the longest timing path. In sharp contrast to the conventional techniques, we propose a novel technique that is based on dynamic excitation of in-situ delay monitors (i.e., the dynamic excitation of the timing paths that are monitored). As delays degrade, path delays increase and the monitors are excited more frequently. With the proposed technique, a fine-grained signature of delay degradation is extracted from the excitation rate of the monitors. The in-situ monitors are inserted at intermediate points along timing paths to increase the sensitivity of the signature to delay degradation. A new efficient monitor insertion algorithm is also proposed that reduces the number of monitors by ~2.1X compared to other works for an ARM Cortex M0 processor.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-15PCFI: PROGRAM COUNTER GUIDED FAULT INJECTION FOR ACCELERATING GPU RELIABILITY ASSESSMENT
Speaker:
Paolo Rech, UFRGS, BR
Authors:
Fritz Previlon, Charu Kalra, Devesh Tiwari and David Kaeli, Northeastern University, US
Abstract
Reliability has become a first-class design objective for GPU devices due to increasing soft-error rates. To assess the reliability of GPU programs, researchers rely on software fault-injection methods. Unfortunately, the software fault-injection process is prohibitively expensive, requiring multiple days to complete a statistically sound fault-injection campaign. To address this challenge, this paper proposes a novel fault-injection method, PCFI, that reduces the number of fault injections by exploiting the predictability of fault-injection outcomes based on the program counter of the soft-error-affected instruction. Evaluation on a variety of GPU programs covering a wide range of application domains shows that PCFI reduces the time to complete fault-injection campaigns by 22% on average without sacrificing accuracy.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-16CHARACTERIZING THE RELIABILITY AND THRESHOLD VOLTAGE SHIFTING OF 3D CHARGE TRAP NAND FLASH
Speaker:
Weihua Liu, Huazhong University of Science and Technology, CN
Authors:
Weihua Liu1, Fei Wu1, Meng Zhang1, Yifei Wang1, Zhonghai Lu2, Xiangfeng Lu3 and Changsheng Xie1
1Huazhong University of Science and Technology, CN; 2KTH Royal Institute of Technology, SE; 3Beijing Memblaze Technology Co., Ltd., CN
Abstract
3D charge trap (CT) triple-level cell (TLC) NAND flash is gradually becoming a mainstream storage component due to its high storage capacity and performance, but it introduces reliability concerns. Fault tolerance and data management schemes are capable of improving reliability. Designing a more efficient solution, however, requires understanding the reliability characteristics of 3D CT TLC NAND flash. To facilitate such understanding, we exploit a real-world testing platform to investigate reliability characteristics including the raw bit error rate (RBER) and the threshold voltage (Vth) shifting features after suffering from variable disturbances. We give an analysis of why these characteristics exist in 3D CT TLC NAND flash. We hope these observations can guide designers towards highly efficient solutions to the reliability problem.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-17HIDDEN DELAY FAULT SENSOR FOR TEST, RELIABILITY AND SECURITY
Speaker:
Giorgio Di Natale, CNRS - TIMA, FR
Authors:
Giorgio Di Natale1, Elena Ioana Vatajelu2, Kalpana Senthamarai Kannan2 and Lorena Anghel3
1LIRMM, FR; 2TIMA, FR; 3Grenoble-Alpes University, FR
Abstract
In this paper we present a novel hidden-delay-fault sensor design and a preliminary analysis of its circuit integration and applicability. In our proposed method, delay sensing is achieved by sampling data on both rising and falling clock edges and using a variable duty cycle to control the range of the sensed delay fault. The main advantages of our proposed method are that it works at nominal frequency, covers a wide range of delay faults, and is versatile in its applicability. It can be used (i) during testing to perform user-defined hidden-delay-fault tests, (ii) for estimating reliability degradation due to process and environmental variations and ageing, and (iii) in security to detect the insertion of Trojan horses that alter the path delay.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-18EFFECT OF DEVICE VARIATION ON MAPPING BINARY NEURAL NETWORK TO MEMRISTOR CROSSBAR ARRAY
Speaker:
Wooseok Yi, POSTECH, KR
Authors:
Wooseok Yi, Yulhwa Kim and Jae-Joon Kim, Pohang University of Science and Technology, KR
Abstract
In memristor crossbar array (MCA)-based neural network hardware, it is generally assumed that entire wordlines (WLs) are simultaneously enabled for parallel matrix-vector multiplication (MxV) operation. However, the error probability of MxV in an MCA increases as the resistance ratio (R-ratio) of a memristor decreases and as the resistance variation and the number of simultaneously activated WLs increase. In this paper, we analyze the effect of the R-ratio and variation of memristor devices on the read sense margin and inference accuracy of MCA-based Binary Neural Network (BNN) hardware. We first show that only a limited number of WLs should be enabled to ensure correct MxV output when the R-ratio is small. On the other hand, we also show that, if the resistance variation becomes higher than a certain level, simultaneous activation of a large number of WLs produces higher accuracy even when the R-ratio is small. Based on the analysis, we propose the Accuracy Estimation (AE) factor to find the optimal number of wordlines that are simultaneously activated.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-19ACCURATE WIRELENGTH PREDICTION FOR PLACEMENT-AWARE SYNTHESIS THROUGH MACHINE LEARNING
Speaker:
Daijoon Hyun, KAIST, KR
Authors:
Daijoon Hyun, Yuepeng Fan and Youngsoo Shin, KAIST, KR
Abstract
Placement-aware synthesis, which combines logic synthesis with virtual placement and routing (P&R) to better account for wiring, has been popular for timing closure. The wirelength after virtual placement is correlated with actual wirelength, but the correlation is not strong enough for some chosen paths. An algorithm to predict the actual wirelength from placement-aware synthesis is presented. It extracts a number of parameters from a given virtual path. A handful of synthetic parameters are compiled through linear discriminant analysis (LDA), and they are submitted to a few machine learning models. The final prediction of actual wirelength is given by the weighted sum of the predictions from these machine learning models, in which each weight is determined by the population of neighbors in the parameter space. Experiments indicate that the predicted wirelength is 93% accurate compared to actual wirelength; this can be compared to conventional virtual placement, in which wirelength is predicted with only 79% accuracy.

Download Paper (PDF; Only available from the DATE venue WiFi)
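The final blending step — weighting each model's prediction by how densely its training data populates the neighbourhood of the query point — can be sketched as follows. The model functions, training points and radius are all invented for illustration; in the paper the parameter space comes from the LDA step:

```python
import math

def neighbour_count(query, training_points, radius):
    """How many training samples lie within `radius` of the query point."""
    return sum(1 for p in training_points if math.dist(query, p) <= radius)

def blended_prediction(query, models, radius=1.0):
    """models: list of (predict_fn, training_points) pairs.
    Each model's weight is its training-data population near the query."""
    weights, preds = [], []
    for predict, points in models:
        weights.append(neighbour_count(query, points, radius) + 1e-9)
        preds.append(predict(query))
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, preds)) / total

# Two hypothetical wirelength models: the first was trained on data near
# the query point, the second far away, so the first dominates the blend.
models = [
    (lambda q: 10.0 * q[0], [(0.9, 1.0), (1.1, 1.0)]),
    (lambda q: 20.0 * q[0], [(5.0, 5.0)]),
]
estimate = blended_prediction((1.0, 1.0), models)
```

Here `estimate` lands essentially on the first model's prediction, since that model has two training neighbours near the query and the other has none.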
IP1-20A MIXED-HEIGHT STANDARD CELL PLACEMENT FLOW FOR DIGITAL CIRCUIT BLOCKS
Speaker:
Yi-Cheng Zhao, National Tsing Hua University, TW
Authors:
Yi-Cheng Zhao1, Yu-Chieh Lin1, Ting-Chi Wang1, Ting-Hsiung Wang2, Yun-Ru Wu2, Hsin-Chang Lin2 and Shu-Yi Kao2
1National Tsing Hua University, TW; 2Realtek Semiconductor Corp., TW
Abstract
In this paper, we present a mixed-height standard cell placement flow for digital circuit blocks. To the best of our knowledge, commercial tools currently do not support this type of flow in a fully automated manner. In our placement flow, we leverage a commercial placement tool and integrate it with several new point tools. Promising experimental results are reported to demonstrate the efficacy of our placement flow.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-21AGGRESSIVE MEMORY SPECULATION IN HW/SW CO-DESIGNED MACHINES
Speaker:
Simon Rokicki, INRIA, FR
Authors:
Simon Rokicki, Erven Rohou and Steven Derrien, IRISA, Rennes, FR
Abstract
Single-ISA heterogeneous systems (such as ARM big.LITTLE) are an attractive solution for embedded platforms as they expose many performance and energy consumption trade-offs directly to the operating system. Recent works have demonstrated the ability to increase their efficiency by using VLIW cores, supported through Dynamic Binary Translation (DBT). Such an approach exposes even more heterogeneity while maintaining the illusion of a single-ISA system. However, VLIW cores cannot rival Out-of-Order (OoO) cores when it comes to performance. One of the reasons is that OoO cores heavily rely on speculative execution. In this work, we study how to take advantage of memory dependency speculation during the DBT process. More specifically, our approach builds on a hardware-accelerated DBT framework, which enables fine-grained dynamic iterative optimizations. This is achieved through a combination of hardware and software, following the principles of co-designed machines. The experimental study conducted demonstrates that our approach leads to a geometric-mean speed-up of 20% while keeping the hardware overhead low.

Download Paper (PDF; Only available from the DATE venue WiFi)

3ps.8 Publisher's Session: How to Publish Your Research Work

Date: Tuesday, March 26, 2019
Time: 16:15 - 16:45
Location / Room: Exhibition Theatre

Speaker:
Charles Glaser, Springer, US, Contact Charles B. Glaser

This publisher's session invites all attendees to discuss how and why to publish their research work with Springer Nature. Charles Glaser, Editorial Director for Springer, will present his advice for collaboration in research dissemination. He will be available in this session, as well as throughout the exhibition, to discuss the publication of your next book.

TimeLabelPresentation Title
Authors
16:45End of session
18:30Exhibition Reception in Exhibition Area

The Exhibition Reception will take place on Tuesday in the exhibition area, where free drinks for all conference delegates and exhibition visitors will be offered. All exhibitors are welcome to also provide drinks and snacks for the attendees.


4.1 Executive Session 3: The Future of Test

Date: Tuesday, March 26, 2019
Time: 17:00 - 18:30
Location / Room: Room 1

Chair:
Subhasish Mitra, Stanford University, US, Contact Subhasish Mitra

This session titled "The Future of Test" explores various aspects of test: from traditional manufacturing test, to its role in addressing yield and reliability at advanced technology nodes, all the way to the design and test of quantum computers. The role of testing beyond manufacturing (e.g., in system validation and security) will also be explored.

TimeLabelPresentation Title
Authors
17:004.1.1YIELD AND RELIABILITY CHALLENGES AND SOLUTIONS AT 7NM AND BELOW
Speaker and Author:
Andrzej Strojwas, Carnegie Mellon University and PDF Solutions, US
17:304.1.2THREE POSSIBLE ALTERNATE REALITIES FOR THE FUTURE OF TEST
Speaker and Author:
Jeff Rearick, AMD, US
18:004.1.3WHAT ABOUT THE DESIGN AND TEST OF QUANTUM COMPUTERS?
Speaker and Author:
Leon Stok, IBM, US
18:30End of session
Exhibition Reception in Exhibition Area



4.2 Reconfigurable Architecture and Tools

Date: Tuesday, March 26, 2019
Time: 17:00 - 18:30
Location / Room: Room 2

Chair:
Smail Niar, Université Polytechnique Hauts-de-France, FR, Contact Smail Niar

Co-Chair:
Marvin Damschen, Karlsruhe Institute of Technology, DE, Contact Marvin Damschen

This session presents three papers on improved application mapping onto coarse-grained reconfigurable arrays, thermal-aware application mapping for FPGAs, and a hardware security coprocessor for reconfigurable CPUs, along with two interactive presentations that introduce novel hardware accelerator interfaces to multi-core CPUs and approximate arithmetic components.

TimeLabelPresentation Title
Authors
17:004.2.1CONTEXT-MEMORY AWARE MAPPING FOR ENERGY EFFICIENT ACCELERATION WITH CGRAS
Speaker:
Satyajit Das, Univ. Bretagne-Sud, CNRS UMR 6285, Lab-STICC, FR
Authors:
Satyajit Das, Kevin Martin and Philippe Coussy, Université de Bretagne-Sud, FR
Abstract
Coarse Grained Reconfigurable Arrays (CGRAs) are emerging as a low-power computing alternative providing a high degree of acceleration. However, the area and energy efficiency of these devices are bottlenecked by the configuration/context memory when they are made autonomous and loosely coupled with CPUs. The size of these instruction memories is of prime importance due to their high area and impact on power consumption. For instance, a 64-word instruction memory typically represents 40% of a processing element's area. In this context, since traditional mapping approaches do not take the size of the context memory into account, CGRAs often become oversized, which strongly degrades their performance and appeal. In this paper, we propose a context-memory-aware mapping for CGRAs to achieve better area and energy efficiency. This paper motivates the need to constrain the size of the context memory inside the processing element (PE) for ultra-low-power acceleration. It also describes the mapping approach, which tries to find at least one mapping solution for a given set of constraints defined by the context memories of the PEs. Experiments show that our proposed solution achieves an average 2.3× energy gain (with a maximum of 3.1× and a minimum of 1.4×) compared to the mapping approach without the memory constraints, while using 2× less instruction memory. When compared to the CPU, the proposed mapping achieves an average 14× (with a maximum of 23× and a minimum of 5×) energy gain.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:304.2.2THERMAL-AWARE DESIGN AND FLOW FOR FPGA PERFORMANCE IMPROVEMENT
Speaker:
Tajana Rosing, University of California, San Diego, US
Authors:
Behnam Khaleghi and Tajana Rosing, University of California, San Diego, US
Abstract
To ensure reliable operation of circuits under elevated temperatures, designers are obliged to apply a pessimistic timing margin proportional to the worst-case temperature (Tworst), which incurs significant performance overhead. The problem is exacerbated in deep-CMOS technologies with increased leakage power, particularly in Field-Programmable Gate Arrays (FPGAs), which comprise an abundance of leaky resources. We propose a two-fold approach to tackle the problem in FPGAs. To this end, we first obtain the performance and power characteristics of FPGA resources over a temperature range. Having the temperature-performance correlation of resources together with the estimated thermal distribution of applications makes it feasible to apply a minimal, yet sufficient, timing margin. Second, we show how optimizing an FPGA device for a specific thermal corner affects its performance over the operating temperature range. This emphasizes the need to optimize the device according to the target (range of) temperature. Building upon this observation, we propose thermal-aware optimization of the FPGA architecture for foreknown field conditions. We performed a comprehensive set of experiments to implement and examine the proposed techniques. The experimental results reveal that thermal-aware timing on FPGAs yields up to 36.5% performance improvement. Optimizing the architecture further boosts the performance by 6.7%.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:004.2.3FIXER: FLOW INTEGRITY EXTENSIONS FOR EMBEDDED RISC-V
Speaker:
Swaroop Ghosh, The Pennsylvania State University, US
Authors:
Asmit De, Aditya Basu, Swaroop Ghosh and Trent Jaeger, Pennsylvania State University, US
Abstract
With the recent proliferation of Internet of Things (IoT) and embedded devices, there is a growing need to develop a security framework to protect such devices. RISC-V is a promising open-source architecture that targets low-power embedded devices and SoCs. However, there is a dearth of practical and low-overhead security solutions for the RISC-V architecture. Programs compiled using RISC-V toolchains are still vulnerable to code injection and code reuse attacks such as buffer overflow and return-oriented programming (ROP). In this paper, we propose FIXER, a hardware-implemented security extension to RISC-V that provides a defense mechanism against such attacks. FIXER enforces fine-grained control-flow integrity (CFI) of running programs on backward edges (returns) and forward edges (calls) without requiring any architectural modifications to the RISC-V processor core. We implement FIXER on RocketChip, a RISC-V SoC platform, by leveraging the integrated Rocket Custom Coprocessor (RoCC) to detect and prevent attacks. Compared to existing software-based solutions, FIXER reduces energy overhead by 60% with minimal execution time (1.5%) and area (2.9%) overheads.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30IP2-1, 803TRANSREC: IMPROVING ADAPTABILITY IN SINGLE-ISA HETEROGENEOUS SYSTEMS WITH TRANSPARENT AND RECONFIGURABLE ACCELERATION
Speaker:
Marcelo Brandalero, Universidade Federal do Rio Grande do Sul (UFRGS), BR
Authors:
Marcelo Brandalero1, Muhammad Shafique2, Luigi Carro1 and Antonio Carlos Schneider Beck1
1UFRGS - Universidade Federal do Rio Grande do Sul, BR; 2Vienna University of Technology (TU Wien), AT
Abstract
Single-ISA heterogeneous systems, such as ARM's big.LITTLE, use microarchitecturally different general-purpose processor cores to efficiently match the capabilities of the processing resources with applications' performance and energy requirements, which change at run time. However, since only a fixed and non-configurable set of cores is available, reaching the best possible match between the available resources and applications' requirements remains a challenge, especially considering varying and unpredictable workloads. In this work, we propose TransRec, a hardware architecture which improves over these traditional heterogeneous designs. TransRec integrates a shared, transparent (i.e., no need to change the application binary) and adaptive accelerator in the form of a Coarse-Grained Reconfigurable Array that can be used by any of the general-purpose processor cores for on-demand acceleration. Through evaluations with cycle-accurate gem5 simulations, synthesis of real RISC-V processor designs for a 15nm technology, and considering the effects of Dynamic Voltage and Frequency Scaling, we demonstrate that TransRec provides performance-energy tradeoffs that are otherwise unachievable with traditional big.LITTLE-like designs. In particular, for less than 40% area overhead, TransRec can improve performance in the low-energy mode (LITTLE) by 2.28x, and can improve both performance and energy efficiency by 1.32x and 1.59x, respectively, in high-performance mode (big).

Download Paper (PDF; Only available from the DATE venue WiFi)
18:31IP2-2, 116CADE: CONFIGURABLE APPROXIMATE DIVIDER FOR ENERGY EFFICIENCY
Speaker:
Mohsen Imani, University of California, San Diego, US
Authors:
Mohsen Imani, Ricardo Garcia, Andrew Huang and Tajana Rosing, University of California San Diego, US
Abstract
Approximate computing is a promising solution for designing faster and more energy-efficient systems that provide adequate quality for a variety of functions. Division, in particular floating-point division, is one of the most important operations in multimedia applications, yet it is rarely implemented in hardware due to its significant cost and complexity. In this paper, we propose CADE, a Configurable Approximate Divider which performs the floating-point division operation with runtime-controllable accuracy. The approximation in CADE is accomplished by removing the costly division operation and replacing it with a subtraction of the input operands' mantissas. To increase the level of accuracy, CADE analyzes the first N bits (called tuning bits) of both input operands' mantissas to estimate the division error. If CADE determines that the first approximation is unacceptable, a pre-computed value is retrieved from memory and subtracted from the first approximation's mantissa. At runtime, CADE can provide higher accuracy by increasing the number of tuning bits. The proposed CADE was integrated on the AMD GPU architecture. Our evaluation shows that CADE is at least 4.1× more energy efficient, 1.5× faster, and 1.7× more area efficient compared to state-of-the-art approximate dividers while providing a 25% lower error rate. In addition, CADE gives the GPU a new knob to configure the level of approximation at runtime depending on the application/user accuracy requirement.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30End of session
Exhibition Reception in Exhibition Area

The Exhibition Reception will take place on Tuesday in the exhibition area, where free drinks for all conference delegates and exhibition visitors will be offered. All exhibitors are welcome to also provide drinks and snacks for the attendees.


4.3 Improving test generation and coverage

Date: Tuesday, March 26, 2019
Time: 17:00 - 18:30
Location / Room: Room 3

Chair:
Jaan Raik, Tallinn University of Technology, EE, Contact Jaan Raik

Co-Chair:
Sara Vinco, Polytechnic of Turin, IT, Contact Sara Vinco

This session targets coverage improvement from different perspectives: activating multiple targets with concolic testing, improving functional coverage metrics for instruction set simulators, and achieving path coverage in SystemC-AMS. Three IPs complete the session, covering optimizations for system verification and design.

TimeLabelPresentation Title
Authors
17:004.3.1AUTOMATED ACTIVATION OF MULTIPLE TARGETS IN RTL MODELS USING CONCOLIC TESTING
Speaker:
Prabhat Mishra, University of Florida, US
Authors:
Yangdi Lyu, Alif Ahmed and Prabhat Mishra, University of Florida, US
Abstract
Simulation is widely used for validation of Register-Transfer-Level (RTL) models. While simulating with millions of random (or constrained-random) tests can cover the majority of the targets (functional scenarios), the number of remaining targets can still be huge (hundreds or thousands) in the case of today's industrial designs. Prior work on directed test generation using concolic testing can cover only one target at a time. A naive extension of that work to activate the remaining targets would be infeasible due to the effort wasted in multiple overlapping searches. In this paper, we propose an automated test generation technique for activating multiple targets in RTL models using concolic testing. This paper makes three important contributions. First, it efficiently prunes targets that can be covered by tests generated for activating other targets. Next, it minimizes overlapping searches while generating tests for activating multiple targets. Finally, our approach effectively utilizes clustering of related targets as well as common path sharing between targets in the same cluster to drastically reduce the test generation time. Experimental results demonstrate that our approach significantly outperforms existing methods in terms of overall coverage (up to 5X, 1.2X on average) as well as test generation time (up to 146X, 80X on average).

Download Paper (PDF; Only available from the DATE venue WiFi)
17:304.3.2VERIFYING INSTRUCTION SET SIMULATORS USING COVERAGE-GUIDED FUZZING
Speaker:
Vladimir Herdt, University of Bremen, DE
Authors:
Vladimir Herdt, Daniel Grosse, Hoang M. Le and Rolf Drechsler, University of Bremen, DE
Abstract
Verification of Instruction Set Simulators (ISSs) is crucial. Predominantly, simulation-based approaches are used; they require a comprehensive test set to ensure thorough verification. We propose a novel coverage-guided fuzzing (CGF) approach to improve the test-case generation process. In addition to code coverage, we integrate functional coverage and a custom mutation procedure tailored to ISS verification. As a case study, we apply our approach to a set of three publicly available RISC-V ISSs. We found several new errors, including one in the official RISC-V reference simulator Spike.
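The CGF loop itself follows a simple pattern, sketched below with a toy stand-in for an instrumented ISS run. The real work integrates functional coverage and ISS-specific mutations; the target, the coverage points and the mutation operators here are purely illustrative:

```python
import random

def toy_target(data: bytes) -> set:
    """Stand-in for an instrumented ISS run: returns the set of coverage
    points hit while processing the input (illustrative only)."""
    cov = set()
    if len(data) > 0:
        cov.add("nonempty")
        if data[0] == 0x7F:      # e.g. an ELF-style magic byte check
            cov.add("magic")
    return cov

def mutate(data: bytes) -> bytes:
    """Minimal mutation operators: overwrite a byte or grow the input."""
    data = bytearray(data or b"\x00")
    i = random.randrange(len(data))
    if random.random() < 0.5:
        data[i] = random.randrange(256)
    else:
        data.append(random.randrange(256))
    return bytes(data)

def fuzz(seed: bytes, iterations: int = 20000):
    """Core CGF loop: mutate corpus members and keep any input that
    reaches coverage not seen before."""
    corpus, global_cov = [seed], set(toy_target(seed))
    for _ in range(iterations):
        candidate = mutate(random.choice(corpus))
        cov = toy_target(candidate)
        if not cov <= global_cov:      # new coverage -> keep the input
            corpus.append(candidate)
            global_cov |= cov
    return corpus, global_cov
```

Coverage feedback is what distinguishes this from blind random testing: inputs that unlock new behavior become seeds for further mutation.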

Download Paper (PDF; Only available from the DATE venue WiFi)
18:004.3.3DATA FLOW TESTING FOR SYSTEMC-AMS TIMED DATA FLOW MODELS
Speaker:
Muhammad Hassan, DFKI GmbH, DE
Authors:
Muhammad Hassan, Daniel Grosse, Hoang M. Le and Rolf Drechsler, University of Bremen, DE
Abstract
Internet-of-Things (IoT) devices have significantly increased the need for high-quality Analog Mixed Signal (AMS) Systems-on-Chip (SoCs). Virtual Prototyping (VP) can be utilized for early design verification. The Timed Data Flow (TDF) model of computation available in SystemC-AMS offers a good trade-off between accuracy and simulation speed at the system level. One of the main challenges in system-level verification of AMS designs is to achieve full path coverage. In the software domain, Data Flow Testing (DFT) has proven to be a powerful testing strategy in this regard. In this paper, we introduce a DFT approach for SystemC-AMS TDF models based on two major contributions: First, we develop a set of coverage criteria for DFT specific to SystemC-AMS TDF models. This requires considering the SystemC-AMS semantics of signal flow. Second, we explain how to automatically compute the data-flow coverage result for given TDF models using a combination of static and dynamic analysis techniques. Our experimental results on real-world AMS VPs demonstrate the applicability and efficacy of our approach.
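The def-use coverage notion underlying software DFT can be illustrated in a few lines. This is the generic software-domain idea, not the paper's TDF-specific criteria; the trace-event encoding is an assumption for illustration:

```python
def def_use_pairs(trace):
    """Compute the def-use pairs exercised by one execution trace, the core
    coverage notion of data-flow testing. A trace event is a tuple
    (location, kind, variable) with kind 'def' or 'use'; a pair is covered
    when a definition reaches a use with no redefinition in between."""
    last_def = {}     # variable -> location of its most recent definition
    covered = set()
    for loc, kind, var in trace:
        if kind == "def":
            last_def[var] = loc          # kills any earlier definition
        elif kind == "use" and var in last_def:
            covered.add((last_def[var], loc, var))
    return covered
```

A DFT coverage goal then asks the test set to exercise every statically feasible def-use pair, which for TDF models must additionally respect the signal-flow semantics mentioned above.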

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30IP2-3, 653HCFTL: A LOCALITY-AWARE PAGE-LEVEL FLASH TRANSLATION LAYER
Speaker:
Hao Chen, University of Science and Technology of China, CN
Authors:
Hao Chen1, Cheng Li1, Yubiao Pan2, Min Lyu1, Yongkun Li1 and Yinlong Xu1
1University of Science and Technology of China, CN; 2Huaqiao University, CN
Abstract
The increasing capacity of SSDs requires a large amount of built-in DRAM to hold the mapping information for logical-to-physical address translation. Due to the limited size of DRAM, existing FTL schemes selectively keep some active mapping entries in a Cached Mapping Table (CMT) in DRAM, while storing the entire mapping table on flash. To improve the CMT hit ratio with the limited cache space on SSDs, in this paper we propose a novel hot-clustering FTL (HCFTL) that clusters mapping entries recently evicted from the cache into dynamic translation pages (DTPs). Given the temporal locality whereby those hot entries are likely to be visited in the near future, loading DTPs will increase the CMT hit ratio and thus improve FTL performance. Furthermore, we introduce an index structure to speed up the lookup of mapping entries in DTPs. Our experiments show that HCFTL can improve the CMT hit ratio by up to 41.1% and decrease the system response time by up to 33.3%, compared to state-of-the-art FTL schemes.
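A minimal sketch of the eviction-clustering idea, assuming an LRU-managed CMT and treating DTPs as plain dictionaries (real FTLs also track dirty entries, flash geometry and the DTP index structure):

```python
from collections import OrderedDict

DTP_SIZE = 4  # mapping entries per dynamic translation page (illustrative)

class CachedMappingTable:
    """LRU cache of logical-to-physical entries; evictions are grouped into
    dynamic translation pages (DTPs) so that temporally-close hot entries
    can later be reloaded together (simplified HCFTL-style sketch)."""
    def __init__(self, capacity: int, full_table: dict):
        self.capacity = capacity
        self.cmt = OrderedDict()
        self.full_table = full_table   # stands in for the on-flash table
        self.dtps = []                 # completed DTPs (dicts of evictions)
        self.pending = {}              # evictions waiting to fill a DTP
        self.hits = self.misses = 0

    def lookup(self, lpn: int) -> int:
        if lpn in self.cmt:
            self.hits += 1
            self.cmt.move_to_end(lpn)
            return self.cmt[lpn]
        self.misses += 1
        # On a miss, reload the whole DTP holding this entry (if any):
        # its co-evicted, temporally-local neighbours come along for free.
        for dtp in self.dtps:
            if lpn in dtp:
                for k, v in dtp.items():
                    self._insert(k, v)
                self.dtps.remove(dtp)
                return self.cmt[lpn]
        self._insert(lpn, self.full_table[lpn])
        return self.cmt[lpn]

    def _insert(self, lpn: int, ppn: int):
        self.cmt[lpn] = ppn
        self.cmt.move_to_end(lpn)
        while len(self.cmt) > self.capacity:
            old_lpn, old_ppn = self.cmt.popitem(last=False)  # evict LRU
            self.pending[old_lpn] = old_ppn
            if len(self.pending) >= DTP_SIZE:
                self.dtps.append(self.pending)
                self.pending = {}
    
```

The payoff is that one miss on a clustered entry turns the next accesses to its co-evicted neighbours into hits, which is where the reported hit-ratio gain would come from.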

Download Paper (PDF; Only available from the DATE venue WiFi)
18:31IP2-4, 402MODEL CHECKING IS POSSIBLE TO VERIFY LARGE-SCALE VEHICLE DISTRIBUTED APPLICATION SYSTEMS
Speaker:
Haitao Zhang, School of Information Science and Engineering, Lanzhou University, CN
Authors:
Haitao Zhang1, Ayang Tuo1 and Guoqiang Li2
1Lanzhou University, CN; 2Shanghai Jiao Tong University, CN
Abstract
OSEK/VDX is a specification for vehicle-mounted systems. The specification has been widely adopted by automotive companies to develop distributed vehicle application systems. However, the ever-increasing complexity of these distributed application systems has created a challenge for exhaustively ensuring their reliability. Model checking, as an exhaustive technique, has been applied to verify OSEK/VDX distributed application systems and discover subtle errors. Unfortunately, it scales poorly to practical systems because the verification models derived from such systems are highly complex. This paper presents an efficient approach that addresses this problem by reducing the complexity of the verification model so that model checking can easily complete the verification.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:32IP2-5, 155AUTOMATIC ASSERTION GENERATION FROM NATURAL LANGUAGE SPECIFICATIONS USING SUBTREE ANALYSIS
Speaker:
Ian Harris, University of California, Irvine, US
Authors:
Junchen Zhao and Ian Harris, University of California Irvine, US
Abstract
We present an approach to generate assertions from natural language specifications by performing semantic analysis of sentences in the specification document. Other techniques for automatic assertion generation use information found in the design implementation, obtained by either static or dynamic analysis. Our approach generates assertions directly from the specification document, so bugs in the implementation will not be reflected in the assertions. Our approach parses each sentence and examines the resulting syntactic parse trees to locate subtrees associated with important phrases, such as the antecedent and consequent of an implication. Formal assertions are generated using the information inside these subtrees to fill a set of assertion templates which we present. We evaluate the effectiveness of our approach using a set of statements taken from a real specification document.
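The final template-filling step can be pictured as follows. The template set and the SVA-style syntax are hypothetical stand-ins, not the templates the paper presents:

```python
# Hypothetical assertion templates keyed by the sentence pattern detected
# in the parse tree; the SVA-flavoured strings are illustrative only.
TEMPLATES = {
    "implication": "assert property (@(posedge clk) ({ant}) |-> ({con}));",
    "invariant":   "assert property (@(posedge clk) ({con}));",
}

def fill_template(kind: str, ant: str = "", con: str = "") -> str:
    """Fill an assertion template with the phrases extracted from the
    antecedent/consequent subtrees of the parsed sentence."""
    return TEMPLATES[kind].format(ant=ant, con=con)
```

So a sentence like "if req is asserted, ack must follow" would, after subtree extraction, feed `fill_template("implication", "req", "ack")`.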

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30End of session
Exhibition Reception in Exhibition Area



4.4 Digital processing with emerging memory technologies

Date: Tuesday, March 26, 2019
Time: 17:00 - 18:30
Location / Room: Room 4

Chair:
Shahar Kvatinsky, Technion, IL, Contact Shahar Kvatinsky

Co-Chair:
Elena-Ioana Vatajelu, TIMA, FR, Contact Elena-Ioana Vatajelu

This session looks at how emerging memory technologies improve processing in digital systems, covering processing-in-memory, graph processing, Binary Neural Networks and non-volatile processors.

TimeLabelPresentation Title
Authors
17:004.4.1SAID: A SUPERGATE-AIDED LOGIC SYNTHESIS FLOW FOR MEMRISTIVE CROSSBARS
Speaker:
Roberto Giorgio Rizzo, Politecnico di Torino, IT
Authors:
Valerio Tenace1, Roberto Giorgio Rizzo1, Debjyoti Bhattacharjee2, Anupam Chattopadhyay2 and Andrea Calimera1
1Politecnico di Torino, IT; 2Nanyang Technological University, SG
Abstract
A memristor is a two-terminal device that can serve as a non-volatile memory element with built-in logic capabilities. Arranged in a crossbar structure, memristive arrays can represent complex Boolean logic functions that adhere to the logic-in-memory paradigm, where data and logic gates are glued together on the same piece of hardware. Needless to say, novel, ad-hoc CAD solutions are required to achieve practical and feasible hardware implementations. Existing techniques aim at optimal mapping strategies for Boolean logic functions described by means of 2-input NOR and NOT gates, thus overlooking the optimization capabilities that a smart, dedicated technology-aware logic synthesis can provide. In this paper, we introduce a novel library-free supergate-aided (SAID) logic synthesis approach with a dedicated mapping strategy tailored to MAGIC crossbars. Supergates are obtained with a Look-Up Table (LUT)-based synthesis that splits a complex logic network into smaller Boolean functions. Those functions are then mapped onto the crossbar array so as to minimize latency. The proposed SAID flow (i) maximizes supergate-level parallelism, thus reducing the total number of computing cycles, and (ii) relaxes mapping constraints, allowing easy and fast mapping of Boolean functions onto memristive crossbars. Experimental results obtained on several benchmarks from the ISCAS'85 and IWLS'93 suites demonstrate that our solution outperforms other state-of-the-art techniques in terms of speedup (3.89X in the best case), at the expense of a very low area overhead.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:304.4.2GRAPHS: A GRAPH PROCESSING ACCELERATOR LEVERAGING SOT-MRAM
Speaker:
Deliang Fan, University of Central Florida, US
Authors:
Shaahin Angizi, Jiao Sun, Wei Zhang and Deliang Fan, University of Central Florida, US
Abstract
In this work, we present the GraphS architecture, which transforms current Spin Orbit Torque Magnetic Random Access Memory (SOT-MRAM) into massively parallel computational units capable of accelerating graph processing applications. GraphS can be leveraged to greatly reduce the energy consumption of the underlying adjacency-matrix computations, eliminating unnecessary off-chip accesses and providing ultra-high internal bandwidth. Device-to-architecture co-simulation for three social-network data sets indicates roughly 3.6× higher energy efficiency and a 5.3× speed-up over a recent ReRAM crossbar, and ~4× higher energy efficiency and a 5.1× speed-up over recent processing-in-DRAM acceleration methods.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:004.4.3CORN: IN-BUFFER COMPUTING FOR BINARY NEURAL NETWORK
Speaker:
Liang Chang, Beihang University, CN
Authors:
Liang Chang1, Xin Ma2, Zhaohao Wang1, Youguang Zhang1, Weisheng Zhao1 and Yuan Xie2
1Beihang University, CN; 2University of California, Santa Barbara, US
Abstract
Binary Neural Networks (BNNs) have attracted great attention since they reduce memory usage and power consumption while achieving satisfying recognition accuracy on image classification. Regarding computation in particular, the multiply-accumulate operations of Convolutional Neural Networks (CNNs) are replaced with bit-wise operations (XNOR and pop-count). Such bit-wise operations are well suited for hardware accelerators such as in-memory computing (IMC). However, an additional digital processing unit (DPU) is required for the pop-count operation, which induces considerable data movement between the Process Engines (PEs) and the data buffers, reducing the efficiency of the IMC. In this paper, we present a BNN computing accelerator, named CORN, which consists of a Non-Volatile Memory (NVM) based data buffer that performs the majority operation (replacing the pop-count process) together with NVM-based IMC to accelerate BNN computation. CORN can naturally implement the XNOR operation in the NVM memory array and feed the results to the computing data buffer for the majority write operation. Such a design removes the pop-counter implemented by the DPU and reduces data movement between the data buffer and the memory array. Based on the evaluation results, CORN achieves 61% and 14% power savings with 1.74x and 2.12x speedups, compared to FPGA-based and DPU-based IMC architectures, respectively.
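The bit-wise arithmetic being accelerated reduces to XNOR plus pop-count, and the binarized neuron's sign equals a majority vote over the XNOR bits, which is the equivalence that lets a majority operation replace the pop-counter. A small software illustration (not the CORN hardware):

```python
def binary_dot(x: int, w: int, n: int) -> int:
    """Binarized dot product of two n-bit vectors whose bits encode +1/-1:
    XNOR marks matching signs, pop-count totals the matches, and the
    signed sum is 2 * popcount - n."""
    xnor = ~(x ^ w) & ((1 << n) - 1)
    return 2 * bin(xnor).count("1") - n

def binary_neuron(x: int, w: int, n: int) -> int:
    # The sign of the dot product is positive exactly when a majority of
    # the XNOR bits are 1, so a majority operation can stand in for the
    # full pop-count when only the binarized activation is needed.
    return 1 if binary_dot(x, w, n) >= 0 else -1
```

This is why the accelerator only needs a majority write in the buffer: the downstream binarization discards everything about the pop-count except which side of n/2 it falls on.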

Download Paper (PDF; Only available from the DATE venue WiFi)
18:154.4.5AN ENERGY EFFICIENT NON-VOLATILE FLIP-FLOP BASED ON COMET TECHNOLOGY
Speaker:
Robert Perricone, University of Notre Dame, US
Authors:
Robert Perricone1, Zhaoxin Liang2, Meghna Mankalale2, Michael Niemier1, Sachin S. Sapatnekar2, Jian-Ping Wang2 and X. Sharon Hu1
1University of Notre Dame, US; 2University of Minnesota, US
Abstract
As we approach the limits of CMOS scaling, researchers are developing "beyond-CMOS" technologies to sustain the technological benefits associated with device scaling. Spintronic technologies have emerged as a promising beyond-CMOS option due to their inherent benefits over CMOS such as high integration density, low leakage power, radiation hardness, and non-volatility. These benefits make spintronic devices an attractive successor to CMOS, especially for memory circuits. However, spintronic devices generally suffer from slower switching speeds and higher write energy, which limits their usability. In an effort to close the energy-delay gap between CMOS and spintronics, device concepts such as CoMET (Composite-Input Magnetoelectric-based Logic Technology) have been introduced, which collectively leverage material phenomena such as the spin-Hall effect and the magnetoelectric effect to enable fast, energy-efficient device operation. In this work, we propose a non-volatile flip-flop (NVFF) based on CoMET technology that achieves up to two orders of magnitude less write energy than CMOS. This low write energy (~2 aJ) makes our CoMET NVFF especially attractive for architectures that require frequent backup operations, e.g., energy-harvesting non-volatile processors.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30End of session
Exhibition Reception in Exhibition Area



4.5 Hardware Trojans and Split Manufacturing

Date: Tuesday, March 26, 2019
Time: 17:00 - 18:30
Location / Room: Room 5

Chair:
Nele Mentens, KU Leuven, BE, Contact Nele Mentens

Co-Chair:
Giorgio Di Natale, TIMA, FR, Contact Giorgio Di Natale

This session elaborates on Hardware Trojans, which are an emerging threat to the security of hardware-software systems. Furthermore, it discusses split manufacturing as a technique to strengthen the security of semiconductor supply chains.

TimeLabelPresentation Title
Authors
17:004.5.1HARDWARE TROJAN IN EMERGING NON-VOLATILE MEMORIES
Speaker:
Swaroop Ghosh, The Pennsylvania State University, US
Authors:
Mohammad Nasim Imtiaz Khan, Karthikeyan Nagarajan and Swaroop Ghosh, Pennsylvania State University, US
Abstract
Emerging Non-Volatile Memories (NVMs) possess unique characteristics that make them a top target for deploying Hardware Trojans. In this paper, we investigate the knobs that can be targeted by such Trojans to cause read/write failures. For example, the NVM read operation depends on a clamp voltage which the adversary can manipulate. The adversary can also use the ground bounce generated by an NVM write operation to hamper another, parallel read/write operation. We have designed a Trojan that can be activated and deactivated by writing a specific data pattern to a particular address. Once activated, the Trojan can couple two predetermined addresses so that data written to one address (the victim's address space) gets copied to another address (the adversary's address space). This can leak sensitive information, e.g., encryption keys. The adversary can also create read/write failures at predetermined locations (fault injection). Simulation results indicate that the Trojan can be activated by writing a specific data pattern to a specific address 1,956 times. Once activated, the attack duration can be as low as 52.4s and as high as 1.1ms (with a reset-enable trigger). We also show that the proposed Trojan can scale down the clamp voltage by 400mV from its optimum value, which is sufficient to inject a specific data-polarity read error. We also propose techniques to inject noise into the ground/power rail to cause read/write failures.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:304.5.2EVALUATING ASSERTION SET COMPLETENESS TO EXPOSE HARDWARE TROJANS AND VERIFICATION BLINDSPOTS
Speaker:
Nicole Fern, University of California Santa Barbara, US
Authors:
Nicole Fern1 and Tim Cheng2
1University of California Santa Barbara, US; 2HKUST, HK
Abstract
Assertion-based verification has been adopted by industry as an efficient specification mechanism. Handwritten assertions encode design intent in a parsable format and have traditionally been used to verify that an implementation conforms to the properties outlined by the assertions. Our work makes the observation that design behavior not covered by the assertion set is equally revealing and can be leveraged to identify malicious behavior (hardware Trojans) as well as verification blindspots. The difficulty in examining this unspecified and unverified behavior is differentiating between benign functionality that is truly don't-care and functionality that leaks information or violates design intent. Prior work exploring assertion set completeness suffers from this inability to distinguish benign unspecified functionality from actual verification holes, while existing Trojan detection techniques can differentiate these categories but require that the unspecified functionality already be characterized. Our technique uses the assertion set and simulation trace data available in most industry design flows to characterize unspecified functionality, then separates Trojans and verification blindspots from benign behavior using existing Trojan detection methods. Using our technique, we uncover missing functionality in a first-in first-out (FIFO) queue implementation and demonstrate detection of information leakage Trojans. We also illustrate Trojan detection for a system containing several components connected by an AXI4-Lite bus by analyzing the completeness of the AXI4-Lite assertion set provided by ARM.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:004.5.3EFFICIENT TEST GENERATION FOR TROJAN DETECTION USING SIDE CHANNEL ANALYSIS
Speaker:
Prabhat Mishra, University of Florida, US
Authors:
Yangdi Lyu and Prabhat Mishra, University of Florida, US
Abstract
Detection of hardware Trojans is vital to ensure the security and trustworthiness of System-on-Chip (SoC) designs. Side-channel analysis is effective for Trojan detection by analyzing various side-channel signatures such as power, current and delay. In this paper, we propose an efficient test generation technique to facilitate side-channel analysis utilizing dynamic current. While early work on current-aware test generation has proposed several promising ideas, there are two major challenges in applying it to large designs: (i) the test generation time grows exponentially with the design complexity, and (ii) it is infeasible to detect Trojans since the side-channel sensitivity is marginal compared to the noise and process variations. Our proposed work addresses both challenges by effectively exploiting the affinity between the inputs and rare (suspicious) nodes. We formalize test generation as a search problem and solve the optimization using a genetic algorithm. The basic idea is to quickly find profitable test patterns that maximize switching in the suspicious regions while minimizing switching in the rest of the circuit. Our experimental results demonstrate that we drastically improve both the side-channel sensitivity (30x on average) and the time complexity (4.6x on average) compared to state-of-the-art test generation techniques.
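The search formulation can be sketched generically: a genetic algorithm evolving test sequences whose fitness rewards toggling in the rare nodes and penalizes toggling elsewhere. Everything below (the node set, the toy switching model, the GA parameters) is illustrative, not the paper's tool:

```python
import random

RARE_NODES = {3, 11, 19}   # hypothetical indices of suspicious (rare) nodes

def switching(test):
    """Toy stand-in for a simulation run: count bit toggles between
    consecutive 32-bit input vectors, inside vs. outside the rare set."""
    rare = other = 0
    for prev, cur in zip(test, test[1:]):
        diff = prev ^ cur
        for bit in range(32):
            if diff >> bit & 1:
                if bit in RARE_NODES:
                    rare += 1
                else:
                    other += 1
    return rare, other

def fitness(test):
    rare, other = switching(test)
    return rare - 0.1 * other   # reward rare-region switching, penalize the rest

def genetic_search(pop_size=40, length=8, generations=50, seed=1):
    rng = random.Random(seed)
    pop = [[rng.getrandbits(32) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]          # elitist truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)     # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(length)] ^= 1 << rng.randrange(32)  # mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)
```

In the real setting the fitness evaluation is a simulation of the design with the candidate pattern sequence, which is exactly why a cheap, population-based search is attractive over exhaustive exploration.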

Download Paper (PDF; Only available from the DATE venue WiFi)
18:154.5.4A NEW PARADIGM IN SPLIT MANUFACTURING: LOCK THE FEOL, UNLOCK AT THE BEOL
Speaker:
Abhrajit Sengupta, New York University, US
Authors:
Abhrajit Sengupta1, Mohammed Nabeel2, Johann Knechtel2 and Ozgur Sinanoglu2
1New York University, US; 2New York University Abu Dhabi, AE
Abstract
Split manufacturing was introduced as an effective countermeasure against hardware-level threats such as IP piracy, overbuilding, and insertion of hardware Trojans. Nevertheless, the security promise of split manufacturing has been challenged by various attacks which exploit the well-known working principles of physical design tools to infer the missing BEOL interconnects. In this work, we advocate a new paradigm to enhance the security of split manufacturing. Based on Kerckhoffs' principle, we protect the FEOL layout in a formal and secure manner by embedding keys. These keys are purposefully implemented and routed through the BEOL in such a way that they become indecipherable to state-of-the-art FEOL-centric attacks. We provide our secure physical design flow to the community. We also define the security of split manufacturing formally and provide the associated proofs. At the same time, our technique is competitive with current schemes in terms of layout overhead, especially for practical, large-scale designs (ITC'99 benchmarks).

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30IP2-6, 191DETECTION OF HARDWARE TROJANS IN SYSTEMC HLS DESIGNS VIA COVERAGE-GUIDED FUZZING
Speaker:
Niklas Bruns, Cyber-Physical Systems, DFKI GmbH, DE
Authors:
Hoang M. Le, Daniel Grosse, Niklas Bruns and Rolf Drechsler, University of Bremen, DE
Abstract
High-level Synthesis (HLS) is being increasingly adopted as a means to raise design productivity. HLS designs, which can be automatically translated into RTL, are typically written in SystemC at a more abstract level. Hardware Trojan attacks and countermeasures, while well known and well researched at RTL and below, have only recently been considered for HLS. This paper contributes to this emerging research area by proposing a novel detection approach for Hardware Trojans in SystemC HLS designs. The proposed approach is based on coverage-guided fuzzing, a promising new idea from software (security) testing research. The efficiency of the approach in identifying stealthy behavior is demonstrated on a set of open-source benchmarks.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30End of session
Exhibition Reception in Exhibition Area



4.6 Smart Communication Solutions for Automotive Systems

Date: Tuesday, March 26, 2019
Time: 17:00 - 18:30
Location / Room: Room 6

Chair:
Dirk Ziegenbein, Robert Bosch GmbH, DE, Contact Dirk Ziegenbein

Co-Chair:
Selma Saidi, Hamburg University of Technology, DE, Contact Selma Saidi

In this session, three approaches to smart communication in automotive systems design are presented. The first paper optimizes end-to-end latencies for time-sensitive networks with frame preemption. The second paper introduces a consensus scheme for vehicle platoon maneuvers. The third paper presents a decentralized approach to non-neighbor charge balancing in battery packs.

TimeLabelPresentation Title
Authors
17:004.6.1DESIGN OPTIMIZATION OF FRAME PREEMPTION IN REAL-TIME SWITCHED ETHERNET
Speaker:
Taeju Park, University of Michigan, US
Authors:
Taeju Park1, Soheil Samii2 and Kang Shin1
1University of Michigan, US; 2General Motors Research & Development, US
Abstract
Switched Ethernet technology is increasingly common in current and future real-time and embedded systems. The IEEE 802.1 working group has recently developed standards and technologies, commonly referred to as Time-Sensitive Networking (TSN), to enhance switched Ethernet with real-time and dependability properties. We address, for the first time, the synthesis problem for the TSN frame preemption standards IEEE 802.3br-2016 and 802.1Qbu-2016, which introduce two new configuration parameters: the flow-to-queue and the queue-to-Express/Preemptable-MAC-interface assignment. We present an optimization framework to determine these configuration parameters, considering reliability as an optimization goal. Our experiments show that our proposed framework outperforms commonly used priority-assignment algorithms and an intuitive approach.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:304.6.2CUBA: CHAINED UNANIMOUS BYZANTINE AGREEMENT FOR DECENTRALIZED PLATOON MANAGEMENT
Speaker:
Emanuel Regnath, TUM, DE
Authors:
Emanuel Regnath and Sebastian Steinhorst, TUM, DE
Abstract
Autonomous driving, vehicle platoons and smart traffic management will dramatically improve our transportation systems. In contrast to centralized approaches, which do not scale efficiently with the actual traffic load, decentralized traffic management based on distributed consensus could provide a robust, fair and well-scaling solution for infrastructures of variable density. In this paper, we propose a distributed platoon management scheme where platoon operations such as join or merge are decided by consensus over a Vehicular Ad hoc Network (VANET). Since conventional consensus protocols are not suitable for Cyber-Physical Systems (CPS) such as platoons, we introduce CUBA, a new validated and verifiable consensus protocol tailored specifically to platoons, which considers their special communication topology. We demonstrate that CUBA introduces only a small communication overhead compared to the centralized, leader-based approach and significantly outperforms related distributed approaches.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:004.6.3DECENTRALIZED NON-NEIGHBOR ACTIVE CHARGE BALANCING IN LARGE BATTERY PACKS
Speaker:
Alexander Lamprecht, TUM CREATE, SG
Authors:
Alexander Lamprecht1, Martin Baumann2, Tobias Massier1 and Sebastian Steinhorst2
1TUM CREATE, SG; 2TUM, DE
Abstract
Recently, active charge balancing of the cells in battery packs has been gaining importance over state-of-the-art passive balancing solutions. The main advantage of active balancing lies in the ability to transfer charge between cells rather than dissipating it thermally. This enhances the overall efficiency and energy output of battery packs. In this paper, we develop a new class of strategies for the decentralized operation of charge transfers between non-neighboring cells using appropriate balancing hardware architectures. While the benefits of the active balancing approach with a centralized controller have been discussed extensively in the literature, adequate strategies for scheduling charge transfers in decentralized battery management systems, which promise to be more robust and modular, have not been studied sufficiently so far. Furthermore, existing decentralized strategies only deal with charge transfers between neighboring cells. In order to compare our novel distributed non-neighbor balancing strategies to existing neighbor-only strategies, we implement both in an open-source simulation framework for decentralized battery management systems. Our results show that we improve the two most important metrics, balancing time and losses, by up to 63% and 51%, respectively.
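Why non-neighbor transfers help can be seen with a deliberately simplified model: if a charge quantum may only move between adjacent cells, it must hop across the pack, multiplying the number of transfer steps (and, in reality, the losses). The greedy policy, the pack values and the lossless-transfer assumption below are all illustrative, not the paper's strategies:

```python
def balance(charges, neighbor_only, quantum=1, limit=100_000):
    """Greedy balancing loop: each step moves one charge quantum from the
    fullest cell toward the emptiest one. With neighbor_only=True the
    quantum may only move to an adjacent cell, so charge hops along the
    pack. Purely illustrative: real balancing has transfer losses, and a
    greedy neighbor-only policy can even oscillate, hence the step limit."""
    charges = list(charges)
    steps = 0
    while max(charges) - min(charges) > quantum and steps < limit:
        src = charges.index(max(charges))
        dst = charges.index(min(charges))
        if neighbor_only and abs(src - dst) > 1:
            dst = src + 1 if dst > src else src - 1  # hop one cell toward target
        charges[src] -= quantum
        charges[dst] += quantum
        steps += 1
    return steps
```

For a pack skewed at both ends, e.g. `[10, 4, 4, 4, 4, 4, 4, 10]`, the direct (non-neighbor) policy finishes in far fewer steps than the hop-by-hop one, which is the effect behind the reported balancing-time and loss reductions.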

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30IP2-7, 257DESIGN OPTIMIZATION FOR HARDWARE-BASED MESSAGE FILTERS IN BROADCAST BUSES
Speaker:
Lea Schönberger, TU Dortmund University, DE
Authors:
Lea Schönberger, Georg von der Brüggen, Horst Schirmeier and Jian-Jia Chen, Technical University of Dortmund, DE
Abstract
In the field of automotive engineering, broadcast buses, e.g., the Controller Area Network (CAN), are frequently used to connect multiple electronic control units (ECUs). Every message transmitted on such buses can be received by every participant, but not all messages are relevant for every ECU. Therefore, all incoming messages must be filtered for relevance by either hardware or software techniques. We address the problem of designing hardware filter configurations for clients connected to a broadcast bus in order to reduce the cost, i.e., the computation overhead, caused by undesired but accepted messages. More precisely, we propose an SMT formulation that can be applied to i) retrieve a (minimal) perfect filter configuration, i.e., one where no undesired messages are received, ii) optimize the filter quality under given hardware restrictions, or iii) minimize the hardware cost for a given type of filter component and a maximum cost threshold.
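Hardware acceptance filters of this kind typically follow mask/ID-match semantics: an incoming identifier passes when it agrees with a reference ID on every bit selected by a mask. The sketch below shows that acceptance test and the undesired-message cost such an SMT formulation would minimize; the filter values and the cost metric's exact form are illustrative:

```python
def make_filter(filter_id: int, mask: int):
    """Typical mask-based acceptance filter: a message ID is accepted when
    it matches filter_id on every bit where the mask is 1."""
    def accepts(msg_id: int) -> bool:
        return (msg_id & mask) == (filter_id & mask)
    return accepts

def undesired_cost(filters, wanted, all_ids):
    """Count messages that pass some filter but are not wanted by the ECU:
    the computation overhead a good configuration should minimize."""
    accepted = {m for m in all_ids if any(f(m) for f in filters)}
    assert wanted <= accepted, "filters must not drop wanted messages"
    return len(accepted - wanted)
```

A single filter can rarely isolate an arbitrary wanted set, so the optimization trades the number of filter components (hardware cost) against the count of undesired-but-accepted IDs (filter quality).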

Download Paper (PDF; Only available from the DATE venue WiFi)
18:31IP2-8, 593VEHICLE SEQUENCE REORDERING WITH COOPERATIVE ADAPTIVE CRUISE CONTROL
Speaker:
Yun-Yun Tsai, National Tsing Hua University, TW
Authors:
Ta-Wei Huang1, Yun-Yun Tsai1, Chung-Wei Lin2 and Tsung-Yi Ho1
1National Tsing Hua University, TW; 2National Taiwan University, TW
Abstract
With Cooperative Adaptive Cruise Control (CACC) systems, vehicles are allowed to communicate and cooperate with each other to form platoons and improve the traffic throughput, traffic performance, and energy efficiency. In this paper, we take into account the braking factors of different vehicles so that there is a desired platoon sequence which minimizes the platoon length. We formulate the vehicle sequence reordering problem and propose an algorithm to reorder vehicles to their desired platoon sequence.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30End of session
Exhibition Reception in Exhibition Area



4.7 Energy and power efficiency in GPU-based systems

Date: Tuesday, March 26, 2019
Time: 17:00 - 18:30
Location / Room: Room 7

Chair:
Muhammad Shafique, TU Wien, AT, Contact Muhammad Shafique

Co-Chair:
William Fornaciari, Politecnico di Milano, IT, Contact william fornaciari

This session presents three papers: two on energy efficiency for GPU-based systems and one exploring performance and accuracy trade-offs when using GPUs for SNN modeling. The first paper presents an online thermal and energy management mechanism for CPU-GPU systems enabled by efficient thread partitioning, mapping, and the respective models. The second paper identifies choke points in GPUs and boosts the choke-point-induced critical warps to achieve high energy efficiency. The third paper presents a GPU-accelerated SNN simulator that introduces stochasticity into STDP and the capability to perform low-precision simulation.

TimeLabelPresentation Title
Authors
17:004.7.1TEEM: ONLINE THERMAL- AND ENERGY-EFFICIENCY MANAGEMENT ON CPU-GPU MPSOCS
Speaker:
Amit Kumar Singh, University of Essex, GB
Authors:
Samuel Isuwa, Somdip Dey, Amit Kumar Singh and Klaus McDonald-Maier, University of Essex, GB
Abstract
Heterogeneous Multiprocessor Systems-on-Chip (MPSoCs) are progressively becoming predominant in most modern mobile devices. These devices are required to process applications within thermal, energy and performance constraints. However, most stock power and thermal management mechanisms either neglect some of these constraints or rely on frequency scaling to achieve energy efficiency and temperature reduction on the device. Although this inefficient technique can reduce the temporal thermal gradient, it hurts the performance of the executing task. In this paper, we propose a thermal and energy management mechanism which reduces the thermal gradient and achieves energy efficiency through resource mapping and thread partitioning of applications with online optimization in heterogeneous MPSoCs. The efficacy of the proposed approach is experimentally appraised using different applications from the Polybench benchmark suite on the Odroid-XU4 development platform. Results show 28% performance improvement, 28.32% energy saving and a reduction in thermal variance of over 76% when compared to existing approaches. Additionally, the method is able to free more than 90% of the memory storage on the MPSoC that would previously have been used to store several task-to-thread mapping configurations.

Download Paper (PDF; Only available from the DATE venue WiFi)
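The core trade-off of such an online manager can be sketched as a tiny optimizer: choose the CPU/GPU thread split that minimizes estimated energy subject to a temperature cap. The linear energy and temperature models below are invented placeholders, not the paper's models; they only illustrate the structure of the decision.

```python
def choose_partition(total_threads, t_max=65.0):
    # Hypothetical per-thread models (illustrative only): the GPU is
    # assumed more energy-efficient per thread but heats the chip faster.
    def energy(gpu_threads):
        cpu_threads = total_threads - gpu_threads
        return 0.8 * gpu_threads + 1.5 * cpu_threads

    def temperature(gpu_threads):
        cpu_threads = total_threads - gpu_threads
        return 40.0 + 0.9 * gpu_threads + 0.3 * cpu_threads

    # Online step: among thermally feasible splits, pick the cheapest.
    feasible = [g for g in range(total_threads + 1)
                if temperature(g) <= t_max]
    return min(feasible, key=energy)

best_gpu = choose_partition(32)   # threads to place on the GPU
```

In the paper this decision is made at run time with models learnt for the actual platform, rather than with fixed coefficients.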
17:304.7.2PREDICTING CRITICAL WARPS IN NEAR-THRESHOLD GPGPU APPLICATIONS USING A DYNAMIC CHOKE POINT ANALYSIS
Speaker:
Sourav Sanyal, Utah State University, US
Authors:
Sourav Sanyal, Prabal Basu, Aatreyi Bal, Sanghamitra Roy and Koushik Chakraborty, Utah State University, US
Abstract
General-purpose graphics processing units (GP-GPUs) can significantly improve power consumption in the NTC operating region. However, process variation (PV) can drastically reduce their performance. In this paper, we examine choke points, a unique device-level characteristic of PV at NTC, which can exacerbate the warp criticality problem. We show that modern warp schedulers cannot tackle choke-point-induced critical warps in an NTC GPU. We propose Warp Latency Booster, a circuit-architectural solution that dynamically predicts the critical warps and accelerates them in their respective execution units. Our best scheme achieves an average improvement of ∼32% and ∼41% in performance, and ∼21% and ∼19% in energy efficiency, respectively, over two state-of-the-art warp schedulers.

Download Paper (PDF; Only available from the DATE venue WiFi)
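The warp-criticality argument reduces to a simple observation: a kernel finishes only when its slowest warp finishes, so trimming latency from the predicted critical warp shortens the whole kernel. The sketch below uses made-up cycle counts, with a boost fraction standing in for the paper's circuit-architectural booster:

```python
def kernel_cycles(warp_cycles, boost=0.0):
    # A kernel is as slow as its slowest ("critical") warp.
    # boost models prioritising the predicted critical warp and trimming
    # a fraction of its latency (illustrative stand-in only).
    cycles = list(warp_cycles)
    critical = max(range(len(cycles)), key=cycles.__getitem__)
    cycles[critical] *= (1.0 - boost)
    return max(cycles)

# Warp 2 suffers a choke-point-induced slowdown (invented numbers).
warps = [100.0, 120.0, 180.0, 110.0]
```

Boosting only the critical warp by 25% cuts the kernel from 180 to 135 cycles, while boosting any other warp would change nothing.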
18:004.7.3FAST AND LOW-PRECISION LEARNING IN GPU-ACCELERATED SPIKING NEURAL NETWORK
Speaker:
Xueyuan She, Georgia Institute of Technology, US
Authors:
Xueyuan She, Yun Long and Saibal Mukhopadhyay, Georgia Institute of Technology, US
Abstract
A spiking neural network (SNN) uses a biologically inspired neuron model coupled with spike-timing-dependent plasticity (STDP) to enable unsupervised continuous learning in artificial intelligence (AI) platforms. However, current SNN algorithms show low accuracy on complex problems and are hard to operate at reduced precision. This paper demonstrates a GPU-accelerated SNN architecture that uses stochasticity in the STDP coupled with higher-frequency input spike trains. The simulation results demonstrate 2 to 3 times faster learning compared to deterministic SNN architectures while maintaining high accuracy on the MNIST (simple) and fashion-MNIST (complex) data sets. Further, we show that stochastic STDP enables learning even with 2-bit operation, where deterministic STDP fails.

Download Paper (PDF; Only available from the DATE venue WiFi)
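The low-precision idea can be sketched as follows: with 2-bit weights a finely graded STDP update is impossible, but a probabilistic +/-1 step reproduces the graded behavior on average. The update probabilities below are assumptions for illustration, not the paper's values:

```python
import random

W_MAX = 3  # 2-bit weights: integer values 0..3

def stdp_update(weight, dt, p_pot=0.6, p_dep=0.4, rng=random):
    # Stochastic STDP sketch: instead of a graded weight change, take a
    # coarse +/-1 step with a timing-dependent probability, so that even
    # 2-bit weights learn in expectation (probabilities are assumptions).
    if dt > 0 and rng.random() < p_pot:        # pre spike before post
        weight = min(W_MAX, weight + 1)
    elif dt <= 0 and rng.random() < p_dep:     # post spike first
        weight = max(0, weight - 1)
    return weight

rng = random.Random(0)
w = 1
for _ in range(200):       # repeated causal pairings drive w to saturation
    w = stdp_update(w, dt=+1, rng=rng)
```

Repeated causal pre-before-post pairings saturate the weight at its 2-bit maximum, which is the average behavior a deterministic low-precision update cannot achieve when individual changes round to zero.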
18:30End of session
Exhibition Reception in Exhibition Area



4.8 Embedded Tutorial: Paving the Way for Very Large Scale Integration of Superconductive Electronics

Date: Tuesday, March 26, 2019
Time: 17:00 - 18:30
Location / Room: Exh. Theatre

Organisers:
Jamil Kawa, Synopsys, US, Contact Jamil Kawa
Massoud Pedram, USC, US, Contact massoud pedram

Chair:
Jamil Kawa, Synopsys, US, Contact Jamil Kawa

Superconductive electronics (SCE) based on the single flux quantum (SFQ) family of logic cells has emerged as a potent and within-reach "beyond-CMOS" technology. With proven switching speeds in the hundreds of GHz and energy dissipation approaching 10^(-19) joules per transition (and lower for the adiabatic family), it is one of the most promising post-CMOS technologies that can break the current performance limit of roughly 4 GHz for CMOS processors, delivering 30 GHz single-threaded performance for an SCE processor. The state of the art in libraries, simulation and analysis, compact modeling, synthesis, and physical design of SFQ-based logic is far behind that of CMOS: semi-manually designed 16-bit SFQ adders, simple filters and ADCs, and bit-serial processors define the current state of the art. To fulfill the potential of SCE logic families, it is essential that design methodologies and tools be developed to enable fully automated design of SCE VLSI circuits and processors on chip. The ac- and dc-biased SFQ logic families (such as RSFQ, ERSFQ, and AQFP) are, however, fundamentally different from CMOS logic families, for example in their reliance on two-terminal Josephson junctions with complex voltage-current (current-phase) behavior, cryogenic operation, pulse-based signaling, the prevalence of inductors as a key passive element, the clocked nature of most logic cells and the resulting need for path balancing, a limited fanout count of typically 2 or 3, the use of biasing currents as the power source, etc. This tutorial introduces SCE SFQ technology, starting from JJ device modeling and simulation, through compact modeling of logic cells and superconductive transmission lines, to specialized logic synthesis, clock tree synthesis, bias distribution, and place & route engines.
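The path-balancing requirement mentioned above can be made concrete: because most SFQ cells are clocked, all fan-in paths of a gate must have equal logic depth, and any edge that skips levels must be padded with D-flip-flops. A minimal bookkeeping sketch (gate names and netlist are illustrative; real flows also handle fanout limits and splitters):

```python
def balance_paths(gates, edges):
    # In clocked SFQ logic every gate fires once per clock cycle, so both
    # inputs of a gate must arrive in the same cycle: each edge that skips
    # logic levels needs one padding DFF per skipped level.
    level = {g: 0 for g in gates}
    for u, v in edges:               # edges assumed in topological order
        level[v] = max(level[v], level[u] + 1)
    padding = {e: level[e[1]] - level[e[0]] - 1 for e in edges}
    return level, sum(padding.values())

gates = ["a", "b", "and1", "or1"]
edges = [("a", "and1"), ("b", "and1"), ("a", "or1"), ("and1", "or1")]
level, dffs = balance_paths(gates, edges)
```

Here the direct edge from input `a` to `or1` skips one level past `and1`, so a single DFF must be inserted on it; this padding overhead is one reason dedicated SFQ synthesis and place & route engines are needed.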

TimeLabelPresentation Title
Authors
17:004.8.1PHYSICS-BASED MODELING AND DEVICE SIMULATION OF JOSEPHSON JUNCTIONS
Author:
Pooya Jannaty, Synopsys, US
Abstract
This tutorial gives a brief survey of the physics of Josephson-junction devices under equilibrium and non-equilibrium conditions using self-consistent quantum mean-field theory. The formation of Cooper pairs and the superconducting gap, as well as the supercurrent and the quasi-particle tunneling mechanism for a device under phase bias, are studied. In non-equilibrium, Floquet theory is used to model and simulate the frequency-domain behavior of Josephson junctions under externally applied voltage bias. Physical phenomena such as the proximity effect, Andreev bound states, multiple Andreev reflections, and the effects of barrier height, temperature, and disorder are discussed, and simulation results using Synopsys tools employing the non-equilibrium Green's function formalism are presented.
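For reference, the device behavior surveyed in this talk rests on the two standard Josephson relations, which link the supercurrent and the junction voltage to the superconducting phase difference \(\varphi\) across the junction:

```latex
% dc and ac Josephson relations for a junction with critical current I_c
I = I_c \sin\varphi, \qquad V = \frac{\hbar}{2e}\,\frac{d\varphi}{dt}
```

A consequence relevant to SFQ logic is that each \(2\pi\) phase slip produces a voltage pulse of quantized area \(\int V\,dt = \Phi_0 = h/2e \approx 2.07\ \text{mV}\cdot\text{ps}\), the "single flux quantum" that carries logical information in these circuits.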
17:304.8.2ARCHITECTURES, SYNTHESIS FLOW, AND PLACE & ROUTE ENGINE FOR DC-BIASED SFQ LOGIC CIRCUITS
Author:
Massoud Pedram, USC, US
Abstract
tbd
18:004.8.3LIBRARY DESIGN AND DESIGN TOOLS FOR ADIABATIC QUANTUM-FLUX-PARAMETRON LOGIC CIRCUITS (AC-BIASED SFQ LOGIC)
Author:
Nobuyuki Yoshikawa, Yokohama National University, JP
Abstract
Adiabatic quantum-flux-parametron (AQFP) logic is one of the superconducting logic families. Its attractive feature is extremely high energy efficiency, due to the adiabatic operation of superconducting circuits that can intrinsically switch at very high speed. The bit energy of AQFP logic is as small as 1 zJ (10^-21 J) per gate, even at clock frequencies of 5 GHz or higher, thanks to logic operation near the quantum and thermal limits. The fundamental logic element of AQFP logic is the majority gate, driven by multi-phase sinusoidal power clocks, typically with four phases. Gate-to-gate connections are made by inductances of limited length. These unique features of AQFP logic circuits make it difficult to use existing EDA tools for CMOS circuits. We will show our recent development of a top-down design flow for AQFP logic circuits, composed of logic synthesis, logic simulation, and automated place & route, using our HDL model and the physical layouts of the logic cells. Several AQFP circuits, such as carry-lookahead adders, decoders, and shifters, have been successfully demonstrated using this approach. Perspectives and challenges for designing large-scale AQFP circuits will be discussed based on recent studies.
18:30End of session
Exhibition Reception in Exhibition Area



Exhibition-Reception Exhibition Reception

Date: Tuesday, March 26, 2019
Time: 18:30 - 19:30
Location / Room: Exhibition Area

The Exhibition Reception will take place on Tuesday in the exhibition area, where free drinks for all conference delegates and exhibition visitors will be offered. All exhibitors are welcome to also provide drinks and snacks for the attendees.

TimeLabelPresentation Title
Authors
19:30End of session

5.1 Special Day on "Embedded Meets Hyperscale and HPC" Session: Heterogeneous Computing in the Datacenter and in HPC

Date: Wednesday, March 27, 2019
Time: 08:30 - 10:00
Location / Room: Room 1

Chair:
Christian Plessl, Paderborn University, DE, Contact Christian Plessl

Co-Chair:
Christoph Hagleitner, IBM Research, CH, Contact Christoph Hagleitner

Heterogeneous computing systems with accelerators are claiming the top ranks in the TOP500 list of the largest HPC computing systems and are finding increasing adoption in hyperscale cloud datacenters. Accelerators offer performance and efficiency gains despite the diminishing returns from traditional technology scaling. The talks in this session will set the stage for this special day and analyze the value proposition of accelerators for traditional and emerging workloads. An overview of this vibrant environment will be followed by more detailed presentations on systems using GPUs and FPGAs.

TimeLabelPresentation Title
Authors
08:305.1.1SILICON HETEROGENEITY IN THE CLOUD
Speaker and Author:
Babak Falsafi, EPFL, CH
Abstract
Cloud providers are building infrastructure at unprecedented speed. We have witnessed the emergence of data-centric information technology in almost every aspect of our lives, from commerce, healthcare, entertainment and governance to scientific discovery. The demand for processing, communicating and storing data has grown faster than conventional growth in digital platforms. Meanwhile, the conventional silicon technologies we have relied on for the past several decades, which enabled the exponential growth in IT, have slowed down (the conventional 40%/year increase in density has dropped to 17%/year in recent years). In light of this increase in demand on data-centric IT and the diminishing returns in platform scalability, our future increasingly relies on emerging technologies that introduce heterogeneity in both logic and memory. In this talk, I will motivate the grand challenges in scaling digital platforms and data-centric technologies, then present opportunities for pushing the envelope on server architecture in the post-Moore era.
09:005.1.2GPU ACCELERATED COMPUTING IN HPC AND IN THE DATA CENTER
Speaker and Author:
Peter Messmer, NVidia, US
Abstract
Since the introduction of CUDA a bit over a decade ago, heterogeneous computing with GPUs has become increasingly popular in HPC. While the initial applications were mostly exploratory in nature, the processing power, the relatively intuitive programming model and a rapidly growing software ecosystem comprised of tools, libraries and training material helped a broad user community to adopt heterogeneous computing. Today, most of the top HPC applications are therefore GPU accelerated, covering all areas of computational science and engineering, including quantum chemistry, structural mechanics or weather simulation. This trend got an extra boost with the increasing computing demand of machine learning, specifically for training deep neural networks, where the processing power of GPUs was suddenly in demand from non-traditional HPC applications in the datacenter. Today, we therefore find GPUs not only in the fastest supercomputers in the world, but also in the largest datacenters. In this presentation, I will discuss the current impact of GPU in HPC and the data center, look at the challenges still faced by developers and how we are working on mitigating them.
09:305.1.3HETEROGENEOUS COMPUTE ARCHITECTURES FOR DEEP LEARNING IN THE CLOUD
Speaker and Author:
Ken O'Brien, Xilinx Research, IE
Abstract
Accuracy of deep learning algorithms continues to outpace many traditional algorithms, while requiring little domain expertise and no explicit programming. However, they are typically associated with astronomical computational and memory requirements which push the limits of projected performance scalability with future technology nodes. This has led to a surge in innovative computer architectures and chips. Within this talk, we'll take a deeper look at compute and memory requirements for a range of popular neural networks and discuss how emerging architectures, fuelled by cloud dynamics, are trying to overcome this through architectural innovation.
10:00End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Breaks (Lunch Area)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the Lunch Area to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 26, 2019

Wednesday, March 27, 2019

Thursday, March 28, 2019


5.2 Improving Formal Verification and Applications to GPUs and High-Level Synthesis

Date: Wednesday, March 27, 2019
Time: 08:30 - 10:00
Location / Room: Room 2

Chair:
Alessandro Cimatti, Fondazione Bruno Kessler, IT, Contact Alessandro Cimatti

Co-Chair:
Gianpiero Cabodi, Politecnico di Torino, IT, Contact Gianpiero Cabodi

The session includes three technical and three application papers. The technical papers aim at improving and evaluating advanced model checking engines, and combining algebraic reasoning and SAT. The application papers show how formal verification is used for the correctness of GPU assembly programs, equivalence checking for high-level synthesis, and assessing failure rates of CMOS.

TimeLabelPresentation Title
Authors
08:305.2.1FBPDR: IN-DEPTH COMBINATION OF FORWARD AND BACKWARD ANALYSIS IN PROPERTY DIRECTED REACHABILITY
Speaker:
Tobias Seufert, University of Freiburg, DE
Authors:
Tobias Seufert and Christoph Scholl, University Freiburg, DE
Abstract
We describe a thoroughly interweaved forward and backward version of PDR/IC3 called fbPDR. Motivated by the complementary strengths of PDR and Reverse PDR and by related work showing the benefits of collaboration between the two, fbPDR lifts the combination to a new level. We lay the theoretical groundwork for sharing learned lemmas between PDR and Reverse PDR and demonstrate the effectiveness of our approach on benchmarks from the Hardware Model Checking Competition.

Download Paper (PDF; Only available from the DATE venue WiFi)
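The complementary-directions intuition behind fbPDR can be shown on an explicit-state toy. Real PDR/IC3 works symbolically with inductive lemmas; this sketch only mirrors the forward analysis (from the initial state) and the backward analysis (from the bad states), whose meeting decides the property:

```python
from collections import deque

def reachable(adj, start):
    # Plain BFS over an explicit-state transition relation.
    seen, todo = {start}, deque([start])
    while todo:
        s = todo.popleft()
        for t in adj.get(s, []):
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return seen

def check_safety(adj, init, bad):
    # Toy analogue of combining forward and backward analyses: the
    # safety property fails iff some state is both forward-reachable
    # from init and backward-reachable from bad.
    rev = {}
    for u, vs in adj.items():
        for v in vs:
            rev.setdefault(v, []).append(u)
    meet = reachable(adj, init) & reachable(rev, bad)
    return len(meet) == 0   # True -> safe

# Two disconnected components: {0,1,2} cycles, {3,4} leads to the bad state.
adj = {0: [1], 1: [2], 2: [0], 3: [4]}
```

In fbPDR the two directions do not merely run side by side: lemmas learnt in one direction prune the search in the other, which is what the paper formalizes.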
09:005.2.2HIGH COVERAGE CONCOLIC EQUIVALENCE CHECKING
Speaker:
Sagar Chaki, Mentor, US
Authors:
Pritam Roy, Sagar Chaki and Pankaj Chauhan, Mentor, US
Abstract
A concolic approach, called SLEC-CF, to checking sequential equivalence between a high-level (e.g., C++/SystemC) hardware description and an RTL (e.g., Verilog) is presented. SLEC-CF searches for counterexamples over the possible values of a set of "control signals" in a depth-first lexicographic manner, avoiding values that are unrealizable by any concrete input. In addition, SLEC-CF respects user-specified design constraints during the search, thus only producing stimuli that are relevant to users. It is a superior alternative to random simulation, which produces an overwhelming number of irrelevant stimuli for user-constrained designs and is therefore of limited effectiveness. To handle complex designs, we present an incremental version of SLEC-CF, which iteratively increases the search depth and the set of control signals, and uses a cache to reuse prior results. We implemented SLEC-CF on top of an existing industrial tool for sequential equivalence checking. Experimental results indicate that SLEC-CF clearly outperforms random simulation in terms of coverage achieved. On complex designs, incremental SLEC-CF demonstrates a superior ability to achieve good coverage in almost all cases, compared to non-incremental SLEC-CF.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:305.2.3BOSPHORUS: BRIDGING ANF AND CNF SOLVERS
Speaker:
Mate Soos, National University of Singapore, SG
Authors:
Davin Choo1, Mate Soos2, Kian Ming A. Chai1 and Kuldeep S Meel2
1DSO National Laboratories, SG; 2National University of Singapore, SG
Abstract
Algebraic Normal Form (ANF) and Conjunctive Normal Form (CNF) are commonly used to encode problems in Boolean algebra. ANFs are typically solved via Gröbner basis algorithms, which often use more memory than is feasible, while CNFs are solved using SAT solvers, which cannot naturally exploit the algebra of polynomials. We propose a paradigm that bridges ANF and CNF solving techniques: the techniques are applied in an iterative manner to learn facts that augment the original problems. Experiments on over 1,100 benchmarks arising from four different application domains demonstrate that the learnt facts can significantly improve runtime and enable more benchmarks to be solved.

Download Paper (PDF; Only available from the DATE venue WiFi)
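The gap the tool bridges is visible in the smallest case: the ANF constraint x1 + x2 + x3 = 1 over GF(2) is a single polynomial equation, but its direct CNF encoding needs 2^(n-1) clauses, one forbidding each wrong-parity assignment. A sketch of that direct encoding, checked by brute force (the clause generation is the textbook XOR encoding, not Bosphorus's algorithm):

```python
from itertools import product

def xor_to_cnf(n_vars, rhs):
    # Direct CNF encoding of the ANF equation x1 + ... + xn = rhs over
    # GF(2): forbid every assignment of the wrong parity with one clause.
    # Literal v means "variable v is true"; -v means "false".
    clauses = []
    for bits in product([0, 1], repeat=n_vars):
        if sum(bits) % 2 != rhs:
            # The clause is the negation of this forbidden assignment.
            clauses.append([-(i + 1) if b else (i + 1)
                            for i, b in enumerate(bits)])
    return clauses

def satisfies(clauses, bits):
    return all(any((lit > 0) == bool(bits[abs(lit) - 1]) for lit in c)
               for c in clauses)

cnf = xor_to_cnf(3, 1)
models = [b for b in product([0, 1], repeat=3) if satisfies(cnf, b)]
```

The exponential clause count per XOR is exactly why CNF-only SAT solvers struggle on algebraic problems, and why exchanging learnt facts with an ANF-level solver pays off.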
09:455.2.4CUDA AU COQ: A FRAMEWORK FOR MACHINE-VALIDATING GPU ASSEMBLY PROGRAMS
Speaker:
Benjamin Ferrell, University of Texas at Dallas, US
Authors:
Benjamin Ferrell, Jun Duan and Kevin Hamlen, University of Texas at Dallas, US
Abstract
A prototype framework for formal, machine-checked validation of GPU pseudo-assembly code algorithms using the Coq proof assistant is presented and discussed. The framework is the first to afford GPU programmers a reliable means of formally machine-validating high-assurance GPU computations without trusting any specific source-to-assembly compilation toolchain. A formal operational semantics for the PTX pseudo-assembly language is expressed as inductive, dependent Coq types, facilitating development of proofs and proof tactics that refer directly to the compiled PTX object code. Challenges modeling PTX's complex and highly parallelized computation model in Coq, with sufficient clarity and generality to tractably prove useful properties of realistic GPU programs, are discussed. Examples demonstrate how the prototype can already be used to validate some basic yet realistic programs.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00IP2-9, 3USING STATISTICAL MODEL CHECKING TO ASSESS RELIABILITY FOR BATHTUB-SHAPED FAILURE RATES
Speaker and Author:
Josef Strnadel, Brno University of Technology, CZ
Abstract
Ideally, reliability can be assessed analytically, provided that an analytical solution exists and its presumptions are met. Otherwise, alternative approaches to the assessment must be applied. This paper proposes a novel, simulation-based approach that relies on stochastic timed automata. Based on these automata, the paper explains the principles of creating reliability models for various scenarios. In our approach, a reliability model is then processed by a statistical model checking method, which assesses reliability by statistically processing simulation results over the model. The main goal of this paper is to show that stochastic timed automata and statistical model checking are capable of facilitating the assessment process even under adverse conditions such as bathtub-shaped failure rates.

Download Paper (PDF; Only available from the DATE venue WiFi)
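A minimal sketch of the simulation side of such an assessment, assuming an illustrative piecewise-constant bathtub hazard rather than the paper's automata models: sample many lifetimes and estimate survival statistically, as a statistical model checker would do over simulation traces.

```python
import random

def hazard(t):
    # Piecewise-constant bathtub hazard (rates are illustrative):
    # infant mortality, useful life, wear-out.
    if t < 10.0:
        return 0.05
    if t < 90.0:
        return 0.005
    return 0.08

def sample_lifetime(rng, dt=0.1, horizon=200.0):
    # Discrete-time simulation: in each small step the component fails
    # with probability hazard(t) * dt (valid for small dt).
    t = 0.0
    while t < horizon:
        if rng.random() < hazard(t) * dt:
            return t
        t += dt
    return horizon

rng = random.Random(42)
n = 2000
survive_50 = sum(sample_lifetime(rng) > 50.0 for _ in range(n)) / n
```

For this hazard the analytical survival is R(50) = exp(-(0.05*10 + 0.005*40)) ≈ 0.497, so the Monte Carlo estimate should land close to 0.5; statistical model checking adds rigorous confidence bounds on such estimates.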
10:01IP2-10, 498EMPIRICAL EVALUATION OF IC3-BASED MODEL CHECKING TECHNIQUES ON VERILOG RTL DESIGNS
Speaker:
Aman Goel, University of Michigan, US
Authors:
Aman Goel and Karem Sakallah, University of Michigan, US
Abstract
IC3-based algorithms have emerged as effective scalable approaches for hardware model checking. In this paper we evaluate six implementations of IC3-based model checkers on a diverse set of publicly-available and proprietary industrial Verilog RTL designs. Four of the six verifiers we examined operate at the bit level and two employ abstraction to take advantage of word-level RTL semantics. Overall, the word-level verifier employing data abstraction outperformed the others, especially on the large industrial designs. The analysis helped us identify several key insights on the techniques underlying these tools, their strengths and weaknesses, differences and commonalities, and opportunities for improvement.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00End of session
Coffee Break in Exhibition Area





5.3 EU Projects

Date: Wednesday, March 27, 2019
Time: 08:30 - 10:00
Location / Room: Room 3

Chair:
Martin Schoeberl, Technical University of Denmark, DK, Contact Martin Schoeberl

EU Projects

TimeLabelPresentation Title
Authors
08:305.3.1AXIOM: A SCALABLE, EFFICIENT AND RECONFIGURABLE EMBEDDED PLATFORM
Speaker:
Roberto Giorgi, University of Siena, IT
Authors:
Roberto Giorgi, Marco Procaccini and Farnam Khalili, University of Siena, IT
Abstract
Cyber-Physical Systems (CPSs) are becoming widely used in every application that requires interaction between humans and the physical environment. People expect this interaction to happen in real time, which puts pressure on system designs due to the ever-higher demand for processing data in the shortest possible and predictable time. Additionally, easy programmability, energy efficiency, and modular scalability are important to ensure that these systems become widespread. All these requirements pose new scientific and technological challenges to the engineering community. The AXIOM project (Agile, eXtensible, fast I/O Module), presented in this paper, introduces a new hardware-software platform for CPSs, which provides an easy parallel programming model and fast connectivity in order to scale up performance by adding multiple boards. The AXIOM platform consists of a custom board based on a Xilinx Zynq Ultrascale+ ZU9EG SoC including four 64-bit ARM cores, the Arduino socket and four high-speed (up to 18 Gbps) connectors on USB-C receptacles. On this hardware, DF-Threads, a novel execution model based on a dataflow modality, has been developed and tested. In this paper, we highlight some major conclusions of the AXIOM project, including the gain in performance compared to other parallel programming models such as OpenMPI and Cilk.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:005.3.2APPLICATIONS OF COMPUTATION-IN-MEMORY ARCHITECTURES BASED ON MEMRISTIVE DEVICES
Speaker:
Said Hamdioui, Delft University of Technology, NL
Authors:
Said Hamdioui1, Abu Sebastian2, Shidhartha Das3, Geethan Karunaratne4, Hoang Anh Du Nguyen1, Manuel Le Gallo2, Siebren Schaafsma5, Abbas Rahimi4, Mottaqiallah Taouil1, Francky Catthoor6, Luca Benini4, Sandeep Pande5 and Fernando G. Redondo7
1Delft University of Technology, NL; 2IBM, CH; 3ARM Ltd., GB; 4ETHZ, CH; 5IMEC, NL; 6IMEC, BE; 7ARM, GB
Abstract
Today's computing architectures and device technologies are unable to meet the increasingly stringent demands on energy and performance posed by emerging applications. Therefore, alternative computing architectures are being explored that leverage novel post-CMOS device technologies. One of these is a Computation-in-Memory architecture based on memristive devices. This paper describes the concept of such an architecture and shows different applications that could significantly benefit from it. For each application, the algorithm, the architecture, the primitive operations, and the potential benefits are presented. The applications cover the domains of data analytics, signal processing, and machine learning.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:155.3.3CHIP-TO-CLOUD: AN AUTONOMOUS AND ENERGY EFFICIENT PLATFORM FOR SMART VISION APPLICATIONS
Speaker:
Simone Ciccia, Istituto Superiore Mario Boella (ISMB), IT
Authors:
Alberto Scionti, Simone Ciccia, Olivier Terzo and Giorgio Giordanengo, Istituto Superiore Mario Boella, IT
Abstract
Modern cloud architectures encompass computing and communication elements that span from traditional data-center computing nodes (offering almost infinite resources to satisfy any application demand) to edge-computing and IoT devices (to sense and act on the real world). This paper presents the cloud architecture devised within the OPERA project, which provides new levels of energy efficiency as a full chip-to-cloud solution. Focusing on a smart vision application (i.e., road traffic monitoring), the paper presents novel architectural solutions optimised to achieve high energy efficiency at every level: i) computing elements supporting the acceleration of state-of-the-art CNNs; and ii) an innovative wireless communication subsystem. Unlike conventional designs, our wireless communication subsystem exploits the advantages of software-defined radio (SDR) firmware to control a reconfigurable antenna. To further extend the application range, an energy-harvesting module is used to supply power. Beyond the edge IoT devices, high-density accelerated servers offer the capability to run complex algorithms within a small power envelope. The effectiveness of the whole architecture has been tested in a real context (i.e., two installation sites). In-field measurements demonstrate our claim: high performance coupled with high energy efficiency across the whole system.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:305.3.4ON THE USE OF HACKATHONS TO ENHANCE COLLABORATION IN LARGE COLLABORATIVE PROJECTS - A PRELIMINARY CASE STUDY OF THE MEGAM@RT2 EU PROJECT
Speaker:
Alexandra Espinosa Hortelano, Mälardalen University, SE
Authors:
Andrey Sadovykh1, Dragos Truscan2, Pierluigi Pierini3, Gunnar Widforss4, Adnan Ashraf2, Hugo Bruneliere5, Pavel Smrz6, Alessandra Bagnato7, Wasif Afzal4 and Alexandra Espinosa Hortelano4
1SOFTEAM; Innopolis University, FR; 2ABO AKADEMI, FI; 3Intecs S.p.A., IT; 4MAELARDALENS HOEGSKOLA, SE; 5ASSOCIATION POUR LA RECHERCHE ET LE DEVELOPPEMENT DES METHODES ET PROCESS, FR; 6Brno University of Technology, CZ; 7SOFTEAM, FR
Abstract
In this paper, we present the MegaM@Rt2 ECSEL project and discuss our approach for fostering collaboration in the project. We chose an "internal hackathon" approach that focuses on technical collaboration between case-study owners and tool/method providers. The novelty of the approach is that we organize the technical workshop as a challenge-based contest at our regular project progress meetings, attended by all partners in the project. Case-study partners submit challenges related to the project goals and their use cases in advance. These challenges are concise enough to be experimented with in approximately four hours. Teams are then formed to address the challenges; they comprise tool/method providers, case-study owners and researchers/developers from other consortium members. On the hackathon day, partners work together to produce results addressing the challenges that are both interesting enough to encourage collaboration and convincing enough to justify further, deeper investigation. The results obtained demonstrate that the hackathon approach stimulated knowledge exchange among project partners and triggered new collaborations, notably between tool providers and use-case owners.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:455.3.5REALIZATION OF FOUR-TERMINAL SWITCHING LATTICES: TECHNOLOGY DEVELOPMENT AND CIRCUIT MODELING
Speaker:
Mustafa Altun, Istanbul Technical University, TR
Authors:
Serzat Safaltin1, Oguz Gencer1, M. Ceylan Morgul1, Levent Aksoy1, Sebahattin Gurmen1, Csaba Andras Moritz2 and Mustafa Altun1
1Istanbul Technical University, TR; 2University of Massachusetts, Amherst, US
Abstract
Our European Union Horizon 2020 project aims to develop a complete synthesis and performance optimization methodology for switching nano-crossbar arrays, leading to the design and construction of an emerging nanocomputer. Within the project, we investigate different computing models based on either two-terminal switches, realized with field-effect transistors, resistive and diode devices, or four-terminal switches. Although the four-terminal switch based model offers a significant area advantage, its realization at the technology level needs further justification and raises a number of questions about its feasibility. In this study, we answer these questions. First, using three-dimensional technology computer-aided design (TCAD) simulations, we show that four-terminal switches can be directly implemented in CMOS technology. For this purpose, we try different semiconductor gate materials in different geometric formations. Then, by fitting the TCAD simulation data to the standard CMOS current-voltage equations, we develop a Spice model of a four-terminal switch. Finally, we successfully perform Spice circuit simulations on four-terminal switches of different sizes. As follow-up work within the project, we will proceed to the fabrication step.

Download Paper (PDF; Only available from the DATE venue WiFi)
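Functionally, a lattice of four-terminal switches computes 1 exactly when its ON switches form a top-to-bottom connected path, since each ON switch conducts to all four of its neighbours. A small evaluator makes this concrete (the grid layout and switch states are illustrative; in a real design each switch is controlled by a literal of the target Boolean function):

```python
from collections import deque

def lattice_on(states):
    # A four-terminal switch connects to all four neighbours when ON.
    # The lattice evaluates to 1 iff ON switches form a path from the
    # top row to the bottom row (standard switching-lattice semantics).
    rows, cols = len(states), len(states[0])
    todo = deque((0, c) for c in range(cols) if states[0][c])
    seen = set(todo)
    while todo:
        r, c = todo.popleft()
        if r == rows - 1:
            return True
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
                    and states[nr][nc] and (nr, nc) not in seen:
                seen.add((nr, nc))
                todo.append((nr, nc))
    return False

path = [[1, 0],          # ON switches snake from top-left ...
        [1, 1],
        [0, 1]]          # ... to bottom-right: output 1
blocked = [[1, 0],
           [0, 0],       # middle row breaks every path: output 0
           [1, 1]]
```

This path semantics is what gives the four-terminal model its area advantage: one lattice realizes a sum of products, with each top-to-bottom path acting as a product term.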
10:00IP2-11, 1031CO-DESIGN IMPLICATIONS OF ON-DEMAND-ACCELERATION FOR CLOUD HEALTHCARE ANALYTICS: THE AEGLE APPROACH
Speaker:
Konstantina Koliogeorgi, National Technical University of Athens, GR
Authors:
Dimosthenis Masouros1, Konstantina Koliogeorgi1, Georgios Zervakis1, Alexandra Kosvyra2, Achilleas Chytas2, Sotirios Xydis1, Ioanna Chouvarda2 and Dimitrios Soudris3
1National Technical University of Athens, GR; 2Aristotle University of Thessaloniki, GR; 3Democritus University of Thrace, GR
Abstract
Nowadays, big data and machine learning are transforming the way we realize and manage our data. Even though the healthcare domain has recognized big data analytics as a prominent candidate, it has not yet fully reaped their promising benefits, which allow medical information to be converted into useful knowledge. In this paper, we introduce AEGLE's big data infrastructure provided as a Platform as a Service. Utilizing the suite of genomic analytics from the Chronic Lymphocytic Leukaemia (CLL) use case, we show that on-demand acceleration is profitable w.r.t. a pure software cloud-based solution. However, we further show that on-demand acceleration is not offered as a "free lunch", and we provide an in-depth analysis and lessons learnt on the co-design implications to be carefully considered for enabling cost-effective acceleration at the cloud level.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:01IP2-12, 1045MODULAR FPGA ACCELERATION OF DATA ANALYTICS IN HETEROGENEOUS COMPUTING
Speaker:
Christoforos Kachris, ICCS-NTUA, GR
Authors:
Christoforos Kachris, Dimitrios Soudris and Elias Koromilas, Democritus University of Thrace, GR
Abstract
Emerging cloud applications like machine learning, AI and big data analytics require high-performance computing systems that can sustain the increased amount of data processing without consuming excessive power. Towards this end, many cloud operators have started deploying hardware accelerators, like FPGAs, to increase the performance of computationally intensive tasks, but at the cost of increased programming complexity to utilize these accelerators. VINEYARD has developed an efficient framework that allows the seamless deployment and utilization of hardware accelerators in the cloud without increasing the programming complexity, while offering the flexibility of software packages. This paper presents a modular approach for the acceleration of data analytics using FPGAs. The modular approach allows the automatic development of integrated hardware designs for the acceleration of data analytics. The proposed framework shows that the data analytics modules can be used to achieve up to 10x speedup compared to high-performance general-purpose processors.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Breaks (Lunch Area)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the Lunch Area to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 26, 2019

Wednesday, March 27, 2019

Thursday, March 28, 2019


5.4 Emerging technologies for better NoCs

Date: Wednesday, March 27, 2019
Time: 08:30 - 10:00
Location / Room: Room 4

Chair:
Davide Bertozzi, Università di Ferrara, IT, Contact Davide Bertozzi

Co-Chair:
Gilles Sassatelli, LIRMM CNRS / University of Montpellier, FR, Contact Gilles Sassatelli

This session discusses emerging technologies such as photonics and ReRAM applied to NoCs to enhance functional and non-functional system parameters. The first paper presents a flexible communication fabric for chiplets ensuring near-100% chip assembly yield. The second paper addresses the energy minimization problem in photonic NoCs by adaptively switching the lasers on and off while still accounting for the thermal sensitivity of optical devices. The third paper proposes a NoC-based architecture for training CNNs using ReRAMs for in-memory computing to maximize energy efficiency.

TimeLabelPresentation Title
Authors
08:305.4.1SIPTERPOSER: A FAULT-TOLERANT SUBSTRATE FOR FLEXIBLE SYSTEM-IN-PACKAGE DESIGN
Speaker:
Pete Ehrett, University of Michigan, US
Authors:
Pete Ehrett, Todd Austin and Valeria Bertacco, University of Michigan, US
Abstract
As Moore's Law scaling slows down, specialized heterogeneous designs are needed to sustain computing performance improvements. Unfortunately, the non-recurring engineering (NRE) costs of chip design - designing interconnects, creating masks, etc. - are often prohibitive. Chiplet-based disintegrated design solutions could address these economic issues, but current technologies lack the flexibility to express a rich variety of designs without redesigning the communication substrate. Moreover, as the number of chiplets increases, yield suffers due to 2.5D assembly defects. This work addresses these problems by presenting a flexible communication fabric that supports construction of arbitrary network topologies and provides robust fault-tolerance, demonstrating near-100% chip assembly yield at typical bonding defect rates. We achieve these goals with less than 3% additional power and zero exposed latency overhead for various real-world applications running on an example SiP.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:005.4.2WAVES: WAVELENGTH SELECTION FOR POWER-EFFICIENT 2.5D-INTEGRATED PHOTONIC NOCS
Speaker:
Aditya Narayan, Boston University, US
Authors:
Aditya Narayan1, Yvain Thonnart2, Pascal Vivet2, César Fuguet Tortolero2 and Ayse Kivilcim Coskun1
1Boston University, US; 2Univ. Grenoble Alpes, CEA-Leti, FR
Abstract
Photonic Network-on-Chips (PNoCs) offer promising benefits over Electrical Network-on-Chips (ENoCs) in manycore systems owing to their lower latencies, higher bandwidth, and lower energy-per-bit communication with negligible data-dependent power. These benefits, however, are limited by a number of challenges. Microring resonators (MRRs) that are used for photonic communication have high sensitivity to process variations and on-chip thermal variations, giving rise to possible resonant wavelength mismatches. State-of-the-art microheaters, which are used to tune the resonant wavelength of MRRs, have poor efficiency resulting in high thermal tuning power. In addition, laser power and high static power consumption of drivers, serializers, comparators, and arbitration logic partially negate the benefits of the sub-pJ operating regime that can be obtained with PNoCs. To reduce PNoC power consumption, this paper introduces WAVES, a wavelength selection technique to identify and activate the minimum number of laser wavelengths needed, depending on an application's bandwidth requirement. Our results on a simulated 2.5D manycore system with PNoC demonstrate an average of 23% (resp. 38%) reduction in PNoC power with only <1% (resp. <5%) loss in system performance.

Download Paper (PDF; Only available from the DATE venue WiFi)
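At its core, the wavelength-selection idea of WAVES reduces to a capacity calculation. A minimal Python sketch, assuming a hypothetical uniform per-wavelength bandwidth (the actual technique also accounts for thermal tuning power and the measured performance loss):

```python
import math

def select_wavelengths(required_gbps, per_wavelength_gbps, available):
    """Activate the fewest laser wavelengths whose aggregate bandwidth
    covers the application's requirement (illustrative model only;
    all parameter names here are our assumptions)."""
    needed = math.ceil(required_gbps / per_wavelength_gbps)
    if needed > available:
        raise ValueError("demand exceeds PNoC capacity")
    return needed
```

For example, an application demanding 10 Gbps over 4 Gbps wavelengths would activate only 3 lasers and leave the rest switched off, saving laser and driver power.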
09:305.4.3REGENT: A HETEROGENEOUS RERAM/GPU-BASED ARCHITECTURE ENABLED BY NOC FOR TRAINING CNNS
Speaker:
Biresh Joardar, Washington State University, US
Authors:
Biresh Joardar1, Bing Li2, Jana Doppa1, Hai (Helen) Li2, Partha Pratim Pande1 and Krishnendu Chakrabarty2
1Washington State University, US; 2Duke University, US
Abstract
The growing popularity of Convolutional Neural Networks (CNNs) has led to the search for efficient computational platforms to enable these algorithms. Resistive random-access memory (ReRAM)-based architectures offer a promising alternative to commonly used GPU-based platforms in this regard. However, backpropagation in CNNs is susceptible to the limited precision of ReRAMs. As a result, training CNNs on ReRAMs affects the final accuracy of the learned model. In this work, we propose REGENT, a heterogeneous architecture that combines ReRAM arrays with GPU cores using 3D integration and a high-throughput yet energy-efficient Network-on-Chip (NoC) for high-precision training. We also propose a bin-packing based framework that maps CNN layers and then optimizes the placement of computing elements to meet the targeted design objectives. Experimental evaluations indicate that the designed NoC can improve performance by 13.5% on average compared to a state-of-the-art counterpart. Also, REGENT improves full-system EDP on average by 55.7% compared to conventional GPU-only platforms for training CNN workloads.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00IP2-13, 134ACDC: AN ACCURACY- AND CONGESTION-AWARE DYNAMIC TRAFFIC CONTROL METHOD FOR NETWORKS-ON-CHIP
Speaker:
Siyuan Xiao, South China University of Technology, CN
Authors:
Siyuan Xiao1, Xiaohang Wang1, Maurizio Palesi2, Amit Kumar Singh3 and Terrence Mak4
1South China University of Technology, CN; 2University of Catania, IT; 3University of Essex, GB; 4University of Southampton, GB
Abstract
Many applications exhibit error-forgiving features. For these applications, approximate computing provides the opportunity to accelerate execution or reduce power consumption by mitigating computation effort to obtain an approximate result. Among the components on a chip, the network-on-chip (NoC) contributes a large portion of system power and performance. In this paper, we exploit the opportunity of aggressively reducing network congestion and latency by selectively dropping data. Essentially, the importance of the dropped data is measured based on a quality model. An optimization problem is formulated to minimize network congestion under a constraint on result quality. A lightweight online algorithm is proposed to solve this problem. Experiments show that, on average, our proposed method can reduce execution time by as much as 12.87% and energy consumption by 12.42% under a strict quality requirement, and speed up execution by 19.59% and reduce energy consumption by 21.20% under a relaxed requirement, compared to a recent approximate computing approach for NoCs.

Download Paper (PDF; Only available from the DATE venue WiFi)
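The selective-dropping step described in the abstract can be approximated by a greedy rule. A toy sketch, assuming each packet carries an importance score equal to the quality loss its drop would cause (the paper instead formulates and solves an online optimization problem; names and the scoring model are ours):

```python
def select_drops(packets, quality_budget):
    """Greedy caricature of accuracy-aware dropping: discard the
    least-important packets first, as long as the accumulated quality
    loss stays within the budget.
    Each packet is (id, importance == quality loss if dropped)."""
    dropped, loss = [], 0.0
    for pid, importance in sorted(packets, key=lambda p: p[1]):
        if loss + importance <= quality_budget:
            dropped.append(pid)        # safe to drop: congestion relief
            loss += importance
    return dropped
```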
10:01IP2-14, 300POWER AND PERFORMANCE OPTIMAL NOC DESIGN FOR CPU-GPU ARCHITECTURE USING FORMAL MODELS
Speaker:
Nader Bagherzadeh, University of California Irvine, US
Authors:
Lulwah Alhubail and Nader Bagherzadeh, University of California - Irvine, US
Abstract
Heterogeneous computing architectures that fuse both CPU and GPU on the same chip are common nowa-days. Using homogeneous interconnect for such heterogeneous processors each with different network demands can result in performance degradation. In this paper, we focused on designing a heterogeneous mesh-style network-on-chip (NoC) to connect heterogeneous CPU-GPU processors. We tackled three problems at once; mapping Processing Elements (PEs) to the routers of the mesh, assigning the number of virtual channels (VC), and assigning the buffer size (BS) for each port of each router in the NoC. By relying on formal models, we developed a method based on Strength Pareto Evolutionary Algorithm2 (SPEA2) to obtain the Pareto optimal set that optimizes communication performance and power consumption of the NoC. By validating our method on a full-system simulator, results show that the NoC performance can be improved by 17% while minimizing the power consumption by at least 2.3x and maintaining the overall system performance.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00End of session
Coffee Break in Exhibition Area





5.5 Hardware Obfuscation

Date: Wednesday, March 27, 2019
Time: 08:30 - 10:00
Location / Room: Room 5

Chair:
Francesco Regazzoni, ALARI-USI, CH, Contact Francesco Regazzoni

Co-Chair:
Daniel Grosse, University of Bremen, DE, Contact Daniel Grosse

Obfuscation is becoming a popular technique to protect IPs and designs. This session reports the latest advances in protection based on obfuscation and on methodologies for attacking it.

TimeLabelPresentation Title
Authors
08:305.5.1DESIGN OBFUSCATION THROUGH SELECTIVE POST-FABRICATION TRANSISTOR-LEVEL PROGRAMMING
Speaker:
Yiorgos Makris, The University of Texas at Dallas, US
Authors:
Mustafa Shihab, Jingxiang Tian, Gaurav Rajavendra Reddy, Bo Hu, William Swartz Jr., Benjamin Carrion Schaefer, Carl Sechen and Yiorgos Makris, The University of Texas at Dallas, US
Abstract
Widespread adoption of the fabless business model and utilization of third-party foundries have increased the exposure of sensitive designs to security threats such as intellectual property (IP) theft and integrated circuit (IC) counterfeiting. As a result, concerted interest in various design obfuscation schemes for deterring reverse engineering and/or unauthorized reproduction and usage of ICs has surfaced. To this end, in this paper we present a novel mechanism for structurally obfuscating sensitive parts of a design through post-fabrication TRAnsistor-level Programming (TRAP). We introduce a transistor-level programmable fabric and we discuss its unique advantages towards design obfuscation, as well as a customized CAD framework for seamlessly integrating this fabric in an ASIC design flow. We theoretically analyze the complexity of attacking TRAP-obfuscated designs through both brute-force and intelligent SAT-based attacks and we present a silicon implementation of a platform for experimenting with TRAP. Effectiveness of the proposed method is evaluated through selective obfuscation of various modules of a modern microprocessor design. Results corroborate that, as compared to an FPGA implementation, TRAP-based obfuscation offers superior resistance against both brute-force and oracle-guided SAT attacks, while incurring an order of magnitude less area, power and delay overhead.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:005.5.2KC2: KEY-CONDITION CRUNCHING FOR FAST SEQUENTIAL CIRCUIT DEOBFUSCATION
Speaker:
Yier Jin, University of Florida, US
Authors:
Kaveh Shamsi1, Meng Li2, David Z. Pan2 and Yier Jin1
1University of Florida, US; 2University of Texas, Austin, US
Abstract
Logic locking and IC camouflaging are two promising techniques for thwarting an array of supply chain threats. Logic locking can hide the design from the foundry as well as end-users and IC camouflaging can thwart IC reverse engineering by end-users. Oracle-guided SAT-based deobfuscation attacks against these schemes have made it more and more difficult to securely implement them with low overhead. Almost all of the literature on SAT attacks is focused on combinational circuits. A recent first implementation of oracle-guided attacks on sequential circuits showed a drastic increase in deobfuscation time versus combinational circuits. In this paper we show that integrating the sequential SAT-attack with incremental bounded-model-checking, and dynamic simplification of key-conditions (Key-Condition Crunching or KC2), we are able to reduce the runtime of sequential SAT-attacks by two orders of magnitude across benchmark circuits, significantly reducing the gap between sequential and combinational deobfuscation. These techniques are applicable to combinational deobfuscation as well and thus represent a generic improvement to deobfuscation procedures and help better understand the complexity of deobfuscation for designing secure locking/camouflaging schemes.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:305.5.3PIERCING LOGIC LOCKING KEYS THROUGH REDUNDANCY IDENTIFICATION
Speaker:
Alex Orailoglu, University of California, San Diego, US
Authors:
Leon Li and Alex Orailoglu, UC San Diego, US
Abstract
The globalization of the IC supply chain witnesses the emergence of hardware attacks such as reverse engineering, hardware Trojans, IP piracy and counterfeiting. The consequent losses sum to billions of dollars for the IC industry. One way to defend against these threats is to lock the circuit by inserting additional key-controlled logic such that correct outputs are produced only when the correct key is applied. The viability of logic locking techniques in precluding IP piracy has been tested by researchers who have shown extensive weaknesses when access to a functional IC is guaranteed. In this paper, we uncover weaknesses of logic locking techniques when the attacker has no access to an activated IC, thus exposing vulnerabilities at the earliest stage even for applications that seek refuge from attacks through functional opaqueness. We develop an attack algorithm that prunes out the incorrect value of each key bit when it introduces a significant level of logic redundancy. Throughout our experiments on ISCAS-85 and ISCAS-89 benchmark circuits, the attack deciphers more than half of the key bits on average with a high accuracy.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00IP2-15, 595DEEP LEARNING-BASED CIRCUIT RECOGNITION USING SPARSE MAPPING AND LEVEL-DEPENDENT DECAYING SUM CIRCUIT REPRESENTATION
Speaker:
Massoud Pedram, University of Southern California, US
Authors:
Arash Fayyazi, Soheil Shababi, Pierluigi Nuzzo, Shahin Nazarian and Massoud Pedram, University of Southern California, US
Abstract
Efficiently recognizing the functionality of a circuit is key to many applications, such as formal verification, reverse engineering, and security. We present a scalable framework for gate-level circuit recognition that leverages deep learning and a convolutional neural network (CNN)-based circuit representation. Given a standard cell library, we present a sparse mapping algorithm to improve the time and memory efficiency of the CNN-based circuit representation. Sparse mapping allows encoding only the logic cell functionality, independently of implementation parameters such as timing or area. We further propose a data structure, termed level-dependent decaying sum (LDDS) existence vector, which can compactly represent information about the circuit topology. Given a reference gate in the circuit, an LDDS vector can capture the function of the gates in the input and output cones as well as their distance (number of stages) from the reference. Compared to the baseline approach, our framework obtains more than an order-of-magnitude reduction in the average training time and 2× improvement in the average runtime for generating CNN-based representations from gate-level circuits, while achieving 10% higher accuracy on a set of benchmarks including EPFL and ISCAS'85 circuits.

Download Paper (PDF; Only available from the DATE venue WiFi)
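The LDDS existence vector described in the abstract lends itself to a small sketch: walk the fan-in cone of a reference gate and weight each gate's type by a factor that decays with its distance. This is our reading of the idea with an invented toy cell library; the paper's exact encoding may differ:

```python
from collections import deque

GATE_TYPES = ["AND", "OR", "NOT", "XOR", "NAND"]   # toy cell library

def ldds_vector(netlist, ref, decay=0.5):
    """Level-dependent decaying-sum sketch: breadth-first walk of the
    fan-in cone of `ref`, adding decay**distance into the bin of each
    gate's type, so nearby gates weigh more than distant ones.
    `netlist` maps gate name -> (type, [fan-in gate names])."""
    vec = [0.0] * len(GATE_TYPES)
    seen, frontier = {ref}, deque([(ref, 0)])
    while frontier:
        gate, level = frontier.popleft()
        gtype, fanin = netlist[gate]
        vec[GATE_TYPES.index(gtype)] += decay ** level
        for pred in fanin:
            if pred in netlist and pred not in seen:
                seen.add(pred)
                frontier.append((pred, level + 1))
    return vec
```

The resulting fixed-length vector can then be fed to a CNN-style classifier regardless of the circuit's size.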
10:01IP2-16, 762PARTIAL ENCRYPTION OF BEHAVIORAL IPS TO SELECTIVELY CONTROL THE DESIGN SPACE IN HIGH-LEVEL SYNTHESIS
Speaker:
Farah Taher, The University of Texas at Dallas, US
Authors:
Zi Wang and Benjamin Carrion Schaefer, The University of Texas at Dallas, US
Abstract
Commercial High-Level Synthesis (HLS) tool vendors have started to enable ways to protect Behavioral IPs (BIPs) from being unlawfully used. The main approach is to provide tools to encrypt these BIPs such that they can be decrypted by the HLS tool only. The main problem with this approach is that encrypting the IP does not allow BIP users to insert synthesis directives into the source code in the form of pragmas (comments), and hence cancels out one of the most important advantages of C-based VLSI design: the ability to automatically generate micro-architectures with unique design metrics, e.g., area, power and performance. This work studies the effect on the search space when synthesis directives cannot be inserted into the encrypted IP source code while other options are still available to the BIP users (e.g., setting global synthesis options and limiting the number and type of functional units), and proposes a method that selectively controls the search space by encrypting different portions of the BIP. To achieve this goal, we propose a fast heuristic based on a divide-and-conquer method. Experimental results show that our proposed method works well compared to an exhaustive search that leads to the optimal solution.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00End of session
Coffee Break in Exhibition Area





5.6 Energy efficiency in IoT - Edge to Cloud

Date: Wednesday, March 27, 2019
Time: 08:30 - 10:00
Location / Room: Room 6

Chair:
Semeen Rehman, TU Wien, AT, Contact Semeen Rehman

Co-Chair:
Baris Aksanli, San Diego State University, US, Contact Baris Aksanli

This session includes three papers discussing energy efficiency in the IoT device hierarchy. The first paper shows an energy-aware checkpointing mechanism for devices capable of energy harvesting. The second paper builds an energy-efficient, hardware-supported synchronization mechanism for ultra-low-power devices. The third paper implements energy-efficient video transcoding for cloud servers. The session also features an IP paper that proposes a software/hardware co-design of a digital baseband processor for IoT applications.

TimeLabelPresentation Title
Authors
08:305.6.1FLEXICHECK: AN ADAPTIVE CHECKPOINTING ARCHITECTURE FOR ENERGY HARVESTING DEVICES
Speaker:
Priyanka Singla, IIT Delhi, IN
Authors:
Priyanka Singla, Shubhankar Suman Singh and Smruti R. Sarangi, IIT Delhi, IN
Abstract
With the advent of 5G and M2M architectures, energy harvesting devices are expected to become far more prevalent. Such devices harvest energy from ambient sources such as solar energy or vibration energy (from machines) and use it for sensing environmental parameters and further processing them. Given that the rate of energy consumption is higher than the rate of energy production, it is necessary to frequently halt the processor and accumulate energy from the environment. During this period it is mandatory to take a checkpoint to avoid the loss of data. State-of-the-art algorithms use software-based methods that extensively rely on compiler analyses. In this paper, we provide the first formal model for such systems, and show that we can arrive at an optimal checkpointing schedule using a quadratically constrained linear program (QCLP) solver. Using this as a baseline, we show that existing algorithms for checkpointing significantly underperform. Furthermore, we prove and demonstrate that when we have a relatively constant energy source, a greedy algorithm provides an optimal solution. To model more complex situations where the energy varies, we create a novel checkpointing algorithm that adapts itself according to the ambient energy. We obtain a speedup of 2−5× over the nearest competing approach, and we are within 3−8% of the optimal solution in the general case where the ambient energy exhibits variations.

Download Paper (PDF; Only available from the DATE venue WiFi)
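The greedy policy that the paper proves optimal for a relatively constant energy source can be illustrated with a toy simulation. The cost model and all parameters below are our assumptions for illustration, not the paper's formulation:

```python
def greedy_schedule(work_units, harvest, consume, ckpt_cost, capacity):
    """Greedy checkpointing under a constant harvesting rate (toy model).
    Per time step: run one work unit only if enough energy remains to
    still afford a checkpoint afterwards; otherwise checkpoint once and
    idle until the buffer refills.  Returns (steps, checkpoints)."""
    energy, done, ckpts, steps = capacity, 0, 0, 0
    saved = True                                   # no progress to lose yet
    while done < work_units:
        steps += 1
        energy = min(capacity, energy + harvest)   # ambient energy inflow
        if energy >= consume + ckpt_cost:
            energy -= consume
            done += 1
            saved = False                          # new unsaved progress
        elif not saved and energy >= ckpt_cost:
            energy -= ckpt_cost                    # persist state before outage
            ckpts += 1
            saved = True
    return steps, ckpts
```

The key invariant is never to burn energy on computation unless a checkpoint can still be afforded afterwards, so no completed work is ever lost.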
09:005.6.2HARDWARE-ACCELERATED ENERGY-EFFICIENT SYNCHRONIZATION AND COMMUNICATION FOR ULTRA-LOW-POWER TIGHTLY COUPLED CLUSTERS
Speaker:
Florian Glaser, ETH Zurich, CH
Authors:
Florian Glaser1, Germain Haugou1, Davide Rossi2, Qiuting Huang1 and Luca Benini1
1ETH Zürich, CH; 2Università di Bologna, IT
Abstract
Parallel ultra-low-power computing is emerging as an enabler to meet the growing performance and energy-efficiency demands in deeply embedded systems such as the end-nodes of the internet-of-things (IoT). The parallel nature of these systems, however, adds a significant degree of complexity, as processing elements (PEs) need to communicate in various ways to organize and synchronize execution. Naive implementations of these central and non-trivial mechanisms can quickly jeopardize overall system performance and limit the achievable speedup and energy efficiency. To avoid this bottleneck, we present an event-based solution centered around a technology-independent, lightweight and scalable (up to 16 cores) synchronization and communication unit (SCU) and its integration into a shared-memory multicore cluster. Careful design and tight coupling of the SCU to the data interfaces of the cores make it possible to execute common synchronization procedures with a single instruction. Furthermore, we present hardware support for the common barrier and lock synchronization primitives with a barrier latency of only eleven cycles, independent of the number of involved cores. We demonstrate the efficiency of the solution based on experiments with a post-layout implementation of the multicore cluster in a 22 nm CMOS process, where the SCU constitutes less than 2% of area overhead. Our solution supports parallel sections as small as 100 or 72 cycles with a synchronization overhead of just 10%, an improvement of up to 14 or 30 times with respect to cycle count or energy, respectively, compared to a test-and-set based implementation.

Download Paper (PDF; Only available from the DATE venue WiFi)
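For context, the test-and-set style software baseline that the SCU is compared against can be sketched as a generic sense-reversing barrier. This is the baseline concept only, not the authors' SCU or their exact implementation:

```python
import threading

class SpinBarrier:
    """Sense-reversing software barrier of the kind hardware barrier
    support is designed to replace (illustrative baseline)."""
    def __init__(self, n):
        self.n = n
        self.count = n
        self.sense = False
        self.lock = threading.Lock()   # stands in for a test-and-set flag

    def wait(self):
        local_sense = not self.sense
        with self.lock:
            self.count -= 1
            if self.count == 0:        # last arrival releases everyone
                self.count = self.n
                self.sense = local_sense
                return
        while self.sense != local_sense:
            pass                       # busy-wait: the cycles/energy the SCU avoids

def run_demo(n_threads=4, phases=3):
    """Every thread finishes phase k before any thread starts phase k+1."""
    out, bar = [], SpinBarrier(n_threads)
    def worker():
        for phase in range(phases):
            out.append(phase)          # the "work" of this phase
            bar.wait()
    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out                         # non-decreasing if the barrier works
```

The busy-wait loop is exactly the overhead that a single-instruction, event-based hardware barrier eliminates.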
09:305.6.3MAMUT: MULTI-AGENT REINFORCEMENT LEARNING FOR EFFICIENT REAL-TIME MULTI-USER VIDEO TRANSCODING
Speaker:
Luis Costero, Universidad Complutense de Madrid, ES
Authors:
Luis Costero1, Arman Iranfar2, Marina Zapater2, Francisco D. Igual1, Katzalin Olcoz1 and David Atienza2
1Dpto. de Arquitectura de Computadores y Automática, Universidad Complutense de Madrid, ES; 2EPFL, CH
Abstract
Video transcoding has recently emerged as a valid alternative to address the ever-increasing demand for video content on server infrastructures in current multi-user environments, as it enhances user experience by providing the adequate video configuration, reduces pressure on the network, and minimizes inefficient and costly video storage. The advent of next-generation video coding standards makes efficient transcoding feasible. However, the computational complexity of HEVC, together with its myriad of configuration parameters, raises challenges for power management, throughput control, and Quality of Service (QoS) satisfaction. This is particularly challenging in multi-user environments where multiple users with different resolution demands and bandwidth constraints need to be served simultaneously. In this work, we present MAMUT, a multi-agent machine learning approach to tackle these challenges. Our proposal breaks the design space composed of runtime adaptation of the transcoder and system parameters into smaller sub-spaces that can be explored in a reasonable time by individual agents. While working cooperatively, each agent is in charge of learning and dynamically applying the optimal values for internal HEVC parameters and system-wide parameters such as the number of threads per video and operating frequency, targeting throughput and video quality as objectives, and compression and power consumption as constraints. We implement MAMUT on an enterprise multicore server and compare equivalent scenarios to state-of-the-art alternative approaches. The obtained results reveal that MAMUT consistently attains up to 8x improvement in terms of FPS violations (and thus Quality of Service), 24% power reduction, as well as faster and more accurate adaptation both to the video contents and the available resources.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00IP2-17, 271SOFTWARE-HARDWARE CO-DESIGN OF MULTI-STANDARD DIGITAL BASEBAND PROCESSOR FOR IOT
Speaker:
Carolynn Bernier, CEA-Leti, FR
Authors:
Hela Belhadj Amor and Carolynn Bernier, CEA, LETI, FR
Abstract
This work demonstrates an ultra-low-power, software-defined wireless transceiver designed for IoT applications using an open-source 32-bit RISC-V core. The key driver behind this success is an optimized hardware/software partitioning of the receiver's digital signal processing operators. We benchmarked our architecture on an algorithm for the detection of FSK-modulated frames using a RISC-V compatible core and ARM Cortex-M series processors. We use only standard compilation tools and no assembly-level optimizations. Our results show that Bluetooth LE frames can be detected with an estimated peak core power consumption of 1.6 mW on a 28 nm FDSOI technology, falling to less than 0.6 mW (on average) during symbol demodulation. This is achieved at nominal voltage. Compared to the state of the art, our work offers a power-efficient alternative to the design of dedicated baseband processors for ultra-low-power software-defined radios with a low software complexity.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00End of session
Coffee Break in Exhibition Area





5.7 Data-driven Acceleration

Date: Wednesday, March 27, 2019
Time: 08:30 - 10:00
Location / Room: Room 7

Chair:
Christian Fabre, CEA-Leti, FR, Contact Christian Fabre

Co-Chair:
Borzoo Bonakdarpour, Iowa State University, US, Contact Borzoo Bonakdarpour

This session presents accelerated computing paradigms guided by application-data criticality. The first paper presents a compiler for processing-in-memory (PIM) architectures. The second paper proposes a novel kernel tiling approach that exploits the L2 cache to reduce data latency. The third paper introduces data subsetting to reduce memory traffic for approximate computing platforms. The IP papers deal with RISC-V extensions for low-precision floating-point operations and GPU-based predictable execution.

TimeLabelPresentation Title
Authors
08:305.7.1A COMPILER FOR AUTOMATIC SELECTION OF SUITABLE PROCESSING-IN-MEMORY INSTRUCTIONS
Speaker:
Luigi Carro, UFRGS - Federal University of Rio Grande do Sul, BR
Authors:
Hameeza Ahmed1, Paulo Cesar Santos2, Joao Paulo Lima2, Rafael F. de Moura2, Marco Antonio Zanata Alves3, Antonio Carlos Schneider Beck2 and Luigi Carro2
1NED University of Engineering and Technology, PK; 2UFRGS - Universidade Federal do Rio Grande do Sul, BR; 3UFPR, BR
Abstract
Although not a new technique, Processing-in-Memory (PIM) has been revived by the advent of 3D-stacked technologies, which integrate large memories with logic circuitry able to compute large amounts of data. PIM is a technique to increase performance while reducing energy consumption when dealing with large amounts of data. Although several PIM designs are available in the literature, their effective implementation still burdens the programmer. Also, various PIM instances are required to take advantage of the internal 3D-stacked memories, which further increases the challenges faced by programmers. In this way, this work presents the Processing-In-Memory cOmpiler (PRIMO). Our compiler is able to efficiently exploit large vector units on a PIM architecture, directly from the original code. PRIMO is able to automatically select suitable PIM operations, allowing their automatic offloading. Moreover, PRIMO takes several PIM instances into account, selecting the most suitable instance while reducing internal communication between different PIM units. The compilation results for different benchmarks depict how PRIMO is able to exploit large vectors, achieving near-optimal performance compared to the ideal execution for the case-study PIM. PRIMO allows a speedup of 38x for specific kernels, while on average achieving 11.8x for a set of benchmarks from the PolyBench suite.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:005.7.2CACHE-AWARE KERNEL TILING: AN APPROACH FOR SYSTEM-LEVEL PERFORMANCE OPTIMIZATION OF GPU-BASED APPLICATIONS
Speaker:
Arian Maghazeh, Linköping University, SE
Authors:
Arian Maghazeh1, Sudipta Chattopadhyay2, Petru Eles1 and Zebo Peng1
1Linköping University, SE; 2Singapore University of Technology and Design (SUTD), SG
Abstract
We present a software approach to address the data latency issue for certain GPU applications. Each application is modeled as a kernel graph, where the nodes represent individual GPU kernels and the edges capture data dependencies. Our technique exploits the GPU L2 cache to accelerate parameter passing between the kernels. The key idea is that, instead of having each kernel process the entire input in one invocation, we subdivide the input into fragments (which fit in the cache) and, ideally, process each fragment in one continuous sequence of kernel invocations. Our proposed technique is oblivious to kernel functionalities and requires minimal source code modification. We demonstrate our technique on a full-fledged image processing application and improve the performance on average by 30% over various settings.
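The fragment-wise execution the abstract describes can be illustrated with a small host-side sketch (the kernel chain and fragment size below are hypothetical, not taken from the paper):

```python
def run_tiled(kernels, data, fragment_size):
    # Process each cache-sized fragment through the whole kernel chain
    # before moving on, so intermediate results stay cache-resident
    # between consecutive kernel invocations.
    out = []
    for start in range(0, len(data), fragment_size):
        frag = data[start:start + fragment_size]
        for kernel in kernels:            # dependent kernels, in order
            frag = [kernel(x) for x in frag]
        out.extend(frag)
    return out

# Two dependent "kernels": double, then increment, on tiled input
result = run_tiled([lambda x: 2 * x, lambda x: x + 1], list(range(6)), 2)
```

Tiling changes only the schedule, not the result, which is what makes the technique oblivious to kernel functionality.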

Download Paper (PDF; Only available from the DATE venue WiFi)
09:305.7.3DATA SUBSETTING: A DATA-CENTRIC APPROACH TO APPROXIMATE COMPUTING
Speaker:
Younghoon Kim, Purdue University, US
Authors:
Younghoon Kim1, Swagath Venkataramani2, Nitin Chandrachoodan3 and Anand Raghunathan1
1Purdue University, US; 2IBM T. J. Watson Research Center, US; 3Indian Institute of Technology Madras, IN
Abstract
Approximate Computing (AxC), which leverages the intrinsic resilience of applications to approximations in their underlying computations, has emerged as a promising approach to improving computing system efficiency. Most prior efforts in AxC take a compute-centric approach and approximate arithmetic or other compute operations through design techniques at different levels of abstraction. However, emerging workloads such as machine learning, search and data analytics process large amounts of data and are significantly limited by the memory sub-systems of modern computing platforms. In this work, we shift the focus of approximations from computations to data, and propose a data-centric approach to AxC, which can boost the performance of memory-subsystem-limited applications. The key idea is to modulate the application's data-accesses in a manner that reduces off-chip memory traffic. Specifically, we propose a data-access approximation technique called data subsetting, in which all accesses to a data structure are redirected to a subset of its elements so that the overall footprint of memory accesses is decreased. We realize data subsetting in a manner that is transparent to hardware and requires only minimal changes to application software. Recognizing that most applications of interest represent and process data as multi-dimensional arrays or tensors, we develop a templated data structure called SubsettableTensor that embodies mechanisms to define the accessible subset and to suitably redirect accesses to elements outside the subset. As a further optimization, we observe that data subsetting may cause some computations to become redundant and propose a mechanism for application software to identify and eliminate such computations. We implement SubsettableTensor as a C++ class and evaluate it using parallel software implementations of 7 machine learning applications on a 48-core AMD Opteron server. Our experiments indicate that data subsetting enables 1.33x to 4.44x performance improvement with <0.5% loss in application-level quality, underscoring its promise as a new approach to approximate computing.
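The access-redirection idea behind SubsettableTensor (a C++ class in the paper) can be sketched with a toy one-dimensional analogue; the class name, fields and modulo redirection rule below are illustrative, not the paper's actual design:

```python
class SubsettableArray:
    # Toy analogue of a subsettable data structure: reads to indices
    # outside the accessible subset are redirected back into it, so the
    # overall footprint of memory accesses shrinks.
    def __init__(self, data, subset_size):
        self.data = data
        self.subset_size = subset_size   # number of "accessible" elements

    def __getitem__(self, i):
        return self.data[i % self.subset_size]   # redirect out-of-subset reads

arr = SubsettableArray([10, 20, 30, 40, 50], subset_size=2)
touched = [arr[i] for i in range(5)]   # only the first two elements are read
```

The redirection is transparent to the code performing the accesses, mirroring how data subsetting needs only minimal application changes.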

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00IP2-18, 673TAMING DATA CACHES FOR PREDICTABLE EXECUTION ON GPU-BASED SOCS
Speaker:
Björn Forsberg, ETH Zürich, CH
Authors:
Björn Forsberg1, Luca Benini2 and Andrea Marongiu2
1ETH Zürich, CH; 2Università di Bologna, IT
Abstract
Heterogeneous SoCs (HeSoCs) typically share a single DRAM between the CPU and the GPU, making workloads susceptible to memory interference and predictable execution troublesome. State-of-the-art predictable execution models (PREM) for HeSoCs prefetch data to the GPU scratchpad memory (SPM) so that computations are insensitive to CPU-generated DRAM traffic. However, the amount of work that the small SPM sizes allow is typically insufficient to absorb CPU/GPU synchronization costs. On-chip caches are larger and would solve this issue, but have been argued to be too unpredictable due to self-evictions. We show how self-eviction can be minimized in GPU caches via careful management of prefetches, lowering the performance cost while retaining timing predictability.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:01IP2-19, 739DESIGN AND EVALUATION OF SMALLFLOAT SIMD EXTENSIONS TO THE RISC-V ISA
Speaker:
Giuseppe Tagliavini, Università di Bologna, IT
Authors:
Giuseppe Tagliavini1, Stefan Mach2, Davide Rossi1, Andrea Marongiu1 and Luca Benini1
1Università di Bologna, IT; 2ETH Zurich, CH
Abstract
RISC-V is an open-source instruction set architecture (ISA) with a modular design consisting of a mandatory base part plus optional extensions. The RISC-V 32IMFC ISA configuration has been widely adopted for the design of new-generation, low-power processors. Motivated by the important energy savings that smaller-than-32-bit FP types have enabled in several application domains and related compute platforms, some recent studies have published encouraging early results for their adoption in RISC-V processors. In this paper we introduce a set of ISA extensions for RISC-V 32IMFC, supporting scalar and SIMD operations (fitting the 32-bit register size) for 8-bit and two 16-bit FP types. The proposed extensions are enabled by exposing the new FP types to the standard C/C++ type system, and an implementation for the RISC-V GCC compiler is presented. As a further, novel contribution, we extensively characterize the performance and energy savings achievable with the proposed extensions. On average, experimental results show that their adoption provides benefits in terms of performance (1.64x speedup for 16-bit and 2.18x for 8-bit types) and energy consumption (30% savings for 16-bit and 50% for 8-bit types). We also illustrate an approach based on automatic precision tuning to make effective use of the new FP types.
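The precision/efficiency trade-off behind smaller-than-32-bit FP types can be illustrated by round-tripping a value through IEEE binary16 (this uses Python's `struct` half-precision packing as a stand-in; it does not model the proposed RISC-V extensions themselves):

```python
import struct

def round_to_fp16(x):
    # Pack to IEEE binary16 and unpack again; the round trip exposes
    # the rounding introduced by the smaller FP type.
    return struct.unpack('<e', struct.pack('<e', x))[0]

lossless = round_to_fp16(1.0)   # exactly representable in binary16
lossy = round_to_fp16(0.1)      # rounded to the nearest binary16 value
```

Values like 0.1 survive only approximately, which is why automatic precision tuning is needed to decide where the small types are safe to use.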

Download Paper (PDF; Only available from the DATE venue WiFi)
10:02IP2-20, 24VDARM: DYNAMIC ADAPTIVE RESOURCE MANAGEMENT FOR VIRTUALIZED MULTIPROCESSOR SYSTEMS
Speaker:
Jianmin Qian, Shanghai Jiao Tong University, CN
Authors:
Jianmin Qian, Jian Li, Ruhui Ma and Haibing Guan, Shanghai Jiao Tong University, CN
Abstract
Modern data center servers have been enhancing their computing capacity by increasing processor counts. Meanwhile, these servers are highly virtualized to achieve efficient resource utilization and energy savings. However, due to the shift of server architectures to non-uniform memory access (NUMA), current hypervisor-level or OS-level resource management methods continue to be challenged in their ability to meet the performance requirements of various user applications. In this work, we first build a performance slowdown model to accurately identify current system overheads. Based on this model, we design a dynamic adaptive virtual resource management method (vDARM) that eliminates runtime NUMA overheads by re-configuring virtual-to-physical resource mappings. Experimental results show that, compared with state-of-the-art approaches, vDARM brings an average performance improvement of 36.2% on an 8-node NUMA machine, while incurring no more than 4% extra CPU utilization.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Breaks (Lunch Area)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the Lunch Area to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 26, 2019

Wednesday, March 27, 2019

Thursday, March 28, 2019


5.8 Special Session: The ARAMiS II Project - Efficient Use of Multicore for Safety-Critical Applications

Date: Wednesday, March 27, 2019
Time: 08:30 - 10:00
Location / Room: Exh. Theatre

Organisers:
Timo Sandmann, Karlsruhe Institute of Technology, DE, Contact Timo Sandmann
Jürgen Becker, Karlsruhe Institute of Technology, DE, Contact Juergen Becker

Chair:
Timo Sandmann, Karlsruhe Institute of Technology, DE, Contact Timo Sandmann

Safety-critical applications in the automotive and avionics domains, as well as in the emerging field of Industry 4.0, show a clear and still increasing demand for digital processing power, e.g. for highly automated driving and for connected machines with real-time requirements. This demand is further substantiated by increasing interaction and integration with other systems and services. It justifies the near-future use of multicore technology in embedded systems, a technology already successfully applied in other application domains such as PCs, tablets and smartphones. However, safety-critical applications in the above-mentioned domains impose many additional complex requirements, which at present can be fulfilled only partly, if at all, and only with unjustifiably high development effort. This special session presents a summary of the most important results and research topics regarding the efficient use of multicore systems in safety-critical applications.

TimeLabelPresentation Title
Authors
08:305.8.1ARAMIS II PROJECT OVERVIEW
Speaker and Author:
Rolf Ernst, TU Braunschweig, DE
Abstract
This talk gives a global overview of the ARAMiS II project and presents current challenges for using multicore-based architectures in safety-critical application domains such as automotive, avionics and industry automation. Furthermore, a summary is given of how these challenges were addressed in the ARAMiS II project. The focus of ARAMiS II is on the optimization and advancement of development processes, especially development tools and platforms, for the efficient use of multicore technology.
08:525.8.2ARAMIS II DEVELOPMENT PROCESS FOR MODEL-BASED MULTICORE SOFTWARE DEVELOPMENT
Author:
Stefan Kuntz, Continental AG, DE
Abstract
This presentation introduces the ARAMiS II development process for model-based multicore software development. As the main goal of the ARAMiS II research project is to evaluate, design, implement and validate existing or new methods, tools and platforms for the efficient use of multicore-based platforms in the automotive, avionics and industry automation domains, a mainly top-down development process, supported by a consistent modeling approach, has been developed.
09:145.8.3METHODS AND TOOLS SUPPORTING MULTICORE DEVELOPMENT
Author:
Bernhard Bauer, University of Augsburg, DE
Abstract
The talk on methods and tools of the ARAMiS II research project presents the different aspects of multicore development. One of the main goals is the realization of the developed concepts in tools supporting the different aspects of multicore development, e.g. temporal correctness of the system design, scheduling, etc. Furthermore, the integration and interoperability of these tools are important, as they have to be connected into a toolchain via specified input and output artifacts. Depending on the targeted application domain, the design methods and toolchain must cover several aspects, e.g. fail-operational system design and performance and/or timing requirements.
09:365.8.4AUTOMOTIVE POWERTRAIN DEMONSTRATOR
Author:
Sebastian Kehr, Denso Automotive Deutschland GmbH, DE
Abstract
This talk presents an industrial use case built and evaluated in the ARAMiS II project. It shows efficient development supported by toolchains that allow migrating existing software to modern multicore systems efficiently, also in safety-critical domains. The overall goal is to ensure that the migration to multicore is fast, leads to high-quality software, and is for the most part automated, i.e., requires few inputs from developers.
10:00End of session
Coffee Break in Exhibition Area





IP2 Interactive Presentations

Date: Wednesday, March 27, 2019
Time: 10:00 - 10:30
Location / Room: Poster Area

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

LabelPresentation Title
Authors
IP2-1TRANSREC: IMPROVING ADAPTABILITY IN SINGLE-ISA HETEROGENEOUS SYSTEMS WITH TRANSPARENT AND RECONFIGURABLE ACCELERATION
Speaker:
Marcelo Brandalero, Universidade Federal do Rio Grande do Sul (UFRGS), BR
Authors:
Marcelo Brandalero1, Muhammad Shafique2, Luigi Carro1 and Antonio Carlos Schneider Beck1
1UFRGS - Universidade Federal do Rio Grande do Sul, BR; 2Vienna University of Technology (TU Wien), AT
Abstract
Single-ISA heterogeneous systems, such as ARM's big.LITTLE, use microarchitecturally-different General-Purpose Processor cores to efficiently match the capabilities of the processing resources with applications' performance and energy requirements that change at run time. However, since only a fixed and non-configurable set of cores is available, reaching the best possible match between the available resources and applications' requirements remains a challenge, especially considering the varying and unpredictable workloads. In this work, we propose TransRec, a hardware architecture which improves over these traditional heterogeneous designs. TransRec integrates a shared, transparent (i.e., no need to change application binary) and adaptive accelerator in the form of a Coarse-Grained Reconfigurable Array that can be used by any of the General-Purpose Processor cores for on-demand acceleration. Through evaluations with cycle-accurate gem5 simulations, synthesis of real RISC-V processor designs for a 15nm technology, and considering the effects of Dynamic Voltage and Frequency Scaling, we demonstrate that TransRec provides better performance-energy tradeoffs that are otherwise unachievable with traditional big.LITTLE-like designs. In particular, for less than 40% area overhead, TransRec can improve performance in the low-energy mode (LITTLE) by 2.28x, and can improve both performance and energy efficiency by 1.32x and 1.59x, respectively, in high-performance mode (big).

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-2CADE: CONFIGURABLE APPROXIMATE DIVIDER FOR ENERGY EFFICIENCY
Speaker:
Mohsen Imani, University of California, San Diego, US
Authors:
Mohsen Imani, Ricardo Garcia, Andrew Huang and Tajana Rosing, University of California San Diego, US
Abstract
Approximate computing is a promising approach to designing faster and more energy-efficient systems that still provide adequate quality for a variety of functions. Division, in particular floating-point division, is one of the most important operations in multimedia applications, yet it is rarely implemented in hardware due to its significant cost and complexity. In this paper, we propose CADE, a Configurable Approximate Divider that performs floating-point division with runtime-controllable accuracy. CADE approximates by removing the costly division operation and replacing it with a subtraction of the input operands' mantissas. To increase accuracy, CADE analyzes the first N bits (called tuning bits) of both operands' mantissas to estimate the division error. If CADE determines that the first approximation is unacceptable, a pre-computed value is retrieved from memory and subtracted from the first approximation's mantissa. At runtime, CADE can provide higher accuracy by increasing the number of tuning bits. CADE was integrated into an AMD GPU architecture. Our evaluation shows that CADE is at least 4.1× more energy efficient, 1.5× faster, and 1.7× more area efficient than state-of-the-art approximate dividers while providing a 25% lower error rate. In addition, CADE gives the GPU a new knob to configure the level of approximation at runtime depending on application/user accuracy requirements.
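The core mantissa-subtraction approximation can be sketched as follows; this models only the first approximation for positive inputs, without the tuning bits or the error-correction lookup of the actual design:

```python
import math

def approx_divide(a, b):
    # Replace the mantissa division with a subtraction of the operands'
    # fractional mantissa parts: (1 + fa) / (1 + fb) ~= 1 + fa - fb
    # for fa, fb in [0, 1). Exponents subtract exactly.
    ma, ea = math.frexp(a)          # a = ma * 2**ea, with ma in [0.5, 1)
    mb, eb = math.frexp(b)
    fa, fb = 2 * ma - 1, 2 * mb - 1
    return math.ldexp(1 + fa - fb, ea - eb)
```

When both mantissas are equal the result is exact; otherwise the first-order error is what the tuning bits are used to estimate and correct.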

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-3HCFTL: A LOCALITY-AWARE PAGE-LEVEL FLASH TRANSLATION LAYER
Speaker:
Hao Chen, University of Science and Technology of China, CN
Authors:
Hao Chen1, Cheng Li1, Yubiao Pan2, Min Lyu1, Yongkun Li1 and Yinlong Xu1
1University of Science and Technology of China, CN; 2Huaqiao University, CN
Abstract
The increasing capacity of SSDs requires a large amount of built-in DRAM to hold the mapping information for logical-to-physical address translation. Due to the limited size of DRAM, existing FTL schemes keep only selected active mapping entries in a Cached Mapping Table (CMT) in DRAM, while storing the entire mapping table on flash. To improve the CMT hit ratio within the limited cache space on SSDs, we propose in this paper a novel locality-aware FTL (HCFTL) that clusters mapping entries recently evicted from the cache into dynamic translation pages (DTPs). Given the temporal locality that makes those hot entries likely to be visited in the near future, loading DTPs increases the CMT hit ratio and thus improves FTL performance. Furthermore, we introduce an index structure to speed up the lookup of mapping entries in DTPs. Our experiments show that HCFTL can improve the CMT hit ratio by up to 41.1% and decrease the system response time by up to 33.3%, compared to state-of-the-art FTL schemes.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-4MODEL CHECKING IS POSSIBLE TO VERIFY LARGE-SCALE VEHICLE DISTRIBUTED APPLICATION SYSTEMS
Speaker:
Haitao Zhang, School of Information Science and Engineering, Lanzhou University, CN
Authors:
Haitao Zhang1, Ayang Tuo1 and Guoqiang Li2
1Lanzhou University, CN; 2Shanghai Jiao Tong University, CN
Abstract
OSEK/VDX is a specification for vehicle-mounted systems, and it has been widely adopted by many automotive companies to develop distributed vehicle application systems. However, the ever-increasing complexity of these distributed application systems has made exhaustively ensuring their reliability a challenge. Model checking, as an exhaustive technique, has been applied to verify OSEK/VDX distributed application systems and discover subtle errors. Unfortunately, it scales poorly to practical systems because the verification models derived from such systems are highly complex. This paper presents an efficient approach that addresses this problem by reducing the complexity of the verification model so that model checking can easily complete the verification.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-5AUTOMATIC ASSERTION GENERATION FROM NATURAL LANGUAGE SPECIFICATIONS USING SUBTREE ANALYSIS
Speaker:
Ian Harris, University of California, Irvine, US
Authors:
Junchen Zhao and Ian Harris, University of California Irvine, US
Abstract
We present an approach to generate assertions from natural language specifications by performing semantic analysis of sentences in the specification document. Other techniques for automatic assertion generation use information found in the design implementation, either by performing static or dynamic analysis. Our approach generates assertions directly from the specification document, so bugs in the implementation will not be reflected in the assertions. Our approach parses each sentence and examines the resulting syntactic parse trees to locate subtrees which are associated with important phrases, such as the antecedent and consequent of an implication. Formal assertions are generated using the information inside these subtrees to fill a set of assertion templates which we present. We evaluate the effectiveness of our approach using a set of statements taken from a real specification document.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-6DETECTION OF HARDWARE TROJANS IN SYSTEMC HLS DESIGNS VIA COVERAGE-GUIDED FUZZING
Speaker:
Niklas Bruns, Cyber-Physical Systems, DFKI GmbH, DE
Authors:
Hoang M. Le, Daniel Grosse, Niklas Bruns and Rolf Drechsler, University of Bremen, DE
Abstract
High-level Synthesis (HLS) is being increasingly adopted as a means to raise design productivity. HLS designs, which can be automatically translated into RTL, are typically written in SystemC at a more abstract level. Hardware Trojan attacks and countermeasures, while well-known and well-researched for RTL and below, have been only recently considered for HLS. The paper makes a contribution to this emerging research area by proposing a novel detection approach for Hardware Trojans in SystemC HLS designs. The proposed approach is based on coverage-guided fuzzing, a new promising idea from software (security) testing research. The efficiency of the approach in identifying stealthy behavior is demonstrated on a set of open-source benchmarks.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-7DESIGN OPTIMIZATION FOR HARDWARE-BASED MESSAGE FILTERS IN BROADCAST BUSES
Speaker:
Lea Schönberger, TU Dortmund University, DE
Authors:
Lea Schönberger, Georg von der Brüggen, Horst Schirmeier and Jian-Jia Chen, Technical University of Dortmund, DE
Abstract
In the field of automotive engineering, broadcast buses, e.g., Controller Area Network (CAN), are frequently used to connect multiple electronic control units (ECUs). Each message transmitted on such buses can be received by every participant, but not all messages are relevant for every ECU. Therefore, all incoming messages must be filtered for relevance by either hardware or software techniques. We address the issue of designing hardware filter configurations for clients connected to a broadcast bus in order to reduce the cost, i.e., the computation overhead, provoked by undesired but accepted messages. More precisely, we propose an SMT formulation that can be applied to i) retrieve a (minimal) perfect filter configuration, i.e., one in which no undesired messages are received, ii) optimize the filter quality under given hardware restrictions, or iii) minimize the hardware cost for a given type of filter component and a maximum cost threshold.
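The kind of mask/filter hardware being configured can be modeled in a few lines; the IDs and masks below are illustrative, not taken from the paper, but the ID-versus-mask match is the standard CAN acceptance-filter mechanism:

```python
def accepts(msg_id, filter_id, mask):
    # A hardware acceptance filter passes a message when its ID agrees
    # with the filter on every bit position selected by the mask.
    return (msg_id & mask) == (filter_id & mask)

# A configuration is "perfect" for an ECU when it accepts exactly the
# ECU's desired message IDs and nothing else (11-bit CAN ID space).
desired = {0x120, 0x121, 0x122, 0x123}
accepted = {i for i in range(0x800) if accepts(i, 0x120, 0x7FC)}
```

Here the mask 0x7FC ignores the two lowest ID bits, so a single filter covers the four consecutive desired IDs with no undesired acceptances.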

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-8VEHICLE SEQUENCE REORDERING WITH COOPERATIVE ADAPTIVE CRUISE CONTROL
Speaker:
Yun-Yun Tsai, National Tsing Hua University, TW
Authors:
Ta-Wei Huang1, Yun-Yun Tsai1, Chung-Wei Lin2 and Tsung-Yi Ho1
1National Tsing Hua University, TW; 2National Taiwan University, TW
Abstract
With Cooperative Adaptive Cruise Control (CACC) systems, vehicles are allowed to communicate and cooperate with each other to form platoons and improve the traffic throughput, traffic performance, and energy efficiency. In this paper, we take into account the braking factors of different vehicles so that there is a desired platoon sequence which minimizes the platoon length. We formulate the vehicle sequence reordering problem and propose an algorithm to reorder vehicles to their desired platoon sequence.
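One simple instance of the reordering objective can be sketched as follows; the "braking" scores and the single-key sort are illustrative simplifications, while the paper's model and algorithm account for more factors:

```python
def reorder_platoon(vehicles):
    # Weakest-braking vehicle first: every follower then brakes at least
    # as well as its leader, so the required safety gaps (and thus the
    # platoon length) can be kept minimal.
    return sorted(vehicles, key=lambda v: v["braking"])

platoon = reorder_platoon([
    {"id": "truck", "braking": 0.4},   # longest braking distance
    {"id": "sedan", "braking": 0.9},
    {"id": "suv",   "braking": 0.6},
])
```

The intuition: a follower that brakes worse than its leader needs a large gap to stop safely, so ordering vehicles by increasing braking capability avoids such pairings.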

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-9USING STATISTICAL MODEL CHECKING TO ASSESS RELIABILITY FOR BATHTUB-SHAPED FAILURE RATES
Speaker and Author:
Josef Strnadel, Brno University of Technology, CZ
Abstract
Ideally, reliability can be assessed analytically, provided that an analytical solution exists and its assumptions are met. Otherwise, alternative assessment approaches must be applied. This paper proposes a novel, simulation-based approach that relies on stochastic timed automata. Based on these automata, the paper explains the principles of creating reliability models for various scenarios. In our approach, a reliability model is then processed by a statistical model checking method, which assesses reliability by statistically processing simulation results over the model. The main goal of this paper is to show that stochastic timed automata and statistical model checking can facilitate the assessment process even under adverse conditions such as bathtub-shaped failure rates.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-10EMPIRICAL EVALUATION OF IC3-BASED MODEL CHECKING TECHNIQUES ON VERILOG RTL DESIGNS
Speaker:
Aman Goel, University of Michigan, US
Authors:
Aman Goel and Karem Sakallah, University of Michigan, US
Abstract
IC3-based algorithms have emerged as effective scalable approaches for hardware model checking. In this paper we evaluate six implementations of IC3-based model checkers on a diverse set of publicly-available and proprietary industrial Verilog RTL designs. Four of the six verifiers we examined operate at the bit level and two employ abstraction to take advantage of word-level RTL semantics. Overall, the word-level verifier employing data abstraction outperformed the others, especially on the large industrial designs. The analysis helped us identify several key insights on the techniques underlying these tools, their strengths and weaknesses, differences and commonalities, and opportunities for improvement.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-11CO-DESIGN IMPLICATIONS OF ON-DEMAND-ACCELERATION FOR CLOUD HEALTHCARE ANALYTICS: THE AEGLE APPROACH
Speaker:
Konstantina Koliogeorgi, National Technical University of Athens, GR
Authors:
Dimosthenis Masouros1, Konstantina Koliogeorgi1, Georgios Zervakis1, Alexandra Kosvyra2, Achilleas Chytas2, Sotirios Xydis1, Ioanna Chouvarda2 and Dimitrios Soudris3
1National Technical University of Athens, GR; 2Aristotle University of Thessaloniki, GR; 3Democritus University of Thrace, GR
Abstract
Nowadays, big data and machine learning are transforming the way we understand and manage our data. Even though the healthcare domain has recognized big data analytics as a prominent candidate, it has not yet fully grasped the promising benefits that allow medical information to be converted into useful knowledge. In this paper, we introduce AEGLE's big data infrastructure, provided as a Platform as a Service. Using the suite of genomic analytics from the Chronic Lymphocytic Leukaemia (CLL) use case, we show that on-demand acceleration is profitable w.r.t. a pure-software cloud-based solution. However, we further show that on-demand acceleration is not a "free lunch", and we provide an in-depth analysis and lessons learnt on the co-design implications to be carefully considered for enabling cost-effective acceleration at the cloud level.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-12MODULAR FPGA ACCELERATION OF DATA ANALYTICS IN HETEROGENOUS COMPUTING
Speaker:
Christoforos Kachris, ICCS-NTUA, GR
Authors:
Christoforos Kachris, Dimitrios Soudris and Elias Koromilas, Democritus University of Thrace, GR
Abstract
Emerging cloud applications like machine learning, AI and big data analytics require high performance computing systems that can sustain the increased amount of data processing without consuming excessive power. Towards this end, many cloud operators have started deploying hardware accelerators, like FPGAs, to increase the performance of computationally intensive tasks, at the cost of increased programming complexity to utilize these accelerators. VINEYARD has developed an efficient framework that allows the seamless deployment and utilization of hardware accelerators in the cloud without increasing programming complexity, while offering the flexibility of software packages. This paper presents a modular approach to the acceleration of data analytics using FPGAs. The modular approach allows the automatic development of integrated hardware designs for the acceleration of data analytics. The proposed framework shows that the data analytics modules can achieve up to 10x speedup compared to high-performance general-purpose processors.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-13ACDC: AN ACCURACY- AND CONGESTION-AWARE DYNAMIC TRAFFIC CONTROL METHOD FOR NETWORKS-ON-CHIP
Speaker:
Siyuan Xiao, South China University of Technology, CN
Authors:
Siyuan Xiao1, Xiaohang Wang1, Maurizio Palesi2, Amit Kumar Singh3 and Terrence Mak4
1South China University of Technology, CN; 2University of Catania, IT; 3University of Essex, GB; 4University of Southampton, GB
Abstract
Many applications exhibit error-forgiving features. For these applications, approximate computing provides the opportunity to accelerate execution or reduce power consumption by reducing computation effort to obtain an approximate result. Among the components on a chip, the network-on-chip (NoC) contributes a large portion of system power and performance. In this paper, we exploit the opportunity to aggressively reduce network congestion and latency by selectively dropping data. The importance of the dropped data is measured based on a quality model. An optimization problem is formulated to minimize network congestion under a constraint on result quality, and a lightweight online algorithm is proposed to solve it. Experiments show that, on average, our method reduces execution time by 12.87% and energy consumption by 12.42% under a strict quality requirement, and speeds up execution by 19.59% and reduces energy consumption by 21.20% under a relaxed requirement, compared to a recent approximate computing approach for NoCs.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-14POWER AND PERFORMANCE OPTIMAL NOC DESIGN FOR CPU-GPU ARCHITECTURE USING FORMAL MODELS
Speaker:
Nader Bagherzadeh, University of California Irvine, US
Authors:
Lulwah Alhubail and Nader Bagherzadeh, University of California - Irvine, US
Abstract
Heterogeneous computing architectures that fuse both CPU and GPU on the same chip are common nowadays. Using a homogeneous interconnect for such heterogeneous processors, each with different network demands, can result in performance degradation. In this paper, we focus on designing a heterogeneous mesh-style network-on-chip (NoC) to connect heterogeneous CPU-GPU processors. We tackle three problems at once: mapping processing elements (PEs) to the routers of the mesh, assigning the number of virtual channels (VCs), and assigning the buffer size (BS) for each port of each router in the NoC. Relying on formal models, we developed a method based on the Strength Pareto Evolutionary Algorithm 2 (SPEA2) to obtain the Pareto-optimal set that optimizes the communication performance and power consumption of the NoC. Validating our method on a full-system simulator, results show that NoC performance can be improved by 17% while reducing power consumption by at least 2.3x and maintaining overall system performance.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-15DEEP LEARNING-BASED CIRCUIT RECOGNITION USING SPARSE MAPPING AND LEVEL-DEPENDENT DECAYING SUM CIRCUIT REPRESENTATION
Speaker:
Massoud Pedram, University of Southern California, US
Authors:
Arash Fayyazi1, Soheil Shababi2, Pierluigi Nuzzo2, Shahin Nazarian2 and Massoud Pedram1
1University of Southern California, US; 2University of Southern California, US
Abstract
Efficiently recognizing the functionality of a circuit is key to many applications, such as formal verification, reverse engineering, and security. We present a scalable framework for gate-level circuit recognition that leverages deep learning and a convolutional neural network (CNN)-based circuit representation. Given a standard cell library, we present a sparse mapping algorithm to improve the time and memory efficiency of the CNN-based circuit representation. Sparse mapping allows encoding only the logic cell functionality, independently of implementation parameters such as timing or area. We further propose a data structure, termed level-dependent decaying sum (LDDS) existence vector, which can compactly represent information about the circuit topology. Given a reference gate in the circuit, an LDDS vector can capture the function of the gates in the input and output cones as well as their distance (number of stages) from the reference. Compared to the baseline approach, our framework obtains more than an order-of-magnitude reduction in the average training time and a 2× improvement in the average runtime for generating CNN-based representations from gate-level circuits, while achieving 10% higher accuracy on a set of benchmarks including EPFL and ISCAS'85 circuits.
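A minimal Python sketch of the level-dependent decaying sum idea, under the simplifying assumptions that we walk only the output cone of the reference gate and weight each gate type by decay^distance; the paper's exact encoding (covering both cones) may differ.

```python
from collections import deque

def ldds_vector(circuit, ref, gate_types, decay=0.5):
    """Toy LDDS: for each gate type, accumulate decay**distance over the
    gates reachable from the reference gate.
    `circuit` maps gate name -> (gate type, list of fan-out gate names)."""
    vec = {t: 0.0 for t in gate_types}
    seen = {ref}
    queue = deque([(ref, 0)])  # BFS gives distance in stages
    while queue:
        gate, dist = queue.popleft()
        gtype, fanout = circuit[gate]
        vec[gtype] += decay ** dist  # closer gates contribute more
        for nxt in fanout:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return vec
```

The decay makes nearby gates dominate the signature, so two circuits with the same local structure around the reference produce similar vectors.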

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-16PARTIAL ENCRYPTION OF BEHAVIORAL IPS TO SELECTIVELY CONTROL THE DESIGN SPACE IN HIGH-LEVEL SYNTHESIS
Speaker:
Farah Taher, The University of Texas at Dallas, US
Authors:
Zi Wang and Benjamin Carrion Schaefer, The University of Texas at Dallas, US
Abstract
Commercial High-Level Synthesis (HLS) tool vendors have started to enable ways to protect Behavioral IPs (BIPs) from being unlawfully used. The main approach is to provide tools that encrypt these BIPs such that they can be decrypted by the HLS tool only. The main problem with this approach is that encrypting the IP does not allow BIP users to insert synthesis directives into the source code in the form of pragmas (comments), and hence cancels out one of the most important advantages of C-based VLSI design: the ability to automatically generate micro-architectures with unique design metrics, e.g., area, power and performance. This work studies the effect on the search space when synthesis directives cannot be inserted into the encrypted IP source code while other options remain available to the BIP users (e.g., setting global synthesis options and limiting the number and type of functional units), and proposes a method that selectively controls the search space by encrypting different portions of the BIP. To achieve this goal, we propose a fast heuristic based on a divide-and-conquer method. Experimental results show that our proposed method compares well against an exhaustive search that leads to the optimal solution.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-17SOFTWARE-HARDWARE CO-DESIGN OF MULTI-STANDARD DIGITAL BASEBAND PROCESSOR FOR IOT
Speaker:
Carolynn Bernier, CEA-Leti, FR
Authors:
Hela Belhadj Amor and Carolynn Bernier, CEA, LETI, FR
Abstract
This work demonstrates an ultra-low power, software-defined wireless transceiver designed for IoT applications using an open-source 32-bit RISC-V core. The key driver behind this success is an optimized hardware/software partitioning of the receiver's digital signal processing operators. We benchmarked our architecture on an algorithm for the detection of FSK-modulated frames using a RISC-V compatible core and ARM Cortex-M series processors. We use only standard compilation tools and no assembly-level optimizations. Our results show that Bluetooth LE frames can be detected with an estimated peak core power consumption of 1.6 mW on a 28 nm FDSOI technology, and falling to less than 0.6 mW (on average) during symbol demodulation. This is achieved at nominal voltage. Compared to state of the art, our work offers a power efficient alternative to the design of dedicated baseband processors for ultra-low power software-defined radios with a low software complexity.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-18TAMING DATA CACHES FOR PREDICTABLE EXECUTION ON GPU-BASED SOCS
Speaker:
Björn Forsberg, ETH Zürich, CH
Authors:
Björn Forsberg1, Luca Benini2 and Andrea Marongiu2
1ETH Zürich, CH; 2Università di Bologna, IT
Abstract
Heterogeneous SoCs (HeSoCs) typically share a single DRAM between the CPU and GPU, making workloads susceptible to memory interference and predictable execution troublesome. State-of-the-art predictable execution models (PREM) for HeSoCs prefetch data to the GPU scratchpad memory (SPM) so that computations are insensitive to CPU-generated DRAM traffic. However, the amount of work that the small SPM sizes allow is typically insufficient to absorb CPU/GPU synchronization costs. On-chip caches are larger and would solve this issue, but have been argued to be too unpredictable due to self-evictions. We show how self-eviction can be minimized in GPU caches via careful management of prefetches, thus lowering the performance cost while retaining timing predictability.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-19DESIGN AND EVALUATION OF SMALLFLOAT SIMD EXTENSIONS TO THE RISC-V ISA
Speaker:
Giuseppe Tagliavini, Università di Bologna, IT
Authors:
Giuseppe Tagliavini1, Stefan Mach2, Davide Rossi1, Andrea Marongiu1 and Luca Benini1
1Università di Bologna, IT; 2ETH Zurich, CH
Abstract
RISC-V is an open-source instruction set architecture (ISA) with a modular design consisting of a mandatory base part plus optional extensions. The RISC-V 32IMFC ISA configuration has been widely adopted for the design of new-generation, low-power processors. Motivated by the important energy savings that smaller-than-32-bit FP types have enabled in several application domains and related compute platforms, some recent studies have published encouraging early results for their adoption in RISC-V processors. In this paper we introduce a set of ISA extensions for RISC-V 32IMFC, supporting scalar and SIMD operations (fitting the 32-bit register size) for 8-bit and two 16-bit FP types. The proposed extensions are enabled by exposing the new FP types to the standard C/C++ type system, and an implementation for the RISC-V GCC compiler is presented. As a further, novel contribution, we extensively characterize the performance and energy savings achievable with the proposed extensions. On average, experimental results show that their adoption provides benefits in terms of performance (1.64x speedup for 16-bit and 2.18x for 8-bit types) and energy consumption (30% saving for 16-bit and 50% for 8-bit types). We also illustrate an approach based on automatic precision tuning to make effective use of the new FP types.
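The precision loss of a smaller-than-32-bit FP type can be previewed in plain Python using the standard library's IEEE binary16 codec; this only models the rounding behavior of a 16-bit type, not the proposed ISA extensions or their 8-bit format.

```python
import struct

def to_fp16(x):
    """Round a Python float to IEEE binary16 and back, sketching the
    precision loss a 16-bit FP type incurs relative to wider formats."""
    # struct's '<e' format packs/unpacks IEEE-754 half precision.
    return struct.unpack("<e", struct.pack("<e", x))[0]
```

Values exactly representable in binary16 (like 1.0) survive unchanged, while others pick up a relative error of roughly 2^-11 — the kind of loss an automatic precision tuner must keep within application tolerance.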

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-20VDARM: DYNAMIC ADAPTIVE RESOURCE MANAGEMENT FOR VIRTUALIZED MULTIPROCESSOR SYSTEMS
Speaker:
Jianmin Qian, Shanghai Jiao Tong University, CN
Authors:
Jianmin Qian, Jian Li, Ruhui Ma and Haibing Guan, Shanghai Jiao Tong University, CN
Abstract
Modern data center servers have been enhancing their computing capacity by increasing processor counts. Meanwhile, these servers are highly virtualized to achieve efficient resource utilization and energy savings. However, due to the shift of server architectures to non-uniform memory access (NUMA), current hypervisor-level or OS-level resource management methods continue to be challenged in their ability to meet the performance requirements of various user applications. In this work, we first build a performance slowdown model to accurately identify the current system overheads. Based on the model, we then design a dynamic adaptive virtual resource management method (vDARM) to eliminate runtime NUMA overheads by re-configuring virtual-to-physical resource mappings. Experimental results show that, compared with state-of-the-art approaches, vDARM brings an average performance improvement of 36.2% on an 8-node NUMA machine. Meanwhile, vDARM incurs no more than 4% extra CPU utilization.

Download Paper (PDF; Only available from the DATE venue WiFi)

6.1 Special Day on "Embedded Meets Hyperscale and HPC" Session: Near-memory computing

Date: Wednesday, March 27, 2019
Time: 11:00 - 12:30
Location / Room: Room 1

Chair:
Christoph Hagleitner, IBM Research, CH, Contact Christoph Hagleitner

Co-Chair:
Christian Plessl, Paderborn University, DE, Contact Christian Plessl

While it used to be easy to increase the peak computational capabilities of processors by exploiting the growth in available transistors delivered by Moore's law, the latency and bandwidth of the memory system did not improve at the same pace. Today's microprocessors hide this fact behind a complex memory hierarchy, but often fail to optimally utilize the available memory bandwidth across a broad range of applications. Near-memory computing takes a fresh look at the memory system and proposes innovations ranging from the micro-architecture to the runtime system to address these bottlenecks and build more balanced computing systems.

TimeLabelPresentation Title
Authors
11:006.1.1NTX: AN ENERGY-EFFICIENT STREAMING ACCELERATOR FOR FLOATING-POINT GENERALIZED REDUCTION WORKLOADS IN 22NM FD-SOI
Speaker:
Luca Benini, IIS, ETH Zürich, CH
Authors:
Fabian Schuiki, Michael Schaffner and Luca Benini, IIS, ETH Zürich, CH
Abstract
Specialized coprocessors for Multiply-Accumulate (MAC) intensive workloads such as Deep Learning are becoming widespread in SoC platforms, from GPUs to mobile SoCs. In this paper we revisit NTX (an efficient accelerator developed for training Deep Neural Networks at scale) as a generalized MAC and reduction streaming engine. The architecture consists of a set of 32 bit floating-point streaming co-processors that are loosely coupled to a RISC-V core in charge of orchestrating data movement and computation. Post-layout results of a recent silicon implementation in 22nm FD-SOI technology show the accelerator's capability to deliver up to 20Gflop/s at 1.25GHz and 168mW. Based on these results we show that a version of NTX scaled down to 14nm can achieve a 3× energy efficiency improvement over contemporary GPUs at 10.4× less silicon area, and a compute performance of 1.4Tflop/s for training large state-of-the-art networks with full floating-point precision. An extended evaluation of MAC-intensive kernels shows that NTX can consistently achieve up to 87% of its peak performance across general reduction workloads beyond machine learning. Its modular architecture enables deployment at different scales ranging from high-performance GPU-class to low-power embedded scenarios.
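The generalized MAC-and-reduce pattern that NTX streams can be stated in a few lines of Python — a functional sketch of the workload class, not of the accelerator or its RISC-V control core.

```python
def mac_reduce(a, b, acc=0.0):
    """Generalized multiply-accumulate reduction: acc += a[i] * b[i].
    This single loop covers dot products, the inner loops of matrix
    multiplication, and convolution — the workloads a streaming MAC
    engine like NTX targets."""
    for x, y in zip(a, b):
        acc += x * y
    return acc
```

In hardware, the two operand streams are fetched by address generators while the FPU sustains one MAC per cycle, which is why such kernels can reach a large fraction of peak throughput.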

Download Paper (PDF; Only available from the DATE venue WiFi)
11:306.1.2NEAR-MEMORY PROCESSING: IT'S THE HARDWARE AND SOFTWARE, SILLY!
Speaker and Author:
Boris Grot, University of Edinburgh, GB
Abstract
Conventional computing systems are increasingly challenged by the need to process rapidly growing volumes of data, often at online speeds. One promising way to boost compute efficiency is through Near-Memory Processing (NMP), which integrates light-weight compute logic close to the memory arrays. NMP affords massive bandwidth to the memory-resident data and dramatically reduces energy-hungry data movement.  A key challenge for effectively leveraging NMP is that today's high-performance data processing algorithms have been designed for CPUs with powerful cores, large caches, and bandwidth-constrained memory interfaces. Meanwhile, NMP architectures are limited to simple logic and small caches while offering abundant memory bandwidth. Hence, achieving high efficiency with NMP requires a careful algorithm-hardware co-design to maximize bandwidth utilization given a highly constrained area and power budget. I will describe one instance of such a co-designed NMP architecture for data analytics, and show that it reaps significant performance and energy-efficiency advantages over both CPU-based and baseline NMP systems.  
12:006.1.3COHERENTLY ATTACHED PROGRAMMABLE NEAR-MEMORY ACCELERATION PLATFORM AND ITS APPLICATION TO STENCIL PROCESSING
Speaker:
Jan van Lunteren, IBM Research Zurich, CH
Authors:
Jan van Lunteren, Ronald Luijten, Dionysios Diamantopoulos, Florian Auernhammer, Christoph Hagleitner, Lorenzo Chelini, Stefano Corda and Gagandeep Singh, IBM Research Zurich, CH
Abstract
Application and technology trends are increasingly forcing computer systems to be designed for specific workloads and application domains. Although memory is one of the key components impacting the performance and power consumption of state-of-the-art computer systems, its operation typically cannot be adapted to workload characteristics beyond some limited controller configuration options. In this paper, we present a novel near-memory acceleration platform based on an Access Processor that enables the main memory system operation to be programmed and adapted dynamically to the accelerated workload. The platform targets both ASIC and FPGA implementations integrated within IBM POWER systems. We show how this platform can be applied to accelerate stencil processing.
12:30End of session
Lunch Break in Lunch Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Breaks (Lunch Area)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the Lunch Area to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 26, 2019

Wednesday, March 27, 2019

Thursday, March 28, 2019


6.2 Special Session: 3D Sensor - Hardware to Application

Date: Wednesday, March 27, 2019
Time: 11:00 - 12:30
Location / Room: Room 2

Organisers:
Pascal Vivet, CEA-Leti, FR, Contact Pascal Vivet
Saibal Mukhopadhyay, Georgia Institute of Technology, US, Contact Saibal Mukhopadhyay

Chair:
Fabien Clermidy, CEA-Leti, FR, Contact Fabien Clermidy

Co-Chair:
Pascal Vivet, CEA-Leti, FR, Contact Pascal Vivet

3D integration has emerged as a key enabler to continue the performance growth of Moore's law. An application where 3D has already shown potential for tremendous benefit is the design of high-throughput and/or energy-efficient sensors. The ability to stack heterogeneous components in a small volume, coupled with the potential for highly parallel access between sensing and processing, has fueled a new generation of sensor platforms. Moreover, the close proximity of processing and sensing has also led to innovations in designing smart systems with built-in intelligence. This session will present four talks illustrating how 3D integration creates a platform for designing innovative sensors for applications ranging from high-performance imaging to ultra-low-power IoT platforms to bio-sensing. The first two talks will focus on the application of 3D integration to high-performance and smart imaging. The first will present a detailed overview of recent advancements in 3D image sensor design, while the second talk will discuss the feasibility of embedding machine-learning-based feedback control within a 3D image sensor to create highly intelligent cameras. The third talk will present the concept of mm-scale sensors through 3D die stacking for ultra-low-power applications. Finally, the fourth talk will discuss the design of innovative biosensors using fine-grain 3D integration.

TimeLabelPresentation Title
Authors
11:006.2.1ADVANCED 3D TECHNOLOGIES AND ARCHITECTURES FOR 3D SMART IMAGE SENSORS
Speaker:
Pascal Vivet, CEA-Leti, FR
Authors:
Pascal Vivet1, Gilles Sicard1, Laurent Millet1, Stephane Chevobbe2, Karim Ben Chehida2, Luis Angel Cubero MonteAlegre1, Maxence Bouvier1, Alexandre Valentian1, Maria Lepecq2, Thomas Dombek2, Olivier Bichler2, Sebastien Thuriès1, Didier Lattard1, Cheramy Séverine1, Perrine Batude1 and Fabien Clermidy1
1CEA-Leti, FR; 2CEA-LIST, FR
Abstract
Image sensors will become more and more pervasive in their environment. In the context of Automotive and IoT, low-cost image sensors with high-quality pixels will embed more and more smart functions, such as regular low-level image processing but also object recognition, movement detection, light detection, etc. 3D technology is a key enabling technology to integrate into a single device the pixel layer and the associated acquisition layer, but also the smart computing features and the amount of memory required to process all the acquired data. More computing and memory within 3D Smart Image Sensors will bring new features and reduce overall system power consumption. Advanced 3D technology with ultra-fine-pitch vertical interconnect density will pave the way towards new architectures for 3D Smart Image Sensors, allowing local vertical communication between pixels and the associated computing and memory structures. The presentation will give an overview of recent 3D technology solutions, such as Hybrid Bonding technology and the Monolithic 3D CoolCube™ technology, with respective 3D interconnect pitches in the order of 1µm and 100nm. Recent 3D Image Sensors will be presented, showing the capability of 3D technology to implement fine-grain pixel acquisition and processing, providing ultra-high-speed image acquisition and tile-based processing. Finally, as a further perspective, a multi-layer 3D image sensor architecture based on events and spiking will further reduce power consumption with new detection and learning processing capabilities.

Download Paper (PDF; Only available from the DATE venue WiFi)
11:226.2.2A CAMERA WITH BRAIN - EMBEDDING MACHINE LEARNING IN 3D
Speaker:
Saibal Mukhopadhyay, Georgia Institute of Technology, US
Authors:
Burhan Ahmad Mudassar, Priyabrata Saha, Yun Long, Muhammad Faisal Amir, Evan Gebhardt, Taesik Na, Jong Hwan Ko, Marilyn Wolf and Saibal Mukhopadhyay, Georgia Institute of Technology, US
Abstract
Cameras today are designed to capture signals with the highest possible accuracy to most faithfully represent what they see. However, many mission-critical autonomous applications, ranging from traffic monitoring to disaster recovery to defense, require quality of information, where 'useful information' depends on the task and is defined using complex features rather than only changes in the captured signal. Such applications require cameras that capture 'useful information' from a scene with the highest quality while meeting system constraints such as power, performance, and bandwidth. This talk will discuss the feasibility of a camera that learns how to capture 'task-dependent information' with the highest quality, paving the pathway to the design of a camera with a brain. The talk will first discuss how 3D integration of digital pixel sensors with a massively parallel computing platform for machine learning creates a hardware architecture for such a camera. Next, the talk will discuss embedded machine learning algorithms that can run on such a platform to enhance the quality of useful information by real-time control of the sensor parameters. The talk will conclude by identifying critical challenges as well as opportunities for hardware and algorithmic innovations to enable machine learning in the feedback loop of a 3D image sensor based camera.

Download Paper (PDF; Only available from the DATE venue WiFi)
11:446.2.3IOT2 - THE INTERNET OF TINY THINGS: REALIZING MM-SCALE SENSORS THROUGH 3D DIE STACKING
Speaker:
Sechang Oh, University of Michigan, US
Authors:
Sechang Oh, Minchang Cho, Xiao Wu, Yejoong Kim, Li-Xuan Chuo, Wootaek Lim, Pat Pannuto, Suyoung Bang, Kaiyuan Yang, Hun-Seok Kim, Dennis Sylvester and David Blaauw, University of Michigan, US
Abstract
The Internet of Things (IoT) is a rapidly evolving application space. One of the fascinating new fields in IoT research is mm-scale sensors, which make up the Internet of Tiny Things (IoT2). With their miniature size, these systems are poised to open up a myriad of new application domains. Enabled by the unique characteristics of cyber-physical systems and recent advances in low-power design and bare-die 3D chip stacking, mm-scale sensors are rapidly becoming a reality. In this paper, we will survey the challenges and solutions to 3D-stacked mm-scale design, highlighting low-power circuit issues ranging from low-power SRAM and miniature neural network accelerators to radio communication protocols and analog interfaces. We will discuss system-level challenges and illustrate several complete systems and their emerging application spaces.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:066.2.43D INTERCONNECTS AND INTEGRATION TECHNOLOGIES FOR BIOSENSOR SYSTEMS
Speaker and Author:
Muhannad Bakir, Georgia Institute of Technology, US
Abstract
We present a system for recording in vivo electromyography (EMG) signals from songbirds using flexible multi-electrode arrays (MEAs) featuring 3D integrated electronics. Electrodes with various pitches and topologies are evaluated by measuring EMG activity from the expiratory muscle of anesthetized songbirds. Air pressure data is also recorded simultaneously from the air sac of the songbird. Together, EMG recordings and air pressure measurements can be used to characterize how the nervous system controls breathing. Such technologies can in turn provide unique insights into motor control in a range of species, including humans. 3D IC integration enables the form factors and interconnect densities needed in such applications. Focus will be given to the technology and microfabrication advances that enable such systems. We also discuss methods to use fine-grain 3D IC technology for electronic microplate technologies for CMOS biosensor systems with cell assays.
12:30End of session
Lunch Break in Lunch Area




6.3 When Approximation Meets Dependability

Date: Wednesday, March 27, 2019
Time: 11:00 - 12:30
Location / Room: Room 3

Chair:
George Constantinides, Imperial College London, GB, Contact George Constantinides

Co-Chair:
Rishad Shafik, Newcastle University, GB, Contact Rishad Shafik

Approximation and dependability are conflicting design requirements. Meeting performance, dependability and/or power trade-offs requires approaches with insightful analysis and design methodologies. This session presents approximation-driven paradigms for designing arithmetic units and developing fault detection schemes using machine learning.

TimeLabelPresentation Title
Authors
11:006.3.1SENSOR-BASED APPROXIMATE ADDER DESIGN FOR ACCELERATING ERROR-TOLERANT AND DEEP-LEARNING APPLICATIONS
Speaker:
Ning-Chi Huang, National Chiao Tung University, TW
Authors:
Ning-Chi Huang, Szu-Ying Chen and Kai-Chiang Wu, Department of Computer Science, National Chiao Tung University, TW
Abstract
Approximate computing is an emerging strategy which trades computational accuracy for computational cost in terms of performance, energy, and/or area. In this paper, we propose a novel sensor-based approximate adder for high-performance energy-efficient arithmetic computation, while considering the accuracy requirement of error-tolerant applications. This is the first work using in-situ sensors for approximate adder design, based on monitoring online transition activity on the carry chain and speculating on carry propagation/truncation. On top of a fully-optimized ripple-carry adder, the performance of our adder is enhanced by 2.17X. When applied in error-tolerant applications such as image processing and handwritten digit recognition, our approximate adder leads to very promising quality of results compared to the case when an accurate adder is used.

Download Paper (PDF; Only available from the DATE venue WiFi)
11:306.3.2LOW-POWER VARIATION-AWARE CORES BASED ON DYNAMIC DATA-DEPENDENT BITWIDTH TRUNCATION
Speaker:
Ioannis Tsiokanos, Queen's University Belfast, GB
Authors:
Ioannis Tsiokanos, Lev Mukhanov and Georgios Karakonstantis, Queen's University Belfast, GB
Abstract
Increasing variability of transistor parameters in the nanoscale era renders modern circuits prone to timing failures. To address such failures, designers adopt pessimistic timing/voltage guardbands, which are estimated under rare worst-case conditions, thus leading to power and performance overheads. Recent approximation schemes based on precision reduction may help to limit the incurred overheads, but the precision is reduced statically in all operations. This results in unnecessary quality loss, since these schemes neglect the fact that only a few long-latency paths (LLPs) may be prone to failures, and such paths may be activated rarely. In this paper, we propose a variation-aware framework that minimizes any quality loss by dynamically truncating the bitwidth only for operands triggering the LLPs. This is achieved by predicting at runtime the excitation of the LLPs based on the processed operands. The applied truncation, which we implement by setting a number of least-significant bits to a constant value of zero, can effectively reduce the delay of the excited LLPs, providing sufficient timing slack to avoid any failure without using conservative guardbands. To facilitate the adoption of such a scheme within pipelined cores and limit the incurred overheads, we also shape the path distribution appropriately to isolate the LLPs in a single pipeline stage. Additionally, to evaluate the efficacy of our framework, we perform post-layout dynamic timing analysis based on real operands that we extract from a variety of applications. When applied to the implementation of an IEEE-754-compatible double-precision floating-point unit (FPU) in a 45nm technology, our approach eliminates timing failures under 8% delay variations with no performance loss. Our design comes at a cost of up to 4.48% power and 0.34% area overheads, while the occasional operand truncation incurs minimal quality loss in terms of relative error, up to 4.1 · 10^−6. Finally, when compared to an FPU with pessimistic margins, our technique can save up to 44.3% power.
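The operand truncation described above (forcing least-significant bits to zero) can be sketched in Python on an IEEE-754 double; the bit count and trigger logic here are illustrative, not the paper's hardware predictor.

```python
import struct

def truncate_mantissa(x, drop_bits):
    """Zero the `drop_bits` least-significant mantissa bits of a float64,
    mimicking the truncation applied to operands predicted to excite an
    LLP. Shortening the effective carry/mantissa computation is what buys
    back timing slack in the excited path."""
    # Reinterpret the double as its 64-bit pattern, clear the low bits,
    # and reinterpret back.
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    bits &= ~((1 << drop_bits) - 1)
    (y,) = struct.unpack("<d", struct.pack("<Q", bits))
    return y
```

Zeroing, say, 20 of the 52 mantissa bits leaves a relative error around 2^-32 — consistent in spirit with the small quality loss the paper reports for occasional truncation.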

Download Paper (PDF; Only available from the DATE venue WiFi)
12:006.3.3A SMART FAULT DETECTION SCHEME FOR RELIABLE IMAGE PROCESSING APPLICATIONS
Speaker:
Luca Cassano, Politecnico di Milano, IT
Authors:
Matteo Biasielli, Cristiana Bolchini, Luca Cassano and Antonio Miele, Politecnico di Milano, IT
Abstract
Traditional fault detection/tolerance techniques exploit multiple instances of the nominal processing and then perform a bit-wise comparison of the outputs to detect the occurrence of faults. In specific application scenarios, e.g., image/signal processing, the elaboration has an inherent degree of fault tolerance because it is possible to use the output even in the presence of slight alterations. In these contexts, the classical bit-wise comparison may be inefficient. Indeed, it may lead to conservatively discard outputs that have been only slightly altered by the fault and that could still be usefully exploited. In this paper, we propose a smart checking scheme based on Convolutional Neural Networks that rather than distinguishing between faulty and not faulty images, discriminates between usable and not usable images according to the ability of the end user to correctly process the output. The experimental evaluation shows that this solution enables an execution time saving of about 6.35% with a 99.42% accuracy, on average.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30IP3-1, 662NON-INTRUSIVE SELF-TEST LIBRARY FOR AUTOMOTIVE CRITICAL APPLICATIONS: CONSTRAINTS AND SOLUTIONS
Speaker:
Davide Piumatti, Politecnico di Torino, IT
Authors:
Paolo Bernardi1, Riccardo Cantoro1, Andrea Floridia1, Davide Piumatti1, Cozmin Pogonea1, Annachiara Ruospo1, Ernesto Sanchez1, Sergio De Luca2 and Alessandro Sansonetti2
1Politecnico di Torino, IT; 2STMicroelectronics, IT
Abstract
Today, safety-critical applications require self-test and self-diagnosis approaches to be applied during the lifetime of the device. In general, the fault coverage values required by the standards (like ISO 26262) for the whole System-on-Chip (SoC) are very high. Therefore, different strategies are adopted. In the case of the processor core, the required fault coverage can be achieved by scheduling the periodic execution of a set of test programs, or Software-Test Library (STL). However, the STL for in-field testing should be able to comply with the operating system specifications without affecting the mission operation of the device application. In this paper, the most relevant problems for the development of the STL are first discussed. Then, we present a set of strategies and solutions oriented to produce an efficient and non-intrusive STL to be used exclusively during the in-field testing of automotive processor cores. The proposed approach was evaluated on an automotive SoC developed by STMicroelectronics.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30End of session
Lunch Break in Lunch Area




6.4 Hardware support for microarchitecture performance

Date: Wednesday, March 27, 2019
Time: 11:00 - 12:30
Location / Room: Room 4

Chair:
Cristina Silvano, Politecnico di Milano, IT, Contact Cristina Silvano

Co-Chair:
Sylvain Collange, INRIA/IRISA, FR, Contact Sylvain Collange

This session deals with hardware mechanisms for high-performance or embedded real-time processors to improve their efficiency or performance beyond what software alone can achieve. The first paper proposes low-overhead hardware support that tracks and enforces limits on multicore contention. The second paper reduces the cost of instruction scheduling in aggressive OoO processors. The third paper is about dynamic analysis of the instruction flow and generating vectorized code at runtime.

TimeLabelPresentation Title
Authors
11:006.4.1MAXIMUM-CONTENTION CONTROL UNIT (MCCU): RESOURCE ACCESS COUNT AND CONTENTION TIME ENFORCEMENT
Speaker:
Jordi Cardona, Univ. Politècnica de Barcelona and Barcelona Supercomputing Center, ES
Authors:
Jordi Cardona1, Carles Hernandez2, Jaume Abella2 and Francisco Cazorla2
1Barcelona Supercomputing Center and Universitat Politecnica de Catalunya, ES; 2Barcelona Supercomputing Center, ES
Abstract
In real-time systems, techniques to derive bounds on the contention that tasks can suffer in multicores build on resource quota monitoring and enforcement. In particular, they track and bound the number of requests to hardware shared resources that each core (task) is allowed to perform. In this paper, we show that current software-only solutions work well when there is a single resource and type of request to track and bound, but do not scale to the more general case of several shared resources that accept different request types, each with a different associated latency. To handle this (more general) case, we propose low-overhead hardware support called the Maximum-Contention Control Unit (MCCU). The MCCU performs fine-grain tracking of different types of requests, preventing a core from causing more interference on its contenders than budgeted. In this process, the MCCU also helps verify that the duration of individual requests does not exceed their theoretical bounds, hence dealing with scenarios in which requests can have an arbitrarily large duration.
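A toy software model of the MCCU's per-type quota tracking, with made-up core names, request types and budgets, may help fix the idea; the real unit is hardware and additionally bounds per-request duration.

```python
class MCCU:
    """Toy model of per-core, per-request-type quota enforcement: each
    core gets a budget mapping request type -> allowed count, and an
    access beyond budget is flagged (in hardware this would raise an
    interrupt or stall the offending core)."""

    def __init__(self, budgets):
        self.budgets = budgets  # core -> {request type: quota}
        self.counts = {c: {t: 0 for t in b} for c, b in budgets.items()}

    def access(self, core, req_type):
        """Record one request; return True while within budget."""
        self.counts[core][req_type] += 1
        return self.counts[core][req_type] <= self.budgets[core][req_type]
```

Tracking each request type separately is the point: a single aggregate counter cannot distinguish cheap requests from ones with long latencies, which is exactly the scaling problem of the software-only schemes.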

Download Paper (PDF; Only available from the DATE venue WiFi)
11:30 | 6.4.2 | FIFORDER MICROARCHITECTURE: READY-AWARE INSTRUCTION SCHEDULING FOR OOO PROCESSORS
Speaker:
Mehdi Alipour, Uppsala University, SE
Authors:
Mehdi Alipour1, Rakesh Kumar2, Stefanos Kaxiras1 and David Black-Schaffer1
1Uppsala University, SE; 2Norwegian University of Science and Technology, NO
Abstract
The number of instructions a processor's instruction queue can examine (depth) and the number it can issue together (width) determine its ability to take advantage of the ILP in an application. Unfortunately, increasing either the width or depth of the instruction queue is very costly due to the content-addressable logic needed to wake up and select instructions out of order. This work makes the observation that a large number of instructions have both operands ready at dispatch, and therefore do not benefit from out-of-order scheduling. We leverage this to place such ready-at-dispatch instructions in separate, simpler, in-order FIFO queues for scheduling. With such additional queues, we can reduce the size and width of the expensive out-of-order instruction queue, without reducing the processor's overall issue width and depth. Our design, FIFOrder, is able to steer more than 60% of instructions to the cheaper FIFO queues, providing a 50% energy savings over a traditional out-of-order instruction queue design, while delivering 8% higher performance.
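The steering decision itself is simple, which is the point: an instruction whose source operands are all ready at dispatch needs no out-of-order wake-up logic. A minimal sketch (the `(dest, srcs)` tuples and register names are illustrative assumptions; the real design also updates readiness as values are produced):

```python
# Minimal model of FIFOrder-style dispatch steering (static snapshot).

def steer(instr, ready_regs):
    _dest, srcs = instr
    return "fifo" if all(s in ready_regs for s in srcs) else "ooo"

def dispatch(instrs, ready_regs):
    queues = {"fifo": [], "ooo": []}
    for instr in instrs:
        queues[steer(instr, ready_regs)].append(instr)
    return queues

prog = [
    ("r1", ()),        # immediate load: no sources, always ready
    ("r2", ("r1",)),   # depends on r1, not yet produced
    ("r3", ("r0",)),   # r0 is already ready at dispatch
]
q = dispatch(prog, {"r0"})
assert [i[0] for i in q["fifo"]] == ["r1", "r3"]
assert [i[0] for i in q["ooo"]] == ["r2"]
```

In this toy snapshot two of three instructions avoid the expensive queue, mirroring the >60% steering rate the paper reports.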

Download Paper (PDF; Only available from the DATE venue WiFi)
12:00 | 6.4.3 | BOOSTING SIMD BENEFITS THROUGH A RUN-TIME AND ENERGY EFFICIENT DLP DETECTION
Speaker:
Mateus Rutzig, UFSM, BR
Authors:
Michael Jordan, Tiago Knorst, Julio Vicenzi and Mateus Beck Rutzig, UFSM, BR
Abstract
Data-level parallelism (DLP) has been improving the performance-energy tradeoff of current processors through coupled SIMD engines such as Intel AVX and ARM NEON. Special libraries and compilers are used to support DLP execution on such engines. However, timing overhead in hand coding is inevitable, since most software developers are not skilled at extracting DLP using unfamiliar libraries. In addition, DLP detection through the compiler, besides breaking software compatibility, is limited to static code analysis, which compromises performance gains. In this work, we propose a runtime DLP detection mechanism named Dynamic SIMD Assembler (DSA), which transparently identifies vectorizable code regions to execute on the ARM NEON engine. Due to its dynamic nature, DSA maintains software compatibility and avoids timing overhead in the software development process. Results show that DSA outperforms the ARM NEON auto-vectorizing compiler by 32%, since it covers a wider range of vectorizable regions, such as dynamic-range, sentinel and conditional loops. In addition, DSA outperforms hand-vectorized code using the ARM library by 26% while reducing energy consumption by 45%, with no penalty on software development time.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30 | IP3-2, 336 | DEPENDENCY-RESOLVING INTRA-UNIT PIPELINE ARCHITECTURE FOR HIGH-THROUGHPUT MULTIPLIERS
Speaker:
Dae Hyun Kim, Washington State University, US
Authors:
Jihee Seo and Dae Hyun Kim, Washington State University, US
Abstract
In this paper, we propose two dependency-resolving intra-unit pipeline architectures to design high-throughput multipliers. Simulation results show that the proposed multipliers achieve approximately 2.3× to 3.1× execution time reduction at a cost of 4.4% area and 3.7% power overheads for highly-dependent multiplications.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:31 | IP3-3, 832 | A HARDWARE-EFFICIENT LOGARITHMIC MULTIPLIER WITH IMPROVED ACCURACY
Authors:
Mohammad Saeed Ansari, Bruce Cockburn and Jie Han, University of Alberta, CA
Abstract
Logarithmic multipliers take the base-2 logarithm of the operands and perform multiplication using only shift and addition operations. Since computing the logarithm is often an approximate process, some accuracy loss is inevitable in such designs. However, the area, latency, and power consumption can be significantly improved at the cost of this accuracy loss. This paper presents a novel method to approximate log2(N) that, unlike existing approaches, rounds N to its nearest power of two instead of the highest power of two smaller than or equal to N. This approximation technique is then used to design two improved 16x16 logarithmic multipliers that use exact and approximate adders (ILM-EA and ILM-AA, respectively). These multipliers achieve up to 24.42% and 9.82% savings in area and power-delay product, respectively, compared to the state-of-the-art design in the literature with similar accuracy. The proposed designs are evaluated in the Joint Photographic Experts Group (JPEG) image compression algorithm, and their advantages over other approximate logarithmic multipliers are shown.
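The key approximation is easy to state in code. A minimal sketch of the integer characteristic only (the paper's 16x16 designs also keep fractional bits and use approximate adders, which this toy omits):

```python
# Two ways to approximate log2(n) for a positive integer n.

def log2_trunc(n):
    # classic approach: highest power of two <= n, i.e. floor(log2(n))
    return n.bit_length() - 1

def log2_nearest(n):
    # the paper's idea: index of the power of two *closest* to n
    k = n.bit_length() - 1
    lo, hi = 1 << k, 1 << (k + 1)
    return k + 1 if (n - lo) > (hi - n) else k

# 7 is closer to 8 than to 4, so nearest-rounding picks 3, not 2
assert log2_trunc(7) == 2
assert log2_nearest(7) == 3
```

A logarithmic multiplier then computes a*b as roughly 2^(log2(a) + log2(b)), replacing the multiply with an add and a shift; rounding to the nearest power of two keeps the worst-case error of the characteristic smaller than truncation does.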

Download Paper (PDF; Only available from the DATE venue WiFi)
12:32 | IP3-4, 440 | LIGHTWEIGHT HARDWARE SUPPORT FOR SELECTIVE COHERENCE IN HETEROGENEOUS MANYCORE ACCELERATORS
Speaker:
Alessandro Cilardo, CeRICT, IT
Authors:
Alessandro Cilardo, Mirko Gagliardi and Vincenzo Scotti, University of Naples Federico II, IT
Abstract
Shared memory coherence is a key feature in manycore accelerators, ensuring programmability and application portability. Most established solutions for coherence in homogeneous systems cannot be simply reused because of the special requirements of accelerator architectures. This paper introduces a low-overhead hardware coherence system for heterogeneous accelerators, with customizable granularity and noncoherent region support. The coherence system has been demonstrated in operation in a full manycore accelerator, exhibiting significant improvements in terms of network load, execution time, and power consumption.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30 | End of session
Lunch Break in Lunch Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Breaks (Lunch Area)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the Lunch Area to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 26, 2019

Wednesday, March 27, 2019

Thursday, March 28, 2019


6.5 System Level Security

Date: Wednesday, March 27, 2019
Time: 11:00 - 12:30
Location / Room: Room 5

Chair:
Lionel Torres, University of Montpellier, FR, Contact Lionel Torres

Co-Chair:
Pascal Benoit, University of Montpellier, FR, Contact Pascal Benoit

This session includes four papers on hardware-based techniques to support security: detecting malware, providing secure intermittent computation, protecting the kernel, and enabling self-attestation.

Time | Label | Presentation Title / Authors
11:00 | 6.5.1 | 2SMART: A TWO-STAGE MACHINE LEARNING-BASED APPROACH FOR RUN-TIME SPECIALIZED HARDWARE-ASSISTED MALWARE DETECTION
Speaker:
Houman Homayoun, George Mason University, US
Authors:
Hossein Sayadi1, Hosein Mohammadi Makrani2, Sai Manoj Pudukotai Dinakarrao1, Tinoosh Mohsenin3, Avesta Sasan1, Setareh Rafatirad1 and Houman Homayoun1
1George Mason University, US; 2George Mason University, US; 3University of Maryland Baltimore County, US
Abstract
Hardware-assisted Malware Detection (HMD) has emerged as a promising solution to improve the security of computer systems using Hardware Performance Counter (HPC) information collected at run-time. While several recent studies proposed machine learning-based solutions to identify malware using HPCs, they rely on a large number of microarchitectural events to achieve high accuracy and detection rate. More importantly, they have largely overlooked complexity-effective prediction of malware classes at run-time. As we show in this work, the detection performance of malware classifiers is highly dependent on the number of available HPCs and varies significantly across classes of malware. The limited number of HPCs in modern microprocessors that can be captured simultaneously makes run-time malware detection with high detection performance a challenging problem for existing solutions, as they require multiple runs of applications to collect a sufficient number of microarchitectural events. In response, in this paper, we first identify the most important HPCs for HMD using an effective feature reduction method. We then develop a specialized two-stage run-time HMD referred to as 2SMaRT. 2SMaRT first classifies applications, using a multiclass classification technique, as either benign or belonging to one of the malware classes (Virus, Rootkit, Backdoor, and Trojan). In the second stage, to achieve high detection performance, 2SMaRT deploys the machine learning model that works best for each class of malware. To realize an effective run-time solution that relies on only the available HPCs, 2SMaRT is further customized using an ensemble learning technique to boost the performance of general malware detectors. The experimental results show that 2SMaRT using the ensemble technique with just 4 HPCs outperforms state-of-the-art classifiers with 8 HPCs by up to 31.25% in terms of detection performance, on average across different classes of malware.

Download Paper (PDF; Only available from the DATE venue WiFi)
11:30 | 6.5.2 | SECURE INTERMITTENT COMPUTING PROTOCOL: PROTECTING STATE ACROSS POWER LOSS
Speaker:
Patrick Schaumont, Virginia Tech, US
Authors:
Archanaa S. Krishnan, Charles Suslowicz, Daniel Dinu and Patrick Schaumont, Virginia Tech, US
Abstract
Intermittent computing systems execute long-running tasks under a transient power supply such as an energy-harvesting power source. During a power loss, they save intermediate program state as a checkpoint into write-efficient non-volatile memory. When the power is restored, the system state is reconstructed from the checkpoint, and the long-running computation continues. We analyze the security risks when power interruption is used as an attack vector, and we demonstrate the need to protect the integrity, authenticity, confidentiality, continuity, and freshness of checkpointed data. We propose a secure checkpointing technique called the Secure Intermittent Computing Protocol (SICP). The proposed protocol has the following properties. First, it associates every checkpoint with a unique power-on state to prevent checkpoint replay. Second, every checkpoint is cryptographically chained to its predecessor, providing continuity, which enables the programmer to carry run-time security properties such as attested program images across power loss events. Third, SICP is atomic and resistant to power loss. We demonstrate a prototype implementation of SICP on an MSP430 microcontroller, and we investigate the overhead of SICP for several cryptographic kernels. To the best of our knowledge, this is the first work to provide a robust solution to secure intermittent computing.
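The chaining property can be illustrated with a short HMAC-based sketch. This shows checkpoint chaining in general, not SICP's actual message format; the key handling, counter width, and field layout below are assumptions for the example.

```python
import hashlib
import hmac

KEY = b"device-secret-key"  # illustrative; a real device derives this securely

def make_checkpoint(state: bytes, prev_tag: bytes, counter: int):
    # The counter serves as a freshness value tied to a power-on state;
    # chaining prev_tag into the MAC provides continuity.
    msg = counter.to_bytes(8, "big") + prev_tag + state
    tag = hmac.new(KEY, msg, hashlib.sha256).digest()
    return {"counter": counter, "state": state, "tag": tag}

def verify_chain(checkpoints):
    prev_tag = b"\x00" * 32  # genesis tag
    for cp in checkpoints:
        msg = cp["counter"].to_bytes(8, "big") + prev_tag + cp["state"]
        if not hmac.compare_digest(
                hmac.new(KEY, msg, hashlib.sha256).digest(), cp["tag"]):
            return False
        prev_tag = cp["tag"]
    return True

c0 = make_checkpoint(b"state-0", b"\x00" * 32, 0)
c1 = make_checkpoint(b"state-1", c0["tag"], 1)
assert verify_chain([c0, c1])
assert not verify_chain([c1])  # replaying c1 without its predecessor fails
```

Because each tag covers its predecessor's tag, an attacker who cuts power cannot splice in an old checkpoint without breaking verification at the next boot.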

Download Paper (PDF; Only available from the DATE venue WiFi)
12:00 | 6.5.3 | RISKIM: TOWARD COMPLETE KERNEL PROTECTION WITH HARDWARE SUPPORT
Speaker:
Dongil Hwang, Seoul National University, KR
Authors:
Dongil Hwang, Myonghoon Yang, Seongil Jeon, Younghan Lee, Donghyun Kwon and Yunheung Paek, Dept. of Electrical and Computer Engineering and Inter-University Semiconductor Research Center (ISRC), Seoul National University, KR
Abstract
The OS kernel is typically the assumed trusted computing base in a system. Consequently, when they try to protect the kernel, developers often build their solutions in a separate secure execution environment, externally located and protected by special hardware. Due to limited visibility into the host system, such external solutions all entail the semantic gap problem, which can be easily exploited by an adversary to circumvent them. Thus, for complete kernel protection against such adversarial exploits, previous solutions resorted to aggressive techniques that usually come with various adverse side effects, such as high performance overhead, kernel code modifications and/or excessively complicated hardware designs. In this paper, we introduce RiskiM, our new hardware-based monitoring platform to ensure kernel integrity from outside the host system. To overcome the semantic gap problem, we have devised a hardware interface architecture, called PEMI, through which RiskiM is supplied with all internal states of the host system essential for fulfilling its monitoring task, protecting the kernel even in the presence of attacks exploiting the semantic gap between the host and RiskiM. To empirically validate the security strength and performance of our monitoring platform in existing systems, we have fully implemented RiskiM in a RISC-V system. Our experiments show that RiskiM succeeds in host kernel protection by detecting even the advanced attacks that could circumvent previous solutions, while suffering virtually none of the aforementioned side effects.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:15 | 6.5.4 | SACHA: SELF-ATTESTATION OF CONFIGURABLE HARDWARE
Speaker:
Jo Vliegen, imec-COSIC/ESAT, KU Leuven, BE
Authors:
Jo Vliegen1, Md Masoom Rabbani2, Mauro Conti2 and Nele Mentens1
1KU Leuven, BE; 2University of Padua, IT
Abstract
Device attestation is a procedure to verify whether an embedded device is running the intended application code, aiming to protect against both physical attacks and remote attacks on the embedded software. With the wide adoption of Field-Programmable Gate Arrays (FPGAs), hardware has also become configurable, and hence susceptible to attacks (just like software). In addition, an upcoming trend for hardware-based attestation is the use of configurable FPGA hardware. Therefore, in order to attest a whole system that makes use of FPGAs, the status of both the software and the hardware needs to be verified, without relying on a separate tamper-resistant hardware module. In this paper, we propose a solution in which a prover core on the FPGA performs an attestation of the entire FPGA, including a self-attestation. This way, the FPGA can be used as a tamper-resistant hardware module to perform hardware-based attestation of a processor, resulting in protection of the entire hardware/software system against malicious code updates.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30 | IP3-5, 75 | FUNCTIONAL ANALYSIS ATTACKS ON LOGIC LOCKING
Speaker:
Pramod Subramanyan, Indian Institute of Technology Kanpur, IN
Authors:
Deepak Sirone and Pramod Subramanyan, Indian Institute of Technology Kanpur, IN
Abstract
This paper proposes Functional Analysis attacks on state-of-the-art logic locking algorithms (FALL attacks). FALL attacks use structural and functional analyses of locked circuits to identify the locking key. In contrast to past work, FALL attacks can often (90% of successful attempts in our experiments) fully defeat locking by only analyzing the locked netlist, without oracle access to an activated circuit. Experiments show that FALL attacks succeed against 65 out of 80 (81%) of circuits locked using Stripped-Functionality Logic Locking (SFLL), the only combinational logic locking algorithm resilient to known attacks.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:31 | IP3-6, 178 | SIGATTACK: NEW HIGH-LEVEL SAT-BASED ATTACK ON LOGIC ENCRYPTIONS
Speaker:
Hai Zhou, Northwestern University, US
Authors:
Yuanqi Shen1, You Li1, Shuyu Kong2, Amin Rezaei1 and Hai Zhou1
1Northwestern University, US; 2Northwestern University, CN
Abstract
Logic encryption is a powerful hardware protection technique that uses extra key inputs to lock a circuit against piracy or unauthorized use. The recent discovery of the SAT-based attack with Distinguishing Input Pattern (DIP) generation has rendered all traditional logic encryptions vulnerable, and thus prompted the creation of new encryption methods. However, a critical question for any new encryption method is whether security against the DIP-generation attack means security against all other attacks. In this paper, a new high-level SAT-based attack called SigAttack has been discovered and thoroughly investigated. It is based on extracting a key-revealing signature in the encryption. A majority of all known SAT-resilient encryptions are shown to be vulnerable to SigAttack. By formulating the conditions under which SigAttack is effective, the paper also provides guidance for future logic encryption design.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30 | End of session
Lunch Break in Lunch Area





6.6 Intelligent Wearable and Implantable Sensors for Augmented Living

Date: Wednesday, March 27, 2019
Time: 11:00 - 12:30
Location / Room: Room 6

Chair:
Daniela De Venuto, Politecnico di Bari, IT, Contact Daniela De Venuto

Co-Chair:
Theocharis Theocharides, University of Cyprus, CY, Contact Theocharis Theocharides

This session brings together a set of novel technologies that exploit artificial intelligence and data analytics on low-power wearable and implantable sensors, for real-time augmented living and assistive healthcare.

Time | Label | Presentation Title / Authors
11:00 | 6.6.1 | LAELAPS: AN ENERGY-EFFICIENT SEIZURE DETECTION ALGORITHM FROM LONG-TERM HUMAN IEEG RECORDINGS WITHOUT FALSE ALARMS
Speaker:
Alessio Burrello, Department of Information Technology and Electrical Engineering, ETH Zurich, CH
Authors:
Alessio Burrello1, Lukas Cavigelli2, Kaspar Schindler3, Luca Benini2 and Abbas Rahimi2
1Department of Information Technology and Electrical Engineering, ETH Zurich, CH; 2Department of Information Technology and Electrical Engineering, ETH Zurich, CH; 3Sleep-Wake-Epilepsy-Center, Department of Neurology, Inselspital, Bern University Hospital, University of Bern, CH
Abstract
We propose Laelaps, an energy-efficient and fast learning algorithm with no false alarms for epileptic seizure detection from long-term intracranial electroencephalography (iEEG) signals. Laelaps uses end-to-end binary operations by exploiting symbolic dynamics and brain-inspired hyperdimensional computing. Laelaps's results surpass those yielded by state-of-the-art (SoA) methods [1], [2], [3], including deep learning, on a new very large dataset containing 116 seizures of 18 drug-resistant epilepsy patients in 2656 hours of recordings—each patient implanted with 24 to 128 iEEG electrodes. Laelaps trains 18 patient-specific models by using only 24 seizures: 12 models are trained with one seizure per patient, the others with two seizures. The trained models detect 79 out of 92 unseen seizures without any false alarms across all the patients as a big step forward in practical seizure detection. Importantly, a simple implementation of Laelaps on the Nvidia Tegra X2 embedded device achieves 1.7x-3.9x faster execution and 1.4x-2.9x lower energy consumption compared to the best result from the SoA methods. Our source code and anonymized iEEG dataset are freely available at http://ieeg-swez.ethz.ch.
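The binary hyperdimensional-computing substrate the abstract refers to can be illustrated in a few lines. This is a generic HD-computing toy (random binary hypervectors, majority bundling, Hamming-distance comparison), not Laelaps's actual iEEG encoder; the vector length and noise level are arbitrary choices for the demo.

```python
import random

random.seed(0)
D = 1000  # hypervector dimensionality (real systems often use ~10,000)

def rand_hv():
    return [random.randint(0, 1) for _ in range(D)]

def bundle(hvs):
    # bitwise majority vote across a set of hypervectors
    return [1 if sum(col) * 2 > len(hvs) else 0 for col in zip(*hvs)]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

proto = rand_hv()   # class prototype (e.g. "seizure")
other = rand_hv()   # unrelated prototype (e.g. "interictal")
noisy = proto[:]
for i in random.sample(range(D), 100):  # corrupt 10% of the bits
    noisy[i] ^= 1

# a noisy sample still lands far closer to its own prototype
assert hamming(noisy, proto) < hamming(noisy, other)
```

Robustness to bit-level noise is what makes such end-to-end binary operations attractive for fast, energy-efficient classification of noisy biosignals.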

Download Paper (PDF; Only available from the DATE venue WiFi)
11:30 | 6.6.2 | AUTOMATIC TIME-FREQUENCY ANALYSIS OF MRPS FOR MIND-CONTROLLED MECHATRONIC DEVICES
Speaker:
Giovanni Mezzina, Politecnico di Bari, IT
Authors:
Daniela De Venuto and Giovanni Mezzina, Politecnico di Bari, IT
Abstract
This paper describes the design, implementation and in vivo testing of a novel Brain-Computer Interface (BCI) for the control of mechatronic devices. The method exploits electroencephalogram (EEG) acquisitions, and specifically the Movement-Related Potentials (MRPs) (i.e., μ and β rhythms), to actuate the user's intention on the mechatronic device. The EEG data are collected by only five wireless smart electrodes positioned on the central and parietal cortex areas. The acquired data are analyzed by an innovative single-trial classification algorithm that, with respect to the current state of the art, strongly reduces the training time (minimum: ~1 h, reached: 10 min), as well as the post-stimulus acquisition time needed for a reliable classification (typical: 4-8 s, reached: 2 s). As a first step, the algorithm performs an EEG time-frequency analysis in the selected bands, making the data suitable for further computation. The implemented machine learning (ML) stage consists of: (i) dimensionality reduction; (ii) statistical inference-based feature extraction (FE); (iii) classification model selection. A dedicated algorithm, MLE-RIDE, is also proposed for dimensionality reduction; jointly with statistical analyses, it digitizes the μ and β rhythms to perform the feature extraction. Finally, the best support vector machine (SVM) model is selected and used in the on-line classification. As a proof of concept, two mechatronic devices have been brain-controlled using the proposed BCI algorithm: a three-finger robotic hand and an acrylic prototype car. The experimental results, obtained with data from 3 subjects (aged 26±1), showed an accuracy of 87.4% in the wireless detection of the user's intention in real-time binary discrimination, with a computation time of 33.7 ms.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:00 | 6.6.3 | A SELF-LEARNING METHODOLOGY FOR EPILEPTIC SEIZURE DETECTION WITH MINIMALLY-SUPERVISED EDGE LABELING
Speaker:
Damián Pascual, EPFL, CH
Authors:
Damian Pascual1, Amir Aminifar2 and David Atienza1
1EPFL, CH; 2Swiss Federal Institute of Technology Lausanne (EPFL), CH
Abstract
Epilepsy is one of the most common neurological disorders and affects over 65 million people worldwide. Despite the continuing advances in anti-epileptic treatments, one third of epilepsy patients live with drug-resistant seizures. Moreover, the mortality rate among epileptic patients is 2 to 3 times higher than in the matching group of the general population. Wearable devices offer a promising solution for the detection of seizures in real time, alerting family and caregivers to provide immediate assistance to the patient. However, in order for the detection system to be reliable, a considerable amount of labeled data is needed to train it. Labeling epilepsy data is a costly and time-consuming process that requires manual inspection and annotation of electroencephalogram (EEG) recordings by medical experts. In this paper, we present a self-learning methodology for epileptic seizure detection without medical supervision. We propose a minimally-supervised algorithm for the automatic labeling of seizures in order to generate personalized training data. We demonstrate that the median deviation of the labels from the ground truth is only 10.1 seconds or, equivalently, less than 1% of the signal length. Moreover, we show that training a real-time detection algorithm with data labeled by our algorithm produces a degradation of less than 2.5% in comparison to training it with data labeled by medical experts. We evaluated our methodology on a wearable platform and achieved a lifetime of 2.59 days on a single battery charge.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30 | IP3-7, 1005 | ZEROPOWERTOUCH: ZERO-POWER SMART RECEIVER FOR TOUCH COMMUNICATION AND SENSING FOR INTERNET OF THINGS AND WEARABLE APPLICATIONS
Speaker:
Michele Magno, ETH Zurich, CH
Authors:
Philipp Mayer, Raphael Strebel and Michele Magno, ETH Zurich, CH
Abstract
The human body can be used as a transmission medium for electric fields. By applying an electric field with a frequency of tens of megahertz to isolated electrodes on the human body, it is possible to send energy and data. Extra-body and intra-body communication is an interesting alternative way to communicate wirelessly in the new era of wearable devices and the Internet of Things. In fact, this promising form of communication works without the need to design dedicated radio hardware and with lower power consumption. We designed and implemented a novel zero-power receiver targeting intra-body and extra-body wireless communication and touch sensing. To achieve zero-power, always-on operation, we combined an ultra-low-power design with an energy-harvesting subsystem that extracts energy directly from the received message. This energy is then used to supply the whole receiver, demodulate the message, and perform data processing in digital logic. The proposed design is ideal for waking up external logic only when a specific address is received. Moreover, thanks to the digital logic, the zero-power receiver can implement identification and security algorithms. It can be used either as an always-on touch sensor deployed in the field or as a body-communication wake-up for smart and secure devices. A working prototype demonstrates zero-power operation, both intra-body and extra-body communication, and a range of more than 1.75 m intra-body without the use of any external battery.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:31 | IP3-8, 252 | TAILORING SVM INFERENCE FOR RESOURCE-EFFICIENT ECG-BASED EPILEPSY MONITORS
Speaker:
Lorenzo Ferretti, Università della Svizzera italiana, CH
Authors:
Lorenzo Ferretti1, Giovanni Ansaloni1, Laura Pozzi1, Amir Aminifar2, David Atienza2, Leila Cammoun3 and Philippe Ryvlin3
1USI Lugano, CH; 2EPFL, CH; 3Centre Hospitalier Universitaire Vaudois, CH
Abstract
Event detection and classification algorithms are resilient to aggressive resource-aware optimisations. In this paper, we leverage this characteristic in the context of smart health monitoring systems. In more detail, we study the attainable benefits of tailoring Support Vector Machine (SVM) inference engines devoted to the detection of epileptic seizures from ECG-derived features. We conceive and explore multiple optimisations, each effectively reducing resource budgets while minimally impacting classification performance. These strategies can be seamlessly combined, resulting in 12.5X and 16X gains in energy and area, respectively, with a negligible 3.2% loss in classification performance.
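One representative optimisation of this kind, quantising the weights of a linear SVM decision function to low-precision integers, can be sketched as follows. The weights and features below are invented for the example; the paper's actual engines and optimisation set are not reproduced here.

```python
# Quantise linear-SVM weights to signed k-bit integers: arithmetic gets
# much cheaper in hardware, while the decision sign is usually preserved.

def quantize(ws, bits=4):
    scale = (2 ** (bits - 1) - 1) / max(abs(w) for w in ws)
    return [round(w * scale) for w in ws], scale

def svm_decision(ws, b, xs):
    # sign of (w . x + b) gives the predicted class
    return sum(w * x for w, x in zip(ws, xs)) + b

weights, bias = [0.8, -1.3, 0.4], 0.1   # made-up trained model
q_weights, scale = quantize(weights)

xs = [1.0, 0.2, 0.5]                    # made-up feature vector
full = svm_decision(weights, bias, xs)
approx = svm_decision(q_weights, bias * scale, xs) / scale
assert (full > 0) == (approx > 0)       # the predicted class is preserved
```

Because only the sign of the decision function matters for classification, small quantisation errors rarely flip the outcome, which is why such resource-aware optimisations cost so little accuracy.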

Download Paper (PDF; Only available from the DATE venue WiFi)
12:32 | IP3-9, 418 | AN INDOOR LOCALIZATION SYSTEM TO DETECT AREAS CAUSING THE FREEZING OF GAIT IN PARKINSONIANS
Speaker:
Graziano Pravadelli, Dept. of Computer Science, Univ. of Verona, IT
Authors:
Florenc Demrozi1, Vladislav Bragoi1, Federico Tramarin2 and Graziano Pravadelli1
1Department of Computer Science, University of Verona, IT; 2Department of Information Engineering, University of Padua, IT
Abstract
People affected by Parkinson's disease are often subject to episodes of Freezing of Gait (FoG) near specific areas of their environment. To prevent such episodes, this paper presents a low-cost indoor localization system specifically designed to identify these critical areas. The final aim is to exploit the output of this system within a wearable device that generates rhythmic stimuli able to prevent FoG when the person enters a risky area. The proposed localization system is based on a classification engine that uses a fingerprinting phase for its initial training and is then dynamically adjusted by exploiting a probabilistic graph model of the environment.
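The fingerprinting step can be illustrated with a nearest-neighbour toy: known areas are described by signal-strength vectors from fixed beacons, and a new reading is assigned to the closest stored fingerprint. The RSSI values are invented for the example, and the paper's system additionally refines the estimate with its probabilistic graph model.

```python
# Classify a new RSSI reading by its nearest stored area fingerprint.

def nearest_area(fingerprints, reading):
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(fingerprints, key=lambda area: dist(fingerprints[area], reading))

fingerprints = {                      # dBm from three fixed beacons
    "kitchen":  [-40, -70, -80],
    "doorway":  [-65, -50, -72],      # a known FoG-triggering area
    "corridor": [-80, -60, -45],
}
assert nearest_area(fingerprints, [-63, -52, -70]) == "doorway"
```

When the classifier reports entry into a stored risky area such as the doorway, the wearable would trigger its rhythmic cueing.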

Download Paper (PDF; Only available from the DATE venue WiFi)
12:33 | IP3-10, 879 | ASSEMBLY-RELATED CHIP/PACKAGE CO-DESIGN OF HETEROGENEOUS SYSTEMS MANUFACTURED BY MICRO-TRANSFER PRINTING
Speaker:
Tilman Horst, Technische Universität Dresden, DE
Authors:
Robert Fischbach, Tilman Horst and Jens Lienig, Technische Universität Dresden, DE
Abstract
Technologies for heterogeneous integration have been promoted as an option to drive innovation in the semiconductor industry. However, adoption by designers is lagging behind and market shares are still low. Alongside the lack of appropriate design tools, high manufacturing costs are one of the main reasons. Micro-transfer printing (µTP) is a novel and promising micro-assembly technology that enables the heterogeneous integration of dies originating from different wafers. This technology uses an elastomer stamp to transfer dies in parallel from source wafers to their target positions, indicating a high potential for reducing manufacturing time and cost. In order to achieve the latter, the geometrical interdependencies between source, target and stamp, and the resulting wafer utilization, must be considered during design. We propose an approach to evaluate a given µTP design with regard to its manufacturing costs. We achieve this by developing a model that integrates characteristics of the assembly process into the cost function of the design. Our approach can be used as a template for how to tackle other assembly-related co-design issues, addressing an increasingly severe cost-optimization problem in heterogeneous systems design.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30 | End of session
Lunch Break in Lunch Area





6.7 How Secure and Verified is your Cyber-Physical System?

Date: Wednesday, March 27, 2019
Time: 11:00 - 12:30
Location / Room: Room 7

Chair:
Wanli Chang, University of York, GB, Contact Wanli Chang

Co-Chair:
Mingsong Chen, East China Normal University, CN, Contact Mingsong Chen

The session addresses security and verification aspects of the design of modern cyber-physical systems. Conditional Generative Adversarial Networks are used for security analysis. Lightweight machine learning is used to detect malware at the network-node level. Stochastic model predictive control is used to limit malware diffusion in the network. Bounded model checking with linear programming is used for the on-line verification of cyber-physical systems.

Time | Label | Presentation Title / Authors
11:00 | 6.7.1 | GAN-SEC: GENERATIVE ADVERSARIAL NETWORK MODELING FOR THE SECURITY ANALYSIS OF CYBER-PHYSICAL PRODUCTION SYSTEMS
Speaker:
Mohammad Al Faruque, University of California, Irvine, US
Authors:
Sujit Rokka Chhetri, Anthony Bahadir Lopez, Jiang Wan and Mohammad Al Faruque, University of California, Irvine, US
Abstract
Cyber-Physical Production Systems (CPPS) will usher in a new era of smart manufacturing. However, CPPS will be vulnerable to cross-domain attacks due to the interactions between the cyber and physical domains. To address the challenges of modeling cross-domain security in CPPS, we propose GAN-Sec, a novel conditional Generative Adversarial Network based modeling approach to abstract and estimate the relations between the cyber and physical domains. Using GAN-Sec, we are able to determine whether various security requirements such as confidentiality, availability, and integrity are met. We provide a security analysis of an additive manufacturing system to demonstrate the applicability of GAN-Sec.

Download Paper (PDF; Only available from the DATE venue WiFi)
11:30 | 6.7.2 | LIGHTWEIGHT NODE-LEVEL MALWARE DETECTION AND NETWORK-LEVEL MALWARE CONFINEMENT IN IOT NETWORKS
Speaker:
Sai Manoj Pudukotai Dinakarrao, George Mason University, US
Authors:
Sai Manoj Pudukotai Dinakarrao, Hossein Sayadi, Hosein Mohammadi Makrani, Cameron Nowzari, Setareh Rafatirad and Houman Homayoun, George Mason University, US
Abstract
The sheer size of IoT networks being deployed today presents an "attack surface" and poses security risks at a scale never before encountered: a single infected device/node has the potential to spread malware across the network, eventually halting network functionality altogether. Simply detecting and quarantining malware in IoT networks does not guarantee that propagation is prevented. On the other hand, traditional control theory is not effective for malware confinement, as most existing works either do not consider real-time control strategies that can operate on uncertain infection information about the nodes in the network, or decouple the containment problem from network performance. In this work, we propose a two-pronged approach: a runtime malware detector (HMD) that uses Hardware Performance Counter (HPC) values to distinguish malware from benign applications, whose output is fed at runtime to a stochastic model predictive controller that confines malware propagation without hampering network performance. The proposed solution achieves a runtime malware detection accuracy of 80% with a detection latency of 10 ns, an order of magnitude faster than existing malware detection solutions. Combining this output with the model predictive containment strategy leads to an average network throughput of nearly 200% of that of IoT networks without any embedded defense.
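The two-pronged idea can be caricatured in a few lines (an illustrative sketch only, not the paper's HMD or controller; the detector weights, the cutoff, and the graph model are invented for the example): a logistic threshold detector scores a node's HPC feature vector, and a containment rule cuts the links around nodes whose estimated infection probability is high.

```python
# Illustrative sketch: logistic detector over hardware performance counter
# (HPC) features, plus a containment rule that quarantines likely-infected
# nodes while leaving the rest of the network connected for throughput.
import math

def hpc_detector(features, weights, bias=0.0):
    """Estimated infection probability for one node's HPC feature vector."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # logistic score in [0, 1]

def confine(graph, infection_prob, cutoff=0.7):
    """Drop every edge touching a likely-infected node; keep the rest.
    graph: {node: set of neighbour nodes}."""
    infected = {n for n, p in infection_prob.items() if p >= cutoff}
    return {n: (set() if n in infected
                else {m for m in nbrs if m not in infected})
            for n, nbrs in graph.items()}
```

For instance, `confine({0: {1, 2}, 1: {0}, 2: {0}}, {0: 0.9, 1: 0.1, 2: 0.2})` quarantines node 0, severing its links while nodes 1 and 2 keep their remaining (here empty) neighbourhoods.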

Download Paper (PDF; Only available from the DATE venue WiFi)
12:006.7.3INCREMENTAL ONLINE VERIFICATION OF DYNAMIC CYBER-PHYSICAL SYSTEMS
Speaker:
Lei Bu, Nanjing University, CN
Authors:
Lei Bu1, Shaopeng Xing1, Xinyue Ren1, Yang Yang1, Qixin Wang2 and Xuandong Li1
1Nanjing University, CN; 2Dept. of Computing, The Hong Kong Polytechnic Univ., HK
Abstract
Periodic online verification has been widely recognized as a practical and promising way to handle the non-deterministic and unpredictable behavior of dynamic CPS. The challenge is to finish the online verification of each cycle quickly enough to leave the running system time to respond if an error is detected. Fortunately, the problems verified in successive cycles are highly similar to each other; most of the differences are caused by run-time factors such as changed parameter values or the reorganization of active components in the system. Based on this observation, this paper presents an incremental technique for the online verification of CPS. A method is given to identify the differences between the problem under verification and the previously verified problem. Then, by reusing the problem space of the previously verified problem as a warm-start base, the modified part can be introduced into that base and solved incrementally and efficiently. A set of case studies on a real-world train control system demonstrates the performance of the incremental online verification technique.
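The warm-start idea can be sketched as follows (a hypothetical caricature; the paper's engine is bounded model checking with linear programming, not per-constraint memoization): verdicts from the previous verification cycle are reused, and only constraints that changed at run time are re-checked.

```python
# Hedged sketch of warm-start reuse across verification cycles.
class IncrementalChecker:
    def __init__(self, check_fn):
        self.check_fn = check_fn  # expensive per-constraint check
        self.cache = {}           # constraint -> verdict from the last cycle
        self.rechecked = 0        # constraints actually re-run this cycle

    def verify(self, constraints):
        self.rechecked = 0
        verdicts = {}
        for c in constraints:
            if c not in self.cache:        # new or modified constraint
                self.cache[c] = self.check_fn(c)
                self.rechecked += 1
            verdicts[c] = self.cache[c]
        # drop constraints that disappeared, so the cache tracks the model
        self.cache = verdicts.copy()
        return all(verdicts.values())
```

With `check_fn = lambda c: c[1] <= 100`, verifying `[("speed", 80), ("gap", 50)]` runs two checks; re-verifying after only the speed value changed re-runs just one.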

Download Paper (PDF; Only available from the DATE venue WiFi)
12:156.7.4SELF-SECURED CONTROL WITH ANOMALY DETECTION AND RECOVERY IN AUTOMOTIVE CYBER-PHYSICAL SYSTEMS
Speaker:
Korosh Vatanparvar, University of California, Irvine, US
Authors:
Korosh Vatanparvar and Mohammad Al Faruque, University of California, Irvine, US
Abstract
Cyber-Physical Systems (CPS) are growing in complexity and functionality, and multidisciplinary interactions with physical systems are key to CPS. However, sensors, actuators, controllers, and wireless communications are prone to attacks that compromise the system. Machine learning models have been utilized in automotive controllers to learn, estimate, and provide the required intelligence in the control process. However, their estimates are also vulnerable to attacks from the physical or cyber domains, and they have shown unreliable predictions in the face of unknown biases resulting from the modeling. In this paper, we propose a novel control design using conditional generative adversarial networks that enables a self-secured controller to capture the normal behavior of the control loop and the physical system, detect anomalies, and recover from them. We evaluated our control design on a self-secured battery management system (BMS) by driving a Nissan Leaf S on standard driving cycles while under various attacks. Compared to the state of the art, the self-secured BMS detects the attacks with 83% accuracy and a recovery estimation error of 21% on average, improvements of 28% and 8%, respectively.
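A minimal sketch of the detect-and-recover loop, with a plain predictive model standing in for the paper's conditional GAN (the model, threshold, and numbers below are illustrative, not from the paper): if the sensor reading falls outside the learned normal envelope, it is flagged and replaced by the model's estimate.

```python
# Minimal sketch, assuming a learned predictor of normal behaviour.
def self_secured_step(model, state, sensor_reading, threshold):
    """Return (value_used_by_controller, anomaly_detected)."""
    predicted = model(state)
    if abs(sensor_reading - predicted) > threshold:  # outside normal envelope
        return predicted, True                       # recover with estimate
    return sensor_reading, False                     # trust the sensor
```

With a toy battery model `model = lambda soc: 0.99 * soc`, a reading of 79.5 at state 80.0 passes through, while a spoofed reading of 60.0 is rejected and replaced by the model's estimate of about 79.2.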

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30End of session
Lunch Break in Lunch Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Breaks (Lunch Area)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the Lunch Area to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 26, 2019

Wednesday, March 27, 2019

Thursday, March 28, 2019


6.8 TETRAMAX: Smart funding for digitalization of Europe's Industry

Date: Wednesday, March 27, 2019
Time: 11:00 - 12:30
Location / Room: Exhibition Theatre

Organisers:
Luca Fanucci, University of Pisa, IT, Contact Luca Fanucci
Bernd Janson, ZENIT GmbH, DE, Contact Bernd Janson

Moderator:
Luca Fanucci, University of Pisa, IT, Contact Luca Fanucci

One of the most demanding challenges for Europe's industry is the adoption of information technologies. Beyond the technical problems of installing digital processes and replacing analogue ones, digitalization touches the whole process of interaction and exchange within and outside companies. Threats such as hacks with misuse of personal data, system blackouts, or a lack of qualified IT experts are heavily discussed and have so far prevented many players, especially smaller SMEs, from pursuing digitalization. And even if all those problems are solved, the question of digitalization's contribution to CO2 reduction and lower energy consumption remains open: promising new innovations such as blockchain technologies appear to fail completely, at least from the energy-saving point of view. To deliver solutions for all aspects of digitalization with a focus on Cyber-Physical Systems (CPS), and via the instrument of Digital Innovation Hubs (DIH), the European Commission started its Smart Anything Everywhere (SAE) Initiative to foster transfer from research to business. The Horizon 2020 Innovation Action TETRAMAX is one of these actions, offering smart and individual funding schemes for European university-industry cooperation. Its technology transfer concept focuses on direct cooperation between universities and SMEs - Technology Transfer Experiments (TTX) - supported by open innovation networks and other stakeholders such as investors. The session speakers will demonstrate pragmatically, using concrete examples, how technology transfer can be initiated and implemented in practice, how to overcome the associated pitfalls, and how to exploit the innovation opportunities. Among other goals, the session aims to motivate more stakeholders to engage in international technology transfer and become part of TETRAMAX.

During the session, TETRAMAX representatives will share their experiences and insights as researchers, founders, entrepreneurs, investors and consultants.

TimeLabelPresentation Title
Authors
11:006.8.1PRESENTATION OF TETRAMAX
Speaker:
Rainer Leupers, RWTH Aachen, DE
Abstract

TETRAMAX, part of the SAE Initiative, started in 2017 and is funded by Horizon 2020. The project supports application experiments between academia and industry (SMEs) that relate to Internet of Things (IoT) technologies and focus on customized low energy computing (CLEC).

The project builds on three major activity lines:

(1) Stimulating, organizing, co-funding, and evaluating different types of cross-border Application Experiments, providing "EU added value" via innovative CLEC technologies to first-time users and broad markets in European ICT-related industries,

(2) Building and leveraging a new European CLEC competence center network, offering technology brokerage, one-stop shop assistance and CLEC training to SMEs and mid-caps, and with a clear evolution path towards new regional digital innovation hubs where needed, and

(3) Paving the way towards self-sustainability based on pragmatic and customized long-term business plans

The project impact will be measured against 50+ performance indicators. The immediate ambition of TETRAMAX within its duration is to support 50+ industry clients and third parties across the EU with innovative technologies, leading to an estimated revenue increase of €25 million based on 50+ new or improved CLEC-based products, 10+ entirely new businesses/SMEs initiated, as well as 30+ new permanent jobs and significant cost and energy savings in product manufacturing. Moreover, in the long term, TETRAMAX will be the trailblazer towards a reinforced, profitable, and sustainable ecosystem infrastructure, providing CLEC competence, services, and a continuous innovation stream at European scale, yet with the strong regional presence favoured by SMEs.

11:156.8.2EVERMORE
Speaker:
Davide Rossi, Università di Bologna, IT
Abstract

EVErMORE: Energy-efficient Variation awarE MulticORE: The EVErMORE TTX experiment aims at developing the next-generation GAP-8 IoT processor from GreenWaves Technologies. By exploiting the adaptive management architecture for process and temperature compensation developed at the University of Bologna, coupled with the low-voltage capabilities of 22nm FD-SOI technology, the experiment is expected to improve the energy efficiency of current-generation GreenWaves Technologies processors by 6x, enabling new applications and opening new market opportunities.

11:306.8.3CARROTS
Speaker:
Antonio Solinas, Lifely, IT
Abstract

Carrots: Cooperative ARchitecture for gaRdening with Open moniToring Systems: Tomappo is a digital gardening assistant that enables anyone to grow their own vegetables. Within Carrots, Lifely's social sensors will be customized for use with Tomappo. This will add a new dimension to Tomappo, leading to a better product for its users and a new revenue stream for the company receiving the technology, while also benefiting the technology's owner by providing a new use case for its sensors.

11:456.8.4TETRAWIN
Speaker:
Neven Rusković, Spica Sustativi d.o.o., HR
Abstract

TETRaWIN: TEchnology Transfer of computational-Rfid Wirelessly-powered IoT Nodes: This TTX will transfer the University of Salento's recognized expertise in wirelessly-powered computational-RFID technology for IoT to the SME Spica, defining a new cost-effective, battery-less, CLEC-based tag that enables smart traceability of fresh and frozen fish. By performing computation, communication, and sensing to check food product integrity, the solution will improve the SME's business.

12:006.8.5EUROLAB4HPC - JOINING FORCES TOWARDS EUROPEAN LEADERSHIP IN EXASCALE COMPUTING SYSTEMS
Speaker:
Per Stenström, Chalmers University of Technology, SE
Abstract

High-Performance Computing (HPC) systems are of vital importance to the progress of science and technology. Europe has made significant progress in becoming a leader in HPC through industrial and application providers. In addition, ETP4HPC is driving a European HPC vision towards exascale systems. Despite such gains, excellence in HPC systems research is fragmented across Europe. Eurolab4HPC has the bold overall goal to strengthen academic research excellence and innovation in HPC in Europe.

This talk will highlight the key instruments used to structure the European HPC community: promoting entrepreneurship by building an innovation pipeline from general-purpose entrepreneurial training through business prototyping and business plan development to help with funding, and stimulating technology transfer by connecting with other technology transfer activities and providing competitive seed funding for HPC technology transfer. With EuroLab4HPC, the objective is to raise the visibility of the European HPC community, attract more participants, and ultimately grow the interest in and impact of Europe's exascale research.

12:156.8.6OPEN INNOVATION BUSINESS BASED ON EFFICIENT NETWORKING
Speaker:
Bernd Janson, ZENIT GmbH, DE
Abstract

Open innovation is based on strong networks between academia and industry, and ICT developments depend greatly on open innovation due to short innovation cycles and strong competition. Building an open innovation network that operates in a regional, national and international context was the idea behind the Enterprise Europe Network, which started in 2008. Its overall aim is to support the competitiveness and innovation capabilities of SMEs in Europe. Today, the Enterprise Europe Network is the largest innovation network in the world, addressing every need along the whole value chain of the innovation process, from idea to product. Bernd Janson will explain how 600 partners and over 6,000 consultants worldwide work together to improve the performance of SMEs in Europe, and will also explain the Network's role within TETRAMAX.

12:30End of session
Lunch Break in Lunch Area





7.0 LUNCH TIME KEYNOTE SESSION

Date: Wednesday, March 27, 2019
Time: 13:50 - 14:20
Location / Room: Room 1

Chair:
Christoph Hagleitner, IBM Research, CH, Contact Christoph Hagleitner

Co-Chair:
Christian Plessl, Paderborn University, DE, Contact Christian Plessl

TimeLabelPresentation Title
Authors
13:457.0.0CEDA LUNCHEON ANNOUNCEMENT
Speaker:
David Atienza, EPFL, CH
13:507.0.1HETEROGENEOUS, HIGH SCALE COMPUTING IN THE ERA OF INTELLIGENT, CLOUD-CONNECTED DEVICES
Author:
David Pellerin, Amazon, US
Abstract
Rapid advances in connected devices, coupled with machine learning and "data lake" methods of advanced analytics, have led to an explosion in demand for non-traditional, highly scalable computing and storage platforms. This increasing demand is being seen in the public cloud as well as in cloud-connected IoT edge devices. AI/ML is at the heart of many of the newest, most advanced analytics and IoT applications, ranging from robotics and autonomous vehicles, to cloud-connected products such as Alexa, to smart factories and consumer-facing services in the financial and healthcare sectors. In support of these important workloads, alternative methods of computing are being deployed in the cloud and at the edge. These alternative, heterogeneous computing methods include CPUs, GPUs, FPGAs, and other emerging acceleration technologies. This talk presents examples of such use cases within Amazon, as well as examples of how Amazon customers increasingly rely on AI/ML, accelerated using alternative computing methods and coupled with smart, cloud-connected devices, to create next-generation intelligent products. The talk will conclude with examples of how cloud-based semiconductor design is being enhanced using these same methods.
14:20End of session
16:00Coffee Break in Exhibition Area





7.1 Special Day on "Embedded Meets Hyperscale and HPC" Session: Tools and Runtime Systems

Date: Wednesday, March 27, 2019
Time: 14:30 - 16:00
Location / Room: Room 1

Chair:
Christian Plessl, Paderborn University, DE, Contact Christian Plessl

Co-Chair:
Christoph Hagleitner, IBM Research, CH, Contact Christoph Hagleitner

Programming and operating heterogeneous computing systems that combine multiple computing resources poses additional challenges to the programmer, e.g. handling different programming and execution models, partitioning applications to exploit the strengths of each resource type, or modeling and optimizing the overall performance and efficiency. In this session, we will discuss tools and runtime systems that support the developer in these tasks by raising the level of abstraction for application specification.

TimeLabelPresentation Title
Authors
14:307.1.1EXTREME HETEROGENEITY IN HIGH PERFORMANCE COMPUTING
Speaker and Author:
Jeffrey S Vetter, Oak Ridge National Laboratory, US
Abstract
Concerns about energy efficiency and cost are forcing our community to reexamine system architectures, including the memory and storage hierarchy. While computing technologies have remained relatively stable for nearly two decades, new architectural features, such as heterogeneous cores, deep memory hierarchies, non-volatile memory (NVM), and near-memory processing, have emerged as possible solutions to address these concerns. However, we expect this "golden age" of architectural change to lead to extreme heterogeneity, which will have a major impact on software systems and applications. Software will need to be redesigned to exploit these new capabilities and provide some level of performance portability across these diverse architectures. In this talk, I will sample these emerging memory technologies, discuss their architectural and software implications, and describe several new approaches to addressing these challenges. One programming system we have designed allows users to program FPGAs using C and OpenACC directives, which facilitates portability to GPUs and CPUs. Another is Papyrus (Parallel Aggregate Persistent Storage), a programming system that aggregates NVM from across the system for use as application data structures, such as vectors and key-value stores, while providing performance portability across emerging NVM hierarchies.
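The key-value side of the aggregation idea can be pictured as follows (a toy sketch, not the real Papyrus API): every node contributes a local storage region, and keys are hashed to the node that owns them, so the whole system behaves as one store.

```python
# Toy sketch of NVM aggregation: one dict per node stands in for that
# node's local storage region; keys are hashed to their owning node.
class AggregatedKV:
    def __init__(self, n_nodes):
        self.nodes = [{} for _ in range(n_nodes)]  # one "region" per node

    def _owner(self, key):
        return hash(key) % len(self.nodes)         # node that holds this key

    def put(self, key, value):
        self.nodes[self._owner(key)][key] = value

    def get(self, key, default=None):
        return self.nodes[self._owner(key)].get(key, default)
```

Callers see a single namespace; the hash-based placement is what spreads the data, and with it the capacity and bandwidth, across the participating nodes.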
15:007.1.2HOMOGENIZING HETEROGENEITY: THE OMPSS APPROACH
Speaker and Author:
Jesus Labarta, Barcelona Supercomputing Center, ES
Abstract
Initially aiming at the node level parallelization of HPC science and engineering codes, OmpSs has been proposing programming model features to enable the incremental migration of applications to the recent and foreseeable architectures. Heterogeneity is and will be an important characteristic of such systems but the spectrum of future devices is wide and open. In this context, ensuring programmer productivity and quality of life as well as code portability requires mechanisms that make heterogeneous systems appear as uniform as possible. The OmpSs task based approach provides such homogenization of heterogeneity, enabling the execution of a single program on nodes with just multicores or including GPUs or FPGAs.
15:307.1.3AUTOMATIC CODE RESTRUCTURING FOR FPGAS: CURRENT STATUS, TRENDS AND OPEN ISSUES
Speaker and Author:
João M. P. Cardoso, University of Porto/FEUP, PT
Abstract
The customization features, large-scale parallel computing power, heterogeneity, and hardware reconfigurability of FPGAs make them suitable computing platforms in many application domains, from high-performance to embedded computing. FPGAs are able not only to provide hardware acceleration for algorithms but also to provide complete system solutions with low cost and efficient performance/energy tradeoffs. In recent years we have witnessed significant maturity in high-level synthesis (HLS) and FPGA design flows, helping the mapping of computations to FPGAs. However, for HLS tools to achieve efficient FPGA implementations, application source code typically needs substantial restructuring/refactoring. This is not a simple task for software developers or for compilers, and its automation has become an important line of research. This presentation will start by motivating the investment in source-to-source compilers and will then focus on some of the problems regarding automatic code restructuring. We will cover the improvements in automatic code restructuring over recent years, the trends, the challenges, and the aspects that make automatic code restructuring an exciting research subject. Finally, we will show our recent and promising approach to automatic code restructuring.
16:00End of session
Coffee Break in Exhibition Area





7.2 Accelerators using novel memory technologies

Date: Wednesday, March 27, 2019
Time: 14:30 - 16:00
Location / Room: Room 2

Chair:
Mladen Berekovic, TU Braunschweig, DE, Contact Mladen Berekovic

Co-Chair:
Andrea Marongiu, Università di Bologna, IT, Contact Andrea Marongiu

This session focuses on accelerating three complex applications, all of which use novel combinations of memory and computing elements to achieve this goal. The first paper focuses on pattern matching and proposes resistive RAM (RRAM) to accelerate an Automata Processor (AP). The second focuses on Homomorphic Encryption (HE) and improves performance and energy consumption using near-data processing on a 3D-stacked memory (Hybrid Memory Cube). The third focuses on deep convolutional neural network (DCNN) inference and proposes a 3D-stacked neuromorphic architecture consisting of processing elements and multiple DRAM layers.

TimeLabelPresentation Title
Authors
14:307.2.1TIME-DIVISION MULTIPLEXING AUTOMATA PROCESSOR
Speaker:
Mottaqiallah Taouil, Delft University of Technology, NL
Authors:
Jintao Yu, Hoang Anh Du Nguyen, Muath Abu Lebdeh, Mottaqiallah Taouil and Said Hamdioui, Delft University of Technology, NL
Abstract
The Automata Processor (AP) is a special implementation of non-deterministic finite automata that performs pattern matching by exploring parallel state transitions. The implementation typically contains a hierarchical switching network, causing long latency. This paper proposes a methodology to split such a hierarchical switching network into multiple pipelined stages, making it possible to process several input sequences in parallel using time-division multiplexing. We use a new resistive-RAM-based AP (instead of the known DRAM- or SRAM-based designs) to illustrate the potential of our method. The experimental results show that our approach increases throughput by almost a factor of 2 at the cost of marginal area overhead.
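A software caricature of time-division multiplexing an automata engine (illustrative only; the paper's design is an RRAM pipeline, not software): one shared NFA transition function serves several input streams, each stream keeping its own active-state set between time slots.

```python
# One shared transition engine, time-multiplexed over several input streams.
def nfa_step(delta, states, symbol):
    """delta maps (state, symbol) -> set of next states."""
    nxt = set()
    for s in states:
        nxt |= delta.get((s, symbol), set())
    return nxt

def tdm_match(delta, start, accept, streams):
    """Interleave the streams through the engine, one symbol per time slot."""
    active = [{start} for _ in streams]        # per-stream active-state sets
    for i in range(max(len(s) for s in streams)):
        for k, stream in enumerate(streams):   # one time slot per stream
            if i < len(stream):
                active[k] = nfa_step(delta, active[k], stream[i])
    return [bool(a & accept) for a in active]
```

For the pattern "ab" (`delta = {(0, 'a'): {1}, (1, 'b'): {2}}`, start state 0, accept state 2), the call `tdm_match(delta, 0, {2}, ["ab", "aa"])` matches both inputs in one interleaved pass and yields `[True, False]`.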

Download Paper (PDF; Only available from the DATE venue WiFi)
15:007.2.2NEAR-DATA ACCELERATION OF PRIVACY-PRESERVING BIOMARKER SEARCH WITH 3D-STACKED MEMORY
Speaker:
Alvin Oliver Glova, University of California, Santa Barbara, US
Authors:
Alvin Oliver Glova, Itir Akgun, Shuangchen Li, Xing Hu and Yuan Xie, University of California, Santa Barbara, US
Abstract
Homomorphic encryption is a promising technology for enabling various privacy-preserving applications such as secure biomarker search. However, current implementations are not practical due to large performance overheads. A homomorphic encryption scheme has recently been proposed that allows bitwise comparison without the computationally-intensive multiplication and bootstrapping operations. Even so, this scheme still suffers from memory-bound performance bottleneck due to large ciphertext expansion. In this work, we propose HEGA, a near-data processing architecture that leverages this scheme with 3D-stacked memory to accelerate privacy-preserving biomarker search. We observe that homomorphic encryption-based search, like other emerging applications, can greatly benefit from the large throughput, capacity, and energy savings of 3D-stacked memory-based near-data processing architectures. Our near-data acceleration solution can speed up biomarker search by 6.3x with 5.7x energy savings compared to an 8-core Intel Xeon processor.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:307.2.3TOWARDS CROSS-PLATFORM INFERENCE ON EDGE DEVICES WITH EMERGING NEUROMORPHIC ARCHITECTURE
Speaker:
Yi Wang, Shenzhen University, CN
Authors:
Shangyu Wu1, Yi Wang1, Amelie Chi Zhou1, Rui Mao1, Zili Shao2 and Tao Li3
1Shenzhen University, CN; 2The Chinese University of Hong Kong, HK; 3University of Florida, US
Abstract
Deep convolutional neural networks have become the mainstream solution for many artificial intelligence applications. However, they are still rarely deployed on mobile or edge devices due to the cost of substantial data movement among limited resources. The emerging processing-in-memory neuromorphic architecture offers a promising direction to accelerate the inference process; the key issue becomes how to effectively allocate the inference processing between computing and storage resources for mobile edge computing. This paper presents Mobile-I, a resource allocation scheme to accelerate the Inference process on Mobile or edge devices. Mobile-I targets the emerging 3D neuromorphic architecture to reduce processing latency among computing resources and fully utilize the limited on-chip storage resources. We formulate the target problem as a resource allocation problem and use a software-based solution to offer cross-platform deployment across multiple mobile or edge devices. We conduct a set of experiments using realistic workloads generated from the Intel Movidius neural compute stick. Experimental results show that Mobile-I effectively reduces processing latency and improves the utilization of computing resources with negligible overhead compared to representative schemes.
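The flavour of such a resource allocation problem can be sketched as a greedy least-loaded placement of layers onto processing elements (purely illustrative; the abstract does not disclose Mobile-I's actual formulation, and the cost model below is invented).

```python
# Greedy least-loaded placement: each layer goes to the processing element
# (PE) with the smallest accumulated latency so far, approximating a
# balanced split of the inference work across the available PEs.
def allocate_layers(layer_costs, n_pes):
    load = [0.0] * n_pes
    placement = []
    for cost in layer_costs:
        pe = min(range(n_pes), key=lambda i: load[i])  # least-loaded PE
        load[pe] += cost
        placement.append(pe)
    return placement, max(load)  # per-layer placement, bottleneck latency
```

For example, `allocate_layers([4.0, 3.0, 2.0, 1.0], 2)` places the layers as `[0, 1, 1, 0]`, balancing both PEs at a bottleneck latency of 5.0.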

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00IP3-11, 445VISUAL INERTIAL ODOMETRY AT THE EDGE - A HARDWARE-SOFTWARE CO-DESIGN APPROACH FOR ULTRA-LOW LATENCY AND POWER
Speaker:
Dipan Kumar Mandal, Intel Corporation, IN
Authors:
Dipan Kumar Mandal1, Srivatsava Jandhyala1, Om J Omer1, Gurpreet S Kalsi1, Biji George1, Gopi Neela1, Santhosh Kumar Rethinagiri1, Sreenivas Subramoney1, Hong Wong2, Lance Hacking2 and Belliappa Kuttanna2
1Intel Corporation, IN; 2Intel Corporation, US
Abstract
Visual Inertial Odometry (VIO) is used for estimating the pose and trajectory of a system and is a foundational requirement in many emerging applications like AR/VR and autonomous navigation in cars, drones and robots. In this paper, we analyze key compute bottlenecks in VIO and present a highly optimized VIO accelerator based on a hardware-software co-design approach. We detail a set of novel micro-architectural techniques that optimize compute, data movement, bandwidth and dynamic power to make it possible to deliver high-quality VIO at the ultra-low latency and power required for budget-constrained edge devices. By offloading the computation of the critical linear algebra algorithms from the CPU, the accelerator enables high-sample-rate IMU usage in VIO processing, while acceleration of the image processing pipe increases precision and robustness and reduces IMU-induced drift in the final pose estimate. The proposed accelerator requires a small silicon footprint (1.3 mm2 in a 28nm process at 600 MHz), utilizes a modest on-chip shared SRAM (560KB) and achieves a 10x speedup over a software-only implementation in terms of image-sample-based pose update latency while consuming just 2.2 mW. In an FPGA implementation, using the EuRoC VIO dataset (VGA 30fps images and 100Hz IMU), the accelerator achieves pose estimation accuracy (loop closure error) comparable to a software-based VIO implementation.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:01IP3-12, 263CAPSACC: AN EFFICIENT HARDWARE ACCELERATOR FOR CAPSULENETS WITH DATA REUSE
Speaker:
Alberto Marchisio, Vienna University of Technology (TU Wien), AT
Authors:
Alberto Marchisio, Muhammad Abdullah Hanif and Muhammad Shafique, Vienna University of Technology (TU Wien), AT
Abstract
Recently, CapsuleNets have overtaken traditional Deep Convolutional Neural Networks (CNNs) because of their improved generalization ability due to multi-dimensional capsules, in contrast to single-dimensional neurons. Consequently, CapsuleNets also require extremely intense matrix computations, making it a gigantic challenge to achieve high performance. In this paper, we propose CapsAcc, the first specialized CMOS-based hardware architecture to perform CapsuleNets inference with high performance and energy efficiency. State-of-the-art CNN accelerators would not work efficiently for CapsuleNets, as their designs do not account for the unique processing nature of CapsuleNets, involving multi-dimensional matrix processing, squashing and dynamic routing. Our architecture exploits the massive parallelism by flexibly feeding the data to a specialized systolic array according to the operations required in different layers. It also avoids extensive load and store operations on the on-chip memory by reusing data when possible. We synthesized the complete CapsAcc architecture in a 32nm CMOS technology using Synopsys design tools, and evaluated it on the MNIST benchmark (as in the original CapsuleNet paper) to ensure consistent and fair comparisons. This work enables highly-efficient CapsuleNets inference on embedded platforms.
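One of the CapsuleNet-specific operations the abstract mentions is squashing, the non-linearity from Sabour et al.'s original CapsuleNet paper, for which a conventional CNN accelerator has no unit; a plain-Python version for a capsule given as a list of floats:

```python
# CapsuleNet squashing: scale vector v to length ||v||^2 / (1 + ||v||^2),
# keeping its direction, so long capsules saturate towards unit length and
# short ones shrink towards zero.
import math

def squash(v, eps=1e-12):
    norm2 = sum(x * x for x in v)                      # squared length
    scale = norm2 / (1.0 + norm2) / (math.sqrt(norm2) + eps)
    return [scale * x for x in v]
```

For example, `squash([3.0, 4.0])` keeps the 3:4 direction and has length 25/26 ≈ 0.96.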

Download Paper (PDF; Only available from the DATE venue WiFi)
16:02IP3-13, 576SDCNN: AN EFFICIENT SPARSE DECONVOLUTIONAL NEURAL NETWORK ACCELERATOR ON FPGA
Speaker:
Suk-Ju Kang, Sogang University, KR
Authors:
Jung-Woo Chang, Keon-Woo Kang and Suk-Ju Kang, Sogang University, KR
Abstract
Generative adversarial networks (GANs) have shown excellent performance in image generation applications. A GAN typically uses a new type of neural network called a deconvolutional neural network (DCNN). To implement DCNNs in hardware, the state-of-the-art DCNN accelerator optimizes the dataflow using a DCNN-to-CNN conversion method. However, this method still incurs high computational complexity because the number of feature maps increases when converting from DCNN to CNN. Recently, pruning has been recognized as an effective way to reduce this high computational complexity and the huge network model size. In this paper, we propose a novel sparse DCNN accelerator (SDCNN) combining these approaches on FPGA. First, we propose a novel dataflow suitable for sparse DCNN acceleration by loop transformation. Then, we introduce a four-stage pipeline for generating the SDCNN model. Finally, we propose an efficient architecture based on the SDCNN dataflow. Experimental results on DCGAN show that SDCNN achieves up to a 2.63x speedup over the state-of-the-art DCNN accelerator.
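In 1-D, the deconvolution-as-convolution view behind such conversions can be sketched as follows (illustrative only, and stated up to border padding): a stride-s transposed convolution can be computed by inserting s-1 zeros between inputs and then running an ordinary convolution, and the stuffed zeros are exactly the multiplications a sparse accelerator can skip.

```python
# 1-D illustration of why deconvolution inflates work when converted to
# convolution: zero-insertion enlarges the input, and most of the resulting
# multiply-accumulates are by the inserted zeros.
def zero_insert(x, stride):
    """Insert (stride - 1) zeros between consecutive input samples."""
    out = []
    for i, v in enumerate(x):
        out.append(v)
        if i < len(x) - 1:
            out.extend([0.0] * (stride - 1))
    return out

def conv1d_full(x, k):
    """Plain full 1-D convolution: every input taps every kernel weight."""
    y = [0.0] * (len(x) + len(k) - 1)
    for i, xi in enumerate(x):
        for j, kj in enumerate(k):
            y[i + j] += xi * kj
    return y
```

Here `zero_insert([1.0, 2.0, 3.0], 2)` gives `[1.0, 0.0, 2.0, 0.0, 3.0]`, and convolving that with `[1.0, 1.0]` yields `[1.0, 1.0, 2.0, 2.0, 3.0, 3.0]`; the zero positions contribute nothing to any output.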

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00 End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Breaks (Lunch Area)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the Lunch Area to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 26, 2019

Wednesday, March 27, 2019

Thursday, March 28, 2019


7.3 CPU and GPU Microarchitecture Dependability

Date: Wednesday, March 27, 2019
Time: 14:30 - 16:00
Location / Room: Room 3

Chair:
Michail Maniatakos, NYU Abu Dhabi, AE, Contact Michail Maniatakos

Co-Chair:
Nikolaos Foutris, University of Manchester, GB, Contact Nikos Foutris

This session first focuses on the dependability of out-of-order processors, specifically on the register-renaming sub-system and the L1 cache. It then analyzes the main requirements for enabling ISO26262 ASIL-D compliance for Commercial Off-The-Shelf (COTS) GPUs.

Time Label Presentation Title
Authors
14:30 7.3.1 ERROR-SHIELDED REGISTER RENAMING SUBSYSTEM FOR A DYNAMICALLY SCHEDULED OUT-OF-ORDER CORE
Authors:
Ron Gabor1, Yiannakis Sazeides2, Arkady Bramnik1, Alexandros Andreou2, Chrysostomos Nicopoulos2, Karyofyllis Patsidis3, Dimitris Konstantinou3 and Giorgos Dimitrakopoulos3
1Intel, IL; 2University of Cyprus, CY; 3Democritus University of Thrace, GR
Abstract
Emerging mission-critical and functional safety applications require high performance processors that meet strict reliability requirements against random hardware failures. These requirements touch even sub-systems within the core that, so far, may have been considered as low significance contributors to the processor failure rate. This paper identifies the register renaming sub-system of an out-of-order core as a prime example of where cost-efficient and non-intrusive protection can enable future processors to meet their reliability goals. We propose two hardware schemes that guard against failures in the register renaming sub-system of a core: a technique for the detection of random hardware errors in the physical register identifiers, and a method to recover from the detected errors.

Download Paper (PDF; Only available from the DATE venue WiFi)
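As an illustration of the detection idea only (the paper's actual schemes are more involved), a sketch of guarding physical register identifiers with an even-parity bit, which detects any single-bit flip in the stored tag:

```python
def parity(value, width):
    # even parity over `width` bits
    return bin(value & ((1 << width) - 1)).count("1") & 1

def encode_tag(preg_id, width=8):
    # store the physical register identifier together with its parity bit
    return (preg_id << 1) | parity(preg_id, width)

def check_tag(coded, width=8):
    # decode: returns (preg_id, error_detected); any single-bit flip in
    # either the identifier or the parity bit flips the check result
    preg_id, p = coded >> 1, coded & 1
    return preg_id, parity(preg_id, width) != p
```

Parity alone only detects; recovery, as in the paper, needs an additional mechanism.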
15:00 7.3.2 LAEC: LOOK-AHEAD ERROR CORRECTION CODES IN EMBEDDED PROCESSORS L1 DATA CACHE
Authors:
Pedro Benedicte1, Carles Hernandez2, Jaume Abella2 and Francisco Cazorla2
1Barcelona Supercomputing Center and Universitat Politècnica de Catalunya, ES; 2Barcelona Supercomputing Center, ES
Abstract
As implementation technology shrinks, the presence of errors in cache memories is becoming an increasing issue in all computing domains. Critical systems, e.g. space and automotive, are especially exposed and susceptible to reliability issues. Furthermore, hardware designs in these systems are migrating to multi-level cache multicore systems, in which write-through first-level data (DL1) caches have been shown to heavily harm average and guaranteed performance. While write-back DL1 caches solve this problem, they come with their own challenges: they need Error Correction Codes (ECC) to tolerate soft errors, but implementing DL1 ECC in simple embedded micro-controllers requires either complex hardware to squash instructions consuming erroneous data, or delayed delivery of data to correct potential errors, which impacts performance even if this process is pipelined. In this paper we present a low-complexity hardware mechanism to anticipate data fetch and error correction in DL1 so that (1) correct data is always delivered, while (2) additional delays are avoided in most cases. This achieves both high guaranteed performance and an effective solution against errors.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30 7.3.3 HIGH-INTEGRITY GPU DESIGNS FOR CRITICAL REAL-TIME AUTOMOTIVE SYSTEMS
Speaker:
Sergi Alcaide Portet, Universitat Politècnica de Catalunya - Barcelona Supercomputing Center, ES
Authors:
Sergi Alcaide1, Leonidas Kosmidis2, Carles Hernandez2 and Jaume Abella2
1Universitat Politècnica de Catalunya - Barcelona Supercomputing Center (BSC), ES; 2Barcelona Supercomputing Center, ES
Abstract
Autonomous Driving (AD) imposes the use of high-performance hardware, such as GPUs, to perform object recognition and tracking in real time. However, unlike in the consumer electronics market, critical real-time AD functionalities require a high degree of resilience against faults, in line with the requirements of the automotive ISO26262 functional safety standard. ISO26262 imposes the use of some source of independent redundancy for the most critical functionalities so that a single fault cannot lead to a failure, with dual-core lockstep (DCLS) with diversity being the preferred choice for computing devices. Unfortunately, GPUs do not support diverse DCLS by construction, thus failing to meet ISO26262 requirements efficiently. In this paper we propose lightweight modifications to GPUs to enable diverse DCLS for critical real-time applications without diminishing their performance for non-critical applications. In particular, we show how enabling specific mechanisms for software-controlled kernel scheduling in the GPU makes it possible to guarantee that redundant kernels are executed on different resources, so that a single fault cannot lead to a failure, as imposed by ISO26262. Our results on a GPU simulator and an NVIDIA GPU prove the viability of the approach and its effectiveness on the high-performance GPU designs needed for AD systems.

Download Paper (PDF; Only available from the DATE venue WiFi)
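The scheduling constraint at the heart of diverse DCLS can be sketched as follows. This is a toy model with hypothetical names, not the paper's GPU mechanism: the two redundant copies of each critical kernel are pinned to disjoint halves of the streaming multiprocessors (SMs), so a single faulty SM can never corrupt both copies.

```python
def schedule_diverse_dcls(critical_kernels, num_sms):
    # split the SMs into two disjoint groups and pin each kernel's two
    # redundant copies to different groups (software-controlled scheduling)
    half = num_sms // 2
    group_a, group_b = tuple(range(half)), tuple(range(half, num_sms))
    return {k: {k + "_r0": group_a, k + "_r1": group_b}
            for k in critical_kernels}
```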
16:00 IP3-14, 703 A FINE-GRAINED SOFT ERROR RESILIENT ARCHITECTURE UNDER POWER CONSIDERATIONS
Speaker:
Sajjad Hussain, Chair for Embedded Systems, KIT, Karlsruhe, DE
Authors:
Sajjad Hussain1, Muhammad Shafique2 and Joerg Henkel1
1Karlsruhe Institute of Technology, DE; 2Vienna University of Technology (TU Wien), AT
Abstract
Besides limited power budgets and the dark-silicon issue, soft errors are among the most critical reliability issues in computing systems fabricated using nano-scale devices. During execution, different applications have varying performance, power/energy consumption, and vulnerability properties. Different trade-offs can be devised to provide the required resiliency within the allowed power constraints. To exploit this behavior, we propose a novel soft-error-resilient architecture and the corresponding run-time system that enables power-aware fine-grained resiliency for different processor components. It selectively determines the reliability state of various components such that the overall application reliability is improved under a given power budget. Our architecture reduces power by up to 16% and reliability degradation by up to 11% compared to state-of-the-art techniques.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:01 IP3-15, 188 FINE-GRAINED HARDWARE MITIGATION FOR MULTIPLE LONG-DURATION TRANSIENTS ON VLIW FUNCTION UNITS
Speaker:
Angeliki Kritikakou, University of Rennes 1 - IRISA/INRIA, FR
Authors:
Rafail Psiakis1, Angeliki Kritikakou1 and Olivier Sentieys2
1Univ Rennes/IRISA/INRIA, FR; 2INRIA, FR
Abstract
Technology scaling makes hardware more susceptible to radiation, which can cause multiple transient faults of long duration. In these cases, the affected function unit is usually considered faulty and is not used further. To reduce this performance degradation, the proposed hardware mechanism detects the faults that are still active during execution and re-schedules the instructions to use the fault-free components of the affected function units. The results show mitigation of multiple long-duration faults with low performance, area, and power overhead.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00 End of session
Coffee Break in Exhibition Area





7.4 Low Power Design: From Highly-Optimized Power Delivery Networks to CNN Accelerators

Date: Wednesday, March 27, 2019
Time: 14:30 - 16:00
Location / Room: Room 4

Chair:
Pascal Vivet, CEA-Leti, FR, Contact Pascal Vivet

Co-Chair:
Giuseppe Tagliavini, Università di Bologna, IT, Contact Giuseppe Tagliavini

The session presents four papers covering power optimization through the whole design stack. The first paper presents an innovative power-mesh IR optimization in an advanced technology node. The second paper presents a new formulation and optimization strategy to get the best efficiency from on-chip switched-capacitor converters for heterogeneous many-cores. The third paper presents an optimized power delivery network for 3D integrated systems. Finally, the fourth paper presents a power-efficient accelerator based on associative CAMs for CNN inference.

Time Label Presentation Title
Authors
14:30 7.4.1 DETAILED PLACEMENT FOR IR DROP MITIGATION BY POWER STAPLE INSERTION IN SUB-10NM VLSI
Speaker:
Minsoo Kim, UC San Diego, US
Authors:
Sun ik Heo1, Andrew Kahng2, Minsoo Kim3, Lutong Wang3 and Chutong Yang3
1Samsung Electronics Co., Ltd., KR; 2University of California San Diego, US; 3UC San Diego, US
Abstract
Power Delivery Network (PDN) is one of the most challenging topics in modern VLSI design. Due to aggressive technology node scaling, resistance of back-end-of-line (BEOL) layers increases dramatically in sub-10nm VLSI, causing high supply voltage (IR) drop. To solve this problem, pre-placed or post-placed power staples are inserted in pin-access layers to connect adjacent power rails and reduce PDN resistance, at the cost of reduced routing flexibility, or reduced power staple insertion opportunity. In this work, we propose dynamic programming-based single-row and double-row detailed placement optimizations to maximize the power staple insertion in a post-placement flow. We further propose metaheuristics to improve the quality of result. Compared to the traditional post-placement flow, we achieve up to 13.2% (10mV) reduction in IR drop, with almost no WNS degradation.

Download Paper (PDF; Only available from the DATE venue WiFi)
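To give the flavor of a single-row detailed-placement DP, a simplified sketch of ours (not the paper's formulation): cells, kept in order, may each shift by at most `max_disp` sites, and a power staple can be inserted at any uncovered site listed in `cand` (e.g. sites where the adjacent power rails have a free pin-access track).

```python
from functools import lru_cache

def max_staples(widths, orig, cand, num_sites, max_disp):
    """DP over cells: maximize staple-insertable free candidate sites."""
    cand = set(cand)
    n = len(widths)

    def free(a, b):
        # candidate sites left uncovered in the half-open range [a, b)
        return sum(1 for s in range(a, b) if s in cand)

    @lru_cache(None)
    def dp(i, p):
        # first i cells placed; sites < p are already decided
        if i == n:
            return free(p, num_sites)
        best = -1  # -1 marks an infeasible placement
        lo = max(p, orig[i] - max_disp)
        for x in range(lo, orig[i] + max_disp + 1):
            if x + widths[i] > num_sites:
                break
            sub = dp(i + 1, x + widths[i])
            if sub >= 0:
                best = max(best, free(p, x) + sub)
        return best

    return dp(0, 0)
```

Here displacing cell 0 left of its original site frees up candidate sites at both row ends.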
15:00 7.4.2 OPTIMIZING THE ENERGY EFFICIENCY OF POWER SUPPLY IN HETEROGENEOUS MULTICORE CHIPS WITH INTEGRATED SWITCHED-CAPACITOR CONVERTERS
Speaker:
Lu Wang, ShanghaiTech University, CN
Authors:
Lu Wang1, Leilei Wang1, Dejia Shang1, Cheng Zhuo2 and Pingqiang Zhou1
1ShanghaiTech University, CN; 2Zhejiang University, CN
Abstract
Energy efficiency is a major concern in heterogeneous multi-core chips. Because switched-capacitor converters (SCCs) offer a wide output voltage range and high potential efficiency, they are widely used in multi-core chips. In this paper we propose to optimize the Metal-Insulator-Metal (MIM) capacitance resource allocation and converter ratio selection for SCCs to improve the power efficiency, by transforming the mixed-integer nonlinear programming (MINLP) problems into a series of convex problems. The experimental results show that our approach can achieve a 9%-13% improvement in power efficiency and can be applied to more complicated heterogeneous multicore scenarios.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30 7.4.3 POWER DELIVERY PATHFINDING FOR EMERGING DIE-TO-WAFER INTEGRATION TECHNOLOGY
Speaker:
Seungwon Kim, Ulsan National Institute of Science and Technology, KR
Authors:
Andrew B. Kahng1, Seokhyeong Kang2, Seungwon Kim3, Kambiz Samadi4 and Bangqi Xu1
1UC San Diego, US; 2Pohang University of Science and Technology, KR; 3Ulsan National Institute of Science and Technology (UNIST), KR; 4Qualcomm Technologies, Inc., US
Abstract
In advanced technology nodes, emerging die-to-wafer (D2W) integration technology is a promising "More Than Moore" lever for continued scaling of system capability and value. In D2W 3D IC implementation, the power delivery network (PDN) is crucial to meeting design specifications. However, determining the optimal PDN design is nontrivial. On the one hand, to meet the IR drop requirement, a denser power mesh is desired. On the other hand, to meet the timing requirement for a high-utilization design, more routing resources should be available for signal routing. Moreover, additional competition between signal routing and power routing is caused by inter-tier vertical interconnects in 3D ICs. In this paper, we propose a power delivery pathfinding methodology for emerging die-to-wafer integration, which seeks to identify an optimal or near-optimal PDN for a given design and PDN specification. Our pathfinding methodology exploits models for routability and worst IR drop, which helps reduce iterations between PDN design and circuit design in 3D IC implementation. We present validations with real design examples and a 28nm foundry technology.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:45 7.4.4 ENERGY-EFFICIENT CONVOLUTIONAL NEURAL NETWORKS VIA RECURRENT DATA REUSE
Speaker:
Luca Mocerino, Politecnico di Torino, IT
Authors:
Luca Mocerino, Valerio Tenace and Andrea Calimera, Politecnico di Torino, IT
Abstract
Deep learning (DL) algorithms have substantially improved in terms of accuracy and efficiency. Convolutional Neural Networks (CNNs) are now able to outperform traditional algorithms in computer vision tasks such as object classification, detection, recognition, and image segmentation. They represent an attractive solution for many embedded applications which may take advantage of machine learning at the edge. Needless to say, the key to success lies in the availability of efficient hardware implementations which meet the stringent design constraints. Inspired by the way human brains process information, this paper presents a method that improves the processing efficiency of CNNs by leveraging their repetitiveness. More specifically, we introduce (i) a clustering methodology that maximizes weights/activation reuse, and (ii) the design of a heterogeneous processing element which integrates a Floating-Point Unit (FPU) with an associative memory that manages recurrent patterns. The experimental analysis reveals that the proposed method achieves substantial energy savings with low accuracy loss, thus providing a practical design option that might find application in the growing segment of edge computing.

Download Paper (PDF; Only available from the DATE venue WiFi)
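The reuse idea can be sketched with a toy 1-D example of ours: weights are clustered to a small codebook, and an associative-memory-style cache memoizes each (input, centroid) product so that recurring patterns bypass the FPU.

```python
def build_codebook(weights, k):
    # naive 1-D clustering (k >= 2): round each weight to the nearest of
    # k evenly spaced centroids; a stand-in for the paper's clustering step
    lo, hi = min(weights), max(weights)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    idx = [min(range(k), key=lambda c: abs(w - centroids[c])) for w in weights]
    return centroids, idx

def dot_with_reuse(xs, centroids, idx):
    # associative-memory-style reuse: memoize each (input, centroid) product
    # and count how many multiplications the cache saved
    cache, hits, acc = {}, 0, 0.0
    for x, c in zip(xs, idx):
        key = (x, c)
        if key in cache:
            hits += 1
        else:
            cache[key] = x * centroids[c]
        acc += cache[key]
    return acc, hits
```

The hit count grows with the repetitiveness of the weight/activation patterns, which is exactly what the paper exploits for energy savings.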
16:00 IP3-16, 717 ADAPTIVE WORD REORDERING FOR LOW-POWER INTER-CHIP COMMUNICATION
Speaker:
Eleni Maragkoudaki, University of Manchester, GB
Authors:
Eleni Maragkoudaki1, Przemyslaw Mroszczyk2 and Vasilis Pavlidis3
1University of Manchester, GB; 2Qualcomm, IE; 3The University of Manchester, GB
Abstract
The energy for data transfer has an increasing effect on the total system energy as technology scales, often overtaking computation energy. To reduce the power of inter-chip interconnects, an adaptive encoding scheme called Adaptive Word Reordering (AWR) is proposed that effectively decreases the number of signal transitions, leading to a significant power reduction. AWR outperforms other adaptive encoding schemes in terms of decrease in transitions, yielding up to 73% reduction in switching. Furthermore, complex bit transition computations are represented as delays in the time domain to limit the power overhead due to encoding. The saved power outweighs the overhead beyond a moderate wire length where the I/O voltage is assumed equal to the core voltage. For a typical I/O voltage, the decrease in power is significant reaching 23% at just 1 mm.

Download Paper (PDF; Only available from the DATE venue WiFi)
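The objective AWR optimizes, bit transitions between consecutive words on the bus, can be illustrated with a toy reordering heuristic (a greedy nearest-neighbour sketch of ours, not the paper's encoder):

```python
def transitions(seq):
    # total number of bus-line toggles when words are sent in this order
    return sum(bin(a ^ b).count("1") for a, b in zip(seq, seq[1:]))

def reorder_greedy(words, start):
    # greedy nearest-neighbour ordering: always send next the pending word
    # with the fewest bit flips relative to the last word on the bus
    pending, order, last = list(words), [], start
    while pending:
        nxt = min(pending, key=lambda w: bin(last ^ w).count("1"))
        pending.remove(nxt)
        order.append(nxt)
        last = nxt
    return order
```

In the test below the reordered sequence needs 4 toggles where the original order needs 11.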
16:00 End of session
Coffee Break in Exhibition Area





7.5 Reliable and Persistent: From Cache to File System

Date: Wednesday, March 27, 2019
Time: 14:30 - 16:00
Location / Room: Room 5

Chair:
Chengmo Yang, University of Delaware, US, Contact Chengmo Yang

Co-Chair:
Alexandre Levisse, EPFL - ESL, CH, Contact Alexandre Levisse

This session integrates both hardware and software optimizations aiming at enhancing reliability and performance of non-volatile caches and main memory. The first paper proposes a novel cache design to completely eliminate the accumulation of read disturbances in STT-MRAM without compromising cache performance. The second paper makes adaptive page migration decisions between DRAM and NVRAM as workload changes and hot/cold pattern varies. The third paper aims to reduce write amplification caused by frequently-updated inodes in journaling file systems, while maintaining crash consistency using persistent memory.

TimeLabelPresentation Title
Authors
14:30 7.5.1 ENHANCING RELIABILITY OF STT-MRAM CACHES BY ELIMINATING READ DISTURBANCE ACCUMULATION
Speaker:
Hossein Asadi, Sharif University of Technology, IR
Authors:
Elham Cheshmikhani1, Hamed Farbeh2 and Hossein Asadi1
1Sharif University of Technology, IR; 2Amirkabir University of Technology, IR
Abstract
Spin-Transfer Torque Magnetic RAM (STT-MRAM), as one of the most promising replacements for SRAMs in on-chip cache memories, benefits from higher density and scalability, near-zero leakage power, and non-volatility, but its reliability is threatened by a high read disturbance error rate. Error-Correcting Codes (ECCs) are conventionally suggested to overcome the read disturbance errors in STT-MRAM caches. By employing aggressive ECCs and checking a cache block on every read access, a high level of cache reliability is achieved. However, to minimize the cache access time in modern processors, all blocks in the target cache set are simultaneously read in parallel for the tag comparison operation, and only the requested block, if any, is sent out after checking its ECC. These extra cache block reads, whose ECCs are not checked until the processor requests the blocks, cause the accumulation of read disturbance errors, which significantly degrades the cache reliability. In this paper, we first introduce and formulate the read disturbance accumulation phenomenon and reveal that this accumulation, due to conventional parallel accesses of cache blocks, significantly increases the cache error rate. Then, we propose a simple yet effective scheme, the so-called Read Error Accumulation Preventer cache (REAP-cache), to completely eliminate the accumulation of read disturbances without compromising the cache performance. Our evaluations show that the proposed REAP-cache extends the cache Mean Time To Failure (MTTF) by 171x, while increasing the cache area by less than 1% and energy consumption by only 2.7%.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:00 7.5.2 UIMIGRATE: ADAPTIVE DATA MIGRATION FOR HYBRID NON-VOLATILE MEMORY SYSTEMS
Speaker:
Duo Liu, College of Computer Science, Chongqing University, CN
Authors:
Yujuan Tan1, Baiping Wang1, Zhichao Yan2, Qiuwei Deng1, Xianzhang Chen1 and Duo Liu1
1Chongqing University, CN; 2University of Texas at Arlington, US
Abstract
Byte-addressable, non-volatile memory (NVRAM) combines the benefits of DRAM and flash memory. Its slower speed compared to DRAM, however, makes it hard to entirely replace DRAM with NVRAM. Hybrid NVRAM systems that equip both DRAM and NVRAM on the memory bus become a better solution: frequently accessed, hot pages can be stored in DRAM while other cold pages can reside in NVRAM. This way, the system gets the benefits of both high performance (from DRAM) and lower power consumption and cost (from NVRAM). Realizing an efficient hybrid NVRAM system requires careful page migration and accurate data temperature measurement. Existing solutions, however, often cause invalid migrations due to inaccurate data temperature accounting, because hot and cold pages are separately identified in the DRAM and NVRAM regions. Based on this observation, we propose UIMigrate, an adaptive data migration approach for hybrid NVRAM systems. The key idea is to consider data temperature across the whole DRAM-NVRAM space when determining whether a page should be migrated between DRAM and NVRAM. In addition, UIMigrate adapts to workload changes by dynamically adjusting its migration decisions as the hot/cold pattern varies. Our experiments using SPEC 2006 show that UIMigrate can reduce the number of migrations and improve performance by up to 90.4% compared to existing state-of-the-art approaches.

Download Paper (PDF; Only available from the DATE venue WiFi)
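The global hot/cold ranking can be sketched as follows (a simplification of ours, not UIMigrate's actual policy): an NVRAM page is promoted only if it is strictly hotter than the coldest DRAM-resident page, so that ranking temperatures across the whole DRAM-NVRAM space suppresses invalid migrations.

```python
def plan_migrations(dram, nvram, counts):
    # counts: access count (data temperature) per page id.
    # Swap pages until no NVRAM page is hotter than the coldest DRAM page.
    moves = []
    dram, nvram = set(dram), set(nvram)
    while True:
        coldest = min(dram, key=lambda p: counts[p])
        hottest = max(nvram, key=lambda p: counts[p])
        if counts[hottest] <= counts[coldest]:
            break  # globally consistent: nothing worth migrating
        dram.remove(coldest); nvram.add(coldest)
        nvram.remove(hottest); dram.add(hottest)
        moves.append((hottest, coldest))
    return moves, dram
```

A policy that ranked DRAM and NVRAM separately could keep swapping here; the global view stops after the single useful migration.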
15:30 7.5.3 REDUCING WRITE AMPLIFICATION FOR INODES OF JOURNALING FILE SYSTEMS USING PERSISTENT MEMORY
Speaker:
Xianzhang Chen, Chongqing University, CN
Authors:
Chaoshu Yang, Duo Liu, Xianzhang Chen, Runyu Zhang, Wenbin Wang, Moming Duan and Yujuan Tan, Chongqing University, CN
Abstract
Conventional journaling file systems, such as Ext4, guarantee data consistency by writing in-memory dirty inodes to block devices twice. The write-back of inodes may contain up to 90% clean inodes that do not need to be written back, which causes a severe write-amplification problem and largely reduces performance, since the size of an inode is several times smaller than the size of a basic unit for updating the block device. Emerging persistent memories (PMs), such as STT-RAM, make it possible to store the mapping of inodes in memory persistently. In this paper, we propose an efficient scheme, Updating Frequency based Inode Aggregation (UFIA), to reduce the write amplification of dirty inodes using PM. The main idea of UFIA is to identify the frequently-updated inodes and reorganize them in adjacent physical locations on the block device. First, UFIA adopts NVM as an inode mapping table for remapping logical inodes to any physical inodes. Second, we design an efficient algorithm for UFIA to identify and reorganize the frequently-updated inodes. We implement UFIA and integrate it into Ext4 (denoted by UFIA-Ext4) in Linux kernel 4.4.4. The experiments are conducted with the widely-used benchmark Filebench. Compared with original Ext4, the experimental results show that UFIA significantly reduces the write amplification of inodes and improves performance by 40% on average.

Download Paper (PDF; Only available from the DATE venue WiFi)
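The effect of remapping frequently-updated inodes into adjacent slots can be sketched as follows (the constants and names are illustrative, not UFIA's on-disk layout): after remapping, the hot inodes share one device block, so one block write covers all of them.

```python
INODES_PER_BLOCK = 4  # illustrative: inodes packed per device block

def remap_hot_inodes(update_counts, threshold):
    # logical inodes updated at least `threshold` times are remapped to
    # consecutive physical slots; the mapping table itself would live in
    # persistent memory so it survives crashes
    hot = sorted((i for i, c in update_counts.items() if c >= threshold),
                 key=lambda i: -update_counts[i])
    cold = sorted(i for i in update_counts if i not in set(hot))
    return {ino: slot for slot, ino in enumerate(hot + cold)}

def dirty_blocks(mapping, dirty_inodes):
    # device blocks that must be written back for a set of dirty inodes
    return {mapping[i] // INODES_PER_BLOCK for i in dirty_inodes}
```

With the identity mapping the same four hot inodes would dirty four separate blocks; after remapping they dirty one.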
16:00 End of session
Coffee Break in Exhibition Area





7.6 Optimization of Smart Energy Systems

Date: Wednesday, March 27, 2019
Time: 14:30 - 16:00
Location / Room: Room 6

Chair:
Davide Quaglia, University of Verona, IT, Contact Davide Quaglia

Co-Chair:
Massimo Poncino, Politecnico di Torino, IT, Contact Massimo Poncino

In this session, three approaches to optimizing smart grid and photovoltaic systems are presented, targeting cost, efficiency, and privacy.

Time Label Presentation Title
Authors
14:30 7.6.1 COST/PRIVACY CO-OPTIMIZATION IN SMART ENERGY GRIDS
Speaker:
Alma Proebstl, TUM, DE
Authors:
Alma Proebstl, Sangyoung Park, Sebastian Steinhorst and Samarjit Chakraborty, TUM, DE
Abstract
The smart grid features real-time monitoring of electricity usage, so that it can control the generation and distribution of electricity as well as utilize dynamic pricing in response to demand. Smart metering systems continuously monitor the electricity usage of customers and report it back to the Utility Provider (UP). This raises privacy concerns regarding the undesired exposure of human activity and time-of-use of home appliances. Photovoltaics (PV) and residential Electrical Energy Storage (EES) have proven to be effective in mitigating these privacy concerns. However, this comes at several costs: the installation of PV and EES, its subsequent aging, and the possibly increased electricity cost. We quantify the trade-off between privacy exposure and financial costs by formulating a stochastic dynamic programming problem. Our analysis shows that i) there is a quantifiable trade-off between financial cost and privacy leakage, ii) proper control of the system is crucial for both metrics, iii) a strategy solely focusing on privacy results in high financial costs, and iv) for a typical residential setting, the costs for a trade-off solution lie in the range of 600 US$-1700 US$. Since load flattening (also known as peak shaving) benefits both parties, we propose to split the costs between the UP and the user.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:00 7.6.2 A LOW-COMPLEXITY FRAMEWORK FOR DISTRIBUTED ENERGY MARKET TARGETING SMART-GRID
Speaker:
Kostas Siozios, Dept. of Physics, Aristotle University of Thessaloniki, GR
Authors:
Kostas Siozios and Stylianos Siskos, Department of Physics, Aristotle University of Thessaloniki, GR
Abstract
With the increasing connection of distributed energy resources, traditional energy consumers are becoming prosumers, who can both dissipate and generate energy in a smart-grid environment. This enables the wide adoption of a dynamic pricing environment, where demand and price forecasting are applied for determining prices and scheduling loads. In this paper we propose a Peer-to-Peer (P2P) platform, as well as a lightweight decision-making mechanism based on game theory, to support energy trading. Experimental results based on real data validate the efficiency of the proposed framework, as it achieves a considerable reduction in energy cost (87% on average) compared to the corresponding cost from the main grid.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30 7.6.3 IRRADIANCE-DRIVEN PARTIAL RECONFIGURATION OF PV PANELS
Speaker:
Enrico Macii, Politecnico di Torino, IT
Authors:
Daniele Jahier Pagliari, Sara Vinco, Enrico Macii and Massimo Poncino, Politecnico di Torino, IT
Abstract
Adaptive reconfiguration of a photo-voltaic (PV) panel by means of a switch network is a well-known approach to tackle shading issues dynamically and at a reasonable cost. However, most of these approaches assume that the entire panel is reconfigurable, resulting in high installation costs due to the large wiring overhead required by this solution. In this work we propose an architecture in which only a portion of the panel is reconfigurable, while minimizing the loss in extracted power with respect to a fully reconfigurable solution. The key feature of our approach is the use of environmental (irradiance and temperature) data to determine the reconfigurable subset at design time. Simulation results show that, by reconfiguring only about 70% of the panel, it is possible to achieve a 20-45% power increase with respect to a static topology, while losing less than 1-5% power with respect to full reconfiguration.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:01 IP3-17, 358 MACHINE-LEARNING-DRIVEN MATRIX ORDERING FOR POWER GRID ANALYSIS
Speaker:
Wenjian Yu, Tsinghua University, CN
Authors:
Ganqu Cui1, Wenjian Yu1, Xin Li2, Zhiyu Zeng3 and Ben Gu3
1Tsinghua University, CN; 2Duke University, US; 3Cadence Design Systems, Inc., US
Abstract
A machine-learning-driven approach for matrix ordering is proposed for power grid analysis based on domain decomposition. It utilizes support vector machine or artificial neural network to learn a classifier to automatically choose the optimal ordering algorithm, thereby reducing the expense of solving the subdomain equations. Based on the feature selection considering sparse matrix properties, the proposed method achieves superior efficiency in runtime and memory usage over conventional methods, as demonstrated by industrial test cases.

Download Paper (PDF; Only available from the DATE venue WiFi)
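The feature-based selection step can be sketched as follows. A linear classifier stands in for the paper's SVM/ANN, and the features and weights are illustrative, not the paper's feature set:

```python
def features(rows, cols, n):
    # sparse-matrix properties in COO form: size, density, mean bandwidth
    nnz = len(rows)
    density = nnz / (n * n)
    bandwidth = sum(abs(r - c) for r, c in zip(rows, cols)) / nnz
    return [n, density, bandwidth]

def choose_ordering(feat, w, b, labels=("AMD", "nested-dissection")):
    # a trained linear classifier would supply w and b; the sign of
    # w.x + b picks which fill-reducing ordering to run on the subdomain
    score = sum(wi * xi for wi, xi in zip(w, feat)) + b
    return labels[0] if score >= 0 else labels[1]
```

In the full approach the classifier is trained offline on matrices labeled with their fastest ordering, so the per-subdomain choice costs only a feature extraction and a dot product.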
16:00 End of session
Coffee Break in Exhibition Area





7.7 Toward Correct and Secure Embedded Systems

Date: Wednesday, March 27, 2019
Time: 14:30 - 16:00
Location / Room: Room 7

Chair:
Todd Austin, University of Michigan, US, Contact Todd Austin

Co-Chair:
Ylies Falcone, University Grenoble Alpes, FR, Contact Ylies Falcone

This session will explore novel techniques for developing embedded systems with strong assurances of correctness and security. The correctness topics explored include verification of execution deadlines and correctness enforcement in the field. The security topics to be explored include efficient behavioral analysis of malware, improved protection of machine learning algorithms from adversarial attacks, and zero-footprint hardware Trojan attacks.

Time Label Presentation Title
Authors
14:30 7.7.1 BETTER LATE THAN NEVER: VERIFICATION OF EMBEDDED SYSTEMS AFTER DEPLOYMENT
Authors:
Martin Ring1, Fritjof Bornebusch1, Christoph Lüth1, Robert Wille2 and Rolf Drechsler1
1University of Bremen, DE; 2Johannes Kepler University Linz, AT
Abstract
This paper investigates the benefits of verifying embedded systems after deployment. We argue that one reason for the huge state spaces of contemporary embedded and cyber-physical systems is the large variety of operating contexts, which are unknown during design. Once the system is deployed, these contexts become observable, confining several variables. By this, the search space is dramatically reduced, making verification possible even on the limited resources of a deployed system. In this paper, we propose a design and verification flow which exploits this observation. We show how specifications are transferred to the deployed system and verified there. Evaluations on a number of case studies demonstrate the reduction of the search space, and we sketch how the proposed approach can be employed in practice.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:00 7.7.2 EFFICIENT COMPUTATION OF DEADLINE-MISS PROBABILITY AND POTENTIAL PITFALLS
Speaker:
Kuan-Hsun Chen, TU Dortmund, DE
Authors:
Kuan-Hsun Chen, Niklas Ueter, Georg von der Brüggen and Jian-Jia Chen, Technical University of Dortmund, DE
Abstract
In soft real-time systems, applications can tolerate rare deadline misses. Therefore, probabilistic arguments and analyses are applicable in the timing analyses for this class of systems, as demonstrated by much existing research. Convolution-based analyses can derive tight deadline-miss probabilities, but suffer from high time complexity. Among the analytical approaches, which result in a significantly faster runtime than the convolution-based approaches, the Chernoff bounds provide the tightest results. In this paper, we show that calculating the deadline-miss probability using Chernoff bounds can be solved by considering an equivalent convex optimization problem. This allows us to, on the one hand, decrease the runtime of the Chernoff bounds while, on the other hand, ensuring a tighter approximation, since a larger variable space can be searched more efficiently, i.e., by using binary search techniques over a larger area instead of a sequential search over a smaller area. We evaluate this approach on synthesized task sets. Our approach is shown to be computationally efficient for large task systems, while experimentally suggesting reasonable approximation quality compared to an exact analysis.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:157.7.3FADEML: UNDERSTANDING THE IMPACT OF PRE-PROCESSING NOISE FILTERING ON ADVERSARIAL MACHINE LEARNING
Speaker:
Faiq Khalid, Department of Computer Engineering, TU Wien, AT
Authors:
Faiq Khalid1, Muhammad Abdullah Hanif1, Semeen Rehman1, Junaid Qadir2 and Muhammad Shafique1
1Vienna University of Technology (TU Wien), AT; 2Information Technology University, Lahore, PK
Abstract
Deep neural network (DNN)-based machine learning (ML) algorithms have recently emerged as the leading ML paradigm, particularly for classification, due to their superior capability of learning efficiently from large datasets. The discovery of a number of well-known attacks such as dataset poisoning, adversarial examples, and network manipulation (through the addition of malicious nodes) has, however, put the spotlight squarely on the lack of security in DNN-based ML systems. In particular, malicious actors can use these well-known attacks to cause random/targeted misclassification, or to change the prediction confidence, by only slightly but systematically manipulating the environmental parameters, the inference data, or the data acquisition block. Most prior adversarial attacks have, however, not accounted for the pre-processing noise filters commonly integrated with the ML-inference module. Our contribution in this work is to show that this is a major omission, since these noise filters can render ineffective the majority of the existing attacks, which rely essentially on introducing adversarial noise. Apart from this, we also extend the state of the art by proposing a novel pre-processing-noise-filter-aware adversarial ML attack called FAdeML. To demonstrate the effectiveness of the proposed methodology, we generate an adversarial attack image by exploiting the "VGGNet" DNN trained on the "German Traffic Sign Recognition Benchmarks (GTSRB)" dataset, which, despite having no visual noise, can cause a classifier to misclassify even in the presence of pre-processing noise filters. We will make all the contributions open-source at: http://LinkHiddenForBlindReview to facilitate reproducible research and further R&D.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:307.7.4REAL-TIME ANOMALOUS BRANCH BEHAVIOR INFERENCE WITH A GPU-INSPIRED ENGINE FOR MACHINE LEARNING MODELS
Speaker:
Hyunyoung Oh, Seoul National University, KR
Authors:
Hyunyoung Oh1, Hayoon Yi1, Hyeokjun Choe1, Yeongpil Cho2, Sungroh Yoon1 and Yunheung Paek1
1Seoul National University, KR; 2Soongsil University, KR
Abstract
As the age of IoT approaches, the importance of security for embedded devices is continuously increasing. Since attacks on these devices are likely to occur at any time and in unexpected ways, defense systems based on a fixed set of rules will easily be subverted by such unexpected, unknown attacks. Learning-based anomaly detection is a promising technique that may prevent new, unknown zero-day attacks by leveraging the capability of machine learning (ML) to learn the intricate true nature of software hidden within raw information. This paper introduces our recent work to develop an MPSoC platform, called RTAD, which can efficiently support in hardware various ML models that run to detect anomalous behaviors in real time. In our work, we assume that the ML models are trained with runtime branch information, since it is widely accepted that a sequence of branches serves as a record of control-flow transfers during program execution. In fact, numerous studies examine various types of branches, including system calls and general branches, in order to infer (or detect) anomalies in branch behavior that may be induced by diverse attacks. Our goal of real-time anomalous branch behavior inference poses two challenges for the development of RTAD. One is that RTAD must collect and transfer in a timely fashion a sequence of branches as the input to the ML model. The other is that RTAD must be able to promptly process the delivered branch data with the ML model. To tackle these challenges, we have implemented in RTAD two core hardware components: an input generation module and a GPU-inspired ML processing engine. According to our empirical results, RTAD enables various ML models to infer anomaly instantly after the victim program behaves aberrantly as a result of attacks injected into the system.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:457.7.5TROJANZERO: SWITCHING ACTIVITY-AWARE DESIGN OF UNDETECTABLE HARDWARE TROJANS WITH ZERO POWER AND AREA FOOTPRINT
Speaker:
Imran Abbasi, NUST, PK
Authors:
Imran Abbasi1, Faiq Khalid2, Semeen Rehman2, Awais Kamboh1, Axel Jantsch2, Siddharth Garg3 and Muhammad Shafique2
1NUST, PK; 2Vienna University of Technology (TU Wien), AT; 3University of Waterloo, CA
Abstract
Conventional Hardware Trojan (HT) detection techniques are based on the validation of integrated circuits to determine changes in their functionality, and on non-invasive side-channel analysis to identify variations in their physical parameters. In particular, almost all proposed side-channel power-based detection techniques presume that HTs are detectable because they only add gates to the original circuit, with a noticeable increase in power consumption. This paper demonstrates how undetectable HTs can be realized with zero impact on the power and area footprint of the original circuit. Towards this, we propose the novel concept of TrojanZero and a systematic methodology for designing undetectable HTs that conceal their existence through gate-level modifications. The crux is to salvage the cost of the HT from the original circuit without being detected by standard testing techniques. Our methodology leverages knowledge of the transition probabilities of the circuit nodes to identify and safely remove expendable gates, and embeds the malicious circuitry at appropriate locations with zero power and area overheads compared to the original circuit. We synthesize these designs, embed them in multiple ISCAS85 benchmarks using a 65nm technology library, and perform a comprehensive power and area characterization. Our experimental results demonstrate that the proposed TrojanZero designs are undetectable by state-of-the-art power-based detection methods.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00IP3-18, 414ASSERTION-BASED VERIFICATION THROUGH BINARY INSTRUMENTATION
Speaker:
Laurence Pierre, Univ. Grenoble Alpes, FR
Authors:
Enzo Brignon and Laurence Pierre, TIMA Lab (Univ. Grenoble Alpes, CNRS, Grenoble INP), FR
Abstract
Verifying the correctness and reliability of C or C++ embedded software is a crucial issue. To support this verification process, we advocate runtime assertion-based verification of formal properties. Such logic and temporal properties can be specified using the IEEE standard PSL (Property Specification Language) and automatically translated into software assertion checkers. A major issue is the instrumentation of the embedded program so that these assertion checkers are triggered upon specific events during execution. This paper presents an automatic instrumentation solution for object files, which enables such event-driven property evaluation. It also reports experimental results for different kinds of applications and properties.
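The kind of checker such instrumentation triggers can be pictured as a small state machine fed by runtime events. The following monitor for a PSL-style "always (req -> next ack)" property is a deliberately minimal sketch, not one of the paper's generated checkers:

```python
class AlwaysNextMonitor:
    """Toy runtime checker for 'always (req -> next ack)':
    whenever req is observed, ack must hold at the next event.
    Purely illustrative; checkers generated from PSL are far richer."""

    def __init__(self):
        self.pending = False  # a req is awaiting its ack
        self.failed = False   # property violated at some point

    def step(self, req, ack):
        # Check the obligation left by the previous event first.
        if self.pending and not ack:
            self.failed = True
        # Record whether this event creates a new obligation.
        self.pending = bool(req)

# A trace whose final event breaks the property: req at step 3, no ack at step 4.
m = AlwaysNextMonitor()
for req, ack in [(1, 0), (0, 1), (1, 0), (0, 0)]:
    m.step(req, ack)
```

Instrumentation of the object file amounts to inserting the calls to `step` at the program points where `req` and `ack` are produced, so the monitor advances in lockstep with execution.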

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Breaks (Lunch Area)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the Lunch Area to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 26, 2019

Wednesday, March 27, 2019

Thursday, March 28, 2019


7.8 Inspiring futures! Careers Session @ DATE (part 1)

Date: Wednesday, March 27, 2019
Time: 14:30 - 16:00
Location / Room: Exhibition Theatre

Organisers:
Luca Fanucci, University of Pisa, IT, Contact Luca Fanucci
Rossano Massai, University of Pisa, IT, Contact Rossano Massai
Xavier Salazar, Barcelona Supercomputing Center, ES, Contact Xavier Salazar

Moderator:
Sergio Saponara, University of Pisa, IT, Contact Sergio Saponara

This session (registration: link) aims to bring together recruiters - mostly companies large and small, as well as universities and research centres - with potential jobseekers in the technology areas covered by DATE and HiPEAC, including:

  • computer science and engineering undergraduate and master students in their final year
  • early career researchers
  • students attending the PhD forum or at the end of their PhDs

The programme will be tailored to the needs of the students and researchers. It will include:

  • career insights and mentoring by the HiPEAC officer for recruitment activities and a careers advisor from a local university
  • company pitches
  • time for informal networking

For students, this session is an opportunity to:

  • find out about different career paths within high-end computer science research and engineering
  • get advice on possible ways of progressing their careers
  • learn about the main skills employers look for
  • hear about the most interesting vacancies and internship opportunities from companies and research centres
  • share their CVs with company speakers and discuss opportunities one-to-one in an informal environment
  • get free access to the rest of the exhibition

For companies, this session provides an excellent opportunity to:

  • get in contact with potential jobseekers specializing in the right areas for their business
  • talk to jobseekers in an informal environment and collect their CVs
  • promote their corporate brand as the best workplace, with the most stimulating environment and the most interesting projects
TimeLabelPresentation Title
Authors
14:307.8.1ACADEMIA OR INDUSTRY? - OR EVERYTHING! CAREER AND INTERNSHIP OPPORTUNITIES POWERED BY HIPEAC
Speaker:
Xavier Salazar, Barcelona Supercomputing Center, ES
Abstract

HiPEAC is the European Network of Experts on High Performance and Embedded Architecture and Compilation. We organize many activities that can help grow your career, whether you pursue an academic or industrial path or become an innovator. In our presentation you will learn about our career opportunities, internships, educational opportunities, student competitions, events, conferences and much more.

14:457.8.2HOW TO KICK START YOUR CAREER IN AN EVER-CHANGING WORLD
Speaker:
Antonella Magliocchi, University of Pisa, IT
Abstract

Defining a career path in today's workplace can be a real challenge. During this presentation I will guide you through the key steps of career planning, from learning how to leverage your skills to branding yourself and networking.

15:157.8.3INSPIRING FUTURES @ INFINEON TECHNOLOGIES
Speaker:
Simone Fontanesi, Infineon Technologies, AT
15:307.8.4INSPIRING FUTURES @ CADENCE
Speaker:
Anton Klotz, Cadence Design Systems, DE
15:457.8.5INSPIRING FUTURES @ ESILICON
Speaker:
Fernando De Bernardinis, eSilicon, IT
16:00End of session
Coffee Break in Exhibition Area





IP3 Interactive Presentations

Date: Wednesday, March 27, 2019
Time: 16:00 - 16:30
Location / Room: Poster Area

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

LabelPresentation Title
Authors
IP3-1NON-INTRUSIVE SELF-TEST LIBRARY FOR AUTOMOTIVE CRITICAL APPLICATIONS: CONSTRAINTS AND SOLUTIONS
Speaker:
Davide Piumatti, Politecnico di Torino, IT
Authors:
Paolo Bernardi1, Riccardo Cantoro1, Andrea Floridia1, Davide Piumatti1, Cozmin Pogonea1, Annachiara Ruospo1, Ernesto Sanchez1, Sergio De Luca2 and Alessandro Sansonetti2
1Politecnico di Torino, IT; 2STMicroelectronics, IT
Abstract
Today, safety-critical applications require self-test and self-diagnosis approaches to be applied during the lifetime of the device. In general, the fault coverage values required by standards such as ISO 26262 for the whole System-on-Chip (SoC) are very high, so different strategies are adopted. In the case of the processor core, the required fault coverage can be achieved by scheduling the periodic execution of a set of test programs, or Software-Test Library (STL). However, an STL for in-field testing should comply with the operating system specifications without affecting the mission operation of the device. In this paper, the most relevant problems in developing such an STL are first discussed. Then, a set of strategies and solutions is presented, oriented towards producing an efficient and non-intrusive STL to be used exclusively during in-field testing of automotive processor cores. The proposed approach was evaluated on an automotive SoC developed by STMicroelectronics.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-2DEPENDENCY-RESOLVING INTRA-UNIT PIPELINE ARCHITECTURE FOR HIGH-THROUGHPUT MULTIPLIERS
Speaker:
Dae Hyun Kim, Washington State University, US
Authors:
Jihee Seo and Dae Hyun Kim, Washington State University, US
Abstract
In this paper, we propose two dependency-resolving intra-unit pipeline architectures to design high-throughput multipliers. Simulation results show that the proposed multipliers achieve approximately 2.3× to 3.1× execution time reduction at a cost of 4.4% area and 3.7% power overheads for highly-dependent multiplications.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-3A HARDWARE-EFFICIENT LOGARITHMIC MULTIPLIER WITH IMPROVED ACCURACY
Authors:
Mohammad Saeed Ansari, Bruce Cockburn and Jie Han, University of Alberta, CA
Abstract
Logarithmic multipliers take the base-2 logarithm of the operands and perform multiplication using only shift and addition operations. Since computing the logarithm is often an approximate process, some accuracy loss is inevitable in such designs. However, the area, latency, and power consumption can be significantly improved at the cost of this accuracy loss. This paper presents a novel method to approximate log2(N) that, unlike existing approaches, rounds N to its nearest power of two instead of the largest power of two less than or equal to N. This approximation technique is then used to design two improved 16x16 logarithmic multipliers that use exact and approximate adders (ILM-EA and ILM-AA, respectively). These multipliers achieve up to 24.42% and 9.82% savings in area and power-delay product, respectively, compared to the state-of-the-art design in the literature with similar accuracy. The proposed designs are evaluated in the Joint Photographic Experts Group (JPEG) image compression algorithm and their advantages over other approximate logarithmic multipliers are shown.
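The difference between truncating to the largest power of two (Mitchell-style) and rounding to the nearest power of two can be sketched as follows; the simple linear interpolation used here is an illustrative assumption, not the paper's actual error-compensation scheme:

```python
import math

def log2_floor_approx(n):
    """Classic Mitchell-style approximation: anchor at the largest
    power of two <= n, then interpolate the fraction linearly."""
    k = n.bit_length() - 1          # floor(log2(n))
    return k + (n / (1 << k) - 1)   # linear remainder term in [0, 1)

def log2_nearest_approx(n):
    """Hypothetical sketch of the nearest-power-of-two idea: anchor at
    whichever power of two is closer to n, allowing a negative remainder."""
    k = n.bit_length() - 1
    if n - (1 << k) > (1 << (k + 1)) - n:   # closer to 2**(k+1)?
        k += 1
    return k + (n / (1 << k) - 1)

errs_floor = [abs(log2_floor_approx(n) - math.log2(n)) for n in range(2, 1 << 16)]
errs_near = [abs(log2_nearest_approx(n) - math.log2(n)) for n in range(2, 1 << 16)]
```

Both variants are exact at powers of two; the anchoring choice changes where the approximation error concentrates, which is what ILM-EA/ILM-AA then exploit with their adder designs.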

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-4LIGHTWEIGHT HARDWARE SUPPORT FOR SELECTIVE COHERENCE IN HETEROGENEOUS MANYCORE ACCELERATORS
Speaker:
Alessandro Cilardo, CeRICT, IT
Authors:
Alessandro Cilardo, Mirko Gagliardi and Vincenzo Scotti, University of Naples Federico II, IT
Abstract
Shared memory coherence is a key feature of manycore accelerators, ensuring programmability and application portability. Most established solutions for coherence in homogeneous systems cannot simply be reused because of the special requirements of accelerator architectures. This paper introduces a low-overhead hardware coherence system for heterogeneous accelerators, with customizable granularity and support for non-coherent regions. The coherence system has been demonstrated in operation in a full manycore accelerator, exhibiting significant improvements in terms of network load, execution time, and power consumption.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-5FUNCTIONAL ANALYSIS ATTACKS ON LOGIC LOCKING
Speaker:
Pramod Subramanyan, Indian Institute of Technology Kanpur, IN
Authors:
Deepak Sirone and Pramod Subramanyan, Indian Institute of Technology Kanpur, IN
Abstract
This paper proposes Functional Analysis attacks on state-of-the-art Logic Locking algorithms (FALL attacks). FALL attacks use structural and functional analyses of locked circuits to identify the locking key. In contrast to past work, FALL attacks can often (in 90% of successful attempts in our experiments) fully defeat locking by analyzing only the locked netlist, without oracle access to an activated circuit. Experiments show that FALL attacks succeed against 65 out of 80 (81%) circuits locked using Secure Function Logic Locking (SFLL), the only combinational logic locking algorithm resilient to known attacks.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-6SIGATTACK: NEW HIGH-LEVEL SAT-BASED ATTACK ON LOGIC ENCRYPTIONS
Speaker:
Hai Zhou, Northwestern University, US
Authors:
Yuanqi Shen1, You Li1, Shuyu Kong2, Amin Rezaei1 and Hai Zhou1
1Northwestern University, US; 2Northwestern University, CN
Abstract
Logic encryption is a powerful hardware protection technique that uses extra key inputs to lock a circuit against piracy or unauthorized use. The recent discovery of the SAT-based attack with Distinguishing Input Pattern (DIP) generation has rendered all traditional logic encryptions vulnerable and has thus spurred the creation of new encryption methods. However, a critical question for any new encryption method is whether security against the DIP-generation attack implies security against all other attacks. In this paper, a new high-level SAT-based attack called SigAttack is discovered and thoroughly investigated. It is based on extracting a key-revealing signature from the encryption. The majority of known SAT-resilient encryptions are shown to be vulnerable to SigAttack. By formulating the condition under which SigAttack is effective, the paper also provides guidance for future logic encryption design.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-7ZEROPOWERTOUCH: ZERO-POWER SMART RECEIVER FOR TOUCH COMMUNICATION AND SENSING FOR INTERNET OF THING AND WEARABLE APPLICATIONS
Speaker:
Michele Magno, ETH Zurich, CH
Authors:
Philipp Mayer, Raphael Strebel and Michele Magno, ETH Zurich, CH
Abstract
The human body can be used as a transmission medium for electric fields. By applying an electric field with a frequency of tens of megahertz to isolated electrodes on the human body, it is possible to send energy and data. Extra-body and intra-body communication is an interesting alternative way to communicate wirelessly in the new era of wearable devices and the internet of things, as it works without dedicated radio hardware and with lower power consumption. We designed and implemented a novel zero-power receiver targeting intra-body and extra-body wireless communication and touch sensing. To achieve zero-power, always-on operation, we combined ultra-low-power design with an energy-harvesting subsystem that extracts energy directly from the received message. This energy then supplies the whole receiver, which demodulates the message and performs data processing in digital logic. The proposed design is ideal for waking up external logic only when a specific address is received; moreover, thanks to the digital logic, the receiver can implement identification and security algorithms. The zero-power receiver can be used either as an always-on touch sensor deployed in the field or as a body-communication wake-up for smart and secure devices. A working prototype demonstrates zero-power operation, intra-body and extra-body communication, and an intra-body range of more than 1.75 m without any external battery.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-8TAILORING SVM INFERENCE FOR RESOURCE-EFFICIENT ECG-BASED EPILEPSY MONITORS
Speaker:
Lorenzo Ferretti, Università della Svizzera italiana, CH
Authors:
Lorenzo Ferretti1, Giovanni Ansaloni1, Laura Pozzi1, Amir Aminifar2, David Atienza2, Leila Cammoun3 and Philippe Ryvlin3
1USI Lugano, CH; 2EPFL, CH; 3Centre Hospitalier Universitaire Vaudois, CH
Abstract
Event detection and classification algorithms are resilient to aggressive resource-aware optimisations. In this paper, we leverage this characteristic in the context of smart health monitoring systems. In more detail, we study the attainable benefits of tailoring Support Vector Machine (SVM) inference engines devoted to the detection of epileptic seizures from ECG-derived features. We conceive and explore multiple optimisations, each effectively reducing resource budgets while minimally impacting classification performance. These strategies can be seamlessly combined, resulting in 12.5X and 16X gains in energy and area, respectively, with a negligible 3.2% loss in classification performance.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-9AN INDOOR LOCALIZATION SYSTEM TO DETECT AREAS CAUSING THE FREEZING OF GAIT IN PARKINSONIANS
Speaker:
Graziano Pravadelli, Dept. of Computer Science, Univ. of Verona, IT
Authors:
Florenc Demrozi1, Vladislav Bragoi1, Federico Tramarin2 and Graziano Pravadelli1
1Department of Computer Science, University of Verona, IT; 2Department of Information Engineering, University of Padua, IT
Abstract
People affected by Parkinson's disease are often subject to episodes of Freezing of Gait (FoG) near specific areas of their environment. To prevent such episodes, this paper presents a low-cost indoor localization system specifically designed to identify these critical areas. The final aim is to exploit the output of this system within a wearable device that generates rhythmic stimuli to prevent FoG when the person enters a risky area. The proposed localization system is based on a classification engine, which uses a fingerprinting phase for its initial training and is then dynamically adjusted by exploiting a probabilistic graph model of the environment.
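Fingerprinting-based localization of this kind is commonly realized as nearest-neighbour classification over stored signal-strength vectors. The following sketch assumes a k-NN classifier and made-up RSSI values, not the paper's actual engine:

```python
from collections import Counter

def knn_locate(fingerprints, rssi, k=3):
    """Classify the current RSSI vector into a zone via k-nearest
    neighbours over a fingerprint database of (zone, rssi_vector) pairs.
    The squared-Euclidean metric and k=3 are illustrative choices."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(fingerprints, key=lambda f: dist(f[1], rssi))[:k]
    return Counter(zone for zone, _ in nearest).most_common(1)[0][0]

# Fingerprints: per-zone RSSI readings (dBm) from three fixed beacons,
# collected during the training (fingerprinting) phase.
db = [
    ("doorway", (-40, -70, -80)), ("doorway", (-42, -72, -78)),
    ("doorway", (-41, -69, -81)),
    ("kitchen", (-75, -45, -60)), ("kitchen", (-77, -44, -62)),
    ("kitchen", (-74, -46, -59)),
]
zone = knn_locate(db, (-43, -71, -79))
```

A wearable device would run this classification continuously and trigger the rhythmic stimulus whenever the predicted zone is one previously flagged as FoG-inducing.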

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-10ASSEMBLY-RELATED CHIP/PACKAGE CO-DESIGN OF HETEROGENEOUS SYSTEMS MANUFACTURED BY MICRO-TRANSFER PRINTING
Speaker:
Tilman Horst, Technische Universität Dresden, DE
Authors:
Robert Fischbach, Tilman Horst and Jens Lienig, Technische Universität Dresden, DE
Abstract
Technologies for heterogeneous integration have been promoted as a way to drive innovation in the semiconductor industry. However, adoption by designers is lagging and market shares are still low; alongside the lack of appropriate design tools, high manufacturing costs are one of the main reasons. Micro-transfer printing (µTP) is a novel and promising micro-assembly technology that enables the heterogeneous integration of dies originating from different wafers. It uses an elastomer stamp to transfer dies in parallel from source wafers to their target positions, indicating a high potential for reducing manufacturing time and cost. To achieve the latter, the geometrical interdependencies between source, target and stamp, and the resulting wafer utilization, must be considered during design. We propose an approach to evaluate a given µTP design with regard to manufacturing costs. We achieve this by developing a model that integrates characteristics of the assembly process into the cost function of the design. Our approach can be used as a template for tackling other assembly-related co-design issues, addressing an increasingly severe cost optimization problem in heterogeneous systems design.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-11VISUAL INERTIAL ODOMETRY AT THE EDGE - A HARDWARE-SOFTWARE CO-DESIGN APPROACH FOR ULTRA-LOW LATENCY AND POWER
Speaker:
Dipan Kumar Mandal, Intel Corporation, IN
Authors:
Dipan Kumar Mandal1, Srivatsava Jandhyala1, Om J Omer1, Gurpreet S Kalsi1, Biji George1, Gopi Neela1, Santhosh Kumar Rethinagiri1, Sreenivas Subramoney1, Hong Wong2, Lance Hacking2 and Belliappa Kuttanna2
1Intel Corporation, IN; 2Intel Corporation, US
Abstract
Visual Inertial Odometry (VIO) is used for estimating the pose and trajectory of a system and is a foundational requirement in many emerging applications like AR/VR and autonomous navigation in cars, drones and robots. In this paper, we analyze key compute bottlenecks in VIO and present a highly optimized VIO accelerator based on a hardware-software co-design approach. We detail a set of novel micro-architectural techniques that optimize compute, data movement, bandwidth and dynamic power, making it possible to deliver high-quality VIO at the ultra-low latency and power required by budget-constrained edge devices. By offloading the computation of the critical linear algebra algorithms from the CPU, the accelerator enables high-sample-rate IMU usage in VIO processing, while acceleration of the image processing pipe increases precision and robustness and reduces IMU-induced drift in the final pose estimate. The proposed accelerator requires a small silicon footprint (1.3 mm2 in a 28nm process at 600 MHz), utilizes a modest on-chip shared SRAM (560KB) and achieves a 10x speedup over a software-only implementation in terms of image-sample-based pose update latency, while consuming just 2.2 mW. In an FPGA implementation using the EuRoC VIO dataset (VGA 30fps images and 100Hz IMU), the accelerator design achieves pose estimation accuracy (loop closure error) comparable to a software-based VIO implementation.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-12CAPSACC: AN EFFICIENT HARDWARE ACCELERATOR FOR CAPSULENETS WITH DATA REUSE
Speaker:
Alberto Marchisio, Vienna University of Technology (TU Wien), AT
Authors:
Alberto Marchisio, Muhammad Abdullah Hanif and Muhammad Shafique, Vienna University of Technology (TU Wien), AT
Abstract
Recently, CapsuleNets have overtaken traditional Deep Convolutional Neural Networks (CNNs) because of their improved generalization ability due to multi-dimensional capsules, in contrast to single-dimensional neurons. Consequently, CapsuleNets also require extremely intense matrix computations, making it a gigantic challenge to achieve high performance. In this paper, we propose CapsAcc, the first specialized CMOS-based hardware architecture to perform CapsuleNets inference with high performance and energy efficiency. State-of-the-art CNN accelerators would not work efficiently for CapsuleNets, as their designs do not account for the unique processing nature of CapsuleNets, which involves multi-dimensional matrix processing, squashing and dynamic routing. Our architecture exploits the massive parallelism by flexibly feeding the data to a specialized systolic array according to the operations required in different layers. It also avoids extensive load and store operations on the on-chip memory by reusing data whenever possible. We synthesized the complete CapsAcc architecture in a 32nm CMOS technology using Synopsys design tools, and evaluated it on the MNIST benchmark (as also done in the original CapsuleNet paper) to ensure consistent and fair comparisons. This work enables highly efficient CapsuleNets inference on embedded platforms.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-13SDCNN: AN EFFICIENT SPARSE DECONVOLUTIONAL NEURAL NETWORK ACCELERATOR ON FPGA
Speaker:
Suk-Ju Kang, Sogang University, KR
Authors:
Jung-Woo Chang, Keon-Woo Kang and Suk-Ju Kang, Sogang University, KR
Abstract
Generative adversarial networks (GANs) have shown excellent performance in image generation applications. A GAN typically uses a new type of neural network called a deconvolutional neural network (DCNN). To implement DCNNs in hardware, the state-of-the-art DCNN accelerator optimizes the dataflow using a DCNN-to-CNN conversion method. However, this method still incurs high computational complexity because the number of feature maps increases after conversion from DCNN to CNN. Recently, pruning has been recognized as an effective solution for reducing high computational complexity and huge network model sizes. In this paper, we propose a novel sparse DCNN accelerator (SDCNN) combining these approaches on FPGA. First, we propose a novel dataflow suitable for sparse DCNN acceleration by loop transformation. Then, we introduce a four-stage pipeline for generating the SDCNN model. Finally, we propose an efficient architecture based on the SDCNN dataflow. Experimental results on DCGAN show that SDCNN achieves up to a 2.63x speedup over the state-of-the-art DCNN accelerator.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-14A FINE-GRAINED SOFT ERROR RESILIENT ARCHITECTURE UNDER POWER CONSIDERATIONS
Speaker:
Sajjad Hussain, Chair for Embedded Systems, KIT, Karlsruhe, DE
Authors:
Sajjad Hussain1, Muhammad Shafique2 and Joerg Henkel1
1Karlsruhe Institute of Technology, DE; 2Vienna University of Technology (TU Wien), AT
Abstract
Besides limited power budgets and the dark-silicon issue, soft errors are among the most critical reliability issues in computing systems fabricated with nano-scale devices. During execution, different applications have varying performance, power/energy consumption and vulnerability properties, so different trade-offs can be devised to provide the required resiliency within the allowed power constraints. To exploit this behavior, we propose a novel soft-error-resilient architecture and a corresponding run-time system that enable power-aware fine-grained resiliency for different processor components. It selectively determines the reliability state of the various components such that overall application reliability is improved under a given power budget. Compared to state-of-the-art techniques, our architecture saves up to 16% power and reduces reliability degradation by up to 11%.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-15FINE-GRAINED HARDWARE MITIGATION FOR MULTIPLE LONG-DURATION TRANSIENTS ON VLIW FUNCTION UNITS
Speaker:
Angeliki Kritikakou, University of Rennes 1 - IRISA/INRIA, FR
Authors:
Rafail Psiakis1, Angeliki Kritikakou1 and Olivier Sentieys2
1Univ Rennes/IRISA/INRIA, FR; 2INRIA, FR
Abstract
Technology scaling makes hardware more susceptible to radiation, which can cause multiple transient faults of long duration. In such cases, the affected function unit is usually considered faulty and is no longer used. To reduce this performance degradation, the proposed hardware mechanism detects the faults that are still active during execution and re-schedules the instructions to use the fault-free components of the affected function units. The results show mitigation of multiple long-duration faults with low performance, area, and power overheads.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-16ADAPTIVE WORD REORDERING FOR LOW-POWER INTER-CHIP COMMUNICATION
Speaker:
Eleni Maragkoudaki, University of Manchester, GB
Authors:
Eleni Maragkoudaki1, Przemyslaw Mroszczyk2 and Vasilis Pavlidis3
1University of Manchester, GB; 2Qualcomm, IE; 3The University of Manchester, GB
Abstract
The energy for data transfer has an increasing effect on the total system energy as technology scales, often overtaking computation energy. To reduce the power of inter-chip interconnects, an adaptive encoding scheme called Adaptive Word Reordering (AWR) is proposed that effectively decreases the number of signal transitions, leading to a significant power reduction. AWR outperforms other adaptive encoding schemes in terms of decrease in transitions, yielding up to 73% reduction in switching. Furthermore, complex bit transition computations are represented as delays in the time domain to limit the power overhead due to encoding. The saved power outweighs the overhead beyond a moderate wire length where the I/O voltage is assumed equal to the core voltage. For a typical I/O voltage, the decrease in power is significant reaching 23% at just 1 mm.

Download Paper (PDF; Only available from the DATE venue WiFi)
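The effect of reordering on switching activity can be sketched with a toy greedy variant (an illustrative stand-in for AWR, which in practice must also transmit the reordering information; that overhead is ignored here):

```python
def transitions(seq):
    """Total bit flips on the bus when the words in seq are sent in order."""
    return sum(bin(a ^ b).count("1") for a, b in zip(seq, seq[1:]))

def greedy_reorder(words, prev=0):
    """Greedily pick, at each step, the pending word with the fewest bit
    flips relative to the last word sent (nearest-neighbour ordering)."""
    pending, out = list(words), []
    while pending:
        nxt = min(pending, key=lambda w: bin(prev ^ w).count("1"))
        pending.remove(nxt)
        out.append(nxt)
        prev = nxt
    return out

words = [0b11110000, 0b00001111, 0b11111111, 0b00000000]
ordered = greedy_reorder(words)
print(transitions([0] + words), transitions([0] + ordered))   # 24 12
```

On this toy block the reordered stream halves the number of signal transitions, which is the quantity AWR minimises on the inter-chip link.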
IP3-17MACHINE-LEARNING-DRIVEN MATRIX ORDERING FOR POWER GRID ANALYSIS
Speaker:
Wenjian Yu, Tsinghua University, CN
Authors:
Ganqu Cui1, Wenjian Yu1, Xin Li2, Zhiyu Zeng3 and Ben Gu3
1Tsinghua University, CN; 2Duke University, US; 3Cadence Design Systems, Inc., US
Abstract
A machine-learning-driven approach to matrix ordering is proposed for power grid analysis based on domain decomposition. It utilizes a support vector machine or an artificial neural network to learn a classifier that automatically chooses the optimal ordering algorithm, thereby reducing the expense of solving the subdomain equations. Based on feature selection that considers sparse matrix properties, the proposed method achieves superior efficiency in runtime and memory usage over conventional methods, as demonstrated by industrial test cases.

Download Paper (PDF; Only available from the DATE venue WiFi)
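The flow the abstract outlines — extract sparse-matrix features, then let a trained classifier pick the ordering — can be sketched as follows; the two features and the decision threshold are illustrative assumptions standing in for the trained SVM/ANN:

```python
def features(coords, n):
    """Structural features of an n x n sparse matrix from its nonzero
    coordinates: fill density and mean distance from the diagonal."""
    nnz = len(coords)
    density = nnz / float(n * n)
    mean_band = sum(abs(r - c) for r, c in coords) / nnz
    return density, mean_band

def choose_ordering(coords, n):
    """Stand-in for the learned classifier: matrices concentrated near
    the diagonal go to a minimum-degree ordering, others to nested
    dissection. The threshold is an assumption, not a trained model."""
    density, mean_band = features(coords, n)
    return "minimum-degree" if mean_band < 0.1 * n else "nested-dissection"

n = 100
tridiag = [(i, j) for i in range(n) for j in range(n) if abs(i - j) <= 1]
print(choose_ordering(tridiag, n))   # banded system -> minimum-degree
```

A real deployment would replace `choose_ordering` with the trained SVM/ANN over a richer feature vector; the point is that classification is cheap compared with trying every ordering on every subdomain.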
IP3-18ASSERTION-BASED VERIFICATION THROUGH BINARY INSTRUMENTATION
Speaker:
Laurence Pierre, Univ. Grenoble Alpes, FR
Authors:
Enzo Brignon and Laurence Pierre, TIMA Lab (Univ. Grenoble Alpes, CNRS, Grenoble INP), FR
Abstract
Verifying the correctness and the reliability of C or C++ embedded software is a crucial issue. To alleviate this verification process, we advocate runtime assertion-based verification of formal properties. Such logic and temporal properties can be specified using the IEEE standard PSL (Property Specification Language) and automatically translated into software assertion checkers. A major issue is the instrumentation of the embedded program so that those assertion checkers will be triggered upon specific events during execution. This paper presents an automatic instrumentation solution for object files, which enables such an event-driven property evaluation. It also reports experimental results for different kinds of applications and properties.

Download Paper (PDF; Only available from the DATE venue WiFi)

8.1 Special Day on "Embedded Meets Hyperscale and HPC" Panel: What can HPC and hyperscale learn from embedded computing

Date: Wednesday, March 27, 2019
Time: 17:00 - 18:30
Location / Room: Room 1

Moderators:
Christoph Hagleitner, IBM Research Zurich, CH, Contact Christoph Hagleitner
Christian Plessl, Paderborn University, DE, Contact Christian Plessl

Despite their very different origins, HPC/datacenter technologies and applications face challenges similar to those of embedded computing. For example, embedded systems have long used heterogeneous architectures with specialized co-processors due to strict real-time or efficiency constraints. Hence, the EDA community has extensively studied models, algorithms and tools for application analysis, optimization and operation. In contrast, HPC and datacenter applications are designed to harness the performance of networked, massively parallel but homogeneous computing resources. In this panel, our experts will debate with the audience what the datacenter and embedded communities can learn from each other.

Panelists:

  • Peter Messmer, NVidia, US
  • Luca Benini, Università di Bologna, IT
  • Boris Grot, University of Edinburgh, GB
  • Jan van Lunteren, IBM Research Zurich, CH
  • Jeffrey S Vetter, Oak Ridge National Laboratory, US
  • Jesus Labarta, Barcelona Supercomputing Center, ES
  • João M. P. Cardoso, University of Porto/FEUP, PT
  • Babak Falsafi, EPFL, CH
18:30End of session

8.2 Special Session: Innovative methods for verifying Systems-on-Chip: digital, mixed-signal, security and software

Date: Wednesday, March 27, 2019
Time: 17:00 - 18:30
Location / Room: Room 2

Organisers:
Subhasish Mitra, Stanford University, US, Contact Subhasish Mitra
Georges Gielen, KU Leuven, BE, Contact Georges Gielen

Chair:
Ulf Schlichtmann, TUM, DE, Contact Ulf Schlichtmann

Co-Chair:
Giovanni De Micheli, EPFL, CH, Contact Giovanni De Micheli

Modern-day integrated circuits can contain several billion transistors, multiple cores and memories, several analog and mixed-signal blocks, etc. While designing such chips is a huge effort, their verification is a true nightmare. Traditional techniques require extremely long computation times and may fail to capture all bugs in the system. These problems are exacerbated by even more difficult challenges: hardware security (e.g. the recent Spectre/Meltdown attacks stemming from hardware designs), system safety (e.g. for automotive applications), and software complexity (firmware and software form a significant component of complex Systems-on-Chip). This special session focuses on novel approaches from industry and academia to overcome these seemingly insurmountable challenges. Most importantly, this session will discuss not only design bugs but also the new challenges (stated above) that design verification must address - major directions for the DATE research community.

TimeLabelPresentation Title
Authors
17:008.2.1HARDWARE AND FIRMWARE VERIFICATION AND VALIDATION: AN ALGORITHM-TO-FIRMWARE DEVELOPMENT METHODOLOGY
Speaker:
Henry Cox, MediaTek, US
Authors:
Henry Cox and Harry Chen, MediaTek, US
Abstract
System-level verification of a modem product involves ensuring both the hardware and firmware work correctly and that the product meets signal performance requirements at low cost. In many ways, the firmware problem is harder - or at least more open-ended. We discuss a development methodology based on automatic generation and reusable components that has been used to implement several generations of software-defined radio (SDR) modem SOCs. Automation both ensures consistency between models and tools and enables fast turnaround when something changes.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:308.2.2PROCESSOR HARDWARE SECURITY VULNERABILITIES AND THEIR DETECTION BY UNIQUE PROGRAM EXECUTION CHECKING
Speaker:
Wolfgang Kunz, University of Kaiserslautern, DE
Authors:
Mohammad Rahmani Fadiheh1, Dominik Stoffel1, Clark Barrett2, Subhasish Mitra2 and Wolfgang Kunz1
1University of Kaiserslautern, DE; 2Stanford University, US
Abstract
Recent discovery of security attacks in advanced processors, known as Spectre and Meltdown, has resulted in high public alertness about the security of hardware. The root cause of these attacks is information leakage across covert channels that reveal secret data without any explicit information flow between the secret and the attacker. Many sources believe that such covert channels are intrinsic to highly advanced processor architectures based on speculation and out-of-order execution, suggesting that such security risks can be avoided by staying away from high-end processors. This paper, however, shows that the problem is of wider scope: we present new classes of covert channel attacks which are possible in average-complexity processors with in-order pipelining, as they are mainstream in applications ranging from the Internet-of-Things to Autonomous Systems. We present a new approach as a foundation for a remedy against covert channels: while all previous attacks were found by the clever thinking of human attackers, this paper presents a formal method called Unique Program Execution Checking which detects and locates vulnerabilities to covert channels systematically, including those to covert channels unknown so far.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:008.2.3SYMBOLIC QED PRE-SILICON VERIFICATION FOR AUTOMOTIVE MICROCONTROLLER CORES: INDUSTRIAL CASE STUDY
Speaker:
Subhasish Mitra, Stanford University, US
Authors:
Eshan Singh1, Keerthikumara Devarajegowda2, Sebastian Simon3, Ralf Schnieder3, Karthik Ganesan1, Mohammad Fadiheh4, Dominik Stoffel4, Wolfgang Kunz4, Clark Barrett1, Wolfgang Ecker5 and Subhasish Mitra1
1Stanford University, US; 2Infineon Technologies AG/Technische Universität Kaiserslautern, DE; 3Infineon Technologies, DE; 4Technische Universität Kaiserslautern, DE; 5Infineon Technologies AG/Technische Universität München, DE
Abstract
We present an industrial case study that demonstrates the practicality and effectiveness of Symbolic Quick Error Detection (Symbolic QED) in detecting logic design flaws (logic bugs) during pre-silicon verification. Our study focuses on several microcontroller core designs (~1,800 flip-flops, ~70,000 logic gates) that have been extensively verified using an industrial verification flow and used for various commercial automotive products. The results of our study are as follows: 1. Symbolic QED detected all logic bugs in the designs that were detected by the industrial verification flow (which includes various flavors of simulation-based verification and formal verification). 2. Symbolic QED detected additional logic bugs that were not recorded as detected by the industrial verification flow. (These additional bugs were also perhaps detected by the industrial verification flow.) 3. Symbolic QED enables significant design productivity improvements: (a) 8X improved (i.e., reduced) verification effort for a new design (8 person-weeks for Symbolic QED vs. 17 person-months using the industrial verification flow). (b) 60X improved verification effort for subsequent designs (2 person-days for Symbolic QED vs. 4-7 person-months using the industrial verification flow). (c) Quick bug detection (runtime of 20 seconds or less), together with short counterexamples (10 or fewer instructions) for quick debug, using Symbolic QED.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:158.2.4REVIEW OF METHODOLOGIES FOR PRE- AND POST-SILICON ANALOG VERIFICATION IN MIXED-SIGNAL SOCS
Speaker:
Georges Gielen, KU Leuven, BE
Authors:
Georges Gielen1, Nektar Xama1, Karthik Ganesan2 and Subhasish Mitra2
1KU Leuven, BE; 2Stanford University, US
Abstract
The integration of increasingly more complex and heterogeneous SOCs results in ever more complicated demands for the verification of the system and its underlying subsystems. Pre-silicon design validation as well as post-silicon test generation of the analog and mixed-signal (AMS) subsystems within SOCs proves extremely challenging as these subsystems do not share the formal description potential of their digital counterparts. Several methods have been developed to cope with this lack of formalization during AMS pre-silicon validation, including model checkers, affine arithmetic formalisms and equivalence checkers. However, contrary to the industrial practice for digital circuits of using formal verification and ATPG tools, common industry practice for analog circuits still largely defaults to simulation-based validation and test generation. A new formal digital-inspired technique, called AMS-QED, can potentially solve these issues in analog and mixed-signal verification.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30End of session

8.3 Test Preparation and Generation

Date: Wednesday, March 27, 2019
Time: 17:00 - 18:30
Location / Room: Room 3

Chair:
Matteo Sonza Reorda, Politecnico di Torino, IT, Contact Matteo Sonza Reorda

Co-Chair:
Grzegorz Mrugalski, Mentor, A Siemens Business, PL, Contact Grzegorz Mrugalski

Deep Neural Networks and Approximate Circuits are of increasing importance in many applications, and they pose completely new challenges for test generation. Promising approaches to these challenges are presented in papers 1 and 4. Reconfigurable Scan Networks allow flexible access to embedded instruments for post-silicon test, validation and debug or diagnosis. On the other hand, this creates security issues that have to be taken into account; paper 2 provides an approach to guarantee secure data flow. Resynthesis for improving testability is the topic of paper 3.

TimeLabelPresentation Title
Authors
17:008.3.1ON FUNCTIONAL TEST GENERATION FOR DEEP NEURAL NETWORK IPS
Speaker:
BO LUO, The Chinese University of Hong Kong, HK
Authors:
Bo Luo, Yu Li, Lingxiao Wei and Qiang Xu, The Chinese University of Hong Kong, HK
Abstract
Machine learning systems based on deep neural networks (DNNs) produce state-of-the-art results in many applications. Considering the large amount of training data and know-how required to generate the network, it is often more practical to use third-party DNN intellectual property (IP) cores. Needless to say, it is essential for DNN IP vendors to provide test cases for functional validation without leaking their parameters. In this paper, we tackle the above problem by judiciously selecting test cases from DNN training samples and applying data augmentation for effective test generation. Experimental results demonstrate the efficacy of our proposed solution.

Download Paper (PDF; Only available from the DATE venue WiFi)
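One way to picture "judicious selection plus augmentation" is margin-based ranking: keep the training samples the model is least certain about, then perturb them. The small top-2 score margin used here is an assumed criterion for illustration, not necessarily the paper's, and `toy_model` is an invented stand-in for the vendor DNN:

```python
def margin(scores):
    """Gap between the top-2 class scores: a small margin means the
    sample lies near the decision boundary, so it is more discriminating
    as a functional test case."""
    top = sorted(scores, reverse=True)
    return top[0] - top[1]

def toy_model(x):
    # hypothetical stand-in for the vendor's DNN: per-class scores
    return [x, 1.0 - x, 0.1]

samples = [0.1, 0.45, 0.52, 0.9]                 # assumed training samples
ranked = sorted(samples, key=lambda x: margin(toy_model(x)))
tests = ranked[:2]                                # keep near-boundary cases
augmented = [t + d for t in tests for d in (-0.01, 0.01)]  # augmentation
print(tests)   # [0.52, 0.45]
```

Selecting by margin and then jittering gives test vectors that exercise the decision boundary without revealing the model's parameters.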
17:308.3.2ON SECURE DATA FLOW IN RECONFIGURABLE SCAN NETWORKS
Speaker:
Pascal Raiola, University of Freiburg, DE
Authors:
Pascal Raiola1, Benjamin Thiemann1, Jan Burchard2, Ahmed Atteya3, Natalia Lylina3, Hans-Joachim Wunderlich3, Bernd Becker1 and Matthias Sauer1
1University of Freiburg, DE; 2Mentor, a Siemens Business, DE; 3University of Stuttgart, DE
Abstract
Reconfigurable Scan Networks (RSNs) allow flexible access to embedded instruments for post-silicon test, validation and debug or diagnosis. The increased observability and controllability of registers inside the circuit can be exploited by an attacker to leak or corrupt critical information. Precluding such security threats is of high importance but difficult due to complex data flow dependencies inside the reconfigurable scan network as well as across the underlying circuit logic. This work proposes a method that fine-granularly computes dependencies over the circuit logic and the RSN. These dependencies are utilized to detect security violations for a given insecure RSN, which is then transformed into a secure RSN. Experimental results demonstrate the applicability of the method to large academic and industrial designs. Additionally, we report on the effort required to mitigate the found security violations, which also motivates the necessity of considering the circuit logic in addition to pure scan paths.

Download Paper (PDF; Only available from the DATE venue WiFi)
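The dependency computation the abstract describes can be illustrated, at toy scale, as reachability over a combined circuit/RSN data-flow graph: a violation exists if a secret register can reach a scan output, possibly through circuit logic rather than a scan path. The graph and node names below are invented for illustration:

```python
def reachable(edges, src):
    """All nodes that src's value can flow to, transitively."""
    seen, stack = set(), [src]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(edges.get(node, []))
    return seen

# hypothetical data-flow graph: an edge means "value can flow to";
# note the leak path goes through circuit logic (the ALU), not a scan path
edges = {
    "key_reg":   ["alu"],
    "alu":       ["scan_seg2"],
    "scan_seg1": ["scan_out"],
    "scan_seg2": ["scan_out"],
}
leak = "scan_out" in reachable(edges, "key_reg")
print(leak)   # True: the key can reach the scan output via the ALU
```

A pure scan-path analysis would miss this violation, which is the abstract's point about considering circuit logic in addition to the RSN itself.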
18:008.3.3RESYNTHESIS FOR AVOIDING UNDETECTABLE FAULTS BASED ON DESIGN-FOR-MANUFACTURABILITY GUIDELINES
Speaker:
Naixing Wang, Purdue University, US
Authors:
Naixing Wang1, Irith Pomeranz1, Sudhakar Reddy2, Arani Sinha3 and Srikanth Venkataraman3
1Purdue University, US; 2University of Iowa, US; 3Intel, US
Abstract
As integrated circuit manufacturing advances, the occurrence of systematic defects is expected to be prominent. A methodology for predicting potential systematic defects based on design-for-manufacturability (DFM) guidelines was described earlier. In this paper we first report that, among the faults obtained based on DFM guidelines, there are undetectable faults, and these faults cluster in certain areas of the circuit. Because faults may not perfectly represent potential defect behaviors, defects may be detectable even though the faults that model them are undetectable. Clusters of undetectable faults thus leave areas in the circuit uncovered for potential systematic defects. As the potential defects are systematic, the test escapes can impact the DPPM significantly, and thus lead to circuit malfunction and/or reliability problems after deployment. To address this issue in the context of cell-based design, we propose a logic resynthesis procedure followed by physical design to eliminate large clusters of undetectable faults related to DFM guidelines. The resynthesized circuit maintains design constraints of critical path delay, power consumption and die area. The resynthesis procedure is applied to benchmark circuits and logic blocks of the OpenSPARC T1 microprocessor. Experimental results indicate that both the reduction in the numbers of undetectable faults and the reduction in the sizes of undetectable fault clusters are significant.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:158.3.4TEST PATTERN GENERATION FOR APPROXIMATE CIRCUITS BASED ON BOOLEAN SATISFIABILITY
Speaker:
Anteneh Gebregiorgis, Karlsruhe Institute of Technology, DE
Authors:
Anteneh Gebregiorgis and Mehdi B. Tahoori, Karlsruhe Institute of Technology, DE
Abstract
Approximate computing has gained growing attention as it trades output quality for computation effort in inherently error-tolerant applications such as recognition, mining, and media processing. As a result, several approximate hardware designs have been proposed to harness the benefits of approximate computing. Since these circuits are subject to manufacturing defects and runtime failures, testing methods should be aware of their approximate nature. In this paper, we propose an automatic test pattern generation methodology for approximate circuits based on Boolean satisfiability which is aware of output quality and of approximable vs. non-approximable faults. This allows us to significantly reduce the number of faults to be tested, and the test time accordingly, without sacrificing output quality or test coverage. Experimental results show that the proposed approach can reduce the fault list by a factor of 2.85 on average while maintaining high fault coverage.

Download Paper (PDF; Only available from the DATE venue WiFi)
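The pruning idea — skip faults whose effect stays within the application's quality bound — can be sketched on a toy adder with a hypothetical bit-flip fault model (the paper does this classification with SAT on real netlists; everything below is an illustrative assumption):

```python
def apply_fault(out, k):
    """Hypothetical fault model: the fault flips output bit k."""
    return out ^ (1 << k)

inputs = [(a, b) for a in range(8) for b in range(8)]
golden = [a + b for a, b in inputs]
threshold = 2                    # application's output-quality bound
fault_list = list(range(4))      # candidate faults: flip of bit 0..3

# a fault is approximable if its worst-case error never exceeds the
# bound; only non-approximable faults need test patterns
target = [k for k in fault_list
          if max(abs(g - apply_fault(g, k)) for g in golden) > threshold]
print(target)   # [2, 3]: low-order-bit faults are pruned from the list
```

Here half the fault list is dropped because flipping bit 0 or 1 changes the sum by at most 2, within the quality bound — the same mechanism behind the abstract's 2.85x average fault-list reduction.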
18:30End of session

8.4 Applications of Reconfigurable Computing

Date: Wednesday, March 27, 2019
Time: 17:00 - 18:30
Location / Room: Room 4

Chair:
Suhaib Fahmy, University of Warwick, GB, Contact Suhaib A. Fahmy

Co-Chair:
Marco Platzner, Paderborn University, DE, Contact Marco Platzner

This session presents three papers that advance the state of the art in FPGA-based applications for autonomous driving, circuit analysis, and time-series data processing, and one interactive presentation on mapping deep neural networks to multi-FPGA platforms.

TimeLabelPresentation Title
Authors
17:008.4.1ADAPTIVE VEHICLE DETECTION FOR REAL-TIME AUTONOMOUS DRIVING SYSTEM
Speaker:
Maryam Hemmati, The University of Auckland, NZ
Authors:
Maryam Hemmati1, Morteza Biglari-Abhari1 and Smail Niar2
1University of Auckland, NZ; 2University of Valenciennes and Hainaut-Cambresis, FR
Abstract
Modern cars are being equipped with powerful computational resources for autonomous driving systems (ADS), one of their major parts, to provide safer travel on roads. The high accuracy and real-time requirements of ADS are addressed by a HW/SW co-design methodology, which helps offload the computationally intensive tasks to hardware. However, limited hardware resources can be a limiting factor in complicated systems. This paper presents a dynamically reconfigurable system for ADS which is capable of real-time vehicle and pedestrian detection. Our approach employs different methods of vehicle detection in different lighting conditions to achieve better results. A novel deep learning method is presented for the detection of vehicles in dark conditions where road lighting is very limited or unavailable. We present a partial reconfiguration (PR) controller which accelerates the reconfiguration process on the Zynq SoC for seamless detection in real-time applications. By partially reconfiguring the vehicle detection block on the Zynq SoC, resource requirements are kept low enough to allow other ADS functionalities to coexist on hardware and complete their tasks without interruption. The presented system is capable of detecting pedestrians and vehicles in different lighting conditions at a rate of 50 fps (frames per second) for HDTV (1080x1920) frames.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:308.4.2AN EFFICIENT FPGA-BASED FLOATING RANDOM WALK SOLVER FOR CAPACITANCE EXTRACTION USING SDACCEL
Speaker:
Xin Wei, Fudan University, CN
Authors:
Xin Wei1, Changhao Yan1, Hai Zhou2, Dian Zhou1 and Xuan Zeng1
1Fudan University, CN; 2Northwestern University, US
Abstract
The floating random walk (FRW) algorithm is an important method widely used in the capacitance extraction of very large-scale integration (VLSI) interconnects. FRW can be both time-consuming and power-consuming as the circuit scale grows. However, its highly parallel nature prompts us to accelerate it with FPGAs, which have shown great performance and energy-efficiency potential compared with other computing architectures. In this paper, we propose a scalable FPGA/CPU heterogeneous framework for FRW using SDAccel. Large-scale circuits are first partitioned by the CPU into several segments, and these segments are then sent to the FPGA one by one for random walks. The framework overcomes the limited on-chip resources of FPGAs and combines the merits of FPGAs and CPUs by mapping each part of the algorithm to the most suitable architecture, and the FPGA bitstream is built once for all. Several kernel optimization strategies are used to maximize FPGA performance. Besides, the FRW algorithm we use is the naive version with walking on spheres (WOS), which is much simpler and easier to implement than the heavily optimized version with walking on cubes (WOC). The implementation on AWS EC2 F1 (Xilinx VU9P FPGA) shows up to 6.1x performance and 42.6x energy efficiency over a quad-core CPU, and 5.2x energy efficiency over the state-of-the-art WOC implementation on an 8-core CPU.

Download Paper (PDF; Only available from the DATE venue WiFi)
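The walking-on-spheres primitive the abstract mentions can be shown on a toy Dirichlet problem rather than capacitance extraction: from the query point, hop to a uniformly random point on the largest circle that fits in the domain, repeat until near the boundary, and average the boundary values. With boundary data g(x, y) = x on the unit disk, the harmonic solution is u = x, so the estimate at (0.3, 0.2) should approach 0.3 (domain, boundary data and tolerances are assumptions for this sketch):

```python
import math, random

def walk_on_spheres(x, y, eps=1e-3):
    """One naive WOS walk inside the unit disk: hop to a uniform random
    point on the largest circle centred at the current position, until
    within eps of the boundary; return the boundary value g(x, y) = x."""
    while True:
        r = 1.0 - math.hypot(x, y)        # distance to the disk boundary
        if r < eps:
            return x / math.hypot(x, y)   # project to boundary, evaluate g
        theta = 2 * math.pi * random.random()
        x += r * math.cos(theta)
        y += r * math.sin(theta)

random.seed(0)
n = 20000
est = sum(walk_on_spheres(0.3, 0.2) for _ in range(n)) / n
print(est)   # should be close to the exact harmonic value u(0.3, 0.2) = 0.3
```

Each walk is independent, which is exactly the parallelism the FPGA framework exploits: thousands of walks can run concurrently with no shared state beyond the geometry.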
18:008.4.3ACCELERATING ITEMSET SAMPLING USING SATISFIABILITY CONSTRAINTS ON FPGA
Speaker:
Mael Gueguen, Univ Rennes, Inria, CNRS, IRISA, FR
Authors:
Mael Gueguen1, Olivier Sentieys2 and Alexandre Termier1
1Univ Rennes, CNRS, IRISA, FR; 2INRIA, FR
Abstract
Finding recurrent patterns within a data stream is important for fields as diverse as cybersecurity and e-commerce, and requires pattern mining techniques. However, pattern mining suffers from two issues. The first one, known as "pattern explosion", comes from the large combinatorial space explored: too many results are output for them to be useful. Recent techniques called output space sampling solve this problem by outputting only a sampled set of the results, with a target size provided by the user. The second issue is that most algorithms are designed to operate on static datasets or low-throughput streams. In this paper, we propose a contribution that tackles both issues by designing an FPGA accelerator for pattern mining with output space sampling, and we show that our accelerator can outperform a state-of-the-art implementation on a server-class CPU using a modest FPGA product.

Download Paper (PDF; Only available from the DATE venue WiFi)
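Output space sampling can be pictured with a plain reservoir sampler that streams over the pattern space instead of storing it; the paper achieves this with satisfiability constraints in hardware, so the sketch below only conveys the sampling idea, not the accelerator:

```python
import random
from itertools import combinations

def sample_patterns(items, k, rng):
    """Reservoir-sample k itemsets uniformly from the exponential output
    space, streaming over candidates rather than materialising them."""
    reservoir, seen = [], 0
    for size in range(1, len(items) + 1):
        for itemset in combinations(sorted(items), size):
            seen += 1
            if len(reservoir) < k:
                reservoir.append(itemset)
            else:
                j = rng.randrange(seen)   # classic Algorithm R step
                if j < k:
                    reservoir[j] = itemset
    return reservoir

rng = random.Random(7)
sample = sample_patterns({"a", "b", "c", "d"}, 3, rng)
print(sample)   # exactly 3 itemsets, a uniform sample of all 15 subsets
```

The user-provided target size corresponds to `k`: the output stays bounded no matter how large the combinatorial space grows.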
18:30IP4-1, 492AN EFFICIENT MAPPING APPROACH TO LARGE-SCALE DNNS ON MULTI-FPGA ARCHITECTURES
Speaker:
Jiaxi Zhang, Peking University, CN
Authors:
Wentai Zhang1, Jiaxi Zhang1, Minghua Shen2, Guojie Luo1 and Nong Xiao3
1Peking University, CN; 2Sun Yat-sen University, CN; 3Sun Yat-Sen University, CN
Abstract
FPGAs are very attractive for accelerating deep neural networks (DNNs). While a single FPGA can provide good performance for small-scale DNNs, support for large-scale DNNs is limited by their higher resource demands. In this paper, we propose an efficient mapping approach for accelerating large-scale DNNs on asymmetric multi-FPGA architectures. In this approach, the neural network mapping is formulated as a resource allocation problem, and we design a dynamic-programming-based partitioning to solve it optimally. Experimental results using the large-scale ResNet-152 demonstrate that our approach deploys sixteen FPGAs to provide a 16.4x advantage in GOPS over the state-of-the-art work.

Download Paper (PDF; Only available from the DATE venue WiFi)
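The resource-allocation formulation can be illustrated with a simplified, symmetric variant: partition a chain of per-layer costs into k contiguous segments (one per FPGA) so that the most loaded device is as light as possible. This dynamic-programming skeleton is a sketch under those simplifying assumptions, not the paper's asymmetric formulation:

```python
def partition_chain(costs, k):
    """Minimise the maximum per-device cost when assigning a chain of
    layers to k devices, keeping each device's layers contiguous."""
    n = len(costs)
    prefix = [0]
    for c in costs:
        prefix.append(prefix[-1] + c)
    INF = float("inf")
    best = [[INF] * (k + 1) for _ in range(n + 1)]
    best[0][0] = 0
    for i in range(1, n + 1):           # first i layers ...
        for j in range(1, k + 1):       # ... spread over j devices
            for s in range(j - 1, i):   # device j takes layers s..i-1
                seg = prefix[i] - prefix[s]
                best[i][j] = min(best[i][j], max(best[s][j - 1], seg))
    return best[n][k]

print(partition_chain([4, 1, 1, 4, 2, 3], 3))   # -> 5
```

For the toy chain, the optimum splits the layers as [4,1] | [1,4] | [2,3], balancing each device at cost 5; an asymmetric version would additionally weight each segment by the capacity of the FPGA it lands on.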
18:30End of session

8.5 Don't Forget the Memory

Date: Wednesday, March 27, 2019
Time: 17:00 - 18:30
Location / Room: Room 5

Chair:
Christian Pilato, Politecnico di Milano, IT, Contact Christian Pilato

Co-Chair:
Olivier Sentieys, INRIA, FR, Contact Olivier Sentieys

Multi-core systems demand new solutions to overcome the increasing memory gap and emerging memory technologies still need to find a suitable place in the traditional memory system. This session showcases different proposals covering memory, storage, and OS. The first presentation improves the parallelism of the Open-Channel SSD Linux implementation. The second presentation proposes a method to orchestrate multicore memory requests to maintain the main memory locality. The third presentation proposes a new method to improve directory entry lookup in deep directory structures. An interactive presentation completes the session with a new cache replacement algorithm for NVM disk read caches.

TimeLabelPresentation Title
Authors
17:008.5.1DS-CACHE: A REFINED DIRECTORY ENTRY LOOKUP CACHE WITH PREFIX-AWARENESS FOR MOBILE DEVICES
Speaker:
Zhaoyan Shen, Shandong University, CN
Authors:
Lei Han1, Bin Xiao1, Xuwei Dong2, Zhaoyan Shen3 and Zili Shao4
1The Hong Kong Polytechnic University, HK; 2Northwestern Polytechnical University, CN; 3Shandong University, CN; 4The Chinese University of Hong Kong, HK
Abstract
Modern devices are filled with files, directories upon directories, and applications generate huge I/O activity on mobile devices. A directory cache is adopted to accelerate file lookup operations in the virtual file system. However, the original directory cache recursively walks all the components of a path for each lookup, leading to inefficient lookup performance and a lower cache hit ratio. In this paper, we for the first time fully investigate the characteristics of directory entry lookup in mobile devices. Based on our findings, we propose a new directory cache scheme, called Dynamic Skipping Cache (DS-Cache), which adopts an ASCII-based hash table to reduce path lookup complexity by skipping the common prefixes of paths. We also design a novel lookup scheme to optimize the directory cache hit ratio. We have implemented and deployed DS-Cache on a Google Nexus 6P smartphone. Experimental results show that it can reduce the latency of invoking system calls by up to 57.4%, and further reduce the completion time of real-world mobile applications by up to 64%.

Download Paper (PDF; Only available from the DATE venue WiFi)
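The gain from skipping path prefixes can be pictured with a toy cache model that contrasts a whole-path hash probe with the classic component-by-component walk (the accounting of one "step" per component is an assumption for illustration; the kernel's dcache is far more involved):

```python
class PathCache:
    """Toy model: a flat hash on full paths serves repeat lookups in one
    probe; a miss falls back to walking each path component (counted as
    one step per component), then inserts the whole path."""
    def __init__(self, tree):
        self.tree = tree          # {path: inode} ground truth
        self.flat = {}
        self.steps = 0

    def lookup(self, path):
        if path in self.flat:
            self.steps += 1       # single hash probe
            return self.flat[path]
        parts = path.strip("/").split("/")
        self.steps += len(parts)  # classic recursive component walk
        inode = self.tree[path]
        self.flat[path] = inode
        return inode

pc = PathCache({"/a/b/c/file": 7})
pc.lookup("/a/b/c/file")   # miss: 4 component steps
pc.lookup("/a/b/c/file")   # hit: 1 probe
print(pc.steps)            # 5
```

The deeper the directory tree, the more the flat probe saves relative to re-walking the shared prefix on every lookup — the behaviour DS-Cache exploits with its ASCII-based hash table.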
17:308.5.2IMPROVING THE DRAM ACCESS EFFICIENCY FOR MATRIX MULTIPLICATION ON MULTICORE ACCELERATORS
Speaker:
Sheng Ma, National University of Defense Technology, CN
Authors:
Sheng Ma, Yang Guo, Shenggang Chen, Libo Huang and Zhiying Wang, National University of Defense Technology, CN
Abstract
The parallelization of matrix multiplication on multicore accelerators divides a matrix into several partitions. The existing design deploys an independent DMA transfer for each core to access its own partition from DRAM. This design has poor memory access efficiency, since memory access streams of multiple concurrent DMA transfers interfere with each other. We propose Distributed-DMA (D-DMA), which invokes one transfer to serve all cores. D-DMA accesses data in a row-major manner to efficiently exploit inter-partition locality to improve the DRAM access efficiency. Compared with a baseline design, D-DMA improves the bandwidth by 84.8% and reduces DRAM energy consumption by 43.1% for micro-benchmarks. It achieves higher performance for the GEMM benchmark. With much lower hardware cost, D-DMA significantly outperforms an out-of-order memory controller.

Download Paper (PDF; Only available from the DATE venue WiFi)
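The locality argument behind D-DMA can be illustrated with a toy open-page DRAM model: interleaving two per-core DMA streams that own different matrix rows thrashes the row buffer, while a single row-major stream keeps each row open. The matrix size, one-DRAM-row-per-matrix-row mapping and element-wise interleaving are assumptions made to keep the sketch small:

```python
def row_misses(addresses, row_size=1024):
    """Count DRAM row activations under an open-page policy: a miss
    whenever the accessed row differs from the currently open row."""
    open_row, misses = None, 0
    for a in addresses:
        row = a // row_size
        if row != open_row:
            misses += 1
            open_row = row
    return misses

# a 4x1024 matrix, one DRAM row per matrix row, split between two cores
cols = 1024
core0 = [r * cols + c for r in range(0, 2) for c in range(cols)]  # rows 0-1
core1 = [r * cols + c for r in range(2, 4) for c in range(cols)]  # rows 2-3

# independent per-core DMA streams interleave at the memory controller ...
interleaved = [a for pair in zip(core0, core1) for a in pair]
# ... while one D-DMA-style row-major stream visits each row once
row_major = sorted(core0 + core1)
print(row_misses(interleaved), row_misses(row_major))   # 4096 4
```

Every alternation between the two streams reopens a row, so the interleaved schedule activates a row per access, whereas the single row-major transfer needs only one activation per DRAM row — the inter-partition locality D-DMA is designed to preserve.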
18:008.5.3QBLK: TOWARDS FULLY EXPLOITING THE PARALLELISM OF OPEN-CHANNEL SSDS
Speaker:
Hongwei Qin, Wuhan National Laboratory for Optoelectronics, Key Laboratory of Information Storage System, Engineering Research Center of data storage systems and Technology, Ministry of Education of China, School of Computer Science and Technology, Huazhong University, CN
Authors:
Hongwei Qin, Dan Feng, Wei Tong, Jingning Liu and Yutong Zhao, Wuhan National lab for Optoelectronics, CN
Abstract
By exposing physical channels to host software, Open-Channel SSDs show great potential for future high-performance storage systems. However, the existing scheme fails to achieve acceptable performance under heavy workloads. The main reasons reside not only in its single-buffer architecture but, more importantly, also in its line-based physical address management. Besides, the lock on the address mapping table is also a performance burden under heavy workloads. We propose QBLK, an open-source driver which better exploits the parallelism of Open-Channel SSDs. In particular, QBLK adopts four key techniques: (1) multi-queue based buffering, (2) per-channel address management, (3) lock-free address mapping, and (4) fine-grained draining. Experimental results show that QBLK achieves up to 97.4% bandwidth improvement compared with the state-of-the-art PBLK scheme.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30IP4-2, 1013A WRITE-EFFICIENT CACHE ALGORITHM BASED ON MACROSCOPIC TREND FOR NVM-BASED READ CACHE
Speaker:
Ning Bao, Renmin University of China, CN
Authors:
Ning Bao1, Yunpeng Chai1 and Xiao Qin2
1Renmin University of China, CN; 2Auburn University, US
Abstract
Compared with traditional storage technologies, non-volatile memory (NVM) techniques offer excellent I/O performance, but suffer from high costs and either limited write endurance (e.g., NAND and PCM) or high write energy consumption (e.g., STT-MRAM). As a result, storage systems prefer to utilize NVM devices as read caches for a performance boost. Unlike write caches, read caches have greater potential for write reduction because their writes are only triggered by cache updates. However, traditional cache algorithms like LRU and LFU have to update cached blocks frequently because it is difficult for them to predict data popularity far into the future. Although some newer algorithms like SieveStore reduce cache write pressure, they still rely on those traditional cache schemes for data popularity prediction. Due to poor long-term data popularity prediction, these algorithms lead to a significant and unnecessary decrease in cache hit ratios. In this paper, we propose a new Macroscopic Trend (MT) cache replacement algorithm to reduce cache updates effectively while maintaining high cache hit ratios. The algorithm discovers long-term hot data effectively by observing the macroscopic trend of data blocks. We have conducted extensive experiments driven by a series of real-world traces, and the results indicate that, compared with LRU, the MT cache algorithm can achieve a 15.28 times longer lifetime or lower energy consumption for NVM caches with a similar hit ratio.

Download Paper (PDF; Only available from the DATE venue WiFi)
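A heavily simplified admission filter in the spirit of the macroscopic-trend idea can show why filtering cuts NVM writes: the rule below (admit a missed block only if it was also accessed in the previous epoch) is an invented illustration, not the paper's MT policy:

```python
from collections import Counter, OrderedDict

class TrendCache:
    """Toy NVM read cache: a missed block is admitted only if it was
    also accessed in the previous epoch (a crude long-term trend), so
    one-off bursts never trigger a cache write. Eviction is LRU."""
    def __init__(self, size):
        self.size, self.cache = size, OrderedDict()
        self.prev, self.cur = Counter(), Counter()
        self.writes = self.hits = 0

    def new_epoch(self):
        self.prev, self.cur = self.cur, Counter()

    def access(self, blk):
        self.cur[blk] += 1
        if blk in self.cache:
            self.cache.move_to_end(blk)   # refresh LRU position
            self.hits += 1
        elif self.prev[blk] > 0:          # was hot last epoch too
            if len(self.cache) >= self.size:
                self.cache.popitem(last=False)
            self.cache[blk] = True
            self.writes += 1              # the only NVM writes

c = TrendCache(size=2)
for blk in ["A", "B", "A", "C"]:          # epoch 1: nothing admitted yet
    c.access(blk)
c.new_epoch()
for blk in ["A", "A", "D"]:               # epoch 2: A admitted, D filtered
    c.access(blk)
print(c.writes, c.hits)                   # 1 1
```

A plain LRU would have written A, B, C and D to the NVM cache; the trend filter performs a single write for the genuinely recurring block, which is the write-endurance trade the MT algorithm targets at scale.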
18:31IP4-3, 626SRAM DESIGN EXPLORATION WITH INTEGRATED APPLICATION-AWARE AGING ANALYSIS
Speaker:
Alexandra Listl, TUM, DE
Authors:
Alexandra Listl1, Daniel Mueller-Gritschneder2, Sani Nassif3 and Ulf Schlichtmann2
1Chair of Electronic Design Automation, DE; 2TUM, DE; 3Radyalis, US
Abstract
On-Chip SRAMs are an integral part of safety-critical Systems-on-Chip. At the same time, however, they are also the most susceptible to reliability threats such as Bias Temperature Instability (BTI), originating from the continuous trend of technology shrinking. BTI leads to significant performance degradation, especially in the Sense Amplifiers (SAs) of SRAMs, where failures are fatal since the data of a whole column is destroyed. As BTI strongly depends on the workload of an application, the aging rates of SAs in a memory array differ significantly, and the incorporation of workload information into aging simulations is vital. Especially in safety-critical systems, precise estimation of application-specific reliability requirements to predict the memory lifetime is a key concern. In this paper we present a workload-aware aging analysis for On-Chip SRAMs that incorporates the workload of real applications executed on a processor. According to this workload, we predict the performance degradation of the SAs in the memory. We integrate this aging analysis into an aging-aware SRAM design exploration framework that generates and characterizes memories of different array granularity to select the most reliable memory architecture for the intended application. We show that this technique can mitigate SA degradation significantly, depending on the environmental conditions and the application workload.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30End of session

8.6 Robotics and Industry 4.0

Date: Wednesday, March 27, 2019
Time: 17:00 - 18:30
Location / Room: Room 6

Chair:
Federica Ferraguti, University of Modena-Reggio, IT, Contact Federica Ferraguti

Co-Chair:
Armin Schoenlieb, Infineon Technologies, AT, Contact Armin Schoenlieb

This session presents new results in the field of robotics and cyberphysical systems applied to Industry 4.0. The session includes theoretical results as well as evaluation of relevant use-cases.

TimeLabelPresentation Title
Authors
17:008.6.1A METHODOLOGY FOR COMPARATIVE ANALYSIS OF COLLABORATIVE ROBOTS FOR INDUSTRY 4.0
Speaker:
Marcello Bonfè, University of Ferrara, IT
Authors:
Federica Ferraguti1, Andrea Pertosa1, Cristian Secchi1, Cesare Fantuzzi1 and Marcello Bonfè2
1University of Modena and Reggio Emilia, IT; 2University of Ferrara, IT
Abstract
Collaborative robots are one of the key drivers of Industry 4.0, and they have evolved considerably since the last decades of the 20th century. Compared with traditional industrial robots, collaborative robots are more productive, flexible, versatile and safer. In recent years, both market-leading manufacturers of industrial robots and newer startup companies have developed novel products for collaborative robotic applications. In this paper, we propose a methodology for developing a comparative analysis of the collaborative robots currently available on the market. The goal of the paper is to provide a framework for benchmarking alternative robots for a given collaborative application, based on common robot parameters and standardized experiments to be performed with the robots under investigation. An experimental technological review of three different collaborative robots is provided, showing how the methodology can be applied in real cases.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:308.6.2HYBRID SENSING APPROACH FOR CODED MODULATION TIME-OF-FLIGHT CAMERAS
Speaker:
Armin Schoenlieb, Infineon Technologies, AT
Authors:
Armin Schoenlieb1, Hannes Plank1, Christian Steger2, Gerald Holweg1 and Norbert Druml1
1Infineon Technologies, AT; 2Graz University of Technology, AT
Abstract
In recent years, application fields such as industrial automation and indoor robot navigation have increased the demand for reliable localization systems. Simultaneous localization and mapping systems often depend on depth imaging to reconstruct the scene. Time-of-Flight sensors prove to be well suited for these applications; however, they are impaired by different error sources. The measurement principle is based on measuring the phase, and consequently the delay, of emitted and reflected light. Specular surfaces can cause pixel saturation, while the periodicity of the measured phase leads to ambiguous distances. In this paper, we aim to solve these problems by proposing a new Time-of-Flight depth sensing approach. By combining the emerging coded modulation method with traditional depth sensing, we unify the advantages of both methods. Images captured with coded modulation show a pixel response only within selected distance limits. In contrast, traditional continuous-wave Time-of-Flight imaging exhibits a superior signal-to-noise ratio. This method makes it possible to mask erroneous distance measurements, allowing Time-of-Flight sensors to produce more reliable depth measurements and gain traction in industrial environments. As our evaluation shows, our method removes the influence of specular surfaces and is capable of masking ambiguous distance measurements. Furthermore, our approach improves system behavior by enabling more robust exposure time control.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:008.6.3COMMUNICATION-COMPUTATION CO-DESIGN OF DECENTRALIZED TASK CHAIN IN CPS APPLICATIONS
Speaker:
Eli Bozorgzadeh, University of California, Irvine, US
Authors:
Seyyed Ahmad Razavi, Eli Bozorgzadeh and Solmaz Kia, University of California, Irvine, US
Abstract
In this paper, we present a method to find an optimal trade-off between computation and communication for a decentralized linear task chain running on a network of mobile agents. Task replication has been deployed to reduce the data links among highly correlated nodes in communication networks. The primary goal is to reduce or remove data links at the cost of increased computational load at each node. However, as applications grow more complex and end devices have limited resources, this computational load is no longer negligible. Our proposed selective task replication enables a communication-computation trade-off in decentralized task chains and minimizes the overall local computation overhead while keeping the critical path delay under a threshold. We applied our approach to a decentralized Unscented Kalman Filter (UKF) for state estimation in cooperative localization of mobile multi-robot systems. We demonstrate and evaluate our proposed method on a network of 15 Raspberry Pi 3B boards connected via WiFi. Our experimental results show that, using the proposed method, the prediction step of the decentralized UKF is 15% faster, and for the same threshold delay, the overall computation overhead is reduced by 2.41 times compared to task replication without resource constraints.

Download Paper (PDF; Only available from the DATE venue WiFi)
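The trade-off described above, replicating upstream tasks to remove communication links at the cost of extra local computation, can be made concrete with a small brute-force sketch. This is illustrative only: the function name and the simplified chain model are assumptions, and the paper's actual optimization over decentralized UKF task chains is more elaborate:

```python
from itertools import combinations

def min_overhead_replication(base_comp, comm, repl_cost, threshold):
    """Pick which links of a task chain to replace by task replication.
    Replicating across link i removes its communication delay comm[i]
    from the critical path but adds repl_cost[i] of extra local
    computation. Minimize extra computation subject to the critical-path
    delay staying <= threshold. Brute force over all link subsets."""
    n = len(comm)
    best = None
    for k in range(n + 1):
        for repl in combinations(range(n), k):
            # critical path = base computation + delays of remaining links
            delay = base_comp + sum(comm[i] for i in range(n) if i not in repl)
            extra = sum(repl_cost[i] for i in repl)
            if delay <= threshold and (best is None or extra < best[0]):
                best = (extra, set(repl))
    return best  # (extra computation, replicated links), or None if infeasible
```

For example, with a base computation of 10, link delays [4, 2, 3], replication costs [3, 1, 5] and a threshold of 14, replicating across the first two links meets the deadline at the minimal extra cost of 4. The exponential enumeration is only for illustration; a real system would need a polynomial heuristic or dynamic program.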
18:158.6.4RESOURCE MANAGER FOR SCALABLE PERFORMANCE IN ROS DISTRIBUTED ENVIRONMENTS
Speaker:
Daisuke Fukutomi, Ritsumeikan University, JP
Authors:
Daisuke Fukutomi1, Takuya Azumi2, Shinpei Kato3 and Nobuhiko Nishio1
1Ritsumeikan University, JP; 2Saitama University, JP; 3The University of Tokyo, JP
Abstract
This paper presents a resource manager to achieve scalable performance with the Robot Operating System (ROS) in distributed environments. In robotics, using ROS in distributed environments spanning multiple host machines is a growing trend for large-scale data processing, for example in cloud/edge computing and in the communication of point clouds and images for dynamic map composition. However, ROS is unable to manage the resources (e.g., CPUs, memory, and disks) of each host machine. It is therefore difficult to use distributed environmental resources efficiently and achieve scalable performance. This paper proposes a resource management mechanism for ROS distributed environments using a master-slave model to execute ROS processes efficiently and smoothly. We manage the resource usage of each host machine and construct a mechanism that adaptively distributes the load so it is balanced. Evaluations show that scalable performance can be achieved in ROS distributed environments comprising ten host machines running a real application (SLAM: simultaneous localization and mapping) that processes large-scale point cloud data.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:31IP4-4, 791FROM MULTI-LEVEL TO ABSTRACT-BASED SIMULATION OF A PRODUCTION LINE
Speaker:
Stefano Centomo, University of Verona, IT
Authors:
Stefano Centomo, Enrico Fraccaroli and Marco Panato, University of Verona, IT
Abstract
This paper proposes two approaches for integrating cyber-physical systems (CPS) in a production line in order to obtain predictions about actual production, a core operation in the context of Industry 4.0. The first approach relies on the multi-level paradigm, where multiple descriptions of the same CPS are modeled at different levels of detail and the models are switched at runtime. The second approach relies on abstraction techniques for CPS that maintain a certain level of detail. The two approaches are validated and compared on a real use-case scenario to identify the most effective simulation strategy.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:32IP4-5, 285ACCURATE DYNAMIC MODELLING OF HYDRAULIC SERVOMECHANISMS
Speaker:
Manuel Pencelli, Yanmar R&D Europe S.r.l., IT
Authors:
Manuel Pencelli1, Renzo Villa2, Alfredo Argiolas1, Gianni Ferretti2, Marta Niccolini1, Matteo Ragaglia1, Paolo Rocco2 and Andrea Maria Zanchettin2
1YANMAR R&D EUROPE S.R.L, IT; 2Politecnico di Milano, IT
Abstract
In this paper, the process of modelling and identifying a hydraulic actuator is discussed. In this framework, a simple model based on classical theory has been derived, and a first experimental campaign has been performed on a test bench. These tests highlighted the presence of unmodelled phenomena (dead-zone, hysteresis, etc.), so a second and more extensive set of experiments was carried out. With the acquired knowledge, a new, improved model is presented and its parameters identified. Finally, several tests were performed in order to experimentally validate the model.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:33IP4-6, 208PLANNING WITH REAL-TIME COLLISION AVOIDANCE FOR COOPERATING AGENTS UNDER RIGID BODY CONSTRAINT
Speaker:
Federico Vesentini, University of Verona, IT
Authors:
Nicola Piccinelli, Federico Vesentini and Riccardo Muradore, University of Verona, IT
Abstract
In automated warehouses, path planning is a crucial topic for improving automation and efficiency. This kind of planning is usually computed off-line, knowing the planimetry of the warehouse and the starting and target points of each agent. However, this global approach cannot manage unexpected static/dynamic obstacles and other agents moving in the same area. For this reason, in multi-robot systems, global planners are usually integrated with local collision avoidance algorithms. In this paper we use the Voronoi diagram as the global planner and the Velocity Obstacle (VO) method as the collision avoidance algorithm. The goal of this paper is to extend such a hybrid motion planner by enforcing mechanical constraints between agents in order to execute a task that cannot be performed by a single agent. We focus on the cooperative task of carrying a payload, such as a bar, with two agents constrained to move at the end points of the bar. We improve the original algorithms by dynamically taking the constrained motion into account both at the global and at the collision avoidance level.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30End of session

8.7 Embedded hardware architectures for deep neural networks

Date: Wednesday, March 27, 2019
Time: 17:00 - 18:30
Location / Room: Room 7

Chair:
Sandeep Pande, IMEC-NL, NL, Contact Sandeep Pande

Co-Chair:
Kyuho Lee, Ulsan National Institute of Science and Technology (UNIST), KR, Contact Kyuho Lee

This session presents papers that address various research challenges including optimization of deep neural networks for edge devices, multiplierless neural network acceleration, design space exploration of CNNs on FPGAs and accelerating local binary pattern networks on FPGAs.

TimeLabelPresentation Title
Authors
17:008.7.1SELF-SUPERVISED QUANTIZATION OF PRE-TRAINED NEURAL NETWORKS FOR MULTIPLIERLESS ACCELERATION
Speaker:
Sebastian Vogel, Robert Bosch GmbH, DE
Authors:
Sebastian Vogel1, Jannik Springer1, Andre Guntoro1 and Gerd Ascheid2
1Robert Bosch GmbH, DE; 2RWTH Aachen University, DE
Abstract
To host intelligent algorithms such as deep neural networks on embedded devices, it is beneficial to transform the data representation of neural networks into a fixed-point format with reduced bit-width. In this paper we present a novel quantization procedure for the parameters and activations of pre-trained neural networks. For 8-bit linear quantization, our procedure achieves close to original network performance without retraining and consequently does not require labeled training data. Additionally, we evaluate our method for power-of-two quantization as well as for a two-hot quantization scheme, enabling shift-based inference. To underline the hardware benefits of a multiplierless accelerator, we propose the design of a shift-based processing element.

Download Paper (PDF; Only available from the DATE venue WiFi)
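To see why power-of-two quantization enables multiplierless, shift-based inference, consider this minimal sketch (an illustration under assumptions, not the paper's exact quantization procedure; the function name and rounding rule are hypothetical):

```python
import numpy as np

def quantize_pow2(w, bits=4):
    """Map each weight to sign(w) * 2**e with an integer exponent e, so
    that multiplying an activation by the weight reduces to a bit shift
    (by e positions) plus a sign flip. Illustrative rounding: nearest
    integer exponent in log2 space, clipped to the representable range."""
    w = np.asarray(w, dtype=np.float64)
    sign = np.sign(w)
    mag = np.abs(w)
    # round the log2 magnitude to the nearest integer exponent
    e = np.clip(np.round(np.log2(np.where(mag > 0, mag, 1e-12))),
                -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    q = sign * np.power(2.0, e)
    return np.where(mag > 0, q, 0.0), e
```

For instance, 0.3 maps to 0.25 (a right shift by 2), -0.5 to -0.5 (a right shift by 1, negated) and 1.6 to 2.0 (a left shift by 1). A hardware processing element then needs only a barrel shifter and an adder, no multiplier.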
17:308.7.2MULTI-OBJECTIVE PRECISION OPTIMIZATION OF DEEP NEURAL NETWORKS FOR EDGE DEVICES
Speaker:
Nhut Minh Ho, National University of Singapore, SG
Authors:
Nhut-Minh Ho, Ramesh Vaddi and Weng-Fai Wong, National University of Singapore, SG
Abstract
Post-training precision tuning is often needed for efficient implementation of deep neural networks, especially when the inference platform is resource constrained. While previous works have proposed many ad hoc strategies for this task, this paper describes a general method for allocating precision to a trained deep neural network's data based on a property relating errors in a network. We demonstrate that the precision results of previous works, whether for hardware accelerators or for understanding cross-layer precision requirements, are subsumed by the proposed general method. It achieves 29% and 46% energy savings over the state-of-the-art search-based method for GoogleNet and VGG-19, respectively. The proposed precision allocation method can be used to optimize for different criteria based on hardware design constraints, allocating precision at the granularity of layers even for very deep networks such as ResNet-152, which hitherto was not achievable.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:008.7.3TOWARDS DESIGN SPACE EXPLORATION AND OPTIMIZATION OF FAST ALGORITHMS FOR CONVOLUTIONAL NEURAL NETWORKS (CNNS) ON FPGAS
Speaker:
Muhammad Adeel Pasha, LUMS, PK
Authors:
Afzal Ahmad and Muhammad Adeel Pasha, Department of Electrical Engineering, SBASSE, LUMS, PK
Abstract
Convolutional Neural Networks (CNNs) have gained widespread popularity in the fields of computer vision and image processing. Due to the huge computational requirements of CNNs, dedicated hardware-based implementations are being explored to improve their performance. Hardware platforms such as Field Programmable Gate Arrays (FPGAs) are widely used to design parallel architectures for this purpose. In this paper, we analyze Winograd minimal filtering, or fast convolution, algorithms to reduce the arithmetic complexity of the convolutional layers of CNNs. We explore a complex design space to find the sets of parameters that result in improved throughput and power efficiency. We also design a pipelined and parallel Winograd convolution engine that improves throughput and power efficiency while reducing the computational complexity of the overall system. Our proposed designs show up to 4.75x and 1.44x improvements in throughput and power efficiency, respectively, compared to the state-of-the-art design, while using approximately 2.67x more multipliers. Furthermore, we obtain savings of up to 53.6% in logic resources compared with the state-of-the-art implementation.

Download Paper (PDF; Only available from the DATE venue WiFi)
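For intuition, the smallest Winograd instance F(2,3) computes two outputs of a 3-tap filter with four multiplications instead of six, using the standard transform matrices. This is a minimal numerical sketch of the algorithm the paper builds on, not the paper's hardware engine:

```python
import numpy as np

# Standard F(2,3) Winograd transform matrices.
BT = np.array([[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]], float)
G  = np.array([[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]], float)
AT = np.array([[1, 1, 1, 0], [0, 1, -1, -1]], float)

def winograd_f23(d, g):
    """d: 4 input samples, g: 3 filter taps -> 2 outputs of the sliding
    dot product, using only 4 elementwise multiplications."""
    U = G @ np.asarray(g, float)      # transform the filter (done once per filter)
    V = BT @ np.asarray(d, float)     # transform the input tile
    return AT @ (U * V)               # 4 multiplications + inverse transform

def direct(d, g):
    """Reference: the same two outputs computed directly (6 multiplications)."""
    return np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                     d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
```

The filter transform U can be precomputed, so in a CNN layer the per-tile cost is the four elementwise multiplications plus cheap additions, which is where the arithmetic-complexity reduction in the paper comes from.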
18:158.7.4ACCELERATING LOCAL BINARY PATTERN NETWORKS WITH SOFTWARE PROGRAMMABLE FPGAS
Speaker:
Jeng-Hau Lin, UC San Diego, US
Authors:
Jeng-Hau Lin, Atieh Lotfi, Vahideh Akhlaghi, Zhuowen Tu and Rajesh Gupta, UC San Diego, US
Abstract
Fueled by the success of mobile devices, the computational demands on these platforms have been rising faster than their computational and storage capacities or energy availability, for tasks ranging from speech and image recognition to automated reasoning and cognition. While the success of convolutional neural networks (CNNs) has contributed to this vision, these algorithms remain out of reach of the limited computing and storage capabilities of mobile platforms. It is clear to most researchers that such a transition can only be achieved by using dedicated hardware accelerators on these platforms. However, CNNs, with their arithmetic-intensive operations, remain particularly unsuitable for such acceleration, both computationally and because of the high memory bandwidth that highly parallel processing requires. In this paper, we implement and optimize an alternative genre of networks, the local binary pattern network (LBPNet), which replaces arithmetic operations with combinatorial operations, substantially boosting the efficiency of hardware implementation. LBPNet is built upon a radically different view of the arithmetic operations used by conventional neural networks, overcoming limitations posed by the compression and quantization methods used for hardware implementation of CNNs. This paper explores in depth the design and implementation of both an architecture and critical optimizations of LBPNet for realization in accelerator hardware, and provides a comparison of results with state-of-the-art CNNs on multiple datasets.

Download Paper (PDF; Only available from the DATE venue WiFi)
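The combinatorial primitive underlying LBPNet can be seen in the classic 3x3 local binary pattern: each output value is built purely from comparisons and bit packing, with no multiplications. This is an illustrative sketch of the classic operator, not the paper's learned-pattern network:

```python
import numpy as np

def lbp_3x3(img):
    """Encode each interior pixel by comparing its 8 neighbours against
    the centre and packing the comparison bits into one byte. Only
    comparisons and bit operations are used: no arithmetic multiplies,
    which is what makes this genre of network hardware-friendly."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    # clockwise neighbour offsets starting at the top-left corner
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            c = img[i, j]
            code = 0
            for bit, (di, dj) in enumerate(offsets):
                if img[i + di, j + dj] >= c:
                    code |= 1 << bit
            out[i - 1, j - 1] = code
    return out
```

In an FPGA realization each comparison is a single comparator and the packing is pure wiring, which hints at why eliminating multiplications translates directly into logic and bandwidth savings.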
18:31IP4-7, 334THE CASE FOR EXPLOITING UNDERUTILIZED RESOURCES IN HETEROGENEOUS MOBILE ARCHITECTURES
Speaker:
Nikil Dutt, University of California, Irvine, US
Authors:
Chenying Hsieh, Nikil Dutt and Ardalan Amiri Sani, UC Irvine, US
Abstract
Heterogeneous architectures are ubiquitous in mobile platforms, with mobile SoCs typically integrating multiple processors along with accelerators such as GPUs (for data-parallel kernels) and DSPs (for signal processing kernels). This strict partitioning of application execution on heterogeneous compute resources often results in underutilization of resources such as DSPs. We present a case study executing popular data-parallel workloads such as convolutional neural networks (CNNs), computer vision applications and graphics kernels on mobile devices, and show that both the performance and the energy consumption of mobile platforms can be improved by synergistically deploying these underutilized DSPs. Our experiments on a mobile Snapdragon 835 platform, under both single- and multiple-application scenarios executing CNNs and graphics workloads, demonstrate average performance and energy improvements of 15-46% and 18-80%, respectively, when all available compute resources, especially the underutilized DSP, are deployed synergistically.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:32IP4-8, 420ONLINE RARE CATEGORY DETECTION FOR EDGE COMPUTING
Speaker:
Yufei Cui, City University of Hong Kong, HK
Authors:
Yufei Cui1, Qiao Li1, Sarana Nutanong2 and Chun Jason Xue1
1City University of Hong Kong, HK; 2Vidyasirimedhi Institute of Science and Technology, TH
Abstract
Identifying rare categories is an important data management problem in many application fields, including video surveillance, ecological environment monitoring and precision medicine. Previous solutions in the literature require all data instances to be delivered to the server first; rare-category identification algorithms are then executed on the pool of data to find informative instances for human annotators to label. This incurs large bandwidth consumption and high latency. To address these problems, we propose a lightweight rare-category identification framework. At the sensor side, the designed online algorithm filters less informative data instances from the data stream and sends only the informative ones to human annotators. After labeling, the server sends back only the labels of the corresponding data instances. The sensor-side algorithm is extended to enable cooperation between embedded devices for cases where data is collected in a distributed manner. Experiments show that our framework dramatically outperforms the baseline, reducing network traffic by 75% on average.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:33IP4-9, 416RAGRA: LEVERAGING MONOLITHIC 3D RERAM FOR MASSIVELY-PARALLEL GRAPH PROCESSING
Speaker:
Yu Huang, Huazhong University of Science and Technology, CN
Authors:
Yu Huang, Long Zheng, Xiaofei Liao, Hai Jin, Pengcheng Yao and Chuangyi Gui, Huazhong University of Science and Technology, CN
Abstract
With the maturity of monolithic 3D integration, 3D ReRAM provides impressive storage density and computational parallelism, with great opportunities for accelerating parallel graph processing. In this paper, we present RAGra, a 3D ReRAM based graph processing accelerator with two significant technical highlights. First, monolithic 3D ReRAM usually has a complexly intertwined structure, with input wordlines and output bitlines shared across layers. We propose a novel mapping scheme that guides the application of graph algorithms to 3D ReRAM seamlessly and correctly, exposing its massive parallelism. Second, considering the sparsity of real-world graphs, we further propose a row- and column-mixed execution model that can filter invalid subgraphs, again exploiting the massive parallelism of 3D ReRAM. Our evaluation on 8-layer stacked ReRAM shows that RAGra outperforms GraphR, the state-of-the-art planar (2D) ReRAM-based graph accelerator, by 6.18x in performance and 2.21x in energy saving, on average. In particular, RAGra significantly outperforms GridGraph (a typical CPU-based graph system) by up to 293.12x.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30End of session

8.8 Inspiring futures! Careers Session @ DATE (part 2)

Date: Wednesday, March 27, 2019
Time: 17:00 - 18:30
Location / Room: Exhibition Theatre

Organisers:
Luca Fanucci, University of Pisa, IT, Contact Luca Fanucci
Rossano Massai, University of Pisa, IT, Contact Rossano Massai
Xavier Salazar, Barcelona Supercomputing Center, ES, Contact Xavier Salazar

Moderator:
Luca Fanucci, University of Pisa, IT, Contact Luca Fanucci

This session aims to bring together recruiters - mostly companies large and small, as well as universities and research centres - with potential jobseekers in the technology areas covered by DATE and HiPEAC, including:

  • computer science and engineering undergraduate and master students in their final year
  • early career researchers
  • students attending the PhD forum or at the end of their PhDs

The programme will be tailored to the needs of the students and researchers. It will include:

  • career insights and mentoring by the HiPEAC officer for recruitment activities and a careers advisor from a local university
  • company pitches
  • time for informal networking

For students, this session is an opportunity to:

  • find out about different career paths within computer science high-end research and engineering
  • get advice on possible ways of progressing their careers
  • learn about the main skills employers look for
  • hear about the most interesting vacancies and internship opportunities from companies and research centres
  • share their CVs with company speakers or discuss opportunities one-to-one in an informal setting
  • get free access to the rest of the exhibition

For companies, this session provides an excellent opportunity to:

  • get in contact with potential jobseekers specializing in the right areas for their business
  • talk to jobseekers in an informal environment and collect their CVs
  • promote their corporate brand as the best workplace, with the most stimulating environment and the most interesting projects
TimeLabelPresentation Title
Authors
17:008.8.1INSPIRING FUTURES @ MICROTEST
Speaker:
Eluisa Ghilardi, Microtest, IT
17:158.8.2INSPIRING FUTURES @ COBHAM GAISLER
Speaker:
Magnus Hjorth, Cobham Gaisler, SE
17:308.8.3INSPIRING FUTURES @ INGENIARS
Speaker:
Camila Giunti, IngeniArs, IT
17:458.8.4INSPIRING FUTURES @ INTEL
Speaker:
Neslihan Kose Cihangir, Intel, DE
18:008.8.5INSPIRING FUTURES @ ST
Speaker:
Valeria Tomaselli, STMicroelectronics, IT
18:158.8.6INSPIRING FUTURES @ CAEN
Speaker:
Alessandro Iovene, CAEN, IT
18:30End of session

DATE-Party DATE Party | Networking Event

Date: Wednesday, March 27, 2019
Time: 19:30 - 23:00
Location / Room:

The DATE Party is traditionally one of the highlights of the DATE week. As one of the main networking opportunities during the week, it is a perfect occasion to meet friends and colleagues in a relaxed atmosphere while enjoying local amenities. It takes place on Wednesday, March 27, 2019, from 19:30 to 23:00.

Please kindly note that it is not a seated dinner.

All delegates, exhibitors and their guests are invited to attend the party. Please note that entrance is only possible with a valid party ticket. Each full conference registration includes a ticket for the DATE Party (which needs to be booked during the online registration process though). Additional tickets can be purchased on-site at the registration desk (subject to availability of tickets). Price for extra ticket: 70 € per person.

TimeLabelPresentation Title
Authors
23:00End of session

9.1 Special Day on "Model-Based Design of Intelligent Systems" Session: Experiences from the trenches, model-based design at work

Date: Thursday, March 28, 2019
Time: 08:30 - 10:00
Location / Room: Room 1

Chair:
Ingo Sander, KTH, SE, Contact Ingo Sander

Co-Chair:
Sander Stuijk, Eindhoven University of Technology, NL, Contact Sander Stuijk

TimeLabelPresentation Title
Authors
08:309.1.1MODEL BASED DESIGN AT THALES: THE CURRENT STATUS AND NEW CHALLENGES
Speaker and Author:
Laurent Rioux, Thales, FR
Abstract
More than a decade after introducing its model-based design method (ARCADIA) and its dedicated tool (CAPELLA) into industrial practice, THALES has acquired strong expertise and competences in model-based engineering techniques and solutions for mastering complex systems. THALES has also developed approaches to cope with non-functional properties such as safety, security and performance in the context of model-based design, by integrating formal methods and other verification techniques. Integrating these techniques early in the process allows systematic verification throughout the lifecycle and avoids costly errors. However, these non-functional system properties are not isolated and are often strongly related, yet there is little or no automated traceability of their interdependencies. THALES is therefore currently working to advance technology that combines safety, security and performance engineering activities. Supported by the model-based approach, it is possible to define interaction points where architects and experts (in safety, security and performance) can work together to identify common solutions that meet such non-functional requirements. Today, THALES systems integrate more intelligence to become more autonomous, yet they still have to comply with the same criticality levels as before. This creates new engineering challenges, where systems can adapt themselves with new behaviors and where these new behaviors must comply with critical non-functional system properties. The techniques already developed therefore need to be extended to verify and validate the safety, security and performance of such autonomous systems.
09:009.1.2MODEL-BASED DESIGN FOR CONTROLS, AI, AND COMMUNICATIONS IN INTELLIGENT SYSTEMS
Speaker and Author:
Pieter Mosterman, Mathworks, US
Abstract
In the current technology landscape, data serves as a convergence point for concurrent trends. Ubiquitous sensors are generating ever larger amounts of data; pervasively connected 5G networks are making this data available at rapidly increasing speed and scale; the proliferation of compute platforms enables computational applications beyond control-flow-oriented Harvard architectures; and sophisticated artificial intelligence and other algorithms are uniquely creating value from these reams of data and data-intensive compute resources. These trends challenge the status quo in systems development and applications, and create opportunities to predict, control, and optimize processes in new ways. How can Model-Based Design tools and workflows enable engineers to conceive, optimize, and implement these complex systems?
09:309.1.3MODEL DRIVEN DEVELOPMENT OF TWINSCAN SOFTWARE, BUT NOT FROM SCRATCH!
Speaker and Author:
Ramon Schiffelers, ASML, NL
Abstract
ASML, a high-tech company, aims at cost-effectively transitioning its traditionally built software to model-driven engineered software. The existing software is vital to the company, as it contains important business logic developed over thousands of person-years. Constructing models for this software from scratch would be costly in both time and effort. To address this issue, we apply existing dynamic software analysis techniques to retrieve the behavior of ASML's software. Dynamic analysis techniques analyze software as it executes; its sub-categories, active learning and passive learning, differ in the input they accept. In this talk, we discuss a methodology for inferring the interface protocols of software components in the ASML context. These interface protocols can later serve as the starting point for maintenance activities such as refactoring and re-engineering. Given the complementary nature of active and passive learning, we also present our approach of refining active learning with passive learning and execution logs. This increases the efficiency of the active learning process while guaranteeing minimal behavior coverage. We also share the results of case studies applying our techniques at ASML.
10:00End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Breaks (Lunch Area)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the Lunch Area to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 26, 2019

Wednesday, March 27, 2019

Thursday, March 28, 2019


9.2 High-Level Synthesis

Date: Thursday, March 28, 2019
Time: 08:30 - 10:00
Location / Room: Room 2

Chair:
Yuko Hara-Azumi, Tokyo Institute of Technology, JP, Contact Yuko Hara-Azumi

Co-Chair:
Jordi Cortadella, UPC, ES, Contact Jordi Cortadella

In this session, we show how high-level synthesis (HLS) can be used to protect IPs and to exploit high-level information to predict the outcome of physical design. First, the protection of a high-level IP model in a cloud-based synthesis context is discussed, using functional locking. The second talk investigates how the concepts behind hardware Trojans can be used during HLS to add watermarks to IPs. A novel approach is then proposed to estimate routing congestion at the physical level during HLS. The interactive presentation discusses how to estimate the hardware cost and the software performance of the hardware/software interface.

TimeLabelPresentation Title
Authors
08:309.2.1TRANSIENT KEY-BASED OBFUSCATION FOR HLS IN AN UNTRUSTED CLOUD ENVIRONMENT
Speaker:
Hannah Badier, ENSTA Bretagne, Lab-STICC, Brest, FR
Authors:
Hannah Badier1, Jean-Christophe Le Lann1, Philippe Coussy2 and Guy Gogniat3
1ENSTA Bretagne, FR; 2Universite de Bretagne-Sud / Lab-STICC, FR; 3Université Bretagne Sud, FR
Abstract
Recent advances in cloud computing have led to the advent of Business-to-Business Software as a Service (SaaS) solutions, opening new opportunities for EDA. High-Level Synthesis (HLS) in the cloud is likely to offer great opportunities to hardware design companies. However, these companies are still reluctant to make such a transition, due to the new risks of Behavioral Intellectual Property (BIP) theft that a cloud-based solution presents. In this paper, we introduce a key-based obfuscation approach to protect BIPs during cloud-based HLS. The source-to-source transformations we propose hide functionality and make normal behavior dependent on a series of input keys. In our process, the obfuscation is transient: once an obfuscated BIP is synthesized through HLS by a service provider in the cloud, the obfuscation code can only be removed at Register Transfer Level (RTL) by the design company that owns the correct obfuscation keys. Original functionality is thus restored and design overhead is kept at a minimum. Our method significantly increases the level of security of cloud-based HLS at low performance overhead. The average area overhead after obfuscation and subsequent de-obfuscation with tests performed on ASIC and FPGA is 0.39%, and over 95% of our tests had an area overhead under 5%.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:00 9.2.2 HIGH-LEVEL SYNTHESIS OF BENEVOLENT TROJANS
Speaker:
Christian Pilato, Politecnico di Milano, IT
Authors:
Christian Pilato1, Kanad Basu2, Mohammed Shayan2, Francesco Regazzoni3 and Ramesh Karri2
1Politecnico di Milano, IT; 2NYU, US; 3ALaRI, CH
Abstract
High-Level Synthesis (HLS) allows designers to create a register transfer level (RTL) description of a digital circuit starting from its high-level specification (e.g., C/C++/SystemC). HLS reduces engineering effort and design-time errors, allowing the integration of additional features. This study introduces an approach to generate benevolent Hardware Trojans (HT) using HLS. Benevolent HTs are Intellectual Property (IP) watermarks that borrow concepts from well-known malicious HTs to ward off piracy and counterfeiting either during the design flow or in fielded integrated circuits. Benevolent HTs are difficult to detect and remove because they are intertwined with the functional units used to implement the IP. Experimental results testify to the suitability of the approach and the limited overhead.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:30 9.2.3 MACHINE LEARNING BASED ROUTING CONGESTION PREDICTION IN FPGA HIGH-LEVEL SYNTHESIS
Speaker:
Jieru Zhao, HKUST, CN
Authors:
Jieru Zhao1, Tingyuan Liang2, Sharad Sinha3 and Wei Zhang1
1Hong Kong University of Science and Technology, HK; 2HKUST, CN; 3Indian Institute of Technology Goa, IN
Abstract
High-level synthesis (HLS) shortens the development time of hardware designs and enables faster design space exploration at a higher abstraction level. Optimization of complex applications in HLS is challenging due to the effects of implementation issues such as routing congestion. Routing congestion estimation is absent or inaccurate in existing HLS design methods and tools. Early and accurate congestion estimation is of great benefit to guide the optimization in HLS and improve the efficiency of implementation. However, routability, a serious concern in FPGA designs, has been difficult to evaluate in HLS without analyzing post-implementation details after place and route. To this end, we propose a novel method to predict routing congestion in HLS using machine learning and map the expected congested regions in the design to the relevant high-level source code. This is greatly beneficial in the early identification of routability-oriented bottlenecks in the high-level source code without running the time-consuming register-transfer level (RTL) implementation flow. Experiments demonstrate that our approach accurately estimates vertical and horizontal routing congestion with errors of 6.71% and 10.05% respectively. Using a face detection application as a case study, we show that by discovering the bottlenecks in high-level source code, routing congestion can be easily and quickly resolved compared to the efforts involved in RTL-level implementation and design feedback.
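The abstract does not give the model or feature set, but the shape of the approach, fitting a regressor from HLS-level features to post-implementation congestion, can be sketched with a single made-up feature and closed-form least squares:

```python
# Hypothetical training pairs: (HLS-level feature, observed congestion %).
# A real model would use many features (loop unrolling, array partitioning,
# resource counts) and a stronger learner; one feature keeps the sketch short.
samples = [(10, 12.0), (20, 21.5), (30, 33.0), (40, 41.0)]

def fit_linear(data):
    """Closed-form least squares for y = w * x + b (single feature)."""
    n = len(data)
    sx = sum(x for x, _ in data)
    sy = sum(y for _, y in data)
    sxx = sum(x * x for x, _ in data)
    sxy = sum(x * y for x, y in data)
    w = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - w * sx) / n
    return w, b

w, b = fit_linear(samples)

def predict_congestion(feature_value):
    """Congestion estimate available before place and route."""
    return w * feature_value + b

# The fit reproduces the training trend closely on this toy data.
assert abs(predict_congestion(20) - 21.5) < 1.0
```

The value of such a model is not the regression itself but that the estimate is available at HLS time, so congested regions can be traced back to source-code constructs before any RTL implementation run.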

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00 IP4-10, 275 ACCURATE COST ESTIMATION OF MEMORY SYSTEMS INSPIRED BY MACHINE LEARNING FOR COMPUTER VISION
Speaker:
Lorenzo Servadei, Infineon Technologies, DE
Authors:
Lorenzo Servadei1, Elena Zennaro1, Keerthikumara Devarajegowda1, Martin Manzinger1, Wolfgang Ecker1 and Robert Wille2
1Infineon AG, DE; 2Johannes Kepler University Linz, AT
Abstract
Hardware/software co-designs are usually defined at high levels of abstraction at the beginning of the design process in order to allow plenty of options for how to eventually realize a system. This allows for design exploration, which in turn heavily relies on knowing the costs of different design configurations (with respect to hardware usage as well as firmware metrics). To this end, methods for cost estimation are frequently applied in industrial practice. However, currently used methods for cost estimation oversimplify the problem and ignore important features, leading to estimates which are far off from the real values. In this work, we address this problem for memory systems. To this end, we borrow and re-adapt solutions based on Machine Learning (ML) which have been found suitable for problems from the domain of Computer Vision (CV), in particular age determination of persons depicted in images. We show that, for an ML approach, age determination from the CV domain is actually very similar to cost estimation of a memory system.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:01 IP4-11, 658 PRACTICAL CAUSALITY HANDLING FOR SYNCHRONOUS LANGUAGES
Speaker:
Steven Smyth, Kiel University, DE
Authors:
Steven Smyth, Alexander Schulz-Rosengarten and Reinhard von Hanxleden, Dept. of Computer Science, Kiel University, DE
Abstract
A key to the synchronous principle of reconciling concurrency with determinism is to establish at compile time that a program is causal, which means that there exists a schedule that obeys the rules put down by the language. In practice it can be rather cumbersome for the developer to cure causality problems. To facilitate causality handling, we propose, first, to enrich the scheduling regime of the language to also consider explicit scheduling directives that can be used by either the modeler or model-to-model transformations. Secondly, we propose to enhance programming environments with dedicated causality views to guide the developer in finding causality issues. Our proposals should be applicable for synchronous languages; we here illustrate them for the SCCharts language and its open source development platform KIELER.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:02 IP4-12, 998 APPLICATION PERFORMANCE PREDICTION AND OPTIMIZATION UNDER CACHE ALLOCATION TECHNOLOGY
Speaker:
Yeseong Kim, UCSD, US
Authors:
Yeseong Kim1, Ankit More2, Emily Shriver2 and Tajana Rosing1
1University of California San Diego, US; 2Intel, US
Abstract
Many applications running on high-performance computing systems share limited resources such as the last-level cache, often resulting in lower performance. Intel recently introduced a new control mechanism, called cache allocation technology (CAT), which controls the cache size used by each application. To intelligently utilize this technology for automated management, it is essential to accurately identify application performance behavior for different cache allocation scenarios. In this work, we show a novel approach which automatically builds a prediction model for application performance changes with CAT. We profile the workload characteristics based on the Intel Top-down Microarchitecture Analysis Method (TMAM), and train the model using machine learning. The model predicts instructions per cycle (IPC) across available cache sizes allocated for the applications. We also design a dynamic cache management technique which utilizes the prediction model and intelligently partitions the cache resource to improve application throughput. We implemented and evaluated the proposed framework in the Intel PMU profiling tool running on a Xeon Platinum 8186 Skylake processor. In our evaluation, we show that the proposed model accurately predicts the IPC changes of applications with 4.7% error on average for different cache allocation scenarios. Our predictive online cache management achieves improvements in application performance of up to 25% as compared to a prediction-agnostic policy.
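Once per-application IPC curves over cache sizes are predicted, the management step reduces to choosing a partition that maximizes throughput. A greedy marginal-gain sketch (the IPC table and app names are invented; the paper's model derives such curves from TMAM profiles):

```python
# Hypothetical predicted IPC for each app at each number of allocated LLC ways.
ipc = {
    "appA": {1: 1.0, 2: 1.4, 3: 1.6, 4: 1.7},
    "appB": {1: 0.8, 2: 0.9, 3: 0.95, 4: 1.0},
}

def partition_ways(ipc, total_ways):
    """Greedy allocation: repeatedly grant one way to the application with
    the largest predicted marginal IPC gain (every app keeps at least one)."""
    alloc = {app: 1 for app in ipc}

    def gain(app):
        nxt = ipc[app].get(alloc[app] + 1)
        return float("-inf") if nxt is None else nxt - ipc[app][alloc[app]]

    for _ in range(total_ways - len(ipc)):
        alloc[max(ipc, key=gain)] += 1
    return alloc

alloc = partition_ways(ipc, total_ways=5)
assert sum(alloc.values()) == 5  # every available way is assigned
```

Here the cache-sensitive appA ends up with most of the ways, which is exactly the behavior a prediction-agnostic (e.g. even-split) policy cannot achieve.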

Download Paper (PDF; Only available from the DATE venue WiFi)
10:03 IP4-13, 910 GENERALIZED MATRIX FACTORIZATION TECHNIQUES FOR APPROXIMATE LOGIC SYNTHESIS
Speaker:
Sherief Reda, Brown University, US
Authors:
Soheil Hashemi and Sherief Reda, Brown University, US
Abstract
Approximate computing is an emerging computing paradigm, where computing accuracy is relaxed for improvements in hardware metrics, such as design area and power profile. In circuit design, a major challenge is to synthesize approximate circuits automatically from input exact circuits. In this work, we extend our previous work, BLASYS, for approximate logic synthesis based on matrix factorization, where an arbitrary input circuit can be approximated in a controlled fashion. Whereas our previous approach uses a semi-ring algebra for factorization, this work generalizes matrix-based circuit factorization to include both semi-ring and field algebra implementations. We also propose a new method for truth table folding to improve the factorization quality. These new approaches significantly widen the design space of possible approximate circuits, effectively offering improved trade-offs in terms of quality, area and power consumption. We evaluate our methodology on a number of representative circuits showcasing the benefits of our proposed methodology for approximate logic synthesis.
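The essence of matrix-based approximate synthesis can be sketched independently of the toolflow: approximate a truth-table matrix by a low-rank Boolean (semi-ring) product and measure the functional error. The toy matrices below are illustrative values, not from BLASYS:

```python
def bool_matmul(B, C):
    """Boolean semi-ring product: OR of AND terms (no carries, no signs)."""
    k = len(C)
    return [[int(any(B[i][t] and C[t][j] for t in range(k)))
             for j in range(len(C[0]))] for i in range(len(B))]

def hamming_error(M, A):
    """Number of truth-table entries where the approximation disagrees."""
    return sum(m != a for rm, ra in zip(M, A) for m, a in zip(rm, ra))

# Toy 3-output truth table (4 input rows) and a rank-1 factorization of it.
M = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1],
     [0, 0, 0]]
B = [[1], [1], [0], [0]]      # 4 x 1 "compressor" side
C = [[1, 1, 0]]               # 1 x 3 "decompressor" side

approx = bool_matmul(B, C)
# The rank-1 approximation misses exactly one minterm of the third output.
assert hamming_error(M, approx) == 1
```

Lowering the rank shrinks the synthesized circuit (fewer shared product terms) at the cost of such Hamming errors; generalizing the algebra, as this paper does, widens the set of factorizations to trade against quality.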

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00 End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Breaks (Lunch Area)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the Lunch Area to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 26, 2019

Wednesday, March 27, 2019

Thursday, March 28, 2019


9.3 Special Session: RISC-V or RISK-V? Towards Secure Open Hardware

Date: Thursday, March 28, 2019
Time: 08:30 - 10:00
Location / Room: Room 3

Chair:
Georg Sigl, TUM, DE, Contact Georg Sigl

The end of Moore's law is renewing interest in entirely new computing and System-on-Chip (SoC) design approaches to meet the requirements of largely diverse applications. Cloud computing, the Internet of Things and artificial intelligence are pushing the development of a wide variety of complex SoCs that integrate heterogeneous IP hardware components from different providers. Especially when it comes to building complex, highly customized SoCs, it is desirable that SoC designers can select verified, open source hardware blocks in the same manner that software developers do today. Open source hardware opens a path towards an ultra-fast design cycle for complex and highly customized SoCs. RISC-V is a free and open instruction set architecture (ISA) that enables a new era of processor innovation. RISC-V has captured the attention of research and industrial communities; however, security is still a main concern in this architecture. RISC-V allows the development of open source hardware with hardware security extensions and secure coprocessors that can be checked by many users at the source level. This could be a further driver to create open source secure RISC-V implementations. Such implementations must resist microarchitectural attacks such as Meltdown or Spectre. Currently, researchers often try to fix the microarchitectural security problems of closed source hardware in software, even though their origin is in hardware. Open source hardware allows countermeasures to be developed openly by a wide research community in hardware and software together, yielding much more secure and performant solutions. On the other hand, insecure RISC-V implementations pose a major threat to our systems. To avoid RISC-V becoming a security RISK-V, more research and development in both architectural and microarchitectural security solutions is required. This special session focuses on the chances and risks offered by open source hardware based on RISC-V.
A trend in modern SoCs is the integration of a dedicated security module with its own CPU that is hardened against attacks. RISC-V processors hardened against hardware attacks could be an ideal open source secure element that is easily integrated into SoCs, offering a transparent open source trust anchor. The first talk gives an example of how to secure RISC-V processors against hardware attacks. RISC-V currently lacks security features provided in standard processors, such as trusted execution environments or enclaves. While all known approaches in this area have weaknesses, the open source hardware project offers new chances to improve enclave security with new concepts; this area is covered by the second presentation. Security modules usually need accelerators for standardized cryptographic operations, and how to integrate security coprocessors into a RISC-V system is covered by the third talk. The last presentation gives an application example in which RISC-V based processors are integrated into an SoC, both as a multicore system for data processing and as an additional hardware security module. The session thus consists of four presentations, ranging over isolated on- or off-chip secure elements based on RISC-V, open-source projects for building trusted execution environments (TEEs) with secure hardware enclaves on RISC-V, the design of side-channel- and fault-attack-resistant crypto accelerators based on RISC-V, and an application example using a RISC-V based SoC.

Time Label Presentation Title
Authors
08:30 9.3.1 PROTECTING RISC-V PROCESSORS AGAINST PHYSICAL SIDE CHANNEL ATTACKS
Speaker:
Stefan Mangard, Graz University of Technology, AT
Authors:
Thomas Unterluggauer, Robert Schilling, Mario Werner and Stefan Mangard, Graz University of Technology, AT
Abstract
RISC-V is an instruction-set architecture suitable for a wide variety of applications, which ranges from simple microcontrollers to high-performance CPUs. As an increasing number of commercial vendors plans to adopt the architecture in their products, its security aspects are becoming a major concern. For microcontroller implementations of RISC-V, one of the main security risks are attackers with direct physical access to the microchip. These physical attackers can perform highly powerful attacks that span from memory probing over power analysis to fault injection and analysis. In this paper, we give an overview on the capabilities of attackers with direct physical device access, common threat models, and possible countermeasures. In addition, we discuss in more detail current approaches to secure RISC-V processors against fault injection attacks on the microchip itself. First, we show how to protect the control flow against fault attacks by using an encrypted instruction stream and decrypting it in a newly added pipeline stage between the processor's fetch and decode unit. Second, we show how to protect conditional branches against fault injection by adding redundancy to the comparison operation and entangling the comparison result with the encrypted instruction stream. Finally, we discuss an approach to protect the address bus to the memory against tampering.
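The branch-protection idea, adding redundancy to the comparison so that an injected fault is noticed, can be sketched in software. The complement-and-recompare encoding and the `fault` parameter below are illustrative assumptions, not the paper's exact scheme (which further entangles the comparison result with the encrypted instruction stream):

```python
MASK = 0xFF  # assume an 8-bit datapath for the sketch

def protected_equal(a, b, fault=0):
    """Branch condition evaluated twice: once on the plain operands and once
    on bitwise-complemented redundant copies. `fault` is a hypothetical bit
    mask modelling an injected fault on one redundant copy; corrupting a
    single copy makes the two results disagree and raises an alarm."""
    a_red, b_red = ~a & MASK, ~b & MASK   # redundant complemented operands
    a_red ^= fault                        # model a fault hitting one copy
    r1 = (a == b)
    r2 = (a_red == b_red)
    if r1 != r2:
        raise RuntimeError("fault detected in branch comparison")
    return r1

assert protected_equal(5, 5) is True   # fault-free: behaves like ==
assert protected_equal(5, 6) is False
```

A fault that hits both copies identically would escape this simple check, which is one reason the paper ties the redundancy into the encrypted instruction stream rather than relying on duplication alone.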

Download Paper (PDF; Only available from the DATE venue WiFi)
09:00 9.3.2 SANCTORUM: A LIGHTWEIGHT SECURITY MONITOR FOR SECURE ENCLAVES
Speaker:
Ilia Lebedev, MIT, US
Authors:
Ilia Lebedev1, Kyle Hogan1, Jules Drean1, David Kohlbrenner2, Dayeol Lee2, Krste Asanović2, Dawn Song2 and Srinivas Devadas1
1MIT, US; 2UC Berkeley, US
Abstract
Recent widespread interest in trusted execution environments (TEEs) has given rise to a rich ecosystem of hardware security design starts. Of the many interpretations of a TEE, enclaves have emerged as a particularly compelling primitive: strongly isolated user-mode processes in a largely untrusted software environment. While the threat models employed by various enclave systems differ, the high-level guarantees they offer are largely the same: attestation of an enclave's initial state, as well as a guarantee of enclave integrity and privacy in the presence of a modelled adversary. This work describes Sanctorum, a small software TCB of a generic enclave-capable system, which is sufficient to implement secure enclaves akin to the primitive offered by Intel's SGX. While enclaves may be implemented via unconditionally trusted hardware and microcode, as is the case in SGX, we employ a TCB consisting largely of privileged software, which is authenticated, and may be replaced or patched as needed. Sanctorum is the trusted system software employed by the Sanctum and Keystone enclave systems, and implements a formally verified specification for enclaves on an in-order multiprocessor system meeting baseline security requirements. Specifically, Sanctorum requires trustworthy hardware including a random number generator, a private cryptographic key pair derived via a secure bootstrapping protocol, and a robust isolation primitive to safeguard sensitive information. Sanctorum's threat model is informed by the threat model of the isolation primitive, and is suitable for adding enclaves to a variety of in-order processor systems.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:30 9.3.3 TOWARDS RELIABLE AND SECURE POST-QUANTUM CO-PROCESSOR BASED ON RISC-V
Speaker:
Johanna Sepulveda, TUM, DE
Authors:
Tim Fritzmann1, Uzair Sharif1, Daniel Mueller-Gritschneder1, Cezar Rodolfo Wedig Reinbrecht2, Ulf Schlichtmann1 and Johanna Sepulveda1
1TUM, DE; 2UFRGS, BR
Abstract
Increasingly complex and powerful Systems-on-Chips (SoCs), connected through a 5G network, form the basis of the Internet-of-Things (IoT). These technologies will drive the digitization in all domains, e.g. industry automation, automotive, avionics, and healthcare. A major requirement for all above domains is the long-term (10 to 30 years) secure communication between the SoCs and the cloud over public 5G networks. The foreseeable breakthrough of quantum computers represents a risk for all communication. In order to prepare for such an event, SoCs must integrate secure quantum-computer-resistant cryptography which is reliable and protected against SW and HW attacks. Empowering SoCs with such strong security poses a challenging problem due to limited resources, tight performance requirements and long-term life-cycles. While current works are focused on efficient implementations of post-quantum cryptography, implementation-security and reliability aspects for SoCs are still largely unexplored. To this end, we present two contributions. First, we discuss the challenges and opportunities of implementing reliable and secure post-quantum MPSoCs based on RISC-V architecture. Second, we introduce our RISC-V co-processor for post-quantum security, able to support different lattice-based algorithms. We show that our co-processor achieves reliability and security capabilities while presenting a good performance.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:45 9.3.4 A SECURITY ARCHITECTURE FOR RISC-V BASED IOT DEVICES
Speaker:
Matthias Hiller, Fraunhofer AISEC, DE
Authors:
Lukas Auer1, Christian Skubich2 and Matthias Hiller1
1Fraunhofer AISEC, DE; 2Fraunhofer IIS / EAS, DE
Abstract
New IoT applications demand more and more performance from embedded devices, while their deployment and operation pose strict power constraints. We present the security concept for a customizable Internet of Things (IoT) platform based on the RISC-V ISA and developed by several Fraunhofer Institutes. It integrates a range of peripherals with a scalable computing subsystem as a three-dimensional System-in-Package (3D-SiP). The security features aim for a medium security level and target the requirements of the IoT market. Our security architecture extends given implementations to enable secure deployment, operation, and update. Core security features are secure boot, an authenticated watchdog timer, and key management. The Universal Sensor Platform (USeP) SoC is developed for GLOBALFOUNDRIES' 22FDX technology and aims to provide a platform for Small and Medium-sized Enterprises (SMEs) that typically do not have access to advanced microelectronics and integration know-how, and are therefore limited to Commercial Off-The-Shelf (COTS) products.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00 End of session
Coffee Break in Exhibition Area





9.4 Where do NoC and Machine Learning meet?

Date: Thursday, March 28, 2019
Time: 08:30 - 10:00
Location / Room: Room 4

Chair:
Masoud Daneshtalab, Mälardalen University, SE, Contact Masoud Daneshtalab

Co-Chair:
Sébastien Le Beux, Lyon Institute of Nanotechnology, FR, Contact Sébastien Le Beux

NoC design is being enhanced with machine intelligence techniques to drive systems more efficiently. Denial-of-Service (DoS) is one attack that can affect a NoC, caused by a malicious intellectual property core flooding the network. In this session, a lightweight, real-time DoS attack detection mechanism is presented that detects attacks in a timely manner with minor area and power overhead. Finding a trade-off among error rate, packet retransmission, performance, and energy is very challenging; a proactive fault-tolerant mechanism that uses reinforcement learning (RL) to optimize energy efficiency and performance is proposed. A method that exploits the elasticity and noise tolerance of deep learning algorithms to circumvent the bottleneck of on-chip inter-core data movement and accelerate their execution is also discussed; it achieves better interconnect energy efficiency. Finally, exploiting the fact that the destination of some packets can be predicted ahead of time at the network interface, a highway from source to destination can be established by reserving virtual channels in advance. This mechanism, which reduces the target packets' transfer latency, is presented as well.

Time Label Presentation Title
Authors
08:30 9.4.1 REAL-TIME DETECTION AND LOCALIZATION OF DOS ATTACKS IN NOC BASED SOCS
Speaker:
Subodha Charles, University of Florida, US
Authors:
Subodha Charles, Yangdi Lyu and Prabhat Mishra, University of Florida, US
Abstract
Network-on-Chip (NoC) is widely employed by multi-core System-on-Chip (SoC) architectures to cater to their communication requirements. The increased usage of NoC and its distributed nature across the chip has made it a focal point of potential security attacks. Denial-of-Service (DoS) is one such attack that is caused by a malicious intellectual property (IP) core flooding the network with unnecessary packets causing significant performance degradation through NoC congestion. In this paper, we propose a lightweight and real-time DoS attack detection mechanism. Once a potential attack has been flagged, our approach is also capable of localizing the malicious IP using latency data gathered by NoC components. Experimental results demonstrate the effectiveness of our approach with timely attack detection and localization while incurring minor area and power overhead (less than 6% and 4%, respectively).
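A lightweight latency-based detector of this kind can be sketched as a per-router monitor with a bound learned from attack-free traffic; the threshold rule and margin below are our illustrative choices, not the paper's exact mechanism:

```python
class DoSDetector:
    """Per-router monitor: learns a latency bound from attack-free traffic
    and flags packets whose hop latency exceeds it (the margin is an
    assumed safety factor, not a value from the paper)."""

    def __init__(self, margin=1.5):
        self.margin = margin
        self.bound = float("inf")

    def train(self, attack_free_latencies):
        self.bound = max(attack_free_latencies) * self.margin

    def check(self, latency):
        return latency > self.bound   # True => raise a DoS alarm

det = DoSDetector()
det.train([4, 5, 6])          # normal-operation hop latencies (cycles)
assert det.check(8) is False  # within the learned bound (6 * 1.5 = 9)
assert det.check(20) is True  # congestion consistent with flooding
```

Localization, the second contribution of the paper, then follows from correlating which routers report violations, since the gathered latency data indicates the path the flooding traffic takes.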

Download Paper (PDF; Only available from the DATE venue WiFi)
09:00 9.4.2 HIGH-PERFORMANCE, ENERGY-EFFICIENT, FAULT-TOLERANT NETWORK-ON-CHIP DESIGN USING REINFORCEMENT LEARNING
Speaker:
Avinash Karanth, Ohio University, US
Authors:
Ke Wang1, Ahmed Louri1, Avinash Karanth2 and Razvan Bunescu2
1George Washington University, US; 2Ohio University, US
Abstract
Network-on-Chips (NoCs) are becoming the standard communication fabric for multi-core and system on a chip (SoC) architectures. As technology continues to scale, transistors and wires on the chip are becoming increasingly vulnerable to various fault mechanisms, especially timing errors, degrading energy efficiency and performance for NoCs. Typical techniques for handling timing errors are reactive in nature, responding to the faults after their occurrence. They rely on error detection/correction techniques which have resulted in excessive power consumption and degraded performance, since the error detection/correction hardware is constantly enabled. On the other hand, indiscriminately disabling error handling hardware can induce more errors and intrusive retransmission traffic. Therefore, the challenge is to balance the trade-offs among error rate, packet retransmission, performance, and energy. In this paper, we propose a proactive fault-tolerant mechanism to optimize energy efficiency and performance with reinforcement learning (RL). First, we propose a new proactive error handling technique comprised of a dynamic scheme for enabling per-router error detection/correction hardware and an effective retransmission mechanism. Second, we propose the use of RL to train the dynamic control policy with the goals of providing increased fault-tolerance, reduced power consumption and improved performance as compared to conventional techniques. Our evaluation indicates that, on average, end-to-end packet latency is lowered by 55%, energy efficiency is improved by 64%, and retransmission caused by faults is reduced by 48% over the reactive error correction techniques.
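The RL control loop can be sketched as tabular Q-learning over a coarse per-router error-rate state with enable/disable actions; the reward numbers below are invented stand-ins for the paper's power and retransmission costs:

```python
import random
from collections import defaultdict

ACTIONS = ("ecc_on", "ecc_off")

def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step for the per-router control policy."""
    best_next = max(Q[(s_next, an)] for an in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def reward(state, action):
    # Invented costs: constant ECC power draw vs. a retransmission
    # penalty that dominates only when the error rate is high.
    if action == "ecc_on":
        return -1.0
    return -0.2 if state == "low" else -5.0

random.seed(0)
Q = defaultdict(float)
for _ in range(500):
    s = random.choice(("low", "high"))
    a = random.choice(ACTIONS)
    q_update(Q, s, a, reward(s, a), s_next=s)  # toy: error level persists

policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in ("low", "high")}
# The learned policy enables error correction only when errors are frequent.
assert policy == {"low": "ecc_off", "high": "ecc_on"}
```

This captures the paper's proactive stance: the agent learns when keeping the error-handling hardware off is cheaper than the retransmissions it causes, instead of reacting to every fault with always-on correction.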

Download Paper (PDF; Only available from the DATE venue WiFi)
09:30 9.4.3 LEARN-TO-SCALE: PARALLELIZING DEEP LEARNING INFERENCE ON CHIP MULTIPROCESSOR ARCHITECTURE
Speaker:
Kaiwei Zou, Institute of Computing Technology, Chinese Academy of Sciences, CN
Authors:
Kaiwei Zou, Ying Wang, Huawei Li and Xiaowei Li, Institute of Computing Technology, Chinese Academy of Sciences, CN
Abstract
Accelerating deep neural networks on resource-constrained embedded devices is becoming increasingly important for real-time applications. However, in contrast to the intensive research works on specialized neural network inference architectures, there is a lack of study on the acceleration and parallelization of deep learning inference on embedded chip-multiprocessor architectures, which are favored by many real-time applications for superb energy-efficiency and scalability. In this work, we investigate the strategies of parallelizing single-pass deep neural network inference on embedded on-chip multi-core accelerators. These methods exploit the elasticity and noise-tolerance features of deep learning algorithms to circumvent the bottleneck of on-chip inter-core data moving and reduce the communication overhead aggravated as the core number scales up. The experimental results show that the communication-aware sparsified parallelization method improves the system performance by 1.6×-1.1× and achieves 4×-1.6× better interconnects energy efficiency for different neural networks.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:45 9.4.4 ADVANCE VIRTUAL CHANNEL RESERVATION
Speaker:
Boqian Wang, KTH Royal Institute of Technology, SE
Authors:
Boqian Wang1 and Zhonghai Lu2
1KTH Royal Institute of Technology, National University of Defense Technology, CN; 2KTH Royal Institute of Technology, CN
Abstract
We present a smart communication service called Advance Virtual Channel Reservation (AVCR) to provide a highway to packets, which can greatly reduce their contention delay in NoC. AVCR takes advantage of the fact that we can know or predict the destination of some packets ahead at the network interface (NI). Exploiting the time slack before a packet is ready, AVCR establishes an end-to-end highway from the source NI to the destination NI. This highway is built up by reserving virtual channel (VC) resources ahead and at the same time, offering priority service to those VCs in the router, which can therefore avoid highway packets' VC allocation and switch arbitration delay in NoC. Additionally, optimization schemes are developed to reduce VC overhead and increase highway utilization. We evaluate AVCR with cycle-accurate full-system simulations in GEM5 by using all benchmarks in PARSEC. Compared to the state-of-the-art mechanisms and the priority based mechanism, experimental results show that our mechanism can significantly reduce the target packets' transfer latency and effectively decrease the average region-of-interest (ROI) time by 22.4% (maximally by 29.4%) across PARSEC benchmarks.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00 End of session
Coffee Break in Exhibition Area





9.5 Attacking Memory and I/O Bottlenecks

Date: Thursday, March 28, 2019
Time: 08:30 - 10:00
Location / Room: Room 5

Chair:
Leonidas Kosmidis, Barcelona Supercomputing Center, ES, Contact Leonidas Kosmidis

Co-Chair:
Cristina Silvano, Politecnico di Milano, IT, Contact Cristina Silvano

Focusing on the memory hierarchy and on memory and I/O bottlenecks, this session presents new techniques to better exploit the GPU cache memory using new adaptive compression techniques, to enhance GPU cache utilization by exploiting non-frequently accessed blocks, and to build a new smart SSD-based I/O caching system. The papers in this session showcase new opportunities for alternative cache solutions.

Time Label Presentation Title
Authors
08:30 9.5.1 SLC: MEMORY ACCESS GRANULARITY AWARE SELECTIVE LOSSY COMPRESSION FOR GPUS
Speaker:
Sohan Lal, Technical University of Berlin, DE
Authors:
Sohan Lal, Jan Lucas and Ben Juurlink, Technical University of Berlin, DE
Abstract
Memory compression is a promising approach for reducing memory bandwidth requirements and increasing performance, however, memory compression techniques often result in a low effective compression ratio due to large memory access granularity (MAG) exhibited by GPUs. Our analysis of the distribution of compressed blocks shows that a significant percentage of blocks are compressed to a size that is only a few bytes above a multiple of MAG, but a whole burst is fetched from memory. These few extra bytes significantly reduce the compression ratio and the performance gain that otherwise could result from a higher raw compression ratio. To increase the effective compression ratio, we propose a novel MAG aware Selective Lossy Compression (SLC) technique for GPUs. The key idea of SLC is that when lossless compression yields a compressed size with few bytes above a multiple of MAG, we approximate these extra bytes such that the compressed size is a multiple of MAG. This way, SLC mostly retains the quality of a lossless compression and occasionally trades small accuracy for higher performance. We show a speedup of up to 35% normalized to a state-of-the-art lossless compression technique with a low loss in accuracy. Furthermore, average energy consumption and energy-delay-product are reduced by 8.3% and 17.5%, respectively.
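The central trade-off, spending a few bytes of accuracy to avoid fetching a whole extra burst, reduces to a size-rounding rule. A sketch with assumed MAG and slack values (the paper does not state these constants in the abstract):

```python
MAG = 32        # assumed memory access granularity (bytes per burst)
SLACK = 4       # assumed maximum number of bytes worth approximating away

def effective_size(lossless_size):
    """MAG-aware selective lossy step: if a lossless-compressed block
    overshoots a multiple of MAG by at most SLACK bytes, drop those tail
    bytes (a small accuracy loss) so no extra burst is fetched; otherwise
    keep the block lossless, padded to the next burst boundary."""
    over = lossless_size % MAG
    if 0 < over <= SLACK:
        return lossless_size - over                     # lossy truncation
    return ((lossless_size + MAG - 1) // MAG) * MAG     # stays lossless

assert effective_size(66) == 64   # 2 bytes over: approximate, save a burst
assert effective_size(90) == 96   # 26 bytes over: not worth the quality loss
assert effective_size(64) == 64   # already aligned: nothing to do
```

The first case is exactly the population the authors identified: blocks compressed to just above a burst boundary, where a tiny approximation converts a wasted burst into bandwidth savings.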

Download Paper (PDF; Only available from the DATE venue WiFi)
09:00 9.5.2 LOSCACHE: LEVERAGING LOCALITY SIMILARITY TO BUILD ENERGY-EFFICIENT GPU L2 CACHE
Speaker:
Jingweijia Tan, Jilin University, CN
Authors:
Jingweijia Tan1, Kaige Yan2, Shuaiwen Leon Song3 and Xin Fu4
1Jilin University, CN; 2College of Communication Engineering, Jilin University, CN; 3Pacific Northwest National Laboratory, US; 4University of Houston, US
Abstract
This paper presents a novel energy-efficient cache design for massively parallel, throughput-oriented architectures like GPUs. Unlike L1 data cache on modern GPUs, L2 cache shared by all the streaming multiprocessors is not the primary performance bottleneck but it does consume a large amount of chip energy. We observe that L2 cache is significantly underutilized by spending 95.6% of the time storing useless data. If such "dead time" on L2 is identified and reduced, L2's energy efficiency can be drastically improved. Fortunately, we discover that the SIMT programming model of GPUs provides a unique feature among threads: instruction-level data locality similarity, which can be used to accurately predict the data re-reference counts at L2 cache block level. We propose a simple design that leverages this Locality Similarity to build an energy-efficient GPU L2 Cache, named LoSCache. Specifically, LoSCache uses the data locality information from a small group of CTAs to dynamically predict the L2-level data re-reference counts of the remaining CTAs. After that, specific L2 cache lines can be powered off if they are predicted to be "dead" after certain accesses. Experimental results on a wide range of applications demonstrate that our proposed design can significantly reduce the L2 cache energy by an average of 64% with only 0.5% performance loss.
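The "dead time" mechanism can be sketched as a cache line that powers itself off once a predicted re-reference count is exhausted. In LoSCache the prediction comes from the locality profile of a small sample of CTAs; in this illustrative simplification it is simply a constructor argument:

```python
class LoSLine:
    """L2 cache line that powers off after a predicted number of
    re-references, cutting leakage once no further reuse is expected."""

    def __init__(self, predicted_rerefs):
        self.remaining = predicted_rerefs
        self.powered = True

    def access(self):
        if not self.powered:
            return "miss"            # line was predicted dead and turned off
        self.remaining -= 1
        if self.remaining <= 0:
            self.powered = False     # predicted dead: stop burning leakage
        return "hit"

line = LoSLine(predicted_rerefs=2)
assert [line.access(), line.access(), line.access()] == ["hit", "hit", "miss"]
```

An under-prediction costs a refetch, as in the third access above, which is the source of the small (0.5%) performance loss the abstract reports against the 64% energy saving.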

Download Paper (PDF; Only available from the DATE venue WiFi)
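The prediction step described above can be modeled as a toy sketch. Everything here is an assumed simplification for illustration: a few sampler CTAs record per-instruction (PC) re-reference counts at L2, and the remaining CTAs reuse those counts to declare a line "dead" (a power-off candidate) once its predicted accesses are used up.

```python
from collections import defaultdict

class LoSCacheSketch:
    """Toy model of LoSCache's re-reference prediction (assumed simplification).

    SIMT threads share instruction-level locality, so counts observed by
    a small group of sampler CTAs predict the behavior of the rest.
    """
    def __init__(self):
        self.observed = defaultdict(int)   # PC -> re-references seen by samplers
        self.remaining = {}                # line address -> predicted accesses left

    def sample(self, pc):
        """A sampler CTA re-references data loaded by this PC at L2."""
        self.observed[pc] += 1

    def insert(self, line, pc):
        """A non-sampler CTA fills a line; predict its remaining accesses."""
        self.remaining[line] = self.observed.get(pc, 1)

    def access(self, line):
        """Count one access; return True if the line is now predicted dead."""
        self.remaining[line] -= 1
        return self.remaining[line] <= 0
```

A line predicted dead would be powered off instead of lingering in the cache, which is where the energy saving comes from.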
09:309.5.3LBICA: A LOAD BALANCER FOR I/O CACHE ARCHITECTURES
Speaker:
Reza Salkhordeh, Sharif University of Technology, IR
Authors:
Saba Ahmadian, Reza Salkhordeh and Hossein Asadi, Sharif University of Technology, IR
Abstract
In recent years, enterprise Solid-State Drives (SSDs) have been used in the caching layer of high-performance servers to close the growing performance gap between processing units and the storage subsystem. SSD-based I/O caching is typically not effective in workloads with burst accesses, in which the caching layer itself becomes the performance bottleneck because of the large number of accesses. Existing I/O cache architectures mainly focus on maximizing the cache hit ratio while neglecting the average queue time of accesses. Previous studies suggested bypassing the cache when burst accesses are identified. These schemes, however, are not applicable to a general cache configuration and also result in significant performance degradation on burst accesses. In this paper, we propose a novel I/O cache load balancing scheme (LBICA) with adaptive write policy management to prevent the I/O cache from becoming the performance bottleneck under burst accesses. Unlike previous schemes, which disable the I/O cache or bypass requests to the disk subsystem during bursts, our proposal selectively reduces the number of waiting accesses in the SSD queue and balances the load between the I/O cache and the disk subsystem while providing maximum performance. The proposed scheme characterizes the workload based on the type of in-queue requests and assigns an effective cache write policy. We aim to bypass the accesses which 1) are served faster by the disk subsystem or 2) cannot be merged with other accesses in the I/O cache queue. In doing so, the selected requests are served by the disk layer, preventing the I/O cache from being overloaded. Our evaluations on a physical system show that LBICA reduces the load on the I/O cache by 48% and improves the performance of burst workloads by 30% compared to the latest state-of-the-art load balancing scheme.

Download Paper (PDF; Only available from the DATE venue WiFi)
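The two bypass criteria named in the abstract can be sketched as a routing function. The field names (`lba`, `size`), the linear queue-time model, and the merge test below are illustrative assumptions, not details from the paper:

```python
def lbica_route(req, ssd_queue, ssd_service_time, disk_service_time, burst):
    """Route one request to the SSD cache ("ssd") or the disk ("disk").

    During a burst, a request is bypassed to the disk subsystem when
    (1) the queued SSD cache would respond slower than the disk, or
    (2) it cannot be merged with any access already waiting in the
    SSD queue. Under normal load every access goes to the cache.
    """
    if not burst:
        return "ssd"                       # normal load: cache everything
    # Estimated SSD response time grows with the number of waiting requests.
    ssd_estimate = (len(ssd_queue) + 1) * ssd_service_time
    # A request is mergeable if it lands near an in-queue access's range.
    mergeable = any(abs(q["lba"] - req["lba"]) <= q["size"] for q in ssd_queue)
    if ssd_estimate > disk_service_time or not mergeable:
        return "disk"                      # relieve the overloaded cache
    return "ssd"
```

The point of the sketch is the asymmetry: the cache is only bypassed when waiting in its queue buys nothing, rather than being disabled wholesale during a burst.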
10:00IP4-14, 356CARS: A MULTI-LAYER CONFLICT-AWARE REQUEST SCHEDULER FOR NVME SSDS
Speaker:
Tianming Yang, Huanghuai University, CN
Authors:
Tianming Yang1, Ping Huang2, Weiying Zhang3, Haitao Wu1 and Longxin Lin4
1Huanghuai University, CN; 2Temple University, US; 3Northeastern University, CN; 4Jinan University, CN
Abstract
NVMe SSDs are nowadays widely deployed in various computing platforms due to their high performance and low power consumption, especially in data centers to support modern latency-sensitive applications. NVMe SSDs improve on SATA- and SAS-interfaced SSDs by providing a large number of device I/O queues at the host side, and applications can directly manage the queues to concurrently issue requests to the device. However, the currently deployed request scheduling approach is oblivious to the states of the various internal device components and may thus lead to suboptimal decisions due to resource contention at different layers inside the SSD device. In this work, we propose a Conflict-Aware Request Scheduling policy named CARS for NVMe SSDs to maximally leverage the rich parallelism available in modern NVMe SSDs. The central idea is to check the possible conflicts a fetched request might be associated with before dispatching that request. If there is a conflict, the scheduler refrains from issuing the request and moves on to check a request in the next submission queue. In doing so, our scheduler can evenly distribute requests among the parallel idle components in the flash chips, improving performance. Our evaluations show that our scheduler can reduce the slowdown metric by up to 46% relative to the de facto round-robin scheduling policy for a variety of patterned workloads.

Download Paper (PDF; Only available from the DATE venue WiFi)
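One scheduling step of such a conflict-aware fetcher can be sketched as follows. The queue representation and the `chip_of` mapping are hypothetical helpers introduced for the example; the real scheduler tracks richer internal state (channels, dies, planes):

```python
def cars_fetch(queues, busy_chips, chip_of):
    """Dispatch one request from a set of NVMe submission queues (sketch).

    `queues` is a list of submission queues (lists of request ids),
    `busy_chips` the set of flash chips currently servicing a request,
    and `chip_of(req)` maps a request to its target chip. The scheduler
    scans the queues round-robin and dispatches the first request whose
    chip is idle; conflicting requests stay queued, so idle chips are
    kept busy instead of requests piling onto a contended chip.
    """
    for q in queues:
        if q and chip_of(q[0]) not in busy_chips:
            req = q.pop(0)
            busy_chips.add(chip_of(req))
            return req
    return None  # every head-of-queue request conflicts; retry later
```

This is exactly the contrast with plain round-robin, which would dispatch the head of the next queue regardless of whether its target chip is already busy.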
10:01IP4-15, 878QUEUE BASED MEMORY MANAGEMENT UNIT FOR HETEROGENEOUS MPSOCS
Speaker:
Robert Wittig, Technische Universität Dresden, DE
Authors:
Robert Wittig, Mattis Hasler, Emil Matus and Gerhard Fettweis, Technische Universität Dresden, DE
Abstract
Sharing tightly coupled memory in a multi-processor system-on-chip is a promising approach to improve the programming flexibility as well as to ease the constraints imposed by area and power. However, it poses a challenge in terms of access latency. In this paper, we present a queue based memory management unit which combines the low latency access of shared tightly coupled memory with the flexibility of a traditional memory management unit. Our passive conflict detection approach significantly reduces the critical path compared to previously proposed methods while preserving the flexibility associated with dynamic memory allocation and heterogeneous data widths.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Breaks (Lunch Area)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the Lunch Area to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 26, 2019

Wednesday, March 27, 2019

Thursday, March 28, 2019


9.6 Reliability of highly-parallel architectures: an industrial perspective

Date: Thursday, March 28, 2019
Time: 08:30 - 10:00
Location / Room: Room 6

Chair:
Doris Keitel-Schulz, Infineon Technologies, DE, Contact Doris Keitel-Schulz

Co-Chair:
Fabien Clermidy, CEA, FR, Contact Fabien Clermidy

This session addresses the issues of proving, verifying and enhancing highly-parallel designs in four different application domains, ranging from Solid-State Drives to 5G.

TimeLabelPresentation Title
Authors
08:309.6.1AURIX TC277 MULTICORE CONTENTION MODEL INTEGRATION FOR AUTOMOTIVE APPLICATIONS
Speaker:
Jaume Abella, Barcelona Supercomputing Center (BSC), ES
Authors:
Enrico Mezzetti1, Luca Barbina2, Jaume Abella1, Stefania Botta2 and Francisco Cazorla1
1Barcelona Supercomputing Center, ES; 2Magneti Marelli S.p.A., IT
Abstract
The embedded systems industry needs reliable and tight worst-case execution time (WCET) estimates for critical applications running on multicores as a prerequisite to their adoption. While industry already uses reliable tools for single-core WCET estimation and several multicore contention models (MCMs) have been proposed, their combination has not yet been shown to be fully compatible with automotive industrial practice. This paper reduces this gap by presenting a framework for the integration of MCMs into industrial WCET estimation practice. We illustrate this integration for a Magneti Marelli powertrain control unit on an Infineon AURIX TC277 multicore platform.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:009.6.2SEAMLESS SOC VERIFICATION USING VIRTUAL PLATFORMS: AN INDUSTRIAL CASE STUDY
Speaker:
Kyungsu Kang, Samsung Electronics, KR
Authors:
Kyungsu Kang, Sangho Park, Byeongwook Bae, Jungyun Choi, SungGil Lee, Byunghoon Lee and Jong-Bae Lee, Samsung, KR
Abstract
As SoC (System-on-Chip) complexity continues to increase, function and performance verification is required in the middle of the design process (before tape-out) to reduce possible risks ranging from over-design to non-compliance with the design specifications. In this paper, we propose a seamless SoC verification methodology. The proposed methodology exploits modern virtual platform (VP) technology, which can combine high-level C++ firmware, timing-accurate SystemC models, and RTL (register-transfer level) designs. Thus, full-chip level verification can be done at any design stage in the whole development process. With experimental results, this paper shows the benefits of, and lessons learned from, using VPs.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:309.6.3MULTICORE EARLY DESIGN STAGE GUARANTEED PERFORMANCE ESTIMATES FOR THE SPACE DOMAIN
Speaker:
Mikel Fernandez, Barcelona Supercomputing Center, ES
Authors:
Mikel Fernandez, Gabriel Fernandez, Jaume Abella and Francisco Cazorla, Barcelona Supercomputing Center, ES
Abstract
The ability to produce early guaranteed-performance (worst-case execution time) estimates for multicores, i.e., before software from different providers gets integrated onto the same critical system, is pivotal. This helps reduce late-detected, costly-to-handle timing violations. An existing methodology creates 'copy' (surrogate) applications from each target application's execution in isolation. Surrogate applications can be used to upper-bound multicore contention delay, and hence WCET estimates in multicores. However, this methodology has only been shown to work in a simulation environment. In this paper we show the work we have done to adapt this technology to a real multicore processor for the space domain.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:459.6.4POLAR CODE DECODER FRAMEWORK
Speaker:
Timo Lehnigk-Emden, Creonic GmbH, DE
Authors:
Timo Lehnigk-Emden1, Matthias Alles1, Claus Kestel2 and Norbert Wehn2
1Creonic GmbH, DE; 2University of Kaiserslautern, DE
Abstract
Polar codes have gained large interest in recent years since they are the first channel codes proven to achieve channel capacity. Due to this property, Polar codes were recently adopted for the 5G standard. We present an industrial framework for the generation of Polar code decoders for highest data throughput. The framework automatically generates VHDL models ready for synthesis, placement and routing, and corresponding simulation models to assess the communications performance. This framework enables Polar code decoder IP providers to give fast feedback to customers on communications and implementation performance. We demonstrate that this framework outperforms existing manually optimized decoders, especially in terms of energy efficiency.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00End of session
Coffee Break in Exhibition Area




9.7 Runtime Predictability

Date: Thursday, March 28, 2019
Time: 08:30 - 10:00
Location / Room: Room 7

Chair:
Rolf Ernst, TU Braunschweig, DE, Contact Rolf Ernst

Co-Chair:
Gerhard Fohler, University of Kaiserslautern, DE, Contact Gerhard Fohler

This session includes papers that address runtime predictability at the level of hardware and software. Potential applications are in the area of autonomous systems.

TimeLabelPresentation Title
Authors
08:309.7.1INCREASING ACCURACY OF TIMING MODELS: FROM CPA TO CPA+
Speaker:
Leonie Köhler, TU Braunschweig, DE
Authors:
Leonie Köhler1, Borislav Nikolic1, Marc Boyer2 and Rolf Ernst1
1Technische Universität Braunschweig, DE; 2ONERA, FR
Abstract
Formal analysis methods for embedded systems provide safe, but unfortunately often pessimistic, bounds on response times. An important source of pessimism is the common approach of characterizing service requests either by the amount of data or by the number of events to be processed. Several works, e.g. [1]-[4], have demonstrated that a dual model - which includes information on both data and events - is more accurate, especially for more complex scheduling problems. In this paper, we enrich Compositional Performance Analysis (CPA) with a new component interface which, as we show, is consistent with the generic dual model proposed in [3]. Furthermore, we discuss how the composition of components should be realized and how the new information should be integrated into the analysis technique. The improved CPA is called CPA+, and we identify different types of scenarios where CPA+ is particularly beneficial.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:009.7.2SCRATCHPAD MEMORIES WITH OWNERSHIP
Speaker:
Martin Schoeberl, Technical University of Denmark, DK
Authors:
Martin Schoeberl, Tórur Biskopstø Strøm, Oktay Baris and Jens Sparsø, Technical University of Denmark, DK
Abstract
A multicore processor for real-time systems needs a time-predictable way to communicate data between threads running on different cores. Standard multicore processors support data sharing via shared main memory, backed by caches and a cache coherence protocol. This sharing solution is hardly time-predictable, nor does it scale to more than a few cores. This paper presents a shared scratchpad memory (SPM) for time-predictable communication between cores. The base architecture uses time-division multiplexing for arbitration of access to the shared SPM. This ensures that the timing of programs executing on one core is completely independent of programs executing on other cores. We extend this architecture with the notion of ownership. A core can own the SPM; having exclusive access to the SPM reduces the access time to a single clock cycle. The ownership of the SPM can then be transferred to a different core, implementing low-latency communication of bulk data. As an extension, we propose to organize this memory as a pool of SPMs that can be owned by different cores and transferred as needed. We evaluate the proposed architecture within the T-CREST multicore architecture.

Download Paper (PDF; Only available from the DATE venue WiFi)
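The two access regimes described above (TDM arbitration versus single-cycle owned access) can be modeled in a few lines. The latency formula and the class interface are illustrative assumptions made for this sketch, not the paper's hardware description:

```python
class OwnedSPM:
    """Sketch of a shared scratchpad memory with ownership (assumed model).

    Without an owner, access is arbitrated by time-division multiplexing:
    core c may access only in its TDM slot, so the worst-case wait is
    bounded and independent of what other cores do. When a core owns the
    SPM it accesses it in a single cycle; transferring ownership hands
    bulk data to another core without copying it.
    """
    def __init__(self, n_cores):
        self.n = n_cores
        self.owner = None                  # None = TDM-arbitrated mode

    def access_latency(self, core, t):
        """Cycles for `core` to access the SPM starting at cycle `t`."""
        if self.owner == core:
            return 1                       # exclusive owner: single cycle
        if self.owner is not None:
            raise PermissionError("SPM owned by another core")
        # TDM: wait until this core's next slot, then one access cycle.
        wait = (core - t) % self.n
        return wait + 1

    def transfer(self, new_owner):
        """Hand the SPM (and the bulk data in it) to another core."""
        self.owner = new_owner
```

The model makes the trade-off visible: TDM gives every core a bounded, interference-free latency, while ownership trades sharing for a guaranteed single-cycle access.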
09:309.7.3A CONTAINER-BASED DOS ATTACK-RESILIENT CONTROL FRAMEWORK FOR REAL-TIME UAV SYSTEMS
Speaker:
Jiyang Chen, University of Illinois at Urbana Champaign, US
Authors:
Jiyang Chen1, Zhiwei Feng2, Jen-Yang Wen1, Bo Liu3 and Lui Sha1
1University of Illinois at Urbana-Champaign, US; 2Northeastern University, CN; 3NVIDIA, US
Abstract
Unmanned aerial vehicles (UAVs) are expanding fast, and the expectations for their capabilities keep growing. Defending against malicious attacks on real-time UAVs has become one of the challenges that urgently needs to be solved. Among all types of attack, a denial-of-service (DoS) attack can exhaust system resources and cause important tasks to miss their deadlines. A DoS attack is also easy to mount but hard to counter. In this paper, we present a software framework that offers DoS attack-resilient control for real-time UAV systems using containers: ContainerDrone. The framework provides defense mechanisms for three critical system resources: CPU, memory, and the communication channel. We restrict an attacker's access to the CPU core set and utilization. Memory bandwidth throttling limits an attacker's memory usage. By simulating sensors and drivers in the container, a security monitor constantly checks for DoS attacks against the communication channel. Upon detection of a security rule violation, the framework switches to the safety controller to prevent potential damage. We implemented a prototype drone with commercial off-the-shelf (COTS) hardware and open-source software. Our experimental results demonstrate the effectiveness of the proposed framework against various types of DoS attacks.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:459.7.4AN EXACT SCHEDULABILITY TEST FOR NON-PREEMPTIVE SELF-SUSPENDING REAL-TIME TASKS
Speaker:
Mitra Nasri, Delft University of Technology, NL
Authors:
Beyazit Yalcinkaya1, Mitra Nasri1 and Björn Brandenburg2
1Max Planck Institute for Software Systems, DE; 2MPI-SWS, DE
Abstract
Exact schedulability analysis of limited-preemptive (or non-preemptive) real-time workloads with variable execution costs and release jitter is a notoriously difficult challenge due to the scheduling anomalies inherent in non-preemptive execution. Furthermore, the presence of self-suspending tasks is well-understood to add tremendous complications to an already difficult problem. By mapping the schedulability problem to the reachability problem in timed automata (TA), this paper provides the first exact schedulability test for this challenging model. Specifically, using TA extensions available in UPPAAL, this paper presents an exact schedulability test for sets of periodic and sporadic self-suspending tasks with fixed preemption points that are scheduled upon a multiprocessor under a global fixed-priority scheduling policy. To the best of our knowledge, this is the first exact schedulability test for non- and limited-preemptive self-suspending tasks (for both uniprocessor and multiprocessor systems), and thus also the first exact schedulability test for the special case of global non-preemptive fixed-priority scheduling (for either periodic or sporadic tasks). Additionally, the paper highlights some subtle pitfalls and limitations in existing TA-based schedulability tests for non-preemptive workloads.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00End of session
Coffee Break in Exhibition Area





9.8 Special Session: IBM's Qiskit Tool Chain: Developing for and Working with Real Quantum Computers

Date: Thursday, March 28, 2019
Time: 08:30 - 10:00
Location / Room: Exh. Theatre

Organiser:
Robert Wille, Johannes Kepler University Linz, AT, Contact Robert Wille

Chair:
Robert Wille, Johannes Kepler University Linz, AT, Contact Robert Wille

Quantum computers promise substantial speedups over conventional computers for many practically relevant applications such as quantum chemistry, optimization, machine learning, cryptography, quantum simulation, systems of linear equations, and many more. While considered "dreams of the future" for a long time, recent years have shown impressive accomplishments, leading to the first real quantum computers that can be utilized by everyone. A leading force in this development is IBM Research, which launched the IBM Q Experience - the first industrial initiative to build universal quantum computers and make them accessible to a broad audience through cloud access. In the meantime, a worldwide network of Fortune 500 companies, academic institutions, and startups works within this initiative and collaborates to advance quantum computing. This special session aims to foster this potential by introducing Qiskit to the EDA community and showcasing success stories on how to develop new methods for the tool as well as how to work with it - eventually allowing algorithms to be executed efficiently and robustly on a real quantum computer. To this end, the special session covers all sides: IBM's own perspective on Qiskit and the IBM Q Experience; the developer view on how to develop new methods for Qiskit (sometimes outperforming IBM's own solutions using EDA expertise); and the user view on how Qiskit and its extensions can be utilized to actually work with quantum computers.

TimeLabelPresentation Title
Authors
08:309.8.1QISKIT: AN OVERVIEW OF THE OPEN-SOURCE FRAMEWORK FOR QUANTUM COMPUTING
Author:
Yehuda Naveh, IBM Research, US
Abstract
The first talk will provide an overview of Qiskit, the leading software library and framework for quantum computing. Qiskit includes libraries to support front-end description of quantum algorithms and applications; back-ends that include interfaces to actual quantum computers (IBM Q) and simulators for running quantum programs on classical computers, with and without simulation of quantum noise; and all other software modules and tools needed in between. Qiskit forms a lively center of interest for developers and users. Originated and led by IBM, it now has a highly collaborative, worldwide team of enthusiasts contributing code and engaging in discussions on any and all aspects of quantum computing.
09:009.8.2DEVELOPING FOR QISKIT: INTRODUCING EDA METHODS INTO THE TOOLKIT
Author:
Robert Wille, Johannes Kepler University Linz, AT
Abstract
Although Qiskit is a powerful tool, it still offers much room for improvement. In fact, many problems addressed by Qiskit are solved in a rather straightforward fashion thus far. Here, the EDA community can contribute its expertise. The talk will discuss how developers can extend Qiskit, e.g., with clever data structures and sophisticated search methods that are taken for granted in the EDA community but have not yet been fully exploited within Qiskit. The talk shows how the power of EDA can be applied to clearly outperform IBM's own solutions. As one example, it is demonstrated how EDA helped win the Qiskit Developer Challenge organized by IBM in summer 2018.
09:309.8.3USING QISKIT: NISQ-ERA COMPILATION FOR QISKIT
Author:
Rod Van Meter, Keio University, JP
Abstract
The talk will give an overview of user activities with a particular focus on compilation for Qiskit. This task has more in common with place-and-route design for computer hardware than with compiling programs into instructions for classical computers. In the era of so-called Noisy Intermediate-Scale Quantum (NISQ) technology, heterogeneity has to be taken into account, i.e., individual qubits and the couplers between them vary dramatically in fidelity or quality. The talk will cover software to optimize the placement of program variables into qubit storage locations on the processor, and to maximize the fidelity of operations when moving the variables around inside the processor. Unfortunately, the problem is inherently NP-hard, such that compiling even for 20 qubits is challenging. Hence, heuristics such as beam search are applied to the problem.
10:00End of session
Coffee Break in Exhibition Area





IP4 Interactive Presentations

Date: Thursday, March 28, 2019
Time: 10:00 - 10:30
Location / Room: Poster Area

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

LabelPresentation Title
Authors
IP4-1AN EFFICIENT MAPPING APPROACH TO LARGE-SCALE DNNS ON MULTI-FPGA ARCHITECTURES
Speaker:
Jiaxi Zhang, Peking University, CN
Authors:
Wentai Zhang1, Jiaxi Zhang1, Minghua Shen2, Guojie Luo1 and Nong Xiao3
1Peking University, CN; 2Sun Yat-sen University, CN; 3Sun Yat-Sen University, CN
Abstract
FPGAs are very attractive for accelerating deep neural networks (DNNs). While a single FPGA can provide good performance for small-scale DNNs, support for large-scale DNNs is limited due to their higher resource demand. In this paper, we propose an efficient mapping approach for accelerating large-scale DNNs on asymmetric multi-FPGA architectures. In this approach, the neural network mapping is formulated as a resource allocation problem. We design a dynamic programming-based partitioning to solve this problem optimally. Experimental results using the large-scale ResNet-152 demonstrate that our approach deploys sixteen FPGAs to provide an advantage of 16.4x GOPS over the state-of-the-art work.

Download Paper (PDF; Only available from the DATE venue WiFi)
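Dynamic programming is a natural fit for such partitioning because a DNN is a chain of layers: each FPGA gets a contiguous group. The sketch below uses a deliberately simplified cost model (minimize the slowest pipeline stage given per-layer latencies); the paper's actual formulation over asymmetric FPGA resources is richer than this:

```python
import math

def partition_layers(latency, n_fpgas):
    """Split a chain of DNN layers into `n_fpgas` contiguous groups,
    minimizing the latency of the slowest group (assumed objective).

    dp[k][i] = best achievable bottleneck when the first i layers are
    mapped onto k FPGAs. Each new group [j, i) adds its summed latency
    as a candidate bottleneck. Returns the optimal bottleneck latency.
    """
    n = len(latency)
    prefix = [0.0]
    for t in latency:                      # prefix sums -> O(1) group cost
        prefix.append(prefix[-1] + t)
    dp = [[math.inf] * (n + 1) for _ in range(n_fpgas + 1)]
    dp[0][0] = 0.0
    for k in range(1, n_fpgas + 1):
        for i in range(1, n + 1):
            for j in range(i):             # last group covers layers [j, i)
                stage = prefix[i] - prefix[j]
                dp[k][i] = min(dp[k][i], max(dp[k - 1][j], stage))
    return dp[n_fpgas][n]
```

For layer latencies [4, 2, 3, 1] on two FPGAs, the best split is [4] | [2, 3, 1] or [4, 2] | [3, 1], both with bottleneck 6.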
IP4-2A WRITE-EFFICIENT CACHE ALGORITHM BASED ON MACROSCOPIC TREND FOR NVM-BASED READ CACHE
Speaker:
Ning Bao, Renmin University of China, CN
Authors:
Ning Bao1, Yunpeng Chai1 and Xiao Qin2
1Renmin University of China, CN; 2Auburn University, US
Abstract
Compared with traditional storage technologies, non-volatile memory (NVM) techniques offer excellent I/O performance but suffer from high costs and either limited write endurance (e.g., NAND and PCM) or high write energy consumption (e.g., STT-MRAM). As a result, storage systems prefer to utilize NVM devices as read caches for performance boosts. Unlike write caches, read caches have greater potential for write reduction because their writes are only triggered by cache updates. However, traditional cache algorithms like LRU and LFU have to update cached blocks frequently because it is difficult for them to predict data popularity far into the future. Although some newer algorithms like SieveStore reduce cache write pressure, they still rely on those traditional cache schemes for data popularity prediction. Due to their poor long-term data popularity prediction, these new cache algorithms lead to a significant and unnecessary decrease in cache hit ratios. In this paper, we propose a new Macroscopic Trend (MT) cache replacement algorithm to reduce cache updates effectively and maintain high cache hit ratios. This algorithm discovers long-term hot data effectively by observing the macroscopic trend of data blocks. We have conducted extensive experiments driven by a series of real-world traces, and the results indicate that, compared with LRU, the MT cache algorithm can achieve 15.28 times longer lifetime or lower energy consumption of NVM caches with a similar hit ratio.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-3SRAM DESIGN EXPLORATION WITH INTEGRATED APPLICATION-AWARE AGING ANALYSIS
Speaker:
Alexandra Listl, TUM, DE
Authors:
Alexandra Listl1, Daniel Mueller-Gritschneder2, Sani Nassif3 and Ulf Schlichtmann2
1Chair of Electronic Design Automation, DE; 2TUM, DE; 3Radyalis, US
Abstract
On-chip SRAMs are an integral part of safety-critical Systems-on-Chip. At the same time, however, they are also most susceptible to reliability threats such as Bias Temperature Instability (BTI), originating from the continuous trend of technology shrinking. BTI leads to significant performance degradation, especially in the sense amplifiers (SAs) of SRAMs, where failures are fatal since the data of a whole column is destroyed. As BTI strongly depends on the workload of an application, the aging rates of SAs in a memory array differ significantly, and the incorporation of workload information into aging simulations is vital. Especially in safety-critical systems, precise estimation of application-specific reliability requirements to predict memory lifetime is a key concern. In this paper we present a workload-aware aging analysis for on-chip SRAMs that incorporates the workload of real applications executed on a processor. According to this workload, we predict the performance degradation of the SAs in the memory. We integrate this aging analysis into an aging-aware SRAM design exploration framework that generates and characterizes memories of different array granularity to select the most reliable memory architecture for the intended application. We show that this technique can mitigate SA degradation significantly, depending on the environmental conditions and the application workload.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-4FROM MULTI-LEVEL TO ABSTRACT-BASED SIMULATION OF A PRODUCTION LINE
Speaker:
Stefano Centomo, University of Verona, IT
Authors:
Stefano Centomo, Enrico Fraccaroli and Marco Panato, University of Verona, IT
Abstract
This paper proposes two approaches for the integration of cyber-physical systems (CPSs) in a production line in order to obtain predictions concerning actual production, a core operation in the context of Industry 4.0. The first approach relies on the multi-level paradigm, where multiple descriptions of the same CPS are modeled with different levels of detail; the models are then switched at runtime. The second approach relies on abstraction techniques for CPSs that maintain a certain level of detail. The two approaches are validated and compared in a real use-case scenario to identify the most effective simulation strategy.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-5ACCURATE DYNAMIC MODELLING OF HYDRAULIC SERVOMECHANISMS
Speaker:
Manuel Pencelli, Yanmar R&D Europe S.r.l., IT
Authors:
Manuel Pencelli1, Renzo Villa2, Alfredo Argiolas1, Gianni Ferretti2, Marta Niccolini1, Matteo Ragaglia1, Paolo Rocco2 and Andrea Maria Zanchettin2
1YANMAR R&D EUROPE S.R.L, IT; 2Politecnico di Milano, IT
Abstract
In this paper, the process of modelling and identification of a hydraulic actuator is discussed. In this framework, a simple model based on classical theory has been derived, and a first experimental campaign has been performed on a test bench. These tests highlighted the presence of unmodelled phenomena (dead-zone, hysteresis, etc.), therefore a second and more extensive set of experiments has been carried out. With the acquired knowledge, a new improved model is presented and its parameters identified. Finally, several tests have been performed in order to experimentally validate the model.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-6PLANNING WITH REAL-TIME COLLISION AVOIDANCE FOR COOPERATING AGENTS UNDER RIGID BODY CONSTRAINT
Speaker:
Federico Vesentini, University of Verona, IT
Authors:
Nicola Piccinelli, Federico Vesentini and Riccardo Muradore, University of Verona, IT
Abstract
In automated warehouses, path planning is a crucial topic for improving automation and efficiency. This kind of planning is usually computed off-line, knowing the planimetry of the warehouse and the starting and target points of each agent. However, this global approach is not able to manage unexpected static/dynamic obstacles or other agents moving in the same area. For this reason, in multi-robot systems global planners are usually integrated with local collision avoidance algorithms. In this paper we use the Voronoi diagram as global planner and the Velocity Obstacle (VO) method as collision avoidance algorithm. The goal of this paper is to extend such a hybrid motion planner by enforcing mechanical constraints between agents in order to execute a task that cannot be performed by a single agent. We focus on the cooperative task of carrying a payload, such as a bar, with two agents constrained to move at the end points of the bar. We improve the original algorithms by dynamically taking the constrained motion into account both at the global planning and at the collision avoidance level.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-7THE CASE FOR EXPLOITING UNDERUTILIZED RESOURCES IN HETEROGENEOUS MOBILE ARCHITECTURES
Speaker:
Nikil Dutt, University of California, Irvine, US
Authors:
Chenying Hsieh, Nikil Dutt and Ardalan Amiri Sani, UC Irvine, US
Abstract
Heterogeneous architectures are ubiquitous in mobile platforms, with mobile SoCs typically integrating multiple processors along with accelerators such as GPUs (for data-parallel kernels) and DSPs (for signal processing kernels). This strict partitioning of application execution on heterogeneous compute resources often results in underutilization of resources such as DSPs. We present a case study executing popular data-parallel workloads such as convolutional neural networks (CNNs), computer vision applications and graphics kernels on mobile devices, and show that both the performance and energy consumption of mobile platforms can be improved by synergistically deploying these underutilized DSPs. Our experiments on a mobile Snapdragon 835 platform, under both single- and multiple-application scenarios executing CNNs and graphics workloads, demonstrate average performance and energy improvements of 15-46% and 18-80%, respectively, by synergistically deploying all available compute resources, especially the underutilized DSP.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-8ONLINE RARE CATEGORY DETECTION FOR EDGE COMPUTING
Speaker:
Yufei Cui, City University of Hong Kong, HK
Authors:
Yufei Cui1, Qiao Li1, Sarana Nutanong2 and Chun Jason Xue1
1City University of Hong Kong, HK; 2Vidyasirimedhi Institute of Science and Technology, TH
Abstract
Identifying rare categories is an important data management problem in many application fields, including video surveillance, ecological environment monitoring and precision medicine. Previous solutions in the literature require all data instances to be first delivered to the server, where rare category identification algorithms are executed on the pooled data to find informative instances for human annotators to label. This incurs large bandwidth consumption and high latency. To deal with these problems, we propose a lightweight rare category identification framework. At the sensor side, the designed online algorithm filters less informative data instances from the data stream and sends only the informative ones to human annotators. After labeling, the server sends back only the labels of the corresponding data instances. The sensor-side algorithm is extended to enable cooperation between embedded devices for cases where data is collected in a distributed manner. Experiments show that our framework dramatically outperforms the baseline, reducing network traffic by 75% on average.
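The sensor-side filtering idea can be sketched as a simple novelty filter (a stand-in for the paper's algorithm; the distance threshold and the instances are hypothetical): an instance is forwarded only if it is far from everything already seen, so dense, common categories are suppressed at the edge.

```python
import math

def online_novelty_filter(stream, threshold):
    """Forward only instances that are at least `threshold` away from
    everything kept so far -- a minimal sketch of sensor-side
    filtering, not the paper's rare-category detector."""
    kept = []
    for x in stream:
        if all(math.dist(x, k) >= threshold for k in kept):
            kept.append(x)  # informative: send to annotators
    return kept
```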

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-9RAGRA: LEVERAGING MONOLITHIC 3D RERAM FOR MASSIVELY-PARALLEL GRAPH PROCESSING
Speaker:
Yu Huang, Huazhong University of Science and Technology, CN
Authors:
Yu Huang, Long Zheng, Xiaofei Liao, Hai Jin, Pengcheng Yao and Chuangyi Gui, Huazhong University of Science and Technology, CN
Abstract
With the maturity of monolithic 3D integration, 3D ReRAM provides impressive storage density and computational parallelism, with great opportunities for parallel graph processing acceleration. In this paper, we present RAGra, a 3D ReRAM based graph processing accelerator with two significant technical highlights. First, monolithic 3D ReRAM typically has a complexly-intertwined structure with shared input wordlines and output bitlines across layers. We propose a novel mapping scheme that applies graph algorithms to 3D ReRAM seamlessly and correctly, exposing its massive parallelism. Second, considering the sparsity of real-world graphs, we further propose a row- and column-mixed execution model that filters out invalid subgraphs, exploiting the massive parallelism of 3D ReRAM. Our evaluation on 8-layer stacked ReRAM shows that RAGra outperforms GraphR, the state-of-the-art planar (2D) ReRAM-based graph accelerator, with 6.18x performance improvement and 2.21x energy saving on average. In particular, RAGra significantly outperforms GridGraph (a typical CPU-based graph system) by up to 293.12x speedup.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-10ACCURATE COST ESTIMATION OF MEMORY SYSTEMS INSPIRED BY MACHINE LEARNING FOR COMPUTER VISION
Speaker:
Lorenzo Servadei, Infineon Technologies, DE
Authors:
Lorenzo Servadei1, Elena Zennaro1, Keerthikumara Devarajegowda1, Martin Manzinger1, Wolfgang Ecker1 and Robert Wille2
1Infineon AG, DE; 2Johannes Kepler University Linz, AT
Abstract
Hardware/software co-designs are usually defined at high levels of abstraction at the beginning of the design process in order to leave open plenty of options for how to eventually realize a system. This allows for design exploration, which in turn heavily relies on knowing the costs of different design configurations (with respect to hardware usage as well as firmware metrics). To this end, methods for cost estimation are frequently applied in industrial practice. However, currently used methods oversimplify the problem and ignore important features, leading to estimates that are far off from the real values. In this work, we address this problem for memory systems. To this end, we borrow and re-adapt solutions based on Machine Learning (ML) which have been found suitable for problems from the domain of Computer Vision (CV), in particular age determination of persons depicted in images. We show that, for an ML approach, age determination from the CV domain is actually very similar to cost estimation of a memory system.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-11PRACTICAL CAUSALITY HANDLING FOR SYNCHRONOUS LANGUAGES
Speaker:
Steven Smyth, Kiel University, DE
Authors:
Steven Smyth, Alexander Schulz-Rosengarten and Reinhard von Hanxleden, Dept. of Computer Science, Kiel University, DE
Abstract
A key to the synchronous principle of reconciling concurrency with determinism is to establish at compile time that a program is causal, which means that there exists a schedule that obeys the rules put down by the language. In practice it can be rather cumbersome for the developer to cure causality problems. To facilitate causality handling, we propose, first, to enrich the scheduling regime of the language to also consider explicit scheduling directives that can be used by either the modeler or model-to-model transformations. Secondly, we propose to enhance programming environments with dedicated causality views to guide the developer in finding causality issues. Our proposals should be applicable for synchronous languages; we here illustrate them for the SCCharts language and its open source development platform KIELER.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-12APPLICATION PERFORMANCE PREDICTION AND OPTIMIZATION UNDER CACHE ALLOCATION TECHNOLOGY
Speaker:
Yeseong Kim, UCSD, US
Authors:
Yeseong Kim1, Ankit More2, Emily Shriver2 and Tajana Rosing1
1University of California San Diego, US; 2Intel, US
Abstract
Many applications running on high-performance computing systems share limited resources such as the last-level cache, often resulting in lower performance. Intel recently introduced a new control mechanism, called cache allocation technology (CAT), which controls the cache size used by each application. To intelligently utilize this technology for automated management, it is essential to accurately identify application performance behavior under different cache allocation scenarios. In this work, we show a novel approach which automatically builds a prediction model for application performance changes with CAT. We profile workload characteristics based on the Intel Top-down Microarchitecture Analysis Method (TMAM) and train the model using machine learning. The model predicts instructions per cycle (IPC) across the cache sizes available for allocation to the applications. We also design a dynamic cache management technique which utilizes the prediction model and intelligently partitions the cache resource to improve application throughput. We implemented and evaluated the proposed framework in the Intel PMU profiling tool running on a Xeon Platinum 8186 Skylake processor. In our evaluation, we show that the proposed model accurately predicts the IPC changes of applications with 4.7% error on average across cache allocation scenarios. Our predictive online cache management achieves application performance improvements of up to 25% as compared to a prediction-agnostic policy.
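Once per-application IPC predictions exist, the partitioning step reduces to a small search. The toy below (all IPC curves are made-up numbers, not the paper's data or model) picks the split of cache ways between two co-running applications that maximizes combined predicted IPC.

```python
def best_partition(ipc_a, ipc_b, total_ways):
    """Given predicted IPC for each application as a function of
    allocated cache ways (lists indexed by way count), return the
    split maximizing combined IPC -- a toy version of model-driven
    CAT partitioning for two applications."""
    best = max(range(1, total_ways),
               key=lambda w: ipc_a[w] + ipc_b[total_ways - w])
    return best, total_ways - best
```

A real controller would re-run this whenever the predictions change, and program the chosen split into the CAT class-of-service masks.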

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-13GENERALIZED MATRIX FACTORIZATION TECHNIQUES FOR APPROXIMATE LOGIC SYNTHESIS
Speaker:
Sherief Reda, Brown University, US
Authors:
Soheil Hashemi and Sherief Reda, Brown University, US
Abstract
Approximate computing is an emerging computing paradigm, where computing accuracy is relaxed for improvements in hardware metrics, such as design area and power profile. In circuit design, a major challenge is to synthesize approximate circuits automatically from input exact circuits. In this work, we extend our previous work, BLASYS, for approximate logic synthesis based on matrix factorization, where an arbitrary input circuit can be approximated in a controlled fashion. Whereas our previous approach uses a semi-ring algebra for factorization, this work generalizes matrix-based circuit factorization to include both semi-ring and field algebra implementations. We also propose a new method for truth table folding to improve the factorization quality. These new approaches significantly widen the design space of possible approximate circuits, effectively offering improved trade-offs in terms of quality, area and power consumption. We evaluate our methodology on a number of representative circuits showcasing the benefits of our proposed methodology for approximate logic synthesis.
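The semi-ring (Boolean) side of the factorization can be sketched in a few lines (a minimal illustration, not the BLASYS tool flow): a truth table M is approximated by factors B and C, reconstructed with AND/OR in place of multiply/add, and judged by how many minterm entries it gets wrong.

```python
def bool_reconstruct(B, C):
    """Reconstruct a truth table from factors under Boolean
    (semi-ring) algebra: entry (i,j) = OR_k (B[i][k] AND C[k][j])."""
    n, m, k = len(B), len(C[0]), len(C)
    return [[int(any(B[i][t] and C[t][j] for t in range(k)))
             for j in range(m)] for i in range(n)]

def hamming_error(M, A):
    """Number of truth-table entries the approximation gets wrong."""
    return sum(M[i][j] != A[i][j]
               for i in range(len(M)) for j in range(len(M[0])))
```

Lowering the factorization rank k shrinks the synthesized circuit at the cost of a larger Hamming error, which is exactly the quality/area trade-off the paper explores.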

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-14CARS: A MULTI-LAYER CONFLICT-AWARE REQUEST SCHEDULER FOR NVME SSDS
Speaker:
Tianming Yang, Huanghuai University, CN
Authors:
Tianming Yang1, Ping Huang2, Weiying Zhang3, Haitao Wu1 and Longxin Lin4
1Huanghuai University, CN; 2Temple University, US; 3Northeastern University, CN; 4Jinan University, CN
Abstract
NVMe SSDs are nowadays widely deployed in various computing platforms due to their high performance and low power consumption, especially in data centers supporting modern latency-sensitive applications. NVMe SSDs improve on SATA- and SAS-interfaced SSDs by providing a large number of device I/O queues at the host side, which applications can manage directly to concurrently issue requests to the device. However, the currently deployed request scheduling approach is oblivious to the states of the various device-internal components and thus may make suboptimal decisions due to resource contention at different layers inside the SSD. In this work, we propose a Conflict-Aware Request Scheduling policy named CARS for NVMe SSDs to maximally leverage the rich parallelism available in modern NVMe SSDs. The central idea is to check the possible conflicts a fetched request might incur before dispatching it. If there is a conflict, the scheduler refrains from issuing the request and moves on to a request in the next submission queue. In doing so, it can evenly distribute requests among the parallel idle components in the flash chips, improving performance. Our evaluations show that CARS can reduce the slowdown metric by up to 46% relative to the de facto round-robin scheduling policy for a variety of workload patterns.
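The conflict check can be caricatured in a few lines (queue contents and chip ids below are hypothetical, and the real policy tracks contention at several layers, not just chips): in one round-robin pass, a queue's head request is dispatched only if its target flash chip is idle.

```python
from collections import deque

def conflict_aware_schedule(queues, busy_chips):
    """One scheduling pass: visit submission queues round-robin and
    dispatch the head request unless its target chip is busy -- a toy
    rendition of the conflict-aware check. Each queue entry is the
    request's target chip id."""
    dispatched = []
    for q in queues:
        if q and q[0] not in busy_chips:
            chip = q.popleft()
            busy_chips.add(chip)       # chip now occupied this pass
            dispatched.append(chip)
        # on conflict: leave the request queued, try the next queue
    return dispatched
```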

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-15QUEUE BASED MEMORY MANAGEMENT UNIT FOR HETEROGENEOUS MPSOCS
Speaker:
Robert Wittig, Technische Universität Dresden, DE
Authors:
Robert Wittig, Mattis Hasler, Emil Matus and Gerhard Fettweis, Technische Universität Dresden, DE
Abstract
Sharing tightly coupled memory in a multi-processor system-on-chip is a promising approach to improve the programming flexibility as well as to ease the constraints imposed by area and power. However, it poses a challenge in terms of access latency. In this paper, we present a queue based memory management unit which combines the low latency access of shared tightly coupled memory with the flexibility of a traditional memory management unit. Our passive conflict detection approach significantly reduces the critical path compared to previously proposed methods while preserving the flexibility associated with dynamic memory allocation and heterogeneous data widths.

Download Paper (PDF; Only available from the DATE venue WiFi)

10.1 Special Day on "Model-Based Design of Intelligent Systems" Session: Hot topic: Model-Based Machine Learning

Date: Thursday, March 28, 2019
Time: 11:00 - 12:30
Location / Room: Room 1

Chair:
Andreas Gerstlauer, University of Texas, Austin, US, Contact Andreas Gerstlauer

Co-Chair:
Patricia Derler, National Instruments, US, Contact Patricia Derler

TimeLabelPresentation Title
Authors
11:0010.1.1EMBEDDED SYSTEMS' AUTOMATION FOLLOWING OMG'S MODEL DRIVEN ARCHITECTURE VISION
Speaker:
Wolfgang Ecker, Infineon Technologies, DE
Authors:
Wolfgang Ecker1, Keerthikumara Devarajegowda2, Michael Werner3, Zhao Han3 and Lorenzo Servadei4
1Infineon Technologies AG / TU Munich, DE; 2Infineon Technologies AG/TU Kaiserslautern, DE; 3Infineon Technologies AG / TU Munich, DE; 4Infineon Technologies AG/Johannes Kepler University Linz, DE
Abstract
This paper presents an automated process for end-to-end embedded system design following OMG's model driven architecture (MDA) vision. It tackles a major challenge in automation: bridging the large semantic gap between the specification and the target code. The shown MDA adaption proposes a uniform and systematic way by splitting the translation process into multiple layers and introducing design platform independent and implementation independent views. In our adaption of MDA, we start with a formalized specification and end with code (view) generation. The code is then compiled (software) or synthesized (hardware) and finally assembled into the embedded system design. We split the translation process into Model-of-Thing (MoT), Model-of-Design (MoD) and Model-of-View (MoV) layers. MoTs represent the formalized specification, MoDs contain the implementation architecture in a view independent way, and MoVs are implementation and view dependent, i.e., they carry specific details in the target language. MoT is translated to MoD, MoD to MoV, and MoV is finally used to generate views. The translation between the models is based on templates that reflect design and coding blueprints. The final step of view generation is itself part of generation: the MoV model and the unparse method are generated from a view language description. The approach has been successfully applied to generate digital hardware (RTL), properties for verification (SVA), and snippets of firmware, which have been successfully synthesized to an FPGA.

Download Paper (PDF; Only available from the DATE venue WiFi)
11:3010.1.2FORMAL COMPUTATION MODELS IN NEUROMORPHIC COMPUTING: CHALLENGES AND OPPORTUNITIES
Speaker and Author:
Orlando Moreira, GrAI Matter Labs, NL
Abstract
Neuromorphic computing is expected to enable ultra-low power solutions for inference-based computational functions, especially for real-time event-based systems. At the heart of the approach is the concept of Spiking Neural Networks (SNNs). SNNs can be thought of as a family of models of computation, each representing a different trade-off between computational efficiency and functional accuracy. In this talk, we will review SNN models and we will have a look at their fundamental analytical properties. We will discuss the challenge of selecting an adequate SNN model for hardware implementation, and the need for new approaches to analysis, functional composability, verification and testing. This is, at least partially, because in SNNs, contrary to popular concurrency models like Kahn Process Networks or Data Flow, event arrival times are intrinsic to the functional behavior of a graph.
12:0010.1.3AUTOMATED SIGNAL PROCESSING DESIGN THROUGH BAYESIAN MODEL-BASED MACHINE LEARNING
Speaker and Author:
Bert de Vries, GN ReSound, NL
12:30End of session
Lunch Break in Lunch Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Breaks (Lunch Area)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the Lunch Area to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 26, 2019

Wednesday, March 27, 2019

Thursday, March 28, 2019


10.2 Special Session: Enabling Graph Analytics at Extreme Scales: Design Challenges, Advances, and Opportunities

Date: Thursday, March 28, 2019
Time: 11:00 - 12:30
Location / Room: Room 2

Organiser:
Ananth Kalyanaraman, Washington State University, US, Contact Ananth Kalyanaraman

Chair:
Partha Pande, Washington State University, US, Contact Partha Pande

A hot topic special session on enabling extreme scale graph analytics using innovations in algorithms and architectures is proposed. The special session will host three talks by speakers who work at the intersection of graph analytics and high performance computing. Through these talks, the session will cover the spectrum of unique challenges, latest advances, and exciting opportunities for research and development that exist in this emerging research area. The special session will serve as an avenue for discussion of recent advances that have started to show that graph analytics and computer architecture are capable of benefiting from one another and spawning new research directions. The goal is to build a new vibrant community at the intersection of graph analytics and HPC architecture, by fostering an environment conducive to active engagement and exchange of ideas between the two groups.

TimeLabelPresentation Title
Authors
11:0010.2.1A BRIEF SURVEY OF ALGORITHMS, ARCHITECTURES, AND CHALLENGES TOWARD EXTREME-SCALE GRAPH ANALYTICS
Speaker:
Ananth Kalyanaraman, Washington State University, US
Authors:
Ananth Kalyanaraman and Partha Pratim Pande, Washington State University, US
Abstract
The notion of networks is inherent in the structure, function and behavior of the natural and engineered world that surrounds us. Consequently, graph models and methods have assumed a prominent role in this modern era of Big Data, and are taking center stage in the discovery pipelines of various data-driven scientific domains. In this paper, we present a brief review of the state-of-the-art in parallel graph analytics, particularly focusing on iterative graph algorithms and their implementation on modern day manycore architectures. Iterative graph algorithms cover a broad class of graph operations of varying complexities, from simpler routines such as Breadth-First Search (BFS), to polynomially-solvable problems such as shortest path computations, to NP-Hard problems such as community detection and graph coloring. We cover a set of common algorithmic abstractions used in implementing such iterative graph algorithms, state the challenges around parallelization on contemporary parallel platforms (including commodity multicores and emerging manycore platforms), and describe a set of approaches that have led to efficient implementations. We conclude the paper by identifying potential research directions, opportunities, and challenges that lie ahead on the path toward enabling graph analytics at exascale.
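The frontier-based iteration pattern the survey refers to is easiest to see in plain BFS, sketched here sequentially (the paper's interest is in its manycore-parallel variants, where each frontier is expanded in parallel):

```python
from collections import deque

def bfs_levels(adj, source):
    """Frontier-based BFS: the prototypical iterative graph kernel.
    `adj` maps each vertex to its neighbor list; returns hop distance
    from `source` for every reachable vertex."""
    dist = {source: 0}
    frontier = deque([source])
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            if v not in dist:           # first visit: record level
                dist[v] = dist[u] + 1
                frontier.append(v)
    return dist
```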

Download Paper (PDF; Only available from the DATE venue WiFi)
11:3010.2.2AN ENHANCED PARALLEL GRAPH PLATFORM FOR REAL-WORLD DATA ANALYTIC WORKFLOWS
Speaker:
John Feo, Pacific Northwest National Laboratory, US
Authors:
Vito Giovanni Castellana, Maurizio Drocco, John Feo, Andrew Lumsdaine, Joseph Manzano, Andres Marquez, Marco Minutoli, Joshua Suetterlein, Antonino Tumeo and Marcin Zalewski, Pacific Northwest National Laboratory, US
Abstract
Economic competitiveness and national security depend increasingly on the insightful analysis of large data sets. The diversity of real-world data sources and analytic workflows imposes challenging hardware and software requirements for parallel graph platforms. The irregular nature of graph methods is not supported well by the deep memory hierarchies of conventional distributed systems, requiring new processor and runtime system designs to tolerate memory and synchronization latencies. Moreover, the efficiency of relational table operations and matrix computations is not attainable when data is stored in common graph data structures. In this paper, we present HAGGLE, a high-performance, scalable data analytic platform. The platform's hybrid data model supports a variety of distributed, thread-safe data structures, parallel programming constructs, and persistent and streaming data. An abstract runtime layer enables us to map the stack to conventional, distributed computer systems with accelerators. The runtime uses multithreading, active messages, and data aggregation to hide memory and synchronization latencies on large-scale systems.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:0010.2.3SCALING UP NETWORK CENTRALITY COMPUTATIONS
Speaker:
Henning Meyerhenke, Humboldt-University Berlin, DE
Authors:
Henning Meyerhenke and Alexander van der Grinten, Humboldt-University Berlin, DE
Abstract
Network science methodology is increasingly applied to study various real-world phenomena. Consequently, large network data sets comprising millions or billions of edges are more and more common. In order to process and analyze such massive graphs, we need appropriate graph processing systems and fast algorithms. Yet, many analysis algorithms have been pioneered on small networks, where speed was not the highest concern. Developing an analysis toolkit for large-scale networks thus often requires faster variants, both from an algorithmic and from an implementation point of view. In this paper we focus on computational aspects of network centrality measures. Such measures indicate the importance of a vertex (or an edge) based on the position of the vertex (or the edge) in the network. We describe several common measures as well as algorithms for computing them. This description focuses on three aspects: (i) our recent contributions to the field, (ii) new results regarding Katz centrality, and (iii) future work from a lower-level implementation point of view.
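Katz centrality, one of the measures the paper revisits, admits a compact fixed-point sketch (dense Python lists here purely for illustration; large-scale codes use sparse algebra and the accelerated schemes the authors describe):

```python
def katz_centrality(adj, alpha=0.1, iters=100):
    """Katz centrality by fixed-point iteration x <- alpha * A^T x + 1
    (unnormalized). `adj[u]` lists the out-neighbors of u; alpha must
    be smaller than the reciprocal of the adjacency matrix's largest
    eigenvalue for the iteration to converge."""
    n = len(adj)
    x = [1.0] * n
    for _ in range(iters):
        new = [1.0] * n
        for u in range(n):
            for v in adj[u]:
                new[v] += alpha * x[u]   # u contributes to v's score
        x = new
    return x
```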

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30End of session
Lunch Break in Lunch Area





10.3 System-level Dependability for Multicore and Real-time Systems

Date: Thursday, March 28, 2019
Time: 11:00 - 12:30
Location / Room: Room 3

Chair:
Stefano Di Carlo, Politecnico di Torino, IT, Contact Stefano Di Carlo

Co-Chair:
Luca Cassano, Politecnico di Milano, IT, Contact Luca Cassano

This session covers topics ranging from reliability assessments in heterogeneous systems, optimization of the availability in real-time systems under permanent and transient faults, as well as fault tolerant techniques in many core systems.

TimeLabelPresentation Title
Authors
11:0010.3.1IDENTIFYING THE MOST RELIABLE COLLABORATIVE WORKLOAD DISTRIBUTION IN HETEROGENEOUS DEVICES
Speaker:
Paolo Rech, UFRGS, BR
Authors:
Gabriel Piscoya Dávila, Daniel Oliveira, Philippe Navaux and Paolo Rech, UFRGS, BR
Abstract
The constant need for higher performance and reduced power consumption has led vendors to design heterogeneous devices that embed a traditional CPU and an accelerator, such as a GPU or FPGA. When the CPU and the accelerator are used collaboratively, the device's computational performance reaches its peak. However, the higher amount of resources employed for computation potentially has the side effect of increasing the soft error rate. In this paper, we evaluate the reliability behaviour of AMD Kaveri Accelerated Processing Units executing a set of heterogeneous applications. We distribute the workload between the CPU and GPU and evaluate which configuration provides the lowest error rate or allows the computation of the highest amount of data before experiencing a failure. We show that, in most cases, the most reliable workload distribution is also the one that delivers the highest performance. As experimentally proven, by choosing the correct workload distribution the device reliability can increase by up to 9x.

Download Paper (PDF; Only available from the DATE venue WiFi)
11:3010.3.2CE-BASED OPTIMIZATION FOR REAL-TIME SYSTEM AVAILABILITY UNDER LEARNED SOFT ERROR RATE
Speaker:
Liying Li, East China Normal University, CN
Authors:
Liying Li1, Tongquan Wei1, Junlong Zhou2, Mingsong Chen1 and X. Sharon Hu3
1East China Normal University, CN; 2Nanjing University of Science and Technology, CN; 3University of Notre Dame, US
Abstract
As the density of integrated circuits continues to increase, the possibility that real-time systems suffer from transient and permanent failures rises significantly, resulting in a degraded availability of system functionality. In this paper, we investigate the dynamic modeling of transient failure rate based on Back Propagation (BP) neural network, and propose an optimization strategy for system availability based on Cross Entropy (CE). Specifically, the neural network is trained using cross-layer simulation data obtained from SPICE simulation while the CE-based optimization for system functionality availability is achieved by judiciously selecting an optimal supply voltage for processors under timing constraints. Simulation results show that the proposed method can achieve system availability improvement of up to 32% compared to benchmarking methods.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:0010.3.3A DETERMINISTIC-PATH ROUTING ALGORITHM FOR TOLERATING MANY FAULTS ON WAFER-LEVEL NOC
Speaker:
Ying Zhang, Tongji University, CN
Authors:
Zhongsheng Chen1, Ying Zhang1, Zebo Peng2 and Jianhui Jiang1
1Tongji University, CN; 2Linköping University, SE
Abstract
Wafer-level NoC has emerged as a promising fabric to further improve supercomputer performance, but this new fabric may suffer from the many-fault problem. This paper presents a deterministic-path routing algorithm for tolerating many faults on wafer-level NoCs. The proposed algorithm generates routing tables using a breadth-first traversal strategy and stores one routing table in each NoC switch; the switch then transmits packets online according to its routing table. We use the Tarjan algorithm to dynamically reconfigure the routes around faulty nodes, and develop deprecated link/node rules to ensure deadlock-free communication in the NoC. Experimental results demonstrate that the proposed algorithm not only tolerates the effects of many faults, but also maximizes the number of available nodes in the reconfigured NoC. Its average latency, throughput, and energy consumption are also better than those of existing solutions.
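Breadth-first routing-table generation around faulty nodes can be sketched as follows (the topology and fault set are hypothetical, and the paper additionally applies Tarjan-based reconfiguration and deadlock-avoidance rules on top of this): BFS from each switch over healthy nodes only, then record the first hop of a shortest path per destination.

```python
from collections import deque

def build_routing_table(adj, source, faulty):
    """BFS from `source`, skipping nodes in `faulty`; for each
    reachable destination record the next hop on a shortest path --
    a sketch of breadth-first routing-table generation."""
    parent = {source: None}
    order = deque([source])
    while order:
        u = order.popleft()
        for v in adj[u]:
            if v not in parent and v not in faulty:
                parent[v] = u
                order.append(v)
    table = {}
    for dest in parent:
        if dest == source:
            continue
        hop = dest                      # walk back to find first hop
        while parent[hop] != source:
            hop = parent[hop]
        table[dest] = hop
    return table
```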

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30IP5-1, 705THERMAL-AWARENESS IN A SOFT ERROR TOLERANT ARCHITECTURE
Speaker:
Sajjad Hussain, Chair for Embedded Systems, KIT, DE
Authors:
Sajjad Hussain1, Muhammad Shafique2 and Joerg Henkel1
1Karlsruhe Institute of Technology, DE; 2Vienna University of Technology (TU Wien), AT
Abstract
It is crucial to provide soft error reliability in a power-efficient manner such that the maximum chip temperature remains within the safe operating limits. Different execution phases of an application have diverse performance, power, temperature and vulnerability behavior that can be leveraged to fulfill the resiliency requirements within the allowed thermal constraints. We propose a soft error tolerant architecture with fine-grained redundancy for different architectural components, such that their reliable operations can be activated selectively at fine-granularity to maximize the reliability under a given thermal constraint. When compared with state-of-the-art, our temperature-aware fine-grained reliability manager provides up to 30% reliability within the thermal budget.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:31IP5-2, 547A SOFTWARE-LEVEL REDUNDANT MULTITHREADING FOR SOFT/HARD ERROR DETECTION AND RECOVERY
Speaker:
Hwisoo So, Yonsei University, KR
Authors:
Moslem Didehban1, HwiSoo So2, Aviral Shrivastava1 and Kyoungwoo Lee2
1Arizona State University, US; 2Yonsei University, KR
Abstract
Advances in semiconductor technology have enabled unprecedented growth in safety-critical applications. In such environments, error resiliency is one of the main design concerns. Software-level Redundant MultiThreading (RMT) is one of the most promising error resilience strategies because such schemes can potentially serve as inexpensive and flexible solutions for hardware unreliability issues, i.e. soft and hard errors. However, the error coverage of existing software-level RMT solutions is limited to soft error detection, and they rely on external schemes for error recovery. In this paper, we investigate the potential of software-level RMT schemes for complete soft and hard error detection and recovery. First, we pinpoint the main reasons behind the ineffectiveness of basic software-level triple redundant multithreading (STRMT) in protecting against soft and hard errors. Then we introduce FISHER (FlexIble Soft and Hard Error Resiliency), a software-only RMT scheme which achieves comprehensive error resiliency against both soft and hard errors. Rather than performing centralized voting operations on critical instruction operands, FISHER distributes and intertwines error detection and recovery operations between the redundant threads. To evaluate the effectiveness of the proposed solution, we performed more than 135,000 soft and hard error injection experiments on different hardware components of an ARM Cortex-A53-like μ-architecturally simulated microprocessor. The results demonstrate that FISHER can reduce program failure rates by around 261× and 162× compared to the original and basic STRMT-protected versions of programs, respectively.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:32IP5-3, 317COMMON-MODE FAILURE MITIGATION: INCREASING DIVERSITY THROUGH HIGH-LEVEL SYNTHESIS
Speaker:
Farah Naz Taher, University of Texas at Dallas, US
Authors:
Farah Naz Taher1, Matthew Joslin1, Anjana Balachandran2, Zhiqi Zhu1 and Benjamin Carrion Schaefer1
1The University of Texas at Dallas, US; 2The Hong Kong Polytechnic University, HK
Abstract
Fault tolerance is vital in many domains. One popular way to increase fault tolerance is through hardware redundancy. However, basic redundancy cannot cope with Common Mode Failures (CMFs). One way to address CMFs is through diversity in combination with traditional hardware redundancy. This work proposes an automatic design space exploration (DSE) method that, given a single behavioral description for High-Level Synthesis (HLS), generates optimized redundant hardware accelerators with maximum diversity to protect against CMFs. For this purpose, it exploits one of the main advantages of C-based VLSI design over traditional RT-level design based on low-level Hardware Description Languages (HDLs): the ability to generate micro-architectures with unique characteristics from the same behavioral description. Experimental results show that the proposed method provides a significant diversity increase compared to using traditional RTL-based exploration to generate diverse designs.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30End of session
Lunch Break in Lunch Area





10.4 Disruptive Technologies Ain't Fake News!

Date: Thursday, March 28, 2019
Time: 11:00 - 12:30
Location / Room: Room 4

Chair:
Elena Gnani, Università di Bologna, IT, Contact Elena Gnani

Co-Chair:
Aida Todri-Sanial, CNRS-LIRMM, FR, Contact Aida Todri-Sanial

Wanna see something real? This is the right session: it covers a wide variety of disruptive technologies, from wireless 3D integration to photonics and thin-film electronics, all the way to quantum computing.

TimeLabelPresentation Title
Authors
11:0010.4.1CODAPT: A CONCURRENT DATA AND POWER TRANSCEIVER FOR FULLY WIRELESS 3D-ICS
Speaker:
Benjamin Fletcher, University of Southampton, GB
Authors:
Benjamin Fletcher1, Shidhartha Das2 and Terrence Mak1
1University of Southampton, GB; 2ARM Ltd., GB
Abstract
Three-dimensional system integration is a promising enabling technology for realising heterogeneous ICs, facilitating stacking of disparate elements such as MEMS, sensors, analogue components, memories and digital processing. Recently, research has looked to contactless 3D integration using inductive coupling links (ICLs) to provide a low-cost alternative to conventional contact-based approaches (e.g. through-silicon vias) for 3D integration. In this paper, we present a novel, fully wireless ICL architecture for Concurrent Data and Power Transfer (CoDAPT) between tiers of a 3D-IC. The proposed CoDAPT architecture uses only a single inductor for simultaneous power transmission and data communication, resulting in high area efficiency, whilst facilitating low-cost, straightforward die stacking. The proposed design is experimentally validated through full-wave EM and SPICE simulation and demonstrates the capability to communicate data vertically at a rate of 1.3 Gbps/channel (utilising an area of only 0.052 mm²) whilst simultaneously achieving power delivery of 0.83 mW, under standard operating conditions. A case study is also presented, demonstrating that CoDAPT achieves an area reduction greater than 1.7x when compared with existing works, representing an important progression towards ultra-low-cost 3D-ICs through fully wireless stacking.

Download Paper (PDF; Only available from the DATE venue WiFi)
11:3010.4.2COMPILING PERMUTATIONS FOR SUPERCONDUCTING QPUS
Speaker:
Mathias Soeken, EPFL, CH
Authors:
Mathias Soeken, Fereshte Mozafari, Bruno Schmitt and Giovanni De Micheli, EPFL, CH
Abstract
In this paper we consider the compilation of quantum state permutations into quantum gates for physical quantum computers. A sequence of generic single-target gates realizing the input permutation is extracted using a decomposition-based reversible logic synthesis algorithm. We present a compilation algorithm that translates single-target gates into a quantum circuit composed of the elementary quantum gate sets supported by IBM's 5-qubit and 16-qubit, and Rigetti's 8-qubit and 19-qubit superconducting transmon QPUs. Compared to generic state-of-the-art compilation techniques, our technique improves gate volume and gate depth by up to 58% and 49%, respectively.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:0010.4.3STOCHASTIC COMPUTING WITH INTEGRATED OPTICS
Speaker:
Hassnaa El-Derhalli, Concordia University, CA
Authors:
Hassnaa El-Derhalli1, Sébastien Le Beux2 and Sofiene Tahar1
1Concordia University, CA; 2Lyon Institute of Nanotechnology, FR
Abstract
Stochastic computing (SC) allows reducing the hardware complexity and improving the energy efficiency of error-resilient applications. However, a main limitation of the computing paradigm is the low throughput induced by the intrinsically serial computation on bit-streams. In this paper, we address the implementation of SC in the optical domain, with the aim of improving computation speed. We implement a generic optical architecture allowing the execution of polynomial functions. We propose design methods to explore the design space in order to optimize key metrics such as circuit robustness and power consumption. We show that a circuit implementing a second-order polynomial function and operating at 1 GHz leads to a laser consumption of 20.1 pJ per computed bit.
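The SC principle the abstract builds on — a probability encoded as a bit-stream, with an AND gate multiplying two encoded values — can be illustrated in plain Python (a toy numeric sketch, not the authors' optical implementation; all names and parameters are ours):

```python
import random

def to_stream(p, n, rng):
    # Encode probability p as a random bit-stream of length n.
    return [1 if rng.random() < p else 0 for _ in range(n)]

def sc_multiply(a, b):
    # Bitwise AND of two independent streams multiplies the
    # encoded probabilities: P(a_i & b_i = 1) = p_a * p_b.
    return [x & y for x, y in zip(a, b)]

def value(stream):
    # Decode: fraction of ones in the stream.
    return sum(stream) / len(stream)

rng = random.Random(42)
n = 100_000
a, b = to_stream(0.6, n, rng), to_stream(0.5, n, rng)
prod = value(sc_multiply(a, b))   # close to 0.6 * 0.5 = 0.3
```

The long streams needed for accuracy are exactly the serial-throughput bottleneck the paper attacks by moving SC to 1 GHz optics.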

Download Paper (PDF; Only available from the DATE venue WiFi)
12:1510.4.4INKJET-PRINTED TRUE RANDOM NUMBER GENERATOR BASED ON ADDITIVE RESISTOR TUNING
Speaker:
Ahmet Turan Erozan, Karlsruhe Institute of Technology, DE
Authors:
Ahmet Turan Erozan1, Rajendra Bishnoi1, Jasmin Aghassi-Hagmann2 and Mehdi Tahoori1
1Karlsruhe Institute of Technology, DE; 2Karlsruhe Institute of Technology, Offenburg University of Applied Science, DE
Abstract
Printed electronics (PE) is a fast-growing technology with promising applications in wearables, smart sensors and smart cards, since it provides mechanical flexibility and low-cost, on-demand, customizable fabrication. To secure the operation of these applications, True Random Number Generators (TRNGs) are required to generate unpredictable bits for cryptographic functions and padding. However, since the additive fabrication process of PE circuits results in high intrinsic variation due to the random dispersion of the printed inks on the substrate, constructing a printed TRNG is challenging. In this paper, we exploit the additive customizable fabrication feature of inkjet printing to design a TRNG based on electrolyte-gated field effect transistors (EGFETs). The proposed memory-based TRNG circuit can operate at low voltages (≤ 1 V) and is hence suitable for low-power applications. We also propose a flow that tunes the printed resistors of the TRNG circuit to mitigate its overall process variation, so that the generated bits are mostly based on the random noise in the circuit, providing truly random behaviour. The results show that the overall process variation of the TRNGs is reduced by a factor of 110, and the simulated TRNGs pass the National Institute of Standards and Technology Statistical Test Suite.
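Why tuning matters can be seen with a simple statistical model (ours, not the paper's circuit model): a memory cell resolves to 1 when thermal noise exceeds a process-variation offset, so a large untuned offset biases the output, while a trimmed offset lets the noise dominate.

```python
import random

def trng_bit(offset, noise_sigma, rng):
    # A memory-based TRNG cell resolves to 1 when thermal noise
    # exceeds its process-variation offset.
    return 1 if rng.gauss(0.0, noise_sigma) > offset else 0

def bias(offset, noise_sigma, trials, rng):
    # Fraction of ones over many evaluations of the cell.
    return sum(trng_bit(offset, noise_sigma, rng) for _ in range(trials)) / trials

rng = random.Random(1)
# Untuned cell: the variation offset swamps the noise -> stuck near 0.
untuned = bias(offset=0.5, noise_sigma=0.1, trials=20_000, rng=rng)
# Tuned cell: resistor trimming shrinks the offset -> noise dominates,
# and the output approaches an unbiased coin flip.
tuned = bias(offset=0.005, noise_sigma=0.1, trials=20_000, rng=rng)
```

In the paper, the trimming is done physically by additive printing of resistors, and randomness is then confirmed with the NIST test suite rather than a simple bias check.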

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30IP5-4, 435EXPLOITING WAVELENGTH DIVISION MULTIPLEXING FOR OPTICAL LOGIC SYNTHESIS
Speaker:
David Z. Pan, University of Texas, Austin, US
Authors:
Zheng Zhao1, Derong Liu2, Zhoufeng Ying1, Biying Xu1, Chenghao Feng1, Ray T. Chen1 and David Z. Pan1
1University of Texas, Austin, US; 2Cadence Design Systems, US
Abstract
Photonic integrated circuit (PIC), as a promising alternative to traditional CMOS circuit, has demonstrated the potential to accomplish on-chip optical interconnects and computations in ultra-high speed and/or low power consumption. Wavelength division multiplexing (WDM) is widely used in optical communication for enabling multiple signals being processed and transferred independently. In this work, we apply WDM to optical logic PIC synthesis to reduce the PIC area.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30End of session
Lunch Break in Lunch Area





10.5 SSD and data placement

Date: Thursday, March 28, 2019
Time: 11:00 - 12:30
Location / Room: Room 5

Chair:
Olivier Sentieys, INRIA, FR, Contact Olivier Sentieys

Co-Chair:
Hamid Tabani, Barcelona Supercomputing Center, BSC, ES, Contact Hamid Tabani

This session deals with solutions to improve memory and storage throughput and latency. The first two papers propose solutions for SSD-based storage, while the third covers data placement and management in CPU-FPGA multicore systems.

TimeLabelPresentation Title
Authors
11:0010.5.1HOTR: ALLEVIATING READ/WRITE INTERFERENCE WITH HOT READ DATA REPLICATION FOR FLASH STORAGE
Speaker:
Hong Jiang, The University of Texas at Arlington, US
Authors:
Suzhen Wu1, Weiwei Zhang1, Bo Mao1 and Hong Jiang2
1Xiamen University, CN; 2The University of Texas at Arlington, US
Abstract
The read/write interference problem of flash storage remains a critical concern under workloads with a mixture of read and write requests. To significantly improve read performance in the face of read/write interference, we propose a Hot Data Replication scheme for flash storage, called HotR. HotR utilizes the asymmetric read and write performance characteristics of flash-based SSDs and outsources popular read data to a surrogate space, such as a dedicated spare flash chip or the over-provisioned space within an SSD. By servicing some conflicted read requests from the surrogate flash space, HotR can alleviate, if not entirely eliminate, the contention between read requests and ongoing write requests. The evaluation results show that HotR significantly improves on the state-of-the-art scheme in system performance and cost efficiency. Consequently, the tail latency of flash-based storage systems is also reduced.
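The replication decision can be sketched as a small policy object (a behavioural toy under our own assumptions — a fixed hotness threshold and a boolean "chip busy writing" signal — not the paper's controller):

```python
from collections import Counter

class HotR:
    # Toy model: replicate frequently read blocks to a surrogate
    # flash space so reads can dodge chips busy with writes.
    def __init__(self, hot_threshold=3):
        self.read_counts = Counter()
        self.replicated = set()
        self.hot_threshold = hot_threshold

    def on_read(self, block, chip_busy_writing):
        self.read_counts[block] += 1
        if self.read_counts[block] >= self.hot_threshold:
            self.replicated.add(block)        # copy to surrogate space
        if chip_busy_writing and block in self.replicated:
            return "surrogate"                # no read/write contention
        return "wait" if chip_busy_writing else "primary"

h = HotR()
for blk in ["A", "A", "A", "B"]:              # A becomes hot, B stays cold
    h.on_read(blk, chip_busy_writing=False)
served = h.on_read("A", chip_busy_writing=True)   # hot -> surrogate copy
stalled = h.on_read("B", chip_busy_writing=True)  # cold -> must wait
```

The real scheme additionally decides *when* to copy data (off the critical path) and which surrogate space to use.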

Download Paper (PDF; Only available from the DATE venue WiFi)
11:3010.5.2RAFS: A RAID-AWARE FILE SYSTEM TO REDUCE THE PARITY UPDATE OVERHEAD FOR SSD RAID
Speaker:
Chenlei Tang, Huazhong University of Science and Technology, CN
Authors:
Chenlei Tang1, Jiguang Wan1, Yifeng Zhu2, Zhiyuan Liu1, Peng Xu1, Fei Wu1 and Changsheng Xie1
1Huazhong University of Science and Technology, CN; 2University of Maine, US
Abstract
In a parity-based SSD RAID, small write requests not only accelerate the wear-out of SSDs, due to the extra writes needed to update parities, but also degrade performance due to the associated expensive garbage collection. To mitigate the problem of small writes, a buffer is often added at the RAID controller to absorb overwrites and writes to the same stripe. However, this approach achieves only suboptimal efficiency because file layout information is invisible at the block level. This paper proposes RAFS, a RAID-aware file system, which utilizes a RAID-friendly data layout to improve the reliability and performance of SSD-based RAID 5. By leveraging the delayed allocation of modern file systems, RAFS employs a stripe-aware buffer policy to coalesce writes to the same file. To reduce the parity update overhead, RAFS compacts buffered updates and flushes them back in stripe units. RAFS also adopts a stripe-granularity allocation scheme to align writes to stripe boundaries. Experimental results show that RAFS can improve throughput by up to 90% compared to Ext4.
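The parity-saving arithmetic behind stripe alignment is easy to make concrete. In RAID-5, every partial-stripe write forces a read-modify-write of that stripe's parity, while a full-stripe write updates parity once; the counting sketch below uses invented numbers to show the gap:

```python
def parity_updates(total_blocks, stripe_width, aligned):
    # RAID-5: a partial-stripe write triggers a read-modify-write of
    # that stripe's parity; a full-stripe write updates parity once.
    if aligned:
        # Coalesced, stripe-aligned flush: one parity write per stripe.
        full, rem = divmod(total_blocks, stripe_width)
        return full + (1 if rem else 0)
    # Worst case: every block arrives as an isolated small write.
    return total_blocks

blocks, width = 64, 4
naive = parity_updates(blocks, width, aligned=False)   # one per block
rafs = parity_updates(blocks, width, aligned=True)     # one per stripe
```

With a 4-wide stripe this is a 4x reduction in parity writes, which is the kind of saving RAFS harvests by flushing in stripe units.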

Download Paper (PDF; Only available from the DATE venue WiFi)
12:0010.5.3AUTOMATIC DATA PLACEMENT FOR CPU-FPGA HETEROGENEOUS MULTIPROCESSOR SYSTEM-ON-CHIPS
Speaker:
Shiqing Li, Shandong University, CN
Authors:
Shiqing Li, Yixun Wei and Lei Ju, Shandong University, CN
Abstract
Efficient utilization of constrained memory resources is of paramount importance in CPU-FPGA heterogeneous multiprocessor system-on-chip (HMPSoC) based system design for memory-intensive applications. State-of-the-art high-level synthesis (HLS) tools rely on the system programmer to manually determine the data placement within the complex memory hierarchy. In this paper, we propose an automatic data placement framework that can be seamlessly integrated with the commercial Vivado HLS. We first show the counter-intuitive result that the traditional frequency- and locality-based data placement strategy designed for CPU architectures leads to non-optimal system performance on CPU-FPGA HMPSoCs. Built on top of our memory latency analysis model, the proposed integer linear programming (ILP) based framework determines whether each array object should be accessed via the on-chip BRAM, the shared CPU L2 cache, or DDR memory directly. Experimental results on the Zedboard platform show an average 1.39X performance speedup compared with a greedy allocation strategy.
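The placement problem has a compact shape: assign each array to one level of the hierarchy, minimizing total access latency subject to on-chip capacity. The paper solves it with ILP; the sketch below brute-forces a tiny instance with an invented latency table and array set, just to make the objective and constraint concrete:

```python
from itertools import product

# Toy latency model (cycles/access) and BRAM capacity, invented for
# illustration; the paper derives these from its memory analysis model.
LATENCY = {"bram": 1, "l2": 8, "ddr": 30}
BRAM_CAPACITY = 1024  # bytes

# array name -> (size in bytes, access count)
arrays = {"coeffs": (512, 1000), "frame": (4096, 200), "lut": (256, 5000)}

best_cost, best_plan = float("inf"), None
for plan in product(LATENCY, repeat=len(arrays)):
    used = sum(size for (size, _), place in zip(arrays.values(), plan)
               if place == "bram")
    if used > BRAM_CAPACITY:
        continue                      # capacity constraint violated
    cost = sum(LATENCY[place] * accesses
               for (_, accesses), place in zip(arrays.values(), plan))
    if cost < best_cost:
        best_cost, best_plan = cost, dict(zip(arrays, plan))
```

An ILP formulation replaces the exhaustive loop with binary placement variables, which is what keeps the approach tractable for realistic designs.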

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30IP5-5, 482IGNORETM: OPPORTUNISTICALLY IGNORING TIMING VIOLATIONS FOR ENERGY SAVINGS USING HTM
Speaker:
Dimitra Papagiannopoulou, University of Massachusetts Lowell, US
Authors:
Dimitra Papagiannopoulou1, Sungseob Whang2, Tali Moreshet3 and Iris Bahar4
1University of Massachusetts Lowell, US; 2CloudHealth Technologies, US; 3Boston University, US; 4Brown University, US
Abstract
Energy consumption is the dominant factor in many computing systems. Voltage scaling is a widely used technique to lower energy consumption, which exploits supply voltage margins to ensure reliable circuit operation. Aggressive voltage scaling will slow signal propagation; without coherent frequency relaxation, timing violations may be generated. Hardware Transactional Memory (HTM) offers an error recovery mechanism that allows reliable execution and power savings with modest overhead. We propose IgnoreTM, an adaptive error management framework, that tolerates (i.e., opportunistically ignores) timing violations, allowing for more aggressive voltage scaling. Our experimental results show that IgnoreTM allows up to 47% total energy savings with negligible impact on runtime.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30End of session
Lunch Break in Lunch Area





10.6 Self-adaptive resource management

Date: Thursday, March 28, 2019
Time: 11:00 - 12:30
Location / Room: Room 6

Chair:
Geoff Merrett, University of Southampton, GB, Contact Geoff Merrett

Co-Chair:
Andy Pimentel, University of Amsterdam, NL, Contact Andy Pimentel

This session covers run-time resource management techniques for multicores, edge computing devices and storage systems. Proposed techniques are based on either machine learning or heuristics.

TimeLabelPresentation Title
Authors
11:0010.6.1A RUNTIME RESOURCE MANAGEMENT POLICY FOR OPENCL WORKLOADS ON HETEROGENEOUS MULTICORES
Speaker:
Antonio Miele, Politecnico di Milano, IT
Authors:
Daniele Angioletti, Francesco Bertani, Cristiana Bolchini, Francesco Cerizzi and Antonio Miele, Politecnico di Milano, IT
Abstract
Runtime workload distribution and resource tuning for heterogeneous multicores running multiple OpenCL applications is still an open question. This paper proposes an adaptive policy capable of identifying an optimal working point for an unknown multiprogrammed OpenCL workload without using any design-time application profiling or analysis. Compared against a design-time optimization strategy, the approach proves effective in converging to a solution that guarantees the required performance while minimizing power consumption and maximum temperature; on average, it is only 0.085 W (5.15%) and 0.83°C (1.47%) worse than the static optimal solution.

Download Paper (PDF; Only available from the DATE venue WiFi)
11:3010.6.2DMRM: DISTRIBUTED MARKET-BASED RESOURCE MANAGEMENT OF EDGE COMPUTING SYSTEMS
Speaker:
Dimosthenis Masouros, National Technical University of Athens, GR
Authors:
Manolis Katsaragakis1, Dimosthenis Masouros1, Vasileios Tsoutsouras1, Farzad Samie2, Lars Bauer2, Joerg Henkel2 and Dimitrios Soudris3
1National Technical University of Athens, GR; 2Karlsruhe Institute of Technology, DE; 3Democritus University of Thrace, GR
Abstract
Resource management is a key technique for efficiently operating devices in the Internet of Things (IoT). In this paper, we propose DMRM, a new algorithm based on economic and pricing models for dynamic resource management of IoT networks under CPU, memory, bandwidth and latency constraints. We use a supply-and-demand model, smart data pricing and perceived-value pricing, implementing a marketplace where IoT devices and gateways buy and sell the computing and communication resources necessary for task execution. Our new market-based algorithm is compared to relevant approaches, showing that it not only reaches near-optimal results but that its scalable, distributed nature also leads to execution requirements three orders of magnitude lower than centralized approaches.
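The supply-and-demand mechanism can be sketched as a simple price-adjustment loop (a classic tâtonnement toy under our own assumptions — demand inversely proportional to price, one resource, one gateway — not the DMRM protocol itself):

```python
def market_allocate(capacity, demands, price=1.0, lr=0.05, rounds=200):
    # Each device buys resource inversely proportional to price, capped
    # by its demand; the gateway raises the price while aggregate
    # demand exceeds supply, and lowers it otherwise.
    for _ in range(rounds):
        purchased = [min(d, d / price) for d in demands]
        excess = sum(purchased) - capacity
        price = max(0.01, price + lr * excess)
    return price, purchased

# Three devices asking for 18 units in total from a 10-unit gateway.
price, alloc = market_allocate(capacity=10.0, demands=[6.0, 8.0, 4.0])
```

The loop settles where total purchases equal the capacity (here at a price of about 1.8), which is the distributed equilibrium a pricing-based manager relies on instead of a central solver.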

Download Paper (PDF; Only available from the DATE venue WiFi)
12:0010.6.3GOAL-DRIVEN AUTONOMY FOR EFFICIENT ON-CHIP RESOURCE MANAGEMENT: TRANSLATING OBJECTIVES TO GOALS
Speaker:
Anil Kanduri, University of Turku, FI
Authors:
Elham Shamsa1, Anil Kanduri1, Amir M. Rahmani2, Pasi Liljeberg1, Axel Jantsch3 and Nikil Dutt4
1University of Turku, FI; 2University of California Irvine & TU Wien, US; 3Vienna University of Technology (TU Wien), AT; 4UC Irvine, US
Abstract
Run-time resource management of heterogeneous multi-core systems is challenging due to the limited energy budget that has to be allocated among diverse workloads in a dynamic environment. User interaction within these systems alters the performance requirements, which often conflict with concurrent applications' objectives and system constraints. Current resource management approaches focus on optimizing fixed objectives, ignoring the variation in system and application constraints at run-time. For efficient resource management, the system has to operate autonomously in complex environments. We present goal-driven autonomy, which allows systems to generate and prioritize their goals in response to environment changes. Experimental results on an Odroid XU3 show the effectiveness of this technique, which accounts for the dynamic environment, compared to existing fixed-objective solutions.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:1510.6.4SCRUB UNLEVELING: ACHIEVING HIGH DATA RELIABILITY AT LOW SCRUBBING COST
Speaker:
Tianming Jiang, Huazhong University of Science and Technology, CN
Authors:
Tianming Jiang1, Ping Huang2 and Ke Zhou1
1Huazhong University of Science and Technology, CN; 2Temple University, US
Abstract
Proactive error prediction using machine learning methods has been proposed to improve storage system reliability by increasing the scrubbing rate for drives with higher error rates. Unfortunately, the majority of works incur non-trivial scrubbing cost and ignore the periodic nature of scrubbing. In this paper, we aim to make prediction-guided scrubbing more suitable for practical use. In particular, we design a scrub unleveling technique that applies a lower scrubbing rate to healthy disks and a higher rate to disks subject to latent sector errors (LSEs). Moreover, a voting-based method is introduced to ensure prediction accuracy. Experimental results on a real-world field dataset demonstrate that our proposed approach can achieve lower scrubbing cost together with higher data reliability than traditional fixed-rate scrubbing methods. Compared with the state-of-the-art, our method can achieve the same level of Mean-Time-To-Detection (MTTD) with almost 32% less scrubbing.
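The cost side of unleveling is straightforward fleet arithmetic. The sketch below uses invented rates and fleet sizes (not the paper's field data) to show why concentrating fast scrubbing on the small risky fraction cuts total scrub work:

```python
def scrub_cost(n_disks, frac_risky, slow_rate, fast_rate):
    # Cost model: total scrub passes per period across the fleet,
    # with "risky" disks scrubbed at fast_rate and the rest at slow_rate.
    risky = n_disks * frac_risky
    healthy = n_disks - risky
    return healthy * slow_rate + risky * fast_rate

# Uniform scrubbing: everyone at rate 2.  Unleveled: healthy disks
# drop to rate 1, the 5% flagged as risky are scrubbed at rate 8.
uniform = scrub_cost(1000, frac_risky=0.05, slow_rate=2, fast_rate=2)
unlevel = scrub_cost(1000, frac_risky=0.05, slow_rate=1, fast_rate=8)
savings = 1 - unlevel / uniform   # fraction of scrub passes saved
```

Even with the risky disks scrubbed four times faster than before, the fleet-wide cost drops by roughly a third, because almost all disks are healthy; detection latency on risky disks improves at the same time.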

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30IP5-6, 66USING MACHINE LEARNING FOR QUALITY CONFIGURABLE APPROXIMATE COMPUTING
Speaker:
Mahmoud Masadeh, Concordia University, CA
Authors:
Mahmoud Masadeh, Osman Hasan and Sofiene Tahar, Department of Electrical and Computer Engineering, Concordia University, Montreal, Quebec, CA
Abstract
Approximate computing (AC) is a nascent energy-efficient computing paradigm for error-resilient applications. However, the quality control of AC is quite challenging due to its input-dependent nature. Existing solutions fail to address fine-grained input-dependent controlled approximation. In this paper, we propose an input-aware machine learning based approach for the quality control of AC. For illustration purposes, we use 20 configurations of 8-bit approximate multipliers. We evaluate these designs for all combinations of possible input data. Then, we use machine learning algorithms to efficiently make predictive decisions for the quality control of the target approximate application, based on experimentally collected training data. The key benefits of the proposed approach include: (1) fine-grained input-dependent approximation, (2) no missed approximation opportunities, (3) no rollback recovery overhead, (4) applicable to any approximate computation with error-tolerant components, and (5) flexibility in adapting various error metrics.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:31IP5-7, 439PREDICTION-BASED TASK MIGRATION ON S-NUCA MANY-CORES
Speaker:
Martin Rapp, Karlsruhe Institute of Technology, DE
Authors:
Martin Rapp1, Anuj Pathania1, Tulika Mitra2 and Joerg Henkel1
1Karlsruhe Institute of Technology, DE; 2National University of Singapore, SG
Abstract
Performance of a task running on a many-core with distributed shared Last-Level Cache (LLC) strongly depends on two factors: the power budget needed to guarantee thermally safe operation and the LLC latency. The task's thread-to-core mapping determines both the factors. Arrival and departure of tasks on a many-core deployed in an open system can change its state significantly in terms of available cores and power budget. Task migrations can thereupon be used as a tool to keep the many-core operating at the peak performance. Furthermore, the relative impacts of power budget and LLC latency on a task's performance can change with its different execution phases mandating its migration on-the-fly. We propose the first run-time algorithm PCMig that increases the performance of a many-core with distributed shared LLC by migrating tasks based on their phases and the many-core's state. PCMig is based on a performance-prediction model that predicts the performance impact of migrations. PCMig results in up to 16% reduction in the average response time compared to the state-of-the-art.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30End of session
Lunch Break in Lunch Area





10.7 Architectures for emerging machine learning techniques

Date: Thursday, March 28, 2019
Time: 11:00 - 12:30
Location / Room: Room 7

Chair:
Sander Stuijk, Eindhoven University of Technology, NL, Contact Sander Stuijk

Co-Chair:
Marina Zapater, EPFL, CH, Contact Marina Zapater Sancho

The first paper presents a reinforcement learning approach to optimize accelerator parameters. The second paper showcases how a memory Trojan can be used to attack a neural network and lower its accuracy. The last two papers introduce hardware accelerators that reduce the energy consumption of neural networks.

TimeLabelPresentation Title
Authors
11:0010.7.1LEARNING TO INFER: RL-BASED SEARCH FOR DNN PRIMITIVE SELECTION ON HETEROGENEOUS EMBEDDED SYSTEMS
Speaker:
Miguel de Prado, HES-SO/ETHZ, CH
Authors:
Miguel de Prado1, Nuria Pazos2 and Luca Benini1
1Integrated Systems Laboratory, ETH Zurich & Haute Ecole Arc Ingénierie, HES-SO, CH; 2Haute Ecole Arc Ingénierie, HES-SO, CH
Abstract
Deep Learning is increasingly being adopted by industry for computer vision applications running on embedded devices. While Convolutional Neural Networks' accuracy has reached a mature and remarkable state, inference latency and throughput are a major concern, especially when targeting low-cost and low-power embedded platforms. CNN inference latency may become a bottleneck for Deep Learning adoption by industry, as it is a crucial specification for many real-time processes. Furthermore, deployment of CNNs across heterogeneous platforms presents major compatibility issues due to vendor-specific technology and acceleration libraries. In this work, we present QS-DNN, a fully automatic search based on Reinforcement Learning which, combined with an inference engine optimizer, efficiently explores the design space and empirically finds the optimal combinations of libraries and primitives to speed up the inference of CNNs on heterogeneous embedded devices. We show that an optimized combination can achieve a 45x speedup in inference latency on CPU compared to a dependency-free baseline, and 2x on average on GPGPU compared to the best vendor library. Further, we demonstrate that the quality of results and the time-to-solution are much better than with random search, achieving up to 15x better results for a short-time search.
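The search idea — learn, per layer, which backend library is fastest by measuring on the device — can be sketched with an epsilon-greedy bandit per layer. Everything below (layer names, backend names, the latency table) is a hypothetical stand-in for on-device timing, not the QS-DNN engine:

```python
import random

def search(layers, backends, measure, episodes=300, eps=0.3, seed=0):
    # q[layer][backend]: running latency estimate; epsilon-greedy
    # exploration, exponential-moving-average updates.
    rng = random.Random(seed)
    q = {l: {b: float("inf") for b in backends} for l in layers}
    for _ in range(episodes):
        for l in layers:
            b = (rng.choice(backends) if rng.random() < eps
                 else min(q[l], key=q[l].get))
            lat = measure(l, b)
            old = q[l][b]
            q[l][b] = lat if old == float("inf") else 0.9 * old + 0.1 * lat
    return {l: min(q[l], key=q[l].get) for l in layers}

# Hypothetical noisy latency table standing in for device measurements.
TRUE = {("conv1", "lib_a"): 5.0, ("conv1", "lib_b"): 2.0,
        ("fc1", "lib_a"): 1.0, ("fc1", "lib_b"): 3.0}
noise = random.Random(1)
measure = lambda l, b: TRUE[(l, b)] * noise.uniform(0.9, 1.1)
best = search(["conv1", "fc1"], ["lib_a", "lib_b"], measure)
```

The real system additionally models inter-layer data-layout transformation costs, which is what makes the per-layer choices interact and the RL formulation worthwhile.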

Download Paper (PDF; Only available from the DATE venue WiFi)
11:3010.7.2MEMORY TROJAN ATTACK ON NEURAL NETWORK ACCELERATORS
Speaker:
Xing Hu, University of California, Santa Barbara, US
Authors:
Yang Zhao1, Xing Hu1, Shuangchen Li1, Jing Ye2, Lei Deng1, Yu Ji3, Jianyu Xu4, Dong Wu3 and Yuan Xie1
1University of California, Santa Barbara, US; 2State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, CN; 3Tsinghua University, University of California, Santa Barbara, CN; 4Tsinghua University, CN
Abstract
Neural network accelerators are widely deployed in application systems for computer vision, speech recognition, and machine translation. Due to the ubiquitous deployment of these systems, a strong incentive arises for adversaries to attack such artificial intelligence (AI) systems. The Trojan is one of the most important attack models in the hardware security domain. Hardware Trojans are malicious modifications to original ICs inserted by adversaries, which lead the system to malfunction after being triggered. The globalization of the semiconductor industry gives adversaries a chance to conduct hardware Trojan attacks. Previous works design Neural Network (NN) Trojans with access to the model, toolchain, and hardware platform; this threat model is impractical, which hinders real-world adoption of such attacks. In this work, we propose a memory Trojan methodology that requires neither toolchain manipulation nor model parameter information. We first leverage memory access patterns to identify the input image data. We then propose a Trojan triggering method based on a dedicated input image rather than circuit events, which offers better controllability. The triggering mechanism works well even with environmental noise and preprocessing applied to the original images. Finally, we implement and verify the effectiveness of an accuracy degradation attack.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:0010.7.3DEEP POSITRON: A DEEP NEURAL NETWORK USING THE POSIT NUMBER SYSTEM
Speaker:
Zachariah Carmichael, Rochester Institute of Technology, US
Authors:
Zachariah Carmichael1, Hamed F. Langroudi1, Char Khazanov1, Jeffrey Lillie1, John L. Gustafson2 and Dhireesha Kudithipudi1
1Rochester Institute of Technology, US; 2National University of Singapore, SG
Abstract
The recent surge of interest in Deep Neural Networks (DNNs) has led to increasingly complex networks that tax computational and memory resources. Many DNNs presently use 16-bit or 32-bit floating point operations. Significant performance and power gains can be obtained when DNN accelerators support low-precision numerical formats. Despite considerable research, there is still a knowledge gap on how low-precision operations can be realized for both DNN training and inference. In this work, we propose a DNN architecture, Deep Positron, with the posit numerical format operating successfully at 8 bits or fewer for inference. We propose a precision-adaptable FPGA soft core for exact multiply-and-accumulate, enabling a uniform comparison across three numerical formats: fixed-point, floating-point, and posit. Preliminary results demonstrate that 8-bit posit has better accuracy than 8-bit fixed- or floating-point for three different low-dimensional datasets. Moreover, the accuracy is comparable to 32-bit floating-point on a Xilinx Virtex-7 FPGA device. The trade-offs between DNN performance and hardware resources, i.e. latency, power, and resource utilization, show that posit outperforms the other formats in accuracy and latency at 8 bits and below.
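The posit format's tapered precision comes from its run-length-encoded regime field. A minimal decoder makes the sign/regime/exponent/fraction layout concrete (an illustrative sketch following the standard posit field definitions; real hardware also handles rounding and arithmetic, which are omitted here):

```python
def decode_posit(bits, n=8, es=0):
    # Minimal n-bit posit decoder: sign, regime run, es exponent
    # bits, then fraction.  Sketch for illustration only.
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float("nan")                  # NaR ("not a real")
    sign = (bits >> (n - 1)) & 1
    if sign:
        bits = (-bits) & ((1 << n) - 1)      # two's-complement negation
    s = format(bits & ((1 << (n - 1)) - 1), f"0{n - 1}b")
    run = len(s) - len(s.lstrip(s[0]))       # regime run length
    k = run - 1 if s[0] == "1" else -run
    rest = s[run + 1:]                       # skip the terminating bit
    exp = int(rest[:es], 2) if es and rest[:es] else 0
    frac_bits = rest[es:]
    frac = int(frac_bits, 2) / (1 << len(frac_bits)) if frac_bits else 0.0
    useed = 1 << (1 << es)                   # useed = 2^(2^es)
    value = (useed ** k) * (2 ** exp) * (1.0 + frac)
    return -value if sign else value
```

With es = 0, values near 1.0 get the most fraction bits while the regime stretches the dynamic range, which is why an 8-bit posit can compete with wider fixed- and floating-point formats on normalized DNN activations.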

Download Paper (PDF; Only available from the DATE venue WiFi)
12:1510.7.4LEARNING TO SKIP INEFFECTUAL RECURRENT COMPUTATIONS IN LSTMS
Speaker:
Zhengyun Ji, McGill University, CA
Authors:
Arash Ardakani, Zhengyun Ji and Warren Gross, McGill University, CA
Abstract
Long Short-Term Memory (LSTM) is a special class of recurrent neural network, which has shown remarkable successes in processing sequential data. The typical architecture of an LSTM involves a set of states and gates: the states retain information over arbitrary time intervals and the gates regulate the flow of information. Due to the recursive nature of LSTMs, they are computationally intensive to deploy on edge devices with limited hardware resources. To reduce the computational complexity of LSTMs, we first introduce a method that learns to retain only the important information in the states by pruning redundant information. We then show that our method can prune over 90% of information in the states without incurring any accuracy degradation over a set of temporal tasks. This observation suggests that a large fraction of the recurrent computations are ineffectual and can be avoided to speed up the process during the inference as they involve noncontributory multiplications/accumulations with zero-valued states. Finally, we introduce a custom hardware accelerator that can perform the recurrent computations using both sparse and dense states. Experimental measurements show that performing the computations using the sparse states speeds up the process and improves energy efficiency (GOPS/W) by up to 5.2x when compared to implementation results of the accelerator performing the computations using dense states.
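The "skip ineffectual computations" idea reduces to two steps: zero the unimportant state entries, then perform the matrix-vector product only over the surviving non-zeros. A pure-Python sketch (toy magnitude-based pruning of our own devising, standing in for the paper's learned retention):

```python
def prune_state(h, keep_ratio):
    # Keep only the largest-magnitude state entries; zero the rest.
    k = max(1, int(len(h) * keep_ratio))
    keep = set(sorted(range(len(h)), key=lambda i: abs(h[i]))[-k:])
    return [v if i in keep else 0.0 for i, v in enumerate(h)]

def matvec_skip_zeros(W, h):
    # Skip the ineffectual multiply-accumulates with zero-valued states.
    nz = [i for i, v in enumerate(h) if v != 0.0]
    return [sum(row[i] * h[i] for i in nz) for row in W]

h = [0.9, -0.05, 0.02, -1.2, 0.1, 0.0, 0.4, -0.03]
W = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
     [8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]]
h_sparse = prune_state(h, keep_ratio=0.25)   # keep 2 of 8 entries
full = [sum(row[i] * h_sparse[i] for i in range(len(h_sparse))) for row in W]
skipped = matvec_skip_zeros(W, h_sparse)     # same result, 2 MACs per row
```

At the paper's reported >90% state sparsity, each recurrent matrix-vector product needs under a tenth of the multiply-accumulates, which is what the custom accelerator exploits.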

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30IP5-8, 244DESIGN OF HARDWARE-FRIENDLY MEMORY ENHANCED NEURAL NETWORKS
Speaker:
Ann Franchesca Laguna, University of Notre Dame, US
Authors:
Ann Franchesca Laguna, Michael Niemier and X, Sharon Hu, University of Notre Dame, US
Abstract
Neural networks with external memories have been proven to minimize catastrophic forgetting, a major problem in applications such as lifelong and few-shot learning. However, such memory-enhanced neural networks (MENNs) typically require a large number of floating-point cosine distance calculations to perform the necessary attentional operations, which greatly increases energy consumption and hardware cost. This paper investigates other distance metrics in such neural networks in order to achieve more efficient hardware implementations of MENNs. We propose using content addressable memories (CAMs) to accelerate and simplify attentional operations. We focus on reducing the bit precision and memory size (MxD), and on using alternative distance metrics such as L1, L2, and L∞ for the attentional mechanism computations of MENNs. Our hardware-friendly approach implements fixed-point L∞ distance calculations via ternary content addressable memories (TCAMs), and fixed-point L1 and L2 distance calculations on a general-purpose graphics processing unit (GPGPU); computing-in-memory (CIM) arrays might also be used. As a representative example, a 32-bit floating-point cosine-distance MENN with MD multiplications has a 99.06% accuracy for the Omniglot 5-way 5-shot classification task. Based on our approach with just 4-bit fixed-point precision, an L∞-L1 distance hardware accuracy of 90.35% can be achieved with just 16 TCAM lookups and 16D addition and subtraction operations. With 4-bit precision and an L∞-L2 distance, hardware classification accuracies of 96.00% are possible, requiring 16 TCAM lookups and 16D multiplication operations. Assuming the hardware memory has 512 entries, the number of multiplication operations is reduced by 32x versus the cosine distance approach.
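The distance substitution is easy to see in miniature: the attentional lookup is a nearest-neighbour search over memory entries, and the metric determines the hardware. An L∞ ball is exactly what a ternary CAM can match (per-dimension "within ± r" ranges). A toy sketch with invented 4-bit vectors:

```python
def linf(a, b):
    # TCAM-friendly distance: a ternary CAM can match per-dimension
    # ranges, which realises an L-infinity ball lookup in one cycle.
    return max(abs(x - y) for x, y in zip(a, b))

def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def nearest(query, memory, dist):
    # Attentional read: return the best-matching memory key.
    return min(memory, key=lambda key: dist(query, memory[key]))

# 4-bit fixed-point memory entries (invented toy vectors).
memory = {"cat": [3, 7, 1, 0], "dog": [12, 2, 9, 4], "car": [8, 8, 8, 8]}
query = [4, 6, 1, 1]
winner_linf = nearest(query, memory, linf)
winner_l1 = nearest(query, memory, l1)
```

When the cheap metrics agree with cosine distance often enough, the expensive M×D multiply array can be replaced by a handful of TCAM lookups, which is the paper's accuracy-versus-hardware trade-off.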

Download Paper (PDF; Only available from the DATE venue WiFi)
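The alternative metrics studied in the paper are easy to prototype in software before committing to TCAM or CIM hardware. A minimal sketch of the attentional nearest-neighbour lookup, assuming a simple flat memory of keys (function and variable names are ours, not the paper's):

```python
import numpy as np

def nearest_memory(query, memory, metric="linf"):
    """Index of the stored key closest to `query` under the chosen metric.

    memory: (M, D) array of keys, query: (D,) vector. L1/L2/L-infinity are
    the hardware-friendly metrics from the paper; cosine is the baseline.
    """
    diff = memory - query
    if metric == "l1":
        d = np.abs(diff).sum(axis=1)      # additions and subtractions only
    elif metric == "l2":
        d = (diff ** 2).sum(axis=1)       # squared L2: same argmin as L2
    elif metric == "linf":
        d = np.abs(diff).max(axis=1)      # max-of-abs: a TCAM-style match
    elif metric == "cosine":
        norms = np.linalg.norm(memory, axis=1) * np.linalg.norm(query)
        d = -(memory @ query) / (norms + 1e-12)
    else:
        raise ValueError(metric)
    return int(np.argmin(d))
```

On a well-separated memory all four metrics agree; the interesting cases in the paper are where low-precision L∞/L1/L2 lookups diverge from full-precision cosine.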
12:31IP5-9, 107ENERGY-EFFICIENT INFERENCE ACCELERATOR FOR MEMORY-AUGMENTED NEURAL NETWORKS ON AN FPGA
Speaker:
Seongsik Park, Seoul National University, KR
Authors:
Seongsik Park, Jaehee Jang, Seijoon Kim and Sungroh Yoon, Seoul National University, KR
Abstract
Memory-augmented neural networks (MANNs) are designed for question-answering tasks. It is difficult to run a MANN effectively on accelerators designed for other neural networks (NNs), in particular on mobile devices, because MANNs require recurrent data paths and various types of operations related to external memory access. We implement an accelerator for MANNs on a field-programmable gate array (FPGA) based on a data flow architecture. Inference times are also reduced by inference thresholding, which is a data-based maximum inner-product search specialized for natural language tasks. Measurements on the bAbI data show that the energy efficiency of the accelerator (FLOPS/kJ) was higher than that of an NVIDIA TITAN V GPU by a factor of about 125, increasing to 140 with inference thresholding.

Download Paper (PDF; Only available from the DATE venue WiFi)
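The paper's "inference thresholding" is a data-driven early exit for the maximum inner-product search over external memory. A rough software sketch of the idea (here the threshold is a plain parameter; in the paper it is derived from the data):

```python
import numpy as np

def thresholded_mips(query, keys, threshold):
    """Maximum inner-product search with an early exit.

    Scans the stored keys in order and returns the first index whose
    inner product with `query` clears `threshold`; falls back to the
    exact argmax if none does. A crude stand-in for the paper's
    data-derived inference thresholding.
    """
    best_i, best_s = 0, -np.inf
    for i, k in enumerate(keys):
        s = float(query @ k)
        if s >= threshold:
            return i            # early exit: remaining keys are skipped
        if s > best_s:
            best_i, best_s = i, s
    return best_i
```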
12:32IP5-10, 345HDCLUSTER: AN ACCURATE CLUSTERING USING BRAIN-INSPIRED HIGH-DIMENSIONAL COMPUTING
Speaker:
Mohsen Imani, University of California, San Diego, US
Authors:
Mohsen Imani, Yeseong Kim, Thomas Worley, Saransh Gupta and Tajana Rosing, University of California San Diego, US
Abstract
The Internet of Things has increased the rate of data generation. Clustering is one of the most important tasks in this domain for finding the latent correlation between data. However, performing today's clustering tasks is often inefficient due to the data movement cost between cores and memory. We propose HDCluster, a brain-inspired unsupervised learning algorithm which clusters input data in a high-dimensional space by fully mapping and processing in memory. Instead of clustering input data in either fixed-point or floating-point representation, HDCluster maps data to vectors with dimension in the thousands, called hypervectors, to cluster them. Our evaluation shows that HDCluster provides better clustering quality for tasks that involve a large amount of data while providing potential for acceleration in a memory-centric architecture.

Download Paper (PDF; Only available from the DATE venue WiFi)
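The core HDC idea, mapping inputs to very wide bipolar hypervectors and clustering by similarity in that space, can be sketched with a random-projection encoder. The paper's encoder and its in-memory execution differ in detail; this only illustrates the principle, and all names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 2000  # hypervector dimensionality ("in the thousands")

def encode(x, basis):
    """Map a real feature vector to a bipolar hypervector via random projection."""
    return np.sign(basis @ x)

def hd_cluster(X, k, basis, iters=10):
    """k-means-style clustering in hypervector space.

    Similarity of bipolar vectors is a dot product (equivalently Hamming
    distance), which is what makes a memory-centric implementation attractive.
    """
    H = np.array([encode(x, basis) for x in X])
    centers = [H[0]]
    while len(centers) < k:               # farthest-point initialisation
        sims = H @ np.array(centers).T
        centers.append(H[sims.max(axis=1).argmin()])
    centers = np.array(centers)
    for _ in range(iters):
        labels = (H @ centers.T).argmax(axis=1)
        for j in range(k):
            members = H[labels == j]
            if len(members):
                centers[j] = np.sign(members.sum(axis=0) + 0.5)
    return labels
```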
12:30End of session
Lunch Break in Lunch Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Breaks (Lunch Area)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the Lunch Area to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 26, 2019

Wednesday, March 27, 2019

Thursday, March 28, 2019


10.8 Europe digitization: Smart Anything Everywhere Initiative & FED4SAE, open calls and success stories

Date: Thursday, March 28, 2019
Time: 11:00 - 12:30
Location / Room: Exhibition Theatre

Organiser:
Isabelle Dor, COMMISSARIAT A L ENERGIE ATOMIQUE ET AUX ENERGIES ALTERNATIVES, FR, Contact Isabelle Dor

Chair:
Marcello Coppola, STMicroelectronics, FR, Contact Marcello Coppola

The goal of the Smart Anything Everywhere (SAE) initiative is to support SMEs, start-ups and mid-caps in enhancing their products and services through the inclusion of innovative digital technologies. SAE H2020 projects provide one-stop shops that help companies become more competitive through the adoption of the latest digital technologies. An SME-tailored service is now available, providing access to R&D and digital competences, training to develop technical skills, business management support and networking opportunities. Cascade funding is available through the SAE open calls, and also through I4MS, which focuses on manufacturing.

The FED4SAE project aims at bringing innovative Cyber-Physical System (CPS) technologies to businesses of any sector and any size. The presentation of awarded projects illustrates the FED4SAE one-stop shop for accelerating CPS developments, combining i) access to leading-edge CPS platforms, advanced technologies and testbeds from industrials and R&D centres, ii) technical coaching from domain experts, iii) innovation management support, iv) up to €60k in financial support to innovative companies plus access to further VC funding, and v) access to potential users and suppliers across value chains throughout Europe. This session will confront the viewpoints of large industrials, RTOs and SMEs and their targeted objectives and impact.

TimeLabelPresentation Title
Authors
11:0010.8.1SAE, AN EXAMPLE OF EC INITIATIVE TO SUPPORT EUROPE DIGITIZATION
Speaker:
Isabelle Dor, COMMISSARIAT A L ENERGIE ATOMIQUE ET AUX ENERGIES ALTERNATIVES, FR
Abstract

The goal of the Smart Anything Everywhere (SAE) initiative is to support SMEs, start-ups and mid-caps in enhancing their products and services through the inclusion of innovative digital technologies. SAE H2020 projects provide one-stop shops that help companies become more competitive through the adoption of the latest digital technologies. An SME-tailored service is now available, providing access to R&D and digital competences, training to develop technical skills, business management support and networking opportunities. Cascade funding is available through the SAE open calls, and also through I4MS, which focuses on manufacturing.

11:1510.8.2SME, RTO, INDUSTRIAL: HOW SAE SUPPORTS THE COLLABORATION - PART 1
Speaker:
Marcello Coppola, STMicroelectronics, FR
Abstract

The FED4SAE project aims at bringing innovative Cyber-Physical System (CPS) technologies to businesses of any sector and any size. The presentation of awarded projects illustrates the FED4SAE one-stop shop for accelerating CPS developments, combining i) access to leading-edge CPS platforms, advanced technologies and testbeds from industrials and R&D centres, ii) technical coaching from domain experts, iii) innovation management support, iv) up to €60k in financial support to innovative companies plus access to further VC funding, and v) access to potential users and suppliers across value chains throughout Europe. This session will confront the viewpoints of large industrials, RTOs and SMEs and their targeted objectives and impact.

11:3010.8.3SME, RTO, INDUSTRIAL: HOW SAE SUPPORTS THE COLLABORATION - PART 2
Speaker:
Michael Setton, Digital Catapult, GB
11:4510.8.4SME, RTO, INDUSTRIAL: HOW SAE SUPPORTS THE COLLABORATION - PART 3
Speaker:
Rosanna Zaza, Alitec Srl, IT
12:0010.8.5SME, RTO, INDUSTRIAL: HOW SAE SUPPORTS THE COLLABORATION - PART 4
Speaker:
Giovanni Gherardi, Energica Motor Company, IT
12:30End of session
Lunch Break in Lunch Area





11.0 LUNCH TIME KEYNOTE SESSION

Date: Thursday, March 28, 2019
Time: 13:20 - 13:50
Location / Room: Room 1

Chair:
Marc Geilen, Eindhoven University of Technology, NL, Contact Marc Geilen

Co-Chair:
Sander Stuijk, Eindhoven University of Technology, NL, Contact Sander Stuijk

TimeLabelPresentation Title
Authors
13:2011.0.1A FUNDAMENTAL LOOK AT MODELS AND INTELLIGENCE
Author:
Edward Lee, UC Berkeley, US
Abstract
Models are central to building confidence in complex software systems. Type systems, interface theories, formal semantics, concurrent models of computation, component models, and ontologies all augment classical software engineering techniques such as object-oriented design to catch errors and to make software more modular and composable. Every model lives within a modeling framework, ideally giving semantics to the model, and many modeling frameworks have been developed that enable rigorous analysis and proof of properties. But every such modeling framework is an imperfect mirror of reality. A computer system operating in the physical world may or may not accurately reflect behaviors predicted by a model, and the model may not reflect behaviors that are critical to correct operation of the software. Software in a cyber-physical system, for example, has timing properties that are rarely represented in formal models. As artificial intelligence gets more widely used, the problem gets worse, with predictability and explainability seemingly evaporating. In this talk, I will examine the limitations in the use of models. I will show that two very different classes of models are used in practice, classes that I call "scientific models" and "engineering models." These two classes have complementary properties, and many misuses of models stem from confusion about which class is being used. Scientific models of intelligent systems are very different from engineering models.
13:50End of session
15:30Coffee Break in Exhibition Area





11.1 Special Day on "Model-Based Design of Intelligent Systems" Session: MBD of Cyber-Physical Systems

Date: Thursday, March 28, 2019
Time: 14:00 - 15:30
Location / Room: Room 1

Chair:
Eugenio Villar, Universidad de Cantabria, ES, Contact Eugenio Villar

Co-Chair:
Marc Geilen, Eindhoven University of Technology, NL, Contact Marc Geilen

MBD of Cyber-Physical Systems

TimeLabelPresentation Title
Authors
14:0011.1.1SPECIFYING AND EVALUATING QUALITY METRICS FOR VISION-BASED PERCEPTION SYSTEMS
Speaker:
Jyotirmoy Deshmukh, Arizona State University, US
Authors:
Adel Dokhanchi, Aniruddh Puranic, Xin Qin, Anand Balakrishnan, Heni Ben Amor, Georgios Fainekos and Jyotirmoy V. Deshmukh, University of Southern California, US
Abstract
Robust perception algorithms are a vital ingredient for autonomous systems such as self-driving vehicles. Checking the correctness of perception algorithms such as those based on deep convolutional neural networks (CNN) is a formidable challenge problem. In this paper, we suggest the use of Timed Quality Temporal Logic (TQTL) as a formal language to express desirable spatio-temporal properties of a perception algorithm processing a video. While perception algorithms are traditionally tested by comparing their performance to ground truth labels, we show how TQTL can be a useful tool to determine quality of perception, and offers an alternative metric that can give useful information, even in the absence of ground truth labels. We demonstrate TQTL monitoring on two popular CNNs: YOLO and SqueezeDet, and give a comparative study of the results obtained for each architecture.

Download Paper (PDF; Only available from the DATE venue WiFi)
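As a flavour of what such a spatio-temporal quality requirement looks like operationally, here is a toy offline monitor (not the TQTL semantics or tooling from the paper) for the property "every confidently detected object is detected again within k frames":

```python
def persists(frames, obj_id, t, k):
    """True iff obj_id is detected again in at least one of frames t+1..t+k."""
    return any(obj_id in f for f in frames[t + 1:t + 1 + k])

def monitor(frames, k=2, conf=0.7):
    """Report (frame, object) pairs that violate the persistence property.

    frames: list of dicts {object_id: confidence}. A toy stand-in for a
    TQTL-style requirement: globally, an object detected with confidence
    above `conf` must be detected again within the next k frames.
    """
    violations = []
    for t in range(len(frames) - k):
        for obj, c in frames[t].items():
            if c > conf and not persists(frames, obj, t, k):
                violations.append((t, obj))
    return violations
```

Note that, as in the paper's argument, no ground truth labels are needed: the monitor only inspects the detector's own output stream.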
14:3011.1.2MODELING CROSS-LAYER INTERACTIONS FOR DESIGNING CERTIFIABLE CYBER-PHYSICAL SYSTEMS
Speaker:
Samarjit Chakraborty, TUM, DE
Authors:
Samarjit Chakraborty, James H. Anderson, Martin Becker, Helmut Graeb, Samiran Halder, Ravindra Metta, Lothar Thiele, Stavros Tripakis and Anand Yeolekar, TUM, DE
Abstract
A central challenge in designing embedded control systems or cyber-physical systems (CPS) is that of translating high-level models of control algorithms into efficient implementations, while ensuring that model-level semantics are preserved. While a large body of techniques for designing provably correct control strategies exist in the control theory literature, when it comes to transforming mathematical descriptions of these strategies to an efficient implementation, the available means are surprisingly ad hoc in nature. Among other reasons, this is because of (i) implementation platform details not sufficiently being accounted for in controller models, (ii) side effects introduced in the code generation process, (iii) various compiler optimizations whose impact on the dynamics of the plant being controlled not being properly understood, (iv) the presence of analog components on the implementation platform whose behavior is difficult to model, (v) computation and communication delays that exist in an implementation but were not accounted for in the model, and (vi) also the effects of image/video processing whose accuracy and timing behavior are difficult to model. As we move towards designing autonomous systems, these issues become biting problems on the path to certification, and striking a balance between performance and certification. In this position paper, we discuss some of these challenges - that we formulate as the need for modeling the interactions between various implementation layers in a CPS - and potential research directions to address them.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:0011.1.3STEPS TOWARD VERIFIED PROGRAMMING OF EMBEDDED COMPUTING DEVICES
Speaker:
Jean-Pierre Talpin, INRIA, FR
Authors:
Jean-Pierre Talpin1, Jean-Joseph Marty1, Shravan Narayan2, Deian Stefan2 and Rajesh Gupta2
1INRIA, FR; 2University of California at San Diego, US
Abstract
We propose a type-driven approach to building verified safe and correct IoT applications. Today's IoT applications are plagued with bugs that can cause physical damage. This is largely because developers account for physical constraints using ad-hoc techniques. Accounting for such constraints in a more principled fashion demands reasoning about the composition of all the software and hardware components of the application. Our proposed framework takes a step in this direction by (1) using refinement types to make physical constraints explicit and (2) imposing an event-driven programming discipline to simplify the reasoning of system-wide properties to that of an event queue. In taking this approach, our framework makes it possible for developers to build verified IoT applications by making it a type error for code to violate physical constraints.

Download Paper (PDF; Only available from the DATE venue WiFi)
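The paper's mechanism is static refinement typing; plain Python can only mimic the discipline with a runtime-checked wrapper type, but the sketch below shows the intent: code cannot hand an out-of-range value to an actuator. The names and ranges are hypothetical, not from the paper:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DutyCycle:
    """A value refined to the physical range [0.0, 1.0].

    The paper uses static refinement types; this runtime-checked wrapper
    only imitates the idea in plain Python.
    """
    value: float

    def __post_init__(self):
        if not 0.0 <= self.value <= 1.0:
            raise ValueError(f"duty cycle out of range: {self.value}")

def set_motor(duty: DutyCycle) -> float:
    """Hypothetical actuator call: only accepts an already-validated value."""
    return duty.value * 100.0   # e.g. a percentage sent to a PWM driver
```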
15:30End of session
Coffee Break in Exhibition Area





11.2 Novel techniques in optimization and high-level modeling of mixed-signal circuits

Date: Thursday, March 28, 2019
Time: 14:00 - 15:30
Location / Room: Room 2

Chair:
Francisco V. Fernandez, IMSE, ES, Contact Francisco V. Fernandez

Co-Chair:
Mark Po-Hung Lin, National Chung Cheng University, TW, Contact Mark Po-Hung Lin

New techniques are presented for automated behavioral model generation and efficient numerical-symbolic simulation of analog/mixed-signal circuits. A Bayesian optimization approach for efficient analog circuit synthesis is also presented.

TimeLabelPresentation Title
Authors
14:0011.2.1BEHAVIORAL MODELING OF TRANSISTOR-LEVEL CIRCUITS USING AUTOMATIC ABSTRACTION TO HYBRID AUTOMATA
Speaker:
Ahmad Tarraf, Goethe University Frankfurt, DE
Authors:
Ahmad Tarraf and Lars Hedrich, Goethe University Frankfurt, DE
Abstract
Accurate abstracted behavioral modeling of analog circuits is still an open problem, especially when the abstraction process is automated. In this paper we present an automated abstraction technique for transistor level circuits with full SPICE accuracy alongside a significant simulation speed-up. The methodology computes a hybrid automaton which is transformed into a behavioral model in Verilog-A. The resulting hybrid automaton exhibits linear behavior as well as the technology-dependent nonlinear (e.g. limiting) behavior. The accuracy and speed-up of the methodology are evaluated on several transistor level circuits ranging from simple operational amplifiers up to a complex industrial OTA-based Gm/C filter. Finally, we formally verify the equivalence between the generated model and the original circuit.

Download Paper (PDF; Only available from the DATE venue WiFi)
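A hybrid automaton of the kind the paper extracts combines per-mode continuous dynamics with guarded mode transitions, e.g. a one-pole amplifier with output limiting. A minimal forward-Euler sketch of such a two-mode model (parameters are illustrative, not taken from the paper):

```python
def simulate(vin, gain=10.0, tau=1e-3, vmax=1.0, dt=1e-5):
    """Two-mode hybrid automaton for a limiting amplifier.

    Mode 'linear': first-order dynamics toward gain*vin.
    Mode 'saturated': output pinned at +/-vmax until the drive
    re-enters the linear range.
    """
    v, mode, trace = 0.0, "linear", []
    for u in vin:
        target = gain * u
        if mode == "linear":
            v += dt / tau * (target - v)          # linear-mode dynamics
            if abs(v) >= vmax:                    # guard: enter saturation
                v = vmax if v > 0 else -vmax
                mode = "saturated"
        elif abs(target) < vmax:                  # guard: leave saturation
            mode = "linear"
        trace.append((mode, v))
    return trace
```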
14:3011.2.2NUBOLIC SIMULATION OF AMS SYSTEMS WITH DATA FLOW AND DISCRETE EVENT MODELS
Speaker:
Carna Zivkovic, University of Kaiserslautern, DE
Authors:
Carna Zivkovic and Christoph Grimm, University of Kaiserslautern, DE
Abstract
This paper deals with the performance verification of analog/mixed-signal (AMS) systems by symbolic simulation. The approach is to piggyback the symbolic simulation via code-instrumentation on the existing, numeric SystemC AMS simulator. This permits the combination of symbolic and numeric simulation (``nubolic simulation''). The particular focus in the paper is the handling of the symbolic discrete-event process activations. This permits the symbolic simulation of digital parts of AMS systems modeled by discrete event processes. The approach is demonstrated by the symbolic simulation of a dual-charge-pump PLL of an IEEE 802.15.4 RF transceiver that includes an asynchronous digital counter as a frequency divider.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:0011.2.3BAYESIAN OPTIMIZATION APPROACH FOR ANALOG CIRCUIT SYNTHESIS USING NEURAL NETWORK
Speaker:
Shuhan Zhang, Fudan University, CN
Authors:
Shuhan Zhang, Wenlong Lv, Fan Yang, Changhao Yan, Dian Zhou and Xuan Zeng, Fudan University, CN
Abstract
Bayesian optimization with Gaussian process as surrogate model has been successfully applied to analog circuit synthesis. In the traditional Gaussian process regression model, the kernel functions are defined explicitly. The computational complexity of training is O(N^3), and the computational complexity of prediction is O(N^2), where N is the number of training data. The Gaussian process model can also be derived from a weight space view, where the original data are mapped to feature space, and the kernel function is defined as the inner product of nonlinear features. In this paper, we propose a Bayesian optimization approach for analog circuit synthesis using neural network. We use a deep neural network to extract good feature representations, and then define the Gaussian process using the extracted features. Model averaging is applied to improve the quality of uncertainty prediction. Compared to Gaussian process models with explicitly defined kernel functions, the neural-network-based Gaussian process model can automatically learn a kernel function from data, which makes it possible to provide more accurate predictions and thus accelerate the follow-up optimization procedure. Also, the neural-network-based model has O(N) training time and constant prediction time. The efficiency of the proposed method has been verified on two real-world analog circuits.

Download Paper (PDF; Only available from the DATE venue WiFi)
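The complexity claim is the key point: replacing an explicit-kernel GP with Bayesian linear regression on learned features makes training linear in N and prediction constant-time in N. A sketch with a fixed polynomial feature map standing in for the paper's deep network:

```python
import numpy as np

def features(x):
    """Stand-in for the learned NN feature map phi(x) (here: fixed basis)."""
    return np.array([1.0, x, x * x])

def fit(xs, ys, noise=0.1, prior=1.0):
    """Bayesian linear regression on phi(x): O(N) in the training set size."""
    Phi = np.array([features(x) for x in xs])
    A = Phi.T @ Phi / noise**2 + np.eye(Phi.shape[1]) / prior**2
    w_cov = np.linalg.inv(A)                   # posterior weight covariance
    w_mean = w_cov @ Phi.T @ ys / noise**2     # posterior weight mean
    return w_mean, w_cov

def predict(x, w_mean, w_cov, noise=0.1):
    """Predictive mean and variance: constant time w.r.t. N."""
    phi = features(x)
    return float(phi @ w_mean), float(phi @ w_cov @ phi + noise**2)
```

The predictive variance is what a Bayesian optimization loop would feed into its acquisition function; the paper additionally averages over models to improve that uncertainty estimate.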
15:30IP5-11, 959FINDING ALL DC OPERATING POINTS USING INTERVAL-ARITHMETIC BASED VERIFICATION ALGORITHMS
Speaker:
Itrat A. Akhter, University of British Columbia, CA
Authors:
Itrat Akhter, Justin Reiher and Mark Greenstreet, University of British Columbia, CA
Abstract
This paper applies interval-arithmetic based verification algorithms to circuit verification problems. In particular, we use Krawczyk's operator to find all DC operating points of CMOS circuits. We present what we believe to be the first, completely automatic verification of the Rambus ring-oscillator start-up problem. Comparisons with the dReal and Z3 SMT shows large performance and scalability advantages to the interval verification approach. We provide an open-source implementation that supports state-of-the-art short-channel device models.

Download Paper (PDF; Only available from the DATE venue WiFi)
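The certification idea behind the paper can be seen in the scalar case: one Krawczyk iteration over an interval either proves or fails to prove that the interval holds exactly one root. A simplified sketch (the paper works with multivariate circuit equations and rigorous interval arithmetic; this scalar version bounds the derivative range from the endpoints, which assumes f' is monotone on the interval):

```python
def krawczyk_step(f, fprime, lo, hi):
    """One Krawczyk iteration on [lo, hi] for a scalar function f.

    If the returned interval K lies strictly inside [lo, hi], then
    [lo, hi] contains exactly one root of f -- the certification test
    used for DC operating points. The range of f' over [lo, hi] is
    taken from the endpoints (assumes f' monotone there).
    """
    m = 0.5 * (lo + hi)
    y = 1.0 / fprime(m)                      # Newton-like preconditioner
    r = [1.0 - y * fprime(lo), 1.0 - y * fprime(hi)]
    slope = max(abs(r[0]), abs(r[1]))        # bound on |1 - y f'(X)|
    half = 0.5 * (hi - lo)
    center = m - y * f(m)
    # K = center + (1 - y f'(X)) * [-half, half]  (symmetric interval product)
    return center - slope * half, center + slope * half
```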
15:30End of session
Coffee Break in Exhibition Area





11.3 Special Session: Rebooting our Computing Models

Date: Thursday, March 28, 2019
Time: 14:00 - 15:30
Location / Room: Room 3

Organiser:
Pierre-Emmanuel Gaillardon, University of Utah, US, Contact Pierre-Emmanuel Gaillardon

Chair:
Pierre-Emmanuel Gaillardon, University of Utah, US, Contact Pierre-Emmanuel Gaillardon

Co-Chair:
Ian O'Connor, Ecole Centrale of Lyon, FR, Contact Ian O'Connor

With the current slowing down of Moore's Law, standard Von Neumann computing architectures are struggling more than ever to sustain the increase of computing needs. While alternative design approaches, such as the use of optimized accelerators or advanced power management techniques, are successfully employed in contemporary designs, the trend keeps worsening due to the ever-increasing gap between on-chip and off-chip memory data rates. This trend, known as the Von Neumann bottleneck, not only limits system performance but nowadays also limits energy scaling. The quest towards more energy efficiency requires solutions that disrupt the Von Neumann paradigm. In this hot topic session, we elaborate on disruptive computing models far beyond the current Von Neumann computing model. Three talks will be provided: The first talk, from researchers from TU Delft, will cover quantum computing from the most basic physics to the practicalities of making a useful computer, including microarchitecture and programming languages. The second talk, from researchers from the University of Notre Dame, Georgia Tech and Penn State, will present new ways of computing using intrinsic oscillators. The third talk, from researchers from the University of California San Diego, will focus on memcomputing, where self-organizing logic gates can be employed to solve complex computing problems very efficiently. In addition to providing a clear perspective to the DATE community beyond the currently hot novel architectures, such as neuromorphic or in-memory computing, this session also serves to tighten the link between DATE and the EDA community at large through the mission and roles of the IEEE Rebooting Computing Initiative - https://rebootingcomputing.ieee.org - which endorses it. We believe it will stimulate EDA researchers to explore new ground for their activities.

TimeLabelPresentation Title
Authors
14:0011.3.1FROM QUBIT TO COMPUTER
Authors:
Koen Bertels and Carmen G. Almudever, TU Delft, NL
Abstract
This talk will present the basics of quantum computing using superposition and entanglement between quantum bits, called qubits, and what programming a quantum computer actually involves. It will also highlight the quantum programming language OpenQL, which has been developed in Delft, and demonstrate how one can execute and simulate a quantum algorithm on the QX simulation platform that was developed for that purpose. The talk will also cover the core challenges of this field, which have to do with the error rates and loss of a coherent state of the qubits. Attention will be given to the definition and development of a quantum computer architecture, of which two experimental versions have already been implemented. Finally, the speaker will highlight ways to make the architecture sufficiently generic to control two quite different quantum technologies, namely superconducting and semiconducting qubits, and explain the tight collaboration with Intel in this research.
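Superposition and entanglement, the two ingredients named in the abstract, can be seen in a four-amplitude statevector simulation (plain NumPy, not OpenQL or the QX platform the talk describes):

```python
import numpy as np

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)   # Hadamard gate
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

state = np.zeros(4)
state[0] = 1.0                          # start in |00>
state = np.kron(H, np.eye(2)) @ state   # superposition on the first qubit
state = CNOT @ state                    # entangle: Bell state (|00>+|11>)/sqrt(2)
probs = state ** 2                      # measurement probabilities
```

Measuring either qubit alone is a fair coin flip, yet the two outcomes are perfectly correlated: that correlation is the entanglement a real architecture must preserve against decoherence.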
14:3011.3.2INTRINSIC COMPUTING USING WEAKLY COUPLED OSCILLATORS
Author:
Nagadastagiri Reddy, Penn State, US
Abstract
The quest for alternative efficient computational systems is becoming critical as Moore's law scaling nears its end. In this talk, progress towards solving hard computational problems and common video processing applications using a weakly coupled oscillator system will be demonstrated. The talk will focus on synergistic advances in device fabrication, circuit design and system design in realizing such systems. These systems show promise of significant reduction in energy consumption compared to traditional CMOS designs.
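In software, the computational use of coupled oscillators can be imitated with Kuramoto-style phase dynamics: with repulsive coupling along graph edges, neighbouring oscillators settle in anti-phase, and thresholding the phases reads out a cut of the graph. A toy continuous model, not the physical oscillator hardware the talk concerns:

```python
import math, random

def settle_phases(adj, steps=2000, dt=0.05, seed=1):
    """Euler-integrate repulsively coupled oscillator phases.

    dtheta_i/dt = + sum_j sin(theta_i - theta_j) over neighbours j, i.e.
    gradient descent on sum over edges of cos(theta_i - theta_j), so each
    edge is pushed toward anti-phase.
    """
    rng = random.Random(seed)
    theta = [rng.uniform(0.0, 2.0 * math.pi) for _ in adj]
    for _ in range(steps):
        theta = [t + dt * sum(math.sin(t - theta[j]) for j in adj[i])
                 for i, t in enumerate(theta)]
    return theta

def cut_from_phases(theta, edges):
    """Partition nodes by phase relative to node 0; count crossing edges."""
    side = [0 if math.cos(t - theta[0]) > 0 else 1 for t in theta]
    return sum(side[i] != side[j] for i, j in edges)
```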
15:0011.3.3THE MEMCOMPUTING PARADIGM
Author:
Massimiliano Di Ventra, UCSD, US
Abstract
This talk will discuss how to employ memory (time non-locality) in a novel physics-based approach to computation, memcomputing, and its practical realization with self-organizing logic gates (SOLGs). SOLGs are terminal-agnostic gates that self-organize to always satisfy their logical proposition regardless of which terminal(s) the truth value is assigned to. As examples, the talk will highlight the polynomial-time solution of prime factorization, the search version of the subset-sum problem, and approximations to the Max-SAT beyond the inapproximability gap using polynomial resources. The talk will also show that these digital memcomputing machines compute via an instantonic phase, implying that they are robust against noise and disorder. The digital memcomputing machines that are proposed can be efficiently simulated, are scalable and can be easily realized with available nanotechnology components. This work is supported in part by MemComputing, Inc. (http://memcpu.com).
15:30End of session
Coffee Break in Exhibition Area





11.4 Learning Gets Smarter

Date: Thursday, March 28, 2019
Time: 14:00 - 15:30
Location / Room: Room 4

Chair:
Yuanqing Cheng, Beihang University, CN, Contact Yuanqing Cheng

Co-Chair:
Mariagrazia Graziano, Politecnico di Torino, IT, Contact Mariagrazia Graziano

Come and learn how emerging technologies enable deep learning and beyond for a wide range of applications: from speech recognition in industrial cloud computing to autonomous drones at the edge.

TimeLabelPresentation Title
Authors
14:0011.4.1NEUADC: NEURAL NETWORK-INSPIRED RRAM-BASED SYNTHESIZABLE ANALOG-TO-DIGITAL CONVERSION WITH RECONFIGURABLE QUANTIZATION SUPPORT
Speaker:
Xuan Zhang, WASHINGTON UNIVERSITY ST LOUIS, US
Authors:
Weidong Cao, Xin He, Ayan Chakrabarti and Xuan Zhang, Washington University, US
Abstract
Traditional analog-to-digital converters (ADCs) employ dedicated analog and mixed-signal (AMS) circuits and require a time-consuming manual design process. They also exhibit limited reconfigurability and are unable to support diverse quantization schemes using the same circuitry. In this paper, we propose NeuADC --- an automated design approach to synthesizing an analog-to-digital (A/D) interface that can approximate the desired quantization function using a neural network (NN) with a single hidden layer. Our design leverages the mixed-signal resistive random-access memory (RRAM) crossbar architecture in a novel dual-path configuration to realize basic NN operations at the circuit level and exploits a smooth bit-encoding scheme to improve the training accuracy. Results obtained from SPICE simulations based on 130nm technology suggest that not only can NeuADC deliver promising performance compared to state-of-the-art ADC designs across comprehensive design metrics, but it can also intrinsically support multiple reconfigurable quantization schemes using the same hardware substrate, paving the way for future adaptable application-driven signal conversion. The robustness of NeuADC's quantization quality under moderate RRAM resistance precision is also evaluated using SPICE simulations.

Download Paper (PDF; Only available from the DATE venue WiFi)
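The idea of realizing a quantization function with a single hidden layer can be seen in a tiny hand-weighted example: steep sigmoid hidden units at the code thresholds, combined linearly into output bits. NeuADC learns such weights by training and maps them onto RRAM crossbars; the weights below are set by hand purely for illustration:

```python
import math

def sig(z):
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def neu_adc_2bit(x, steepness=200.0):
    """Hand-weighted one-hidden-layer net acting as a 2-bit quantizer on [0, 1).

    Hidden units are steep sigmoids at the code thresholds; the output
    layer combines them into binary-coded bits (MSB, LSB).
    """
    h = [sig(steepness * (x - t)) for t in (0.25, 0.5, 0.75)]
    msb = h[1]                    # high once x >= 0.5
    lsb = h[0] - h[1] + h[2]      # high on [0.25, 0.5) and [0.75, 1)
    return (1 if msb > 0.5 else 0, 1 if lsb > 0.5 else 0)
```

Supporting a different quantization scheme means changing only the thresholds and output weights, which is the reconfigurability argument made in the abstract.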
14:3011.4.2HOLYLIGHT: A NANOPHOTONIC ACCELERATOR FOR DEEP LEARNING IN DATA CENTERS
Speaker:
Weichen Liu, School of Computer Science and Engineering, Nanyang Technological University, SG
Authors:
Weichen Liu1, Wenyang Liu2, Yichen Ye3, Qian Lou4, Yiyuan Xie3 and Lei Jiang5
1Nanyang Technological University, SG; 2College of Computer Science, Chongqing University, CN; 3College of Electronics and Information Engineering, Southwest University, CN; 4Department of Intelligent Systems Engineering, Indiana University, US; 5Indiana University Bloomington, US
Abstract
Convolutional Neural Networks (CNNs) are widely adopted in object recognition, speech processing and machine translation due to their extremely high inference accuracy. However, it is challenging to compute the massive, computationally expensive convolutions of deep CNNs on traditional CPUs and GPUs. Emerging nanophotonic technology has been employed for on-chip data communication because of its CMOS compatibility, high bandwidth and low power consumption. In this paper, we propose a nanophotonic accelerator, HolyLight, to boost the CNN inference throughput in datacenters. Instead of an all-photonic design, HolyLight performs convolutions with photonic integrated circuits and processes the other operations in CNNs with CMOS circuits for high inference accuracy. We first build HolyLight-M from microdisk-based matrix-vector multipliers. We find analog-to-digital converters (ADCs) seriously limit its inference throughput per Watt. We further use microdisk-based adders and shifters to architect HolyLight-A without ADCs. Compared to the state-of-the-art ReRAM-based accelerator, HolyLight-A improves the CNN inference throughput per Watt by 13x with trivial accuracy degradation.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:0011.4.3TRANSFER AND ONLINE REINFORCEMENT LEARNING IN STT-MRAM BASED EMBEDDED SYSTEMS FOR AUTONOMOUS DRONES
Speaker:
Insik Yoon, Georgia Institute of Technology, US
Authors:
Insik Yoon1, Aqeel Anwar1, Titash Rakshit2 and Arijit Raychowdhury1
1Georgia Institute of Technology, US; 2Samsung, US
Abstract
In this paper we present an algorithm-hardware co-design for camera-based autonomous flight in small drones. We show that the large write latency and write energy of non-volatile memory (NVM) based embedded systems make them unsuitable for real-time reinforcement learning (RL). We address this by performing transfer learning (TL) on meta-environments and RL on the last few layers of a deep convolutional network. While the NVM stores the meta-model (from TL), an on-die SRAM stores the weights of the last few layers. Thus all the real-time updates via RL are carried out on the SRAM arrays. This provides us with a practical platform with comparable performance to end-to-end RL and 83.4% lower energy per image frame.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:1511.4.4AIX: A HIGH PERFORMANCE AND ENERGY EFFICIENT INFERENCE ACCELERATOR ON FPGA FOR A DNN-BASED COMMERCIAL SPEECH RECOGNITION
Speaker:
Minwook Ahn, SK Telecom, KR
Authors:
Minwook Ahn, Seok Joong Hwang, Wonsub Kim, Seungrok Jung, Yeonbok Lee, Mookyoung Chung, Woohyung Lim and Youngjoon Kim, SK Telecom, KR
Abstract
Automatic speech recognition (ASR) is crucial in virtual personal assistant (VPA) services such as Apple Siri, Amazon Alexa, Google Now and SKT NUGU. Recently, ASR has shown remarkable advances in accuracy through deep learning. However, with the explosive increase in user utterances and the growing complexity of ASR, demand for custom accelerators in datacenters is rising sharply in order to process these workloads in real time with low power consumption. This paper evaluates a custom inference accelerator for DNN-enhanced ASR, called AIX (Artificial Intelligence aXellerator). AIX is developed on a Xilinx FPGA and has been deployed in SKT NUGU since 2018. Owing to full exploitation of the DSP slices and memory bandwidth provided by the FPGA, AIX outperforms cutting-edge CPUs by 10.2 times and even a state-of-the-art GPU by 20.1 times on real-time ASR workloads, in terms of both performance and power consumption. This improvement yields faster ASR response times and, in turn, reduces the number of machines required in datacenters to a third.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Breaks (Lunch Area)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the Lunch Area to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 26, 2019

Wednesday, March 27, 2019

Thursday, March 28, 2019


11.5 Vitello e Mozzarella alla Fiorentina: Virtualization, Multicore, and Fault-Tolerance

Date: Thursday, March 28, 2019
Time: 14:00 - 15:30
Location / Room: Room 5

Chair:
Philippe Coussy, Universite de Bretagne-Sud / Lab-STICC, FR, Contact Philippe Coussy

Co-Chair:
Michael Glaß, Ulm University, DE, Contact Michael Glaß

This session showcases innovative solutions for optimizing the performance of multiprocessors and virtual machines, as well as fault-tolerant deep neural networks (DNNs). The first paper presents an approach to improve virtual-machine (VM) performance in scenarios where multiple VMs share a single physical storage device. The second paper applies formal (ILP-based) and heuristic techniques to the problem of scheduling approximate computing tasks in asymmetric multiprocessors containing cores with different performance/power trade-offs. The third paper introduces techniques to improve the robustness of DNNs to bit-flip errors, such as those due to single-event upsets in space and military applications. Two interactive presentations round out the session: the first on task and data migration in virtualized multiprocessors, and the second on optimizing the performance of machine learning tasks in compute clusters.

TimeLabelPresentation Title
Authors
14:0011.5.1VM-AWARE FLUSH MECHANISM FOR MITIGATING INTER-VM I/O INTERFERENCE
Speaker:
Taehyung Lee, Sungkyunkwan University, KR
Authors:
Taehyung Lee, Minho Lee and Young Ik Eom, Sungkyunkwan University, KR
Abstract
Consolidating multiple servers into a physical machine is now commonplace in cloud infrastructures. Virtualized systems often arrange the virtual disks of multiple virtual machines (VMs) on the same underlying storage device while striving to guarantee the performance service-level objective (SLO) of each VM. Unfortunately, sync operations issued by one VM make it hard to satisfy the performance SLO because they disturb the I/O activities of other VMs. We reveal that the disk cache flush command is a root cause of this problem and present a novel VM-aware flush mechanism, called vFLUSH, which supports VM-based persistency control of the disk cache flush command. Our evaluation shows that vFLUSH reduces the average latency of disk cache flush commands by up to 52.0% and improves overall I/O performance by up to 59.6% on real workloads.

Download Paper (PDF; Only available from the DATE venue WiFi)
14:3011.5.2AN EFFICIENT BIT-FLIP RESILIENCE OPTIMIZATION METHOD FOR DEEP NEURAL NETWORKS
Speaker:
Christoph Schorn, Robert Bosch GmbH, DE
Authors:
Christoph Schorn1, Andre Guntoro1 and Gerd Ascheid2
1Robert Bosch GmbH, DE; 2RWTH Aachen University, DE
Abstract
Deep neural networks usually possess a high overall resilience against errors in their intermediate computations. However, it has been shown that error resilience is generally not homogeneous within a neural network, and some neurons can be very sensitive to faults. Even a single bit-flip fault in one of these critical neuron outputs can result in a large degradation of the final network output accuracy, which cannot be tolerated in some safety-critical applications. While critical neuron computations can be protected using error correction techniques, a resilience optimization of the neural network itself is more desirable, since it can reduce the required effort for error correction and fault protection in hardware. In this paper, we develop a novel resilience optimization method for deep neural networks, which builds upon a previously proposed resilience estimation technique. The optimization involves only a few steps and can be applied to pre-trained networks. In our experiments, we significantly reduce the worst-case failure rates after a bit-flip fault for deep neural networks trained on the MNIST, CIFAR-10 and ILSVRC classification benchmarks.
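The failure mode targeted here can be illustrated with a minimal fault-injection sketch (not from the paper; the float layout is standard IEEE-754, and the function name is illustrative): flipping a single high-order bit of one stored value can change it by dozens of orders of magnitude, which is why one critical bit-flip can wreck the network output.

```python
import struct

def flip_bit(value, bit):
    """Flip one bit in the IEEE-754 single-precision encoding of a
    float, modeling a single-event upset in a stored weight or neuron
    output (bit 31 = sign, bits 30-23 = exponent, bits 22-0 = mantissa)."""
    (packed,) = struct.unpack("<I", struct.pack("<f", value))
    (corrupted,) = struct.unpack("<f", struct.pack("<I", packed ^ (1 << bit)))
    return corrupted

# A flip in a high exponent bit turns a modest value astronomically
# large, while a mantissa-LSB flip is negligible:
w = 0.5
big = flip_bit(w, 30)   # exponent bit: result is on the order of 1e38
tiny = flip_bit(w, 0)   # mantissa LSB: result stays very close to 0.5
```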

Download Paper (PDF; Only available from the DATE venue WiFi)
15:0011.5.3APPROXIMATION-AWARE TASK DEPLOYMENT ON ASYMMETRIC MULTICORE PROCESSORS
Speaker:
Lei Mo, INRIA, FR
Authors:
Lei Mo1, Angeliki Kritikakou2 and Olivier Sentieys1
1INRIA, FR; 2IRISA/INRIA, Univ. Rennes, FR
Abstract
Asymmetric multicore processors (AMPs) are a promising architecture for dealing efficiently with the wide diversity of applications. In real-time application domains, in-time approximated results are preferred over accurate, but too late, results. In this work, we propose a deployment approach that exploits the heterogeneity provided by AMP architectures and the approximation tolerance provided by the applications, so as to increase as much as possible the quality of the results under given energy and timing constraints. Initially, an optimal approach is proposed based on problem linearization and decomposition. Then, a heuristic approach is developed based on iteration relaxation of the optimal version. The obtained results show a 16.3% reduction in computation time for the optimal approach compared to conventional optimal approaches. The proposed heuristic approach is about 100 times faster, at the cost of a 29.8% QoS degradation in comparison with the optimal solution.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30IP5-12, 5GENIE: QOS-GUIDED DYNAMIC SCHEDULING FOR CNN-BASED TASKS ON SME CLUSTERS
Speaker:
Zhaoyun Chen, National University of Defense Technology, CN
Authors:
Zhaoyun Chen, Lei Luo, Haoduo Yang, Jie Yu, Mei Wen and Chunyuan Zhang, National University of Defense Technology, CN
Abstract
Convolutional Neural Networks (CNNs) have driven dramatic advances in emerging Machine Learning (ML) services. Compared to online ML services, offline ML services full of diverse CNN workloads are common in small and medium-sized enterprises (SMEs), research institutes and universities. Efficient scheduling and processing of multiple CNN-based tasks on SME clusters is both significant and challenging. Existing schedulers cannot predict the resource requirements of CNN-based tasks. In this paper, we propose GENIE, a QoS-guided dynamic scheduling framework for SME clusters that achieves users' QoS guarantees and high system utilization. Based on a prediction model derived from lightweight profiling, a QoS-guided scheduling strategy is proposed to identify the best placements for CNN-based tasks. We implement GENIE as a plugin of TensorFlow and experiment with real SME clusters and large-scale simulations. The results of the experiments demonstrate that the QoS-guided strategy outperforms other baseline schedulers by up to 67.4% and 28.2% in terms of QoS-guarantee percentage and makespan, respectively.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:33IP5-13, 59ADIABATIC IMPLEMENTATION OF MANCHESTER ENCODING FOR PASSIVE NFC SYSTEM
Speaker:
Sachin Maheshwari, University of Westminster, GB
Authors:
Sachin Maheshwari1 and Izzet Kale2
1University of Westminster, GB; 2University of Westminster, GB
Abstract
Energy plays an important role in passive NFC tags, as they are powered by radio waves from the reader. Hence, reducing the energy consumption of the tag can increase the interrogation range, improve security and maximize the reader's battery life. The ISO 14443 standard utilizes Manchester coding for data transmission from the passive tag to the reader in the majority of passive NFC communications. This paper proposes a novel method of Manchester encoding using the adiabatic logic technique for energy minimization. The design is implemented by generating replica bits of the actual transmitted bits and then flipping the replicas to produce the Manchester-coded bits. The proposed design was implemented using two adiabatic logic families, namely Positive Feedback Adiabatic Logic (PFAL) and Improved Efficient Charge Recovery Logic (IECRL), which are compared in terms of energy over a range of frequencies. The energy comparison also includes the power-clock generator, designed as a 2-stepwise charging circuit (SWC) with an FSM controller. The simulation results for a 180nm CMOS technology at a 1.8V power supply show that IECRL consumes approximately 40% less system energy than PFAL.
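As background, the replica-and-flip construction the abstract describes can be modeled in a few lines of software (an illustrative sketch only; the paper's contribution is the adiabatic-logic hardware realization, and the polarity chosen here is one of the two conventions in common use):

```python
def manchester_encode(bits):
    """Replica-and-flip Manchester encoding: each data bit b is
    followed by its flipped replica, so the coded stream carries a
    transition (b, not b) for every bit."""
    coded = []
    for b in bits:
        coded.append(b)      # original bit
        coded.append(b ^ 1)  # flipped replica
    return coded

def manchester_decode(coded):
    """Recover the data: take the first symbol of each pair and check
    that the second symbol is its complement."""
    assert len(coded) % 2 == 0
    bits = []
    for i in range(0, len(coded), 2):
        first, second = coded[i], coded[i + 1]
        assert first ^ second == 1, "invalid Manchester pair"
        bits.append(first)
    return bits
```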

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30End of session
Coffee Break in Exhibition Area





11.6 Design Automation Solutions for Microfluidic Platforms and Tasks

Date: Thursday, March 28, 2019
Time: 14:00 - 15:30
Location / Room: Room 6

Chair:
Robert Wille, Johannes Kepler University Linz, AT, Contact Robert Wille

Co-Chair:
Andy Tyrrell, University of York, GB, Contact Andy Tyrrell

The session provides talks on design automation for microfluidic devices, covering a wide range of platforms and tasks. More precisely, the presentations cover platforms such as biochips based on micro-electrode-dot arrays (MEDA biochips), flow-based biochips, and Programmable Microfluidic Devices (PMDs). The covered tasks include parameter-space exploration, physical synthesis, and washing. This variety makes the session ideal both for experts already working in the area who are interested in the latest results and for researchers who are curious about this domain and want a closer insight.

TimeLabelPresentation Title
Authors
14:0011.6.1BIOSCAN: PARAMETER-SPACE EXPLORATION OF SYNTHETIC BIOCIRCUITS USING MEDA BIOCHIPS
Speaker:
Mohamed Ibrahim, Intel Corporation, US
Authors:
Mohamed Ibrahim1, Bhargab Bhattacharya2 and Krishnendu Chakrabarty1
1Duke University, US; 2Indian Statistical Institute, Kolkata, IN
Abstract
Recent advances in microfluidic technology offer efficient platforms to emulate complex molecular networks of biological pathways (biocircuits) on a lab-on-chip. The behavior of biocircuits is governed by a number of gene-regulatory parameters. A fundamental challenge in synthesizing and verifying biocircuits is the lack of design tools that implement biocircuit-regulatory scanning (BRS) assays to explore the large parameter-space efficiently, while optimizing synthesis time and reagent cost. In this paper, we introduce an optimization flow named BioScan for systematic exploration of the parameter-space of a biocircuit. BioScan includes: (1) a statistical approach to determine a subset of mixing ratios of reagents that span the entire parameter space as densely as possible subject to certain cost constraints; (2) an ILP-based synthesis method that implements a BRS-assay on a micro-electrode dot-array biochip. Simulation results show that BioScan reduces reagent cost and enhances space-filling properties.

Download Paper (PDF; Only available from the DATE venue WiFi)
14:3011.6.2PHYSICAL SYNTHESIS OF FLOW-BASED MICROFLUIDIC BIOCHIPS CONSIDERING DISTRIBUTED CHANNEL STORAGE
Speaker:
Xing Huang, National Tsing Hua University, TW
Authors:
Zhisheng Chen1, Xing Huang2, Wenzhong Guo1, Bing Li3, Tsung-Yi Ho2 and Ulf Schlichtmann3
1Fuzhou University, CN; 2National Tsing Hua University, TW; 3TUM, DE
Abstract
Flow-based microfluidic biochips (FBMBs) have attracted much attention over the past decade. On such a micrometer-scale platform, various biochemical applications, also called bioassays, can be processed concurrently and automatically. To improve execution efficiency and reduce fabrication cost, a distributed channel-storage architecture (DCSA) can be implemented on this platform, where fluid samples are cached temporarily in flow channels close to components. Although this distributed storage architecture can improve the execution efficiency of FBMBs significantly, it requires a careful arrangement of fluid samples so that the channels can fulfill the dual functions of transportation and caching. In this paper, we formulate the first practical flow-layer physical design problem considering DCSA, and propose a top-down synthesis algorithm that generates efficient solutions considering execution efficiency, washing, and resource usage simultaneously. Experimental results demonstrate that the proposed algorithm leads to shorter execution times, less flow-channel length, and higher on-chip resource utilization for biochemical applications, compared with a direct approach that incorporates distributed storage into existing frameworks.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:0011.6.3BLOCK-FLUSHING: A BLOCK-BASED WASHING ALGORITHM FOR PROGRAMMABLE MICROFLUIDIC DEVICES
Speaker:
Bing Li, TUM, DE
Authors:
Yu-Huei Lin1, Tsung-Yi Ho1, Bing Li2 and Ulf Schlichtmann2
1National Tsing Hua University, TW; 2TUM, DE
Abstract
Programmable Microfluidic Devices (PMDs) have emerged as a new architecture for next-generation flow-based biochips. These devices can be dynamically reconfigured to execute different bioassays flexibly and efficiently owing to their two-dimensional, regularly-arranged valve structure. During the execution of a bioassay, or between the executions of multiple bioassays, some areas on the PMD become contaminated and must be cleaned by washing them with a buffer flow before they are reused. In this paper, we propose a novel block-based washing technique called block flushing, in which contaminated areas are first collected according to given patterns and then flushed as a whole to increase washing efficiency. Simulation results show that the proposed method achieves on average a 28% reduction in washing time compared with two other baseline solutions.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30IP5-14, 916A PULSE WIDTH MODULATION BASED POWER-ELASTIC AND ROBUST MIXED-SIGNAL PERCEPTRON DESIGN
Speaker:
Sergey Mileiko, Newcastle University, GB
Authors:
Sergey Mileiko1, Rishad Shafik1, Alex Yakovlev1 and Jonathan Edwards2
1Newcastle University, GB; 2Temporal Computing, GB
Abstract
Neural networks are exerting burgeoning influence in emerging artificial intelligence applications at the micro-edge, such as sensing systems. As many of these systems are typically self-powered, their circuits are expected to be resilient and efficient to continuous power variations imposed by the harvesters. In this paper, we propose a novel mixed-signal (i.e. analogue/digital) approach of designing a power-elastic perceptron using the principle of pulse width modulation (PWM). Fundamental to the design are a number of parallel inverters that transcode the input-weight pairs based on the principle of PWM duty cycle. Since PWM-based inverters are typically resilient to amplitude and frequency variations, the perceptron shows a high degree of power elasticity and robustness in the presence of these variations. Our extensive design analysis also demonstrates significant power and area efficiency, leading to significant reduction in dynamic and leakage energy when compared with a purely digital equivalent.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:31IP5-15, 320FAULT LOCALIZATION IN PROGRAMMABLE MICROFLUIDIC DEVICES
Speaker:
Ulf Schlichtmann, TUM, DE
Authors:
Alessandro Bernardini, Chunfeng Liu, Bing Li and Ulf Schlichtmann, TUM, DE
Abstract
Programmable Microfluidic Devices (PMDs) have revolutionized the traditional biochemical experiment flow. Test algorithms for PMDs have recently been proposed, and test patterns can be generated algorithmically, but an algorithm for localizing faults once they have been detected is not yet available. When testing a PMD, once a test pattern fails, it is unknown where the stuck valve is located: it can be any one of the many valves forming the test pattern. In this paper, we propose an effective algorithm for the localization of stuck-at-0 and stuck-at-1 faults in a PMD. The stuck valve is localized either exactly or within a very small set of candidate valves. Once the locations of faulty valves are known, it becomes possible to continue using the PMD by resynthesizing the application.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:32IP5-16, 529THERMAL SENSING USING MICRO-RING RESONATORS IN OPTICAL NETWORK-ON-CHIP
Speaker:
Mengquan Li, Chongqing University, CN
Authors:
Weichen Liu1, Mengquan Li2, Wanli Chang3, Chunhua Xiao2, Yiyuan Xie4, Nan Guan5 and Lei Jiang6
1Nanyang Technological University, SG; 2Chongqing University, CN; 3University of York, GB; 4Southwest University, CN; 5Hong Kong Polytechnic University, HK; 6Indiana University Bloomington, US
Abstract
In this paper, we utilize, for the first time, the micro-ring resonators (MRs) in optical networks-on-chip (ONoCs) to implement thermal sensing without requiring additional hardware or chip area. The challenges in accuracy and reliability that arise from fabrication-induced process variations (PVs) and the device-level wavelength tuning mechanism are resolved. We quantitatively model the intrinsic thermal sensitivity of MRs with fine-grained consideration of the wavelength tuning mechanism. Based on this model, a novel PV-tolerant thermal sensor design is proposed. By exploiting the hidden 'redundancy' in the wavelength division multiplexing (WDM) technique, our sensor achieves accurate and efficient temperature measurement with tolerance to PVs. Evaluation results based on professional photonic component and circuit simulations show an average 86.49% improvement in measurement accuracy compared to the state-of-the-art on-chip thermal sensing approach using MRs. Our thermal sensor achieves stable performance in ONoCs employing dense WDM, with an inaccuracy of only 0.8650 K.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30End of session
Coffee Break in Exhibition Area





11.7 Extending Scheduling Schemes

Date: Thursday, March 28, 2019
Time: 14:00 - 15:30
Location / Room: Room 7

Chair:
Marco Di Natale, Scuola Superiore Sant’Anna of Pisa, IT, Contact Marco Di Natale

Co-Chair:
Mitra Nasri, TU Delft, NL, Contact Mitra Nasri

This session presents papers on generalizations of fixed-priority and global EDF scheduling, as well as applications to synchronous data flows.

TimeLabelPresentation Title
Authors
14:0011.7.1ANALYZING GEDF SCHEDULING FOR PARALLEL REAL-TIME TASKS WITH ARBITRARY DEADLINES
Speaker:
Xu Jiang, The Hong Kong Polytechnic University, HK
Authors:
Xu Jiang1, Nan Guan1, Di Liu2 and Weichen Liu3
1The Hong Kong Polytechnic University, HK; 2Yunnan University, CN; 3Nanyang Technological University, SG
Abstract
Real-time and embedded systems are shifting from single-core to multi-core processors, on which software must be parallelized to fully utilize the computation capacity of the hardware. Recently, much work has been done on the scheduling of parallel real-time tasks modeled as directed acyclic graphs (DAGs). However, most of these studies assume tasks to have implicit or constrained deadlines. Much less work has considered the general case of arbitrary deadlines (i.e., the relative deadline is allowed to be larger than the period), which is more difficult to analyze due to intra-task interference among jobs. In this paper, we study the analysis of Global Earliest Deadline First (GEDF) scheduling for DAG parallel tasks with arbitrary deadlines. We develop new analysis techniques for GEDF scheduling of a single DAG task, which not only outperform the state-of-the-art in general, as evidenced by empirical evaluation, but also guarantee a better capacity augmentation bound of 2.41 (the best previously known result is 2.5). The proposed analysis techniques are also extended to, and evaluated for, the case of multiple DAG tasks using the federated scheduling approach.

Download Paper (PDF; Only available from the DATE venue WiFi)
14:3011.7.2SIMPLE AND GENERAL METHODS FOR FIXED-PRIORITY SCHEDULABILITY IN OPTIMIZATION PROBLEMS
Speaker:
Paolo Pazzaglia, Scuola Superiore Sant'Anna, Pisa, IT
Authors:
Paolo Pazzaglia, Alessandro Biondi and Marco Di Natale, Scuola Superiore Sant'Anna, IT
Abstract
This paper presents a set of sufficient-only, but accurate, schedulability tests for fixed-priority scheduling. The tests apply to the general case of scheduling with constrained deadlines, where tasks can incur blocking times, be subject to release jitter, be activated with fixed offsets, or be involved in transactions with other tasks. The proposed tests come in a linear closed form, with a number of conditions polynomial in the number of tasks. All tests are targeted for use when encoding schedulability constraints within Mixed-Integer Linear Programming for the purpose of optimizing real-time systems (e.g., to address task partitioning in a multicore system). The tests are evaluated with a large-scale experimental study based on synthetic workload, revealing a failure rate (with respect to state-of-the-art reference tests) of less than 1% on average, and of at most 2% in a very small number of limit-case configurations.
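For readers new to the topic, the simplest example of a sufficient-only, closed-form fixed-priority test is the classical Liu-and-Layland utilization bound for rate-monotonic scheduling; the sketch below shows that classical bound, not the paper's tests, which generalize this style of linear condition to blocking, jitter, offsets and transactions:

```python
def rm_utilization_test(tasks):
    """Classical Liu & Layland (1973) sufficient test for rate-monotonic
    fixed-priority scheduling: n independent periodic tasks, each given
    as (C, T) with worst-case execution time C and period T (implicit
    deadline), are schedulable if total utilization does not exceed
    n * (2**(1/n) - 1).  Sufficient only: a task set that fails the
    test may still be schedulable."""
    n = len(tasks)
    utilization = sum(c / t for c, t in tasks)
    return utilization <= n * (2 ** (1 / n) - 1)

# Two tasks at 70% total utilization fit under the 2-task bound
# (about 82.8%), while 100% utilization does not:
ok = rm_utilization_test([(2, 10), (5, 10)])        # passes
too_full = rm_utilization_test([(5, 10), (5, 10)])  # fails
```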

Download Paper (PDF; Only available from the DATE venue WiFi)
15:0011.7.3HARD REAL-TIME SCHEDULING OF STREAMING APPLICATIONS MODELED AS CYCLIC CSDF GRAPHS
Speaker:
Sobhan Niknam, Leiden University, NL
Authors:
Sobhan Niknam, Peng Wang and Todor Stefanov, Leiden University, NL
Abstract
Recently, it has been shown that classical hard real-time scheduling theory can be applied to streaming applications modeled as acyclic Cyclo-Static Dataflow (CSDF) graphs. However, many streaming applications are modeled as cyclic CSDF graphs and are thus not supported by this scheduling theory. In this paper, we therefore propose an approach that enables applying classical hard real-time scheduling theory to streaming applications modeled as cyclic CSDF graphs. The proposed approach converts each task in a cyclic CSDF graph into a constrained-deadline periodic task. This conversion enables the use of many hard real-time scheduling algorithms that offer properties such as temporal isolation and fast calculation of the number of processors required for scheduling the tasks. We evaluate the performance of our approach in comparison to existing scheduling approaches. The evaluation, on a set of real-life benchmarks, demonstrates that our approach can schedule the tasks of an application modeled as a cyclic CSDF graph with guaranteed throughput equal or comparable to that obtained by existing scheduling approaches, while providing hard real-time guarantees for every task in the application, thereby enabling temporal isolation among concurrently running tasks/applications on a multiprocessor platform.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30End of session
Coffee Break in Exhibition Area





11.8 An Industry Approach to FPGA/ARM System Development and Verification (part 1)

Date: Thursday, March 28, 2019
Time: 14:00 - 15:30
Location / Room: Exhibition Theatre

Organiser:
John Zhao, MathWorks, US, Contact John Zhao

MATLAB and Simulink provide a rich environment for embedded-system development, with libraries of proven, specialized algorithms ready to use for specific applications. The environment enables a model-based design workflow for fast prototyping and implementation of the algorithms on heterogeneous embedded targets, such as MPSoC. A system-level design approach enables architectural exploration and partitioning, as well as coordination between SW and HW development workflows. Functional verification throughout the design process improves coverage and test-case generation while reducing the time and resources required.

In this set of tutorial sessions, you will learn:

  • How to evaluate hardware and software system architectures using the latest features in Simulink
  • How to implement an application that leverages the FPGA and ARM core of a Zynq SoC
  • The flexibility and diversity of the approach, through examples that include prototyping a motor-control algorithm and a video-processing algorithm
  • A HW/SW co-design workflow that combines system-level design and simulation with automatic code generation
TimeLabelPresentation Title
Authors
14:0011.8.1AN INDUSTRY APPROACH TO FPGA/ARM SYSTEM DEVELOPMENT AND VERIFICATION (PART 1)
Speaker:
John Zhao, MathWorks, US
15:30End of session
Coffee Break in Exhibition Area





IP5 Interactive Presentations

Date: Thursday, March 28, 2019
Time: 15:30 - 16:00
Location / Room: Poster Area

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in the corresponding regular session.

LabelPresentation Title
Authors
IP5-1THERMAL-AWARENESS IN A SOFT ERROR TOLERANT ARCHITECTURE
Speaker:
Sajjad Hussain, Chair for Embedded Systems, KIT, DE
Authors:
Sajjad Hussain1, Muhammad Shafique2 and Joerg Henkel1
1Karlsruhe Institute of Technology, DE; 2Vienna University of Technology (TU Wien), AT
Abstract
It is crucial to provide soft error reliability in a power-efficient manner, such that the maximum chip temperature remains within safe operating limits. Different execution phases of an application have diverse performance, power, temperature and vulnerability behavior, which can be leveraged to fulfill resiliency requirements within the allowed thermal constraints. We propose a soft error tolerant architecture with fine-grained redundancy for different architectural components, such that their reliable operation can be activated selectively, at fine granularity, to maximize reliability under a given thermal constraint. When compared with the state-of-the-art, our temperature-aware fine-grained reliability manager provides up to 30% higher reliability within the thermal budget.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-2A SOFTWARE-LEVEL REDUNDANT MULTITHREADING FOR SOFT/HARD ERROR DETECTION AND RECOVERY
Speaker:
Hwisoo So, Yonsei University, KR
Authors:
Moslem Didehban1, HwiSoo So2, Aviral Shrivastava1 and Kyoungwoo Lee2
1Arizona State University, US; 2Yonsei University, KR
Abstract
Advances in semiconductor technology have enabled unprecedented growth in safety-critical applications. In such environments, error resiliency is one of the main design concerns. Software-level Redundant MultiThreading (RMT) is one of the most promising error resilience strategies because it can potentially serve as an inexpensive and flexible solution to hardware unreliability issues, i.e., soft and hard errors. However, the error coverage of existing software-level RMT solutions is limited to soft error detection, and they rely on external schemes for error recovery. In this paper, we investigate the potential of software-level RMT schemes for complete soft and hard error detection and recovery. First, we pinpoint the main reasons behind the ineffectiveness of basic software-level triple redundant multithreading (STRMT) in protecting against soft and hard errors. Then we introduce FISHER (FlexIble Soft and Hard Error Resiliency), a software-only RMT scheme that can achieve comprehensive resiliency against both soft and hard errors. Rather than performing centralized voting operations on critical instruction operands, FISHER distributes and intertwines error detection and recovery operations between the redundant threads. To evaluate the effectiveness of the proposed solution, we performed more than 135,000 soft and hard error injection experiments on different hardware components of an ARM Cortex-A53-like micro-architecturally simulated microprocessor. The results demonstrate that FISHER can reduce program failure rates by around 261× and 162× compared to the original and basic STRMT-protected versions of the programs, respectively.
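The centralized voting that FISHER avoids can be sketched, for contrast, as a basic triple-redundancy majority vote (a toy model with illustrative names; real RMT runs the replicas on separate hardware threads, and the paper's point is precisely to distribute this step):

```python
def majority_vote(a, b, c):
    """Return the majority value among three redundant results; a
    single corrupted copy is out-voted by the two matching ones."""
    if a == b or a == c:
        return a
    if b == c:
        return b
    raise RuntimeError("no majority: more than one replica corrupted")

def run_triplicated(fn, x):
    """Run the same computation three times and vote on the results.
    A real RMT scheme executes the copies on distinct threads/cores so
    that a transient fault hits at most one of them."""
    return majority_vote(fn(x), fn(x), fn(x))

# One faulty replica is masked by the other two:
masked = majority_vote(42, 42, 7)       # yields 42
squared = run_triplicated(lambda v: v * v, 6)  # yields 36
```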

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-3COMMON-MODE FAILURE MITIGATION: INCREASING DIVERSITY THROUGH HIGH-LEVEL SYNTHESIS
Speaker:
Farah Naz Taher, University of Texas at Dallas, US
Authors:
Farah Naz Taher1, Matthew Joslin1, Anjana Balachandran2, Zhiqi Zhu1 and Benjamin Carrion Schaefer1
1The University of Texas at Dallas, US; 2The Hong Kong Polytechnic University, HK
Abstract
Fault tolerance is vital in many domains. One popular way to increase fault tolerance is through hardware redundancy. However, basic redundancy cannot cope with Common Mode Failures (CMFs). One way to address CMFs is through the use of diversity in combination with traditional hardware redundancy. This work proposes an automatic design space exploration (DSE) method that, given a single behavioral description for High-Level Synthesis (HLS), generates optimized redundant hardware accelerators with maximum diversity to protect against CMFs. For this purpose, this work exploits one of the main advantages of C-based VLSI design over traditional RT-level design based on low-level Hardware Description Languages (HDLs): the ability to generate micro-architectures with unique characteristics from the same behavioral description. Experimental results show that the proposed method provides a significant diversity increment compared to traditional RTL-based exploration for generating diverse designs.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-4EXPLOITING WAVELENGTH DIVISION MULTIPLEXING FOR OPTICAL LOGIC SYNTHESIS
Speaker:
David Z. Pan, University of Texas, Austin, US
Authors:
Zheng Zhao1, Derong Liu2, Zhoufeng Ying1, Biying Xu1, Chenghao Feng1, Ray T. Chen1 and David Z. Pan1
1University of Texas, Austin, US; 2Cadence Design Systems, US
Abstract
Photonic integrated circuits (PICs), as a promising alternative to traditional CMOS circuits, have demonstrated the potential to accomplish on-chip optical interconnects and computations at ultra-high speed and/or with low power consumption. Wavelength division multiplexing (WDM) is widely used in optical communication to enable multiple signals to be processed and transferred independently. In this work, we apply WDM to optical logic PIC synthesis to reduce the PIC area.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-5IGNORETM: OPPORTUNISTICALLY IGNORING TIMING VIOLATIONS FOR ENERGY SAVINGS USING HTM
Speaker:
Dimitra Papagiannopoulou, University of Massachusetts Lowell, US
Authors:
Dimitra Papagiannopoulou1, Sungseob Whang2, Tali Moreshet3 and Iris Bahar4
1University of Massachusetts Lowell, US; 2CloudHealth Technologies, US; 3Boston University, US; 4Brown University, US
Abstract
Energy consumption is the dominant factor in many computing systems. Voltage scaling is a widely used technique for lowering energy consumption that exploits the supply voltage margins added to ensure reliable circuit operation. Aggressive voltage scaling slows signal propagation; without a corresponding frequency relaxation, timing violations may occur. Hardware Transactional Memory (HTM) offers an error recovery mechanism that allows reliable execution and power savings with modest overhead. We propose IgnoreTM, an adaptive error management framework that tolerates (i.e., opportunistically ignores) timing violations, allowing for more aggressive voltage scaling. Our experimental results show that IgnoreTM allows up to 47% total energy savings with negligible impact on runtime.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-6USING MACHINE LEARNING FOR QUALITY CONFIGURABLE APPROXIMATE COMPUTING
Speaker:
Mahmoud Masadeh, Concordia University, CA
Authors:
Mahmoud Masadeh, Osman Hasan and Sofiene Tahar, Department of Electrical and Computer Engineering, Concordia University, Montreal, Quebec, CA
Abstract
Approximate computing (AC) is a nascent energy-efficient computing paradigm for error-resilient applications. However, the quality control of AC is quite challenging due to its input-dependent nature. Existing solutions fail to address fine-grained input-dependent controlled approximation. In this paper, we propose an input-aware machine learning based approach for the quality control of AC. For illustration purposes, we use 20 configurations of 8-bit approximate multipliers. We evaluate these designs for all combinations of possible input data. Then, we use machine learning algorithms to efficiently make predictive decisions for the quality control of the target approximate application, based on experimentally collected training data. The key benefits of the proposed approach include: (1) fine-grained input-dependent approximation, (2) no missed approximation opportunities, (3) no rollback recovery overhead, (4) applicable to any approximate computation with error-tolerant components, and (5) flexibility in adapting various error metrics.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-7PREDICTION-BASED TASK MIGRATION ON S-NUCA MANY-CORES
Speaker:
Martin Rapp, Karlsruhe Institute of Technology, DE
Authors:
Martin Rapp1, Anuj Pathania1, Tulika Mitra2 and Joerg Henkel1
1Karlsruhe Institute of Technology, DE; 2National University of Singapore, SG
Abstract
Performance of a task running on a many-core with a distributed shared Last-Level Cache (LLC) strongly depends on two factors: the power budget needed to guarantee thermally safe operation and the LLC latency. The task's thread-to-core mapping determines both factors. Arrival and departure of tasks on a many-core deployed in an open system can change its state significantly in terms of available cores and power budget. Task migrations can then be used as a tool to keep the many-core operating at peak performance. Furthermore, the relative impacts of power budget and LLC latency on a task's performance can change with its different execution phases, mandating on-the-fly migration. We propose PCMig, the first run-time algorithm that increases the performance of a many-core with distributed shared LLC by migrating tasks based on their phases and the many-core's state. PCMig is based on a performance-prediction model that predicts the performance impact of migrations. PCMig results in up to a 16% reduction in average response time compared to the state-of-the-art.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-8DESIGN OF HARDWARE-FRIENDLY MEMORY ENHANCED NEURAL NETWORKS
Speaker:
Ann Franchesca Laguna, University of Notre Dame, US
Authors:
Ann Franchesca Laguna, Michael Niemier and X. Sharon Hu, University of Notre Dame, US
Abstract
Neural networks with external memories have been proven to minimize catastrophic forgetting, a major problem in applications such as lifelong and few-shot learning. However, such memory-enhanced neural networks (MENNs) typically require a large number of floating-point cosine distance calculations to perform the necessary attentional operations, which greatly increases energy consumption and hardware cost. This paper investigates other distance metrics for such networks in order to achieve more efficient hardware implementations of MENNs. We propose using content addressable memories (CAMs) to accelerate and simplify attentional operations. We focus on reducing the bit precision and memory size (M×D), and on using alternative distance metrics such as L1, L2, and L∞ to perform the attentional computations of MENNs. Our hardware-friendly approach implements fixed-point L∞ distance calculations via ternary content addressable memories (TCAMs) and fixed-point L1 and L2 distance calculations on a general-purpose graphics processing unit (GPGPU); computing-in-memory (CIM) arrays might also be used. As a representative example, a 32-bit floating-point cosine-distance MENN with M×D multiplications achieves 99.06% accuracy on the Omniglot 5-way 5-shot classification task. With our approach at just 4-bit fixed-point precision, an L∞-L1 distance hardware accuracy of 90.35% can be achieved with just 16 TCAM lookups and 16D addition and subtraction operations. With 4-bit precision and an L∞-L2 distance, hardware classification accuracies of 96.00% are possible; in this case, 16 TCAM lookups and 16D multiplication operations are needed. Assuming the hardware memory has 512 entries, the number of multiplication operations is reduced by 32× versus the cosine-distance approach.
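The alternative metrics above are simple to state concretely. The sketch below (an illustration of the distance computations only, not the paper's TCAM hardware; data and names are ours) performs nearest-neighbor attention over small fixed-point key vectors with L1, squared-L2, and L∞ distances. Note that L∞ reduces to a maximum of per-dimension differences, which is what makes TCAM-style range matching applicable.

```python
# Nearest-neighbor attention with L1, squared-L2, and L-infinity
# distances over 4-bit fixed-point vectors (illustrative sketch).

def l1(q, k):
    return sum(abs(a - b) for a, b in zip(q, k))

def l2_sq(q, k):  # squared L2: avoids the square root, same ranking
    return sum((a - b) ** 2 for a, b in zip(q, k))

def linf(q, k):  # max of per-dimension absolute differences
    return max(abs(a - b) for a, b in zip(q, k))

def nearest(memory, query, dist):
    """Index of the memory entry with minimum distance to the query."""
    return min(range(len(memory)), key=lambda i: dist(query, memory[i]))

# M = 3 keys of dimension D = 4, values limited to 4 bits (0..15).
memory = [(0, 1, 2, 3), (15, 14, 13, 12), (7, 7, 7, 7)]
query = (6, 8, 7, 7)
```

For this query all three metrics agree on entry 2; in general the metrics can rank entries differently, which is the accuracy trade-off the paper quantifies.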

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-9ENERGY-EFFICIENT INFERENCE ACCELERATOR FOR MEMORY-AUGMENTED NEURAL NETWORKS ON AN FPGA
Speaker:
Seongsik Park, Seoul National University, KR
Authors:
Seongsik Park, Jaehee Jang, Seijoon Kim and Sungroh Yoon, Seoul National University, KR
Abstract
Memory-augmented neural networks (MANNs) are designed for question-answering tasks. It is difficult to run a MANN effectively on accelerators designed for other neural networks (NNs), in particular on mobile devices, because MANNs require recurrent data paths and various types of operations related to external memory access. We implement an accelerator for MANNs on a field-programmable gate array (FPGA) based on a data flow architecture. Inference times are also reduced by inference thresholding, which is a data-based maximum inner-product search specialized for natural language tasks. Measurements on the bAbI data show that the energy efficiency of the accelerator (FLOPS/kJ) was higher than that of an NVIDIA TITAN V GPU by a factor of about 125, increasing to 140 with inference thresholding.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-10HDCLUSTER: AN ACCURATE CLUSTERING USING BRAIN-INSPIRED HIGH-DIMENSIONAL COMPUTING
Speaker:
Mohsen Imani, University of California, San Diego, US
Authors:
Mohsen Imani, Yeseong Kim, Thomas Worley, Saransh Gupta and Tajana Rosing, University of California San Diego, US
Abstract
The Internet of Things has increased the rate of data generation. Clustering is one of the most important tasks in this domain for finding latent correlations in data. However, performing today's clustering tasks is often inefficient due to the cost of data movement between cores and memory. We propose HDCluster, a brain-inspired unsupervised learning algorithm that clusters input data in a high-dimensional space by fully mapping and processing in memory. Instead of clustering input data in either fixed-point or floating-point representation, HDCluster maps data to vectors with dimensions in the thousands, called hypervectors, to cluster them. Our evaluation shows that HDCluster provides better clustering quality for tasks that involve a large amount of data, while offering potential for acceleration in a memory-centric architecture.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-11FINDING ALL DC OPERATING POINTS USING INTERVAL-ARITHMETIC BASED VERIFICATION ALGORITHMS
Speaker:
Itrat A. Akhter, University of British Columbia, CA
Authors:
Itrat Akhter, Justin Reiher and Mark Greenstreet, University of British Columbia, CA
Abstract
This paper applies interval-arithmetic-based verification algorithms to circuit verification problems. In particular, we use Krawczyk's operator to find all DC operating points of CMOS circuits. We present what we believe to be the first completely automatic verification of the Rambus ring-oscillator start-up problem. Comparisons with the dReal and Z3 SMT solvers show large performance and scalability advantages for the interval verification approach. We provide an open-source implementation that supports state-of-the-art short-channel device models.
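Krawczyk's operator tests whether an interval contains a unique zero of a function and can be iterated to narrow that interval. The one-dimensional sketch below is our simplified illustration (the paper's implementation is multivariate and uses rigorous outward rounding; all names here are ours); it isolates the zero of f(x) = x² − 2 in [1, 2]:

```python
# One-dimensional Krawczyk operator (illustrative sketch).
# With y = mid(X) and C = 1/f'(y):
#   K(X) = y - C*f(y) + (1 - C*f'(X)) * (X - y)
# If K(X) is contained in X, then X contains exactly one zero of f.

def i_add(a, b): return (a[0] + b[0], a[1] + b[1])
def i_sub(a, b): return (a[0] - b[1], a[1] - b[0])
def i_mul(a, b):
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(p), max(p))
def i_scale(c, a):  # scalar times interval
    lo, hi = c * a[0], c * a[1]
    return (min(lo, hi), max(lo, hi))

def krawczyk(f, df, df_box, X):
    """One Krawczyk step on the interval X = (lo, hi)."""
    y = 0.5 * (X[0] + X[1])          # midpoint
    C = 1.0 / df(y)                  # preconditioner
    center = y - C * f(y)            # Newton-like center
    slope = i_sub((1.0, 1.0), i_scale(C, df_box(X)))
    return i_add((center, center), i_mul(slope, i_sub(X, (y, y))))

def f(x): return x * x - 2.0
def df(x): return 2.0 * x
def df_box(X): return (2.0 * X[0], 2.0 * X[1])  # f' is increasing on [1, 2]

X = (1.0, 2.0)
K = krawczyk(f, df, df_box, X)
certified = X[0] <= K[0] and K[1] <= X[1]   # K(X) ⊆ X: unique zero in X
for _ in range(4):
    K = krawczyk(f, df, df_box, X)
    X = (max(X[0], K[0]), min(X[1], K[1]))  # contract: X ← X ∩ K(X)
```

After a few contractions the interval tightly encloses √2, mirroring how the paper's solver encloses DC operating points.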

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-12GENIE: QOS-GUIDED DYNAMIC SCHEDULING FOR CNN-BASED TASKS ON SME CLUSTERS
Speaker:
Zhaoyun Chen, National University of Defense Technology, CN
Authors:
Zhaoyun Chen, Lei Luo, Haoduo Yang, Jie Yu, Mei Wen and Chunyuan Zhang, National University of Defense Technology, CN
Abstract
Convolutional Neural Networks (CNNs) have achieved dramatic progress in emerging Machine Learning (ML) services. Compared to online ML services, offline ML services full of diverse CNN workloads are common in small and medium-sized enterprises (SMEs), research institutes and universities. Efficient scheduling and processing of multiple CNN-based tasks on SME clusters is both significant and challenging. Existing schedulers cannot predict the resource requirements of CNN-based tasks. In this paper, we propose GENIE, a QoS-guided dynamic scheduling framework for SME clusters that achieves users' QoS guarantees and high system utilization. Based on a prediction model derived from lightweight profiling, a QoS-guided scheduling strategy is proposed to identify the best placements for CNN-based tasks. We implement GENIE as a plugin of TensorFlow and experiment with real SME clusters and large-scale simulations. The experimental results demonstrate that the QoS-guided strategy outperforms other baseline schedulers by up to 67.4% and 28.2% in terms of QoS-guarantee percentage and makespan.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-13ADIABATIC IMPLEMENTATION OF MANCHESTER ENCODING FOR PASSIVE NFC SYSTEM
Speaker:
Sachin Maheshwari, University of Westminster, GB
Authors:
Sachin Maheshwari and Izzet Kale, University of Westminster, GB
Abstract
Energy plays an important role in NFC passive tags, as they are powered by radio waves from the reader. Reducing the energy consumption of the tag can therefore extend the interrogation range, increase security and maximize the reader's battery life. For NFC passive communication, the ISO 14443 standard uses Manchester coding for data transmission from the passive tag to the reader in the majority of cases. This paper proposes a novel method of Manchester encoding using the adiabatic logic technique for energy minimization. The design generates replica bits of the actual transmitted bits and then flips the replicas to produce the Manchester-coded bits. The proposed design was implemented using two adiabatic logic families, Positive Feedback Adiabatic Logic (PFAL) and Improved Efficient Charge Recovery Logic (IECRL), which are compared in terms of energy over a range of frequencies. The energy comparison also includes the power-clock generator, designed as a 2-stepwise charging circuit (SWC) with an FSM controller. Simulation results for a 180nm CMOS technology at a 1.8V power supply show that IECRL consumes approximately 40% less system energy than PFAL.
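The replica-and-flip scheme described in the abstract can be sketched in a few lines. The illustration below is ours, not the authors' circuit: each data bit is emitted as the bit followed by its flipped replica, which yields the standard Manchester half-bit pairs (logic 1 as high-then-low, logic 0 as low-then-high).

```python
# Manchester encoding by replica flipping (illustrative sketch):
# each data bit b becomes the half-bit pair (b, NOT b).

def manchester_encode(bits):
    coded = []
    for b in bits:
        coded.append(b)          # actual transmitted bit
        coded.append(b ^ 1)      # flipped replica bit
    return coded

def manchester_decode(coded):
    # Every valid pair is (b, NOT b); recover b from the first half-bit.
    assert len(coded) % 2 == 0
    bits = []
    for i in range(0, len(coded), 2):
        first, second = coded[i], coded[i + 1]
        assert first ^ second == 1, "invalid Manchester pair"
        bits.append(first)
    return bits
```

For example, manchester_encode([1, 0, 1]) produces [1, 0, 0, 1, 1, 0]; in the paper the same pairing is realized by adiabatic logic rather than software.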

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-14A PULSE WIDTH MODULATION BASED POWER-ELASTIC AND ROBUST MIXED-SIGNAL PERCEPTRON DESIGN
Speaker:
Sergey Mileiko, Newcastle University, GB
Authors:
Sergey Mileiko1, Rishad Shafik1, Alex Yakovlev1 and Jonathan Edwards2
1Newcastle University, GB; 2Temporal Computing, GB
Abstract
Neural networks are exerting burgeoning influence in emerging artificial intelligence applications at the micro-edge, such as sensing systems. As many of these systems are typically self-powered, their circuits are expected to be resilient and efficient under the continuous power variations imposed by the harvesters. In this paper, we propose a novel mixed-signal (i.e., analogue/digital) approach to designing a power-elastic perceptron using the principle of pulse width modulation (PWM). Fundamental to the design are a number of parallel inverters that transcode the input-weight pairs based on the PWM duty cycle. Since PWM-based inverters are typically resilient to amplitude and frequency variations, the perceptron shows a high degree of power elasticity and robustness in the presence of these variations. Our extensive design analysis also demonstrates significant power and area efficiency, leading to substantial reductions in dynamic and leakage energy when compared with a purely digital equivalent.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-15FAULT LOCALIZATION IN PROGRAMMABLE MICROFLUIDIC DEVICES
Speaker:
Ulf Schlichtmann, TUM, DE
Authors:
Alessandro Bernardini, Chunfeng Liu, Bing Li and Ulf Schlichtmann, TUM, DE
Abstract
Programmable Microfluidic Devices (PMDs) have revolutionized the traditional biochemical experiment flow, and test algorithms for PMDs have recently been proposed: test patterns can be generated algorithmically. However, an algorithm for localizing faults once they have been detected has not been available. When testing a PMD, once a test pattern fails it is unknown where the stuck valve is located; the stuck valve can be any one of the many valves forming the test pattern. In this paper, we propose an effective algorithm for the localization of stuck-at-0 and stuck-at-1 faults in a PMD. The stuck valve is localized either exactly or to within a very small set of candidate valves. Once the locations of faulty valves are known, it becomes possible to continue using the PMD by resynthesizing the application.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-16THERMAL SENSING USING MICRO-RING RESONATORS IN OPTICAL NETWORK-ON-CHIP
Speaker:
Mengquan Li, Chongqing University, CN
Authors:
Weichen Liu1, Mengquan Li2, Wanli Chang3, Chunhua Xiao2, Yiyuan Xie4, Nan Guan5 and Lei Jiang6
1Nanyang Technological University, SG; 2Chongqing University, CN; 3University of York, GB; 4Southwest University, CN; 5Hong Kong Polytechnic University, HK; 6Indiana University Bloomington, US
Abstract
In this paper, we for the first time utilize the micro-ring resonators (MRs) in optical networks-on-chip (ONoCs) to implement thermal sensing without requiring additional hardware or chip area. The challenges in accuracy and reliability that arise from fabrication-induced process variations (PVs) and the device-level wavelength tuning mechanism are resolved. We quantitatively model the intrinsic thermal sensitivity of MRs with fine-grained consideration of the wavelength tuning mechanism. Based on this model, a novel PV-tolerant thermal sensor design is proposed. By exploiting the hidden 'redundancy' in the wavelength division multiplexing (WDM) technique, our sensor achieves accurate and efficient temperature measurement with the capability of PV tolerance. Evaluation results based on professional photonic component and circuit simulations show an average improvement of 86.49% in measurement accuracy compared to the state-of-the-art on-chip thermal sensing approach using MRs. Our thermal sensor achieves stable performance in ONoCs employing dense WDM, with an inaccuracy of only 0.8650 K.

Download Paper (PDF; Only available from the DATE venue WiFi)

12.1 Special Day on "Model-Based Design of Intelligent Systems" Session: MBD of Safe and Secure Systems

Date: Thursday, March 28, 2019
Time: 16:00 - 17:30
Location / Room: Room 1

Chair:
Frédéric Mallet, Université Nice Sophia Antipolis, FR, Contact Frederic Mallet

Co-Chair:
Marc Geilen, Eindhoven University of Technology, NL, Contact Marc Geilen

TimeLabelPresentation Title
Authors
16:0012.1.1SEMANTIC INTEGRATION PLATFORM FOR CYBER-PHYSICAL SYSTEM DESIGN
Speaker:
Qishen Zhang, Institute for Software Integrated Systems Vanderbilt University, US
Authors:
Qishen Zhang, Ted Bapty, Tamas Kecskes and Janos Sztipanovits, Vanderbilt University, US
Abstract
Cyber-Physical Systems (CPS) span heterogeneous engineering domains, leading to engineering processes that cross multiple design disciplines with separate modeling approaches, design flows and supporting tool suites. One of the challenges of design automation in CPS is the deep integration of models, tools and design flows such that design trade-offs across traditionally isolated design disciplines are facilitated. In this paper we overview experience and results gained during the implementation of an experimental design automation tool suite, OpenMETA, created for a complex CPS design challenge in the ground vehicle domain. The focus of the paper is the domain-agnostic methods and tools providing infrastructure for the model- and tool-integration platforms in OpenMETA. We present the arguments that led to the creation of the integration platforms instead of pursuing ad-hoc integration of heterogeneous tools, and provide details on facilitating semantic integration.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:3012.1.2WORST-CASE CAUSE-EFFECT REACTION LATENCY IN SYSTEMS WITH NON-BLOCKING COMMUNICATION
Speaker:
Yi Wang, Uppsala University, SE
Authors:
Jakaria Abdullah, Gaoyang Dai and Yi Wang, Uppsala University, SE
Abstract
In real-time embedded systems, a system functionality is often implemented using a data-flow chain over a set of communicating tasks. A critical non-functional requirement in such systems is to restrict the amount of time, i.e., the cause-effect latency, for an input to impact its corresponding output. The problem of estimating the worst-case cause-effect latency is well-studied in the context of blocking inter-task communication. Recent research results show that non-blocking communication preserving functional semantics is critical for the model-based design of dynamically updatable systems. In this paper, we study the worst-case cause-effect reaction latency estimation problem in the context of non-blocking inter-task communication. We present a computationally efficient algorithm that tightly over-approximates the exact worst-case reaction latency in cause-effect data-flow chains.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:0012.1.3HARMONIZING SAFETY, SECURITY AND PERFORMANCE REQUIREMENTS IN EMBEDDED SYSTEMS
Speaker:
Ludovic Apvrille, LTCI, Télécom ParisTech, Université Paris-Saclay, FR
Authors:
Ludovic Apvrille and Letitia Li, Télécom ParisTech, FR
Abstract
Connected embedded systems have added new conveniences and safety measures to our daily lives (monitoring, automation, entertainment, etc.), but many of them interact with their users in ways where flaws can have grave impacts on personal health, property, privacy, etc., such as systems in the domains of healthcare, automotive, avionics, and other personal devices with access to sensitive information. Designing these systems with a comprehensive model-driven design process, from requirement elicitation to iterative design, can help detect issues or incongruities within the requirements themselves earlier. This paper discusses how safety, security, and performance requirements can be assured with a systematic design process, and how these properties can support or conflict with each other, as detected during the verification process.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:30End of session

12.2 The Art of Synthesizing Logic

Date: Thursday, March 28, 2019
Time: 16:00 - 17:30
Location / Room: Room 2

Chair:
Jordi Cortadella, UPC, ES, Contact Jordi Cortadella

Co-Chair:
Tiziano Villa, University of Verona, IT, Contact Tiziano Villa

The recent progress in logic synthesis is presented in this session. The research targets new emerging applications of synthesis and extends the current computing limits of logic synthesis. The first paper introduces an approximate synthesis algorithm for realizing a logic function on a lattice using Boolean satisfiability. The second paper presents a scalable Boolean optimization flow including enhancements to difference-based resubstitution, AIG optimization and kerneling. The third paper improves exact logic synthesis techniques and integrates them into a scalable generic logic rewriting algorithm. The fourth paper proposes a polynomial-time algorithm for computing the closest symmetric approximation for a Boolean function.

TimeLabelPresentation Title
Authors
16:0012.2.1A SATISFIABILITY-BASED APPROXIMATE ALGORITHM FOR LOGIC SYNTHESIS USING SWITCHING LATTICES
Speaker:
Levent Aksoy, Istanbul Technical University, TR
Authors:
Levent Aksoy and Mustafa Altun, Istanbul Technical University, TR
Abstract
In recent years the realization of a logic function on two-dimensional arrays of four-terminal switches, called switching lattices, has attracted considerable interest. Exact and approximate methods have been proposed for the problem of synthesizing Boolean functions on switching lattices with minimum size, called the lattice synthesis (LS) problem. However, the exact method can only handle relatively small instances, and the approximate methods may find solutions that are far from the optimum. This paper introduces an approximate algorithm, called JANUS, that formalizes the problem of realizing a logic function on a given lattice, called the lattice mapping (LM) problem, as a satisfiability problem, and explores the search space of the LS problem in a dichotomic search manner, solving LM problems for possible lattice candidates. This paper also presents three methods to improve the initial upper bound and an efficient way to realize multiple logic functions on a single lattice. Experimental results show that JANUS can find solutions very close to the minimum in a reasonable time and obtain better results than the existing approximate methods. The solutions of JANUS can even be better than those of the exact method, which cannot be proven optimal under the given time limit; the maximum gain in the number of switches reaches up to 25%.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:3012.2.2SCALABLE BOOLEAN METHODS IN A MODERN SYNTHESIS FLOW
Speaker:
Eleonora Testa, EPFL, CH
Authors:
Eleonora Testa1, Luca Amaru2, Mathias Soeken1, Alan Mishchenko3, Patrick Vuillod2, Jiong Luo2, Christopher Casares2, Pierre-Emmanuel Gaillardon4 and Giovanni De Micheli1
1EPFL, CH; 2Synopsys Inc., US; 3UC Berkeley, US; 4University of Utah, US
Abstract
With the continuous push to improve Quality of Results (QoR) in Electronic Design Automation (EDA), Boolean methods in logic synthesis have recently been drawing the attention of researchers. Boolean methods achieve better QoR than algebraic methods but require higher computational cost. In this paper, the Scalable Boolean Method (SBM) framework is presented. The SBM consists of 4 optimization engines designed to be scalable in a modern synthesis flow. The first presented engine is a generalized resubstitution framework based on computing, and implementing, the Boolean difference between two nodes. The second consists of a gradient-based AIG optimization, while the third one is based on heterogeneous elimination for kerneling. The last proposed engine is a revisiting of Maximum Set of Permissible Functions (MSPF) computation with BDDs. Altogether, the SBM framework enables promising synthesis results. We improve 12 of the best known area results in the EPFL synthesis competition. Embedded in a commercial EDA flow for state-of-the-art ASICs, the new Boolean methods enable 2.20% combinational area savings and 5.99% total negative slack reduction after physical implementation, at contained runtime cost.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:0012.2.3ON-THE-FLY AND DAG-AWARE: REWRITING BOOLEAN NETWORKS WITH EXACT SYNTHESIS
Speaker:
Heinz Riener, EPFL, CH
Authors:
Heinz Riener1, Winston Haaswijk1, Alan Mishchenko2, Giovanni De Micheli1 and Mathias Soeken1
1EPFL, CH; 2UC Berkeley, US
Abstract
The paper presents a generalization of DAG-aware AIG rewriting for k-feasible Boolean networks, whose nodes are k-input lookup tables (k-LUTs). We introduce a DAG-aware rewriting algorithm, called cut rewriting, that uses exact synthesis to compute replacements on the fly. Cut rewriting pre-computes a large number of possible replacement candidates, but instead of eagerly rewriting the Boolean network, stores the replacements in a conflict graph. Heuristic optimization is used to determine a best, maximal subset of replacements that can be simultaneously applied to the Boolean network. We have implemented cut rewriting and have optimized 3-LUT mapped Boolean networks obtained from the ISCAS and EPFL combinational benchmark suites. For 3-LUT networks, experiments show that we achieve an average size improvement of 5.58% and up to 40.19% after state-of-the-art Boolean rewriting techniques were applied until saturation. Similarly, for 4-LUT networks, we obtain an average improvement of 4.04% and up to 12.60%.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:1512.2.4APPROXIMATE LOGIC SYNTHESIS BY SYMMETRIZATION
Speaker:
Anna Bernasconi, Università di Pisa, IT
Authors:
Anna Bernasconi1, Valentina Ciriani2 and Tiziano Villa3
1Università di Pisa, IT; 2Università degli Studi di Milano, IT; 3Dipartimento di Informatica, Università di Verona, IT
Abstract
Approximate synthesis is a recent trend in logic synthesis that changes some outputs of a logic specification to take advantage of the error tolerance of some applications and to reduce the complexity and power consumption of the final implementation. We propose a new approach to approximate synthesis of combinational logic where we derive its closest symmetric approximation, i.e., the symmetric function obtained by injecting the minimum number of errors in the original function. Since BDDs of totally symmetric functions are quite compact, this approach is particularly convenient for BDD-based implementations, such as networks of MUXes directly mapped from BDDs. Our contribution is twofold: first we propose a polynomial algorithm for computing the closest symmetric approximation of an incompletely specified Boolean function with an unbounded number of errors; then we discuss strategies to achieve partial symmetrization of the original specification while satisfying given error bounds. Experimental results on classical and new benchmarks confirm the efficacy of the proposed approach.
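A totally symmetric function depends only on the number of 1s in its input, so the closest symmetric approximation of a completely specified function can be obtained by a majority vote within each Hamming-weight class. The sketch below is our illustration of this idea for small completely specified functions (the paper additionally handles incompletely specified functions and error bounds); it returns the symmetric weight profile and the minimum number of injected errors:

```python
# Closest symmetric approximation of a completely specified Boolean
# function by majority vote per Hamming-weight class (illustrative).
from itertools import product

def closest_symmetric(f, n):
    """f maps n-bit tuples to 0/1; returns (weight profile, #errors)."""
    ones = [0] * (n + 1)    # inputs of each weight mapped to 1 by f
    size = [0] * (n + 1)    # number of inputs of each weight
    for x in product((0, 1), repeat=n):
        w = sum(x)
        size[w] += 1
        ones[w] += f(x)
    # The majority value in each weight class minimizes errors there.
    profile = [1 if 2 * ones[w] >= size[w] else 0 for w in range(n + 1)]
    errors = sum(size[w] - ones[w] if profile[w] else ones[w]
                 for w in range(n + 1))
    return profile, errors
```

For instance, the projection f(x0, x1, x2) = x0 is closest to the majority-of-three function (profile [0, 0, 1, 1]) at a cost of two injected errors.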

Download Paper (PDF; Only available from the DATE venue WiFi)
17:30End of session

12.3 Aging, calibration circuits and yield

Date: Thursday, March 28, 2019
Time: 16:00 - 17:30
Location / Room: Room 3

Chair:
Hank Walker, TAMU, US, Contact Hank Walker

Co-Chair:
Naghmeh Karimi, University of Maryland, Baltimore County, US, Contact Naghmeh Karimi

This session discusses methods to mitigate defects, faults and variability, and to improve reliability.

TimeLabelPresentation Title
Authors
16:0012.3.1PACKAGE AND CHIP ACCELERATED AGING TESTS FOR POWER MOSFET RELIABILITY EVALUATION
Speaker:
Tingyou Lin, Department of LAD Technology, Vanguard International Semiconductor, TW
Authors:
Tingyou Lin1, Chauchin Su1, Chung-Chih Hung1, Karuna Nidhi2, Chily Tu2 and Shao-Chang Huang2
1National Chiao Tung University, TW; 2Vanguard International Semiconductor Corporation, TW
Abstract
This paper investigates power MOSFET stress conditions for package-aging and chip-aging evaluation, with the aim of reducing the measurement time needed to characterize parameter shifts as components age. For semiconductor devices, reliability lifetime is related to the device operating temperature and its electric field. In power semiconductors, the junction temperature is related to the power pulse time, the chip size, and the heat sinking of the device's self-heating effect. To model power MOSFET lifetime, a new method is proposed that accelerates aging by controlling the pulse time and the power dissipation. A test chip was designed and fabricated in a 0.15μm BCD process. The measured results demonstrate that a power MOSFET of 10kμm width exhibits a 72% increase in Rds,on after a total package-aging stress time of 6.3hr. For chip aging, the measured results show a 12% increase in Ron after 600 stress pulses. The measurements verify that accelerated aging of the package and the chip can be controlled separately.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:3012.3.2BAYESIAN OPTIMIZED IMPORTANCE SAMPLING FOR HIGH SIGMA FAILURE RATE ESTIMATION
Speaker:
Dennis Weller, Karlsruhe Institute of Technology, DE
Authors:
Dennis Weller, Michael Hefenbrock, Mohammad Saber Golanbari, Michael Beigl and Mehdi Tahoori, Karlsruhe Institute of Technology, DE
Abstract
Due to aggressive technology downscaling, process and runtime variations have a strong impact on correct functionality in the field as well as on manufacturing yield. The assessment of yield and failure rate is extremely crucial for design optimization. The common practice is to use Monte Carlo simulations to account for device variations and estimate the failure rate. However, Monte Carlo methods are infeasible for estimating rare events such as high-sigma failure rates, and hence various importance sampling methods have been proposed. In this paper, we present an efficient importance sampling approach based on Bayesian optimization. Its advantages include constant complexity independent of the dimensions of the design space, the potential to find the global extrema, and higher trustworthiness of the estimated failure rate. We evaluated the approach on a 6T SRAM cell based on a 28nm FDSOI process. The results show significant speedup and more than two orders of magnitude better accuracy in failure rate estimation, compared to the best state-of-the-art technique.
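The core idea of importance sampling for rare failure events is to draw samples from a proposal distribution shifted toward the failure region and reweight them by the likelihood ratio. The one-dimensional sketch below is ours, with a hand-picked shift; the paper's contribution is choosing such proposals automatically via Bayesian optimization. It estimates a 4-sigma tail probability of a standard normal variation:

```python
# Mean-shift importance sampling for a rare tail event P(Z > t),
# Z ~ N(0, 1). The proposal N(mu, 1) is centered near the failure
# region; mu = t is a hand-picked shift (illustrative only).
import math
import random

def failure_prob_is(t, mu, n, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu, 1.0)
        if x > t:  # indicator of the failure event
            # Likelihood ratio phi(x)/phi(x - mu) for unit-variance normals.
            total += math.exp(-mu * x + 0.5 * mu * mu)
    return total / n

def phi_tail(t):
    # Exact standard-normal tail via the complementary error function.
    return 0.5 * math.erfc(t / math.sqrt(2.0))
```

With t = 4 and a few thousand samples, the estimate lands within a few percent of the exact value of roughly 3.17e-5, whereas plain Monte Carlo at that sample size would rarely observe even one failure.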

Download Paper (PDF; Only available from the DATE venue WiFi)
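The core mean-shifting idea behind importance sampling for high-sigma tails can be sketched in a few lines of Python (a minimal, hypothetical illustration using a fixed shifted-normal proposal; the Bayesian optimization of the proposal from the paper is not reproduced here):

```python
import math
import random

def tail_prob_is(threshold, shift, n=100_000, seed=0):
    """Estimate P(X > threshold) for X ~ N(0,1) by importance sampling:
    draw from the shifted proposal N(shift, 1) so that 'failures'
    (x > threshold) become common, then reweight each hit by the
    likelihood ratio phi(x)/q(x) = exp(-shift*x + shift^2/2)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        x = rng.gauss(shift, 1.0)
        if x > threshold:
            acc += math.exp(-shift * x + 0.5 * shift * shift)
    return acc / n

# 4-sigma tail of the standard normal; exact value is Q(4) ~= 3.17e-5,
# which plain Monte Carlo would need millions of samples to resolve.
est = tail_prob_is(4.0, shift=4.0)
```

Centering the proposal at the threshold keeps the weight variance low, which is why a well-chosen shift (here fixed by hand, in the paper found by Bayesian optimization) matters.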
17:0012.3.3WAFER-LEVEL ADAPTIVE VMIN CALIBRATION SEED FORECASTING
Speaker:
Yiorgos Makris, The University of Texas at Dallas, US
Authors:
Constantinos Xanthopoulos1, Deepika Neethirajan1, Sirish Boddikurapati2, Amit Nahar3 and Yiorgos Makris1
1The University of Texas at Dallas, US; 2Texas Instruments Inc., US; 3Texas Instruments Inc., US
Abstract
To combat the effects of process variation in modern, high-performance Integrated Circuits (ICs), various post-manufacturing calibrations are typically performed. These calibrations aim to bring each device within its specification limits and ensure that it abides by current technology standards. Moreover, with the increasing popularity of mobile devices that usually depend on finite energy sources, power consumption has been introduced as an additional constraint. As a result, post-silicon calibration is often performed to identify the optimal operating voltage (Vmin) of a given IC. This calibration is time-consuming, as it requires the device to be tested across a wide range of input voltages and a large number of tests. In this work, we propose a machine learning-based methodology for reducing the cost of performing the Vmin calibration search by identifying an optimal wafer-level starting voltage (seed). The effectiveness of the proposed methodology is demonstrated on an industrial dataset.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:1512.3.4SINGLE-EVENT DOUBLE-UPSET SELF-RECOVERABLE AND SINGLE-EVENT TRANSIENT PULSE FILTERABLE LATCH DESIGN FOR LOW POWER APPLICATIONS
Speaker:
Aibin Yan, Anhui University, CN
Authors:
Aibin Yan1, Yuanjie Hu1, Jie Song1 and Xiaoqing Wen2
1Anhui University, CN; 2Kyushu Institute of Technology, JP
Abstract
This paper presents a single-event double-upset (SEDU) self-recoverable and single-event transient (SET) pulse filterable latch design for low power applications in 22nm CMOS technology. The latch mainly consists of eight mutually feeding back C-elements and a Schmitt trigger. Simulation results have demonstrated both the SEDU self-recoverability and SET pulse filterability for the latch using redundant silicon area. Using clock gating technology, the latch saves about 54.85% power dissipation on average compared with the up-to-date SEDU self-recoverable latch designs which are not SET pulse filterable at all.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:30End of session

12.4 Design and Optimization for Low-Power Applications

Date: Thursday, March 28, 2019
Time: 16:00 - 17:30
Location / Room: Room 4

Chair:
Giuseppe Tagliavini, Università di Bologna, IT, Contact Giuseppe Tagliavini

Co-Chair:
Jan Madsen, Technical University of Denmark, DK, Contact Jan Madsen

This session explores low-power design from different points of view, from neural network-based scheduling of multicores and image processing, to ultra-low power for near-threshold computing and continuous-monitoring IoT sensors.

TimeLabelPresentation Title
Authors
16:0012.4.1DYNAMIC SCHEDULING ON HETEROGENEOUS MULTICORES
Speaker:
Ann Franchesca Laguna, University of Notre Dame, US
Authors:
Ayobami Edun, Ruben Vazquez, Ann Gordon-Ross and Greg Stitt, University of Florida, US
Abstract
Heterogeneous multicore systems help meet design goals by using different architectural components that are suitable for different application needs. The individual cores may also have different tunable architectural parameters for additional specialization. However, this creates a challenge in mapping applications to the cores whose configurations best match the applications' needs. This decision can be made by performing a sample run of the application on each core type and configuration, or by using heuristics to explore the design space; for complex systems, however, these methods may be infeasible. In this paper, we present a methodology for dynamic scheduling of applications on heterogeneous multicore systems using predictive methods for reduced energy consumption. We use an artificial neural network (ANN) to train our predictive model using hardware counters in the system. The trained network can then predict the best configuration. Our scheduler uses this prediction to schedule the application to the best core (the core that offers the best configuration) and configures that core accordingly. If the best core is busy, alternative idle cores are considered for scheduling, or the application is stalled; this decision is based on which option offers the greater energy advantage. Our experiments show that the total energy of a system can be reduced by 28% on average compared to a system that uses the same fixed cache configuration for all cores.

Download Paper (PDF; Only available from the DATE venue WiFi)
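The scheduling decision described above — run on the best core, fall back to an idle core, or stall, whichever is predicted cheaper — can be sketched as follows (a toy Python sketch; `predict_energy` stands in for the trained ANN, and the core and cost structures are assumptions for the example):

```python
def schedule(app_counters, cores, predict_energy, stall_cost):
    """Toy predictive scheduler: pick the core with the lowest predicted
    energy for this application; if that core is busy, stall only when
    waiting for it is predicted cheaper than running on the best idle core."""
    best = min(cores, key=lambda c: predict_energy(app_counters, c))
    if not best["busy"]:
        return ("run", best["name"])
    idle = [c for c in cores if not c["busy"]]
    if not idle:
        return ("stall", best["name"])
    alt = min(idle, key=lambda c: predict_energy(app_counters, c))
    # energy-advantage test between stalling for 'best' and running on 'alt'
    if predict_energy(app_counters, best) + stall_cost < predict_energy(app_counters, alt):
        return ("stall", best["name"])
    return ("run", alt["name"])
```

With a cheap "big" core that is busy, a small stall cost favours waiting, while a large one diverts the application to the idle "little" core.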
16:3012.4.2SELECTING THE OPTIMAL ENERGY POINT IN NEAR-THRESHOLD COMPUTING
Speaker:
Sami Salamin, Karlsruhe Institute of Technology (KIT), DE
Authors:
Sami Salamin, Hussam Amrouch and Joerg Henkel, Karlsruhe Institute of Technology, DE
Abstract
Near-Threshold Computing (NTC) has recently emerged as an attractive paradigm as it allows devices to operate close to their optimal energy point (OEP). This work demonstrates, for the first time, that determining where the OEP of a processor lies is challenging because the standard cells forming the processor's netlist profit unevenly w.r.t. power and also degrade unevenly w.r.t. delay as the voltage approaches the near-threshold region. To precisely explore, at design time, where the OEP lies, we create voltage-aware cell libraries that enable designers to seamlessly employ standard tool flows, even though these were not designed for that purpose, to perform voltage-aware timing and power analysis. Besides determining where the OEP lies, we also demonstrate how providing logic synthesis tool flows with voltage-aware cell libraries results in a 35% higher performance at NTC. In addition, we investigate how the performance loss at NTC can be compensated through parallelized computing, demonstrating, for the first time, that the OEP moves away from NTC as the number of cores increases. Our proposed methodology enables designers to jointly select the maximum number of cores and the optimal operating voltage such that a specific power budget is fulfilled. Finally, we show how voltage-aware design for parallelized NTC provides a [40%-50%] performance increase compared to traditional (i.e., voltage-unaware) parallelized NTC.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:0012.4.3EXPLORATION AND DESIGN OF LOW-ENERGY LOGIC CELLS FOR 1 KHZ ALWAYS-ON SYSTEMS
Speaker:
Maxime Feyerick, ESAT-MICAS, KU Leuven, BE
Authors:
Maxime Feyerick, Jaro De Roose and Marian Verhelst, KU Leuven, BE
Abstract
A standard cell library targeting always-on operation at 1 kHz is designed at circuit level. This paper proposes a design methodology to achieve robust operation with minimum energy. The minimum energy per operation for always-on systems is achieved by one specific combination of supply voltage and threshold voltage Vth. As Vth is discrete in a practical bulk technology, this minimum cannot be achieved through simple voltage tuning. In the considered 90 nm CMOS technology, Vth is too low, resulting in leakage-dominated systems and preventing attainment of the minimum-energy point in subthreshold. Three circuit techniques are optimally combined to fight leakage: stacking, reverse body biasing, and optimal transistor dimensioning relying on second-order effects of the dimensions on Vth. They jointly allow logic gates to achieve the best balance between dynamic and leakage power. Moreover, the paper presents modified flip-flop topologies that also operate reliably at 0.27 V along with the gates. The benefits of the improved logic gates and flip-flops are demonstrated on a small always-on feature-extraction system calculating running average and variance on a 1 Ksample/s data stream. The resulting system consumes 162 pW in simulation: two orders of magnitude less than a commercial library at its 1 V nominal voltage, and one order of magnitude less than the commercial library at the same 0.27 V operating voltage.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:1512.4.4ENABLING ENERGY-EFFICIENT UNSUPERVISED MONOCULAR DEPTH ESTIMATION ON ARMV7-BASED PLATFORMS
Speaker:
Antonio Cipolletta, Politecnico di Torino, IT
Authors:
Valentino Peluso1, Antonio Cipolletta1, Andrea Calimera1, Matteo Poggi2, Fabio Tosi2 and Stefano Mattoccia2
1Politecnico di Torino, IT; 2Università di Bologna, IT
Abstract
This work deals with the implementation of energy-efficient monocular depth estimation using a low-cost CPU for low-power embedded systems. The paper first describes the PyD-Net depth estimation network, which consists of a lightweight CNN able to approach state-of-the-art accuracy with ultra-low resource usage. Then it proposes an accuracy-driven complexity reduction strategy based on a hardware-friendly fixed-point quantization. Finally, it introduces the low-level optimization enabling effective use of integer neural kernels. The objective is threefold: (i) prove the efficiency of the new quantization flow on a depth estimation network, that is, its capability to retain the accuracy reached by floating-point arithmetic using 16- and 8-bit integers; (ii) demonstrate the portability of the quantized model to a general-purpose 32-bit RISC architecture of the ARM Cortex family; (iii) quantify the accuracy-energy tradeoff of unsupervised monocular estimation to establish its use in the embedded domain. The experiments were run on a Raspberry Pi board powered by a Broadcom BCM2837 chipset. A parametric analysis conducted on the KITTI dataset shows marginal accuracy loss with 16-bit (8-bit) integers and energy savings of up to 6.55x (9.23x) w.r.t. floating-point. Compared to high-end CPUs and GPUs, the proposed solution improves scalability.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:30End of session

12.5 System Modelling for Analysis and Simulation

Date: Thursday, March 28, 2019
Time: 16:00 - 17:30
Location / Room: Room 5

Chair:
Ingo Sander, KTH Royal Institute of Technology, SE, Contact Ingo Sander

Co-Chair:
Gianluca Palermo, Politecnico di Milano, IT, Contact Gianluca Palermo

The session highlights the importance of system modelling for design, performance analysis and optimisation. The first paper proposes a novel dataflow model of computation supporting reconfigurability for dynamic systems. The second paper combines the synchronous dataflow model of computation with a probabilistic method for real-time analysis. Finally, the last paper addresses the simulation of SystemC-based virtual prototypes using speculative temporal decoupling.

TimeLabelPresentation Title
Authors
16:0012.5.1RDF: RECONFIGURABLE DATAFLOW
Speaker:
Xavier Nicollin, Univ. Grenoble Alpes, FR
Authors:
Pascal Fradet1, Alain Girault1, Ruby Krishnaswamy2, Xavier Nicollin3 and Arash Shafiei2
1INRIA, FR; 2Orange, FR; 3G-INP, FR
Abstract
Dataflow Models of Computation (MoCs) are widely used in embedded systems, including multimedia processing, digital signal processing, telecommunications, and automatic control. In a dataflow MoC, an application is specified as a graph of actors connected by FIFO channels. One of the most popular dataflow MoCs, Synchronous Dataflow (SDF), provides static analyses to guarantee boundedness and liveness, which are key properties for embedded systems. However, SDF (and most of its variants) lacks the capability to express the dynamism needed by modern streaming applications. In particular, the applications mentioned above have a strong need for reconfigurability to accommodate changes in the input data, the control objectives, or the environment. We address this need by proposing a new MoC called Reconfigurable Dataflow (RDF). RDF extends SDF with transformation rules that specify how the topology and actors of the graph may be reconfigured. Starting from an initial RDF graph and a set of transformation rules, an arbitrary number of new RDF graphs can be generated at runtime. A key feature of RDF is that it can be statically analyzed to guarantee that all possible graphs generated at runtime will be consistent and live. We introduce the RDF MoC, describe its associated static analyses, and outline its implementation.

Download Paper (PDF; Only available from the DATE venue WiFi)
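The static consistency analysis that SDF (and hence RDF) relies on boils down to solving the balance equations of the graph. A minimal Python sketch, assuming a connected graph with edges given as (producer, production rate, consumption rate, consumer):

```python
from fractions import Fraction
from functools import reduce
from math import lcm

def repetition_vector(actors, edges):
    """Solve the SDF balance equations r[src]*prod == r[dst]*cons by
    propagating rate ratios from an arbitrary first actor; return the
    smallest positive integer solution, or None if the graph is
    inconsistent (not schedulable in bounded memory)."""
    adj = {}
    for src, prod, cons, dst in edges:
        adj.setdefault(src, []).append((dst, Fraction(prod, cons)))
        adj.setdefault(dst, []).append((src, Fraction(cons, prod)))
    rate = {actors[0]: Fraction(1)}
    work = [actors[0]]
    while work:
        a = work.pop()
        for b, ratio in adj.get(a, []):
            r = rate[a] * ratio
            if b not in rate:
                rate[b] = r
                work.append(b)
            elif rate[b] != r:
                return None  # contradictory rates: inconsistent graph
    scale = reduce(lcm, (rate[a].denominator for a in actors), 1)
    return {a: int(rate[a] * scale) for a in actors}
```

For A --(2,3)--> B --(1,2)--> C the smallest solution fires A three times, B twice and C once per iteration; adding an edge that contradicts these ratios makes the graph inconsistent.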
16:3012.5.2PROBABILISTIC STATE-BASED RT-ANALYSIS OF SDFGS ON MPSOCS WITH SHARED MEMORY COMMUNICATION
Speaker:
Ralf Stemmer, Carl von Ossietzky Universität Oldenburg, DE
Authors:
Ralf Stemmer1, Henning Schlender1, Maher Fakih2, Kim Grüttner2 and Wolfgang Nebel1
1University of Oldenburg, DE; 2OFFIS e.V., DE
Abstract
This paper extends a state-based timing analysis for Synchronous Dataflow Applications on an MPSoC with shared memory. The existing approach transforms a mapped and timing annotated SDF graph into a timed automata representation for the analysis of timing properties. One major drawback of the existing timing annotation approach is the usage of best- and worst-case execution time intervals, resulting in an overestimation of the actual timing behavior. This paper proposes to replace the timing bound annotation with a probability density function. For the overall timing analysis we use a stochastic timed automata model. We demonstrate and evaluate our approach on a Sobel filter, which is used in many image and video processing algorithms. As a reference, we compare our stochastic execution time model against a fixed best-/worst-case execution time model and against the measured execution time on an FPGA prototype. The results are promising and clearly indicate that our probabilistic approach provides tighter timing analysis results in comparison to the best-/worst-case execution analysis model.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:0012.5.3SPECULATIVE TEMPORAL DECOUPLING USING FORK()
Speaker:
Matthias Jung, Fraunhofer IESE, DE
Authors:
Matthias Jung1, Frank Schnicke1, Markus Damm1, Thomas Kuhn1 and Norbert Wehn2
1Fraunhofer IESE, DE; 2University of Kaiserslautern, DE
Abstract
Temporal decoupling is a state-of-the-art method to speed up virtual prototypes. In this technique, a process is allowed to run ahead of simulation time for a specific interval called the quantum. This reduces the number of synchronization points, i.e. context switches, in the simulator, so the simulation speed can be increased significantly. However, this approach can introduce functional simulation errors due to missed synchronization events. Thus, using temporal decoupling implies a trade-off between speed and accuracy, and the size of the quantum must be chosen wisely with respect to the simulated application. In loosely timed simulations, most of the functional errors are tolerable for the sake of simulation speed. Safety-critical errors, however, are rare but can lead to fatal results and must be handled carefully. Prior works present mechanisms based on checkpoints (storing/restoring the internal state of the simulation model) in order to roll back in simulation time and correct the occurring errors by forcing synchronization. However, checkpointing approaches are intrusive and require changes to both the source code of all the used simulation models and the kernel of the simulator. In this paper we present a non-intrusive rollback approach for error-free temporal decoupling, which allows the use of closed-source models, by using Unix's fork() system call. Furthermore, we provide a case study based on the IEEE simulation standard SystemC.

Download Paper (PDF; Only available from the DATE venue WiFi)
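The fork()-based rollback idea can be illustrated with a toy Python sketch (POSIX-only; the "simulation state" is reduced to an integer, and the step/check functions are assumptions for the example — the parent's copy-on-write memory serves as the implicit checkpoint):

```python
import os

def run_quantum(state, quantum, step, check):
    """Run one temporally-decoupled quantum speculatively in a fork()ed
    child. If the child reports a missed-synchronization error, the
    parent's state is untouched (the implicit checkpoint) and the
    quantum is re-run step-by-step with forced synchronization."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                        # child: speculate over the full quantum
        os.close(r)
        try:
            new_state = step(state, quantum)
            ok = check(new_state)
        except Exception:
            ok = False
        if ok:
            os.write(w, str(new_state).encode())
        os._exit(0 if ok else 1)
    os.close(w)                         # parent
    data = os.read(r, 65536)
    os.close(r)
    _, status = os.waitpid(pid, 0)
    if os.WEXITSTATUS(status) == 0:
        return int(data.decode())       # commit the speculative result
    for _ in range(quantum):            # rollback: redo one step at a time
        state = step(state, 1)
    return state
```

Because the child is discarded on failure, no model needs to implement save/restore hooks — which is what makes the approach usable with closed-source models.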
17:30End of session

12.6 Trojans and public key implementation challenges

Date: Thursday, March 28, 2019
Time: 16:00 - 17:30
Location / Room: Room 6

Chair:
Patrick Schaumont, Virginia Tech, US, Contact Patrick Schaumont

Co-Chair:
Nele Mentens, KU Leuven, BE, Contact Nele Mentens

This session contains two papers on Trojans, one on formal methods to design and detect them, and the other on practical attacks in the context of multi-tenant FPGAs. The other two papers discuss implementation challenges of public-key cryptography on ASICs and FPGAs.

TimeLabelPresentation Title
Authors
16:0012.6.1WHEN CAPACITORS ATTACK: FORMAL METHOD DRIVEN DESIGN AND DETECTION OF CHARGE-DOMAIN TROJANS
Speaker:
Yier Jin, University of Florida, US
Authors:
Xiaolong Guo1, Huifeng Zhu2, Yier Jin1 and Xuan Zhang2
1University of Florida, US; 2Washington University in St. Louis, US
Abstract
The rapid growth and globalization of the integrated circuit (IC) industry put the threat of hardware Trojans (HTs) front and center among all security concerns in the IC supply chain. Current Trojan detection approaches always assume HTs are composed of digital circuits. However, recent demonstrations of analog attacks, such as A2 and Rowhammer, invalidate the digital assumption in previous HT detection or testing methods. At the system level, attackers can utilize the analog properties of the underlying circuits such as charge-sharing and capacitive coupling effects to create information leakage paths. These new capacitor-based vulnerabilities are rarely covered in digital testing. To address these stealthy yet harmful threats, we identify a large class of such capacitor-enabled attacks and define them as charge-domain Trojans. We are able to abstract the detailed charge-domain models for these Trojans and expose the circuit-level properties that critically contribute to their information leakage paths. Aided by the abstract models, an information flow tracking (IFT) based solution is developed to detect charge-domain leakage paths and then identify the charge-domain Trojans/vulnerabilities. Our proposed method is validated on an experimental RISC microcontroller design injected with different variants of charge-domain Trojans. We demonstrate that successful detection can be accomplished with an automatic tool which realizes the IFT-based solution.

Download Paper (PDF; Only available from the DATE venue WiFi)
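Generic gate-level information flow tracking, the mechanism the detection builds on, can be sketched as follows (a simplified Python illustration of value-aware taint propagation; the charge-domain circuit models from the paper are not reproduced):

```python
def ift_step(gates, values, taint):
    """One pass of gate-level information flow tracking over a
    topologically ordered netlist: a gate output is tainted when a
    tainted input can actually influence it. For an AND gate, an
    untainted 0 input forces the output and masks taint elsewhere."""
    for out, op, ins in gates:
        vs = [values[i] for i in ins]
        ts = [taint[i] for i in ins]
        if op == "and":
            values[out] = all(vs)
            forced = any((not v) and (not t) for v, t in zip(vs, ts))
            taint[out] = (not forced) and any(ts)
        elif op == "not":
            values[out] = not vs[0]
            taint[out] = ts[0]
    return values, taint
```

A tainted secret ANDed with an untainted 0 stays hidden, while ANDed with an untainted 1 it flows to the output — the kind of conditional leakage path an IFT tool searches for.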
16:3012.6.2FOURQ ON ASIC: BREAKING SPEED RECORDS FOR ELLIPTIC CURVE SCALAR MULTIPLICATION
Speaker:
Hiromitsu Awano, The University of Tokyo, JP
Authors:
Hiromitsu Awano and Makoto Ikeda, The University of Tokyo, JP
Abstract
An ASIC cryptoprocessor for scalar multiplication (SM) on FourQ is proposed. By exploiting Karatsuba multiplication and lazy reduction techniques, the arithmetic units of the proposed processor are tailored for operations over the quadratic extension field (Fp2). We also propose an automated instruction scheduling methodology based on a combinatorial optimization solver to fully exploit the available instruction-level parallelism. With the proposed processor fabricated using a 65 nm silicon-on-thin-box (SOTB) CMOS process, we demonstrate that an SM can be computed in 10.1 μs when a typical operating voltage of 1.20 V is applied, which corresponds to a 3.66x acceleration compared to the conventional P-256 curve SM accelerator implemented on an ASIC platform and is the fastest ever reported. We also demonstrate that by lowering the supply voltage down to 0.32 V, the lowest ever reported energy consumption of 0.327 μJ/SM is achieved.

Download Paper (PDF; Only available from the DATE venue WiFi)
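The Karatsuba technique mentioned above replaces four half-width multiplications with three. A compact Python sketch of the classical algorithm (illustrative only — the processor implements it in hardware over Fp2):

```python
def karatsuba(x, y):
    """Karatsuba multiplication of non-negative integers: split each
    operand into high and low halves and compute only three recursive
    products, since (hx+lx)(hy+ly) - hx*hy - lx*ly = hx*ly + lx*hy."""
    if x < 16 or y < 16:
        return x * y                      # small operands: multiply directly
    n = max(x.bit_length(), y.bit_length()) // 2
    hi_x, lo_x = x >> n, x & ((1 << n) - 1)
    hi_y, lo_y = y >> n, y & ((1 << n) - 1)
    z2 = karatsuba(hi_x, hi_y)
    z0 = karatsuba(lo_x, lo_y)
    z1 = karatsuba(hi_x + lo_x, hi_y + lo_y) - z2 - z0
    return (z2 << (2 * n)) + (z1 << n) + z0
```

Trading one multiplication for a few additions is what makes the approach attractive for wide field arithmetic, where multipliers dominate area and delay.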
17:0012.6.3DARL: DYNAMIC PARAMETER ADJUSTMENT FOR LWE-BASED SECURE INFERENCE
Speaker:
Song Bian, Kyoto University, JP
Authors:
Song Bian, Masayuki Hiromoto and Takashi Sato, Kyoto University, JP
Abstract
Packed additive homomorphic encryption (PAHE)-based secure neural network inference is attracting increasing attention in the field of applied cryptography. In this work, we seek to improve the practicality of LWE-based secure inference by dynamically changing the cryptographic parameters depending on the underlying architecture of the neural network. We develop and apply theoretical methods to closely examine the error behavior of secure inference, and propose parameters that can reduce the ciphertext size by as much as 67% when smaller networks are used. In addition, we use rare-event simulation techniques based on the sigma-scale sampling method to provide tight bounds on the size of cumulative errors drawn from (somewhat) arbitrary distributions. In the experiment, we instantiate an example PAHE scheme and show that we can further reduce the ciphertext size by 3.3x if we adopt a binarized neural network architecture, along with a computation speedup of 2x-3x.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:1512.6.4TIMING VIOLATION INDUCED FAULTS IN MULTI-TENANT FPGAS
Speaker:
Mirjana Stojilovic, EPFL, CH
Authors:
Dina Mahmoud and Mirjana Stojilovic, EPFL, CH
Abstract
FPGAs have made their way into the cloud, allowing users to gain remote access to the state-of-the-art reconfigurable fabric and implement their custom accelerators. Since FPGAs are large enough to accommodate multiple independent designs, the multi-tenant user scenario may soon be prevalent in cloud computing environments. However, shared use of an FPGA raises security concerns. Recently discovered hardware Trojans for use in multi-tenant FPGA settings target denial-of-service attacks, power side-channel attacks, and crosstalk side-channel attacks. In this work, we present an attack method for causing timing- constraints violation in the multi-tenant FPGA setting. This type of attack is very dangerous as the consequences of timing faults are temporary errors, which are often impossible to notice. We demonstrate the attack on a set of self-timed true random number generators (STRNGs), frequently used in cryptographic applications. When the attack is launched, the STRNG outputs become biased and fail randomness tests. However, after the attack, STRNGs recover and continue generating random bits.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:30End of session

12.7 Emerging Strategies for Deep Neural Network Hardware

Date: Thursday, March 28, 2019
Time: 16:00 - 17:30
Location / Room: Room 7

Chair:
Jim Harkin, University of Ulster, GB, Contact Jim Harkin

Co-Chair:
Li Jiang, Shanghai Jiao Tong University, CN, Contact Li Jiang

This session presents new approaches to the acceleration of deep neural networks, focused on ReRAM-based architectures, with papers addressing the key challenges of reliable operation with unreliable devices and strategies to counter aging effects. In addition, 3D ReRAM is proposed for the acceleration of general graphics processing. Continuing the evolution of stochastic computing, emerging work on low-cost, energy-efficient convolutional neural networks with deterministic bit-stream processing is also explored.

TimeLabelPresentation Title
Authors
16:0012.7.1AGING-AWARE LIFETIME ENHANCEMENT FOR MEMRISTOR-BASED NEUROMORPHIC COMPUTING
Speaker:
Shuhang Zhang, TUM, DE
Authors:
Shuhang Zhang1, Grace Li Zhang1, Bing Li1, Hai (Helen) Li2 and Ulf Schlichtmann1
1TUM, DE; 2Duke University, US
Abstract
Deep Neural Networks (DNNs) have been applied in various fields successfully. Such networks, however, require significant computing resources. Traditional CMOS-based implementations cannot efficiently realize specific computing patterns such as matrix multiplication. Therefore, memristor-based crossbars have been proposed to accelerate such computing tasks by their analog nature, which also leads to a significant reduction of power consumption. Neural networks must be trained to recognize the features of the applications. This training process leads to many repetitive updates of the memristors in the crossbar. However, memristors in the crossbar can only be programmed reliably for a given number of times. Afterwards, the working range of the memristors deviates from the fresh state. As a result, the weights of the corresponding neural networks cannot be implemented correctly and the classification accuracy drops significantly. This phenomenon is called aging, and it limits the lifetime of memristor-based crossbars. In this paper, we propose a co-optimization framework that reduces the aging effect in software training and hardware mapping simultaneously. Experimental results demonstrate that the proposed framework can extend the lifetime of such crossbars by up to 15 times, while the expected classification accuracy is maintained.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:3012.7.2ENERGY-EFFICIENT CONVOLUTIONAL NEURAL NETWORKS WITH DETERMINISTIC BIT-STREAM PROCESSING
Speaker:
M. Hassan Najafi, University of Louisiana at Lafayette, US
Authors:
Sayed Abdolrasoul Faraji1, M. Hassan Najafi2, Bingzhe Li1, Kia Bazargan3 and David Lilja1
1University of Minnesota, Twin Cities, US; 2University of Louisiana at Lafayette, US; 3University of Minnesota, US
Abstract
Stochastic computing (SC) has been used for low-cost and low power implementation of neural networks. Inherent inaccuracy and long latency of processing random bit-streams have made prior SC-based implementations inefficient compared to conventional fixed-point designs. Random or pseudo-random bitstreams often need to be processed for a very long time to produce acceptable results. This long latency leads to a significantly higher energy consumption than the binary design counterparts. Low-discrepancy sequences have been recently used for fast-converging deterministic computation with stochastic constructs. In this work, we propose a low-cost, low-latency, and energy-efficient implementation of convolutional neural networks based on low-discrepancy deterministic bit-streams. Experimental results show a significant reduction in the energy consumption compared to conventional random bitstream-based implementations and to the optimized fixed-point design with no quality degradation.

Download Paper (PDF; Only available from the DATE venue WiFi)
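The deterministic bit-stream principle can be illustrated in Python: if one unary stream repeats each bit n times and the other repeats the whole stream n times, every bit pair meets exactly once, and a single AND gate computes the exact product in n*n cycles (a minimal sketch with plain unary streams rather than the paper's low-discrepancy sequences):

```python
def unary_stream(value, length):
    """Deterministic unary bit-stream: the first round(value*length) bits are 1."""
    ones = round(value * length)
    return [1] * ones + [0] * (length - ones)

def bitstream_multiply(a, b, n):
    """Multiply two values in [0,1] with deterministic bit-streams and a
    single AND gate: stream A repeats each bit n times, stream B repeats
    the whole stream n times, so all n*n bit pairings occur exactly once
    and the output ones-density is exactly a*b (for n-bit-representable inputs)."""
    sa = unary_stream(a, n)
    sb = unary_stream(b, n)
    out = [sa[i // n] & sb[i % n] for i in range(n * n)]
    return sum(out) / (n * n)
```

Unlike random stochastic streams, this pairing is exhaustive, so there is no convergence error — the cost is the n*n stream length, which low-discrepancy sequences shorten.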
17:0012.7.3RED: A RERAM-BASED DECONVOLUTION ACCELERATOR
Speaker:
Hai (Helen) Li, Duke University, US
Authors:
Zichen Fan1, Ziru Li1, Bing Li2, Yiran Chen3 and Hai (Helen) Li3
1Tsinghua University, CN; 2Duke University, US; 3Duke University, US
Abstract
Deconvolution has become widespread in neural networks. For example, it is essential for performing unsupervised learning in generative adversarial networks or constructing fully convolutional networks for semantic segmentation. Resistive RAM (ReRAM)-based processing-in-memory architectures have been widely explored for accelerating convolutional computation and demonstrate good performance. Performing deconvolution on existing ReRAM-based accelerator designs, however, suffers from long latency and high energy consumption, because deconvolutional computation includes not only convolution but also extra add-on operations. To enable more efficient execution of deconvolution, we analyze its computation requirements and propose a ReRAM-based accelerator design, namely RED. More specifically, RED integrates two orthogonal methods: the pixel-wise mapping scheme for reducing the redundancy caused by zero-inserting operations, and the zero-skipping data flow for increasing computation parallelism and thereby improving performance. Experimental evaluations show that, compared to the state-of-the-art ReRAM-based accelerator, RED can speed up operation by 3.69-31.15x and reduce energy consumption by 8%-88.36%.

Download Paper (PDF; Only available from the DATE venue WiFi)
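The zero-inserting redundancy that RED's pixel-wise mapping avoids can be seen in a naive 1D transposed convolution (a generic Python sketch, not the paper's mapping):

```python
def deconv1d_zero_insert(x, k, stride=2):
    """Naive transposed convolution: insert stride-1 zeros between input
    samples, then run an ordinary sliding-window convolution. Most
    multiply operands hit the inserted zeros -- exactly the wasted work
    that redundancy-aware mappings skip."""
    up = []
    for v in x:
        up.append(v)
        up.extend([0] * (stride - 1))
    up = up[:len(up) - (stride - 1)]  # drop the trailing padding
    out = []
    for i in range(len(up) - len(k) + 1):
        out.append(sum(up[i + j] * k[j] for j in range(len(k))))
    return out
```

With stride 2, roughly half of all multiply operands are inserted zeros, so a mapping that never materializes them saves both crossbar cells and cycles.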
17:1512.7.4DESIGN OF RELIABLE DNN ACCELERATOR WITH UN-RELIABLE RERAM
Speaker:
Saibal Mukhopadhyay, GEORGIA TECH, US
Authors:
Yun Long and Saibal Mukhopadhyay, Georgia Institute of Technology, US
Abstract
Benefiting from the Computing-in-Memory (CIM) architecture and unique device properties such as non-volatility, high density and fast read/write, ReRAM-based deep learning accelerators provide a promising solution to greatly improve computing efficiency for various artificial intelligence (AI) applications. However, the intrinsic stochastic behavior (the statistical distribution of device resistance, set/reset voltage, etc.) makes the computation error-prone. In this paper, we propose two algorithms to suppress the impact of device variation: (a) we employ the dynamic fixed-point (DFP) data representation format to adaptively change the decimal point location, minimizing the unused integer bits; (b) we propose a noise-aware training methodology, enhancing the robustness of the network to parameter variations. We evaluate the proposed algorithms with convolutional neural networks (CNNs) and recurrent neural networks (RNNs) across different datasets. Simulations indicate that, for all benchmarks, the accuracy is improved by more than 15% with minimal hardware design overhead.

Download Paper (PDF; Only available from the DATE venue WiFi)
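The dynamic fixed-point idea — pick a shared scaling so the largest value just fits, minimizing unused integer bits — can be sketched as follows (a generic Python illustration under assumed conventions: one sign bit and one shared exponent per tensor; not the paper's exact scheme):

```python
import math

def dfp_quantize(values, total_bits):
    """Dynamic fixed point: choose a shared binary point so the largest
    magnitude just fits (no wasted integer bits), then round every value
    to a signed total_bits integer with that scaling and dequantize."""
    m = max(abs(v) for v in values)
    int_bits = max(0, math.ceil(math.log2(m + 1e-12)))  # bits for the magnitude
    frac_bits = total_bits - 1 - int_bits                # 1 sign bit assumed
    scale = 2 ** frac_bits
    q = [max(-(2 ** (total_bits - 1)),
             min(2 ** (total_bits - 1) - 1, round(v * scale)))
         for v in values]
    return [x / scale for x in q], frac_bits
```

A tensor whose largest magnitude is 1.5 gets one integer bit and six fractional bits at 8-bit precision, so the quantization step adapts to the data rather than being fixed at design time.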
17:30End of session

12.8 An Industry Approach to FPGA/ARM System Development and Verification (part 2)

Date: Thursday, March 28, 2019
Time: 16:00 - 17:30
Location / Room: Exhibition Theatre

Organiser:
John Zhao, MathWorks, US, Contact John Zhao

Part 2 of tutorial (see session 11.8 for description).

TimeLabelPresentation Title
Authors
16:0012.8.1AN INDUSTRY APPROACH TO FPGA/ARM SYSTEM DEVELOPMENT AND VERIFICATION (PART 2)
Speaker:
John Zhao, MathWorks, US
17:30End of session

UB01 Session 1

Date: Tuesday, March 26, 2019
Time: 10:30 - 12:30
Location / Room:

LabelPresentation Title
Authors
UB01.1TIMING & POWER CHARACTERIZATION FRAMEWORK FOR EMBEDDED PROCESSORS
Authors:
Mark Kettner and Frank Oppenheimer, OFFIS - Institute for Information Technology, DE
Abstract
We present a framework that significantly reduces the effort of creating accurate energy/timing models for embedded processors covering different conditions (e.g. varying temperature and clock frequency). It supports the systematic collection of the large amounts of timing and power data needed to cover the complete ISA of a microprocessor in different working conditions. Since manual measurements are tedious and error-prone, we present an automated approach. The physical setup consists of a processor board, a power measurement device, a heating element and a logic analyser observing the processor's GPIOs. The software consists of a code generator for characterization binaries, a control program which orchestrates the physical setup, and the evaluation software which generates the desired timing and power data. We will demonstrate this framework for an ARM Cortex-M microcontroller and present interesting and even undocumented behaviour when using certain CPU and FPU features.

Download Paper (PDF)
UB01.2WTG: WAVEFORM TRANSITION GRAPHS: A DESIGNER-FRIENDLY FORMALISM FOR ASYNCHRONOUS CIRCUITS
Author:
Danil Sokolov, Newcastle University, GB
Abstract
Asynchronous circuits are a promising class of digital circuits with numerous advantages over their synchronous counterparts, especially in the domain of "little digital" speed-independent (SI) controllers. Nonetheless, their adoption has not been widespread, which is in part attributed to the difficulty electronic designers face with the complex models employed for specifying SI circuits, such as Signal Transition Graphs (STGs). We propose a new model called Waveform Transition Graphs (WTGs), which resembles the timing diagrams familiar to circuit designers, and define its formal behavioural semantics. This formalization enables translation of WTGs into equivalent STGs in order to reuse the existing body of research and tools for verification and logic synthesis of speed-independent circuits. Support for WTGs has been automated in the Workcraft toolkit (https://workcraft.org), allowing their conversion into STGs, verification and synthesis.

Download Paper (PDF)
UB01.3MICROPLAN: MICRO-SYSTEM DESIGN AND PRODUCTION PLANNING TOOL
Authors:
Horst Tilman, Robert Fischbach and Jens Lienig, Technische Universität Dresden, DE
Abstract
We present a tool that enables users to lay out heterogeneous micro-systems and plan their production. The tool consists of a simple layout editor, a visualization of the wafer utilization and, finally, a calculation of the production cost for a given order quantity. Although often superior in performance, heterogeneous systems are frequently rendered unviable by high production costs. Our tool allows users to design heterogeneous systems with an emphasis on low production costs. The tool is developed within the MICROPRINCE project in close cooperation with X-Fab. It requires no installation and can be used by any visitor on their smartphone or computer.
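The cost calculation described above can be sketched with a standard gross-dies-per-wafer approximation combined with a simple order-quantity model. This is an illustrative assumption about how such a calculation might look, not MICROPLAN's actual cost model; all names and figures are hypothetical.

```python
import math

# Illustrative sketch (not MICROPLAN's actual model): estimate gross dies
# per wafer with a common approximation, then the cost per unit for an order.

def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Gross dies per wafer: pi*d^2/(4*S) - pi*d/sqrt(2*S)."""
    d, s = wafer_diameter_mm, die_area_mm2
    return int(math.pi * d**2 / (4 * s) - math.pi * d / math.sqrt(2 * s))

def unit_cost(order_qty, wafer_cost, wafer_diameter_mm, die_area_mm2, yield_frac):
    """Cost per good die for a given order quantity, rounding up to whole wafers."""
    good_dies = dies_per_wafer(wafer_diameter_mm, die_area_mm2) * yield_frac
    wafers_needed = math.ceil(order_qty / good_dies)
    return wafers_needed * wafer_cost / order_qty
```

The rounding up to whole wafers is what makes small order quantities disproportionately expensive, which is exactly the trade-off such a planning tool lets designers explore.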

Download Paper (PDF)
UB01.4HIPACC: SYNTHESIZING HIGH-PERFORMANCE IMAGE PROCESSING APPLICATIONS WITH HIPACC
Authors:
M. Akif Özkan1, Oliver Reiche1, Bo Qiao1, Richard Membarth2, Jürgen Teich1 and Frank Hannig1
1Friedrich–Alexander University Erlangen–Nürnberg (FAU), DE; 2German Research Center for Artificial Intelligence (DFKI), DE
Abstract
Programming heterogeneous platforms for high performance is laborious: writing efficient code requires low-level tuning with architecture-specific optimizations and relies on drastically differing programming models. Performance portability across platforms can be achieved by decoupling the algorithm description from the target implementation. We present Hipacc (http://hipacc-lang.org), a framework consisting of an open-source image processing DSL and a compiler that targets CPUs, GPUs, and FPGAs from the same program. We demonstrate Hipacc's productivity on real-world computer vision applications, e.g. optical flow, generating target code (C++, OpenCL, C-based HLS) for three platforms (CPU and GPU in a laptop, and an FPGA board). Finally, we showcase real-time processing of images acquired by a USB camera on these platforms.
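The decoupling idea can be illustrated in miniature: a stencil kernel is written once over an abstract pixel accessor, and a backend-agnostic executor applies it. This is a hedged sketch of the concept only; Hipacc's real API is a C++-embedded DSL whose compiler emits C++, OpenCL or HLS code per target, and every name below is invented for illustration.

```python
# Conceptual sketch of algorithm/target decoupling (not Hipacc's real API):
# the kernel is written once; a real compiler would generate per-target code
# instead of this generic interpreter.

def blur3(get):
    """3x1 horizontal box blur expressed over an abstract accessor."""
    return (get(-1) + get(0) + get(1)) / 3.0

def run_kernel(kernel, row):
    """Backend-agnostic executor with clamp-to-edge boundary handling."""
    out = []
    for x in range(len(row)):
        def get(dx, x=x):
            return row[min(max(x + dx, 0), len(row) - 1)]  # clamp boundary
        out.append(kernel(get))
    return out
```

Because `blur3` never mentions loops, memory layout or boundary handling, the same description could in principle be lowered to vectorized CPU code, a GPU kernel or an FPGA line buffer, which is the portability argument the abstract makes.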

Download Paper (PDF)
UB01.5ACSIM: A NOVEL SIMULATOR FOR HETEROGENEOUS PARALLEL AND DISTRIBUTED SYSTEMS THAT INCORPORATE CUSTOM HARDWARE ACCELERATORS
Authors:
Nikolaos Tampouratzis1 and Ioannis Papaefstathiou2
1Technical University of Crete, GR; 2Synelixis Solutions LTD, GR
Abstract
The growing use of hardware accelerators in both embedded (e.g. automotive) and high-end systems (e.g. clouds) triggers an urgent demand for simulation frameworks that can simulate, in an integrated manner, all the components (i.e. CPUs, memories, networks, hardware accelerators) of a system-under-design (SuD). By utilizing such a simulator, software design can proceed in parallel with hardware development, reducing the all-important time-to-market. The main problem, however, is that there is currently a shortage of such simulation frameworks: most simulators used for modelling user applications (i.e. full-system CPU/Mem/peripherals) lack any support for tailor-made hardware accelerators. The ACSIM framework is the first known open-source, high-performance simulator that can holistically handle systems-of-systems comprising processors, peripherals, accelerators and networks. The complete ACSIM framework, together with its sophisticated GUI, will be presented.

Download Paper (PDF)
UB01.6MDC: MULTI-DATAFLOW COMPOSER TOOL: DATAFLOW TO HARDWARE COMPOSITION AND OPTIMIZATION OF RECONFIGURABLE ACCELERATORS
Authors:
Francesca Palumbo1, Carlo Sau2, Tiziana Fanni2, Claudio Rubattu1 and Luigi Raffo2
1University of Sassari, IT; 2University of Cagliari, IT
Abstract
The UNICA-EOLAB and UNISS-IDEA booth demonstrates the capabilities of the Multi-Dataflow Composer (MDC) tool: a model-based toolset for the design and development of virtual coarse-grain reconfigurable (CGR) circuits. MDC provides multi-function substrate composition, optimization and integration in real environments. (1) Baseline Core: automatic composition of CGR substrates; input kernels are provided as dataflow networks, and a target-agnostic RTL description is derived [FPGA(1)/ASIC(2)]. (2) Profiler: automated design space exploration to determine the optimal multi-functional CGR substrate given a set of constraints [2]. (3) Power Manager: power consumption minimization; model-level identification of the logic regions to determine optimal power/clock domains and apply saving strategies [1/2]. (4) Prototyper: automatic generation of Xilinx-compliant IPs and APIs [1]. MDC is part of the H2020 CERBERO toolchain. Material: http://sites.unica.it/rpct/ and the IDEA Lab channel www.goo.gl/7fXme3.

Download Paper (PDF)
UB01.7DESIGN SPACE EXPLORATION FRAMEWORKS FOR APPROXIMATE COMPUTING
Authors:
Alberto Bosio1, Olivier Sentieys2 and Daniel Ménard3
1University of Lyon, FR; 2University of Rennes, INRIA/IRISA, FR; 3INSA Rennes - IETR, FR
Abstract
Approximate Computing (AxC) investigates how to design energy-efficient, faster, and less complex computing systems. Instead of performing exact computation and, consequently, requiring a high amount of resources, AxC selectively relaxes the specifications, trading accuracy for efficiency. The goal of this demonstrator is to present a Design Space Exploration framework able to automatically explore the impact of different approximate operators on a given application, according to the required level of accuracy and the available HW architecture. The first demonstration concerns the word-length optimization of variables in a software or hardware system to explore cost (e.g., energy) and quality trade-off solutions. The tool is scalable and targets both customized fixed-point and floating-point arithmetic. The second demonstration covers the use of other approximation techniques. The proposed demonstrator is linked with the DATE19 Monday tutorial M03.
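The word-length optimization mentioned above can be sketched as follows: quantize values to fixed-point with a chosen number of fractional bits, measure the resulting error, and pick the narrowest width that still meets an accuracy budget. This is an illustrative toy version of the idea, not the demonstrated framework's algorithm; all names are hypothetical.

```python
# Toy word-length exploration (not the actual DSE framework): fewer
# fractional bits mean cheaper hardware but larger quantization error.

def quantize(x, frac_bits):
    """Round x to a fixed-point grid with `frac_bits` fractional bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale

def max_error(values, frac_bits):
    """Worst-case quantization error over a set of representative values."""
    return max(abs(v - quantize(v, frac_bits)) for v in values)

def smallest_width(values, error_budget, max_bits=32):
    """Fewest fractional bits that keep the error within the budget."""
    for w in range(max_bits + 1):
        if max_error(values, w) <= error_budget:
            return w
    return None
```

A real framework would replace `max_error` with an application-level quality metric (e.g. output SNR) and search jointly over many variables, but the accuracy-versus-cost trade-off being explored is the same.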

Download Paper (PDF)
UB01.8RESCUE: EDA TOOLSET FOR INTERDEPENDENT ASPECTS OF RELIABILITY, SECURITY AND QUALITY IN NANOELECTRONIC SYSTEMS DESIGN
Authors:
Cemil Cem Gürsoy1, Guilherme Cardoso Medeiros2, Junchao Chen3, Nevin George4, Josie Esteban Rodriguez Condia5, Thomas Lange6, Aleksa Damljanovic5, Raphael Segabinazzi Ferreira4, Aneesh Balakrishnan6, Xinhui Anna Lai1, Shayesteh Masoumian7, Dmytro Petryk3, Troya Cagil Koylu2, Felipe Augusto da Silva8, Ahmet Cagri Bagbaba8 and Maksim Jenihhin1
1Tallinn University of Technology, EE; 2Delft University of Technology, NL; 3IHP, DE; 4BTU Cottbus-Senftenberg, DE; 5Politecnico di Torino, IT; 6IROC Technologies, FR; 7Intrinsic ID B.V., NL; 8Cadence Design Systems GmbH, DE
Abstract
The demonstrator will introduce an EDA toolset developed by a team of PhD students in the H2020-MSCA-ITN RESCUE project. Recent trends in computing systems include machine intelligence in the era of IoT, complex safety-critical applications, extreme miniaturization of technologies and intensive interaction with the physical world. These trends impose tough requirements on mutually dependent extra-functional design aspects. RESCUE focuses on the key challenges for reliability (functional safety, ageing, soft errors), security (tamper resistance, PUF technology, intelligent security) and quality (novel fault models, functional test, FMEA/FMECA, verification/debug) and the related EDA methodologies. The objective of the interdisciplinary, cross-sectoral team from Tallinn UT, TU Delft, BTU Cottbus, POLITO, IHP, IROC, Intrinsic-ID, Cadence and Bosch is to jointly develop a holistic EDA toolset for modelling, assessing and enhancing these extra-functional design aspects.

Download Paper (PDF)
UB01.9RISC-V VP: RISC-V BASED VIRTUAL PROTOTYPE: AN OPEN SOURCE PLATFORM FOR MODELING AND VERIFICATION
Authors:
Vladimir Herdt1, Daniel Große2, Hoang M. Le1 and Rolf Drechsler2
1University of Bremen, DE; 2University of Bremen, DFKI GmbH, DE
Abstract
RISC-V, being an open and free Instruction Set Architecture (ISA), is gaining huge popularity as a processor ISA in Internet-of-Things (IoT) devices. We propose an open-source RISC-V based Virtual Prototype (VP) demonstrator (available at http://www.systemc-verification.org/riscv-vp). Our VP is implemented in standard-compliant SystemC using a generic bus system with TLM 2.0 communication. At the heart of our VP is a 32-bit RISC-V (RV32IMAC) Instruction Set Simulator (ISS) with support for compressed instructions. This enables our VP to emulate IoT devices that work with a small amount of memory and limited resources. Our VP can be used as a platform for early SW development and verification, as well as for other system-level use cases. We support the GCC toolchain, provide SW debug and coverage measurement capabilities, and support FreeRTOS. Our VP is designed as a configurable and extensible platform; for example, we provide the configuration for the RISC-V HiFive1 board from SiFive.
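The core of an ISS such as the one at the heart of this VP is a decode-and-execute loop over the RISC-V instruction encodings. As a minimal sketch, assuming only the standard RV32I ADDI encoding (the actual VP is written in SystemC/C++, not Python), decoding one instruction looks like this:

```python
# Tiny ISS-style sketch (not the VP's actual code): decode and execute
# the RV32I ADDI instruction on a 32-entry register file.

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a signed integer."""
    mask = 1 << (bits - 1)
    return (value ^ mask) - mask

def execute(instr, regs):
    opcode = instr & 0x7F
    rd     = (instr >> 7) & 0x1F
    funct3 = (instr >> 12) & 0x7
    rs1    = (instr >> 15) & 0x1F
    imm    = sign_extend(instr >> 20, 12)   # I-type immediate
    if opcode == 0x13 and funct3 == 0:      # ADDI
        if rd != 0:                          # x0 is hard-wired to zero
            regs[rd] = (regs[rs1] + imm) & 0xFFFFFFFF
    else:
        raise NotImplementedError("only ADDI in this sketch")
    return regs
```

A full RV32IMAC simulator repeats this pattern for every instruction format and adds the fetch stage, compressed-instruction expansion, and TLM bus transactions for memory accesses.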

Download Paper (PDF)
UB01.10SETA-RAY: A NEW IDE TOOL FOR PREDICTING, ANALYZING AND MITIGATING RADIATION-INDUCED SOFT ERRORS ON FPGAS
Authors:
Luca Sterpone, Boyang Du and Sarah Azimi, Politecnico di Torino, IT
Abstract
One of the main concerns for FPGAs adopted in mission-critical applications such as space and avionics is radiation-induced soft errors. Therefore, we propose an IDE including two software tools compatible with commercial EDA tools. RAD-RAY is the first tool capable of predicting the source of SET phenomena by taking into account the features of the radiation environment, such as the type, LET and interaction angle of the particles, as well as the material and physical layout of the device exposed to the radiation. The predicted source SET pulse is provided to the SETA tool, the second developed tool, which is integrated with a commercial FPGA design tool to evaluate the sensitivity of industrial circuits implemented on Flash-based FPGAs and to mitigate the original netlist based on the performed analysis. This IDE is supported by ESA and Thales Alenia Space. It has been applied to the EUCLID space mission project, which will be launched in 2021.

Download Paper (PDF)
12:30End of session

UB02 Session 2

Date: Tuesday, March 26, 2019
Time: 12:30 - 15:00
Location / Room:

LabelPresentation Title
Authors
UB02.1TIMING & POWER CHARACTERIZATION FRAMEWORK FOR EMBEDDED PROCESSORS
Authors:
Mark Kettner and Frank Oppenheimer, OFFIS - Institute for Information Technology, DE
Abstract
We present a framework that significantly reduces the effort of creating accurate energy/timing models for embedded processors covering different operating conditions (e.g. varying temperature and clock frequency). It supports the systematic collection of the large amounts of timing and power data needed to cover a microprocessor's complete ISA under different working conditions. Since manual measurements are tedious and error-prone, we present an automated approach. The physical setup consists of a processor board, a power measurement device, a heating element and a logic analyser observing the processor's GPIOs. The software consists of a code generator for characterization binaries, a control program that orchestrates the physical setup, and evaluation software that generates the desired timing and power data. We will demonstrate this framework for an ARM Cortex-M microcontroller and present interesting and even undocumented behaviour when using certain CPU and FPU features.

Download Paper (PDF)
UB02.2WTG: WAVEFORM TRANSITION GRAPHS: A DESIGNER-FRIENDLY FORMALISM FOR ASYNCHRONOUS CIRCUITS
Author:
Danil Sokolov, Newcastle University, GB
Abstract
Asynchronous circuits are a promising class of digital circuits that have numerous advantages over their synchronous counterparts, especially in the domain of "little digital" speed-independent (SI) controllers. Nonetheless, their adoption has not been widespread, which is partly attributed to the difficulty electronic designers face in entering the complex models employed for specifying SI circuits, such as Signal Transition Graphs (STGs). We propose a new model called Waveform Transition Graphs (WTGs), which resembles the timing diagrams that are very familiar to circuit designers, and we define its formal behavioural semantics. This formalization enables translation of WTGs into equivalent STGs in order to reuse the existi