IP3 Interactive Presentations

Date: Wednesday 26 March 2014
Time: 16:00 - 16:30
Location / Room: Conference Level, foyer

Interactive Presentations run simultaneously during a 30-minute slot. A poster associated to the IP paper is on display throughout the afternoon. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session, prior to the actual Interactive Presentation. At the end of each afternoon Interactive Presentations session the award 'Best IP of the Day' is given.

Label	Presentation Title Authors
IP3-1	DESIGN AND FABRICATION OF A 315 μH BONDWIRE MICRO-TRANSFORMER FOR ULTRA-LOW VOLTAGE ENERGY HARVESTING Speakers: Enrico Macrelli¹, Ningning Wang², Saibal Roy², Michael Hayes², Rudi Paolo Paganelli³, Marco Tartagni¹ and Aldo Romani¹ ¹DEI, University of Bologna, IT; ²Tyndall National Institute, UCC, IE; ³CNR-IEIIT, University of Bologna, IT Abstract This paper presents a design study of a new topology for miniaturized bondwire transformers fabricated and assembled with standard IC bonding wires and toroidal ferrite (Fair-Rite 5975000801) as a magnetic core. The micro-transformer realized on a PCB substrate, enables the build of magnetics on-top-of-chip, thus leading to the design of high power density components. Impedance measurements in a frequency range between 100 kHz to 5 MHz, show that the secondary self-inductance is enhanced from 0.3 μH with an epoxy core to 315 μH with the ferrite core. Moreover, the micro-machined ferrite improves the coupling coefficient from 0.1 to 0.9 and increases the effective turns ratio from 0.5 to 35. Finally, a low-voltage IC DC-DC converter solution, with the transformer mounted on-top, is proposed for energy harvesting applications.
IP3-2	PROVIDING REGULATION SERVICES AND MANAGING DATA CENTER PEAK POWER BUDGETS Speakers: Baris Aksanli and Tajana Rosing, University of California San Diego, US Abstract Data centers are good candidates for providing regulation services in the power markets due to their large power consumption and flexibility. In this paper, we develop a framework that explores the feasibility of data center participation in these markets. We use a battery-based design that can not only help with providing ancillary services, but can also limit peak power costs without any workload performance degradation. The results of our study using data for a 21MW data center show up to $480,000/year savings can be obtained, corresponding to 1280 more servers providing services.
IP3-3	THE ENERGY BENEFIT OF LEVEL-CROSSING SAMPLING INCLUDING THE ACTUATOR'S ENERGY CONSUMPTION Speakers: Burkhard Hensel and Klaus Kabitzsch, Dresden University of Technology, DE Abstract When using level-crossing (also called send-on-delta) sampling in control loops, messages can be saved compared to periodic sampling without degrading control performance. While it is clear that reducing messages improves also the energy efficiency of battery-powered sensor devices, this can be disadvantageous for the energy efficiency the actuator device. This paper addresses the question, under which conditions level-crossing sampling is also for the actuator device more energy-efficient than periodic sampling. It is shown that there is an optimum inter-sample interval. Methods for reaching this optimum by appropriate controller and transmission settings are given. The theory is demonstrated using several known, standardized wireless network protocols.
IP3-4	SKETCHILOG: SKETCHING COMBINATIONAL CIRCUITS Speakers: Andrew Becker, David Novo and Paolo Ienne, École Polytechnique Fédérale de Lausanne, CH Abstract Despite the progress of higher-level languages and tools, Register Transfer Level (RTL) is still by far the dominant input format for high performance digital designs. Experienced designers can directly express their microarchitectural intuitions in RTL. Yet, RTL is terribly verbose, burdened with trivial details, and thus error prone. In this paper, we augment a modern RTL language (Chisel) with new semantic elements to express an imprecise specification: a sketch. We show how, in combination with a naive, unoptimized, but functionally correct reference, a designer can utilize the language and supporting infrastructure to focus on the key design intuition and omit some of the necessary details. The resulting design is exactly or almost exactly as good as the one the designer could have achieved by spending the time to manually complete the sketch. We show that, even limiting ourselves to combinational circuits, realistic instances of meaningful design problems are solved quickly, saving considerable design and debugging effort.
IP3-5	TOWARDS VERIFYING DETERMINISM OF SYSTEMC DESIGNS Speakers: Hoang M. Le and Rolf Drechsler, University of Bremen, DE Abstract Ensuring the correctness of high-level SystemC designs is an important and challenging problem in today's Electronic System Level (ESL) methodology. Prevalently, a design is checked against a functional specification given by e.g. a testcase with reference output or a user-defined property. Another research direction takes the view of a SystemC design as a piece of concurrent software. The design is then checked for common concurrency problems and thus, a functional specification is not required. Along this line, several methods for deadlock detection and race analysis have been developed. In this work, we propose to consider a new concurrency verification problem, namely input-output determinism, for SystemC designs. That means for each possible input, the design must produce the same output under any valid process schedule. We argue that determinism verification is stronger than both deadlock detection and race analysis. Beside being an attractive correctness criterion itself, proven determinism helps to accelerate both simulative and formal verification. We also present a preliminary study to show the feasibility of determinism verification for SystemC designs.
IP3-6	USING GUIDED LOCAL SEARCH FOR ADAPTIVE RESOURCE RESERVATION IN LARGE-SCALE EMBEDDED SYSTEMS Speaker: Timon ter Braak, University of Twente, NL Abstract To maintain a predictable execution environment, an embedded system must ensure that applications are, in advance, provided with sufficient resources to process tasks, exchange information and to control peripherals. The problem of assigning tasks to processing elements with limited resources, and routing communication channels through a capacitated interconnect is combined into an integer linear programming formulation. We describe a guided local search algorithm to solve this problem at run-time. This algorithm allows for a hybrid strategy where configurations computed at design-time may be used as references to lower the computational overhead at run-time. Computational experiments on a dataset with 100 tasks and 20 processing elements show the effectiveness of this algorithm compared to state-of-the-art solvers CPLEX and Gurobi. The guided local search algorithm finds an initial solution within 100 milliseconds, is competitive for small platforms, scales better with the size of the platform, and has lower memory usage (2-19%).
IP3-7	(Best Paper Award Candidate) ACCELERATING GRAPH COMPUTATION WITH RACETRACK MEMORY AND POINTER-ASSISTED GRAPH REPRESENTATION Speakers: Eunhyek Park¹, Helen Li², Sungjoo Yoo¹ and Sunggu Lee¹ ¹POSTECH, KR; ²Univ. of Pittsburgh, US Abstract The poor performance of NAND Flash memory, such as long access latency and large granularity access, is the major bottleneck of graph processing. This paper proposes an intelligent storage for graph processing which is based on fast and low cost racetrack memory and a pointer-assisted graph representation. Our experiments show that the proposed intelligent storage based on racetrack memory reduces total processing time of three representative graph computations by 40.2%~86.9% compared to the graph processing, GraphChi, which exploits sequential accesses based on normal NAND Flash memory-based SSD. Faster execution also reduces energy consumption by 39.6%~90.0%. The in-storage processing capability gives additional 10.5%~16.4% performance improvements and 12.0%~14.4% reduction of energy consumption.
IP3-8	PSP-CACHE: A LOW-COST FAULT-TOLERANT CACHE MEMORY ARCHITECTURE Speakers: Hamed Farbeh and Seyed Ghassem Miremadi, Sharif University of Technology, IR Abstract Cache memories constitute a large fraction of processor chip area and are highly vulnerable to soft errors caused by energetic particles. To protect these memories, most of the modern processors employ Error Detection Codes (EDCs) or Error Correction Codes (ECCs). EDCs/ECCs impose significant overheads in terms of area and energy; these overheads increase as a function of interleaving EDCs/ECCs to detect/correct multiple errors. This paper proposes a new cache architecture to minimize the area and energy overheads of EDCs/ECCs in set-associative L1-caches. Simulation results for a 4-way set-associative cache show that the proposed architecture reduces both the area and static power overheads of parity code by about 75% and the dynamic energy overhead by about 73% in comparison to conventional cache architecture. These reduction figures are about 68% and about 66%, respectively, for SEC-DED code. The above reductions are achieved without affecting the error coverage.
IP3-9	A HYBRID NON-VOLATILE SRAM CELL WITH CONCURRENT SEU DETECTION AND CORRECTION Speakers: Pilin Junsangsri¹, Fabrizio Lombardi¹ and Jie Han² ¹Northeastern University, US; ²University of Alberta, CA Abstract This paper presents a hybrid non-volatile (NV) SRAM cell with a new scheme for SEU tolerance. The proposed NVSRAM cell consists of a 6T SRAM core and a Resistive RAM (RRAM), made of a 1T and a Programmable Metallization Cell (PMC). The proposed cell has concurrent error detection (CED) and correction capabilities; CED is accomplished using a dual-rail checker, while correction is accomplished by utilizing the restore operation; data from the non-volatile memory element is copied back to the SRAM core. The dual-rail checker utilizes two XOR gates each made of 2 inverters and 2 ambipolar transistors, hence, it has a hybrid nature. Extensive simulation results are provided. The simulation results show that the proposed scheme is very efficient in terms of numerous figures of merit such as delay and circuit complexity and thus applicable to integrated circuits such as FPGAs requiring secure on-chip non-volatile storage (i.e. LUTs) for multi-context configurability.
IP3-10	BATTERY AWARE STOCHASTIC QOS BOOSTING IN MOBILE COMPUTING DEVICES Speakers: Hao Shen, Qiuwen Chen and Qinru Qiu, Syracuse University, US Abstract Mobile computing has been weaved into everyday lives to a great extend. Their usage is clearly imprinted with user's personal signature. The ability to learn such signature enables immense potential in workload prediction and resource management. In this work, we investigate the user behavior modeling and apply the model for energy management. Our goal is to maximize the quality of service (QoS) provided by the mobile device (i.e., smartphone), while keep the risk of battery depletion below a given threshold. A Markov Decision Process (MDP) is constructed from history user behavior. The optimal management policy is solved using linear programing. Simulations based on real user traces validate that, compared to existing battery energy management techniques, the stochastic control performs better in boosting the mobile devices' QoS without significantly increasing the chance of battery depletion.
IP3-11	A THERMAL RESILIENT INTEGRATION OF MANY-CORE MICROPROCESSORS AND MAIN MEMORY BY 2.5D TSI I/OS Speakers: Sih-Sian Wu¹, Kanwen Wang¹, Sai Manoj P. D.¹, Tsung-Yi Ho² and Hao Yu¹ ¹Nanyang Technological University, SG; ²National Cheng Kung University, TW Abstract One memory-logic-integration design platform is developed in this paper with thermal reliability analysis provided for 2.5D throughsilicon-interposer (TSI) and 3D through-silicon-via (TSV) based integrations. Temperature-dependent delay and power models have been developed at microarchitecture level for 2.5D and 3D integrations of many-core microprocessors and main memory, respectively. Experiments are performed by general-purpose benchmarks from SPEC CPU2006 and also cloud-oriented benchmarks from Phoenix with the following observations. The memory-logic integration by 3D RC-interconnected TSV I/Os can result in thermal runaway failures due to strong electrical-thermal couplings. On the other hand, the one by 2.5D transmission-line-interconnected TSI I/Os has shown almost the same energy efficiency and better thermal resilience.
IP3-12	LEVERAGING ON-CHIP NETWORKS FOR EFFICIENT PREDICTION ON MULTICORE COHERENCE Speaker: Libo Huang, National University of Defense Technology, CN Abstract Coherent data prediction is introduced as a promising architectural technique for reducing cache-to-cache accesses in directory protocol. However, limited on-chip resources cause the accuracy of current prediction to be generally low. Low accuracy would result in a large number of unnecessary or incorrect predictions, which would consequently generate excessive network traffic. This leads to large power and performance overhead for coherent memory access. This paper proposes an early abort mechanism (EBT) that leverages NoC design to reduce the negative effect of wrong prediction operations, thus facilitating overall performance improvement and traffic reduction. Using detailed full-system simulations, we conclude that EBT provides a cost-effective solution for designing efficient multicore processors. To the best of our knowledge, this study is the first to leverage on-chip network for the prediction optimization on multicore coherence.
IP3-13	AN ADAPTIVE MEMORY INTERFACE CONTROLLER FOR IMPROVING BANDWIDTH UTILIZATION OF HYBRID AND RECONFIGURABLE SYSTEMS Speakers: Vito Giovanni Castellana¹, Antonino Tumeo² and Fabrizio Ferrandi¹ ¹Politecnico di Milano, DEIB, IT; ²Pacific Northwest National Laboratory, US Abstract Data mining, bioinformatics, knowledge discovery, social network analysis, are emerging irregular applications that exploits data structures based on pointers or linked lists, such as graphs, unbalanced trees or unstructured grids. These applications are characterized by unpredictable memory accesses and generally are memory bandwidth bound, but also presents large amounts of inherent dynamic parallelism because they can potentially spawn concurrent activities for each one of the element they are exploring. Hybrid architectures, which integrate general purpose processors with reconfigurable devices, appears promising target platforms for accelerating irregular applications. These systems often connect to distributed and multi-ported memories, potentially enabling parallel memory operations. However, these memory architectures introduce several challenges, such as the necessity to manage concurrency and synchronization to avoid structural conflicts on shared memory locations and to guarantee consistency. In this paper we present an adaptive Memory Interface Controller (MIC) that addresses these issues. The MIC is a general and customizable solution that can target several different memory structures, and is suitable for High Level Synthesis frameworks. It implements a dynamic arbitration scheme, which avoids conflicts on memory resources at runtime, and supports atomic memory operations, commonly exploited for synchronization directives in parallel programming paradigms. The MIC simultaneously maps multiple accesses to different memory ports, allowing fine grained parallelism exploitation and ensuring correctness also in the presence of irregular and statically unpredictable memory access patterns. We evaluated the effectiveness of our approach on a typical irregular kernel, graph Breadth First Search (BFS), exploring different design alternatives.
IP3-14	ENERGY EFFICIENT IN-MEMORY AES ENCRYPTION BASED ON NONVOLATILE DOMAIN-WALL NANOWIRE Speakers: Yuhao Wang¹, Pingfan Kong¹, Hao Yu¹ and Dennis Sylvester² ¹Nanyang Technological University, SG; ²University of Michigan, US Abstract The widely applied Advanced Encryption Standard (AES) encryption algorithm is critical in secure big-data storage. Data oriented applications have imposed high throughput and low power, i.e., energy efficiency (J/bit), requirements when applying AES encryption. This paper explores an in-memory AES encryption using the newly introduced domain-wall nanowire. We show that all AES operations can be fully mapped to a logic-in-memory architecture by non-volatile domain-wall nanowire, called DW-AES. The experimental results show that DW-AES can achieve the best energy efficiency of 24 pJ/bit, which is 9X and 6.5X times better than CMOS ASIC and ReRAM-CMOL implementations, respectively. Under the same area budget, the proposed DW-AES exhibits 6.4X higher throughput and 29\% power saving compared to a CMOS ASIC implementation; 1.7X higher throughput and 74\% power reduction compared to a ReRAM-CMOL implementation.
IP3-15	ICE: INLINE CALIBRATION FOR MEMRISTOR CROSSBAR-BASED COMPUTING ENGINE Speakers: Boxun Li¹, Yu Wang¹, Yiran Chen², Helen Li² and Huazhong Yang¹ ¹Tsinghua University, CN; ²University of Pittsburgh, US Abstract The emerging neuromorphic computation provides a revolutionary solution to the alternative computing architecture and effectively extends Moore's Law. The discovery of the memristor presents a promising hardware realization of neuromorphic systems with incredible power efficiency, allowing efficiently executing the analog matrix-vector multiplication on the memristor crossbar architecture. However, during computations, the memristor will slowly drift from its initial programmed state, leading to a gradual decline of the computation precision of memristor crossbar-based computing engine (MCE). In this paper, we propose an inline calibration mechanism to guarantee the computation quality of the MCE. The inline calibration mechanism collects the MCE's computation error through `interrupt-and-benchmark (I&B)' operations and predicts the best calibration time through polynomial fitting of the computation error data. We also develop an adaptive technique to adjust the time interval between two neighbor I&B operations and minimize the negative impact of the I&B operation on system performance. The experiment results demonstrate that the proposed inline calibration mechanism achieves a calibration efficiency of 91.18% on average and negligible performance overhead (i.e., 0.439%)
IP3-16	COMPLEMENTARY RESISTIVE SWITCH BASED STATEFUL LOGIC OPERATIONS USING MATERIAL IMPLICATION Speakers: Yuanfan Yang¹, Jimson Mathew¹, Dhiraj K Pradhan¹, Marco Ottavi² and Salvatore Pontarelli² ¹University of Bristol, GB; ²University of Rome "Tor Vergata", IT Abstract Memristor based logic and memories are increasingly becoming one of the fundamental building blocks for future system design. Hence, it is important to explore various methodologies for implementing these blocks. In this paper, we present a novel Complementary Resistive Switching (CRS) based stateful logic operations using material implication. The proposed solution benefits from exponential reduction in sneak path current in crossbar implemented logic. We validated the effectiveness of our solution through SPICE simulations on a number of logic circuits. It has been shown that only 4 steps are required for implementing N input NAND gate whereas memristor based stateful logic needs N+1 steps.
IP3-17	A LAYERED APPROACH FOR TESTING TIMING IN THE MODEL-BASED IMPLEMENTATION Speakers: BaekGyu Kim¹, Hyeon I Hwang², Taejoon Park², Sanghyuk Son² and Insup Lee¹ ¹University of Pennsylvania, US; ²Daegu Gyeongbuk Institute of Science & Technology, KR Abstract The model-based implementation is to derive an implementation from a model that has been shown to meet requirements. Even though this approach can be used to guarantee that an implementation satisfies functional requirements that are shown to be correct at the model level, it is still challenging to assure timing requirements at the implementation level. We propose a layered approach in testing timing requirements conformance of implemented systems developed by model-based implementation. In our approach, the abstraction boundary of the implemented system is formally defined using Parnas' four-variables model. Then, the proposed approach tests timing aspects of the interaction between the auto-generated code and the target platform-dependent code based on the four-variables. This approach aims at not only detecting the timing requirement violation, but also at measuring delay-segments that contribute to the timing deviation of the implemented system w.r.t. the model. We show the case study of testing timing requirements of an infusion pump system to illustrate the applicability of the proposed framework.
IP3-18	MODEL-BASED PROTOCOL LOG GENERATION FOR TESTING A TELECOMMUNICATION TEST HARNESS USING CLP Speakers: Kenneth Balck¹, Olga Grinchtein¹ and Justin Pearson² ¹Ericsson AB, SE; ²Uppsala University, SE Abstract Within telecommunications development it is vital to have frameworks and systems to replay complicated scenarios on equipment under test, often there are not enough available scenarios. In this paper we study the problem of testing a test harness, which replays scenarios and analyses protocol logs for the Public Warning System service, which is a part of the Long Term Evolution (LTE) 4G standard. Protocol logs are sequences of messages with timestamps; and are generated by different mobile network entities. In our case study we focus on user equipment protocol logs. In order to test the test harness we require that logs have both incorrect and correct behaviour. It is easy to collect logs from real system runs, but these logs do not show much variation in the behaviour of system under test. We present an approach where we use constraint logic programming (CLP) for both modelling and test generation, where each test case is a protocol log. In this case study, we uncovered previously unknown faults in the test harness.
IP3-19	TIME-DECOUPLED PARALLEL SYSTEMC SIMULATION Speakers: Jan Weinstock¹, Christoph Schumacher¹, Rainer Leupers¹, Gerd Ascheid¹ and Laura Tosoratto² ¹RWTH Aachen, DE; ²Istituto Nazionale di Fisica Nucleare, Sezione di Roma, IT Abstract With increasing system size and complexity, designers of embedded systems face the challenge of efficiently simulating these systems in order to enable target specific software development and design space exploration as early as possible. Today's multicore workstations offer enormous computational power, but traditional simulation engines like the OSCI SystemC kernel only operate on a single thread, thereby leaving a lot of computational potential unused. Most modern embedded system designs include multiple processors. This work proposes SCope, a SystemC kernel that aims at exploiting the inherent parallelism of such systems by simulating the processors on different threads. A lookahead mechanism is employed to reduce the required synchronization between the simulation threads, thereby further increasing simulation speed. The virtual prototype of the European FP7 project EURETILE system simulator is used as demonstrator for the proposed work, showing a speedup of 4.01x on a four core host system compared to sequential simulation.
IP3-20	A UNIFIED METHODOLOGY FOR A FAST BENCHMARKING OF PARALLEL ARCHITECTURE Speakers: Alexandre Guerre, Jean-Thomas Acquaviva and Yves Lhuillier, CEA LIST, FR Abstract Benchmarking of architectures is today jeopardized by the explosion of parallel architectures and the dispersion of parallel programming models. Parallel programming requires architecture dependent compilers and languages as well as high programmer expertise. Thus, an objective comparison has become a harder task. This paper presents a novel methodology to evaluate and to compare parallel architectures in order to ease the programmer work. It is based on the usage of micro-benchmarks, code profiling and characterization tools. The main contribution of this methdology is a semi-automatic prediction of the performance for sequential applications on a set of parallel architectures. In addition the performance estimation is correlated with the cost of other criteria such as power or portability. Our methodology prediction was validated on anindustrial application. Results are within a range of 20%.

< Return to last page

Submissions

IP3 Interactive Presentations