3.7 On line Strategies for Reliability

Printer-friendly version PDF version

Date: Tuesday 25 March 2014
Time: 14:30 - 16:00
Location / Room: Konferenz 5

Chair:
Fabrizio Lombardi, Northwestern University, US

Co-Chair:
Jie Han, University of Alberta, CA

This section presents different approaches to improve reliability of circuits and systems by using on line techniques. It shows different methods that can be applied to caches, processors and multicore architectures.

TimeLabelPresentation Title
Authors
14:303.7.1SPATIAL PATTERN PREDICTION BASED MANAGEMENT OF FAULTY DATA CACHES
Speakers:
Georgios Keramidas1, Michail Mavropoulos2, Anna Karvouniari2 and Dimitris Nikolos2
1Researcher, Univerity of Patras, GR; 2University of Patras, GR
Abstract
Technology scaling leads to significant faulty bit rates in on-chip caches. In this work, we propose a methodology to mitigate the impact of defective bits (due to permanent faults) in first-level set-associative data caches. Our technique assumes that faulty caches are enhanced with the ability of disabling their defective parts at cache subblock granularity. Our experimental findings reveal that while the occurrence of hard-errors in faulty caches may have a significant impact in performance, a lot of room for improvement exists, if someone is able to take into account the spatial reuse patterns of the to-be-referenced blocks (not all the data fetched into the cache is accessed). To this end, we propose frugal PC-indexed spatial predictors (with very small storage requirements) to orchestrate the (re)placement decisions among the fully and partially unusable faulty blocks. Using cycle-accurate simulations, a wide range of scientific applications, and a plethora of cache fault maps, we showcase that our approach is able to offer significant benefits in cache performance.
15:003.7.2COMBINED DVFS AND MAPPING EXPLORATION FOR LIFETIME AND SOFT-ERROR SUSCEPTIBILITY IMPROVEMENT IN MPSOCS
Speakers:
Anup Das1, Akash Kumar1, Bharadwaj Veeravalli1, Cristiana Bolchini2 and Antonio Miele2
1National University of Singapore, SG; 2Politecnico di Milano, IT
Abstract
Energy and reliability optimization are two of the most critical objectives for the synthesis of multiprocessor systems-on-chip (MPSoCs). Task mapping has shown significant promise as a low cost solution in achieving these objectives as standalone or in tandem as well. This paper proposes a multi-objective design space exploration to determine the mapping of tasks of an application on a multiprocessor system and voltage/frequency level of each tasks (exploiting the DVFS capabilities of modern processors) such that the reliability of the platform is improved while fulfilling the energy budget and the performance constraint set by system designers. In this respect, the reliability of a given MPSoC platform incorporates not only the impact of voltage and frequency on the aging of the processors (wear-out effect) but also on the susceptibility to soft-errors -- a joint consideration missing in all existing works in this domain. Further, the proposed exploration also incorporates soft-error tolerance by selective replication of tasks, making the proposed approach an interesting blend of reactive and proactive fault-tolerance. The combined objective of minimizing core aging together with the susceptibility to transient faults under a given performance/energy budget is solved by using a multi-objective genetic algorithm exploiting tasks' mapping, DVFS and selective replication as tuning knobs. Experiments conducted with real-life and synthetic application graphs clearly demonstrate the advantage of the proposed approach.
15:303.7.3DARP: DYNAMICALLY ADAPTABLE RESILIENT PIPELINE DESIGN IN MICROPROCESSORS
Speakers:
Hu Chen, Sanghamitra Roy and Koushik Chakraborty, Utah State University, US
Abstract
In this paper, we demonstrate that the sensitized path delays in various microprocessor pipe stages exhibit intriguing temporal and spatial variations during the execution of real world applications. To effectively exploit these delay variations, we propose Dynamically Adaptable Resilient Pipeline (DARP)--a series of runtime techniques to boost power performance efficiency and fault tolerance in a pipelined microprocessor. DARP employs early error prediction to avoid a major portion of timing errors. Using a rigorous circuit-architectural infrastructure, we demonstrate substantial improvements in the performance (9.4-20%) and energy efficiency (6.4-27.9%), compared to state-of-the-art techniques.
16:00IP1-24, 45A FAULT DETECTION MECHANISM IN A DATA-FLOW SCHEDULED MULTITHREADED PROCESSOR
Speakers:
Jian Fu1, Qiang Yang1, Raphael Poss1, Chris Jesshope1 and Chunyuan Zhang2
1University of Amsterdam, NL; 2National University of Defense Technology, CN
Abstract
This paper designs and implements the Redundant Multi-Threading (RMT) in a Data-flow scheduled Multi-Threaded (DMT) multicore processor, called Data-flow scheduled Redundant Multi-Threading (DRMT). Meanwhile, It presents Asynchronous Output Comparison (AOC) for RMT techniques to avoid fault detection related inter-core communication and alleviate the performance and hardware overheads induced by output comparison. Results show that the performance overhead of DRMT is less than 60% even when the number of threads is four times the number of processing elements. Also the performance and hardware overheads of AOC are insignificant.
16:00End of session
Coffee Break in Exhibition Area
On Tuesday-Thursday the coffee and lunch breaks will be located in the Exhibition Area (Terrace Level).