3.7 On line Strategies for Reliability

Date: Tuesday 25 March 2014
Time: 14:30 - 16:00
Location / Room: Konferenz 5

Chair:
Fabrizio Lombardi, Northeastern University, US

Co-Chair:
Jie Han, University of Alberta, CA

This session presents different approaches to improving the reliability of circuits and systems using on-line techniques. It covers methods that can be applied to caches, processors, and multicore architectures.

Time	Label	Presentation Title Authors
14:30	3.7.1	SPATIAL PATTERN PREDICTION BASED MANAGEMENT OF FAULTY DATA CACHES Speakers: Georgios Keramidas¹, Michail Mavropoulos², Anna Karvouniari² and Dimitris Nikolos² ¹Researcher, University of Patras, GR; ²University of Patras, GR Abstract Technology scaling leads to significant faulty bit rates in on-chip caches. In this work, we propose a methodology to mitigate the impact of defective bits (due to permanent faults) in first-level set-associative data caches. Our technique assumes that faulty caches are enhanced with the ability to disable their defective parts at cache subblock granularity. Our experimental findings reveal that, while the occurrence of hard errors in faulty caches may have a significant impact on performance, considerable room for improvement exists if one takes into account the spatial reuse patterns of the to-be-referenced blocks (not all the data fetched into the cache is accessed). To this end, we propose frugal PC-indexed spatial predictors (with very small storage requirements) to orchestrate the (re)placement decisions among the fully and partially unusable faulty blocks. Using cycle-accurate simulations, a wide range of scientific applications, and a plethora of cache fault maps, we show that our approach offers significant benefits in cache performance.
15:00	3.7.2	COMBINED DVFS AND MAPPING EXPLORATION FOR LIFETIME AND SOFT-ERROR SUSCEPTIBILITY IMPROVEMENT IN MPSOCS Speakers: Anup Das¹, Akash Kumar¹, Bharadwaj Veeravalli¹, Cristiana Bolchini² and Antonio Miele² ¹National University of Singapore, SG; ²Politecnico di Milano, IT Abstract Energy and reliability optimization are two of the most critical objectives for the synthesis of multiprocessor systems-on-chip (MPSoCs). Task mapping has shown significant promise as a low-cost solution for achieving these objectives, either individually or in tandem. This paper proposes a multi-objective design space exploration to determine the mapping of the tasks of an application on a multiprocessor system and the voltage/frequency level of each task (exploiting the DVFS capabilities of modern processors) such that the reliability of the platform is improved while fulfilling the energy budget and the performance constraint set by system designers. In this respect, the reliability of a given MPSoC platform incorporates not only the impact of voltage and frequency on the aging of the processors (wear-out effect) but also on the susceptibility to soft errors -- a joint consideration missing in all existing works in this domain. Further, the proposed exploration also incorporates soft-error tolerance by selective replication of tasks, making the proposed approach an interesting blend of reactive and proactive fault tolerance. The combined objective of minimizing core aging together with the susceptibility to transient faults under a given performance/energy budget is solved by using a multi-objective genetic algorithm exploiting task mapping, DVFS and selective replication as tuning knobs. Experiments conducted with real-life and synthetic application graphs clearly demonstrate the advantage of the proposed approach.
15:30	3.7.3	DARP: DYNAMICALLY ADAPTABLE RESILIENT PIPELINE DESIGN IN MICROPROCESSORS Speakers: Hu Chen, Sanghamitra Roy and Koushik Chakraborty, Utah State University, US Abstract In this paper, we demonstrate that the sensitized path delays in various microprocessor pipe stages exhibit intriguing temporal and spatial variations during the execution of real-world applications. To effectively exploit these delay variations, we propose Dynamically Adaptable Resilient Pipeline (DARP) -- a series of runtime techniques to boost power-performance efficiency and fault tolerance in a pipelined microprocessor. DARP employs early error prediction to avoid a major portion of timing errors. Using a rigorous circuit-architectural infrastructure, we demonstrate substantial improvements in performance (9.4-20%) and energy efficiency (6.4-27.9%) compared to state-of-the-art techniques.
16:00	IP1-24, 45	A FAULT DETECTION MECHANISM IN A DATA-FLOW SCHEDULED MULTITHREADED PROCESSOR Speakers: Jian Fu¹, Qiang Yang¹, Raphael Poss¹, Chris Jesshope¹ and Chunyuan Zhang² ¹University of Amsterdam, NL; ²National University of Defense Technology, CN Abstract This paper designs and implements Redundant Multi-Threading (RMT) in a Data-flow scheduled Multi-Threaded (DMT) multicore processor, called Data-flow scheduled Redundant Multi-Threading (DRMT). It also presents Asynchronous Output Comparison (AOC) for RMT techniques to avoid fault-detection-related inter-core communication and to alleviate the performance and hardware overheads induced by output comparison. Results show that the performance overhead of DRMT is less than 60% even when the number of threads is four times the number of processing elements. In addition, the performance and hardware overheads of AOC are insignificant.
16:00		End of session. Coffee Break in Exhibition Area. On Tuesday-Thursday the coffee and lunch breaks will be located in the Exhibition Area (Terrace Level).
