3.7 Dealing with Runtime Failures


Date: Tuesday 15 March 2016
Time: 14:30 - 16:00
Location / Room: Konferenz 5

Chair:
Lorena Anghel, TIMA Laboratory, FR

Co-Chair:
Michel Renovell, LIRMM, FR

Reliability is an important consideration in modern design. Two key issues in runtime resilience are robustness against soft errors and tolerance of aging effects. The papers in this session consider both effects.

Time Label Presentation Title
Authors
14:30 3.7.1 A CROSS-LAYER ANALYSIS OF SOFT ERROR, AGING AND PROCESS VARIATION IN NEAR THRESHOLD COMPUTING
Speaker:
Mehdi B. Tahoori, Karlsruhe Institute of Technology (KIT), DE
Authors:
Anteneh Gebregiorgis, Saman Kiamehr, Fabian Oboril, Rajendra Bishnoi and Mehdi B. Tahoori, Karlsruhe Institute of Technology (KIT), DE
Abstract
Near Threshold Computing (NTC) is a promising approach to reduce the power consumption of modern VLSI designs. However, NTC designs suffer from functional failures and performance loss. Understanding the characteristics of the functional failures and variability effects is of decisive importance in order to mitigate them and get the most out of NTC. This paper presents a cross-layer reliability analysis in the presence of soft errors, aging and process variation effects in the near threshold voltage domain. The objective is to quantify the reliability of different SRAM designs and to find a reliability-performance optimal cache organization for an NTC microprocessor. In this work, the Soft Error Rate (SER) and Static Noise Margin (SNM) of 6T and 8T SRAM cells and their dependencies on aging and process variation are investigated by considering device, circuit and architecture level analysis. Our experimental results reveal that in NTC, process variation and aging-induced SNM degradation is 2.5X higher than in the super threshold domain, while SER is 8X higher. The use of 8T instead of 6T SRAM cells can reduce the system-level SNM and SER by 14% and 22% respectively. In addition, we observe that the right balance between performance and reliability can be found by using an appropriate cache organization at NTC, which differs from the one used in the super threshold domain.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:00 3.7.2 FAST-YET-ACCURATE VARIATION-AWARE CURRENT AND VOLTAGE MODELLING OF RADIATION-INDUCED TRANSIENT FAULT
Speaker:
Yuwen Lin, National Chiao Tung University, TW
Authors:
Yuwen (Dave) Lin and Hung-Pin Wen, National Chiao Tung University, TW
Abstract
For robust systems, it is important to mitigate radiation effects in the early design stages to reduce the design cost. Traditionally, a double-exponential current source is widely used to model the transient fault when analyzing radiation effects. However, in light of complicating effects in advanced technologies, such an approach is no longer sufficient to estimate transient faults and may lead to inaccurate results. Therefore, we propose a fast-yet-accurate approach to model the radiation-induced transient fault that also considers the interaction between its transient current and transient voltage. Experimental results show that the proposed method can achieve a 10^5X speedup with an average accuracy loss of only 2.6% compared to 3D mixed-mode TCAD simulation. Moreover, variation sources become significant issues as technology nodes progress, so the proposed method is extended to incorporate these variations during transient-fault analysis. As a result, sensitivity analysis that covers voltage, gate-length and device-width variations can be performed quickly and accurately with our method.
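
For readers unfamiliar with the traditional model mentioned in the abstract above, the following is a minimal sketch of a double-exponential current pulse in its commonly used textbook form, I(t) = Q / (tau_fall - tau_rise) * (exp(-t / tau_fall) - exp(-t / tau_rise)). It is not the method proposed in the paper. The function names, parameter names and numeric values are illustrative assumptions and do not come from the paper.

```python
import math


def double_exponential_current(t, q_coll, tau_fall, tau_rise):
    """Current in amperes at time t (seconds) of a double-exponential pulse.

    q_coll   : total collected charge in coulombs (assumed value)
    tau_fall : collection (decay) time constant in seconds (assumed value)
    tau_rise : track-establishment (rise) time constant in seconds (assumed value)
    Requires tau_fall > tau_rise > 0.
    """
    if t < 0:
        return 0.0
    scale = q_coll / (tau_fall - tau_rise)
    return scale * (math.exp(-t / tau_fall) - math.exp(-t / tau_rise))


def peak_time(tau_fall, tau_rise):
    """Time at which the pulse reaches its maximum (from dI/dt = 0)."""
    return (tau_fall * tau_rise / (tau_fall - tau_rise)) * math.log(tau_fall / tau_rise)


def collected_charge(q_coll, tau_fall, tau_rise, t_end, steps=20_000):
    """Trapezoidal integral of the pulse from 0 to t_end, in coulombs."""
    dt = t_end / steps
    total = 0.0
    prev = double_exponential_current(0.0, q_coll, tau_fall, tau_rise)
    for i in range(1, steps + 1):
        cur = double_exponential_current(i * dt, q_coll, tau_fall, tau_rise)
        total += 0.5 * (prev + cur) * dt
        prev = cur
    return total


if __name__ == "__main__":
    # Illustrative numbers only: 10 fC, 200 ps decay, 50 ps rise.
    q, tf, tr = 10e-15, 200e-12, 50e-12
    tp = peak_time(tf, tr)
    print("peak time (s):", tp)
    print("peak current (A):", double_exponential_current(tp, q, tf, tr))
    print("integrated charge (C):", collected_charge(q, tf, tr, 5e-9))
```

In this form the pulse integrates to the collected charge Q, which is why the prefactor divides by the difference of the two time constants. The source is fixed once its three parameters are chosen, so it does not react to the voltage at the struck node; the abstract identifies the interaction between transient current and transient voltage as what the proposed approach adds.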

15:30 3.7.3 A DETAILED METHODOLOGY TO COMPUTE SOFT ERROR RATES IN ADVANCED TECHNOLOGIES
Speaker:
Marc Riera, Universitat Politècnica de Catalunya (UPC), ES
Authors:
Marc Riera (1), Ramon Canal (2), Jaume Abella (3) and Antonio Gonzalez (2)
(1) Universitat Politècnica de Catalunya (UPC), ES; (2) UPC-Barcelona, ES; (3) Barcelona Supercomputing Center, ES
Abstract
System reliability has become a key design aspect for computer systems due to aggressive technology miniaturization. Errors are typically dominated by transient faults due to radiation and are strongly related to the technology used to build the hardware. However, there is a lack of detailed methodologies to model and fairly compare Soft Error Rates (SER) across different advanced technologies. This work first describes a common methodology that obtains accurate Soft Error Rates from (1) technology models, (2) location (latitude, longitude and altitude), (3) operating conditions and (4) circuit descriptions (i.e. SRAM, latches, logic gates). Then, we use it to characterize soft errors across current and future technologies. Results at the technology layer show that new technologies, such as FinFET and SOI, can reduce SER by up to 100x, while the location can increase SER by up to 650x.

15:45 3.7.4 ANALYSIS OF NBTI EFFECTS ON HIGH FREQUENCY DIGITAL CIRCUITS
Speaker:
Ahmet Unutulmaz, OFFIS Institute for Information Technology, DE
Authors:
Ahmet Unutulmaz (1), Domenik Helms (1), Reef Eilers (1), Malte Metzdorf (1), Ben Kaczer (2) and Wolfgang Nebel (3)
(1) OFFIS Institute for Information Technology, DE; (2) IMEC, BE; (3) University of Oldenburg and OFFIS, DE
Abstract
This paper analyzes some of the secondary effects in estimating the negative bias temperature instability (NBTI) induced threshold voltage shift in high frequency digital circuits. To this end, a circuit model is developed for statistical estimation of the threshold voltage shift. Making use of this model as well as technology computer aided design (TCAD) and SPICE simulations, a methodology is developed to estimate the NBTI induced threshold voltage shift. Simulation results reveal that commonly made assumptions about digital circuits, such as the square signal assumption and a negligible effect of drain bias, may yield an overestimation of the NBTI induced threshold voltage shift by more than 10% after five years of operation, which may lead to a severe underestimation of a circuit's reliability.

16:00 IP1-18, 91 RT LEVEL TIMING MODELING FOR AGING PREDICTION
Speaker:
Nils Koppaetzky, OFFIS Institute for Information Technology, DE
Authors:
Nils Koppaetzky (1), Malte Metzdorf (1), Reef Eilers (1), Domenik Helms (1) and Wolfgang Nebel (2)
(1) OFFIS Institute for Information Technology, DE; (2) University of Oldenburg and OFFIS, DE
Abstract
The simulation of aging-related degradation mechanisms is a challenging task for timing and reliability estimations during all design phases of digital systems. Some good approaches towards accurate, efficient and applicable timing models at the register transfer level (RTL) have already been made. However, recent state-of-the-art models often have to access lower levels of abstraction, such as the underlying gate-level netlist, for each timing estimation, and require every analysis step to be repeated if parameters, input signals or designs are changed. This work introduces a new RTL timing model concept that separates design analysis from aging estimation. It allows more efficient design evaluations with respect to aging. Although this is work in progress and systematic evaluations are still ongoing, early results indicate the applicability of the approach and its capability to compete with recent models in both accuracy and efficiency.

16:00 End of session
Coffee Break in Exhibition Area