3.3 Methods and Characterisation techniques for Reliability


Date: Tuesday, March 26, 2019
Time: 14:30 - 16:00
Location / Room: Room 3

Chair:
Said Hamdioui, TU Delft, NL

Co-Chair:
Arnaud Virazel, LIRMM, FR

This session discusses the characterisation of BTI and ESD, as well as a methodology to analyse the aging of SRAMs.

Time  Label  Presentation Title
Authors
14:30  3.3.1  NEW METHOD FOR THE AUTOMATED MASSIVE CHARACTERIZATION OF BIAS TEMPERATURE INSTABILITY IN CMOS TRANSISTORS
Speaker:
Pablo Sarazá Canflanca, Universidad de Sevilla, ES
Authors:
Pablo Saraza-Canflanca (1), Javier Diaz-Fortuny (2), Rafael Castro-Lopez (3), Elisenda Roca (4), Javier Martin-Martinez (2), Rosana Rodriguez (5), Montserrat Nafria (6) and Francisco Vidal Fernandez (7)
(1) Universidad de Sevilla (US) - Instituto de Microelectrónica de Sevilla (IMSE), ES; (2) Universitat Autonoma de Barcelona UAB, ES; (3) Instituto de Microelectrónica de Sevilla IMSE, ES; (4) Instituto de Microelectrónica IMSE, ES; (5) Universidad Autonoma de Barcelona, ES; (6) Universitat Autonoma de Barcelona, ES; (7) Universidad de Sevilla - Instituto de Microelectrónica de Sevilla, ES
Abstract
Bias Temperature Instability (BTI) has become a critical issue for circuit reliability. This phenomenon has been found to have a stochastic and discrete nature in nanometer-scale CMOS technologies. To account for this random nature, massive experimental characterization is necessary so that the extracted model parameters are accurate enough. However, there is a lack of automated analysis tools for extracting the BTI parameters from the extensive amount of data generated in those massive characterization tests. In this paper, a novel algorithm that allows precise and fully automated parameter extraction from experimental BTI recovery current traces is presented. This algorithm is based on Maximum Likelihood Estimation principles and is able to extract, in a robust and exact manner, the threshold voltage shifts and emission times associated with oxide trap emissions during BTI recovery, which are required to properly model the phenomenon.
15:00  3.3.2  GUILTY AS CHARGED: COMPUTATIONAL RELIABILITY THREATS POSED BY ELECTROSTATIC DISCHARGE-INDUCED SOFT ERRORS
Speaker:
Keven Feng, University of Illinois at Urbana Champaign, US
Authors:
Keven Feng, Sandeep Vora, Rui Jiang, Elyse Rosenbaum and Shobha Vasudevan, ECE at Univ. of Illinois at Urbana-Champaign, US
Abstract
Electrostatic discharge (ESD) has been shown to cause severe reliability hazards at the physical level, resulting in permanent and transient failures. We present the first analysis of the effects of ESD-induced errors on instruction-level computation. Our data was measured on a microcontroller test chip fabricated for this study, with discharges from a controlled ESD gun. Cosmic-ray-induced soft errors have been widely researched and modeled as single event upsets (SEUs). Our observations across multiple trials on 3 test chips show that, in contrast to radiation-induced errors, ESD can cause much more widespread errors than SEUs. In our trials, we observe system hangs and clock glitches, which are serious errors. We also observe errors in the following categories. Category A: multiple bit corruptions across multiple registers; Category B: multiple bit corruptions in the same register; and Category C: single bit corruptions across multiple registers. At the instruction level, these errors manifest as system hangs and as serious malfunctioning of I/O operations, interrupt operations, and data and program memory. We demonstrate that ESD-induced errors form a significant reliability threat to higher-level functionality, warranting modeling and mitigation techniques.
15:30  3.3.3  METHODOLOGY FOR APPLICATION-DEPENDENT DEGRADATION ANALYSIS OF MEMORY TIMING
Speaker:
Daniel Kraak, Delft University of Technology, NL
Authors:
Daniel Kraak (1), Innocent Agbo (1), Mottaqiallah Taouil (1), Said Hamdioui (1), Pieter Weckx (2), Stefan Cosemans (2) and Francky Catthoor (2)
(1) Delft University of Technology, NL; (2) imec vzw., BE
Abstract
Memory designs typically contain design margins to compensate for aging. As the impact of aging becomes more severe with technology scaling, it is crucial to predict that impact accurately to prevent overestimation or underestimation of the margins. This paper proposes a methodology to accurately and efficiently analyze the impact of aging on the memory's digital logic (e.g., timing circuit and address decoder) while considering realistic workloads extracted from applications. To demonstrate the superiority of the methodology, we analyzed the degradation of the L1 data and instruction caches for an ARMv8-A processor using both our methodology and state-of-the-art methods. The results show that the existing methods may significantly over- or underestimate the impact (e.g., the decoder margin by up to 221% and the access time by up to 20%) compared with the proposed scheme. In addition, the results show that, in general, the instruction cache has the highest degradation. For example, its access time degrades by up to 9% and its decoder margin by up to 44%.
16:00  IP1-14, 303  CHIP HEALTH TRACKING USING DYNAMIC IN-SITU DELAY MONITORING
Speaker:
Hadi Ahmadi Balef, Eindhoven University of Technology, NL
Authors:
Hadi Ahmadi Balef (1), Kees Goossens (2) and José Pineda de Gyvez (1)
(1) Eindhoven University of Technology, NL; (2) Eindhoven University of Technology, NL
Abstract
Tracking the gradual effect of silicon aging on circuit delays requires fine-grained slack monitoring. Conventional slack monitoring techniques aim to measure the worst-case static slack, i.e. the slack of the longest timing path. In sharp contrast to the conventional techniques, we propose a novel technique that is based on dynamic excitation of in-situ delay monitors (i.e. the dynamic excitation of the timing paths that are monitored). As the circuit degrades, path delays increase and the monitors are excited more frequently. With the proposed technique, a fine-grained signature of delay degradation is extracted from the excitation rate of the monitors. The in-situ monitors are inserted at intermediate points along timing paths to increase the sensitivity of the signature to delay degradation. A new, efficient monitor insertion algorithm is also proposed that reduces the number of monitors by ~2.1X compared to other works for an ARM Cortex-M0 processor.
16:01  IP1-15, 541  PCFI: PROGRAM COUNTER GUIDED FAULT INJECTION FOR ACCELERATING GPU RELIABILITY ASSESSMENT
Speaker:
Fritz Previlon, Northeastern University, US
Authors:
Fritz Previlon, Charu Kalra, Devesh Tiwari and David Kaeli, Northeastern University, US
Abstract
Reliability has become a first-class design objective for GPU devices due to the increasing soft-error rate. To assess the reliability of GPU programs, researchers rely on software fault-injection methods. Unfortunately, the software fault-injection process is prohibitively expensive, requiring multiple days to complete a statistically sound fault-injection campaign. To address this challenge, this paper proposes a novel fault-injection method, PCFI, that reduces the number of fault injections by exploiting the predictability of the fault-injection outcome based on the program counter of the instruction affected by the soft error. Evaluation on a variety of GPU programs covering a wide range of application domains shows that PCFI reduces the time to complete fault-injection campaigns by 22% on average without sacrificing accuracy.
16:02  IP1-16, 696  CHARACTERIZING THE RELIABILITY AND THRESHOLD VOLTAGE SHIFTING OF 3D CHARGE TRAP NAND FLASH
Speaker:
Weihua Liu, Huazhong University of Science and Technology, CN
Authors:
Weihua Liu (1), Fei Wu (1), Meng Zhang (1), Yifei Wang (1), Zhonghai Lu (2), Xiangfeng Lu (3) and Changsheng Xie (1)
(1) Huazhong University of Science and Technology, CN; (2) KTH Royal Institute of Technology, SE; (3) Beijing Memblaze Technology Co., Ltd., CN
Abstract
3D charge trap (CT) triple-level cell (TLC) NAND flash is gradually becoming a mainstream storage component owing to its high storage capacity and performance, but it raises concerns about reliability. Fault tolerance and data management schemes are capable of improving reliability. Designing a more efficient solution, however, requires an understanding of the reliability characteristics of 3D CT TLC NAND flash. To facilitate such understanding, we use a real-world testing platform to investigate the reliability characteristics, including the raw bit error rate (RBER) and the threshold voltage (Vth) shifting features after the flash has been subjected to variable disturbances. We analyze why these characteristics exist in 3D CT TLC NAND flash. We hope these observations can guide designers in proposing highly efficient solutions to the reliability problem.
16:03  IP1-17, 882  HIDDEN DELAY FAULT SENSOR FOR TEST, RELIABILITY AND SECURITY
Speaker:
Giorgio Di Natale, CNRS - TIMA, FR
Authors:
Giorgio Di Natale (1), Elena Ioana Vatajelu (2), Kalpana SENTHAMARAI KANNAN (2) and Lorena Anghel (3)
(1) LIRMM, FR; (2) TIMA, FR; (3) Grenoble-Alpes University, FR
Abstract
In this paper we present a novel hidden-delay-fault sensor design and a preliminary analysis of its circuit integration and applicability. In our proposed method, delay sensing is achieved by sampling data on both rising and falling clock edges and using a variable duty cycle to control the range of the sensed delay fault. The main advantages of our proposed method are that it works at nominal frequency, can cover a wide range of delay faults, and is versatile in its applicability. It can be used (i) during testing to perform user-defined hidden-delay-fault tests, (ii) for estimating reliability degradation due to process and environmental variations and ageing, and (iii) in security to detect the insertion of Trojan horses that alter the path delay.
16:03  IP1-18, 219  EFFECT OF DEVICE VARIATION ON MAPPING BINARY NEURAL NETWORK TO MEMRISTOR CROSSBAR ARRAY
Speaker:
Wooseok Yi, POSTECH, KR
Authors:
Wooseok Yi (1), Yulhwa Kim (1) and Jae-Joon Kim (2)
(1) Pohang University of Science and Technology, KR; (2) Pohang University of Science and Technology, KR
Abstract
In memristor crossbar array (MCA)-based neural network hardware, it is generally assumed that all wordlines (WLs) are simultaneously enabled for parallel matrix-vector multiplication (MxV) operation. However, the error probability of MxV in an MCA increases as the resistance ratio (R-ratio) of a memristor decreases and as the resistance variation and the number of simultaneously activated WLs increase. In this paper, we analyze the effect of the R-ratio and variation of memristor devices on the read sense margin and inference accuracy of MCA-based Binary Neural Network (BNN) hardware. We first show that only a limited number of WLs should be enabled to ensure correct MxV output when the R-ratio is small. On the other hand, we also show that, if the resistance variation becomes higher than a certain level, simultaneous activation of a large number of WLs produces higher accuracy even when the R-ratio is small. Based on the analysis, we propose the Accuracy Estimation (AE) factor to find the optimal number of WLs that are simultaneously activated.
16:00  End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served in the exhibition area during the coffee breaks at the times listed below.

Lunch Breaks (Lunch Area)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the "Lunch Area" to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 26, 2019

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "TBD" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 27, 2019

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "TBD" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 28, 2019

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "TBD" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00