7.2 Embedded Tutorial: Cross Layer Resiliency in Real World


Date: Wednesday 26 March 2014
Time: 14:30 - 16:00
Location / Room: Konferenz 6

Organiser:
Vikas Chandra, ARM, US

Chair:
Yanjing Li, Intel, US

Co-Chair:
Ulf Schlichtmann, TUM, DE

Complex SoCs will need resilience at different levels of the design hierarchy to handle failures caused by variability, reliability degradation, and design errors (logical or electrical). The main causes of such marginal behavior are sheer design complexity, uncertainties in manufacturing processes, temporal variability, and operating conditions. In this session, we will cover the basics of cross-layer resiliency and explore the reliability challenges in both embedded processors and large-scale computing resources.

Time | Label | Presentation Title / Authors
14:30 | 7.2.1 | CROSS-LAYER RESILIENCE EXPLORATION AND OPTIMIZATION
Speaker:
Subhasish Mitra, Stanford University, US
Abstract
This talk will discuss systematic methodologies for exploring cross-layer resilience, encompassing error detection, correction, and recovery techniques, for complex SoCs. The objective is to address several key questions, such as:
1. Given a design, is cross-layer resilience always the best option?
2. What are the right models that link resilience techniques across multiple layers for quick, yet accurate, estimation of coverage and costs?
3. What is the proper framework to explore the large space of existing resilience techniques for error detection, correction, and recovery across various abstraction layers?
15:00 | 7.2.2 | RELIABILITY CHALLENGES IN EMBEDDED PROCESSORS
Speaker:
Vikas Chandra, ARM, US
Abstract
Embedded processors are now at the heart of the mobile revolution and aspire to power even high-performance data centers. It is of utmost importance to understand the reliability challenges in embedded processors and to find ways to tackle them across different layers of design abstraction. In this talk, I will discuss the reliability requirements of embedded processors, the challenges we are facing, and our approach to making designs more robust. We will discuss our approaches to measuring wear-out in commercial processors, as well as the efficient design of in-situ monitors to track timing errors.
15:30 | 7.2.3 | BILLION CHIPS OF TRILLION TRANSISTORS: HOW TO MAKE THEM RELIABLE?
Speakers:
Chen-Yong Cher, IBM Research, US
Silvia Mueller, IBM Boeblingen, DE
Abstract
Due to increasing demand for personal devices, high-performance computing (HPC) systems, and commercial data centers, microprocessor and main-memory designers face numerous challenges in delivering a large number of chips at an effective cost. Although frequency scaling has effectively ended, technology scaling continues to provide an increasing number of transistors. To use these transistors effectively for performance, designers have turned to sophisticated, highly integrated chip designs such as multi-core processors (e.g., Intel Core i7, IBM POWER7, BlueGene/Q), GPGPUs (e.g., NVIDIA Tegra), and heterogeneous SoCs (e.g., IBM Wirespeed). The growing demand for chips and transistors presents numerous challenges in reliability, power, and manufacturing cost. In large-scale HPC systems and data centers, the increasing number of chips also raises the per-chip reliability requirement needed to meet system reliability targets.
16:00 | End of session
Coffee Break in Exhibition Area
From Tuesday to Thursday, the coffee and lunch breaks will be held in the Exhibition Area (Terrace Level).