M09 Energy-Efficient System Design Through Error-Resilient Computing

Location / Room: Konferenz 3


Saibal Mukhopadhyay, Georgia Institute of Technology, US
Shidhartha Das, ARM Ltd., GB
Anand Raghunathan, Purdue University, US
Srimat Chakradhar, NEC Labs, US


This half-day tutorial covers a broad range of technologies for error-resilient computing and highlights the significant role of resiliency technologies in achieving high energy efficiency across different levels of abstraction (circuit, hardware architecture and software) in modern computing systems. Safety margins added to cope with rising variations at nanometer geometries incur unacceptable power and performance overheads. Traditional adaptive techniques compensate for some manifestations of these variations; however, they still require margins to account for localized and fast-changing variations. The adverse impact of margins has led to a recent research focus on so-called "error-resilient" techniques, both in academia and industry. Resilient techniques permit computational errors to occur at run-time, either by operating without the full setup margin or by deliberately designing for inexact outputs. Instead of the "always-correct" output mandated by the traditional model of computing, computing with errors enables significant improvements in energy efficiency as long as the error rate and/or the magnitude of errors are sufficiently low. Resilient techniques have wide-ranging applications, from high-performance general-purpose computing to digital signal processing (DSP) algorithms.
In this tutorial, we provide an in-depth overview of error-resilient techniques encompassing circuit, micro-architectural, algorithmic and system-architecture aspects. We organize the material into two segments. In the first, we discuss error-resilient techniques for bit-exact applications, where perfect recovery from errors is a key requirement. We briefly review the existing design space for traditional adaptive techniques and motivate the case for error resiliency by analyzing the additional margins eliminated through explicit error detection and correction. We then discuss error-detection and recovery approaches for microprocessor pipelines, highlighting "Razor" as a specific example. We present measurement results from academia and industry on resilient techniques similar to Razor.
The second segment of the tutorial focuses on "approximate" computing, an approach to computing that defines correctness as producing outputs of acceptable "quality". Many applications (such as web search, data analytics, sensor data processing, recognition, mining, and synthesis) have a high degree of intrinsic resilience to their underlying computations being executed incorrectly. We review software, hardware architecture and circuit design techniques for building approximate computing systems. These new techniques significantly improve performance or energy efficiency while ensuring that the results produced are acceptable. We conclude with a discussion of the key challenges that need to be addressed in order to facilitate a broader adoption of approximate computing.



14:30 M09.1 Session 1
00:00 M09.1.1 Error-resilient Computing - Motivation and Example Applications
Saibal Mukhopadhyay, Georgia Institute of Technology, US

00:00 M09.1.2 Error-resilience for general-purpose computing - Razor
Shidhartha Das, ARM Ltd, GB

16:30 M09.2 Session 2
00:00 M09.2.1 Approximate Computing - A circuits and architecture perspective
Anand Raghunathan, Purdue University, US

00:00 M09.2.2 Approximate Computing - A software and applications perspective
Srimat Chakradhar, NEC Labs, US