W1 International Workshop on Dependable GPU Computing

Fri, 2014-03-28
Room: Konferenz 1

Workshop Organizers:
Dimitris Gizopoulos, University of Athens, GR
Hans-Joachim Wunderlich, University of Stuttgart, DE
Paolo Prinetto, Politecnico di Torino, IT


Workshop scope

Many-core hardware accelerators offer significant speedups for parallel applications across different computing segments. Some were originally architected for general-purpose parallel computing, while others, such as Graphics Processing Units (GPUs), were initially designed only for graphics applications. Today, GPUs are used extensively for general-purpose computing (GPGPU computing), thanks to effective software development frameworks. GPUs and other many-core accelerators have penetrated a very wide range of applications, from embedded and low-power devices to the highest-performance supercomputers.

This workshop focuses on an important aspect of GPUs and massively parallel hardware accelerators: dependability, together with its performance and power/energy implications. GPUs and accelerators implemented in modern manufacturing technologies are vulnerable (like all other chips) to both transient and permanent faults caused by radiation, manufacturing defects, variability, aging, etc. Correct operation has a much higher priority in general-purpose computing than in graphics: applications are expected to deliver results that are both correct and fast. Dependability can be provided at the circuit level, the architectural level, the software level, or a combination of these. However, dependability support significantly affects (a) the performance the application delivers and (b) its power/energy consumption. Views from both industry and academia on GPU dependability will be discussed during the workshop.

Topics to be discussed in the workshop include (but are not limited to) the following:

  • Dependability requirements for GPUs and accelerators in different application domains: embedded computing vs. high-performance computing, low-power devices vs. large-scale systems, scientific computing vs. graphics processing and gaming.
  • Effective and reliable use of GPUs and accelerators in the various fields of electronic design automation, among others.
  • Experimental evaluation, measurements and case studies of GPU and accelerator dependability.
  • Dependability enhancement methodologies (software-based, hardware-based or mixed) for GPUs and accelerators.
  • Performance penalty, hardware cost, and power/energy overhead of dependability support for GPUs and accelerators.
  • Comparison of CPU and GPU tolerance to transient or permanent hardware faults.

Important Dates

  • Poster submission deadline: February 10, 2014.
    • The poster submission is a short one-page summary (in the standard IEEE two-column conference format).
    • Submit it in PDF by email to the workshop organizers listed above.
  • Poster acceptance notification: February 21, 2014.
  • Final poster and accompanying paper deadline (up to 2 pages): March 15, 2014.


08:30 Opening Session
08:30 W1.1.1 Opening Remarks (Paper/SoftConf ID: 1331)
Dimitris Gizopoulos1, Hans-Joachim Wunderlich2 and Paolo Prinetto3
1University of Athens, GR; 2University of Stuttgart, DE; 3Politecnico di Torino, IT

08:30 W1.1.2 Keynote 1: GPGPU for dependable systems – a blessing or a curse? (Paper/SoftConf ID: 1233)
Avi Mendelson, Technion, IL

09:15 Invited Talk 1
09:15 W1.2.1 GPGPU Reliability – Challenges and Research Directions (Paper/SoftConf ID: 1234)
Sudhanva Gurumurthi, AMD, US

09:45 Session 1 – “Software Approaches for GPUs Dependability Enhancement”

Chair: Murali Annavaram, University of Southern California, Los Angeles, US

Co-Chair: Amir Nahir, IBM Research, IL

09:45 W1.3.1 An improved fault mitigation strategy for CUDA Fermi GPUs (Paper/SoftConf ID: 1235)
Stefano Di Carlo, Giulio Gambardella, Ippazio Martella, Paolo Prinetto, Daniele Rolfo and Pascal Trotta, Politecnico di Torino, IT

10:05 W1.3.2 Software-Based Techniques for Reducing the Vulnerability of GPU Applications (Paper/SoftConf ID: 1236)
Si Li1, Vilas Sridharan2, Sudhanva Gurumurthi2 and Sudhakar Yalamanchili1
1Georgia Tech., US; 2AMD, US

10:25 W1.3.3 A-ABFT: Autonomous Algorithm-Based Fault Tolerance on GPUs (Paper/SoftConf ID: 1237)
Claus Braun, Sebastian Halder and Hans-Joachim Wunderlich, University of Stuttgart, DE

10:45 Coffee Break + Posters
11:30 Invited Talk 2
11:30 W1.4.1 Reliable Acceleration – Reliability in a World of GPUs & Other Special Purpose Accelerators (Paper/SoftConf ID: 1238)
Arijit Biswas, Intel, US

13:00 Keynote 2
13:00 W1.5.1 GPU Related Errors in Large Scale Systems: A Study of Blue Waters Supercomputer at NCSA-Illinois (Paper/SoftConf ID: 1329)
Ravishankar K. Iyer, University of Illinois at Urbana-Champaign, US

13:45 Session 2 – “Fault Detection and Tolerance in GPUs”

Chair: Nathan DeBardeleben, Los Alamos National Laboratory, US

Co-Chair: Hans-Joachim Wunderlich, University of Stuttgart, DE

13:45 W1.6.1 Benefits and Countermeasures of Increasing the GPU code Degree of Parallelism (Paper/SoftConf ID: 1239)
Paolo Rech and Luigi Carro, UFRGS, BR

13:45 W1.6.2 On the Evaluation of Soft-Errors Detection Techniques for GPGPUs (Paper/SoftConf ID: 1240)
Davide Sabena1, Matteo Sonza Reorda1, Luca Sterpone1, Paolo Rech2 and Luigi Carro2
1Politecnico di Torino, IT; 2UFRGS, BR

13:45 W1.6.3 Tolerating Hard Faults in GPGPUs (Paper/SoftConf ID: 1241)
Waleed Dweik, Mohammad AbdelMajeed and Murali Annavaram, University of Southern California, US

14:45 Coffee Break
15:15 Panel Session

Moderator: Dimitris Gizopoulos, University of Athens, GR

15:15 W1.7.1 Faults in CPUs and GPUs: Same or Different Problems? Same or Different Solutions? (Paper/SoftConf ID: 1268)
Sudhakar Yalamanchili1, Ravishankar K. Iyer2, Stefano Di Carlo3, Sudhanva Gurumurthi4, Arijit Biswas5 and Bodo Hoppe6
1Georgia Tech., US; 2University of Illinois at Urbana-Champaign, US; 3Politecnico di Torino, IT; 4AMD, US; 5Intel, US; 6IBM, DE

16:45 Closing Session