A Soft‐Error Resilient Route Computation Unit for 3D Networks‐on‐Chips

Alexandre Coelho1,a, Amir Charif1,2,b, Nacer-Eddine Zergainoh1,c, Juan Fraire1,3,d and Raoul Velazco1,e
1Université Grenoble Alpes, CNRS, Grenoble INP, TIMA, Grenoble, France
2Computing and Design Environment Laboratory, CEA LIST, Gif-sur-Yvette, France
3Universidad Nacional de Córdoba, CONICET, Córdoba, Argentina
a Alexandre.Coelho@univ-grenoble-alpes.fr
b Amir.Charif@univ-grenoble-alpes.fr
c Nacer-Eddine.Zergainoh@univ-grenoble-alpes.fr
d Juan.Fraire@univ-grenoble-alpes.fr
e Raoul.Velazco@univ-grenoble-alpes.fr

ABSTRACT


Three‐dimensional Networks‐on‐Chips (3D‐NoCs) have emerged as an alternative that further enhances the performance, functionality, and packaging density of 2D‐NoCs. However, the increasing complexity of NoC routers, the continuous miniaturization of silicon technology, lower operating voltages, and higher operating frequencies have made NoCs increasingly vulnerable to soft errors. In particular, transient faults occurring in the route computation unit (RCU) can cause misrouting, which may lead to severe effects such as deadlocks or packet loss and corrupt the operation of the entire chip. By combining a reliable fault detection circuit based on circuit‐level double‐sampling with a cost‐effective rerouting mechanism, we develop a complete fault‐tolerance solution that efficiently detects and corrects such fatal errors before the affected packets leave the router. To validate the proposed solution, we also introduce a novel simulation‐based fault‐injection method that operates on the NoC's gate‐level netlist. Experimental results obtained from a partially and vertically connected 3D‐NoC indicate that our solution provides a high level of reliability in the presence of errors, at the cost of area and power overheads of 4.1% and 6.8%, respectively.

Keywords: Network‐On‐Chip, Soft‐errors, Fault‐injection, Router Micro‐architecture, Multi‐core System On Chip.


