Networks-on-chip (NoCs) are increasingly being adopted as the interconnect model for systems-on-chip and chip-multiprocessors. Because the NoC is the only communication medium in these designs, its functional correctness is critical. In practice, design-time verification of NoCs is always partial because of their large scale and the challenges that hinder verification efforts. As a result, functional design bugs are bound to escape verification and may manifest at runtime, compromising system functionality.
We propose REPAIR, a runtime solution that detects and recovers from functional design errors that have escaped verification in NoCs. Existing runtime verification techniques incur significant area and performance overheads because they monitor and check the correctness of every packet traversing the network. In contrast, REPAIR relies on a retransmission-based technique that adaptively determines the subset of packets requiring protection by identifying dynamic network regions where the specific runtime execution is likely to expose functional design bugs. We achieve runtime correctness at lower performance and area costs than a traditional solution: on average, we achieve more than 50% better overall performance with 2-3x fewer retransmission buffers.