A highly robust distributed fault-tolerant routing algorithm for NoCs with localized rerouting
Date Issued
2012
DOI
10.1145/2107763.2107771
Abstract
Denser transistor integration has enabled the fabrication of multi-tile chips, but at the expense of higher susceptibility to defects and wear-out. The metal wires that make up the links of Networks-on-Chip (NoCs) are especially vulnerable to such defects, which can leave some links disconnected. This paper presents a new fault-tolerant routing scheme to sustain on-chip communication. It uses a localized rerouting approach, whereby detouring around faulty links, or complex regions of faults, is done locally at each node in a purely distributed and dynamic manner, while guaranteeing deadlock- and livelock-freedom. Results using synthetic traffic and real applications with full-system simulations demonstrate its efficacy in tolerating a large percentage of faulty NoC links, albeit in a gracefully degraded performance mode.

