Resilient and Testable Energy-Efficient Digital Hardware


Power management is an essential enabling technology for current and future low-power devices. However, recent academic and industrial research, including work by this project's investigators, has shown that power management reduces the reliability of energy-efficient hardware and increases its test cost.
Currently, there are no fault models or test methods for power distribution networks and power management circuitry. There are also no on-line soft-error monitoring and correction methods for power management hardware. This project focused on developing new fault models, methods and circuits, and on validating them in simulation, on FPGA and in ASIC. The goal was to quantify and improve the resilience and testability of energy-efficient digital hardware. Emphasis was placed on cost-effectiveness in two ways: reliability and test were considered jointly, and on-chip hardware was re-used to minimise silicon area, power consumption and impact on functional performance. The reliability analysis was performed using a high-k 32 nm CMOS technology library, and characterisation was conducted at both SPICE and RT level for electrical and timing analysis. Software tools for integrating the developed techniques into the ASIC design flow were proposed. Measurements were collected from a custom Arm research chip (TOKACHI).

This was a three-year project funded by EPSRC (UK). We demonstrated analytically and experimentally that the static power consumption of VLSI designs decreases over time with Bias Temperature Instability (BTI) aging. We also showed that leakage power reduction techniques become more efficient as BTI aging progresses. We proposed a sleep transistor design strategy for reliable power gating that harvests the benefits offered by Negative Bias Temperature Instability (NBTI) aging. We also explored dynamic DVFS schemes for drowsy cache memory design to harvest BTI aging benefits in such memories.

In addition, we researched a novel design-for-testability (DFT) architecture for testing sleep transistors against stuck-open faults, which considers a distributed model of the on-chip power network. Building on the same distributed model, we developed a diagnosis algorithm with two purposes. It grades the impact of stuck-open faults on power gating designs, and it grades the impact of resistive faults on their power reduction efficiency.

Furthermore, we proposed a novel low-cost, coarse-grained BTI sensor that exploits the impact of BTI on leakage current and reuses the power gating DFT infrastructure. We also conducted a comprehensive evaluation of the effects of BTI on level shifters. From this evaluation we developed a selective fault tolerance design technique using probabilistic fault models. Finally, we developed a BTI-aware thermal management technique for DVFS designs, using a fine-grained stress- and temperature-aware simulation flow with characterisation at both SPICE and RT level. The framework considers statistically probable workloads to estimate device stress and temperature, depending on the dynamic thermal management policies the system follows. (Best paper award nominee at the Defect and Fault Tolerance in VLSI and Nanotechnology Systems Symposium (DFT'16), Connecticut, US.)
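The observation that leakage falls as BTI aging progresses can be illustrated with two textbook relations. The first is the empirical power-law threshold-voltage shift, ΔVth = A·t^n. The second is the exponential dependence of subthreshold leakage on Vth, I(t)/I(0) = exp(−ΔVth / (m·kT/q)). The minimal Python sketch below is not the project's analytical model. It assumes a single transistor under constant DC stress with no recovery. The prefactor A, time exponent n and slope factor m are placeholder values chosen only to show the trend, and the function names are hypothetical.

```python
import math

# Physical constants (CODATA exact values).
K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C


def thermal_voltage(temp_k: float) -> float:
    """Thermal voltage kT/q in volts."""
    return K_BOLTZMANN * temp_k / Q_ELECTRON


def bti_vth_shift(t_seconds: float, a: float = 1e-3, n: float = 1 / 6) -> float:
    """Hypothetical power-law BTI threshold-voltage shift dVth = A * t^n (volts).

    Assumes constant DC stress and ignores recovery; A and n are
    illustrative placeholders, not fitted to any technology library.
    """
    return a * t_seconds ** n


def relative_leakage(delta_vth: float, temp_k: float = 300.0,
                     slope_factor: float = 1.5) -> float:
    """Subthreshold leakage of the aged device relative to the fresh one.

    I(t)/I(0) = exp(-dVth / (m * kT/q)); a larger |Vth| means less leakage.
    """
    return math.exp(-delta_vth / (slope_factor * thermal_voltage(temp_k)))


if __name__ == "__main__":
    seconds_per_year = 365 * 24 * 3600
    for years in (0, 1, 3, 10):
        dv = bti_vth_shift(years * seconds_per_year)
        print(f"{years:>2} y: dVth = {dv * 1e3:5.1f} mV, "
              f"leakage = {relative_leakage(dv):.2f} x fresh")
```

Because the Vth shift grows sub-linearly in time while leakage depends exponentially on Vth, most of the leakage reduction in this toy model occurs early in the device's lifetime.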
