Run-time Adaptation of Embedded Systems for Power-Noise Issues


In the past, the very high design and validation cost of assuring the power integrity of SoCs could only be justified for enterprise applications. Nowadays, Internet-of-Things (IoT) applications require dependable embedded systems based on heterogeneous SoCs. However, the dependability of an application is affected by the power integrity of the SoC. At the same time, the multiple electrical states of a heterogeneous SoC have increased the cost of validating its power integrity.

In the domain of high-performance mobile computing, the trend towards GHz+ operating frequencies and the ubiquity of low-power techniques make these systems limited by their power delivery and susceptible to pathological AC transients that undermine their reliability. Many studies have shown that power noise can lead to system failures when the operating frequency causes the power network to resonate. The operating frequencies of these circuits are also limited by their power delivery, because the strict power budget of mobile devices pushes operating voltages close to the threshold voltage.

Near-threshold computing, though, makes these circuits more susceptible to data corruptions and system failures due to voltage noise. Voltage noise is sensitive to micro-architectural events driven by hardware and software interactions and can only be accurately evaluated online through direct measurement.

When the supply voltage droops to low values, both dynamic and static power consumption are reduced; however, silent data corruptions and system crashes have been observed, and these are very sensitive to inter-core process variation and the executed workload. As system designs become more complex in order to cope with a variety of applications, and because power noise depends on process variation and run-time conditions (workload, operating voltage and frequency), the system needs to adapt at run time to the power noise it exhibits in order to optimize its power consumption.

This project’s aims are threefold:

  1. First, we aim to develop a novel on-chip power integrity characterisation paradigm (software and hardware). To explore the magnitude of power-supply voltage droops on different cores at the system level, we used a system with a direct power-network measurement instrument. The dual-core Cortex-A57 / Cortex-A72 platform (code-named internally at ARM as the Juno platform) comes with on-chip digital-sampling oscilloscope (DSO) circuitry that supports direct measurement of the power-delivery network. The first step of our research towards understanding and mitigating the problem was therefore to develop a Linux driver for the instrument.
  2. We also aim to reduce the design cost of dependable embedded systems so that they can meet the strict time-to-market constraints of many applications. Specifically, we plan to collect data by intentionally undermining the power integrity of the developed paradigm, e.g. using various operating conditions and dynamic workloads, in order to develop novel reliability models for power and signal integrity that can assist design-for-power-and-signal-integrity approaches. As the second step, building on the developed driver, we developed a system-level power-network characterisation and modelling framework that allows the application of system-level benchmark scenarios and stress tests. The framework allows simultaneous collection of power-delivery voltage droop results and architectural event data from performance monitoring counters. We are also developing the analytical tools that can reveal hardware/software interactions from this data. Using the proposed methodology, we characterize the power-integrity implications of execution over an operating system on an ARM Cortex-A57 cluster. For example, this framework revealed that context switches and system calls exhibit high noise, in many cases similar to that of stress tests, and it also exposed many limitations of system timestamps.
  3. For mitigation, we aim at the dynamic run-time adaptation of embedded systems to the power supply environment of their applications, which will be explored through the integration of the developed models with the PRiME runtime software. We are exploring resonating power-network detection circuitry and software that can be employed to detect voltage emergencies and drive a mitigation strategy accordingly, such as dynamic voltage/frequency scaling. We have developed a software-based run-time management system that detects and mitigates power-network resonance on a dual-core ARM Cortex-A57 cluster. Using an embedded power-network sensor that directly measures the voltage of the power network of an ARM Cortex-A57 cluster, we characterize the voltage emergencies of workloads and noise viruses, and we generate a predictive model of droop magnitude based on the cluster operating frequency and the voltage droop rate reported by the sensor. Upon detecting an imminent resonance emergency, the proposed run-time system adapts the operating frequency of the cluster in order to break out of the resonating conditions. The proposed system offers a trade-off between reliability and performance and dramatically reduces the number of voltage emergencies with negligible performance reduction.
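
The mitigation step in the third aim can be illustrated with a minimal sketch. It is not the project's implementation: the linear form of the model, its coefficients, the droop limit, the list of operating points, and the choice to step down to the next lower frequency are all assumptions made for illustration. The names `DroopModel` and `next_frequency` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DroopModel:
    """Hypothetical linear predictor of droop magnitude (mV).

    Assumed form: droop = c0 + c_freq * freq_mhz + c_rate * droop_rate.
    The real model and its inputs' units are not given in the text.
    """
    c0: float
    c_freq: float
    c_rate: float

    def predict(self, freq_mhz: float, droop_rate: float) -> float:
        return self.c0 + self.c_freq * freq_mhz + self.c_rate * droop_rate


def next_frequency(model: DroopModel,
                   freq_mhz: float,
                   droop_rate: float,
                   limit_mv: float,
                   available_mhz: Sequence[float]) -> float:
    """Pick the cluster frequency for the next control interval.

    If the predicted droop stays below the (assumed) limit, keep the
    current frequency. Otherwise move to the nearest lower operating
    point, an assumed policy for leaving the resonating condition.
    """
    if model.predict(freq_mhz, droop_rate) < limit_mv:
        return freq_mhz
    lower = [f for f in sorted(available_mhz) if f < freq_mhz]
    # Already at the lowest operating point: nothing lower to select.
    return lower[-1] if lower else freq_mhz


# Illustrative values only; none of these come from measurements.
model = DroopModel(c0=0.0, c_freq=0.05, c_rate=2.0)
points_mhz = [600.0, 800.0, 1000.0, 1100.0]
calm = next_frequency(model, 1000.0, 2.0, 60.0, points_mhz)
noisy = next_frequency(model, 1000.0, 10.0, 60.0, points_mhz)
```

In a real system the droop rate would come from the on-chip sensor through its driver, and the selected frequency would be applied through the platform's frequency-scaling interface. Both are left out here because their interfaces are not described in the text.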
