9.7 Temperature Awareness in Computing Systems


Date: Thursday 17 March 2016
Time: 08:30 - 10:00
Location / Room: Konferenz 5

Chair:
Muhammad Shafique, Karlsruhe Institute of Technology (KIT), DE

Co-Chair:
Marina Zapater, Complutense University of Madrid, ES

This session covers hardware and software approaches for thermal optimization in computing systems, ranging from hybrid memory cubes and pipelined real-time systems to chip multiprocessors.

Time | Label | Presentation Title
Authors
08:30 | 9.7.1 | THERMAL-AWARE DYNAMIC PAGE ALLOCATION POLICY BY FUTURE ACCESS PATTERNS FOR HYBRID MEMORY CUBE (HMC)
Speaker:
Wei-Hen Lo, National Tsing Hua University, TW
Authors:
Wei-Hen Lo, Kai-zen Liang and TingTing Hwang, National Tsing Hua University, TW
Abstract
The Hybrid Memory Cube (HMC) is a promising solution to overcome the memory wall by stacking DRAM chips on top of a logic die and connecting them with dense, fast Through-Silicon Vias (TSVs). However, 3D stacking brings another problem: high temperatures and temperature variations between the DRAM dies. This thermal problem may lead to chip failure in 3D-stacked DRAMs, since the temperature may exceed the maximum operating temperature. Dynamic thermal management (DTM) schemes such as bandwidth throttling can effectively decrease the temperature, but at the cost of performance. To maximize the performance of a system with HMC, memory mapping should consider the thermal characteristics of the HMC, memory interference and bandwidth variations among processes, and the current temperature of each memory channel. This paper proposes a thermal-aware dynamic OS page allocation scheme that uses future access patterns to find the best performance-oriented setting of the above factors. An analytical model is proposed to estimate system performance, accounting for memory interference, bandwidth variation, and throttling impact. Our method improves system performance by 12.7% compared to the best performance-oriented allocation method (MCP) [1]. The average error rate of our analytical model in predicting the trend of performance variations is only 0.86%.

Download Paper (PDF; Only available from the DATE venue WiFi)
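The allocation idea above can be illustrated with a minimal sketch: pick the memory channel whose combined thermal headroom and contention cost is lowest. This is an assumption-laden toy, not the paper's analytical model; the cost function, weights, and throttling threshold are all invented for illustration.

```python
# Illustrative sketch (not the paper's model): allocate a page to the HMC
# channel with the lowest combined "cost" of current temperature and load.

def pick_channel(temps_c, loads, t_throttle=85.0, alpha=1.0, beta=1.0):
    """Return the index of the channel with the lowest estimated cost.

    temps_c    -- current temperature of each memory channel (Celsius)
    loads      -- outstanding bandwidth demand per channel (arbitrary units)
    t_throttle -- temperature at which bandwidth throttling would kick in
    alpha/beta -- weights for thermal headroom vs. contention (assumed)
    """
    def cost(i):
        headroom = max(t_throttle - temps_c[i], 0.0)
        # Less headroom (closer to throttling) and more load both raise cost.
        return alpha / (headroom + 1.0) + beta * loads[i]
    return min(range(len(temps_c)), key=cost)

# A hot, busy channel loses to a cool, idle one:
print(pick_channel([84.0, 60.0], [5.0, 1.0]))  # -> 1
```

The real scheme additionally uses predicted future access patterns to weigh interference between processes; this sketch only captures the per-channel trade-off.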
09:00 | 9.7.2 | MINIMIZING PEAK TEMPERATURE FOR PIPELINED HARD REAL-TIME SYSTEMS
Speaker:
Long Cheng, Technische Universität München (TUM), DE
Authors:
Long Cheng1, Kai Huang2, Gang Chen1, Biao Hu1 and Alois Knoll1
1Technische Universität München (TUM), DE; 2Sun Yat-sen University, CN
Abstract
This paper addresses the problem of minimizing the peak temperature of pipelined multi-core systems under hard end-to-end deadline constraints by adversely applying the Pay-Burst-Only-Once principle. Periodic Thermal Management is adopted to control the temperature: every core is periodically switched between two power modes. Using the peak-temperature representation, we first formulate the problem of finding the thermally optimal periodic schemes that satisfy the deadline constraints, and then present a fast heuristic algorithm to solve it. In simulations on real-life processor platforms and applications, our approach reduces the peak temperature by up to 15 °C on a 4-stage ARM platform compared to a sub-deadline partitioning approach. Moreover, our algorithm is shown to be scalable with respect to the number of pipeline stages, and its effectiveness is validated against a brute-force search.

Download Paper (PDF; Only available from the DATE venue WiFi)
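To see why periodically toggling a core between two power modes bounds its temperature, consider a first-order RC thermal model: the steady-state peak sits strictly between the two modes' asymptotic temperatures. This is a generic textbook model sketched under assumed parameters, not the paper's analysis.

```python
# Minimal first-order RC thermal model (an assumption, not the paper's
# analysis) showing how a Periodic Thermal Management (PTM) scheme that
# toggles a core between a high- and a low-power mode bounds its temperature.
import math

def steady_peak_temp(t_amb, p_hi, p_lo, r, c, t_on, t_off):
    """Steady-state peak temperature of a periodically switched core.

    t_amb       -- ambient temperature
    p_hi, p_lo  -- power in the two modes (W)
    r, c        -- thermal resistance (K/W) and capacitance (J/K)
    t_on, t_off -- time per period spent in the high/low power mode (s)
    """
    tau = r * c
    a, b = math.exp(-t_on / tau), math.exp(-t_off / tau)
    th, tl = t_amb + r * p_hi, t_amb + r * p_lo  # asymptotic temps per mode
    # The peak occurs at the end of the high-power phase. Solve the
    # fixed point over one full period:
    #   T_peak   = th + (T_valley - th) * a
    #   T_valley = tl + (T_peak  - tl) * b
    return (th * (1 - a) + a * tl * (1 - b)) / (1 - a * b)

# Longer cool-down phases lower the steady-state peak:
always_on = steady_peak_temp(45, 20, 2, 1.0, 10.0, 1.0, 0.0)
ptm       = steady_peak_temp(45, 20, 2, 1.0, 10.0, 1.0, 3.0)
print(ptm < always_on)  # -> True
```

Choosing `t_on`/`t_off` then becomes the trade-off the paper optimizes: longer low-power phases cut the peak but consume the end-to-end deadline slack.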
09:30 | 9.7.3 | THERMAL AWARE SCHEDULING AND MAPPING OF MULTIPHASE APPLICATIONS ONTO CHIP MULTIPROCESSOR
Speaker and Author:
Aryabartta Sahu, IIT Guwahati, IN
Abstract
Thermal hot spots and high temperature gradients degrade the reliability and performance of chip multiprocessors, an important issue in today's high-transistor-density chips. In this paper, we explore the benefits of different temperature-aware scheduling and mapping approaches for applications on chip multiprocessors to reduce peak temperature. Since most applications exhibit phase-wise behavior at run time, we exploit their phase-wise power consumption to schedule and map them onto the multicore chip. We evaluated five scheduling approaches (critical path, modified critical path, energy-capped critical path, naive load balancing, and task partitioning and scheduling (TPS)) and five mapping approaches (random, greedy, row-col, checker board, and boundary fix checker board) on both synthetic data and real benchmarks, assuming an 8x8 chip multiprocessor. We take advantage of both (a) optimal scheduling of trees or chains of unit-time tasks on multiprocessors using critical-path heuristics and (b) the phase-wise behavior of applications. Results show that greedy mapping performs poorly compared to simple, low-overhead location-exchange approaches (which incur no extra cost for temperature sensing or prediction) when the thermal influence of neighboring processors is significant. The boundary fix checker board mapping achieves up to a 40% reduction in peak temperature compared to the costly greedy mapping. Our results also show that critical-path-based scheduling combined with location-based mapping can significantly reduce the chip's peak temperature without much increase in execution time for phase-wise applications.

Download Paper (PDF; Only available from the DATE venue WiFi)
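The intuition behind checker-board mapping is that spreading active tasks over non-adjacent cores keeps hot cores from heating their neighbors. A minimal sketch of the basic pattern on an 8x8 grid (the paper's "boundary fix" variant and its heuristics are more elaborate; this only shows the adjacency-avoidance property):

```python
# Hedged sketch of checker-board placement: fill all cores of one grid
# parity first, so up to half the cores are busy with no two edge-adjacent,
# limiting heat exchanged between neighbouring processors.

def checkerboard_slots(rows=8, cols=8):
    """Yield (row, col) core positions, all same-parity squares first."""
    for parity in (0, 1):
        for r in range(rows):
            for c in range(cols):
                if (r + c) % 2 == parity:
                    yield (r, c)

def map_tasks(n_tasks, rows=8, cols=8):
    """Assign the first n_tasks tasks to cores in checker-board order."""
    return list(checkerboard_slots(rows, cols))[:n_tasks]

placement = map_tasks(32)  # half the 8x8 grid
# Verify no two of the 32 placed tasks share a grid edge:
adjacent = any(abs(r1 - r2) + abs(c1 - c2) == 1
               for i, (r1, c1) in enumerate(placement)
               for (r2, c2) in placement[i + 1:])
print(adjacent)  # -> False
```

Beyond 50% utilization the pattern necessarily starts filling adjacent cores, which is where the location-exchange heuristics the abstract mentions come into play.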
10:00 | IP4-14, 207 | FREQUENCY SCHEDULING FOR RESILIENT CHIP MULTI-PROCESSORS OPERATING AT NEAR THRESHOLD VOLTAGE
Speaker:
Huawei Li, Chinese Academy of Sciences, CN
Authors:
Ying Wang, Huawei Li and Xiaowei Li, Chinese Academy of Sciences, CN
Abstract
With recently proposed redundancy-based core-salvaging techniques, resilient processors can survive severe timing violations induced by near-threshold Vdd and function correctly at aggressive clock rates. We observe that proactively disabling the weakest components that limit core frequency can still maintain higher throughput at Near-Threshold Voltage (NTV) supply, provided that cores with defective components are salvaged at low cost. In this work, we propose a resilience-aware frequency scaling and mapping strategy that considers defective processor states during scheduling, exploiting fault-tolerant architectures for higher energy efficiency. In our evaluation, typical resilient multi-core processors achieve significantly higher performance per watt than under a conventional scheduling policy.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:01 | IP4-15, 324 | (Best Paper Award Candidate)
A LOW OVERHEAD ERROR CONFINEMENT METHOD BASED ON APPLICATION STATISTICAL CHARACTERISTICS
Speaker:
Anupam Chattopadhyay, Nanyang Technological University, SG
Authors:
Zheng Wang1, Georgios Karakonstantis2 and Anupam Chattopadhyay3
1RWTH-Aachen University, DE; 2Queen's University, GB; 3Nanyang Technological University, SG
Abstract
Reliability has emerged as a critical design constraint, especially in memories. Designers have spent great effort guaranteeing fault-free operation of the underlying silicon by adopting redundancy-based techniques, which essentially try to detect and correct every single error. However, such techniques incur large area, power, and performance overheads, leading many to doubt their efficiency, especially for error-resilient systems where 100% accuracy is not always required. In this paper, we present an alternative method that focuses on confining the output error induced by reliability issues. Targeting memory faults, the proposed method, rather than correcting every single error, exploits the statistical characteristics of the target application and replaces any erroneous data with the best available estimate of that data. To realize the proposed method, a RISC processor is augmented with custom instructions and special-purpose functional units. We apply the method on the enhanced processor by studying the statistical characteristics of the various algorithms involved in a popular multimedia application. Our experimental results show that, in contrast to state-of-the-art fault-tolerance approaches, we reduce runtime and area overheads by 71.3% and 83.3%, respectively.

Download Paper (PDF; Only available from the DATE venue WiFi)
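The core idea of error confinement, substituting a statistical estimate for a faulty value instead of correcting it, can be sketched in a few lines. The fault flag, the running-mean estimator, and the class interface here are all assumptions for illustration; the paper implements this in hardware via custom instructions, not in software.

```python
# Illustrative sketch (assumed mechanics, not the paper's ISA extension):
# replace a value flagged as faulty with the best statistical estimate of
# the data -- here, a running mean of previously seen samples.

class ConfinedStream:
    """Substitute the running mean for any sample flagged as faulty,
    instead of stalling to detect and correct it."""
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def read(self, value, faulty):
        if faulty and self.count:
            value = self.total / self.count  # best available estimate
        self.total += value
        self.count += 1
        return value

s = ConfinedStream()
samples = [(10, False), (12, False), (999, True), (11, False)]
out = [s.read(v, f) for v, f in samples]
print(out)  # -> [10, 12, 11.0, 11]
```

The corrupted sample (999) never reaches the output; the error is confined to the small deviation between the estimate and the true value, which is exactly the trade-off that suits error-resilient multimedia workloads.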
10:00 | End of session
Coffee Break in Exhibition Area