2.4 Temperature and Variability Driven Modeling and Runtime Management


Date: Tuesday, March 26, 2019
Time: 11:30 - 13:00
Location / Room: Room 4

Chair:
Marco Domenico Santambrogio, Polytechnic University of Milan, IT

Co-Chair:
Ronald Dreslinski Jr, University of Michigan, US

Thermal modelling, hot spot prediction and optimization, and managing temperature variability at run-time are key questions to be answered during system design. This session consists of four regular papers and two IP papers that address these challenges using novel techniques ranging from manufacturing and hardware all the way up to computational models. The papers discuss considerations such as lithographic variations, cooling system design, and run-time adaptivity.

Time  Label  Presentation Title / Authors
11:30  2.4.1  HOT SPOT IDENTIFICATION AND SYSTEM PARAMETERIZED THERMAL MODELING FOR MULTI-CORE PROCESSORS THROUGH INFRARED THERMAL IMAGING
Speaker:
Sheldon Tan, University of California, Riverside, US
Authors:
Sheriff Sadiqbatcha¹, Hengyang Zhao¹, Hussam Amrouch², Joerg Henkel² and Sheldon Tan³
¹University of California, Riverside, US; ²Karlsruhe Institute of Technology, DE; ³University of California at Riverside, US
Abstract
Accurate thermal models suitable for system-level dynamic thermal, power and reliability regulation and management are vital for many commercial multi-core processors. However, developing such accurate thermal models and identifying the related thermal-power relevant spatial locations for commercial processors is a challenging task due to the lack of information and available tools. Existing tools such as HotSpot-like thermal models may suffer from inaccuracy or inefficiency for online applications, primarily because most rely on parameters that cannot be precisely quantified, such as power traces, while others are numerical methods not suitable for runtime use. In this work, we propose a novel approach to automatically detecting the major heat sources on a commercial multi-core microprocessor using an infrared thermal imaging setup. Our approach involves a number of steps, including a 2D discrete cosine transform filter for noise reduction on the measured thermal maps, and a Laplacian transformation followed by K-means clustering for heat-source identification. Since the identified heat sources are the thermally vulnerable areas of the die, we propose a novel approach to deriving a thermal model capable of predicting their temperatures during runtime. We apply Long Short-Term Memory (LSTM) networks to build a dynamic thermal model which uses system-level variables such as chip frequency, voltage and instruction count as inputs. The model is trained and tested exclusively using measured thermal data from a commercial multi-core processor. Experimental results show that the proposed thermal model achieves very high accuracy (root-mean-square error: 2.04 °C to 2.57 °C) in predicting the temperature of all the identified heat sources on the chip.
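The heat-source identification step described in this abstract (Laplacian transformation followed by K-means clustering) can be illustrated with a minimal sketch. This is not the authors' implementation: the DCT noise filter is omitted, the thermal map is toy data, and the names `laplacian`, `kmeans`, `find_heat_sources`, the threshold and the initial centers are all assumptions made for illustration.

```python
# Illustrative sketch only: toy thermal map, hypothetical helper names,
# and no DCT denoising step (the abstract applies one before this stage).

def laplacian(grid):
    """5-point discrete Laplacian on interior pixels; border pixels stay 0."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = (grid[r - 1][c] + grid[r + 1][c]
                         + grid[r][c - 1] + grid[r][c + 1]
                         - 4.0 * grid[r][c])
    return out

def kmeans(points, centers, iterations=20):
    """Plain Lloyd's k-means on 2D points with caller-supplied initial centers."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            groups[dists.index(min(dists))].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers

def find_heat_sources(thermal_map, threshold, initial_centers):
    """Cluster pixels whose Laplacian is strongly negative (local temperature peaks)."""
    lap = laplacian(thermal_map)
    candidates = [(r, c) for r, row in enumerate(lap)
                  for c, value in enumerate(row) if value < -threshold]
    return kmeans(candidates, initial_centers)

# Toy 8x8 map at 40 C with a two-pixel hot region and a single hot pixel.
toy_map = [[40.0] * 8 for _ in range(8)]
toy_map[2][2] = toy_map[2][3] = 60.0
toy_map[5][5] = 55.0

sources = find_heat_sources(toy_map, threshold=10.0,
                            initial_centers=[(0.0, 0.0), (7.0, 7.0)])
print(sources)
```

In steady-state heat conduction the Laplacian of temperature is proportional to the negative of the local power density, which is why strongly negative values are treated as heat-source candidates in this sketch.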
12:00  2.4.2  LITHO-GPA: GAUSSIAN PROCESS ASSURANCE FOR LITHOGRAPHY HOTSPOT DETECTION
Speaker:
Wei Ye, University of Texas at Austin, US
Authors:
Wei Ye¹, Mohamed Baker Alawieh¹, Meng Li², Yibo Lin¹ and David Z. Pan¹
¹University of Texas at Austin, US; ²University of Texas, Austin, US
Abstract
Lithography hotspot detection is one of the fundamental steps in physical verification. Due to increasingly complicated design patterns, early and quick feedback on lithography hotspots is desired to guide design closure in early stages. Machine learning approaches have been successfully applied to hotspot detection while demonstrating a remarkable capability of generalization to unseen hotspot patterns. However, most of the proposed machine learning approaches are not yet able to answer one critical question: how much can a hotspot predicted by a trained model be trusted? In this work, we present Litho-GPA, a lithography hotspot detection framework with Gaussian Process assurance to provide confidence in each prediction. The framework also incorporates a data selection scheme with a sequence of weak classifiers to sample representative data and eventually reduce the amount of training data and lithography simulations needed. Experimental results demonstrate that Litho-GPA is able to achieve state-of-the-art accuracy while obtaining on average a 28% reduction in false alarms.
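To show how a Gaussian Process can attach a confidence to each prediction, the sketch below runs plain GP regression on a one-dimensional toy feature and reports the predictive variance. It is a simplification, not the Litho-GPA framework: the feature, the labels, the RBF kernel, the noise level and the helper names `rbf`, `solve` and `gp_predict` are all assumptions.

```python
import math

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two scalar features."""
    return math.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2))

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x

def gp_predict(train_x, train_y, query, noise=1e-3):
    """Return the GP posterior mean and variance at a single query feature."""
    gram = [[rbf(a, b) + (noise if i == j else 0.0)
             for j, b in enumerate(train_x)]
            for i, a in enumerate(train_x)]
    k_star = [rbf(query, a) for a in train_x]
    alpha = solve(gram, train_y)
    mean = sum(k * a for k, a in zip(k_star, alpha))
    weights = solve(gram, k_star)
    variance = rbf(query, query) - sum(k * w for k, w in zip(k_star, weights))
    return mean, variance

# Toy data: a scalar layout feature with hotspot scores (1 = hotspot, 0 = clean).
features = [0.0, 1.0, 2.0, 3.0]
scores = [0.0, 0.0, 1.0, 1.0]

mean_near, var_near = gp_predict(features, scores, query=1.0)
mean_far, var_far = gp_predict(features, scores, query=10.0)
print(var_near, var_far)
```

A query close to the training data gets a small variance, while a query far from anything seen during training gets a variance near the prior, which is one way a low-confidence prediction could be flagged for further checking.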
12:30  2.4.3  PINT: POLYNOMIAL IN TEMPERATURE DECODE WEIGHTS IN A NEUROMORPHIC ARCHITECTURE
Speaker:
Scott Reid, Stanford University, US
Authors:
Scott Reid, Antonio Montoya and Kwabena Boahen, Stanford University, US
Abstract
We present Polynomial in Temperature (PinT) decode weights, a novel approach to approximating functions with an ensemble of silicon neurons that increases thermal robustness. In mixed-signal neuromorphics, computing accurately across a wide range of temperatures is challenging because of individual silicon neurons' thermal sensitivity. To compensate for the resulting changes in the neuron's tuning-curves in the PinT framework, weights change continuously as a polynomial function of temperature. We validate PinT across a 38 °C range by applying it to tuning curves measured for ensembles of 64 to 1936 neurons on Braindrop, a mixed-signal neuromorphic chip fabricated in 28-nm FDSOI CMOS. LinT, the Linear in Temperature version of PinT, reduces error by a small margin on test data, relative to an ensemble with temperature-independent weights. LinT and higher-order models show much greater promise on training data, suggesting that performance can be further improved. When implemented on-chip, LinT's performance is very similar to the performance with temperature-independent decode weights. SpLinT and SpLSAT, the Sparse variants of LinT and LSAT, are promising avenues for efficiently reducing error. In the SpLSAT model, up to 90% of neurons on chip can be deactivated while maintaining the same function-approximation error.
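One way to write down the idea of decode weights that vary as a polynomial in temperature is sketched below. The notation is assumed for illustration and does not come from the abstract: $a_i(x, T)$ is the tuning curve of neuron $i$ at input $x$ and temperature $T$, $d_i(T)$ is its decode weight, $c_{ik}$ are polynomial coefficients, $N$ is the ensemble size and $K$ is the polynomial order.

```latex
\hat{f}(x, T) = \sum_{i=1}^{N} d_i(T)\, a_i(x, T),
\qquad
d_i(T) = \sum_{k=0}^{K} c_{ik}\, T^{k}
```

Under this notation, $K = 0$ corresponds to temperature-independent weights and $K = 1$ to the linear (LinT) case. A plausible fitting procedure, again an assumption, is least squares over sampled inputs and temperatures; because $\hat{f}$ is linear in the coefficients $c_{ik}$, this remains an ordinary linear regression with features $T^{k} a_i(x, T)$.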
12:45  2.4.4  ENHANCING TWO-PHASE COOLING EFFICIENCY THROUGH THERMAL-AWARE WORKLOAD MAPPING FOR POWER-HUNGRY SERVERS
Speaker:
Arman Iranfar, EPFL, CH
Authors:
Arman Iranfar¹, Ali Pahlevan², Marina Zapater¹ and David Atienza³
¹EPFL, CH; ²Embedded Systems Lab (ESL), Electrical Engineering Department, EPFL, CH; ³École Polytechnique Fédérale de Lausanne (EPFL), CH
Abstract
The power density and, consequently, power hungriness of server processors is growing by the day. Traditional air cooling systems fail to cope with such high heat densities, whereas single-phase liquid cooling still requires a high mass flow rate, high pumping power, and a large facility size. In contrast, in a micro-scale gravity-driven thermosyphon attached on top of a processor, the refrigerant absorbs the heat and turns into a two-phase mixture. The vapor-liquid mixture exchanges heat with a coolant at the condenser side, turns back into liquid, and descends thanks to gravity, eliminating the need for pumping power. However, similar to other cooling technologies, thermosyphon efficiency can vary considerably with workload performance requirements and thermal profile, in addition to platform features such as packaging and die floorplan. In this work, we first address the workload- and platform-aware design of a two-phase thermosyphon. Then, we propose a thermal-aware workload mapping strategy considering the potential and limitations of a two-phase thermosyphon to further minimize hot spots and spatial thermal gradients. Our experiments, performed on an 8-core Intel Xeon E5 CPU, reveal, on average, up to a 10 °C reduction in thermal hot spots and a 45% reduction in the maximum spatial thermal gradient on the die. Moreover, our design and mapping strategy are able to decrease the chiller cooling power by at least 45%.
13:00  IP1-5, 711  ADAPTIVE TRANSIENT LEAKAGE-AWARE LINEARISED MODEL FOR THERMAL ANALYSIS OF 3-D ICS
Speaker:
Milan Mihajlovic, University of Manchester, GB
Authors:
Chao Zhang¹, Milan Mihajlovic¹ and Vasilis Pavlidis²
¹The University of Manchester, GB; ²University of Manchester, GB
Abstract
Physics-based models for thermal simulation that involve numerical solution of the heat equation are well placed to accurately capture the heterogeneity of materials and structures in modern 3-D integrated circuits (ICs). The introduction of non-linear effects in thermal coefficients and leakage power significantly improves the accuracy of thermal models. However, this non-linearity significantly increases the complexity and computational time of the analysis. In this paper, we introduce a linearised thermal model by demonstrating that the weak temperature dependence of the specific heat and the thermal conductivity of silicon-based materials has only a minor effect on computed temperature profiles. Thus, these parameters can be considered constant in the working temperature ranges of modern ICs. The non-linearity in leakage power is approximated by a piecewise linear least-squares fit, and the resulting model is linearised by the exact Newton's method, in contrast to previous works that employ either simple iteration or an inexact Newton's method. The method is implemented in the context of transient thermal analysis with adaptive time step selection, where we demonstrate that it is essential to apply Newton corrections to obtain the right time step size. The resulting method is up to 2x faster than a full non-linear method, typically introducing a global relative error of less than 1%.
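The combination of a piecewise-linear leakage model with Newton's method can be illustrated on a much smaller problem than the paper's transient 3-D analysis. The sketch below assumes a single lumped thermal node in steady state, T = T_amb + R_th (P_dyn + P_leak(T)); the breakpoints, thermal resistance, power values and the names `leakage` and `steady_state_temperature` are all hypothetical.

```python
def leakage(temp, points):
    """Return (power, slope) of a piecewise-linear leakage curve at temp.

    points is a list of (temperature, power) breakpoints in ascending order;
    values outside the range are extrapolated along the nearest segment.
    """
    for (t0, p0), (t1, p1) in zip(points, points[1:]):
        slope = (p1 - p0) / (t1 - t0)
        if temp <= t1:
            return p0 + slope * (temp - t0), slope
    return p0 + slope * (temp - t0), slope  # beyond the last breakpoint

def steady_state_temperature(t_amb, r_th, p_dyn, points, t_init,
                             tol=1e-9, max_iter=50):
    """Newton iteration on g(T) = T - t_amb - r_th * (p_dyn + P_leak(T))."""
    temp = t_init
    for _ in range(max_iter):
        p_leak, slope = leakage(temp, points)
        residual = temp - t_amb - r_th * (p_dyn + p_leak)
        if abs(residual) < tol:
            break
        # Exact derivative of g on the current segment: 1 - r_th * dP_leak/dT.
        temp -= residual / (1.0 - r_th * slope)
    return temp

# Hypothetical leakage breakpoints as (temperature in C, leakage power in W).
leak_points = [(20.0, 1.0), (60.0, 2.0), (100.0, 4.0)]

steady_temp = steady_state_temperature(t_amb=25.0, r_th=2.0, p_dyn=20.0,
                                       points=leak_points, t_init=25.0)
print(steady_temp)
```

Because the leakage curve is linear within each segment, a Newton step is exact once the iterate lands in the segment that contains the solution. The sketch assumes 1 - r_th * slope stays positive, that is, no thermal runaway.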
13:01  IP1-6, 363  FASTCOOL: LEAKAGE AWARE DYNAMIC THERMAL MANAGEMENT OF 3D MEMORIES
Speaker:
Lokesh Siddhu, IIT Delhi, IN
Authors:
Lokesh Siddhu¹ and Preeti Ranjan Panda²
¹Indian Institute of Technology, Delhi, IN; ²IIT Delhi, IN
Abstract
3D memory systems offer several advantages in terms of area, bandwidth, and energy efficiency. However, thermal issues arising from higher power densities have limited their widespread use. While prior works have looked at reducing dynamic power through reduced memory accesses, leakage and dynamic power consumption are comparable in these memories. Furthermore, as the temperature rises the leakage power increases, creating a thermal-leakage loop. We study the impact of leakage power on 3D memory temperature and propose turning OFF hot channels to meet thermal constraints. Data is migrated to a 2D memory before closing a 3D channel. We introduce an analytical model to assess the 2D memory delay and use the model to guide data migration decisions. Our experiments show that the proposed optimization improves performance by 27% on average (up to 66%) over state-of-the-art strategies.
13:00  End of session
Lunch Break in Lunch Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Breaks (Lunch Area)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the "Lunch Area" to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 26, 2019

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "TBD" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 27, 2019

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "TBD" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 28, 2019

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "TBD" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00