5.4 Architectural-level Low-power Design

Date: Wednesday 16 March 2016
Time: 08:30 - 10:00
Location / Room: Konferenz 2

Chair:
Alberto Macii, Politecnico di Torino, IT

Co-Chair:
Pascal Vivet, CEA LETI, FR

This session presents new techniques for minimizing power consumption at the architectural level. The first paper presents a 2-story power distribution network applied to a GPU, extending the technique from the circuit level to the architecture level; the workload is partitioned evenly between the cores so that the power network stays balanced. The second paper evaluates read-assist techniques applied to a 4T SRAM cell in a dual-Vt FinFET technology. The third paper addresses reliability issues caused by dark silicon in processors: a new physics-based electromigration (EM) reliability model is combined with a Q-learning method to minimize the overall power consumption. Finally, an IP presentation describes two algorithms that detect and remove redundant resets for all registers in a design in one pass, saving effort for RTL designers. The technique is demonstrated on multiple process technologies, showing its impact on power and area.

Time  Label  Presentation Title / Authors
08:30  5.4.1  MULTI-STORY POWER DISTRIBUTION NETWORKS FOR GPUS
Speaker:
Mark Gottscho, UCLA, US
Authors:
Qixiang Zhang (1), Liangzhen Lai (2), Mark Gottscho (3) and Puneet Gupta (3)
(1) Zhejiang University, CN; (2) ARM/UCLA, US; (3) UCLA, US
Abstract
High-performance chips require many power pins to support large currents, which increases fabrication cost, limits scalability, and degrades power efficiency. Multi-story serial power distribution networks (PDNs) are a promising approach to reducing pin counts and power losses. We study the feasibility of 2-story PDNs for graphics processing units (GPUs). These PDNs use either an auxiliary off-chip regulator or integrated on-die supercapacitors to stabilize the virtual rail voltage. Static SIMT thread scheduling (SSTS) and dynamic current compensation (DCC) can reduce transient impedance mismatch when the auxiliary regulator is omitted. Simulation results show that compared to a traditional 1-story design, our 2-story GPU architectures can reduce the required number of core power pins by up to 2X, power losses in the PDN by up to 3.6X, and/or maximum voltage swing by up to 2X without any performance degradation. Our results demonstrate the efficiency and cost advantages of multi-story PDNs for GPUs without any impact on performance.
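
The pin-count and loss figures quoted in this abstract can be related to a simple idealized calculation. The sketch below is not taken from the paper: it assumes a perfectly balanced stack, a fixed lumped PDN resistance, and a pin count proportional to supply current, and the function name and operating point are hypothetical.

```python
def pdn_current_and_loss(total_power_w, core_voltage_v, stories, pdn_resistance_ohm):
    """Return (supply current in A, resistive PDN loss in W) for an ideal stack.

    Assumes every story draws the same power, so the stack is perfectly balanced.
    """
    rail_voltage_v = core_voltage_v * stories      # stories are stacked in series
    current_a = total_power_w / rail_voltage_v     # same power at a higher rail voltage
    loss_w = current_a ** 2 * pdn_resistance_ohm   # I^2 * R loss in the delivery path
    return current_a, loss_w


# Hypothetical operating point: 100 W of core power at 1.0 V through 1 milliohm.
one_story = pdn_current_and_loss(100.0, 1.0, stories=1, pdn_resistance_ohm=0.001)
two_story = pdn_current_and_loss(100.0, 1.0, stories=2, pdn_resistance_ohm=0.001)

current_ratio = one_story[0] / two_story[0]
loss_ratio = one_story[1] / two_story[1]
print(f"current reduced by {current_ratio:.1f}X, PDN loss reduced by {loss_ratio:.1f}X")
```

Under these assumptions the ideal 2-story stack halves the supply current and quarters the resistive loss, which is consistent with the "up to 2X" and "up to 3.6X" reductions reported in the abstract falling at or below the ideal bounds.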

09:00  5.4.2  ENERGY-EFFICIENT CACHE MEMORIES USING A DUAL-VT 4T SRAM CELL WITH READ-ASSIST TECHNIQUES
Speaker:
Massoud Pedram, University of Southern California, US
Authors:
Alireza Shafaei Bejestan and Massoud Pedram, University of Southern California, US
Abstract
To improve the energy efficiency of cache memories, this paper presents a static random access memory (SRAM) cell composed of four transistors using dual-Vt FinFET devices. The proposed 4T SRAM cell is designed by (i) removing the pull-down transistors of the standard 6T SRAM, and (ii) using low-leakage high-Vt devices for the pull-up transistors and fast low-Vt devices for the access transistors. This dual-Vt design simultaneously improves hold and write characteristics, but results in a destructive read operation. Accordingly, read-assist techniques are employed to ensure a non-destructive and robust read operation. A selective row address decoder is also proposed to prevent undesired write operations in half-selected cells. Compared with its all-single-fin 6T counterpart, the 4T SRAM cell has a 25% smaller layout area with an aspect ratio closer to one. Furthermore, using 7nm FinFET devices with a nominal supply voltage of 0.45V, the 4T SRAM cell achieves 3.5X lower cell leakage power. Because of these features, the energy consumption of a 32KB L1 (256KB L2) cache memory using the 4T SRAM cell is reduced by 18% (2X) compared with its 6T counterpart, with 35% (19%) higher cache access frequency.

09:30  5.4.3  LEARNING-BASED DYNAMIC RELIABILITY MANAGEMENT FOR DARK SILICON PROCESSOR CONSIDERING EM EFFECTS
Speaker:
Sheldon X.-D. Tan, University of California, Riverside, US
Authors:
Taeyoung Kim (1), Xin Huang (1), Hai-Bao Chen (2), Valeriy Sukharev (3) and Sheldon X.-D. Tan (1)
(1) University of California, Riverside, US; (2) Shanghai Jiao Tong University, CN; (3) Mentor Graphics Corporation, US
Abstract
In this article, we propose a new dynamic reliability management (DRM) technique for emerging dark silicon manycore processors. We formulate the DRM problem as minimizing energy consumption subject to reliability, performance and thermal constraints. The approach is based on a newly proposed physics-based electromigration (EM) reliability model that predicts the EM reliability of full-chip power grid networks. We consider thermal design power (TDP) as the power constraint for a dark silicon manycore processor. We employ both dynamic voltage and frequency scaling (DVFS) and ON/OFF pulsing of dark silicon cores as the two control knobs. To solve the problem, we apply an adaptive Q-learning-based method, which is suitable for runtime operation because it provides cost-effective yet good solutions. A large class of multithreaded applications is used as the benchmark to validate and compare the proposed dynamic reliability management methods. Experimental results on a 64-core dark silicon chip show that the proposed DRM algorithm can effectively reduce the energy consumption of a dark silicon manycore system when the system is not tightly constrained. In this case, the proposed method significantly outperforms a simple global DVFS method.
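
For readers unfamiliar with Q-learning, the following minimal tabular sketch shows the kind of update rule such a controller relies on. It is not the authors' algorithm: the action names, the single `busy` state, the energy costs, and the performance penalty are all hypothetical placeholders standing in for the DVFS and core ON/OFF knobs and the constraints described in the abstract.

```python
import random

# Hypothetical control knobs loosely mirroring DVFS levels and core ON/OFF.
ACTIONS = ["low_vf", "high_vf", "core_off"]


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Standard tabular Q-learning update; returns the new Q(state, action)."""
    best_next = max(q[next_state].values())
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy selection over the actions stored for this state."""
    if rng.random() < epsilon:
        return rng.choice(list(q[state]))
    return max(q[state], key=q[state].get)


def toy_reward(action):
    """Hypothetical reward: negative energy, plus a penalty for missing performance."""
    energy = {"low_vf": 1.0, "high_vf": 3.0, "core_off": 0.2}[action]
    penalty = 5.0 if action == "core_off" else 0.0  # assumed constraint violation
    return -(energy + penalty)


rng = random.Random(0)                       # fixed seed for repeatable runs
q = {"busy": {a: 0.0 for a in ACTIONS}}      # one toy state, all Q-values start at 0
state = "busy"
for _ in range(2000):
    action = choose_action(q, state, epsilon=0.2, rng=rng)
    q_update(q, state, action, toy_reward(action), next_state=state)

best = max(q[state], key=q[state].get)
print("greedy action after training:", best)
```

A real controller would use a much richer state (for example temperature, remaining EM lifetime, and workload phase) and a reward derived from measured energy; the sketch only illustrates the update and the exploration policy.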

10:00  IP2-8, 245  SEQUENTIAL ANALYSIS DRIVEN RESET OPTIMIZATION TO IMPROVE POWER, AREA AND ROUTABILITY
Speaker:
Srihari Yechangunja, Mentor Graphics Corporation, IN
Authors:
Srihari Yechangunja (1), Raj Shekhar (1), Mohit Kumar (1), Nikhil Tripathi (1), Abhishek Ranjan (1), Abhishek Mittal (1), Jianfeng Liu (2), Minyoung Mo (2), Kyungtae Do (2), Jung Yun Choi (2) and SungHo Park (2)
(1) Mentor Graphics Corporation, IN; (2) S.LSI, Samsung Electronics Co. Ltd, KR
Abstract
Resets are required in a design to initialize the hardware for system operation and to force it into a known state for simulation or to recover from an error. Given increasing design complexity and time-to-market pressures, identifying the registers that do not require resets is extremely challenging. In this paper, we present a novel algorithm that uses observability-based sequential analysis to identify the registers in a design that do not require resets. With the proposed algorithm, we have seen that in some cases 70% of the registers in a design can have redundant resets. Further, by removing the redundant resets on these registers, sequential power savings of up to 22% and a post-layout area reduction of up to 3% can be obtained.

10:00  End of session
Coffee Break in Exhibition Area