10.3 Green Computing Systems


Date: Thursday 27 March 2014
Time: 11:00 - 12:30
Location / Room: Konferenz 1

Chair:
Ayse Coskun, Boston University, US

Co-Chair:
Martino Ruggiero, University of Bologna, IT

This session discusses techniques to improve energy efficiency in large-scale computing systems, many-core systems, servers, and the cloud. The papers in this session place particular emphasis on practical experiences from academia and industry.

Time   Label   Presentation Title / Authors
11:00   10.3.1   GLOBAL FAN SPEED CONTROL CONSIDERING NON-IDEAL TEMPERATURE MEASUREMENTS IN ENTERPRISE SERVERS
Speakers:
Jungsoo Kim1, Mohamed M. Sabry2, David Atienza1, Kalyan Vaidyanathan3 and Kenny Gross3
1EPFL, CH; 2ESL-EPFL, CH; 3Physical Sciences Research Center, Oracle, US
Abstract
Time lag and quantization in the temperature sensors of enterprise servers raise stability concerns for existing variable fan speed control schemes. These stability challenges are further aggravated when multiple local controllers run alongside the fan control scheme. In this paper, we present a global control scheme that addresses the stability concerns of enterprise servers while reducing the performance degradation caused by variable fan speed control. We first present a stable fan speed control scheme based on a Proportional-Integral-Derivative (PID) controller that adaptively adjusts the PID parameters according to the operating fan speed and eliminates the fan speed oscillation caused by temperature quantization. We then present a global control scheme that coordinates control actions among multiple local controllers and guarantees server stability while minimizing overall performance degradation. We validated the proposed control scheme on a currently shipping commercial enterprise server. Our experimental results show that the proposed fan control scheme is stable under a non-ideal temperature measurement system (10 s time lag and 1°C quantization). Furthermore, the global control scheme enables multiple local controllers to run in a stable manner while reducing performance degradation by up to 19.2% compared to conventional coordination schemes, with 19.1% savings in server power consumption.
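As a rough illustration of the kind of controller the abstract describes, the sketch below implements a gain-scheduled PID loop whose gains are looked up from the current fan speed and whose error term is passed through a deadband to suppress oscillation from quantized temperature readings. All names, gain values and thresholds are illustrative assumptions, not taken from the paper.

    # Hypothetical sketch of a gain-scheduled PID fan controller that tolerates
    # quantized temperature readings. Gains and thresholds are made-up values.
    class GainScheduledFanPID:
        # (fan speed upper bound in RPM) -> (Kp, Ki, Kd); illustrative table
        GAIN_TABLE = [(4000, (8.0, 0.5, 1.0)),
                      (8000, (5.0, 0.3, 0.8)),
                      (float("inf"), (3.0, 0.2, 0.5))]

        def __init__(self, target_temp_c, deadband_c=1.0, dt_s=1.0):
            self.target = target_temp_c      # temperature set point
            self.deadband = deadband_c       # ignore errors below sensor quantization
            self.dt = dt_s
            self.integral = 0.0
            self.prev_error = 0.0

        def _gains(self, fan_rpm):
            # Pick PID gains according to the operating fan speed (gain scheduling).
            for upper, gains in self.GAIN_TABLE:
                if fan_rpm <= upper:
                    return gains

        def step(self, measured_temp_c, current_rpm):
            error = measured_temp_c - self.target
            # Suppress reactions to errors smaller than the quantization step,
            # which would otherwise make the fan speed oscillate.
            if abs(error) < self.deadband:
                error = 0.0
            kp, ki, kd = self._gains(current_rpm)
            self.integral += error * self.dt
            derivative = (error - self.prev_error) / self.dt
            self.prev_error = error
            delta_rpm = kp * error + ki * self.integral + kd * derivative
            return max(1000, min(12000, current_rpm + delta_rpm))

    ctrl = GainScheduledFanPID(target_temp_c=70.0)
    new_rpm = ctrl.step(measured_temp_c=73.0, current_rpm=5000)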
11:30   10.3.2   UNVEILING EURORA - THERMAL AND POWER CHARACTERIZATION OF THE MOST ENERGY-EFFICIENT SUPERCOMPUTER IN THE WORLD
Speakers:
Andrea Bartolini1, Matteo Cacciari1, Carlo Cavazzoni2, Giampietro Tecchiolli3 and Luca Benini4
1University of Bologna, IT; 2CINECA, IT; 3EUROTECH, IT; 4Università di Bologna, IT
Abstract
Eurora (EURopean many integrated cORe Architecture) is today the most energy-efficient supercomputer in the world. Ranked 1st in the Green500 list in July 2013, it is a prototype built by Eurotech and Cineca toward next-generation Tier-0 systems in the PRACE 2IP EU project. Eurora's outstanding energy efficiency is achieved by adopting a direct liquid cooling solution and a heterogeneous architecture with best-in-class general-purpose HW components (Intel Xeon E5, Intel Xeon Phi and NVIDIA Kepler K20). In this paper we present a novel, low-overhead monitoring infrastructure capable of tracking, in detail and in real time, the thermal and power characteristics of Eurora's components with fine-grained resolution. Our experiments give insight into Eurora's thermal/power trade-offs and highlight opportunities for run-time power/thermal management and optimization.
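The abstract does not detail the monitoring infrastructure, but a minimal polling collector along the following lines conveys the idea of sampling per-node power and temperature at a fixed, fine-grained resolution. The sensor paths, scaling factors and sampling period are placeholder assumptions, not the paper's actual implementation.

    # Minimal sketch of a low-overhead thermal/power sampler. Paths, scales
    # and the 1-second period are placeholder assumptions for illustration.
    import csv, time

    SENSORS = {
        "cpu_temp_c":   ("/sys/class/hwmon/hwmon0/temp1_input", 1e-3),   # millidegrees C
        "node_power_w": ("/sys/class/hwmon/hwmon1/power1_input", 1e-6),  # microwatts
    }

    def read_sensor(path, scale):
        with open(path) as f:
            return float(f.read().strip()) * scale

    def collect(out_csv="samples.csv", period_s=1.0, n_samples=60):
        with open(out_csv, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["timestamp"] + list(SENSORS))
            for _ in range(n_samples):
                writer.writerow([time.time()] +
                                [read_sensor(p, s) for p, s in SENSORS.values()])
                time.sleep(period_s)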
12:00   10.3.3   CONTENTION AWARE FREQUENCY SCALING ON CMPS WITH GUARANTEED QUALITY OF SERVICE
Speakers:
Hao Shen and Qinru Qiu, Syracuse University, US
Abstract
Workload consolidation is usually performed in datacenters to improve server utilization for higher energy efficiency. One of the key issues in workload consolidation is contention for shared resources such as the last-level cache, main memory, and the memory controller. Dynamic voltage and frequency scaling (DVFS) of the CPU is another effective technique that has been widely used to trade performance for power reduction. We have found that the degree of resource contention in a system affects its performance sensitivity to CPU frequency. In this paper, we apply machine learning techniques to construct a model that quantifies the runtime performance degradation caused by resource contention and frequency scaling. The inputs of our model are readings from Performance Monitoring Units (PMUs), screened using standard feature selection techniques. The model is tested on an SMT-enabled chip multiprocessor and reaches up to 90% accuracy. Experimental results show that, guided by the performance model, runtime power management techniques such as DVFS can achieve a more accurate power/performance tradeoff without violating the quality-of-service (QoS) agreement. The QoS violation of the proposed system is significantly lower than that of systems that have no performance degradation information.
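A minimal sketch of how such a model might be trained offline is shown below, assuming a table of PMU counter readings and measured slowdowns is already available. The counter layout, feature count and choice of regressor are illustrative assumptions, not the paper's actual pipeline.

    # Illustrative sketch: predict runtime performance degradation from PMU
    # counters after standard feature selection. The data here is synthetic.
    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline

    # X: one row per measurement interval, one column per PMU counter
    # (e.g. LLC misses, memory bandwidth, retired instructions, CPU frequency).
    # y: measured performance degradation (%) relative to an uncontended run.
    rng = np.random.default_rng(0)
    X = rng.random((200, 12))                                  # placeholder counters
    y = 30 * X[:, 0] + 10 * X[:, 3] + rng.normal(0, 1, 200)    # synthetic target

    model = make_pipeline(
        SelectKBest(score_func=f_regression, k=5),  # screen the PMU features
        LinearRegression(),                         # simple degradation predictor
    )
    model.fit(X, y)
    predicted_degradation = model.predict(X[:1])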
12:15   10.3.4   CONCURRENT PLACEMENT, CAPACITY PROVISIONING, AND REQUEST FLOW CONTROL FOR A DISTRIBUTED CLOUD INFRASTRUCTURE
Speakers:
Shuang Chen, Yanzhi Wang and Massoud Pedram, University of Southern California, US
Abstract
Cloud computing and storage have attracted a lot of attention due to the ever-increasing demand for reliable and cost-effective access to the vast resources and services available on the Internet. Cloud services are typically hosted in a set of geographically distributed data centers, which we will call the cloud infrastructure. To minimize the total cost of ownership of this cloud infrastructure (which accounts for both the upfront capital cost and the operational cost of the infrastructure resources), the infrastructure owners/operators must carefully plan data center locations in the targeted service area (for example, the US territories) and data center capacity provisioning (i.e., the total CPU cycles per second that can be provided in each data center). In addition, they must have flow control policies that distribute the incoming user requests to the available resources in the cloud infrastructure. This paper presents an approach for solving the unified problem of data center placement, capacity provisioning, and request flow control in one shot. The solution technique is based on mathematical programming. Experimental results, using Google cluster data and placement/provisioning of up to eight data center sites, demonstrate the cost savings of the proposed problem formulation and solution approach.
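To make the shape of such a joint formulation concrete, the toy sketch below models placement, provisioning and request routing together with the PuLP modeling library. The sites, demands, cost figures and big-M bound are invented; the paper's actual mathematical program is considerably richer.

    # Toy sketch of a joint placement / capacity provisioning / request flow
    # program using PuLP. All numbers are invented for illustration.
    import pulp

    sites   = ["east", "central", "west"]                  # candidate data center sites
    regions = ["r1", "r2"]                                  # user request regions
    demand  = {"r1": 120.0, "r2": 80.0}                     # requests/s per region
    fixed   = {"east": 500, "central": 450, "west": 520}    # capital cost per opened site
    cap_cost = 2.0                                          # cost per unit of capacity
    route_cost = {(r, s): 1.0 for r in regions for s in sites}  # routing cost per request

    prob  = pulp.LpProblem("cloud_tco", pulp.LpMinimize)
    open_ = pulp.LpVariable.dicts("open", sites, cat="Binary")
    cap   = pulp.LpVariable.dicts("cap", sites, lowBound=0)
    flow  = pulp.LpVariable.dicts("flow", [(r, s) for r in regions for s in sites], lowBound=0)

    # Objective: capital cost + provisioning cost + request routing cost.
    prob += (pulp.lpSum(fixed[s] * open_[s] for s in sites)
             + cap_cost * pulp.lpSum(cap[s] for s in sites)
             + pulp.lpSum(route_cost[(r, s)] * flow[(r, s)] for r in regions for s in sites))

    for r in regions:   # every region's demand must be served
        prob += pulp.lpSum(flow[(r, s)] for s in sites) == demand[r]
    for s in sites:     # flow only into open sites, within provisioned capacity
        prob += pulp.lpSum(flow[(r, s)] for r in regions) <= cap[s]
        prob += cap[s] <= 1000 * open_[s]

    prob.solve()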
12:31   IP5-3, 664   COOLIP: SIMPLE YET EFFECTIVE JOB ALLOCATION FOR DISTRIBUTED THERMALLY-THROTTLED PROCESSORS
Speakers:
Pratyush Kumar, Hoeseok Yang, Iuliana Bacivarov and Lothar Thiele, ETH Zurich, CH
Abstract
Thermal constraints limit the time for which a processor can run at high frequency. Such thermal throttling complicates the computation of job response times. With multiple processors, a key decision is where to allocate the next job. For distributed thermally-throttled processors, we present COOLIP with a simple allocation policy: a job is allocated to the earliest available processor and, if several are available simultaneously, to the coolest one. For Poisson-distributed inter-arrival times and Gaussian-distributed execution demands, COOLIP matches the 95th-percentile response time of the Earliest Finish-Time (EFT) policy, which minimizes response time with full knowledge of the execution demand of unfinished jobs and the thermal models of the processors. We argue that COOLIP performs well because it steers the processors into states in which a defined sufficient condition of optimality holds.
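The allocation rule itself is simple enough to state in a few lines; the sketch below is one possible reading of it, with made-up data structures. The paper's actual scheduler, queueing behavior and thermal model are not reproduced here.

    # Sketch of the COOLIP-style allocation rule described in the abstract:
    # send the job to the earliest-available processor, breaking ties by
    # choosing the coolest one. Data structures are illustrative.
    from dataclasses import dataclass

    @dataclass
    class Processor:
        pid: int
        available_at: float   # time at which the processor becomes free
        temperature: float    # current temperature estimate (degrees C)

    def coolip_allocate(processors, now):
        # Earliest availability first; among processors free at the same time,
        # prefer the one with the lowest temperature.
        return min(processors,
                   key=lambda p: (max(p.available_at, now), p.temperature))

    procs = [Processor(0, available_at=0.0, temperature=62.0),
             Processor(1, available_at=0.0, temperature=55.0),
             Processor(2, available_at=3.5, temperature=48.0)]
    chosen = coolip_allocate(procs, now=1.0)   # -> processor 1 (free now, coolest)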
12:33   IP5-4, 942   ENERGY OPTIMIZATION IN 3D MPSOCS WITH WIDE-I/O DRAM USING TEMPERATURE VARIATION AWARE BANK-WISE REFRESH
Speakers:
Mohammadsadegh Sadri1, Matthias Jung2, Christian Weis2, Norbert Wehn2 and Luca Benini1
1Department of Electrical, Electronic and Information Engineering (DEI) University of Bologna, IT; 2Microelectronic Systems Design Research Group, University of Kaiserslautern, DE
Abstract
Heterogeneous 3D integrated systems with Wide-I/O DRAMs are a promising solution for squeezing more functionality and storage bits into an ever-decreasing volume. Unfortunately, with 3D stacking, the challenges of high power density and thermal dissipation are exacerbated. We reduce DRAM refresh power by considering the lateral and vertical temperature variations in the 3D structure and adapting the per-DRAM-bank refresh period accordingly. To demonstrate our concepts, we develop an advanced virtual platform which models the performance, power, and thermal behavior of a 3D-integrated MPSoC with Wide-I/O DRAMs in detail. On this platform we run the Android OS with real-world benchmarks to quantify the advantages of our ideas. We show improvements of 16% in DRAM refresh power due to temperature-variation-aware bank-wise refresh. Furthermore, two solutions are investigated to speed up system simulations: (1) adaptive tuning of sampling intervals based on the estimated chip thermal profile, which results in speedups of 2X, and (2) hardware acceleration of thermal simulations using the Maxeler engine, which shows possible speedups of 12X.
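The core idea of bank-wise refresh can be sketched as a per-bank lookup from temperature to refresh period. The 64 ms base window and the rule of thumb that DRAM retention roughly halves for every 10°C above 85°C are generic DRAM assumptions used here for illustration, not the paper's exact policy.

    # Illustrative per-bank refresh period selection driven by each bank's
    # temperature. Constants are generic DRAM rules of thumb, not the paper's.
    BASE_REFRESH_MS = 64.0      # nominal refresh window below the hot threshold
    HOT_THRESHOLD_C = 85.0

    def bank_refresh_period_ms(bank_temp_c):
        if bank_temp_c <= HOT_THRESHOLD_C:
            return BASE_REFRESH_MS
        # Halve the refresh window for every 10 C above the threshold.
        steps = (bank_temp_c - HOT_THRESHOLD_C) / 10.0
        return BASE_REFRESH_MS / (2.0 ** steps)

    # Cooler banks (e.g. far from the hot logic die) are refreshed less often,
    # which is where the refresh-power saving comes from.
    bank_temps = {0: 78.0, 1: 88.5, 2: 95.0, 3: 82.0}
    periods = {b: bank_refresh_period_ms(t) for b, t in bank_temps.items()}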
12:30   End of session
Lunch Break in Exhibition Area
Sandwich lunch