10.3 Green Computing Systems


Date: Thursday 27 March 2014
Time: 11:00 - 12:30
Location / Room: Konferenz 1

Chair:
Ayse Coskun, Boston University, US

Co-Chair:
Martino Ruggiero, University of Bologna, IT

This session discusses techniques to improve energy efficiency in large-scale computing systems, many-core systems, servers, and the cloud. The papers in this session place particular emphasis on practical experiences from academia and industry.

Time   Label   Presentation Title / Authors
11:00   10.3.1   GLOBAL FAN SPEED CONTROL CONSIDERING NON-IDEAL TEMPERATURE MEASUREMENTS IN ENTERPRISE SERVERS
Speakers:
Jungsoo Kim1, Mohamed M. Sabry2, David Atienza1, Kalyan Vaidyanathan3 and Kenny Gross3
1EPFL, CH; 2ESL-EPFL, CH; 3Physical Sciences Research Center, Oracle, US
Abstract
Time lag and quantization in the temperature sensors of enterprise servers raise stability concerns for existing variable fan speed control schemes. These stability challenges are further aggravated when multiple local controllers run alongside the fan control scheme. In this paper, we present a global control scheme that addresses the stability concerns of enterprise servers while reducing the performance degradation caused by variable fan speed control. We first present a stable fan speed control scheme based on a Proportional-Integral-Derivative (PID) controller that adaptively adjusts the PID parameters according to the operating fan speed and eliminates the fan speed oscillation caused by temperature quantization. We then present a global control scheme that coordinates control actions among multiple local controllers and guarantees server stability while minimizing overall performance degradation. We validated the proposed control scheme on a currently shipping commercial enterprise server. Our experimental results show that the proposed fan control scheme is stable under a non-ideal temperature measurement system (10 s time lag and 1°C quantization). Furthermore, the global control scheme enables multiple local controllers to run in a stable manner while reducing performance degradation by up to 19.2% compared to conventional coordination schemes, with 19.1% savings in server power consumption.
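As a rough illustration of the kind of controller the abstract describes, the sketch below implements a gain-scheduled PID loop whose gains are looked up from the current fan speed and whose error term is passed through a deadband to suppress oscillation from quantized temperature readings. All names, gain values and thresholds are illustrative assumptions, not taken from the paper.

    # Hypothetical sketch of a gain-scheduled PID fan controller that tolerates
    # quantized temperature readings. Gains and thresholds are made-up values.
    class GainScheduledFanPID:
        # (fan speed upper bound in RPM) -> (Kp, Ki, Kd); illustrative table
        GAIN_TABLE = [(4000, (8.0, 0.5, 1.0)),
                      (8000, (5.0, 0.3, 0.8)),
                      (float("inf"), (3.0, 0.2, 0.5))]

        def __init__(self, target_temp_c, deadband_c=1.0, dt_s=1.0):
            self.target = target_temp_c      # temperature set point
            self.deadband = deadband_c       # ignore errors below sensor quantization
            self.dt = dt_s
            self.integral = 0.0
            self.prev_error = 0.0

        def _gains(self, fan_rpm):
            # Pick PID gains according to the operating fan speed (gain scheduling).
            for upper, gains in self.GAIN_TABLE:
                if fan_rpm <= upper:
                    return gains

        def step(self, measured_temp_c, current_rpm):
            error = measured_temp_c - self.target
            # Suppress reactions to errors smaller than the quantization step,
            # which would otherwise make the fan speed oscillate.
            if abs(error) < self.deadband:
                error = 0.0
            kp, ki, kd = self._gains(current_rpm)
            self.integral += error * self.dt
            derivative = (error - self.prev_error) / self.dt
            self.prev_error = error
            delta_rpm = kp * error + ki * self.integral + kd * derivative
            return max(1000, min(12000, current_rpm + delta_rpm))

    ctrl = GainScheduledFanPID(target_temp_c=70.0)
    new_rpm = ctrl.step(measured_temp_c=73.0, current_rpm=5000)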
11:30   10.3.2   UNVEILING EURORA - THERMAL AND POWER CHARACTERIZATION OF THE MOST ENERGY-EFFICIENT SUPERCOMPUTER IN THE WORLD
Speakers:
Andrea Bartolini1, Matteo Cacciari1, Carlo Cavazzoni2, Giampietro Tecchiolli3 and Luca Benini4
1University of Bologna, IT; 2CINECA, IT; 3EUROTECH, IT; 4Università di Bologna, IT
Abstract
Eurora (EURopean many integrated cORe Architecture) is today the most energy-efficient supercomputer in the world. Ranked 1st in the Green500 list in July 2013, it is a prototype built by Eurotech and Cineca toward next-generation Tier-0 systems in the PRACE 2IP EU project. Eurora's outstanding energy efficiency is achieved by adopting a direct liquid cooling solution and a heterogeneous architecture with best-in-class general-purpose HW components (Intel Xeon E5, Intel Xeon Phi and NVIDIA Kepler K20). In this paper we present a novel, low-overhead monitoring infrastructure capable of tracking, in detail and in real time, the thermal and power characteristics of Eurora's components with fine-grained resolution. Our experiments give insight into Eurora's thermal/power trade-offs and highlight opportunities for run-time power/thermal management and optimization.
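The abstract does not detail the monitoring infrastructure, but a minimal polling collector along the following lines conveys the idea of sampling per-node power and temperature at a fixed, fine-grained resolution. The sensor paths, scaling factors and sampling period are placeholder assumptions, not the paper's actual implementation.

    # Minimal sketch of a low-overhead thermal/power sampler. Paths, scales
    # and the 1-second period are placeholder assumptions for illustration.
    import csv, time

    SENSORS = {
        "cpu_temp_c":   ("/sys/class/hwmon/hwmon0/temp1_input", 1e-3),   # millidegrees C
        "node_power_w": ("/sys/class/hwmon/hwmon1/power1_input", 1e-6),  # microwatts
    }

    def read_sensor(path, scale):
        with open(path) as f:
            return float(f.read().strip()) * scale

    def collect(out_csv="samples.csv", period_s=1.0, n_samples=60):
        with open(out_csv, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["timestamp"] + list(SENSORS))
            for _ in range(n_samples):
                writer.writerow([time.time()] +
                                [read_sensor(p, s) for p, s in SENSORS.values()])
                time.sleep(period_s)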
12:00   10.3.3   CONTENTION AWARE FREQUENCY SCALING ON CMPS WITH GUARANTEED QUALITY OF SERVICE
Speakers:
Hao Shen and Qinru Qiu, Syracuse University, US
Abstract
Workload consolidation is usually performed in datacenters to improve server utilization for higher energy efficiency. One of the key issues in workload consolidation is contention for shared resources such as the last-level cache, main memory, and the memory controller. Dynamic voltage and frequency scaling (DVFS) of the CPU is another effective technique that has been widely used to trade performance for power reduction. We have found that the degree of resource contention in a system affects its performance sensitivity to CPU frequency. In this paper, we apply machine learning techniques to construct a model that quantifies the runtime performance degradation caused by resource contention and frequency scaling. The inputs of our model are readings from Performance Monitoring Units (PMUs), screened using standard feature selection techniques. The model is tested on an SMT-enabled chip multiprocessor and reaches up to 90% accuracy. Experimental results show that, guided by the performance model, runtime power management techniques such as DVFS can achieve a more accurate power/performance tradeoff without violating the quality-of-service (QoS) agreement. The QoS violation of the proposed system is significantly lower than that of systems that have no performance degradation information.
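A minimal sketch of how such a model might be trained offline is shown below, assuming a table of PMU counter readings and measured slowdowns is already available. The counter layout, feature count and choice of regressor are illustrative assumptions, not the paper's actual pipeline.

    # Illustrative sketch: predict runtime performance degradation from PMU
    # counters after standard feature selection. The data here is synthetic.
    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline

    # X: one row per measurement interval, one column per PMU counter
    # (e.g. LLC misses, memory bandwidth, retired instructions, CPU frequency).
    # y: measured performance degradation (%) relative to an uncontended run.
    rng = np.random.default_rng(0)
    X = rng.random((200, 12))                                  # placeholder counters
    y = 30 * X[:, 0] + 10 * X[:, 3] + rng.normal(0, 1, 200)    # synthetic target

    model = make_pipeline(
        SelectKBest(score_func=f_regression, k=5),  # screen the PMU features
        LinearRegression(),                         # simple degradation predictor
    )
    model.fit(X, y)
    predicted_degradation = model.predict(X[:1])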
12:15   10.3.4   CONCURRENT PLACEMENT, CAPACITY PROVISIONING, AND REQUEST FLOW CONTROL FOR A DISTRIBUTED CLOUD INFRASTRUCTURE
Speakers:
Shuang Chen, Yanzhi Wang and Massoud Pedram, University of Southern California, US
Abstract
Cloud computing and storage have attracted a lot of attention due to the ever-increasing demand for reliable and cost-effective access to the vast resources and services available on the Internet. Cloud services are typically hosted in a set of geographically distributed data centers, which we will call the cloud infrastructure. To minimize the total cost of ownership of this cloud infrastructure (which accounts for both the upfront capital cost and the operational cost of the infrastructure resources), the infrastructure owners/operators must carefully plan data center locations in the targeted service area (for example, the US territories) and data center capacity provisioning (i.e., the total CPU cycles per second that can be provided in each data center). In addition, they must have flow control policies that distribute the incoming user requests to the available resources in the cloud infrastructure. This paper presents an approach for solving the unified problem of data center placement, capacity provisioning, and request flow control in one shot. The solution technique is based on mathematical programming. Experimental results, using Google cluster data and placement/provisioning of up to eight data center sites, demonstrate the cost savings of the proposed problem formulation and solution approach.
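To make the shape of such a joint formulation concrete, the toy sketch below models placement, provisioning and request routing together with the PuLP modeling library. The sites, demands, cost figures and big-M bound are invented; the paper's actual mathematical program is considerably richer.

    # Toy sketch of a joint placement / capacity provisioning / request flow
    # program using PuLP. All numbers are invented for illustration.
    import pulp

    sites   = ["east", "central", "west"]                  # candidate data center sites
    regions = ["r1", "r2"]                                  # user request regions
    demand  = {"r1": 120.0, "r2": 80.0}                     # requests/s per region
    fixed   = {"east": 500, "central": 450, "west": 520}    # capital cost per opened site
    cap_cost = 2.0                                          # cost per unit of capacity
    route_cost = {(r, s): 1.0 for r in regions for s in sites}  # routing cost per request

    prob  = pulp.LpProblem("cloud_tco", pulp.LpMinimize)
    open_ = pulp.LpVariable.dicts("open", sites, cat="Binary")
    cap   = pulp.LpVariable.dicts("cap", sites, lowBound=0)
    flow  = pulp.LpVariable.dicts("flow", [(r, s) for r in regions for s in sites], lowBound=0)

    # Objective: capital cost + provisioning cost + request routing cost.
    prob += (pulp.lpSum(fixed[s] * open_[s] for s in sites)
             + cap_cost * pulp.lpSum(cap[s] for s in sites)
             + pulp.lpSum(route_cost[(r, s)] * flow[(r, s)] for r in regions for s in sites))

    for r in regions:   # every region's demand must be served
        prob += pulp.lpSum(flow[(r, s)] for s in sites) == demand[r]
    for s in sites:     # flow only into open sites, within provisioned capacity
        prob += pulp.lpSum(flow[(r, s)] for r in regions) <= cap[s]
        prob += cap[s] <= 1000 * open_[s]

    prob.solve()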
12:31   IP5-3, 664   COOLIP: SIMPLE YET EFFECTIVE JOB ALLOCATION FOR DISTRIBUTED THERMALLY-THROTTLED PROCESSORS
Speakers:
Pratyush Kumar, Hoeseok Yang, Iuliana Bacivarov and Lothar Thiele, ETH Zurich, CH
Abstract
Thermal constraints limit the time for which a processor can run at high frequency. Such thermal throttling complicates the computation of job response times. With multiple processors, a key decision is where to allocate the next job. For distributed thermally-throttled processors, we present COOLIP with a simple allocation policy: a job is allocated to the earliest available processor and, if several are available simultaneously, to the coolest one. For Poisson-distributed inter-arrival times and Gaussian-distributed execution demands, COOLIP matches the 95th-percentile response time of the Earliest Finish-Time (EFT) policy, which minimizes response time with full knowledge of the execution demand of unfinished jobs and the thermal models of the processors. We argue that COOLIP performs well because it steers the processors into states in which a defined sufficient condition of optimality holds.
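The allocation rule itself is simple enough to state in a few lines; the sketch below is one possible reading of it, with made-up data structures. The paper's actual scheduler, queueing behavior and thermal model are not reproduced here.

    # Sketch of the COOLIP-style allocation rule described in the abstract:
    # send the job to the earliest-available processor, breaking ties by
    # choosing the coolest one. Data structures are illustrative.
    from dataclasses import dataclass

    @dataclass
    class Processor:
        pid: int
        available_at: float   # time at which the processor becomes free
        temperature: float    # current temperature estimate (degrees C)

    def coolip_allocate(processors, now):
        # Earliest availability first; among processors free at the same time,
        # prefer the one with the lowest temperature.
        return min(processors,
                   key=lambda p: (max(p.available_at, now), p.temperature))

    procs = [Processor(0, available_at=0.0, temperature=62.0),
             Processor(1, available_at=0.0, temperature=55.0),
             Processor(2, available_at=3.5, temperature=48.0)]
    chosen = coolip_allocate(procs, now=1.0)   # -> processor 1 (free now, coolest)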
12:33   IP5-4, 942   ENERGY OPTIMIZATION IN 3D MPSOCS WITH WIDE-I/O DRAM USING TEMPERATURE VARIATION AWARE BANK-WISE REFRESH
Speakers:
Mohammadsadegh Sadri1, Matthias Jung2, Christian Weis2, Norbert Wehn2 and Luca Benini1
1Department of Electrical, Electronic and Information Engineering (DEI) University of Bologna, IT; 2Microelectronic Systems Design Research Group, University of Kaiserslautern, DE
Abstract
Heterogeneous 3D integrated systems with Wide-I/O DRAMs are a promising solution for squeezing more functionality and storage bits into an ever-decreasing volume. Unfortunately, with 3D stacking, the challenges of high power density and thermal dissipation are exacerbated. We reduce DRAM refresh power by considering the lateral and vertical temperature variations in the 3D structure and adapting the per-DRAM-bank refresh period accordingly. To demonstrate our concepts, we develop an advanced virtual platform which models the performance, power, and thermal behavior of a 3D-integrated MPSoC with Wide-I/O DRAMs in detail. On this platform we run the Android OS with real-world benchmarks to quantify the advantages of our ideas. We show improvements of 16% in DRAM refresh power due to temperature-variation-aware bank-wise refresh. Furthermore, two solutions are investigated to speed up system simulations: (1) adaptive tuning of sampling intervals based on the estimated chip thermal profile, which results in speedups of 2X, and (2) hardware acceleration of thermal simulations using the Maxeler engine, which shows possible speedups of 12X.
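The core idea of bank-wise refresh can be sketched as a per-bank lookup from temperature to refresh period. The 64 ms base window and the rule of thumb that DRAM retention roughly halves for every 10°C above 85°C are generic DRAM assumptions used here for illustration, not the paper's exact policy.

    # Illustrative per-bank refresh period selection driven by each bank's
    # temperature. Constants are generic DRAM rules of thumb, not the paper's.
    BASE_REFRESH_MS = 64.0      # nominal refresh window below the hot threshold
    HOT_THRESHOLD_C = 85.0

    def bank_refresh_period_ms(bank_temp_c):
        if bank_temp_c <= HOT_THRESHOLD_C:
            return BASE_REFRESH_MS
        # Halve the refresh window for every 10 C above the threshold.
        steps = (bank_temp_c - HOT_THRESHOLD_C) / 10.0
        return BASE_REFRESH_MS / (2.0 ** steps)

    # Cooler banks (e.g. far from the hot logic die) are refreshed less often,
    # which is where the refresh-power saving comes from.
    bank_temps = {0: 78.0, 1: 88.5, 2: 95.0, 3: 82.0}
    periods = {b: bank_refresh_period_ms(t) for b, t in bank_temps.items()}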
12:30   End of session
Lunch Break in Exhibition Area
Sandwich lunch