11.3 Microarchitectures and Workload Allocation for Energy Efficiency


Date: Thursday 17 March 2016
Time: 14:00 - 15:30
Location / Room: Konferenz 1

Chair:
Daniele Bortolotti, Univ. of Bologna, IT

Co-Chair:
Andreas Burg, École Polytechnique Fédérale de Lausanne (EPFL), CH

The session discusses novel power modeling, workload allocation, and microarchitectural techniques for improving energy efficiency in data centers and processors.

Time  Label  Presentation Title
Authors
14:00  11.3.1  RESISTIVE CONFIGURABLE ASSOCIATIVE MEMORY FOR APPROXIMATE COMPUTING
Speaker:
Abbas Rahimi, University of California, Berkeley, US
Authors:
Mohsen Imani1, Abbas Rahimi2 and Tajana Rosing3
1UC San Diego, US; 2University of California, Berkeley, US; 3University of California, San Diego, US
Abstract
Modern computing machines are increasingly characterized by large-scale parallelism in hardware (such as GP-GPUs) and the advent of large-scale and innovative memory blocks. Parallelism enables expanded performance tradeoffs, whereas memories enable reuse of computational work. To be effective, however, one needs to ensure energy efficiency with minimal reuse overheads. In this paper, we describe a resistive configurable associative memory (ReCAM) that enables selective approximation and asymmetric voltage overscaling to manage delivered efficiency. The ReCAM structure matches an input pattern with pre-stored ones by applying an approximate search on selected bit indices (bitline-configurable) or selected pre-stored patterns (row-configurable). To further reduce energy, we explore proper ReCAM sizing, various configurable search operations with low-overhead voltage overscaling, and different ReCAM update policies. Experimental results on the AMD Southern Islands GPUs for eight applications show that bitline-configurable and row-configurable ReCAM achieve on average 43.6% and 44.5% energy savings, respectively, with an acceptable quality loss of 10%.
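The bitline-configurable search described in the abstract can be illustrated with a small software sketch. This is a hypothetical model, not the paper's hardware design: it matches an input pattern against pre-stored rows while comparing only a selected subset of bit indices, approximating the rest away.

```python
# Hypothetical sketch of a bitline-configurable approximate search:
# only the bit indices listed in `exact_bits` participate in the match,
# mimicking ReCAM's selective approximation of the remaining bitlines.

def approx_search(stored_rows, pattern, exact_bits):
    """Return the index of the first stored row that matches `pattern`
    on every position in `exact_bits`, or None if no row matches."""
    for i, row in enumerate(stored_rows):
        if all(row[b] == pattern[b] for b in exact_bits):
            return i
    return None

rows = ["10110011", "10010111", "11110000"]
# Compare only the four most-significant bit positions (indices 0-3):
# the search "hits" row 1 even though its low bits differ from the input.
hit = approx_search(rows, "10011111", exact_bits=range(4))
```

Restricting the comparison to fewer bitlines trades match precision for energy, which is the knob the paper tunes against the 10% quality-loss budget.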

Download Paper (PDF; Only available from the DATE venue WiFi)
14:30  11.3.2  EXPLOITING CPU-LOAD AND DATA CORRELATIONS IN MULTI-OBJECTIVE VM PLACEMENT FOR GEO-DISTRIBUTED DATA CENTERS
Speaker:
Ali Pahlevan, École Polytechnique Fédérale de Lausanne (EPFL), CH
Authors:
Ali Pahlevan, Pablo Garcia del Valle and David Atienza, École Polytechnique Fédérale de Lausanne (EPFL), CH
Abstract
Cloud computing has been proposed as a new paradigm to deliver services over the internet. The proliferation of cloud services and increasing user demand for computing resources have led to the appearance of geo-distributed data centers (DCs). These DCs host heterogeneous applications with changing characteristics, such as CPU-load correlation, which offers significant potential for energy savings when the utilization peaks of two virtual machines (VMs) do not occur at the same time, or the amount of data exchanged between VMs, which directly impacts performance, i.e., response time. This paper presents a two-phase multi-objective VM placement, clustering, and allocation algorithm, along with a dynamic migration technique, for geo-distributed DCs coupled with renewable and battery energy sources. It exploits holistic knowledge of VM characteristics, namely CPU-load and data correlations, to tackle the challenges of operational cost optimization and the energy-performance trade-off. Experimental results demonstrate that, compared to state-of-the-art schemes, the proposed method provides up to 55% operational cost savings, 15% energy consumption reduction, and 12% performance (response time) improvement.
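The CPU-load-correlation idea can be sketched in a few lines. This is an illustrative toy, not the paper's algorithm: two VMs whose load traces are anti-correlated are good co-location candidates, because their combined load stays well below the sum of their individual peaks.

```python
# Hypothetical sketch: Pearson correlation of two VM CPU-load traces.
# Anti-correlated VMs (peaks in different time slots) can share a server
# without the host ever seeing the sum of both peaks.

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length traces."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

vm_a = [80, 20, 80, 20]   # % CPU load, peaks in even time slots
vm_b = [20, 80, 20, 80]   # peaks in odd time slots
r = pearson(vm_a, vm_b)   # strongly negative: good co-location pair
peak_together = max(a + b for a, b in zip(vm_a, vm_b))  # 100%, not 160%
```

A placement algorithm exploiting this signal would prefer pairing VMs with the most negative correlation, which is the energy-saving opportunity the abstract refers to.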

15:00  11.3.3  ENERGY EFFICIENCY IN CLOUD-BASED MAPREDUCE APPLICATIONS THROUGH BETTER PERFORMANCE ESTIMATION
Speaker:
Seyed Morteza Nabavinejad, Sharif University of Technology, IR
Authors:
Seyed Morteza Nabavinejad and Maziar Goudarzi, Sharif University of Technology, IR
Abstract
An important issue for efficient execution of MapReduce jobs on a cloud platform is selecting the best-fitting virtual machine (VM) configuration(s) among the miscellany of choices that cloud providers offer. Wise selection of VM configurations can lead to better performance, cost, and energy consumption. Therefore, it is crucial to explore the available configurations and choose the best one for each given MapReduce application. Executing the given application on all the configurations for comparison is a costly, time- and energy-consuming process. An alternative is to run the application on a subset of configurations (sample configurations) and estimate its performance on other configurations based on the values obtained on those sample configurations. We show that the choice of these sample configurations highly affects the accuracy of later estimations. Our Smart Configuration Selection (SCS) scheme chooses better representatives from among all configurations by a once-off analysis of given performance figures of the benchmarks, so as to increase the accuracy of estimations of missing values, and consequently, to more accurately choose the configuration providing the highest performance. The results show that the SCS choice of sample configurations is very close to the best choice, and can reduce estimation error to 7.11% from the original 16.02% of random configuration selection. Furthermore, this more accurate performance estimation saves 24.3% energy on average.
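The estimate-from-samples step can be sketched as follows. This is a deliberately simplified stand-in for SCS, under the assumption that relative performance between two configurations is roughly stable across applications: a new application's runtime on an unmeasured configuration is predicted by scaling its measured runtime with the average target/sample ratio observed over a benchmark set.

```python
# Hypothetical sketch: estimate runtime on unseen VM configurations from a
# measurement on one sample configuration, using mean per-configuration
# scaling ratios learned from known benchmarks. Names and numbers are
# illustrative, not from the paper.

benchmarks = {              # runtime (s) of known benchmarks per config
    "sort": {"c1": 100, "c2": 50, "c3": 200},
    "grep": {"c1": 80,  "c2": 40, "c3": 160},
}

def estimate(measured, sample_cfg, target_cfg):
    """Scale the runtime measured on `sample_cfg` by the mean
    target/sample ratio seen across the benchmark set."""
    ratios = [b[target_cfg] / b[sample_cfg] for b in benchmarks.values()]
    return measured * sum(ratios) / len(ratios)

# New app measured only on c1 (120 s); estimate runtimes on c2 and c3.
est_c2 = estimate(120, "c1", "c2")
est_c3 = estimate(120, "c1", "c3")
```

The paper's point is that which configurations serve as samples matters: a sample whose ratios generalize poorly inflates the estimation error, which SCS avoids by analyzing the benchmark figures up front.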

15:15  11.3.4  UNSUPERVISED POWER MODELING OF CO-ALLOCATED WORKLOADS FOR ENERGY EFFICIENCY IN DATA CENTERS
Speaker:
Juan Carlos Salinas-Hilburg, Universidad Politécnica de Madrid, ES
Authors:
Juan Carlos Salinas-Hilburg1, Marina Zapater2, José L. Risco-Martín3, Jose Manuel Moya1 and Jose L. Ayala3
1Universidad Politécnica de Madrid, ES; 2CEI Campus Moncloa, UCM-UPM, ES; 3Universidad Complutense de Madrid, ES
Abstract
Data centers are huge power consumers and their energy consumption keeps on rising despite the efforts to increase energy efficiency. A great body of research is devoted to the reduction of the computational power of these facilities, applying techniques such as power budgeting and power capping in servers. Such techniques rely on models to predict the power consumption of servers. However, estimating overall server power for arbitrary applications running co-allocated in multithreaded servers is not a trivial task. In this paper, we use Grammatical Evolution techniques to predict the dynamic power of the CPU and memory subsystems of an enterprise server using the hardware counters of each application. On top of our dynamic power models, we use fan and temperature-dependent leakage power models to obtain the overall server power. To train and test our models, we use real traces from a presently shipping enterprise server under a wide set of sequential and parallel workloads running at various frequencies. We prove that our model is able to predict the power consumption of two different tasks co-allocated in the same server, keeping the error below 8 W. For the first time in the literature, we develop a methodology able to combine the hardware counters of two individual applications and estimate overall server power consumption without running the co-allocated application. Our results show a prediction error below 12 W, which represents 7.3% of the overall server power, outperforming previous approaches in the state of the art.
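The counter-combination idea can be sketched with a minimal linear model. This is an illustrative stand-in, not the paper's Grammatical Evolution model: a dynamic-power model trained on per-application hardware counters is applied to the element-wise sum of two applications' counters to predict the power of the co-allocated pair without ever running it together. All weights and counter values below are assumed.

```python
# Hypothetical sketch: linear dynamic-power model over hardware counters,
# applied to the combined counters of two applications. The weights and
# the constant idle/leakage term are illustrative, not measured values.

WEIGHTS = {"instructions": 4e-9, "llc_misses": 2e-7}  # W per event/s
IDLE_POWER = 60.0  # W, static + leakage (assumed constant here)

def server_power(counters):
    """Predict total server power from per-second counter rates."""
    dynamic = sum(WEIGHTS[name] * rate for name, rate in counters.items())
    return IDLE_POWER + dynamic

app_a = {"instructions": 5e9, "llc_misses": 1e7}  # measured alone
app_b = {"instructions": 3e9, "llc_misses": 4e7}  # measured alone
combined = {k: app_a[k] + app_b[k] for k in app_a}
predicted = server_power(combined)  # co-allocated pair, never executed
```

The paper's contribution is precisely that such a combination can be made accurate (below 12 W error) despite contention effects that a naive sum of counters would miss.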

15:30  IP5-10, 205  A POWER-EFFICIENT 3-D ON-CHIP INTERCONNECT FOR MULTI-CORE ACCELERATORS WITH STACKED L2 CACHE
Speaker:
Kyungsu Kang, Samsung, KR
Authors:
Kyungsu Kang1, Luca Benini2, Giovanni De Micheli3, Sangho Park1 and Jong-Bae Lee1
1Samsung, KR; 2Università di Bologna, IT; 3École Polytechnique Fédérale de Lausanne (EPFL), CH
Abstract
The use of multi-core clusters is a promising option for data-intensive embedded applications such as multimodal sensor fusion, image understanding, and mobile augmented reality. In this paper, we propose a power-efficient 3-D on-chip interconnect for multi-core clusters with stacked L2 cache memory. A new switch design makes a circuit-switched Mesh-of-Tree (MoT) interconnect reconfigurable to support power-gating of processing cores, memory blocks, and unnecessary interconnect resources (routing switches, arbitration switches, and inverters placed along the on-chip wires). The proposed 3-D MoT improves power efficiency by up to 77% in terms of energy-delay product (EDP).

15:31  IP5-11, 898  POWER-EFFICIENT LOAD-BALANCING ON HETEROGENEOUS COMPUTING PLATFORMS
Speaker:
Muhammad Shafique, Karlsruhe Institute of Technology (KIT), DE
Authors:
Muhammad Usman Karim Khan1, Muhammad Shafique1, Apratim Gupta2, Thomas Schumann2 and Jörg Henkel1
1Karlsruhe Institute of Technology (KIT), DE; 2University of Applied Sciences, Darmstadt, DE
Abstract
To meet the throughput constraints of a system at minimal power consumption, the workload of its computing nodes should be balanced. This requires accounting for the underlying hardware characteristics (e.g., power vs. frequency profiles) and the throughput sustainable by these nodes. This work presents a methodology for distributing and balancing a divisible load across heterogeneous nodes under a throughput constraint. The power efficiency of each node is considered during load distribution. For load balancing, each node's frequency is set to the value that just fulfills its job requirements. We functionally verify our methodology by implementing it on an FPGA-based system with heterogeneous multi-cores and hardware accelerators, and report results for different image processing benchmarks. Compared to a state-of-the-art approach, our approach yields up to 64% performance improvement for the benchmarks evaluated in this paper.
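The two steps the abstract describes, efficiency-aware distribution and per-node frequency selection, can be sketched as follows. This is a toy model under assumed node figures, not the paper's methodology: the divisible load is split in proportion to each node's throughput per watt, and each node's DVFS setting is then lowered to the fraction of maximum frequency that just sustains its share.

```python
# Hypothetical sketch of power-aware load balancing on heterogeneous
# nodes. Throughput and power figures are illustrative assumptions.

nodes = {                      # frames/s at max frequency, power at max (W)
    "cpu":   {"tput": 30.0, "power": 10.0},
    "accel": {"tput": 60.0, "power": 5.0},
}

def distribute(total_tput):
    """Split the required throughput in proportion to each node's
    power efficiency (throughput per watt)."""
    eff = {n: d["tput"] / d["power"] for n, d in nodes.items()}
    total_eff = sum(eff.values())
    return {n: total_tput * e / total_eff for n, e in eff.items()}

def min_freq_scale(node, share):
    """Fraction of max frequency that just sustains `share`
    (a simplified linear DVFS model)."""
    return min(1.0, share / nodes[node]["tput"])

shares = distribute(45.0)   # the more efficient accelerator gets more work
scales = {n: min_freq_scale(n, s) for n, s in shares.items()}
```

Running each node at the lowest frequency that still meets its share is what keeps the throughput constraint satisfied at minimal power, the core trade-off the paper addresses.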

15:30  End of session
Coffee Break in Exhibition Area