4.6 Managing Multi-Core and Flash Memory


Date: Tuesday 15 March 2016
Time: 17:00 - 18:30
Location / Room: Konferenz 4

Chair:
Akash Kumar, Technische Universität Dresden, DE

Co-Chair:
Olivier Sentieys, INRIA, FR

This session deals with methods to improve the management of multi- and many-core systems and flash memories. Various constraints and objectives are considered: real-time, process variation, fairness, power consumption and performance.

Time  Label  Presentation Title / Authors
17:00  4.6.1  DISTRIBUTED FAIR SCHEDULING FOR MANY-CORES
Speaker:
Anuj Pathania, Karlsruhe Institute of Technology (KIT), DE
Authors:
Anuj Pathania1, Vanchinathan Venkataramani2, Muhammad Shafique1, Tulika Mitra2 and Jörg Henkel1
1Karlsruhe Institute of Technology (KIT), DE; 2National University of Singapore, SG
Abstract
The transition of embedded processors from multi-cores to many-cores continues unabated. Many-cores execute tens of tasks in parallel, and in some contexts it is crucial that the processing cores are distributed fairly amongst the tasks. Traditional queue-based centralized fair schedulers designed for multi-cores incur excessive overhead on many-cores due to the enlarged optimization search space. Further, the processing requirements of executing tasks may vary across different phases of their execution, necessitating lightweight dynamic fair schedulers that regularly perform partial reallocation of the cores. We introduce a distributed dynamic fair scheduler that scales with the number of cores because it distributes the processing overhead of scheduling amongst all the cores. Based on observations of task executions on many-cores, we propose a solution to the fair scheduling problem that is optimal under certain constraints; the general problem is NP-hard.
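
The objective such a scheduler approximates can be illustrated with a small sketch (not the authors' algorithm; task names, demands and core counts are invented): a max-min fair partition of cores among tasks, where no task receives more cores than it demands and any surplus is shared as evenly as possible.

```python
# Illustrative max-min fair core allocation (made-up inputs, not the
# paper's distributed scheduler, which avoids this centralized loop).

def fair_allocation(n_cores, demands):
    """Distribute n_cores among tasks so that no task exceeds its
    demand and the remainder is split as evenly as possible."""
    alloc = {t: 0 for t in demands}
    remaining = n_cores
    unsatisfied = set(demands)
    while remaining > 0 and unsatisfied:
        share = max(1, remaining // len(unsatisfied))
        for t in sorted(unsatisfied):  # copy: safe to discard below
            if remaining == 0:
                break
            give = min(share, demands[t] - alloc[t], remaining)
            alloc[t] += give
            remaining -= give
            if alloc[t] == demands[t]:
                unsatisfied.discard(t)
    return alloc
```

On 8 cores with demands A=10, B=2, C=3, this yields A=3, B=2, C=3: B's small demand is met in full, and the rest is split evenly. The paper's contribution is to reach such allocations without a central queue, spreading the decision over the cores themselves.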

17:30  4.6.2  KEEP IT SLOW AND IN TIME: ONLINE DVFS WITH HARD REAL-TIME WORKLOADS
Speaker:
Kai Lampka, Uppsala University, SE
Authors:
Kai Lampka and Björn Forsberg, Uppsala University, SE
Abstract
To handle hot spots or power shortages, modern multicore processors are equipped with a supervisory dynamic thermal and power management (DTPM) system. When necessary, the DTPM system autonomously adapts the capacity of the cooling system or throttles the speed of core-local clocks via dynamic voltage and frequency scaling (DVFS). In contrast to best-effort scenarios, online DVFS with real-time workloads must also consider the completion times of computations. Whereas execution times can be bounded adequately with worst-case estimates, the arrival times of computation requests are potentially unknown. A deadline can easily be missed if workloads suddenly peak and past clock-speed assignments have built up a non-negligible backlog of computations. To overcome this problem, we introduce a history-aware online DVFS management scheme. It operates a core at higher speed levels only if the future workload could otherwise cause timing violations. We present an implementation of the scheme running on the gem5 hardware simulator.
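
The core idea can be sketched with a toy model (our own construction, not the paper's controller; the frequency levels and units are invented): pick the slowest available frequency at which the accumulated backlog plus the worst-case future demand still completes before the deadline.

```python
# Toy "keep it slow, but in time" speed selection. Demand is measured
# in processor cycles, time in seconds; the DVFS levels are invented.

FREQS_HZ = [0.5e9, 1.0e9, 1.5e9, 2.0e9]  # ascending: prefer slow

def pick_speed(backlog_cycles, wc_future_cycles, time_to_deadline_s):
    """Return the slowest frequency that still meets the deadline,
    or the fastest available one if even that is insufficient."""
    demand = backlog_cycles + wc_future_cycles
    for f in FREQS_HZ:
        if demand / f <= time_to_deadline_s:
            return f
    return FREQS_HZ[-1]
```

With no backlog and 0.4e9 cycles of worst-case demand in one second, the slowest level (0.5 GHz) suffices; add a 0.4e9-cycle backlog from earlier slow running and the model must step up to 1.0 GHz, which is exactly the history-awareness the abstract describes.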

18:00  4.6.3  EXPLOITING PROCESS VARIATION FOR RETENTION INDUCED REFRESH MINIMIZATION ON FLASH MEMORY
Speaker:
Yejia Di, Chongqing University, CN
Authors:
Yejia Di1, Liang Shi1, Kaijie Wu1 and Chun Jason Xue2
1Chongqing University, CN; 2City University of Hong Kong, HK
Abstract
Solid-state drives (SSDs) are becoming the default storage medium as the cost of NAND flash memory drops. However, the cost reduction, driven by density improvement and technology scaling, brings new challenges. One is the rapidly decreasing retention time: the duration for which data written to flash memory cells can be read reliably. To deal with decreasing retention time, refresh has been highly recommended; however, refresh seriously hurts performance and lifetime, especially near the end of a flash memory's life. The second challenge is process variation (PV). Significant PV has been observed in flash memory, introducing large variations in the endurance of flash blocks: high-endurance blocks can provide long retention times, while retention time is short for low-endurance blocks. Considering these two challenges, a novel refresh minimization scheme is proposed for lifetime and performance improvement. The main idea is to preferentially allocate high-endurance blocks to data with long retention-time requirements, so that refresh operations are minimized. Implementation and analysis show that the overhead of the proposed approach is negligible. Simulation results show that both lifetime and performance are significantly improved over the state-of-the-art scheme.
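
The allocation idea admits a simple greedy sketch (our own illustration with invented endurance scores and retention requirements, not the paper's implementation): sort blocks by endurance and pending writes by required retention, then pair the neediest data with the strongest blocks.

```python
# Illustrative retention-aware block allocation: high-endurance blocks,
# which retain data longest, go to the data that must be retained
# longest, so fewer refreshes are triggered. Values are invented.

def allocate(blocks, writes):
    """blocks: {block_id: endurance score}
    writes: {data_id: required retention (arbitrary units)}
    Returns a greedy pairing of long-retention data with
    high-endurance blocks (assumes len(writes) <= len(blocks))."""
    free = sorted(blocks, key=blocks.get, reverse=True)   # best block first
    order = sorted(writes, key=writes.get, reverse=True)  # neediest data first
    return {d: free[i] for i, d in enumerate(order)}
```

With blocks b0/b1/b2 of endurance 3/9/5 and data requiring retention 30 ("archive"), 7 ("warm"), and 1 ("hot"), the archive data lands on the strongest block b1 and the short-lived data on the weakest, b0.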

18:30  IP2-3, 253  WORKLOAD-AWARE POWER OPTIMIZATION STRATEGY FOR ASYMMETRIC MULTIPROCESSORS
Speaker:
Emanuele Del Sozzo, Politecnico di Milano, IT
Authors:
Emanuele Del Sozzo, Gianluca Durelli, Ettore Trainiti, Antonio Miele, Marco Domenico Santambrogio and Cristiana Bolchini, Politecnico di Milano, IT
Abstract
Asymmetric multi-core architectures, such as ARM big.LITTLE, are emerging as successful solutions for the embedded and mobile markets thanks to their ability to trade off performance and power consumption. However, neither the HMP scheduler integrated in commercial products nor previous research approaches fully exploit this potential. We propose a new runtime resource management policy for the big.LITTLE architecture, integrated in Linux, that optimizes power consumption while fulfilling the performance requirements specified for the running applications. Experimental results show an 11% improvement in performance and, at the same time, an 8% reduction in peak power consumption with respect to the current Linux HMP solution.
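
In the spirit of the abstract, a workload-aware placement decision can be caricatured as follows (a toy rule with invented metrics and thresholds, not the authors' policy): keep a thread on an energy-efficient LITTLE core unless its performance requirement exceeds what that cluster can deliver.

```python
# Toy cluster-selection rule for a big.LITTLE-style platform.
# "ips" = instructions per second; all figures are illustrative.

def choose_cluster(measured_ips, required_ips, little_capacity_ips):
    """Prefer the low-power LITTLE cluster whenever the thread's
    requirement is met both by its measured throughput and by the
    LITTLE cluster's capacity; otherwise escalate to a big core."""
    if required_ips <= measured_ips and required_ips <= little_capacity_ips:
        return "LITTLE"
    return "big"
```

A real policy would additionally weigh DVFS levels, co-running threads, and power models, which is where the paper's gains over the stock HMP scheduler come from.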

18:31  IP2-4, 18  (Best Paper Award Candidate)
THE SLOWDOWN OR RACE-TO-IDLE QUESTION: WORKLOAD-AWARE ENERGY OPTIMIZATION OF SMT MULTICORE PLATFORMS UNDER PROCESS VARIATION
Speaker:
Anup Das, University of Southampton, GB
Authors:
Anup Das, Geoff Merrett and Bashir Al-Hashimi, University of Southampton, GB
Abstract
The increasing use of high-performance applications on multicore platforms has driven up energy consumption, making it a primary design optimization objective. Two widely used approaches for reducing the energy consumption of multithreaded workloads are slowdown (using DVFS) and race-to-idle. In this paper, we first demonstrate that the most energy-efficient choice depends on (1) the workload (memory-bound, CPU-bound, etc.), (2) process variation and (3) support for simultaneous multithreading (SMT). We then propose an approach for mapping application threads on SMT multicore systems at runtime to minimize energy consumption. The proposed approach interfaces with the operating system and with hardware performance counters and timers to characterize application threads. This characterization captures the effect of process variation on execution time and identifies the break-even operating point, where one strategy (slowdown or race-to-idle) outperforms the other. Thread mapping is performed using these characterized data by iteratively collapsing application threads (SMT), followed by binary-programming-based thread mapping. Finally, performance slack is exploited at runtime to select between slowdown and race-to-idle, based on the break-even operating point calculated for each individual thread. This end-to-end approach is implemented as a runtime manager for the Linux operating system and is validated across a range of high-performance applications. Results demonstrate up to 13% energy reduction over state-of-the-art approaches, with an average 18% improvement over Linux.
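
The slowdown-versus-race-to-idle trade-off can be made concrete with a toy energy model (entirely our own numbers, not the paper's characterization): race-to-idle runs at full speed and then pays idle power for the rest of the period, while slowdown runs just fast enough to finish at the deadline.

```python
# Toy per-period energy model: static power plus a cubic dynamic term.
# All power figures (watts) and frequencies are illustrative.

F_MAX = 2.0e9  # assumed maximum core frequency

def p_dyn(f, p_peak=3.5, p_static=0.5):
    """Active power at frequency f: static floor + cubic dynamic term."""
    return p_static + p_peak * (f / F_MAX) ** 3

def energy_race_to_idle(cycles, period_s, p_idle=0.5):
    """Run at F_MAX, then idle for the remainder of the period."""
    busy = cycles / F_MAX
    return p_dyn(F_MAX) * busy + p_idle * (period_s - busy)

def energy_slowdown(cycles, period_s):
    """Run at the just-in-time frequency for the whole period."""
    f_min = cycles / period_s
    return p_dyn(f_min) * period_s
```

For 1e9 cycles in a 1-second period this model favors slowdown (0.9375 J vs 2.25 J); with a higher static/idle floor or a flatter power curve the comparison flips, which is exactly why the paper computes a per-thread break-even point rather than fixing one strategy.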

18:32  IP2-5, 165  TOWARDS GENERAL PURPOSE COMPUTATIONS ON LOW-END MOBILE GPUS
Speaker:
Leonidas Kosmidis, Barcelona Supercomputing Center and Universitat Politècnica de Catalunya, ES
Authors:
Matina Maria Trompouki1 and Leonidas Kosmidis2
1Universitat Politècnica de Catalunya, ES; 2Barcelona Supercomputing Center and Universitat Politècnica de Catalunya, ES
Abstract
GPUs traditionally offer high computational capability, frequently higher than their CPU counterparts. While vendors of high-end mobile GPUs have recently introduced general-purpose APIs, such as OpenCL, to leverage this computational power, the vast majority of mobile devices lack such support. Although their graphics APIs have similarities with desktop graphics APIs, they also have significant differences that prevent the use of well-known techniques for performing general-purpose computation over such interfaces. In this paper we show how these obstacles can be overcome in order to achieve general-purpose programmability of these devices. As a proof of concept, we implemented our proposal on a real embedded platform (Raspberry Pi) based on Broadcom's VideoCore IV GPU, obtaining a speedup of 7.2x over the CPU.

18:30  End of session