8.3 Hot Topic: Managing Heterogeneous Computing Resources at Runtime


Date: Wednesday 16 March 2016
Time: 17:00 - 18:30
Location / Room: Konferenz 1

Organisers:
Christian Plessl, University of Paderborn, DE
David Andrews, University of Arkansas, US

Chair:
Daniel Ziener, Hamburg University of Technology, DE

Co-Chair:
José L. Ayala, Complutense University of Madrid, ES

Embedded systems have long used specialized computing resources to optimize the performance, energy consumption, and/or real-time behavior of critical application parts. In recent years, the trend towards heterogeneous computing has spread from embedded systems to high-performance computing. Today, a wide variety of heterogeneous computing architectures is available as off-the-shelf components, such as heterogeneous SoCs for embedded applications or PCIe-based accelerator cards with FPGAs, GPUs, or many-cores for HPC systems. The programming models, languages, and design environments for creating software or hardware configurations for these resources are also maturing and increasingly standardized, e.g., OpenCL, OpenACC, and OpenMP. In contrast, the software stack for effectively managing heterogeneous computing resources at runtime is still largely undeveloped. Hence, the decision at what time and on which computing resource a particular function is executed is managed explicitly at the application level. This constrained, per-application view makes it difficult to operate a system to meet global objectives, for example, mapping tasks to the available heterogeneous resources such that the performance requirements of all applications are met while minimizing energy consumption. In this hot topic session we focus on run-time systems that strive to overcome this application-centric view and enable an automated use of heterogeneous computing by dynamically mapping computations to different resources such that global goals are optimized.
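The global mapping problem described above can be illustrated with a toy model: given per-resource speedup and power figures, a run-time system picks, for each task, the resource that meets the task's deadline at the lowest energy. A minimal sketch in Python; the resource and task parameters are hypothetical and not taken from any of the session's papers:

```python
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    speedup: float   # relative to a baseline CPU
    power_w: float   # average power draw while busy

@dataclass
class Task:
    name: str
    base_runtime_s: float   # runtime on the baseline CPU
    deadline_s: float

def map_task(task, resources):
    """Pick the resource that meets the deadline with the least energy.
    Energy = power * (base_runtime / speedup). Returns None if no
    resource can meet the deadline."""
    feasible = []
    for r in resources:
        runtime = task.base_runtime_s / r.speedup
        if runtime <= task.deadline_s:
            feasible.append((r.power_w * runtime, r))
    return min(feasible, key=lambda e: e[0])[1] if feasible else None

resources = [
    Resource("cpu", 1.0, 40.0),
    Resource("gpu", 8.0, 150.0),
    Resource("fpga", 4.0, 25.0),
]
task = Task("filter", base_runtime_s=10.0, deadline_s=5.0)
print(map_task(task, resources).name)  # prints "fpga"
```

Here the CPU misses the deadline, and although the GPU is fastest, the FPGA finishes in time at a quarter of the energy; a real run-time system would additionally account for data-transfer costs and contention.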

17:00  8.3.1  RUN TIME INTERPRETATION FOR CREATING CUSTOM ACCELERATORS
Speaker:
David Andrews, University of Arkansas, US
Authors:
Sen Ma, Zeyad Aklah and David Andrews, University of Arkansas, US
Abstract
Despite the significant advancements that have been made in High-Level Synthesis, the reconfigurable computing community has not yet achieved widespread use of Field Programmable Gate Arrays (FPGAs) by programmers. Existing barriers include the need to work within vendor-specific CAD tools, knowledge of hardware programming models, and the requirement to pass each design through a very time-consuming synthesis, place, and route process. In this paper we present a new approach that removes these barriers from the programmers' design flow. We move synthesis out of the programmers' path by composing pre-synthesized building blocks using a domain-specific language that supports programming patterns tailored to FPGA accelerators. Our results show that the performance of accelerators assembled at run time is equivalent to that of a custom block of hardware synthesized with automated HLS tools.
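The core idea, composing pre-synthesized building blocks at run time instead of invoking synthesis, can be sketched in software: a tiny interpreter walks a program written in a small DSL and chains pre-built operators. This is an illustrative analogy only; the block names, constants, and interfaces below are invented, not taken from the paper:

```python
# Hypothetical library of pre-synthesized accelerator blocks, each exposed
# to software as a callable with a fixed streaming interface.
PREBUILT = {
    "scale":  lambda xs, c: [x * c for x in xs],
    "offset": lambda xs, c: [x + c for x in xs],
    "clip":   lambda xs, c: [min(x, c) for x in xs],
}

def assemble(program):
    """Interpret a list of (block_name, constant) stages at run time,
    composing pre-synthesized blocks instead of running synthesis,
    place, and route."""
    def accelerator(xs):
        for name, const in program:
            xs = PREBUILT[name](xs, const)
        return xs
    return accelerator

acc = assemble([("scale", 2), ("offset", 1), ("clip", 7)])
print(acc([1, 2, 3, 4]))  # prints [3, 5, 7, 7]
```

The point of the analogy is that the expensive step (synthesizing each block) happens once, offline; assembling a new accelerator from the library is cheap and can happen at run time.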

17:30  8.3.2  A SELF-ADAPTIVE APPROACH TO EFFICIENTLY MANAGE ENERGY AND PERFORMANCE IN TOMORROW'S HETEROGENEOUS COMPUTING SYSTEMS
Speaker:
Marco Domenico Santambrogio, Politecnico di Milano, IT
Authors:
Ettore Trainiti, Gianluca Durelli, Antonio Miele, Cristiana Bolchini and Marco Domenico Santambrogio, Politecnico di Milano, IT
Abstract
ICT adoption has boomed over the last decades, and so has the power consumption footprint generated by these technologies, which is expected to more than triple by 2020. Moreover, we are moving towards an on-demand computing scenario characterized by varying workloads, composed of diverse applications with different performance requirements and criticality. A promising approach to address the challenges posed by this scenario is to better exploit the specialized computing resources integrated in a heterogeneous system architecture (HSA), taking advantage of their individual characteristics to optimize the performance/energy trade-off of the overall system. Better exploitation, however, comes with higher complexity. System architects need to take into account the efficiency of the system's units, i.e., GPP(s) either alone or combined with a single family of accelerators (e.g., GPUs or FPGAs), as well as the application workload; neglecting these aspects often leads to inefficient exploitation and thus to performance/energy losses. The work presented in this paper addresses these limitations by exploiting self-adaptivity, allowing the system to autonomously decide which specialized resource to exploit for a more effective execution of the application and a reduced carbon footprint, optimizing goals that the user can set (e.g., performance, energy, reliability).
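Self-adaptivity of this kind is often structured as an observe-decide-act loop: the system monitors a metric against a user-set goal and switches resources when the goal is violated. A deliberately simplified sketch, where the resource names, thresholds, and hysteresis policy are all hypothetical:

```python
def self_adapt(measurements, goal_watts):
    """Toy observe-decide-act loop: observe measured power per epoch,
    switch from the fast resource to the frugal one whenever the
    user-set power goal is violated, and switch back once there is
    ample headroom (hysteresis at half the goal)."""
    resource = "gpu"
    trace = []
    for power in measurements:
        if resource == "gpu" and power > goal_watts:
            resource = "fpga"      # adapt: trade speed for energy
        elif resource == "fpga" and power < goal_watts * 0.5:
            resource = "gpu"       # adapt back: headroom available
        trace.append(resource)
    return trace

print(self_adapt([120, 180, 200, 60, 50], goal_watts=150))
```

The hysteresis band (switch back only below half the goal) avoids oscillating between resources when measurements hover near the threshold; a real self-adaptive system would also weigh reliability and per-application performance goals.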

18:00  8.3.3  PERFORMANCE-CENTRIC SCHEDULING WITH TASK MIGRATION FOR A HETEROGENEOUS COMPUTE NODE IN THE DATA CENTER
Speaker:
Christian Plessl, Paderborn University, DE
Authors:
Achim Lösch, Tobias Beisel, Tobias Kenter, Christian Plessl and Marco Platzner, Paderborn University, DE
Abstract
The use of heterogeneous computing resources, such as Graphics Processing Units or other specialized coprocessors, has become widespread in recent years because of their performance and energy efficiency advantages. Approaches for managing and scheduling tasks on heterogeneous resources are still subject to research. Although queuing systems have recently been extended to support accelerator resources, a general solution that manages heterogeneous resources at the operating-system level to exploit a global view of the system state is still missing. In this paper we present a user-space scheduler that enables task scheduling and migration on heterogeneous processing resources in Linux. Using run queues for the available resources, we make scheduling decisions based on the system state and on task characterizations from earlier measurements. With a programming pattern that supports the integration of checkpoints into applications, we preempt tasks and migrate them between three very different compute resources. For static and dynamic workload scenarios, we show that this approach improves performance by up to 17%, and by 7% on average, by effectively avoiding idle resources. We also demonstrate that a work-conserving strategy without migration is not a suitable alternative.
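The checkpoint-based migration pattern can be sketched abstractly: the task exposes its state at regular checkpoints, and the scheduler may move it to another resource only at those points, where the state is small and well defined. A minimal Python illustration; the resource names and the fixed migration point are invented for the example and do not reflect the paper's actual policy:

```python
def run_with_migration(total_iters, checkpoint_every, migrate_at, resources):
    """Cooperative checkpointing: the task runs in checkpoint-sized
    intervals; between intervals the scheduler may migrate it to
    another resource, carrying only the checkpointed state along."""
    state = {"i": 0, "acc": 0}   # the task's checkpointed state
    resource = resources[0]
    log = []
    while state["i"] < total_iters:
        # run one checkpoint interval on the current resource
        end = min(state["i"] + checkpoint_every, total_iters)
        for i in range(state["i"], end):
            state["acc"] += i
        state["i"] = end
        log.append((resource, state["i"]))
        # at the checkpoint the scheduler may migrate the task
        if state["i"] == migrate_at and len(resources) > 1:
            resource = resources[1]
    return state["acc"], log

acc, log = run_with_migration(10, checkpoint_every=5, migrate_at=5,
                              resources=["cpu", "gpu"])
```

Because migration happens only at checkpoints, the result (here the sum 0+1+...+9 = 45) is independent of where each interval ran; the log records which resource executed each interval.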

18:30  End of session