8.6 Mapping and Scheduling for Many-Core Embedded Systems

Date: Wednesday 26 March 2014
Time: 17:00 - 18:30
Location / Room: Konferenz 4

Chair:
Marc Geilen, Eindhoven University of Technology, NL

Co-Chair:
Sébastien Le Beux, Ecole Centrale de Lyon, FR

This session discusses novel ideas for implementing embedded software on many-core architectures. The first presentation deals with an optimized implementation of H.265 (HEVC) video coding on many-core architectures. The second presentation introduces a run-time scheduling approach for priority-based systems on GPGPU architectures. The third talk presents an efficient run-time resource management heuristic for many-core architectures based on a Lagrangian relaxation technique.

Time	Label	Presentation Title / Speakers / Abstract
17:00	8.6.1	SOFTWARE ARCHITECTURE OF HIGH EFFICIENCY VIDEO CODING FOR MANY-CORE SYSTEMS WITH POWER-EFFICIENT WORKLOAD BALANCING
Speakers: Muhammad Usman Karim Khan, Muhammad Shafique and Jörg Henkel, Karlsruhe Institute of Technology (KIT), DE
Abstract: The High Efficiency Video Coding (HEVC) standard aims to provide ~50% better compression than its predecessor (H.264), at the cost of high computational complexity. To enable real-time HEVC encoding, HEVC provides dedicated coding support for parallelization that many-core systems can exploit. In this work, we present an HEVC software architecture in which a video frame is adaptively divided into independent regions (so-called video tiles) that are processed concurrently on multiple cores. By balancing the workload of the video tiles mapped to each core, the total power consumption of the system is reduced (by dynamically scaling the operating frequency) under a given frame-rate constraint. We also exploit user tolerance to further curtail the HEVC workload with insignificant video quality degradation. Experimental results show that the proposed approach achieves ~43% power savings on a many-core system.
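
A rough illustration of why balanced tiles save power under a frame-rate constraint: the sketch below assumes per-core frequency scaling, a tile's execution time equal to its cycle count divided by the core frequency, and dynamic power proportional to f³ (voltage scaled with frequency). The function names, cycle counts and the cubic power model are illustrative assumptions, not details of the paper's architecture.

```python
def min_frequency_hz(tile_cycles: float, frame_rate: float) -> float:
    """Lowest frequency at which a core finishes its tile within one frame period."""
    return tile_cycles * frame_rate  # cycles per frame x frames per second


def relative_power(tile_cycles_per_core, frame_rate, f_max_hz):
    """Total dynamic power under an assumed P ~ f^3 model, normalized so that
    one core running at f_max_hz contributes 1.0."""
    total = 0.0
    for cycles in tile_cycles_per_core:
        f = min_frequency_hz(cycles, frame_rate)
        if f > f_max_hz:
            raise ValueError("frame-rate constraint cannot be met on this core")
        total += (f / f_max_hz) ** 3
    return total


# Same total workload (120 Mcycles per frame) split over 4 cores, 30 fps, 2 GHz max.
unbalanced = [60e6, 30e6, 20e6, 10e6]
balanced = [30e6, 30e6, 30e6, 30e6]
print(round(relative_power(unbalanced, 30.0, 2.0e9), 4))  # 0.8505
print(round(relative_power(balanced, 30.0, 2.0e9), 4))    # 0.3645
```

Both splits carry the same work per frame, yet the balanced one needs less than half the power under this model, because the convex power curve heavily penalizes a single overloaded core.
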
17:30	8.6.2	GPU-EVR: RUN-TIME EVENT BASED REAL-TIME SCHEDULING FRAMEWORK ON GPGPU PLATFORM
Speakers: Haeseung Lee and Mohammad Abdullah Al Faruque, University of California, Irvine, US
Abstract: GPU architectures have traditionally been used in graphics applications because of their enormous computing capability, and they are now also used for general-purpose computing. Most current scheduling frameworks for GPGPU workloads operate sequentially. This is problematic for real-time systems: because the sequential approach cannot support preemption, it may not scale. We propose a novel scheduling framework that provides real-time support for the GPGPU platform. In contrast to existing frameworks, our framework considers both the concurrent execution of applications on the GPU and the mapping between streaming multiprocessors and thread blocks. By considering both, our framework can guarantee timing for up to 6.4 times as many applications as TimeGraph and Global EDF. In addition, our experimental applications consume up to 20% less power under our scheduling framework than under TimeGraph and Global EDF.
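
The Global EDF baseline named above orders work by absolute deadline. The sketch below is a generic, non-preemptive EDF dispatcher that hands thread blocks to whichever streaming multiprocessor (SM) becomes free first; it is not GPU-EVR. It assumes all kernels are released at time 0 and that every block occupies one SM for a fixed time; `Kernel`, `edf_dispatch` and the example numbers are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Kernel:
    name: str
    deadline: float     # absolute deadline
    blocks: int         # number of thread blocks (assumed >= 1)
    block_time: float   # run time of one block on one SM (assumed uniform)


def edf_dispatch(kernels, num_sms):
    """Give each free SM the next block of the ready kernel with the earliest
    deadline. Blocks run to completion. Returns each kernel's finish time."""
    ready = [(k.deadline, i) for i, k in enumerate(kernels)]
    heapq.heapify(ready)
    remaining = [k.blocks for k in kernels]
    sm_free_at = [0.0] * num_sms          # min-heap of SM availability times
    finish = {}
    while ready:
        _, i = ready[0]                   # earliest-deadline kernel
        k = kernels[i]
        start = heapq.heappop(sm_free_at)  # SM that frees up first
        end = start + k.block_time
        heapq.heappush(sm_free_at, end)
        remaining[i] -= 1
        finish[k.name] = max(finish.get(k.name, 0.0), end)
        if remaining[i] == 0:
            heapq.heappop(ready)          # all blocks of this kernel dispatched
    return finish


kernels = [Kernel("A", deadline=10.0, blocks=4, block_time=2.0),
           Kernel("B", deadline=5.0, blocks=2, block_time=2.0)]
finish = edf_dispatch(kernels, num_sms=2)
missed = [k.name for k in kernels if finish[k.name] > k.deadline]
print(finish, missed)  # {'B': 2.0, 'A': 6.0} []
```

Preemption in this model happens only at block boundaries, since a started block always runs to completion; event-driven or finer-grained policies are outside this sketch.
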
18:00	8.6.3	MULTI-OBJECTIVE DISTRIBUTED RUN-TIME RESOURCE MANAGEMENT FOR MANY-CORES
Speakers: Stefan Wildermann, Michael Glaß and Jürgen Teich, University of Erlangen-Nuremberg, DE
Abstract: Dynamic usage scenarios of many-core systems require sophisticated run-time resource management that can deal with multiple, often conflicting, application and system objectives. This paper proposes an approach based on non-linear programming techniques that can trade off between objectives while respecting target values for each of them. We propose a distributed application embedding for dealing with soft system-wide constraints, as well as a centralized one for strict constraints. The experiments show that both approaches may significantly outperform related heuristics.
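
The session summary attributes this heuristic to Lagrangian relaxation, while the abstract speaks of non-linear programming. As a textbook illustration of the former (not the paper's formulation), the sketch below lets each application pick one hypothetical operating point (cores, energy), minimizes total energy subject to a core budget, relaxes the budget with a multiplier λ (`lam`), and updates λ by subgradient steps. The relaxed problem splits into independent per-application choices, which is what makes this style of method attractive for distributed resource management.

```python
def lagrangian_allocate(apps, capacity, steps=200, step_size=0.5):
    """Minimize total energy subject to sum(cores) <= capacity.

    apps: one list of (cores, energy) operating points per application.
    The capacity constraint is relaxed with a multiplier `lam`; the relaxed
    problem splits into independent per-application choices.
    Returns (energy, choice) for the best feasible iterate, or None.
    """
    lam = 0.0
    best = None
    for _ in range(steps):
        # Each application minimizes its own energy + lam * cores.
        choice = [min(points, key=lambda p: p[1] + lam * p[0]) for points in apps]
        used = sum(cores for cores, _ in choice)
        energy = sum(e for _, e in choice)
        if used <= capacity and (best is None or energy < best[0]):
            best = (energy, choice)
        # Subgradient step: raise the price of cores when over budget,
        # lower it (but not below zero) when under budget.
        lam = max(0.0, lam + step_size * (used - capacity))
    return best


apps = [
    [(1, 10.0), (2, 6.0), (4, 4.0)],  # hypothetical application 1
    [(1, 8.0), (2, 5.0), (4, 3.5)],   # hypothetical application 2
]
print(lagrangian_allocate(apps, capacity=4))  # (11.0, [(2, 6.0), (2, 5.0)])
```

With a 4-core budget the multiplier settles at λ = 2 after one step, and each application takes two cores. Lagrangian relaxation does not guarantee a feasible or optimal answer in general, so the function returns `None` if it never sees a feasible iterate.
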
18:30	IP4-7, 323	COMIK: A PREDICTABLE AND CYCLE-ACCURATELY COMPOSABLE REAL-TIME MICROKERNEL
Speakers: Andrew Nelson¹, Ashkan Beyranvand Nejad¹, Anca Molnos², Martijn Koedam³ and Kees Goossens³ ¹TU Delft, NL; ²CEA Leti, FR; ³TU Eindhoven, NL
Abstract: The functionality of embedded systems is ever increasing. This has led to mixed time-criticality systems, in which applications with a variety of real-time requirements co-exist on the same platform and share resources. Because of inter-application interference, verifying the real-time requirements of such systems is generally non-trivial. In this paper, we present the CoMik microkernel, which provides temporally predictable and composable processor virtualisation. CoMik's virtual processors are cycle-accurately composable, i.e. their timing cannot affect the timing of co-existing virtual processors by even a single cycle. Real-time applications executing on dedicated virtual processors can therefore be verified and executed in isolation, simplifying the verification of mixed time-criticality systems. We demonstrate these properties through experiments on an FPGA-prototyped hardware platform.
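
Cycle-accurate composability means one virtual processor's timing is unaffected by what the others do. One common way to obtain this, shown here only as an illustration under stated assumptions and not as CoMik's implementation, is a static time-division multiplexing (TDM) table with fixed-length slots in which unused cycles are idled rather than handed to the next partition. The slot table, slot length, demands and the `slot_starts` helper are hypothetical.

```python
def slot_starts(table, slot_len, rounds, demand, work_conserving):
    """Return, for each virtual processor (VP), the start cycle of each of its slots.

    table: VP ids in slot order, repeated every round.
    demand: cycles each VP actually uses in a slot (capped at slot_len).
    work_conserving=False models fixed TDM slots: unused cycles are idled,
    so slot boundaries never depend on other VPs' demand.
    """
    t = 0
    starts = {vp: [] for vp in table}
    for _ in range(rounds):
        for vp in table:
            starts[vp].append(t)
            used = min(demand[vp], slot_len)
            t += used if work_conserving else slot_len
    return starts


for a_demand in (40, 90):
    demand = {"A": a_demand, "B": 100}
    tdm = slot_starts(["A", "B"], 100, 3, demand, work_conserving=False)["B"]
    greedy = slot_starts(["A", "B"], 100, 3, demand, work_conserving=True)["B"]
    print(a_demand, tdm, greedy)
# 40 [100, 300, 500] [40, 180, 320]
# 90 [100, 300, 500] [90, 280, 470]
```

With fixed slots, B's start times do not move when A's demand changes; in the work-conserving variant they shift with A's behaviour, so B could no longer be verified in isolation.
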
18:31	IP4-8, 71	UTILIZATION-AWARE LOAD BALANCING FOR THE ENERGY EFFICIENT OPERATION ON THE BIG.LITTLE PROCESSOR
Speakers: Myungsun Kim¹, Kibeom Kim², James Geraci¹ and Seongsoo Hong³ ¹Samsung Electronics, KR; ²Samsung Electronics, KR; ³Seoul National University, KR
Abstract: ARM's big.LITTLE architecture introduces the opportunity to optimize power consumption by selecting the core type most suitable for a given level of processing demand. To take advantage of this new axis of optimization, we introduce the processor utilization factor into the Linux kernel's load-balancing algorithm, after carefully analyzing the power management mechanism of the big.LITTLE processor's Linux port and deriving its state-diagram representation. Our mechanism improves the Linux kernel's ability to assign tasks to cores in an energy-efficient manner without having to make it directly aware of the available core types. Our experiments on a real test bed show that our algorithm reduces energy consumption by up to 11.35% compared with the standard Linux scheduler, with almost no corresponding reduction in performance.
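
The abstract's starting point is matching the core type to the level of processing demand. The generic rule below illustrates that idea with a utilization threshold plus hysteresis. The paper's own mechanism works through the Linux load balancer without making it directly aware of core types, so this is not the authors' method; `next_core_type` and the thresholds `up` and `down` are hypothetical.

```python
from enum import Enum


class CoreType(Enum):
    LITTLE = "LITTLE"
    BIG = "big"


def next_core_type(current, utilization, up=0.8, down=0.3):
    """Hysteresis rule: move to a big core when utilization exceeds `up`,
    back to a LITTLE core when it drops below `down`, otherwise stay put."""
    if current is CoreType.LITTLE and utilization > up:
        return CoreType.BIG
    if current is CoreType.BIG and utilization < down:
        return CoreType.LITTLE
    return current


trace = [0.2, 0.5, 0.9, 0.6, 0.4, 0.2]   # hypothetical per-period utilization
core = CoreType.LITTLE
history = []
for u in trace:
    core = next_core_type(core, u)
    history.append(core.name)
print(history)  # ['LITTLE', 'LITTLE', 'BIG', 'BIG', 'BIG', 'LITTLE']
```

The gap between the two thresholds keeps a task whose utilization hovers near a single threshold from bouncing between clusters, which would waste migration energy.
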
18:32	IP4-9, 538	HEVCDTM: APPLICATION-DRIVEN DYNAMIC THERMAL MANAGEMENT FOR HIGH EFFICIENCY VIDEO CODING
Speakers: Daniel Palomino¹, Muhammad Shafique², Hussam Amrouch², Altamiro Susin³ and Jörg Henkel² ¹Karlsruhe Institute of Technology (KIT), BR; ²Karlsruhe Institute of Technology (KIT), DE; ³Federal University of Rio Grande do Sul, BR
Abstract: This paper presents an application-driven Dynamic Thermal Management (DTM) algorithm for High Efficiency Video Coding (HEVC). To design such a DTM policy efficiently, we perform an offline thermal analysis of an HEVC encoder and demonstrate the impact of different video sequences and coding configurations on the processor temperature. We leverage this thermal analysis to develop an efficient application-driven DTM policy that performs temperature-aware coding together with application-driven control of DTM knobs (e.g., frequency scaling), meeting the temperature constraints while still providing high video quality (i.e. PSNR loss < 0.01 dB). For accurate thermal analysis and evaluation, we deploy an infrared-camera-based thermal measurement setup that, unlike state-of-the-art setups, does not require adding any extra layer on top of the measured chip, allowing the camera to accurately capture the infrared emissions from the die.
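
The frequency-scaling DTM knob mentioned in the abstract can be pictured as a feedback loop. The sketch below is a generic threshold controller over a first-order thermal model; `power_w`, `simulate_dtm`, the power model, thermal coefficients, frequency levels and the 70 °C limit are all hypothetical, and it omits the temperature-aware coding decisions that make the paper's policy application-driven.

```python
def power_w(freq_ghz):
    # Hypothetical power model: dynamic power ~ f^3 (voltage scaled with frequency).
    return 2.0 * freq_ghz ** 3


def simulate_dtm(steps=200, t_limit=70.0, margin=5.0, t_amb=45.0,
                 levels=(1.0, 1.5, 2.0, 2.5), alpha=0.9, beta=0.1):
    """Threshold DTM over a first-order thermal model:
    T[k+1] = T_amb + alpha * (T[k] - T_amb) + beta * P(f[k])."""
    level = len(levels) - 1          # start at the highest frequency
    temp = t_amb
    temps, freqs = [temp], []
    for _ in range(steps):
        if temp > t_limit and level > 0:
            level -= 1               # too hot: scale frequency down one step
        elif temp < t_limit - margin and level < len(levels) - 1:
            level += 1               # enough headroom: scale back up one step
        f = levels[level]
        freqs.append(f)
        temp = t_amb + alpha * (temp - t_amb) + beta * power_w(f)
        temps.append(temp)
    return temps, freqs


temps, freqs = simulate_dtm()
print(f"peak {max(temps):.2f} C, mean frequency {sum(freqs) / len(freqs):.2f} GHz")
```

The margin between the scale-down and scale-up thresholds keeps the controller from toggling every step. In this model the peak temperature slightly overshoots the 70 °C limit (staying below 70.7 °C), because the controller reacts only after a sample above the limit.
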
18:33	IP4-10, 714	IMPROVING EFFICIENCY OF EXTENSIBLE PROCESSORS BY USING APPROXIMATE CUSTOM INSTRUCTIONS
Speakers: Mehdi Kamal¹, Amin Ghasem Azar¹, Ali Afzali-Kusha¹ and Massoud Pedram² ¹University of Tehran, IR; ²University of Southern California, US
Abstract: In this paper, we propose moving the conventional extensible-processor design flow into the approximate computing domain to gain more speedup. In this domain, the instruction set architecture (ISA) design flow selects both exact and approximate custom instructions (CIs). The proposed approach can be used for applications in which imprecise results are tolerable. In the CI identification phase of the flow, CIs that do not satisfy the maximum propagation delay but can provide approximate results may also be included in the CI candidate set. Next, in the selection phase, we propose a merit function that selects CIs with high cycle savings and small error rates. The efficacy of the proposed approximate design flow is investigated through case studies of the discrete cosine transform (DCT) and inverse DCT (iDCT) of the MPEG-2 application. The impact of process variation on the impreciseness of the results is also investigated.
18:30		End of session
19:30		DATE Party in "Gläserne Manufaktur" of the Volkswagen AG
The DATE Party is again scheduled on the second conference day, Wednesday, March 26, 2014, starting at 19:30. This year, it will take place in one of Dresden's most exciting and modern buildings, the "Gläserne Manufaktur" of the car manufacturer Volkswagen AG (www.glaesernemanufaktur.de/en/). The party will feature a flying-buffet dinner with various catering points and accompanying drinks. Light background music and the possibility of guided tours through the extraordinary premises will round off the evening. It provides a perfect opportunity to meet friends and colleagues in a relaxed atmosphere while enjoying local amenities. Please note that this is not a seated dinner. All delegates, exhibitors and their guests are encouraged to attend the party. Please be aware that entrance is only possible with a party ticket. Each full conference registration includes a ticket for the DATE Party. Additional tickets can be purchased on-site at the registration desk (subject to availability). Ticket price for the full Evening Social Programme: 75 € per person.
