4.6 Code Generation and Optimization for Embedded Platforms

Date: Tuesday 25 March 2014
Time: 17:00 - 18:30
Location / Room: Konferenz 4

Chair:
Heiko Falk, Ulm University, DE

Co-Chair:
Florence Maraninchi, Grenoble IMP/VERIMAG, FR

This session covers a broad spectrum of topics in compilers, code optimization, and validation for today's embedded platforms. The first paper addresses the automated validation of binary translators. The second paper focuses on the on-device optimization of apps and system libraries on mobile platforms. The third paper deals with code generation for Android image processing applications on heterogeneous GPU-based architectures. The session is rounded off by short presentations of work-in-progress ideas on model transformation, energy and wear-leveling optimization, and scheduling/register allocation.

Time	Label	Presentation Title Authors
17:00	4.6.1	EATBIT: EFFECTIVE AUTOMATED TEST FOR BINARY TRANSLATION WITH HIGH CODE COVERAGE Speakers: Hui Guo¹, Zhenjiang Wang¹, Chenggang Wu¹ and Ruining He² ¹Institute of Computing Technology, Chinese Academy of Sciences, CN; ²University of California, San Diego, US Abstract Binary translation makes it convenient to emulate one instruction set on another. It is growing in popularity in various applications, especially on embedded platforms. Traditional methodologies for testing binary translators still rely mainly on manual unit tests, which are costly, labor-intensive, and often inadequate for testing the complicated algorithms in the translators. Standard benchmark suites such as SPEC CPU2006, compiled with different compilation options, are also used for further tests. However, according to our experimental results, the translation modules still have over 30% of their code unexecuted after such tests. Methodologies based on randomization can generate a vast variety of tests and thus improve code coverage in the translation system. In this paper, we propose such an approach, named EATBit. Test binaries are generated with randomly selected instructions and operands. The binaries and a large amount of input data are then refined to exclude invalid ones. Experimental results on a real binary translator demonstrate that EATBit not only improves code coverage by over 20% but also finds new bugs in the translator.
17:30	4.6.2	ON-DEVICE OBJECTIVE-C APPLICATION OPTIMIZATION FRAMEWORK FOR HIGH-PERFORMANCE MOBILE PROCESSORS Speakers: Garo Bournoutian and Alex Orailoglu, University of California, San Diego, US Abstract Smartphone applications increasingly resemble interactive desktop programs, providing rich graphics and animations. To simplify the creation of these interactive applications, mobile operating systems employ high-level object-oriented programming languages and shared libraries to manipulate the device's peripherals and provide common user-interface frameworks. The presence of dynamic dispatch and polymorphism allows for robust and extensible application coding. Unfortunately, dynamic dispatch also introduces significant overheads during method calls, which directly impact execution time. Furthermore, since these applications rely heavily on shared libraries and helper routines, the number of these method calls is higher than in typical desktop-based programs. Optimizing these method calls centrally, before consumers download the application onto a given phone, is complicated by the large diversity of hardware and operating system versions on which the application could run. This paper proposes a methodology to tailor a given Objective-C application and its associated device-specific shared library codebase using on-device post-compilation code optimization and transformation. In doing so, many polymorphic sites can be resolved statically, improving the overall application performance.
18:00	4.6.3	CODE GENERATION FOR EMBEDDED HETEROGENEOUS ARCHITECTURES ON ANDROID Speakers: Richard Membarth, Oliver Reiche, Frank Hannig and Jürgen Teich, University of Erlangen-Nuremberg, DE Abstract The success of Android is based on its unified Java programming model, which allows developers to write platform-independent programs for a variety of different target platforms. However, this comes at the cost of performance. As a consequence, Google introduced APIs that allow developers to write native applications and to exploit multiple cores as well as embedded GPUs for compute-intensive parts. This paper proposes code generation techniques that target the Renderscript and Filterscript APIs. Renderscript harnesses multi-core CPUs and unified shader GPUs, while the more restricted Filterscript also supports GPUs with earlier shader models. Our techniques focus on image processing applications and make it possible to target these APIs and OpenCL from a common description. We further avoid memory transfers by sharing the same memory region among different processing elements on HSA platforms. As a reference, we use an embedded platform hosting a multi-core ARM CPU and an ARM Mali GPU. We show that our generated source code is faster than native implementations in OpenCV as well as the pre-implemented script intrinsics provided by Google for acceleration on the embedded GPU.
18:30	IP2-5, 990	DESIGN OF SAFETY CRITICAL SYSTEMS BY REFINEMENT Speakers: Alex Iliasov, Arseniy Alekseyev, Danil Sokolov and Andrey Mokhov, Newcastle University, GB Abstract An increasingly large number of safety-critical embedded systems rely on software to prevent and mitigate hazards occurring due to design errors and unexpected interactions of the system with its users and the environment. Implementing a safety instrumented function in the way advocated by traditional software methods requires an intimate understanding and thorough validation of a complex ecosystem of programming languages, compilers, operating systems and hardware. We propose to consider an alternative where a system designer, for each individual problem, creates in a correct-by-construction manner both the design of a system and its compilation and execution infrastructure. This permits an uninterrupted chain of formal correctness arguments spanning from formalised requirements all the way to the gate-level characterisation of an execution environment. The past decade of advances in verification technology has turned the mechanical verification of large-scale models into a reality, while the pressure of certification makes the cost of a formally verified development routine increasingly acceptable. The proposed technique fits the Grand Challenge for Computing Research posed by Hoare in 2003, namely, the development of a Verifying Compiler which not only mechanically translates a given program from one language to another but also verifies its correctness according to a formal specification. This allows meeting the most stringent software certification requirements, such as SIL 4. We illustrate the idea with a small case study developed using the Event-B modelling notation and tools.
18:31	IP2-6, 651	ENERGY OPTIMIZATION IN ANDROID APPLICATIONS THROUGH WAKELOCK PLACEMENT Speakers: Faisal Alam¹, Preeti Ranjan Panda¹, Nikhil Tripathi², Namita Sharma¹ and Sanjiv Narayan² ¹Indian Institute of Technology Delhi, IN; ²Calypto Design Systems, IN Abstract Energy efficiency is a critical factor in mobile systems, and a significant body of recent research has focused on reducing the energy dissipation in mobile hardware and applications. The Android OS Power Manager provides programming interface routines called wakelocks for controlling the activation state of devices on a mobile system. An appropriate placement of wakelock acquire and release functions in the application can make a significant difference to the energy consumption. In this paper, we propose a strategy based on data flow analysis for determining the placement of wakelock statements corresponding to the uses of devices in an application. Our experimental evaluation on a set of Android applications shows significant (up to 32%) energy savings with the proposed optimization strategy.
18:32	IP2-7, 778	A WEAR-LEVELING-AWARE DYNAMIC STACK FOR PCM MEMORY IN EMBEDDED SYSTEMS Speakers: Qingan Li¹, Yanxiang He², Yong Chen², Chun Xue³, Nan Jiang² and Chao Xu² ¹Wuhan University & City University of Hong Kong, CN; ²Wuhan University, CN; ³City University of Hong Kong, CN Abstract Phase Change Memory (PCM) is a promising DRAM replacement in embedded systems due to its attractive characteristics, such as extremely low leakage power, high storage density and good scalability. However, PCM's low endurance constrains its practical applications. In this paper, we propose a wear-leveling-aware dynamic stack to extend PCM's lifetime when it is adopted in embedded systems as main memory. With a dynamic stack, the memory space is circularly allocated to stack objects, and thus an even usage of PCM memory is achieved. The experimental results show that the proposed method can significantly reduce the write variation on PCM cells and enhance the lifetime of PCM memory.
18:33	IP2-8, 1056	LIFETIME HOLES AWARE REGISTER ALLOCATION FOR CLUSTERED VLIW PROCESSORS Speakers: Xuemeng Zhang¹, Hui Wu², Haiyan Sun¹ and Jingling Xue² ¹National University of Defense Technology, CN; ²The University of New South Wales, AU Abstract This paper presents an on-the-fly register allocator which dynamically detects and utilises lifetime holes for clustered VLIW processors. A lifetime hole is an interval in which a variable does not contain a valid value. A register holding a lifetime hole can be allocated to another variable whose live range fits in the lifetime hole, leading to more efficient utilisation of registers. We propose efficient techniques for dynamically utilising lifetime holes and incorporate these techniques into our on-the-fly register allocator. Using the MediaBench II benchmark suite, we have simulated our register allocator and a linear scan register allocator that does not consider lifetime holes. Our simulation results show that our register allocator reduces the number of spills by 12.5%, 11.7%, and 12.7% for three different processor models, respectively.
18:30		End of session. Exhibition Reception at several serving points inside the Exhibition Area (Terrace Level). All exhibitors are welcome to provide drinks and snacks for delegates and visitors.
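
Illustration for IP2-7: the abstract states that circularly allocating memory space to stack objects evens out the usage of PCM cells. The toy model below is only a sketch of that general idea, not the authors' implementation. The function `simulate`, the frame sizes, the 64-cell memory, and the assumption that each frame is pushed, fully written once, and popped before the next call are all hypothetical choices made for illustration.

```python
from collections import Counter


def simulate(frame_sizes, memory_cells, circular):
    """Count writes per memory cell for a sequence of stack frames.

    Assumption (hypothetical model): each frame is pushed, every cell of it
    is written once, and the frame is popped before the next one is pushed.
    """
    writes = Counter()
    base = 0  # cell index where the next frame starts
    for size in frame_sizes:
        for offset in range(size):
            writes[(base + offset) % memory_cells] += 1
        if circular:
            # Circular allocation: the next frame starts where this one
            # ended, wrapping around the end of the memory region.
            base = (base + size) % memory_cells
        # Conventional stack: base stays at 0, so low cells are rewritten.
    return writes


# Hypothetical workload: a repeating pattern of frame sizes.
frames = [8, 4, 8, 2, 6, 8, 4, 8] * 100

conventional = simulate(frames, memory_cells=64, circular=False)
rotating = simulate(frames, memory_cells=64, circular=True)

print("max writes per cell, conventional stack:", max(conventional.values()))
print("max writes per cell, circular stack:    ", max(rotating.values()))
```

Both runs perform the same total number of writes; only the distribution differs. In this model the conventional stack concentrates writes on the lowest cells, while the circular variant spreads them across the whole region, which is the kind of reduced write variation the abstract refers to.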
