4.6 Code Generation and Optimization for Embedded Platforms


Date: Tuesday 25 March 2014
Time: 17:00 - 18:30
Location / Room: Konferenz 4

Chair:
Heiko Falk, Ulm University, DE

Co-Chair:
Florence Maraninchi, Grenoble INP/VERIMAG, FR

This session covers a broad spectrum of topics in compilers, code optimization, and validation for today's embedded platforms. The first paper addresses the automated validation of binary translators. The second paper focuses on on-device optimization of apps and system libraries on mobile platforms. The third paper deals with code generation for Android image processing applications on heterogeneous GPU-based architectures. The session is rounded off by short presentations of work-in-progress ideas on model transformation, energy and wear-leveling optimization, and scheduling/register allocation.

Time | Label | Presentation Title
Authors
17:00 | 4.6.1 | EATBIT: EFFECTIVE AUTOMATED TEST FOR BINARY TRANSLATION WITH HIGH CODE COVERAGE
Speakers:
Hui Guo1, Zhenjiang Wang1, Chenggang Wu1 and Ruining He2
1Institute of Computing Technology, Chinese Academy of Sciences, CN; 2University of California, San Diego, US
Abstract
Binary translation makes it convenient to emulate one instruction set on another. It is growing in popularity in various applications, especially on embedded platforms. When it comes to testing binary translators, traditional methodologies, which still rely mainly on manual unit tests, are costly, labor-intensive and often inadequate for testing the complicated algorithms in the translators. Some standard benchmark suites, like SPEC CPU2006, are compiled with different compilation options for further tests. However, according to our experimental results, the translation modules still have over 30% of their code unexecuted after such tests. Methodologies based on randomization can generate a vast variety of tests, thereby improving the code coverage of the translation system. In this paper, we propose such an approach, named EATBit. Test binaries are generated with randomly selected instructions and operands. The binaries and a large amount of input data are then refined to exclude invalid ones. Experimental results on a real binary translator demonstrate that EATBit not only improves code coverage by over 20% but also successfully finds new bugs in the translator.
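The generate-then-refine idea can be illustrated with a small Python sketch. Everything concrete in it is hypothetical: `TOY_ISA`, the eight registers and the single validity rule (no division by zero) stand in for a real guest instruction set and for EATBit's own filtering rules, which the abstract does not describe. The reference interpreter `run` both defines the expected result and rejects invalid (program, input) pairs.

```python
import random

# Hypothetical toy guest ISA: mnemonic -> operand kinds ("r" = register, "i" = immediate).
TOY_ISA = {
    "add": ("r", "r", "r"),
    "sub": ("r", "r", "r"),
    "mul": ("r", "r", "r"),
    "div": ("r", "r", "r"),
    "li": ("r", "i"),
}
REGISTERS = [f"r{n}" for n in range(8)]


def random_instruction(rng):
    """Pick a random mnemonic and fill each operand slot with a random value."""
    mnemonic = rng.choice(sorted(TOY_ISA))
    operands = [
        rng.choice(REGISTERS) if kind == "r" else rng.randint(-(2**15), 2**15 - 1)
        for kind in TOY_ISA[mnemonic]
    ]
    return (mnemonic, *operands)


def run(program, inputs):
    """Reference interpreter. Returns the final register file, or None if the
    (program, inputs) pair is invalid because it would divide by zero."""
    regs = dict(inputs)
    for mnemonic, dst, *srcs in program:
        if mnemonic == "li":
            regs[dst] = srcs[0]
            continue
        a, b = regs[srcs[0]], regs[srcs[1]]
        if mnemonic == "add":
            regs[dst] = a + b
        elif mnemonic == "sub":
            regs[dst] = a - b
        elif mnemonic == "mul":
            regs[dst] = a * b
        else:  # div: truncate toward zero, as many ISAs do
            if b == 0:
                return None
            q = abs(a) // abs(b)
            regs[dst] = q if (a >= 0) == (b >= 0) else -q
    return regs


def generate_tests(count, length, seed=0):
    """Generate random (program, inputs) pairs and keep only the valid ones."""
    rng = random.Random(seed)
    tests = []
    while len(tests) < count:
        program = [random_instruction(rng) for _ in range(length)]
        inputs = {r: rng.randint(-100, 100) for r in REGISTERS}
        if run(program, inputs) is not None:  # refinement step
            tests.append((program, inputs))
    return tests
```

In a real harness each surviving pair would also be executed under the binary translator and its final state compared with the reference; that differential step is left out here. Seeding the generator keeps any failing test reproducible.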
17:30 | 4.6.2 | ON-DEVICE OBJECTIVE-C APPLICATION OPTIMIZATION FRAMEWORK FOR HIGH-PERFORMANCE MOBILE PROCESSORS
Speakers:
Garo Bournoutian and Alex Orailoglu, University of California, San Diego, US
Abstract
Smartphones provide applications that increasingly resemble interactive desktop programs, with rich graphics and animations. To simplify the creation of these interactive applications, mobile operating systems employ high-level object-oriented programming languages and shared libraries to manipulate the device's peripherals and provide common user-interface frameworks. Dynamic dispatch and polymorphism allow robust and extensible application code. Unfortunately, dynamic dispatch also introduces significant overhead on method calls, which directly impacts execution time. Furthermore, since these applications rely heavily on shared libraries and helper routines, the number of such method calls is higher than in typical desktop-based programs. Optimizing these method calls centrally, before consumers download the application onto a given phone, is made difficult by the large diversity of hardware and operating system versions on which the application could run. This paper proposes a methodology to tailor a given Objective-C application and its associated device-specific shared library codebase using on-device post-compilation code optimization and transformation. In doing so, many polymorphic call sites can be resolved statically, improving overall application performance.
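Resolving polymorphic call sites statically is commonly done with class hierarchy analysis (CHA): if every class that could be the receiver resolves the selector to the same implementation, the dynamic dispatch can be replaced with a direct call. The Python sketch below illustrates that idea on a hypothetical class table (`CLASSES`, `Root`, `View`, `Button` and the implementation names are invented). It assumes a closed world in which the application and the device's shared libraries are fully known, which an on-device optimizer can exploit, and it ignores runtime method replacement (for example, swizzling), which would break that assumption. It is not claimed to be the paper's exact algorithm.

```python
# Hypothetical class table: class -> (superclass, {selector: implementation}).
CLASSES = {
    "Root": (None, {"description": "Root_description"}),
    "View": ("Root", {"draw": "View_draw", "layout": "View_layout"}),
    "Button": ("View", {"draw": "Button_draw"}),
    "Label": ("View", {}),
}


def lookup(cls, selector):
    """Walk the superclass chain, as dynamic dispatch does at run time."""
    while cls is not None:
        superclass, methods = CLASSES[cls]
        if selector in methods:
            return methods[selector]
        cls = superclass
    return None


def subclasses_of(cls):
    """Return cls and every class that inherits from it."""
    result = []
    for candidate in CLASSES:
        c = candidate
        while c is not None:
            if c == cls:
                result.append(candidate)
                break
            c = CLASSES[c][0]
    return result


def resolve_call_site(receiver_type, selector):
    """Return the single implementation if every possible receiver class
    resolves `selector` to it; otherwise None (keep dynamic dispatch)."""
    targets = {lookup(c, selector) for c in subclasses_of(receiver_type)}
    return targets.pop() if len(targets) == 1 else None
```

Here a `layout` call on a `View` can be bound directly to `View_layout` because no subclass overrides it, whereas a `draw` call on a `View` stays dynamic because `Button` overrides `draw`.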
18:00 | 4.6.3 | CODE GENERATION FOR EMBEDDED HETEROGENEOUS ARCHITECTURES ON ANDROID
Speakers:
Richard Membarth, Oliver Reiche, Frank Hannig and Jürgen Teich, University of Erlangen-Nuremberg, DE
Abstract
The success of Android is based on its unified Java programming model, which allows developers to write platform-independent programs for a variety of target platforms. However, this comes at the cost of performance. As a consequence, Google introduced APIs that allow developers to write native applications and to exploit multiple cores as well as embedded GPUs for compute-intensive parts. This paper proposes code generation techniques that target the Renderscript and Filterscript APIs. Renderscript harnesses multi-core CPUs and unified shader GPUs, while the more restricted Filterscript also supports GPUs with earlier shader models. Our techniques focus on image processing applications and allow these APIs, as well as OpenCL, to be targeted from a common description. We further avoid memory transfers by sharing the same memory region among different processing elements on HSA platforms. As a reference, we use an embedded platform hosting a multi-core ARM CPU and an ARM Mali GPU. We show that our generated source code is faster than native implementations in OpenCV as well as the pre-implemented script intrinsics provided by Google for acceleration on the embedded GPU.
18:30 | IP2-5, 990 | DESIGN OF SAFETY CRITICAL SYSTEMS BY REFINEMENT
Speakers:
Alex Iliasov1, Arseniy Alekseyev2, Danil Sokolov3 and Andrey Mokhov3
1Newcastle University, GB; 2Newcastle University, ZW; 3Newcastle University, BB
Abstract
An increasingly large number of safety-critical embedded systems rely on software to prevent and mitigate hazards arising from design errors and from unexpected interactions of the system with its users and the environment. Implementing a safety instrumented function in the way advocated by traditional software methods requires an intimate understanding and thorough validation of a complex ecosystem of programming languages, compilers, operating systems and hardware. We propose to consider an alternative in which a system designer, for each individual problem, creates in a correct-by-construction manner both the design of a system and its compilation and execution infrastructure. This permits an uninterrupted formal correctness argument spanning from formalised requirements all the way to the gate-level characterisation of an execution environment. The past decade of advances in verification technology has turned the mechanical verification of large-scale models into a reality, while the pressure of certification makes the cost of a formally verified development routine increasingly acceptable. The proposed technique fits the Grand Challenge for Computer Research posed by Hoare in 2003, namely the development of a Verifying Compiler which not only mechanically translates a given program from one language to another but also verifies its correctness according to a formal specification. This makes it possible to meet the most stringent software certification requirements, such as SIL 4. We illustrate the idea with a small case study developed using the Event-B modelling notation and tools.
18:31 | IP2-6, 651 | ENERGY OPTIMIZATION IN ANDROID APPLICATIONS THROUGH WAKELOCK PLACEMENT
Speakers:
Faisal Alam1, Preeti Ranjan Panda1, Nikhil Tripathi2, Namita Sharma3 and Sanjiv Narayan2
1IIT Delhi, IN; 2Calypto Design Systems, IN; 3Indian Institute of Technology Delhi, IN
Abstract
Energy efficiency is a critical factor in mobile systems, and a significant body of recent research has focused on reducing energy dissipation in mobile hardware and applications. The Android OS Power Manager provides programming interface routines called wakelocks for controlling the activation state of devices on a mobile system. Appropriate placement of wakelock acquire and release calls in an application can make a significant difference to energy consumption. In this paper, we propose a strategy based on data flow analysis for determining the placement of wakelock statements corresponding to the uses of devices in an application. Our experimental evaluation on a set of Android applications shows significant energy savings (up to 32%) with the proposed optimization strategy.
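As an illustration of how data flow analysis can drive placement, the Python sketch below combines a backward analysis (can the device still be used from here?) with a forward one (has it already been used?) and inserts acquire and release operations wherever the desired lock state changes along the control flow graph. The CFG encoding, `place_wakelocks` and the action tuples are hypothetical, and this is not claimed to be the authors' algorithm; in an Android app, the resulting actions would correspond to calls to `PowerManager.WakeLock.acquire()` and `release()`.

```python
def place_wakelocks(succ, uses, entry):
    """succ: node -> list of successor nodes (the CFG).
    uses: set of nodes that use the device guarded by the wakelock.
    Returns a list of (operation, where, location) placement decisions."""
    nodes = list(succ)
    preds = {n: [] for n in nodes}
    for n in nodes:
        for s in succ[n]:
            preds[s].append(n)

    # Backward may-analysis: can the device still be used on some path from here?
    live_in = dict.fromkeys(nodes, False)
    live_out = dict.fromkeys(nodes, False)
    # Forward may-analysis: has the device been used on some path to here?
    seen_in = dict.fromkeys(nodes, False)
    seen_out = dict.fromkeys(nodes, False)
    changed = True
    while changed:  # iterate both analyses to a fixed point
        changed = False
        for n in nodes:
            l_out = any(live_in[s] for s in succ[n])
            l_in = n in uses or l_out
            s_in = any(seen_out[p] for p in preds[n])
            s_out = s_in or n in uses
            new = (l_out, l_in, s_in, s_out)
            if new != (live_out[n], live_in[n], seen_in[n], seen_out[n]):
                live_out[n], live_in[n], seen_in[n], seen_out[n] = new
                changed = True

    # Desired lock state on entry to / exit from each node: held while a use
    # executes, and between a possible earlier use and a possible later one.
    held_in = {n: n in uses or (seen_in[n] and live_in[n]) for n in nodes}
    held_out = {n: seen_out[n] and live_out[n] for n in nodes}

    actions = []
    if held_in[entry]:
        actions.append(("acquire", "at-start", entry))
    for p in nodes:
        if held_in[p] and not held_out[p]:
            actions.append(("release", "after", p))
        for s in succ[p]:  # fix up state mismatches on CFG edges
            if not held_out[p] and held_in[s]:
                actions.append(("acquire", "edge", (p, s)))
            elif held_out[p] and not held_in[s]:
                actions.append(("release", "edge", (p, s)))
    return actions


# Hypothetical CFG: 0 -> loop header 1 -> body 2 (uses device) -> 3 -> back to 1;
# loop exit 1 -> 4 -> 5.
loop_cfg = {0: [1], 1: [2, 4], 2: [3], 3: [1], 4: [5], 5: []}
print(place_wakelocks(loop_cfg, {2}, 0))
# [('acquire', 'edge', (0, 1)), ('release', 'edge', (1, 4))]
```

For a loop that reads the device on every iteration, the sketch acquires the lock on the edge into the loop and releases it on the loop-exit edge, rather than toggling it inside the loop body.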
18:32 | IP2-7, 778 | A WEAR-LEVELING-AWARE DYNAMIC STACK FOR PCM MEMORY IN EMBEDDED SYSTEMS
Speakers:
Qingan Li1, Yanxiang He2, Yong Chen2, Chun Xue3, Nan Jiang2 and Chao Xu2
1Wuhan University & City University of Hong Kong, CN; 2Wuhan University, CN; 3City University of Hong Kong, CN
Abstract
Phase Change Memory (PCM) is a promising DRAM replacement in embedded systems due to attractive characteristics such as extremely low leakage power, high storage density and good scalability. However, PCM's low endurance constrains its practical applications. In this paper, we propose a wear-leveling-aware dynamic stack to extend PCM's lifetime when it is adopted as main memory in embedded systems. With a dynamic stack, memory space is allocated circularly to stack objects, so that PCM memory is used evenly. The experimental results show that the proposed method can significantly reduce the write variation across PCM cells and extend the lifetime of PCM memory.
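The abstract does not give the allocation policy, so the following Python sketch uses a hypothetical one to show why circular allocation evens out wear: the stack lives in a circular region, logical offsets map to physical cells through a moving base, and every `rotate_every` pushes the base advances by `shift` cells and the live frames are relocated. The class, its parameters and the workload are invented for illustration; the sketch only counts writes per cell.

```python
class WearLevelingStack:
    """Stack over a circular region of `size` PCM cells (write counts only).

    Logical offset i lives at physical cell (base + i) % size. If
    `rotate_every` is set, the base moves by `shift` cells after every
    `rotate_every` pushes and the live cells are relocated, so writes
    drift around the whole region instead of hitting the same cells.
    """

    def __init__(self, size, rotate_every=None, shift=1):
        self.size = size
        self.writes = [0] * size  # per-cell write counters (wear)
        self.base = 0
        self.top = 0  # number of live cells
        self.rotate_every = rotate_every
        self.shift = shift
        self.pushes = 0

    def _write(self, logical):
        self.writes[(self.base + logical) % self.size] += 1

    def push(self, frame_size):
        if self.top + frame_size > self.size:
            raise MemoryError("stack overflow")
        for i in range(self.top, self.top + frame_size):
            self._write(i)  # initialising the new frame writes its cells
        self.top += frame_size
        self.pushes += 1
        if self.rotate_every and self.pushes % self.rotate_every == 0:
            self._rotate()

    def pop(self, frame_size):
        self.top -= frame_size  # popping costs no PCM writes

    def _rotate(self):
        self.base = (self.base + self.shift) % self.size
        for i in range(self.top):  # count the writes of relocating live cells
            self._write(i)


def simulate(stack, calls=10_000):
    """Hypothetical workload: a 4-cell caller frame that repeatedly calls a
    leaf function with an 8-cell frame. Returns the most-written cell count."""
    stack.push(4)
    for _ in range(calls):
        stack.push(8)
        stack.pop(8)
    stack.pop(4)
    return max(stack.writes)
```

With a fixed base, every cell of the leaf frame is written 10,000 times in this workload; with `rotate_every=16` the same writes are spread over the whole 64-cell region, at the cost of some extra relocation writes.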
18:33 | IP2-8, 1056 | LIFETIME HOLES AWARE REGISTER ALLOCATION FOR CLUSTERED VLIW PROCESSORS
Speakers:
Xuemeng Zhang1, Hui Wu2, Haiyan Sun1 and Jingling Xue3
1National University of Defense Technology, CN; 2The University of New South Wales, AU; 3UNSW, AU
Abstract
This paper presents an on-the-fly register allocator for clustered VLIW processors that dynamically detects and utilises lifetime holes. A lifetime hole is an interval in which a variable does not contain a valid value. During a variable's lifetime hole, its register can be allocated to another variable whose live range fits in that hole, leading to more efficient utilisation of registers. We propose efficient techniques for dynamically utilising lifetime holes and incorporate them into our on-the-fly register allocator. We simulated our register allocator and a linear scan register allocator that does not consider lifetime holes on the MediaBench II benchmark suite. Our simulation results show that our register allocator reduces the number of spills by 12.5%, 11.7% and 12.7%, respectively, for three different processor models.
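The benefit of reusing a register during another variable's lifetime hole can be shown with a small interval-fitting check. This Python sketch is a simplified, hypothetical greedy allocator over half-open live ranges, not the paper's allocator: it ignores clustering, spill-cost heuristics and on-the-fly hole detection, and simply compares allocation with and without holes.

```python
def overlaps(a, b):
    """Do the half-open intervals [a0, a1) and [b0, b1) overlap?"""
    return a[0] < b[1] and b[0] < a[1]


def fits(ranges, occupied):
    """True if none of `ranges` overlaps an interval already in `occupied`."""
    return all(not overlaps(r, o) for r in ranges for o in occupied)


def allocate(variables, num_regs, use_holes=True):
    """Greedy allocation in order of first definition.

    `variables` maps a name to its sorted list of live ranges; the gaps
    between consecutive ranges are lifetime holes. With use_holes=False each
    lifetime is widened to one contiguous span, as a hole-unaware allocator
    would see it. Returns (assignment, spilled)."""
    regs = [[] for _ in range(num_regs)]  # intervals occupied in each register
    assignment, spilled = {}, []
    for name, ranges in sorted(variables.items(), key=lambda kv: kv[1][0][0]):
        if not use_holes:
            ranges = [(ranges[0][0], ranges[-1][1])]
        for reg, occupied in enumerate(regs):
            if fits(ranges, occupied):
                occupied.extend(ranges)
                assignment[name] = reg
                break
        else:  # no register can hold this variable
            spilled.append(name)
    return assignment, spilled


# Hypothetical live ranges: "a" has a lifetime hole between points 2 and 8.
example = {"a": [(0, 2), (8, 10)], "b": [(3, 6)], "c": [(1, 9)]}
print(allocate(example, 1))                   # b reuses a's register in the hole
print(allocate(example, 1, use_holes=False))  # b is spilled as well
```

With one register, `b` fits into the hole of `a`, so only `c` is spilled; treating `a` as one contiguous span spills both `c` and `b`.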
18:30 | End of session
Exhibition Reception at several serving points inside the Exhibition Area (Terrace Level)
The Exhibition Reception will take place in the exhibition area (Terrace Level). All exhibitors are welcome to provide drinks and snacks for delegates and visitors.