10.3 Design Experiences for Multimedia and Communication Applications


Date: Thursday 17 March 2016
Time: 11:00 - 12:30
Location / Room: Konferenz 1

Chair:
Theocharis Theocharides, University of Cyprus, CY

Co-Chair:
Steffen Paul, University Bremen, DE

This session presents new design experiences for multimedia and communication applications. The first presentation demonstrates the feasibility of a heterogeneous system for speeding up computation-intensive algorithms within an ultra-low-power, sub-10 mW budget. Two contributions provide new ideas on approximate computing and its application in real design cases. Novel architectures related to channel decoding are presented in two papers. One paper demonstrates the feasibility of high-performance and high-quality depth and colour sensor fusion targeting mobile devices. The session also includes an approach for designing an integrated prototype for a portable telepresence robot.

Time  Label  Presentation Title
Authors
11:00  10.3.1  ENABLING THE HETEROGENEOUS ACCELERATOR MODEL ON ULTRA-LOW POWER MICROCONTROLLER PLATFORMS
Speaker:
Francesco Conti, Università di Bologna, IT
Authors:
Francesco Conti1, Daniele Palossi2, Andrea Marongiu1, Davide Rossi1 and Luca Benini1
1Università di Bologna, IT; 2ETH Zurich, CH
Abstract
The stringent power constraints of complex microcontroller-based devices (e.g. smart sensors for the IoT) represent an obstacle to the introduction of sophisticated functionality. Programmable accelerators would be extremely beneficial in providing the flexibility and energy efficiency required by fast-evolving IoT applications; however, the integration complexity and sub-10 mW power budgets have so far been considered insurmountable obstacles. In this paper we demonstrate the feasibility of coupling a low-power microcontroller unit (MCU) with a heterogeneous programmable accelerator for speeding up computation-intensive algorithms within an ultra-low-power (ULP) sub-10 mW budget. Specifically, we develop a heterogeneous architecture coupling a Cortex-M series MCU with PULP, a programmable accelerator for ULP parallel computing. Complex functionality is enabled by support for offloading parallel computational kernels from the MCU to the accelerator using the OpenMP programming model. We prototype this platform using an STM Nucleo board and a PULP FPGA emulator. We show that our methodology can deliver up to 60x gains in performance and energy efficiency on a diverse set of applications, opening the way for a new class of ULP heterogeneous architectures.

Download Paper (PDF; Only available from the DATE venue WiFi)
11:30  10.3.2  THERMAL OPTIMIZATION USING ADAPTIVE APPROXIMATE COMPUTING FOR VIDEO CODING
Speaker:
Muhammad Shafique, Karlsruhe Institute of Technology (KIT), DE
Authors:
Daniel Palomino1, Muhammad Shafique2, Altamiro Susin1 and Jörg Henkel2
1Universidade Federal do Rio Grande do Sul (UFRGS), BR; 2Karlsruhe Institute of Technology (KIT), DE
Abstract
This paper presents a thermal optimization technique that adaptively employs varying degrees of approximation at both the algorithm and data levels in order to reduce the temperature associated with the high efficiency video coding process while maintaining good quality results. The technique evaluates, at run-time, the regions of a video sequence, frame by frame, in terms of their tolerance to imprecise computations. It adapts the amount of approximation error based on the video sequence properties and application-specific knowledge. The proposed technique adaptively controls the strength of approximations (at both the algorithm and data levels) depending upon the varying resilience properties of coding different regions with different texture/motion properties. Our content-driven approximate computing technique demonstrates the potential to improve the thermal profile of a chip. Experimental results show that our technique reduces the on-chip temperature by about 10 °C on average, while maintaining good quality results.

12:00  10.3.3  HIGH PERFORMANCE TIME-OF-FLIGHT AND COLOR SENSOR FUSION WITH IMAGE-GUIDED DEPTH SUPER RESOLUTION
Speaker:
Hannes Plank, Infineon Technologies Austria AG, AT
Authors:
Hannes Plank, Gerald Holweg, Thomas Herndl and Norbert Druml, Infineon Technologies Austria AG, AT
Abstract
In recent years, depth sensing systems have gained popularity and have begun to appear on the consumer market. Of these systems, PMD-based Time-of-Flight cameras are the smallest available and will soon be integrated into mobile devices such as smart phones and tablets. Like all other available depth sensing systems, PMD-based Time-of-Flight cameras do not produce perfect depth data. Because of the sensor's characteristics, the data is noisy and the resolution is limited. Fast movements cause motion artifacts, which are undefined depth values due to corrupted measurements. Combining the data of a Time-of-Flight camera and a color camera can compensate for these flaws and vastly improve depth image quality. This work uses color edge information as a guide, so that the depth image is upscaled with resolution gain and lossless noise reduction. A novel depth upscaling method is introduced, combining the creation of high quality depth data with fast execution. A high-end smart phone development board, a color camera, and a Time-of-Flight camera are used to create a sensor fusion prototype. The complete processing pipeline is efficiently implemented on the graphics processing unit in order to maximize performance. The prototype proves the feasibility of our proposed fusion method on mobile devices. The result is a system capable of fusing color and depth data at interactive frame rates. When depth information is available for every color pixel, new possibilities arise in computer vision, augmented reality and computational photography. The evaluation shows that our sensor fusion solution provides depth images with upscaled resolution, increased sharpness, less noise and fewer motion artifacts, while achieving high frame rates at the same time; it thus significantly outperforms state-of-the-art solutions.

12:15  10.3.4  SATURATED MIN-SUM DECODING: AN "AFTERBURNER" FOR LDPC DECODER HARDWARE
Speaker:
Stefan Scholl, University of Kaiserslautern, DE
Authors:
Stefan Scholl, Philipp Schläfer and Norbert Wehn, University of Kaiserslautern, DE
Abstract
LDPC codes are usually decoded by iterative belief propagation. However, especially for small block lengths, conventional belief propagation exhibits significant losses in signal-to-noise ratio compared to maximum likelihood decoding. In this paper we propose combining a conventional min-sum decoder with an advanced decoding scheme that acts as a kind of "afterburner" to improve the frame error rate. We present hardware architectures and implementation results for a 28 nm ASIC technology. The new decoder has a slightly higher complexity, but provides a gain of up to 1.6 dB in signal-to-noise ratio over conventional belief propagation decoding for short block lengths. In addition, we show that the new decoder implementation can decrease the amount of dark silicon.
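For readers unfamiliar with the conventional min-sum decoder that the abstract takes as its baseline, the following is a minimal sketch of the standard min-sum check-node update only. It is not the "afterburner" scheme proposed in the paper, and the function name `min_sum_check_node` and the use of plain Python floats for the log-likelihood ratios (LLRs) are assumptions made for illustration.

```python
def min_sum_check_node(llrs):
    """Standard min-sum check-node update (illustrative sketch).

    llrs: incoming variable-to-check LLR messages for one check node.
    Returns one outgoing check-to-variable message per edge. Each
    outgoing message uses only the OTHER edges' inputs: the product of
    their signs times the minimum of their magnitudes.
    """
    if len(llrs) < 2:
        raise ValueError("a check node needs at least two incoming messages")

    outgoing = []
    for i in range(len(llrs)):
        # Exclude the message that arrived on edge i (extrinsic principle).
        others = llrs[:i] + llrs[i + 1:]
        sign = 1.0
        for value in others:
            if value < 0:
                sign = -sign
        outgoing.append(sign * min(abs(value) for value in others))
    return outgoing


if __name__ == "__main__":
    print(min_sum_check_node([2.0, -0.5, 1.5]))
```

A hardware decoder would typically track only the two smallest magnitudes and the overall sign parity instead of looping per edge; the quadratic loop here is chosen for readability.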

12:30  IP5-3, 196  A DYNAMICALLY RECONFIGURABLE ECC DECODER ARCHITECTURE
Speaker:
Philippe Coussy, Universite Bretagne Sud / Lab-STICC, FR
Authors:
Awais Sani1, Philippe Coussy2 and Cyrille Chavet3
1Universite de Bretagne-Sud, FR; 2Universite de Bretagne-Sud / Lab-STICC, FR; 3Lab-STICC / Université de Bretagne Sud, FR
Abstract
Due to their impressive error correction performance, Error Correcting Codes (ECC) are now widely used in communication systems. To meet high throughput requirements, ECC decoders are based on parallel architectures, which results in a major issue: memory access conflicts. In this paper, we introduce a new class of ECC decoder architectures that reconfigure dynamically by executing a memory mapping approach on-chip. For that purpose, a dedicated algorithm that takes network constraints into account is presented. A smart architecture based on a butterfly network and a reconfiguration unit is also proposed. Experimental results show that real-time reconfiguration at reasonable hardware cost is possible.

12:31  IP5-4, 530  RESISTIVE BLOOM FILTERS: FROM APPROXIMATE MEMBERSHIP TO APPROXIMATE COMPUTING WITH BOUNDED ERRORS
Speaker:
Abbas Rahimi, University of California, Berkeley, US
Authors:
Vahideh Akhlaghi1, Abbas Rahimi2 and Rajesh K. Gupta1
1University of California, San Diego, US; 2University of California, Berkeley, US
Abstract
Approximate computing provides an opportunity to exploit application characteristics to trade accuracy for gains in energy efficiency. However, the system designer must be able to bound the error that is exposed to the application developer. A space-efficient probabilistic data structure such as a Bloom filter can provide one such means. A Bloom filter supports approximate set membership queries with a tunable rate of false positives (i.e., errors) and no false negatives. We propose a resistive Bloom filter (ReBF) to approximate a function by tightly integrating it with a functional unit (FU) implementing the function. The ReBF approximately mimics partial functionality of the FU by recalling its frequent input patterns for computational reuse. The accuracy of the target FU is guaranteed by bounding the ReBF error behavior at design time. We further lower the energy consumption of an FU by designing its ReBF using low-power memristor arrays. The experimental results show that function approximation using ReBF for five image processing kernels running on the AMD Southern Islands GPU yields on average 24.1% energy saving in 45 nm technology compared to the exact computation.
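The set-membership behaviour described in the abstract (possible false positives, no false negatives) can be illustrated with a minimal software Bloom filter. This is a generic sketch, not the resistive (memristor-based) ReBF of the paper; the class name `BloomFilter`, the default sizes, and the use of seeded SHA-256 as the hash family are assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal software Bloom filter (illustrative sketch)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        if num_bits <= 0 or num_hashes <= 0:
            raise ValueError("num_bits and num_hashes must be positive")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple; real designs pack bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive num_hashes bit positions by prefixing a seed to the item.
        data = repr(item).encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(seed.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))


if __name__ == "__main__":
    seen = BloomFilter(num_bits=256, num_hashes=3)
    for pattern in [(3, 5), (7, 7), (0, 255)]:
        seen.add(pattern)
    print((3, 5) in seen)  # inserted items are always reported as present
```

Increasing `num_bits` relative to the number of stored items lowers the false-positive rate, which is the tunable error knob the abstract refers to.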

12:32  IP5-5, 353  REAL-TIME SYSTEM-LEVEL IMPLEMENTATION OF A TELEPRESENCE ROBOT USING AN EMBEDDED GPU PLATFORM
Speaker:
Swathi Gurumani, Advanced Digital Sciences Center, SG
Authors:
Muhammad Teguh Satria1, Swathi Gurumani1, Wang Zheng2, Keng Peng Tee2, Augustine Koh1, Pan Yu2, Kyle Rupnow1 and Deming Chen3
1Advanced Digital Sciences Center, SG; 2Institute for Infocomm Research, SG; 3UIUC, US
Abstract
Real-time applications such as telepresence systems present an opportunity to use embedded GPUs for compute acceleration to meet platform goals. In this paper, we develop a prototype of a portable, standalone telepresence robot that performs real-time attention-directed control using an NVIDIA Jetson TK1 embedded platform. We perform platform-specific optimizations to improve thread occupancy, optimize the computation workload and improve the accuracy of face detection on the embedded GPU. We achieve real-time performance of 30 frames per second on the Jetson TK1 and an overall speedup of 10x compared to the ARM CPU version.

12:30  End of session
Lunch Break in Großer Saal + Saal 1
Keynote Lecture in "Saal 2" 13:30 - 14:00