6.6 Model-Based Design and Hardware/Software Interfaces


Date: Wednesday 26 March 2014
Time: 11:00 - 12:30
Location / Room: Konferenz 4

Chair:
Wang Yi, Uppsala University, SE

Co-Chair:
Wolfgang Nebel, OFFIS, DE

This session covers multiple abstraction levels in embedded system design. The first paper proposes a scalable approach to refinement checking of component-based systems using contracts and local refinement assertions. The second paper revisits the paradigm of implementing synchronous models as a set of communicating asynchronous components. The third paper presents hardware scheduling support for OpenMP, and the fourth paper proposes an object-aware translation layer for flash memories.

Time | Label | Presentation Title / Authors
11:00 | 6.6.1 | LIBRARY-BASED SCALABLE REFINEMENT CHECKING FOR CONTRACT-BASED DESIGN
Speakers:
Antonio Iannopollo, Pierluigi Nuzzo, Stavros Tripakis and Alberto Sangiovanni-Vincentelli, University of California, Berkeley, US
Abstract
Given a global specification contract and a system described by a composition of contracts, system verification reduces to checking that the composite contract refines the specification contract, i.e., that any implementation of the composite contract implements the specification contract and is able to operate in any environment admitted by it. Contracts are captured using high-level declarative languages, for example, linear temporal logic (LTL). In this case, refinement checking reduces to an LTL satisfiability problem, which can be very expensive to solve for large composite contracts. This paper proposes a scalable refinement checking approach that relies on a library of contracts and local refinement assertions. We propose an algorithm that, given such a library, breaks the refinement checking problem down into multiple successive refinement checks, each of smaller scale. We illustrate the benefits of the approach on an industrial case study of an aircraft electric power system, obtaining up to two orders of magnitude improvement in execution time.
11:30 | 6.6.2 | ISOCHRONOUS NETWORKS BY CONSTRUCTION
Speakers:
Yu Bai and Klaus Schneider, University of Kaiserslautern, DE
Abstract
While synchronous system models have many advantages over asynchronous models with respect to verification and validation, many implementation platforms do not provide efficient means for synchronization. For this reason, we consider a design flow that starts with a synchronous system model, which is then transformed into an asynchronous one for synthesis. In essence, the flow partitions the synchronous system into a set of asynchronous components that communicate with each other via FIFO buffers. Of course, the synthesized system still has to behave like the original synchronous model: for each variable, exactly the same stream of data values must be observed; only the grouping of values into synchronous reaction steps is no longer explicit. In this paper, we prove that this correctness guarantee holds provided that (1) each component knows which of its input values have to be used for the next reaction (endochrony), (2) each component is able to perform the reaction (constructiveness), and (3) components agree on the clocks of their shared variables (isochrony/clock-consistency).
11:45 | 6.6.3 | TIGHTLY-COUPLED HARDWARE SUPPORT TO DYNAMIC PARALLELISM ACCELERATION IN EMBEDDED SHARED MEMORY CLUSTERS
Speakers:
Paolo Burgio1, Giuseppe Tagliavini2, Francesco Conti2, Andrea Marongiu2 and Luca Benini3
1University of Bologna, Université de Bretagne-Sud, IT; 2University of Bologna, IT; 3Università di Bologna, IT
Abstract
Modern designs for embedded systems are increasingly embracing cluster-based architectures, where small sets of cores communicate through tightly-coupled shared memory banks and high-performance interconnections. At the same time, the complexity of modern applications requires new programming abstractions to exploit dynamic and/or irregular parallelism on such platforms. Supporting dynamic parallelism in systems which i) are resource-constrained and ii) run applications with small units of work calls for a runtime environment with minimal overhead for the scheduling of parallel tasks. In this work, we study the major sources of overhead in the implementation of OpenMP dynamic loops, sections, and tasks, and propose a hardware implementation of a generic Scheduling Engine (HWSE) which fits the semantics of all three constructs. The HWSE is designed as a block tightly coupled to the PEs within a multi-core cluster, communicating through a shared-memory interface. This allows very fast programming and synchronization with the controlling PEs, which is fundamental to achieving fast dynamic scheduling and, ultimately, to enabling fine-grained parallelism. We prove the effectiveness of our solution with real applications and synthetic benchmarks, using a cycle-accurate virtual platform.
12:00 | 6.6.4 | P-OFTL: AN OBJECT-BASED SEMANTIC-AWARE PARALLEL FLASH TRANSLATION LAYER
Speakers:
Wei Wang, Youyou Lu and Jiwu Shu, Tsinghua University, CN
Abstract
With increased density and decreased price, flash memory has been widely used in storage systems for its low latency and low power consumption. However, traditional storage systems are designed and heavily optimized for magnetic disks, so the potential of flash memory in the form of Solid State Drives (SSDs) is not fully exploited. In this paper, we propose p-OFTL, an object-based, semantic-aware parallel flash translation layer (FTL). p-OFTL removes the mapping table in the FTL and manages flash memory directly in file objects, which enables the data layout on flash to be optimized using object semantics. While removing the mapping table improves system performance, a challenge remains in exploiting internal parallelism while maintaining the continuity of logical addresses within each object, which is essential for efficient garbage collection. To address this challenge, p-OFTL statically remaps addresses by shifting their bits, which spreads writes across different internal parallel units without an additional mapping table. In addition, p-OFTL employs a semantic-aware data grouping algorithm that groups data pages by trading off hot-cold clustering against the continuity of logical addresses, so as to reduce page movement during garbage collection. Experiments show that p-OFTL improves system performance by 4.0% to 10.3% and reduces garbage collection overhead by 15.1% to 32.5% with semantic-aware data grouping compared to semantic-unaware data grouping algorithms.
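The abstract's idea of a table-free static remapping can be illustrated with a toy sketch. This is not p-OFTL's actual scheme; the address width, number of parallel units, and the specific bit rotation below are invented for illustration. Rotating the low-order bits of a logical page address into the unit-select position stripes consecutive pages across parallel units while remaining a fixed, invertible function, so no mapping table is needed:

```python
# Illustrative sketch only: spread consecutive logical page addresses across
# parallel flash units by a static bit rotation, with no mapping table.
# CHANNEL_BITS and ADDR_BITS are assumed values, not from the paper.

CHANNEL_BITS = 2          # assumed: 4 parallel units -> 2 unit-select bits
ADDR_BITS = 16            # assumed logical address width

def remap(addr: int) -> int:
    """Rotate the address right by CHANNEL_BITS: the low bits, which vary
    fastest for sequential writes, become the unit-select high bits, so
    consecutive addresses land on different parallel units."""
    low = addr & ((1 << CHANNEL_BITS) - 1)
    return (addr >> CHANNEL_BITS) | (low << (ADDR_BITS - CHANNEL_BITS))

def unit_of(physical: int) -> int:
    """Assume the top CHANNEL_BITS of the physical address select the unit."""
    return physical >> (ADDR_BITS - CHANNEL_BITS)

# Consecutive logical pages 0..3 map to four different units:
print([unit_of(remap(a)) for a in range(4)])
```

Because a rotation is a bijection, every logical address still has exactly one physical address, which is what lets the FTL drop the per-page mapping table.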
12:30 | IP3-6, 148 | USING GUIDED LOCAL SEARCH FOR ADAPTIVE RESOURCE RESERVATION IN LARGE-SCALE EMBEDDED SYSTEMS
Speaker:
Timon ter Braak, University of Twente, NL
Abstract
To maintain a predictable execution environment, an embedded system must ensure that applications are provided, in advance, with sufficient resources to process tasks, exchange information, and control peripherals. The problem of assigning tasks to processing elements with limited resources, and of routing communication channels through a capacitated interconnect, is combined into an integer linear programming formulation. We describe a guided local search algorithm to solve this problem at run-time. The algorithm allows for a hybrid strategy in which configurations computed at design-time are used as references to lower the computational overhead at run-time. Computational experiments on a dataset with 100 tasks and 20 processing elements show the effectiveness of this algorithm compared to the state-of-the-art solvers CPLEX and Gurobi. The guided local search algorithm finds an initial solution within 100 milliseconds, is competitive for small platforms, scales better with the size of the platform, and has lower memory usage (2-19%).
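The general shape of guided local search on a task-assignment problem can be sketched as follows. This is a generic illustration, not the paper's algorithm: the task data, cost model, and penalty scheme are invented, and the communication-routing part of the paper's ILP is omitted. GLS runs a plain local search on a cost function augmented with penalties, and escapes local optima by penalizing the currently most expensive feature (here, a task-to-PE placement):

```python
# Toy guided local search (GLS) for mapping tasks onto capacity-limited
# processing elements (PEs). All data and parameters are illustrative.

TASKS, PES = 8, 3
load = [2, 3, 1, 4, 2, 3, 1, 2]          # resource demand of each task
cap = [8, 8, 8]                          # capacity of each PE
cost = [[3, 5, 9], [7, 2, 4], [1, 6, 8], [4, 4, 2],
        [9, 1, 3], [2, 8, 5], [6, 3, 7], [5, 9, 1]]
LAMBDA = 1                               # weight of GLS penalties
penalty = [[0] * PES for _ in range(TASKS)]

def feasible(assign, t, p):
    """Can task t sit on PE p without exceeding p's capacity?"""
    used = sum(load[i] for i, q in enumerate(assign) if q == p and i != t)
    return used + load[t] <= cap[p]

def real_cost(assign):
    return sum(cost[t][assign[t]] for t in range(TASKS))

def initial():
    """Greedy start: cheapest PE with enough remaining capacity."""
    rem, assign = cap[:], []
    for t in range(TASKS):
        p = min((q for q in range(PES) if rem[q] >= load[t]),
                key=lambda q: cost[t][q])
        assign.append(p)
        rem[p] -= load[t]
    return assign

def local_search(assign):
    """Single-task moves, minimizing cost augmented with penalties."""
    improved = True
    while improved:
        improved = False
        for t in range(TASKS):
            best_p = assign[t]
            best_v = cost[t][best_p] + LAMBDA * penalty[t][best_p]
            for p in range(PES):
                v = cost[t][p] + LAMBDA * penalty[t][p]
                if p != assign[t] and v < best_v and feasible(assign, t, p):
                    best_p, best_v = p, v
            if best_p != assign[t]:
                assign[t] = best_p
                improved = True
    return assign

def gls(rounds=15):
    """GLS loop: escape local optima by penalizing the costliest feature."""
    for row in penalty:                  # reset penalties between runs
        for p in range(PES):
            row[p] = 0
    assign = local_search(initial())
    best = (assign[:], real_cost(assign))
    for _ in range(rounds):
        # penalize the (task, PE) placement with the highest utility,
        # i.e. high cost relative to how often it was already penalized
        t = max(range(TASKS),
                key=lambda i: cost[i][assign[i]] / (1 + penalty[i][assign[i]]))
        penalty[t][assign[t]] += 1
        assign = local_search(assign)
        if real_cost(assign) < best[1]:
            best = (assign[:], real_cost(assign))
    return best

assignment, total = gls()
print(assignment, total)
```

The hybrid strategy mentioned in the abstract would correspond to seeding `initial()` with a design-time configuration instead of the greedy start.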
12:32 | IP3-7, 797 | ACCELERATING GRAPH COMPUTATION WITH RACETRACK MEMORY AND POINTER-ASSISTED GRAPH REPRESENTATION (Best Paper Award Candidate)
Speakers:
Eunhyek Park1, Helen Li2, Sungjoo Yoo1 and Sunggu Lee1
1POSTECH, KR; 2Univ. of Pittsburgh, US
Abstract
The poor performance of NAND Flash memory, i.e., its long access latency and large access granularity, is the major bottleneck of graph processing. This paper proposes an intelligent storage device for graph processing based on fast, low-cost racetrack memory and a pointer-assisted graph representation. Our experiments show that the proposed racetrack-memory-based intelligent storage reduces the total processing time of three representative graph computations by 40.2% to 86.9% compared to GraphChi, a graph processing framework that exploits sequential accesses to a conventional NAND Flash-based SSD. Faster execution also reduces energy consumption by 39.6% to 90.0%. The in-storage processing capability yields an additional 10.5% to 16.4% performance improvement and a 12.0% to 14.4% reduction in energy consumption.
12:30 | End of session
Lunch Break in Exhibition Area
Sandwich lunch