11.5 Memory Resource Allocation and Scheduling in MPSoC


Date: Thursday 27 March 2014
Time: 14:00 - 15:30
Location / Room: Konferenz 3

Chair:
Andreas Herkersdorf, Technische Universität München, DE

Co-Chair:
Donatella Sciuto, Politecnico di Milano, IT

Low-latency data access and efficient interprocess communication are critical to MPSoC performance and power efficiency. This session introduces innovative approaches for data placement, memory bandwidth allocation and scheduling techniques in MPSoC architectures with heterogeneous 2D/3D memory hierarchies.

Time  Label  Presentation Title / Authors
14:00  11.5.1  (Best Paper Award Candidate)
SCENARIO-AWARE DATA PLACEMENT AND MEMORY AREA ALLOCATION FOR MULTI-PROCESSOR SYSTEM-ON-CHIPS WITH RECONFIGURABLE 3D-STACKED SRAMS
Speakers:
Meng-Ling Tsai, Yi-Jung Chen, Yi-Ting Chen and Ru-Hua Chang, Department of Computer Science and Information Engineering, National Chi Nan University, TW
Abstract
Integrating Multi-Processor System-on-Chips (MPSoCs) with 3D-stacked reconfigurable SRAM tiles has been proposed for embedded systems with high memory demands. At runtime, the SRAM tiles are configured into several memory areas, which can be reconfigured according to the dynamic behavior of the system. Targeting this architecture, we propose a data placement and memory area allocation algorithm. The goal of the proposed algorithm is to optimize the performance of the memory system by minimizing the on-chip memory access latency, the number of off-chip memory accesses, and the number of reconfigurations. Since the behavior of an embedded system can be described by a set of scenarios, where each scenario specifies a set of applications that would execute concurrently, the proposed algorithm synthesizes data placements and the memory area allocation for each scenario. Data placement considers the data access patterns both within each scenario and across all scenarios. We evaluate the proposed algorithm on a set of synthetic and real-world applications. The experimental results show that, compared to the existing data placement method designed for MPSoCs with distributed memory modules, the proposed algorithm reduces data access latency by up to 11.72%.
14:30  11.5.2  OPTIMIZED BUFFER ALLOCATION IN MULTICORE PLATFORMS
Speakers:
Maximilian Odendahl1, Andres Goens1, Rainer Leupers1, Gerd Ascheid1, Benjamin Ries1, Berthold Vöcking1 and Tomas Henriksson2
1RWTH Aachen University, DE; 2Huawei Technologies, SE
Abstract
With the availability of advanced MPSoC and emerging Dynamic RAM (DRAM) interface technologies, an optimal allocation of logical data buffers to physical memory can no longer be handled manually due to the huge design space. An allocation must not only decide between on-chip and off-chip memory, but also take an increasing number of available memory channels, different bandwidth capacities and several routing possibilities into account. We formalize this problem and introduce a Mixed Integer Linear Programming (MILP) model based on two different optimization criteria. We implement the MILP model in a retargetable tool and present a case study with representative data of the Long-Term-Evolution (LTE) standard to show the real-life applicability of our approach.
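As a rough illustration of the kind of allocation problem this abstract describes, the following Python sketch assigns logical buffers to physical memories under capacity and bandwidth limits. It is not the authors' MILP model: it uses plain exhaustive search instead of MILP, a single made-up cost function, and hypothetical buffer names, memory names and numbers. Memory channels and routing, which the abstract also mentions, are not modeled.

```python
from itertools import product

def allocate(buffers, memories):
    """Exhaustively search buffer-to-memory assignments (toy sizes only).

    buffers:  name -> (size, bandwidth_demand)            # hypothetical units
    memories: name -> (capacity, bandwidth, cost_per_bw)  # hypothetical units
    Returns (cost, mapping) for the cheapest feasible assignment, or None.
    """
    names = list(buffers)
    best = None
    for choice in product(memories, repeat=len(names)):
        used_cap = {m: 0 for m in memories}
        used_bw = {m: 0 for m in memories}
        cost = 0
        for buf, mem in zip(names, choice):
            size, bw = buffers[buf]
            used_cap[mem] += size
            used_bw[mem] += bw
            cost += bw * memories[mem][2]
        # Feasible only if no memory exceeds its capacity or bandwidth.
        feasible = all(
            used_cap[m] <= memories[m][0] and used_bw[m] <= memories[m][1]
            for m in memories
        )
        if feasible and (best is None or cost < best[0]):
            best = (cost, dict(zip(names, choice)))
    return best

# Illustrative data, not taken from the paper.
memories = {"onchip": (64, 10, 1), "dram0": (1024, 4, 5), "dram1": (1024, 4, 5)}
buffers = {"fft_in": (48, 6), "fft_out": (48, 3), "log": (16, 2)}
result = allocate(buffers, memories)
```

Exhaustive search grows exponentially with the number of buffers, which is the design-space explosion the abstract refers to and the reason a solver-based formulation becomes attractive for realistic problem sizes.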
15:00  11.5.3  MEMORY-CONSTRAINED STATIC RATE-OPTIMAL SCHEDULING OF SYNCHRONOUS DATAFLOW GRAPHS VIA RETIMING
Speakers:
Xue-Yang Zhu1, Marc Geilen2, Twan Basten3 and Sander Stuijk2
1State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, CN; 2Department of Electrical Engineering, Eindhoven University of Technology, NL; 3Department of Electrical Engineering, Eindhoven University of Technology and Embedded Systems Institute, NL
Abstract
Synchronous dataflow graphs (SDFGs) are widely used to model digital signal processing and streaming media applications. In this paper, we use retiming to optimize SDFGs to achieve a high throughput with low storage requirements. Using a memory constraint as an additional enabling condition, we define a memory-constrained self-timed execution of an SDFG. Exploring the state-space generated by the execution, we can check whether a retiming exists that leads to a rate-optimal schedule under the memory constraint. Combining this with a binary search strategy, we present a heuristic method to find a proper retiming and a static schedule that executes the retimed SDFG at the optimal rate (i.e., maximal throughput) and with as little storage space as possible. Our experiments are carried out on hundreds of synthetic SDFGs and several models of real applications. Differential synthetic graph results and real application results show that, in 79% of the tested models, our method leads to a retimed SDFG whose rate-optimal schedule requires less storage space than the proven minimal storage requirement of the original graph, and in 20% of the cases, the returned storage requirements equal the minimal ones. The average improvement is about 7.3%. The results also show that our method is computationally efficient.
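For readers unfamiliar with SDFGs, the sketch below shows a standard preliminary step for scheduling them: solving the balance equations to obtain the repetition vector, i.e. how often each actor fires in one graph iteration. It does not implement the retiming or the memory-constrained exploration of the paper; the function name, the edge encoding and the example graph are assumptions made for illustration.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def repetition_vector(actors, edges):
    """Smallest positive integer firing counts for a connected SDF graph.

    edges: list of (src, dst, produced, consumed) token rates per firing.
    Balance equation per edge: q[src] * produced == q[dst] * consumed.
    """
    adj = {a: [] for a in actors}
    for src, dst, prod, cons in edges:
        adj[src].append((dst, Fraction(prod, cons)))
        adj[dst].append((src, Fraction(cons, prod)))

    # Propagate relative firing rates from an arbitrary start actor.
    rate = {actors[0]: Fraction(1)}
    stack = [actors[0]]
    while stack:
        a = stack.pop()
        for b, ratio in adj[a]:
            r = rate[a] * ratio
            if b not in rate:
                rate[b] = r
                stack.append(b)
            elif rate[b] != r:
                raise ValueError("inconsistent rates: no periodic schedule")
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")

    # Scale the fractions to the smallest integer solution.
    scale = reduce(lambda x, y: x * y // gcd(x, y),
                   (f.denominator for f in rate.values()), 1)
    counts = {a: int(f * scale) for a, f in rate.items()}
    g = reduce(gcd, counts.values())
    return {a: c // g for a, c in counts.items()}

# Hypothetical three-actor chain: A produces 2, B consumes 3; B produces 1, C consumes 2.
q = repetition_vector(["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)])
```

For this example graph, A fires three times, B twice and C once per iteration, so each edge carries as many tokens in as out.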
15:15  11.5.4  A CONSTRAINT-BASED DESIGN SPACE EXPLORATION FRAMEWORK FOR REAL-TIME APPLICATIONS ON MPSOCS
Speakers:
Kathrin Rosvall and Ingo Sander, KTH Royal Institute of Technology, SE
Abstract
Design space exploration (DSE) is a critical step in the design process of real-time multiprocessor systems. Combining a formal base in the form of SDF graphs with predictable platforms providing guaranteed QoS, the paper proposes a flexible and extendable DSE framework that can provide performance guarantees for multiple applications implemented on a shared platform. The DSE framework is formulated in a declarative style as an interprocess communication-aware constraint programming (CP) model. Apart from mapping and scheduling of application graphs, the model supports design constraints on several cost and performance metrics, such as memory consumption and achievable throughput. Using constraints with different compliance levels, the framework introduces support for mixed criticality in the CP model. The potential of the approach is demonstrated by means of experiments using a Sobel filter, a SUSAN filter, a RASTA-PLP application and a JPEG encoder.
15:31  IP5-15, 472  RELIABILITY-AWARE MAPPING OPTIMIZATION OF MULTI-CORE SYSTEMS WITH MIXED-CRITICALITY
Speakers:
Shin-Haeng Kang1, Hoeseok Yang2, Sungchan Kim3, Iuliana Bacivarov2, Soonhoi Ha1 and Lothar Thiele4
1Seoul National University, KR; 2ETH Zurich, CH; 3Chonbuk National University, KR; 4Swiss Federal Institute of Technology Zurich, CH
Abstract
This paper presents a novel mapping optimization technique for mixed-criticality multi-core systems with different reliability requirements. To this end, we derive a quantitative reliability metric and present a scheduling analysis that certifies given mixed-criticality constraints. Our framework is capable of investigating re-execution, passive replication, and modular redundancy with optimized voter placement, while typical hardening approaches consider only one or two of these techniques. The proposed technique complies with existing safety standards and is power-efficient, as demonstrated by our experiments.
15:32  IP5-16, 498  (Best Paper Award Candidate)
FROM SIMULINK TO NOC-BASED MPSOC ON FPGA
Speakers:
Francesco Robino and Johnny Öberg, KTH Royal Institute of Technology, SE
Abstract
Network-on-chip (NoC) based multi-processor systems are promising candidates for future embedded system platforms. However, because of their complexity, new high-level modeling techniques are needed to design, simulate and synthesize embedded systems targeting NoC-based MPSoCs. Simulink is a popular modeling environment suitable for system-level modeling. However, there is no clear standard for synthesizing Simulink models into SW and HW for a NoC-based MPSoC implementation. In addition, many of the proposed solutions require large overhead in terms of SW components and memory requirements, resulting in complex and customized multi-processor platforms. In this paper we present a novel design flow to synthesize Simulink models onto a NoC-based MPSoC running on low-cost FPGAs. Our design flow constrains the MPSoC and the Simulink model to share a common semantic domain. This reduces the need for resource-consuming SW components and thus the memory requirements on the platform. At the same time, the performance (throughput) of dataflow applications can increase when the number of processors of the target platform is increased. This is shown through a case study on FPGA.
15:30  End of session
Coffee Break in Exhibition Area
On Tuesday-Thursday the coffee and lunch breaks will be located in the Exhibition Area (Terrace Level).