11.5 Memory Resource Allocation and Scheduling in MPSoC

Date: Thursday 27 March 2014
Time: 14:00 - 15:30
Location / Room: Konferenz 3

Chair:
Andreas Herkersdorf, Technische Universität München, DE

Co-Chair:
Donatella Sciuto, Politecnico di Milano, IT

Low-latency data access and efficient interprocess communication are critical to MPSoC performance and power efficiency. This session introduces innovative approaches for data placement, memory bandwidth allocation and scheduling techniques in MPSoC architectures with heterogeneous 2D/3D memory hierarchies.

Time	Label	Presentation Title Authors
14:00	11.5.1	(Best Paper Award Candidate) SCENARIO-AWARE DATA PLACEMENT AND MEMORY AREA ALLOCATION FOR MULTI-PROCESSOR SYSTEM-ON-CHIPS WITH RECONFIGURABLE 3D-STACKED SRAMS
Speakers: Meng-Ling Tsai, Yi-Jung Chen, Yi-Ting Chen and Ru-Hua Chang, Department of Computer Science and Information Engineering, National Chi Nan University, TW
Abstract: Integrating Multi-Processor System-on-Chips (MPSoCs) with 3D-stacked reconfigurable SRAM tiles has been proposed for embedded systems with high memory demands. At runtime, the SRAM tiles are configured into several memory areas, which can be reconfigured according to the dynamic behavior of the system. Targeting this architecture, in this paper, we propose a data placement and memory area allocation algorithm. The goal of the proposed algorithm is to optimize the performance of the memory system by minimizing the on-chip memory access latency, the number of off-chip memory accesses, and the number of reconfigurations. Since the behavior of an embedded system can be described by a set of scenarios, where each scenario specifies a set of applications that would execute concurrently, the proposed algorithm synthesizes data placements and the memory area allocation for each scenario. Data access patterns both within each scenario and across all scenarios are considered for data placement. We evaluate the proposed algorithm on a set of synthetic and real-world applications. The experimental results show that, compared to the existing data placement method designed for MPSoCs with distributed memory modules, the proposed algorithm reduces data access latency by up to 11.72%.
14:30	11.5.2	OPTIMIZED BUFFER ALLOCATION IN MULTICORE PLATFORMS
Speakers: Maximilian Odendahl¹, Andres Goens¹, Rainer Leupers¹, Gerd Ascheid¹, Benjamin Ries¹, Berthold Vöcking¹ and Tomas Henriksson²
¹RWTH Aachen University, DE; ²Huawei Technologies, SE
Abstract: With the availability of advanced MPSoC and emerging Dynamic RAM (DRAM) interface technologies, an optimal allocation of logical data buffers to physical memory can no longer be handled manually due to the huge design space. An allocation does not only need to decide between on-chip and off-chip memory, but also needs to take an increasing number of available memory channels, different bandwidth capacities and several routing possibilities into account. We formalize this problem and introduce a Mixed Integer Linear Programming (MILP) model based on two different optimization criteria. We implement the MILP model in a retargetable tool and present a case study with representative data of the Long-Term-Evolution (LTE) standard to show the real-life applicability of our approach.
15:00	11.5.3	MEMORY-CONSTRAINED STATIC RATE-OPTIMAL SCHEDULING OF SYNCHRONOUS DATAFLOW GRAPHS VIA RETIMING
Speakers: Xue-Yang Zhu¹, Marc Geilen², Twan Basten³ and Sander Stuijk²
¹State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, CN; ²Department of Electrical Engineering, Eindhoven University of Technology, NL; ³Department of Electrical Engineering, Eindhoven University of Technology and Embedded Systems Institute, NL
Abstract: Synchronous dataflow graphs (SDFGs) are widely used to model digital signal processing and streaming media applications. In this paper, we use retiming to optimize SDFGs to achieve a high throughput with low storage requirement. Using a memory constraint as an additional enabling condition, we define a memory-constrained self-timed execution of an SDFG. Exploring the state-space generated by the execution, we can check whether a retiming exists that leads to a rate-optimal schedule under the memory constraint. Combining this with a binary search strategy, we present a heuristic method to find a proper retiming and a static schedule that executes the retimed SDFG with optimal rate (i.e., maximal throughput) and with as little storage space as possible. Our experiments are carried out on hundreds of synthetic SDFGs and several models of real applications. Differential synthetic graph results and real application results show that, in 79% of the tested models, our method leads to a retimed SDFG whose rate-optimal schedule requires less storage space than the proven minimal storage requirement of the original graph, and in 20% of the cases, the returned storage requirements equal the minimal ones. The average improvement is about 7.3%. The results also show that our method is computationally efficient.
15:15	11.5.4	A CONSTRAINT-BASED DESIGN SPACE EXPLORATION FRAMEWORK FOR REAL-TIME APPLICATIONS ON MPSOCS
Speakers: Kathrin Rosvall and Ingo Sander, KTH Royal Institute of Technology, SE
Abstract: Design space exploration (DSE) is a critical step in the design process of real-time multiprocessor systems. Combining a formal base in the form of SDF graphs with predictable platforms providing guaranteed QoS, the paper proposes a flexible and extendable DSE framework that can provide performance guarantees for multiple applications implemented on a shared platform. The DSE framework is formulated in a declarative style as an interprocess communication-aware constraint programming (CP) model. Apart from mapping and scheduling of application graphs, the model supports design constraints on several cost and performance metrics, such as memory consumption and achievable throughput. Using constraints with different compliance levels, the framework introduces support for mixed criticality in the CP model. The potential of the approach is demonstrated by means of experiments using a Sobel filter, a SUSAN filter, a RASTA-PLP application and a JPEG encoder.
15:31	IP5-15, 472	RELIABILITY-AWARE MAPPING OPTIMIZATION OF MULTI-CORE SYSTEMS WITH MIXED-CRITICALITY
Speakers: Shin-Haeng Kang¹, Hoeseok Yang², Sungchan Kim³, Iuliana Bacivarov², Soonhoi Ha¹ and Lothar Thiele⁴
¹Seoul National University, KR; ²ETH Zurich, CH; ³Chonbuk National University, KR; ⁴Swiss Federal Institute of Technology Zurich, CH
Abstract: This paper presents a novel mapping optimization technique for mixed-criticality multi-core systems with different reliability requirements. For this purpose, we derive a quantitative reliability metric and present a scheduling analysis that certifies given mixed-criticality constraints. Our framework is capable of investigating re-execution, passive replication, and modular redundancy with optimized voter placement, while typical hardening approaches consider only one or two of these techniques. The proposed technique complies with existing safety standards and is power-efficient, as demonstrated by our experiments.
15:32	IP5-16, 498	(Best Paper Award Candidate) FROM SIMULINK TO NOC-BASED MPSOC ON FPGA
Speakers: Francesco Robino and Johnny Öberg, KTH Royal Institute of Technology, SE
Abstract: Network-on-chip (NoC) based multi-processor systems are promising candidates for future embedded system platforms. However, because of their complexity, new high-level modeling techniques are needed to design, simulate and synthesize embedded systems targeting NoC-based MPSoCs. Simulink is a popular modeling environment suitable for system-level modeling. However, there is no clear standard for synthesizing Simulink models into SW and HW for a NoC-based MPSoC implementation. In addition, many of the proposed solutions require large overhead in terms of SW components and memory requirements, resulting in complex and customized multi-processor platforms. In this paper we present a novel design flow to synthesize Simulink models onto a NoC-based MPSoC running on low-cost FPGAs. Our design flow constrains the MPSoC and the Simulink model to share a common semantic domain. This reduces the need for resource-consuming SW components and thereby the memory requirements on the platform. At the same time, the performance (throughput) of dataflow applications can increase when the number of processors of the target platform is increased. This is shown through a case study on FPGA.
15:30		End of session. Coffee Break in Exhibition Area. On Tuesday-Thursday the coffee and lunch breaks will be located in the Exhibition Area (Terrace Level).
