Contact Information:

**Oliver Arnold**

Research Associate

Vodafone Chair

01062 Dresden, Germany

Tel.: +49 351 463 41041

www.vodafone-chair.com

tu-dresden.de

E-mail: oliver.arnold@

# Performance Impact of Instruction Set Architecture Extensions for Dynamic Task Scheduling Units

DATE 2014



## Introduction

A heterogeneous MPSoC is controlled by a dynamic task scheduling unit called CoreManager (CM). The instruction set architecture of this unit has been extended to improve performance for dynamic data dependency checking, task scheduling, processing element (PE) allocation and data transfer management [1].

## Approach

The MPSoC depicted in Fig. 1 consists of 22 cores and three global memory ports (M). The data plane is composed of 20 PEs: ten digital signal processors (DSPs) and ten general-purpose (GP) cores. The CoreManager controls the data plane dynamically according to the current system status, taking into account the heterogeneous nature of the system and the integrated local memories. Its performance is improved by an application-specific instruction set, e.g., single instruction, multiple data (SIMD) operations. An application processor hosts the operating system and executes the sequential part of an application.

All modules are connected by a 5x5 Network-on-Chip (NoC). Each module has a dedicated router, connected to its neighbors by point-to-point data links. The routers are responsible for packet scheduling and arbitration; XY routing is applied.

All modules as well as the NoC are integrated in a cycle-accurate Tensilica XTSC simulation environment. Additional tools were developed for visualizing task executions and data transfers (TaskVisualizer [2]) and for observing the system status (DebugVisualizer [1]).
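The XY routing applied in the NoC can be illustrated with a minimal sketch. It assumes the usual dimension-ordered scheme: a packet first travels along the X dimension until it reaches the destination column, then along Y. The `(x, y)` coordinate convention and the function name `xy_route` are illustrative assumptions; the actual placement of the modules on the mesh is not modeled.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) router position; this convention is an assumption

MESH_SIZE = 5  # the NoC described above is a 5x5 mesh


def xy_route(src: Coord, dst: Coord, size: int = MESH_SIZE) -> List[Coord]:
    """Return the routers visited from src to dst, moving along X first, then Y."""
    for x, y in (src, dst):
        if not (0 <= x < size and 0 <= y < size):
            raise ValueError(f"router ({x}, {y}) is outside the {size}x{size} mesh")

    x, y = src
    path = [(x, y)]
    # Phase 1: move one hop at a time until the X coordinate matches.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Phase 2: move one hop at a time until the Y coordinate matches.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    # Route from the router at (0, 0) to the router at (2, 1).
    print(xy_route((0, 0), (2, 1)))
```

Because the path is fully determined by source and destination, the hop count always equals the Manhattan distance between the two routers, at most 8 hops on a 5x5 mesh.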



Fig. 2. CoreManager behavior

### Basic components and data flow (Fig. 2)

| Step  | Action                    |
|-------|---------------------------|
| 1-4   | Task description transfer |
| 5     | Dynamic dependency check  |
| 6-7   | Task scheduling           |
| 8     | PE allocation             |
| 9     | Memory allocation         |
| 10    | PE start-up code          |
| 11    | DMA transfer              |
| 12-13 | Task execution            |
| 14-16 | Task clean-up             |
| 17    | Successor tasks           |
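The dependency check, task scheduling and PE allocation steps listed above can be sketched in software as follows. This is a toy model under stated assumptions, not the CoreManager implementation: the class `ToyCoreManager`, the representation of data as named buffers, and the first-fit PE choice are all hypothetical. A task is held back while an earlier unfinished task writes a buffer it uses or reads a buffer it writes; otherwise it is started on a free PE of the required type.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Task:
    name: str
    pe_type: str                                     # "DSP" or "GP"
    inputs: Set[str] = field(default_factory=set)    # hypothetical buffer names read
    outputs: Set[str] = field(default_factory=set)   # hypothetical buffer names written


class ToyCoreManager:
    """Hypothetical software model of dependency check, scheduling and PE allocation."""

    def __init__(self, pes: Dict[str, str]) -> None:
        self.pe_types = dict(pes)              # PE id -> PE type
        self.running: Dict[str, Task] = {}     # PE id -> task currently executing
        self.pending: List[Task] = []          # submitted tasks in program order

    def submit(self, task: Task) -> None:
        self.pending.append(task)

    @staticmethod
    def _is_ready(task: Task, earlier: List[Task]) -> bool:
        # Dependency check: block on read-after-write, write-after-write
        # and write-after-read conflicts with earlier unfinished tasks.
        for other in earlier:
            if other.outputs & (task.inputs | task.outputs):
                return False
            if other.inputs & task.outputs:
                return False
        return True

    def _free_pe(self, pe_type: str) -> Optional[str]:
        # PE allocation: first idle PE of the requested type (first-fit is an assumption).
        for pe, kind in self.pe_types.items():
            if kind == pe_type and pe not in self.running:
                return pe
        return None

    def schedule(self) -> Dict[str, str]:
        """Start every ready task that finds a free PE; return {task name: PE id}."""
        started: Dict[str, str] = {}
        still_pending: List[Task] = []
        for task in self.pending:
            earlier = list(self.running.values()) + still_pending
            pe = self._free_pe(task.pe_type) if self._is_ready(task, earlier) else None
            if pe is None:
                still_pending.append(task)
            else:
                self.running[pe] = task
                started[task.name] = pe
        self.pending = still_pending
        return started

    def finish(self, pe: str) -> None:
        # Task clean-up: release the PE so that successor tasks can be scheduled.
        del self.running[pe]


if __name__ == "__main__":
    cm = ToyCoreManager({"dsp0": "DSP", "gp0": "GP"})
    cm.submit(Task("fft", "DSP", {"samples"}, {"spectrum"}))
    cm.submit(Task("demap", "GP", {"spectrum"}, {"bits"}))
    cm.submit(Task("log", "GP", {"config"}, {"report"}))
    print(cm.schedule())   # "demap" waits for "fft"; "log" is independent
    cm.finish("dsp0")
    cm.finish("gp0")
    print(cm.schedule())   # "demap" can now run
```

In this sketch the independent task overtakes the blocked one, which is the kind of dynamic, status-dependent decision the CoreManager takes at run time. Memory allocation, DMA transfers and data locality are deliberately left out.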

## Results


Fig. 3 shows the processing time of the task scheduling. The number of tasks in the ready list is varied between 1 and 32. Three different CoreManager approaches are analyzed and compared. The CoreManager with extended instruction set (CM-EIS) outperforms the RISC-based implementations (CM-LX4 and CM-VLIW) by nearly two orders of magnitude in the case of 32 tasks. Further results can be found in [1], [3], [4] and [5]. It is also faster than an ARM9-based implementation, as presented in [2] and [6]. Furthermore, the CoreManager CM-EIS was integrated in a TSMC 65 nm LP-CMOS prototype [7].

## References

[1] O. Arnold, B. Nöthen and G. Fettweis, "Instruction Set Architecture Extensions for a Dynamic Task Scheduling Unit", IEEE Annual Symposium on VLSI (ISVLSI'12), 19.8. - 22.8.2012.

[2] O. Arnold and G. Fettweis, "Power Aware Heterogeneous MPSoC with Dynamic Task Scheduling and Increased Data Locality for Multiple Applications", International Workshop on Systems, Architectures, MOdeling, and Simulation (SAMOS'10), Samos, Greece, 19.7. - 22.7.2010.

[3] O. Arnold, B. Nöthen and G. Fettweis, "A Flexible Analytic Model for a Dynamic Task-Scheduling Unit for Heterogeneous MPSoCs", International Conference on Advances in System Simulation (SIMUL'13), Venice, Italy, 27.10. - 1.11.2013.

[4] O. Arnold, E. Matus, B. Nöthen, F. Pauls and G. Fettweis, "Towards Elastic SDR Architectures Using Dynamic Task Management", IEEE Global Conference on Signal and Information Processing (GlobalSIP'13), Austin, U.S.A., 3.12. - 5.12.2013.

[5] O. Arnold, E. Matus, B. Nöthen, M. Winter, T. Limberg and G. Fettweis, "Tomahawk - Parallelism and Heterogeneity in Communications Signal Processing MPSoCs", ACM Transactions on Embedded Computing Systems (TECS), Volume 13, 2014.

[6] O. Arnold and G. Fettweis, "On the Impact of Dynamic Task Scheduling in Heterogeneous MPSoCs", International Conference on Embedded Computer Systems Architectures, Modeling and Simulation (SAMOS'11), 18.7. - 21.7.2011.

[7] B. Noethen et al., "A 105 GOPS 36 mm² heterogeneous SDR MPSoC with energy-aware dynamic scheduling and iterative detection-decoding for 4G in 65 nm CMOS", ISSCC Dig. Tech. Papers 2014, San Francisco, U.S.A., 9.2. - 13.2.2014.



VODAFONE CHAIR MOBILE COMMUNICATIONS SYSTEMS