eBSP: Managing NoC Traffic for BSP Workloads on the 16-Core Adapteva Epiphany-III Processor

Siddhartha¹ and Nachiket Kapre²
¹Nanyang Technological University, 50 Nanyang Avenue, Singapore.
siddhart005@e.ntu.edu.sg
²University of Waterloo, 200 University Ave W, Waterloo, Canada.
nachiket@uwaterloo.ca

ABSTRACT


We can deliver high-performance and energy-efficient operation on the multi-core NoC-based Adapteva Epiphany-III SoC for bulk-synchronous workloads using our proposed eBSP communication API. We characterize and automate performance tuning of spatial parallelism for supporting (1) random-access load-store style traffic suitable for irregular sparse computations, as well as (2) variable, data-dependent traffic patterns in neural networks or PageRank-style workloads in a manner tailored for the Epiphany NoC. We aggressively optimize traffic by exposing spatial communication structure to the fabric through offline pre-computation of destination addresses, unrolling of message-passing loops, selective squelching of messages, and careful ordering of communication and compute. Using our approach, across a range of applications and datasets such as Sparse Matrix-Vector multiplication (Matrix Market datasets), PageRank (BerkStan SNAP dataset), and Izhikevich spiking neural evaluation, we deliver speedups of 6.5-10× while lowering power use by 2× over optimized ARM-based mappings. When compared to optimized OpenMP x86 mappings, we observe an 11-31× improvement in energy efficiency (GFLOP/s/W) for the Epiphany SoC. Epiphany also outperforms state-of-the-art spatial FPGA (ZC706) and embedded GPU (Jetson TK1) mappings due to our communication optimizations. Our library is open-source and available at github.com/sidmontu/ebsp.git.


