Maximizing System Performance by Balancing Computation Loads in LSTM Accelerators

Junki Park, Jaeha Kung, Wooseok Yi and Jae-Joon Kim
Pohang University of Science and Technology (POSTECH), Pohang, Republic of Korea
junkipark@postech.ac.kr
jhkung@postech.ac.kr
iyohoyi@postech.ac.kr
jaejoon@postech.ac.kr

ABSTRACT


The LSTM is a popular neural network model for analyzing time-varying data. The main operation in an LSTM is matrix-vector multiplication, which becomes sparse (spMxV) under the widely adopted weight pruning in deep learning. This paper presents a new sparse matrix format, named CBSR, that maximizes the inference speed of an LSTM accelerator. In the CBSR format, speed-up is achieved by balancing the computation loads across processing elements (PEs). Along with the new format, we present a simple network transformation that completely removes the hardware overhead incurred when using the CBSR format. We also provide a detailed analysis of the impact of the network size and the number of PEs, which is missing in prior work. Simulation results show a 16∼38% improvement in system performance over the well-known CSC/CSR format, and power analysis in a 65nm CMOS technology shows 9∼22% energy savings.
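To make the load-balancing problem concrete, the following minimal Python sketch (illustrative only, not the paper's CBSR implementation) shows how partitioning a CSR matrix row-wise over PEs can leave some PEs with far more nonzeros to process, and how a simple greedy reassignment, in the spirit of what a balanced format aims for, evens out the loads. The matrix size, density, PE count, and greedy heuristic are all assumptions chosen for illustration.

# Sketch: load imbalance of naive row-wise CSR scheduling vs. a greedy
# rebalancing of rows across PEs (illustrative assumptions throughout).
import numpy as np
from scipy.sparse import random as sparse_random

# A pruned weight matrix stands in for an LSTM gate matrix (assumed sizes).
W = sparse_random(64, 64, density=0.10, format="csr", random_state=0)

NUM_PES = 4
rows_per_pe = W.shape[0] // NUM_PES

# Naive CSR scheduling: each PE gets a contiguous block of rows, so its
# load is however many nonzeros happen to fall in that block.
naive_loads = [
    int(W.indptr[(p + 1) * rows_per_pe] - W.indptr[p * rows_per_pe])
    for p in range(NUM_PES)
]

# Greedy rebalancing: assign rows, heaviest first, to the least-loaded PE.
nnz_per_row = np.diff(W.indptr)
balanced_loads = np.zeros(NUM_PES, dtype=int)
for r in np.argsort(nnz_per_row)[::-1]:
    balanced_loads[np.argmin(balanced_loads)] += nnz_per_row[r]

print("naive CSR loads:", naive_loads)
print("balanced loads: ", balanced_loads.tolist())

Since an spMxV finishes only when the slowest PE finishes, the gap between the maximum load under naive scheduling and the maximum load after balancing translates directly into wasted cycles, which is the inefficiency the CBSR format targets.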