moDNN: Memory Optimal DNN Training on GPUs

Xiaoming Chen (1,2,a), Danny Z. Chen (1,b) and Xiaobo Sharon Hu (1,c)
1 University of Notre Dame, Notre Dame, IN 46556, USA
2 Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
a chenxiaoming@ict.ac.cn
b dchen@nd.edu
c shu@nd.edu

ABSTRACT


Graphics processing units (GPUs) are widely used to accelerate the training of deep neural networks (DNNs). However, the limited size of GPU memory restricts the scale of DNNs that can be trained on GPUs, which poses a serious challenge. This paper proposes moDNN, a framework for optimizing memory usage in DNN training. moDNN automatically tunes DNN training code to fit any given memory budget no smaller than the theoretical lower bound. By fully overlapping computation with data transfers, its heuristics judiciously schedule data offloading and prefetching, together with training algorithm selection, to optimize memory usage. We further introduce a new sub-batch size selection method that also greatly reduces memory usage. moDNN reduces memory usage by up to 50× compared with the ideal case in which GPU memory is assumed large enough to hold all data. Running on a GPU with 12 GB of memory, moDNN incurs only 8% performance loss, much lower than that of the best known existing approach, vDNN. moDNN is also applicable to multiple GPUs and achieves an average speedup of 1.84× on two GPUs.
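The sketch below is not moDNN's implementation; it is a minimal CUDA illustration of the overlap mechanism the abstract refers to: a layer's output is offloaded to pinned host memory on a separate copy stream while computation continues on the compute stream, and is prefetched back before it is needed again. The kernel, buffer names, and sizes are hypothetical.

```cpp
// Minimal sketch of overlapping compute with offload/prefetch transfers.
// Assumptions: a single "activation" buffer, a dummy kernel standing in for
// a layer's computation, and two CUDA streams (compute and copy).
#include <cuda_runtime.h>
#include <cstdio>

__global__ void dummy_layer(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 0.5f + 1.0f;  // stand-in for layer work
}

int main() {
    const int n = 1 << 22;                    // hypothetical activation size
    const size_t bytes = n * sizeof(float);

    float *d_act, *h_act;
    cudaMalloc(&d_act, bytes);
    cudaHostAlloc(&h_act, bytes, cudaHostAllocDefault);  // pinned host buffer

    cudaStream_t compute, copy;
    cudaStreamCreate(&compute);
    cudaStreamCreate(&copy);

    cudaEvent_t produced;
    cudaEventCreate(&produced);

    // Forward pass of one layer on the compute stream.
    dummy_layer<<<(n + 255) / 256, 256, 0, compute>>>(d_act, n);
    cudaEventRecord(produced, compute);

    // Offload the produced activation on the copy stream; this transfer
    // overlaps with whatever runs next on the compute stream.
    cudaStreamWaitEvent(copy, produced, 0);
    cudaMemcpyAsync(h_act, d_act, bytes, cudaMemcpyDeviceToHost, copy);

    // ... later layers would run here on `compute`; before backpropagation
    // reaches this layer, the activation is prefetched back to the device.
    cudaMemcpyAsync(d_act, h_act, bytes, cudaMemcpyHostToDevice, copy);

    cudaStreamSynchronize(copy);
    cudaStreamSynchronize(compute);
    printf("offload/prefetch overlap demo done\n");

    cudaEventDestroy(produced);
    cudaStreamDestroy(compute);
    cudaStreamDestroy(copy);
    cudaFree(d_act);
    cudaFreeHost(h_act);
    return 0;
}
```

In a real schedule, the device copy of the activation would be freed after the offload and reallocated just before the prefetch, which is where the memory savings come from; the sketch keeps one buffer only to stay short.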


