moDNN: Memory Optimal DNN Training on GPUs

Xiaoming Chen (1,2,a), Danny Z. Chen (1,b) and Xiaobo Sharon Hu (1,c)
1 University of Notre Dame, Notre Dame, IN 46556, USA
2 Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
a chenxiaoming@ict.ac.cn
b dchen@nd.edu
c shu@nd.edu

ABSTRACT


Graphics processing units (GPUs) are widely used to accelerate the training of deep neural networks (DNNs). However, the limited size of GPU memory restricts the scale of DNNs that can be trained on GPUs, which poses a serious challenge. This paper proposes moDNN, a framework for optimizing memory usage in DNN training. moDNN automatically tunes DNN training code to fit any given memory budget no smaller than the theoretical lower bound. By fully overlapping computation with data transfers, its heuristics judiciously schedule data offloading and prefetching, together with training algorithm selection, to optimize memory usage. We further introduce a new sub-batch size selection method that also greatly reduces memory usage. moDNN reduces memory usage by up to 50× compared with the ideal case in which GPU memory is assumed large enough to hold all data. Running on a GPU with 12 GB of memory, moDNN incurs only 8% performance loss, much lower than that of the best known existing approach, vDNN. moDNN is also applicable to multiple GPUs and achieves an average speedup of 1.84× on two GPUs.
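The sketch below is not moDNN's implementation; it is a minimal CUDA illustration of the overlap mechanism the abstract refers to: a layer's output is offloaded to pinned host memory on a separate copy stream while computation continues on the compute stream, and is prefetched back before it is needed again. The kernel, buffer names, and sizes are hypothetical.

```cpp
// Minimal sketch of overlapping compute with offload/prefetch transfers.
// Assumptions: a single "activation" buffer, a dummy kernel standing in for
// a layer's computation, and two CUDA streams (compute and copy).
#include <cuda_runtime.h>
#include <cstdio>

__global__ void dummy_layer(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 0.5f + 1.0f;  // stand-in for layer work
}

int main() {
    const int n = 1 << 22;                    // hypothetical activation size
    const size_t bytes = n * sizeof(float);

    float *d_act, *h_act;
    cudaMalloc(&d_act, bytes);
    cudaHostAlloc(&h_act, bytes, cudaHostAllocDefault);  // pinned host buffer

    cudaStream_t compute, copy;
    cudaStreamCreate(&compute);
    cudaStreamCreate(&copy);

    cudaEvent_t produced;
    cudaEventCreate(&produced);

    // Forward pass of one layer on the compute stream.
    dummy_layer<<<(n + 255) / 256, 256, 0, compute>>>(d_act, n);
    cudaEventRecord(produced, compute);

    // Offload the produced activation on the copy stream; this transfer
    // overlaps with whatever runs next on the compute stream.
    cudaStreamWaitEvent(copy, produced, 0);
    cudaMemcpyAsync(h_act, d_act, bytes, cudaMemcpyDeviceToHost, copy);

    // ... later layers would run here on `compute`; before backpropagation
    // reaches this layer, the activation is prefetched back to the device.
    cudaMemcpyAsync(d_act, h_act, bytes, cudaMemcpyHostToDevice, copy);

    cudaStreamSynchronize(copy);
    cudaStreamSynchronize(compute);
    printf("offload/prefetch overlap demo done\n");

    cudaEventDestroy(produced);
    cudaStreamDestroy(compute);
    cudaStreamDestroy(copy);
    cudaFree(d_act);
    cudaFreeHost(h_act);
    return 0;
}
```

In a real schedule, the device copy of the activation would be freed after the offload and reallocated just before the prefetch, which is where the memory savings come from; the sketch keeps one buffer only to stay short.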


