11.5 Vitello e Mozzarella alla Fiorentina: Virtualization, Multicore, and Fault-Tolerance


Date: Thursday, March 28, 2019
Time: 14:00 - 15:30
Location / Room: Room 5

Chair:
Philippe Coussy, Université de Bretagne-Sud / Lab-STICC, FR

Co-Chair:
Michael Glaß, Ulm University, DE

This session showcases innovative solutions for optimizing the performance of multiprocessors and virtual machines, as well as fault-tolerant deep neural networks (DNNs). The first paper presents an approach to improving virtual-machine (VM) performance in scenarios where multiple VMs share a single physical storage device. The second paper introduces techniques to improve the robustness of DNNs to bit-flip errors, such as those caused by single-event upsets in space and military applications. The third paper applies formal (ILP-based) and heuristic techniques to the problem of scheduling approximate-computing tasks on asymmetric multiprocessors containing cores with different performance/power trade-offs. Two interactive presentations round out the session: the first on task and data migration in virtualized multiprocessors, and the second on optimizing the performance of machine-learning tasks on compute clusters.

Time  Label  Presentation Title / Authors

14:00  11.5.1  VM-AWARE FLUSH MECHANISM FOR MITIGATING INTER-VM I/O INTERFERENCE
Speaker:
Taehyung Lee, Sungkyunkwan University, KR
Authors:
Taehyung Lee, Minho Lee and Young Ik Eom, Sungkyunkwan University, KR
Abstract
Consolidating multiple servers into a physical machine is now commonplace in cloud infrastructures. Virtualized systems often arrange the virtual disks of multiple virtual machines (VMs) on the same underlying storage device while striving to guarantee the performance service level objective (SLO) for each VM. Unfortunately, sync operations issued by one VM disturb the I/O activities of other VMs, making it hard to satisfy their performance SLOs. We reveal that the disk cache flush command is a root cause of this problem and present a novel VM-aware flush mechanism, called vFLUSH, which supports VM-based persistency control of the disk cache flush command. Our evaluation shows that vFLUSH reduces the average latency of disk cache flush commands by up to 52.0% and improves overall I/O performance by up to 59.6% on real workloads.
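The interference the abstract describes can be illustrated with a toy model (purely hypothetical, not vFLUSH's actual implementation): if flush latency scales with the number of dirty cache blocks persisted, a conventional flush makes one VM pay for every VM's dirty data, while a per-VM flush persists only the caller's.

```python
# Toy model of inter-VM flush interference (illustration only, not vFLUSH).
# Assumption: flush latency is proportional to the number of dirty blocks persisted.

def flush_latency_global(dirty_blocks_per_vm, caller):
    # Conventional flush: the disk cache is shared, so a flush issued by one VM
    # persists *all* dirty blocks, including those belonging to other VMs.
    return sum(dirty_blocks_per_vm.values())

def flush_latency_vm_aware(dirty_blocks_per_vm, caller):
    # VM-aware flush: only the calling VM's dirty blocks must reach the medium.
    return dirty_blocks_per_vm[caller]

dirty = {"vm1": 10, "vm2": 200, "vm3": 50}
print(flush_latency_global(dirty, "vm1"))    # 260: vm1 pays for everyone
print(flush_latency_vm_aware(dirty, "vm1"))  # 10: vm1 pays only for itself
```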
14:30  11.5.2  AN EFFICIENT BIT-FLIP RESILIENCE OPTIMIZATION METHOD FOR DEEP NEURAL NETWORKS
Speaker:
Christoph Schorn, Robert Bosch GmbH, DE
Authors:
Christoph Schorn1, Andre Guntoro1 and Gerd Ascheid2
1Robert Bosch GmbH, DE; 2RWTH Aachen University, DE
Abstract
Deep neural networks usually possess a high overall resilience against errors in their intermediate computations. However, it has been shown that error resilience is generally not homogeneous within a neural network, and some neurons may be very sensitive to faults. Even a single bit-flip fault in one of these critical neuron outputs can result in a large degradation of the final network output accuracy, which cannot be tolerated in some safety-critical applications. While critical neuron computations can be protected using error correction techniques, a resilience optimization of the neural network itself is more desirable, since it can reduce the required effort for error correction and fault protection in hardware. In this paper, we develop a novel resilience optimization method for deep neural networks, which builds upon a previously proposed resilience estimation technique. The optimization involves only a few steps and can be applied to pre-trained networks. In our experiments, we significantly reduce the worst-case failure rates after a bit-flip fault for deep neural networks trained on the MNIST, CIFAR-10 and ILSVRC classification benchmarks.
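Why a single bit-flip can be so damaging is easy to see at the level of one IEEE 754 value. The sketch below (an illustration of the fault model, not the paper's estimation or optimization method) flips one bit of a float32 neuron output: a flip in the exponent field changes the value by many orders of magnitude, while a mantissa flip is typically absorbed by the network's natural resilience.

```python
import struct

def flip_bit(x, bit):
    # Flip one bit of a float32 value (IEEE 754 binary32), modeling a
    # single-event upset in a stored neuron activation.
    (as_int,) = struct.unpack("<I", struct.pack("<f", x))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped

# Bit 30 is the most significant exponent bit; bit 0 is the mantissa LSB.
print(flip_bit(0.5, 30))  # exponent flip -> value explodes to ~1.7e38
print(flip_bit(0.5, 0))   # mantissa flip -> value barely changes
```

Outputs that are this far off scale are what make a handful of "critical" neurons dominate the worst-case failure rate.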
15:00  11.5.3  APPROXIMATION-AWARE TASK DEPLOYMENT ON ASYMMETRIC MULTICORE PROCESSORS
Speaker:
Lei Mo, INRIA, FR
Authors:
Lei Mo1, Angeliki Kritikakou2 and Olivier Sentieys1
1INRIA, FR; 2IRISA/INRIA, Univ. Rennes, FR
Abstract
Asymmetric multicore processors (AMPs) are a promising architecture for dealing efficiently with a wide diversity of applications. In real-time application domains, in-time approximated results are preferred over accurate, but too late, results. In this work, we propose a deployment approach that exploits the heterogeneity provided by AMP architectures and the approximation tolerance provided by the applications, so as to maximize the quality of the results under given energy and timing constraints. Initially, an optimal approach is proposed based on problem linearization and decomposition. Then, a heuristic approach is developed based on iteration relaxation of the optimal version. The obtained results show a 16.3% reduction in computation time for the optimal approach compared to conventional optimal approaches. The proposed heuristic approach is about 100 times faster, at the cost of a 29.8% QoS degradation in comparison with the optimal solution.
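The trade-off the abstract describes can be sketched with a toy greedy heuristic (hypothetical; the paper uses ILP-based linearization/decomposition and a relaxation heuristic, not this code). Every task starts in its cheap approximate mode on a frugal core, and the scheduler upgrades tasks to accurate mode on a fast core only while the energy and timing budgets still hold. All task and core parameters below are invented for illustration.

```python
# Hypothetical greedy sketch of approximation-aware deployment on an AMP.
# Cores differ in speed and power ("big" vs "little"), as in AMP architectures.
CORES = {"big": {"speed": 2.0, "power": 4.0}, "little": {"speed": 1.0, "power": 1.0}}

def deploy(tasks, energy_budget, deadline):
    # Start every task approximate on the frugal core.
    plan = {t["name"]: ("approx", "little") for t in tasks}

    def cost(plan):
        energy = time = qos = 0.0
        for t in tasks:
            mode, core = plan[t["name"]]
            work = t["work"] if mode == "accurate" else t["work"] * t["approx_ratio"]
            dt = work / CORES[core]["speed"]
            time += dt
            energy += dt * CORES[core]["power"]
            qos += 1.0 if mode == "accurate" else t["approx_qos"]
        return energy, time, qos

    # Upgrade tasks to accurate mode (cheapest first) while budgets allow.
    for t in sorted(tasks, key=lambda t: t["work"]):
        trial = dict(plan)
        trial[t["name"]] = ("accurate", "big")
        energy, time, _ = cost(trial)
        if energy <= energy_budget and time <= deadline:
            plan = trial
    return plan, cost(plan)[2]

tasks = [
    {"name": "t1", "work": 4.0, "approx_ratio": 0.5, "approx_qos": 0.6},
    {"name": "t2", "work": 8.0, "approx_ratio": 0.5, "approx_qos": 0.6},
]
plan, qos = deploy(tasks, energy_budget=12.0, deadline=8.0)
print(plan, qos)  # t1 upgraded to accurate/big; t2 stays approx/little
```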
15:30  IP5-12  GENIE: QOS-GUIDED DYNAMIC SCHEDULING FOR CNN-BASED TASKS ON SME CLUSTERS
Speaker:
Zhaoyun Chen, National University of Defense Technology, CN
Authors:
Zhaoyun Chen, Lei Luo, Haoduo Yang, Jie Yu, Mei Wen and Chunyuan Zhang, National University of Defense Technology, CN
Abstract
Convolutional Neural Networks (CNNs) have achieved dramatic advances in emerging Machine Learning (ML) services. Compared to online ML services, offline ML services full of diverse CNN workloads are common in small and medium-sized enterprises (SMEs), research institutes and universities. Efficient scheduling and processing of multiple CNN-based tasks on SME clusters is both significant and challenging, and existing schedulers cannot predict the resource requirements of CNN-based tasks. In this paper, we propose GENIE, a QoS-guided dynamic scheduling framework for SME clusters that achieves users' QoS guarantees and high system utilization. Based on a prediction model derived from lightweight profiling, a QoS-guided scheduling strategy is proposed to identify the best placements for CNN-based tasks. We implement GENIE as a plugin of TensorFlow and experiment with real SME clusters and large-scale simulations. The results of the experiments demonstrate that the QoS-guided strategy outperforms other baseline schedulers by up to 67.4% and 28.2% in terms of QoS-guarantee percentage and makespan, respectively.
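The core idea of QoS-guided placement can be sketched in a few lines (a hypothetical illustration, not GENIE's actual prediction model or scheduler): a runtime model fitted from lightweight profiling predicts how a task scales with resources, and the scheduler picks the cheapest placement whose predicted runtime still meets the QoS deadline, keeping cluster utilization high. The Amdahl-style model and all numbers below are assumptions.

```python
# Hypothetical sketch of QoS-guided placement for a CNN training/inference task.

def predict_runtime(profile, gpus):
    # Assumed Amdahl-style model fitted from lightweight profiling:
    # a serial part plus a parallel part that scales with the GPU count.
    return profile["serial"] + profile["parallel"] / gpus

def place(profile, deadline, free_gpus):
    # Try the fewest GPUs first: the smallest placement that meets the QoS
    # deadline leaves the most resources free for other tasks.
    for gpus in range(1, free_gpus + 1):
        if predict_runtime(profile, gpus) <= deadline:
            return gpus
    return None  # QoS cannot be met; the task must wait or be rejected

profile = {"serial": 10.0, "parallel": 120.0}
print(place(profile, deadline=50.0, free_gpus=8))  # 3 GPUs: 10 + 120/3 = 50
```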
15:30  End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served in the exhibition area during the coffee breaks.

Lunch Breaks (Lunch Area)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the Lunch Area to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.
