2.4 Performance and Power Analysis


Date: Tuesday 28 March 2017
Time: 11:30 - 13:00
Location / Room: 3A

Chair:
Gianluca Palermo, Politecnico di Milano, IT

Co-Chair:
Ingo Sander, KTH Royal Institute of Technology, SE

Early performance and power estimation is critical for computer system design. This session covers novel analytical and semi-analytical approaches for fast and accurate modeling of different system components, including GPUs, DRAMs and caches.

Time | Label | Presentation Title / Authors
11:30 | 2.4.1 | GATSIM: ABSTRACT TIMING SIMULATION OF GPUS
Speaker:
Andreas Gerstlauer, The University of Texas at Austin, US
Authors:
Kishore Punniyamurthy, Behzad Boroujerdian and Andreas Gerstlauer, The University of Texas at Austin, US
Abstract
General-Purpose Graphics Processing Units (GPUs) have become an integral part of heterogeneous system architectures. Ever-increasing complexity has made rapid, early performance evaluation of GPU-based architectures and applications a primary design concern. Traditional cycle-accurate GPU simulators are too slow, while existing analytical or source-level estimation approaches are often inaccurate. This paper proposes a novel abstract GPU performance simulation approach based on a flexible separation of functional and timing models, combining fast functional execution, either on existing simulators or on native GPU hardware, with a lightweight, fast and accurate abstract timing model. Micro-architectural timing of individual GPU cores is abstracted through static, one-time pre-characterization of code, and only the dynamic scheduling effects are simulated. Using a native GPU for functional execution and excluding pre-characterization, our GPU simulation achieves a throughput of more than 80 MIPS. This is on average 400x faster, with 4% error, compared to a cycle-accurate GPU simulator for standard GPU benchmarks. Moreover, our simple timing model provides the flexibility to target different GPU configurations with little or no extra effort.
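The split between pre-characterized code timing and simulated scheduling can be illustrated with a toy model. The sketch below is not GATSim: the block names, the latency table, the one-issue-per-cycle core and the oldest-ready-first policy are all assumptions made for illustration. Each warp's basic-block trace is taken as given (as if recorded from a functional run), block latencies are plain table lookups, and only the interleaving of warps is simulated.

```python
# Hypothetical pre-characterized latencies (cycles) per basic block,
# standing in for a static, one-time characterization step.
BLOCK_CYCLES = {"load": 4, "compute": 2, "store": 3}


def simulate_core(warp_traces, block_cycles=BLOCK_CYCLES):
    """Estimate the cycle at which the last warp finishes on one core.

    warp_traces: one list of basic-block names per warp.
    Assumed model: the core issues at most one block per cycle, and a warp
    cannot issue its next block until its previous block has completed.
    """
    ready_at = [0] * len(warp_traces)  # cycle at which each warp may issue again
    next_idx = [0] * len(warp_traces)  # position of each warp in its trace
    remaining = sum(len(t) for t in warp_traces)
    cycle = 0
    finish = 0
    while remaining:
        pending = [w for w, t in enumerate(warp_traces) if next_idx[w] < len(t)]
        ready = [w for w in pending if ready_at[w] <= cycle]
        if not ready:
            # Nothing can issue: skip ahead to the next warp wake-up time.
            cycle = min(ready_at[w] for w in pending)
            continue
        w = min(ready, key=lambda i: ready_at[i])  # oldest-ready warp first
        block = warp_traces[w][next_idx[w]]
        ready_at[w] = cycle + block_cycles[block]  # table lookup, no detailed model
        finish = max(finish, ready_at[w])
        next_idx[w] += 1
        remaining -= 1
        cycle += 1  # one issue slot per cycle
    return finish


one_warp = simulate_core([["load", "compute"]])
two_warps = simulate_core([["load", "compute"], ["load", "compute"]])
print(one_warp, two_warps)
```

In this toy model the second warp issues while the first is waiting on its load, so two warps finish well before twice the single-warp time. That overlap is the kind of dynamic scheduling effect a timing model must simulate rather than look up.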

12:00 | 2.4.2 | MESAP: A FAST ANALYTIC POWER MODEL FOR DRAM MEMORIES
Speaker:
Sandeep Poddar, IBM Research, The Netherlands, NL
Authors:
Sandeep Poddar¹, Rik Jongerius¹, Leandro Fiorin¹, Giovanni Mariani¹, Gero Dittmann², Andreea Anghel² and Henk Corporaal³
¹IBM Research, NL; ²IBM Research, CH; ³TU/e (Eindhoven University of Technology), NL
Abstract
The design of an energy-efficient memory subsystem is one of the key issues that system architects face today. To achieve this goal, architects usually rely on system simulators and trace-based DRAM power models. However, their long execution times make this approach infeasible for the design-space exploration of next-generation exascale computing systems. Analytic models, in contrast, are orders of magnitude faster. In this paper, we propose a new analytic memory scheduler-agnostic power model (MeSAP) for DRAM. Our model achieves an average error of 20% for DDR3 and DDR4 memory systems, similar to a state-of-the-art trace-based approach, while being an order of magnitude faster. Furthermore, we integrate MeSAP into an analytic performance model of general-purpose processors and show its applicability to the design of a computing system targeting scientific image processing applications.

12:30 | 2.4.3 | AFEC: AN ANALYTICAL FRAMEWORK FOR EVALUATING CACHE PERFORMANCE IN OUT-OF-ORDER PROCESSORS
Speaker:
Kecheng Ji, Southeast University, CN
Authors:
Kecheng Ji¹, Ming Ling¹, Qin Wang¹, Longxing Shi¹ and Jianping Pan²
¹Southeast University, CN; ²University of Victoria, CA
Abstract
Evaluating cache performance is becoming critically important for predicting the overall performance of out-of-order processors. Non-blocking caches, which are very common in out-of-order CPUs, can reduce the average cache miss penalty by overlapping multiple outstanding memory requests and merging different cache misses with the same cacheline address into one memory request. Normally, memory-level parallelism (MLP) is used as a metric to describe the concurrency of memory accesses. Unfortunately, due to the extremely dynamic dependences among program memory references, it is very difficult to quantify MLP without time-consuming simulations. Moreover, the merging of multiple cache misses, which makes the average cache miss service time shorter than the physical DDR access latency, is seldom considered in existing research. In this paper, we propose a cache performance evaluation framework based on program trace analysis and analytical models to quickly estimate MLP and the effective cache miss service time without simulations. Compared with the results of Gem5 simulations of MobyBench 2.0, Mibench 1.0 and Mediabench II, the average accuracy of the modeled MLP and of the average cache miss service time is higher than 91% and 92%, respectively. Combined with cache misses calculated by stack distance theory, the average absolute error of CPU stall time (due to cache misses) is lower than 10%, while the evaluation is 35 times faster than full Gem5 simulation.
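For readers unfamiliar with stack distance theory, a minimal sketch may help. It computes LRU stack distances over a trace of cache-line addresses and derives the miss count of a fully associative LRU cache from them. It is not the AFEC framework. In particular, `stall_cycles` is a hypothetical first-order formula (misses times service time, divided by MLP) assumed here only to show how the quantities named in the abstract could combine.

```python
from collections import OrderedDict


def stack_distances(trace):
    """Return the LRU stack distance of each access (None for a first access).

    The stack distance is the number of distinct lines referenced since the
    previous access to the same line.
    """
    stack = OrderedDict()  # keys ordered from least to most recently used
    distances = []
    for line in trace:
        if line in stack:
            keys = list(stack)
            distances.append(len(keys) - 1 - keys.index(line))
            stack.move_to_end(line)  # line becomes most recently used
        else:
            distances.append(None)  # cold (compulsory) miss
            stack[line] = True
    return distances


def lru_misses(trace, capacity):
    """Count misses of a fully associative LRU cache holding `capacity` lines.

    An access hits exactly when its stack distance is below the capacity.
    """
    return sum(1 for d in stack_distances(trace) if d is None or d >= capacity)


def stall_cycles(misses, service_time, mlp):
    """Hypothetical first-order estimate: overlapped misses share the penalty."""
    return misses * service_time / mlp


trace = ["A", "B", "C", "A", "B", "D", "A"]
print(stack_distances(trace))
print(lru_misses(trace, capacity=3))
print(stall_cycles(lru_misses(trace, capacity=3), service_time=100, mlp=2.0))
```

A single pass over the trace yields the miss count for every cache capacity at once, which is why stack distance analysis is attractive when simulation is too slow. This quadratic-time version favors clarity over speed; practical tools use tree-based structures.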

13:00 | IP1-5, 88 | MODELING INSTRUCTION CACHE AND INSTRUCTION BUFFER FOR PERFORMANCE ESTIMATION OF VLIW ARCHITECTURES USING NATIVE SIMULATION
Speaker:
Omayma Matoussi, Grenoble INP, TIMA laboratory, FR
Authors:
Omayma Matoussi¹ and Frédéric Pétrot²
¹TIMA Laboratory at Grenoble, FR; ²TIMA Laboratory, Grenoble Institute of Technology, FR
Abstract
In this work, we propose an instruction cache (icache) performance estimation approach that focuses on a component necessary to handle instruction parallelism in a very long instruction word (VLIW) processor: the instruction buffer (IB). Our annotation approach is founded on an intermediate-level native-simulation framework. Evaluated against a cycle-accurate instruction set simulator, it achieves an average cycle count error of 9.3% and an average speedup of 10x.

13:00 | End of session
Lunch Break in Garden Foyer

Keynote Lecture session 3.0 in "Garden Foyer", 13:50 - 14:20

Lunch Break in the Garden Foyer
On all conference days (Tuesday to Thursday), a buffet lunch will be offered in the Garden Foyer, in front of the session rooms. Please note that lunch is restricted to conference delegates holding a lunch voucher. When entering the lunch area, delegates will be asked to present the lunch voucher for that day. Once a delegate has left the lunch area, re-entry is not permitted for that day's lunch.