7.5 Emerging Memory Architectures


Time	Label	Presentation Title / Authors

14:30	7.5.1	LEADER: ACCELERATING RERAM-BASED MAIN MEMORY BY LEVERAGING ACCESS LATENCY DISCREPANCY IN CROSSBAR ARRAYS
Speaker: Hang Zhang, National University of Defense Technology, CN
Authors: Hang Zhang, Nong Xiao, Fang Liu and Zhiguang Chen, National University of Defense Technology, CN
Abstract
Emerging Resistive Memory (ReRAM) technology is a promising candidate to replace DRAM due to its low leakage power consumption, good scalability, and high density. Crossbar structures further improve the density of ReRAM for capacity benefits. However, such a structure also causes an IR drop issue due to wire resistance and sneak currents, which leads to an access latency discrepancy in ReRAM memory banks. Existing designs conservatively use the worst-case latency of ReRAM arrays, and thus fail to exploit the potential of the fast access speed of ReRAM, resulting in sub-optimal performance. In this work, we present an asymmetric ReRAM memory design, which separates a crossbar array into multiple logical regions according to their access latency, and further groups logical regions across different crossbars into virtual regions. Based on the observation of access hotspots inside memory banks, we design a table structure to remap memory requests to different virtual regions with non-uniform access latency, so as to match these access hotspots with the underlying asymmetric bank design. We then introduce both static and dynamic mapping schemes that direct memory requests from critical applications to the fast regions for better performance. Experimental results show that our design improves 4-core system performance by 13.3% and reduces memory latency by 21.6% on average for a ReRAM-based memory system across memory-intensive applications.
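As a rough illustration of the remapping idea described in this abstract, the following Python sketch assigns the most frequently accessed pages to the fastest regions and compares the resulting average latency with a worst-case design. It is not the authors' table structure or mapping scheme: the function names, the page-level granularity, the access profile, and the latency values are all assumptions made for illustration.

```python
def build_remap_table(access_counts, region_capacity):
    """Map hot pages to fast regions (hypothetical static mapping).

    access_counts: dict of page id -> access count (assumed profile).
    region_capacity: pages each region can hold, fastest region first.
    Returns a dict of page id -> region index.
    """
    # Hottest pages first; ties broken by page id for determinism.
    ranked = sorted(access_counts, key=lambda p: (-access_counts[p], p))
    table = {}
    region, used = 0, 0
    for page in ranked:
        # Move on to the next (slower) region once the current one is full.
        while region < len(region_capacity) and used >= region_capacity[region]:
            region += 1
            used = 0
        if region == len(region_capacity):
            raise ValueError("not enough region capacity for all pages")
        table[page] = region
        used += 1
    return table


def average_latency(access_counts, table, region_latency):
    """Access-weighted mean latency under a given page -> region table."""
    total = sum(access_counts.values())
    weighted = sum(n * region_latency[table[p]] for p, n in access_counts.items())
    return weighted / total


# Assumed example: one hot page, three cold pages, two regions.
counts = {0: 90, 1: 5, 2: 3, 3: 2}
latency = [1.0, 2.0]  # arbitrary units, fast region first
table = build_remap_table(counts, region_capacity=[1, 3])
remapped = average_latency(counts, table, latency)
worst_case = max(latency)  # conservative design: every access pays the slowest latency
```

With this skewed access profile, placing the single hot page in the fast region pulls the average latency well below the worst-case value, which is the effect the abstract attributes to matching hotspots with fast regions.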

15:00	7.5.2	SLIDING BASKET: AN ADAPTIVE ECC SCHEME FOR RUNTIME WRITE FAILURE SUPPRESSION OF STT-RAM CACHE
Speaker: Yiran Chen, University of Pittsburgh, US
Authors: Xue Wang¹, Mengjie Mao¹, Wujie Wen², Enes Eken¹, Hai Li¹ and Yiran Chen¹
¹University of Pittsburgh, US; ²Florida International University, US
Abstract
Write reliability is one of the major challenges in the design of spin-transfer torque random access memory (STT-RAM) caches. To ensure design quality, an error correction code (ECC) scheme is usually adopted in STT-RAM caches. However, it incurs significant hardware overhead. Observing that error-correcting requirements vary dynamically, we propose Sliding Basket, an adaptive ECC scheme that suppresses the runtime write failures of STT-RAM caches with minimized hardware cost. Our simulation results show that, compared to STT-RAM caches with a conventional ECC scheme, Sliding Basket can achieve up to 80.2% saving in ECC bit overhead, comparable write reliability, and even better system performance.

15:30	7.5.3	EXPLOITING MORE PARALLELISM FROM WRITE OPERATIONS ON PCM
Speaker: Zheng Li, Huazhong University of Science and Technology, CN
Authors: Zheng Li, Fang Wang, Yu Hua, Wei Tong, Jingning Liu, Yu Chen and Dan Feng, Huazhong University of Science and Technology, CN
Abstract
The number of bits that can be written concurrently to PCM, called the write unit, is restricted by heavy write energy consumption, so many serially executed write units are needed to service a cache line, which results in long write time and poor write performance. To address this problem, we propose a novel PCM write scheme called IZV. The key idea behind IZV is to reduce the number of write unit executions in a cache line service. The IZV design includes sFPC (simplified FPC data coding), RW (Reordering Write operations) and WP (Write Parallelism circuits). By means of sFPC, RW and WP, the zero parts of write units can be indicated with predefined prefix bits, and the residues can be reordered and written concurrently under power constraints. IZV is highly effective and efficient in improving performance and reducing energy consumption. Experimental results on 4-core PARSEC 2.0 workloads show that IZV improves performance by 32.5% and reduces energy by 48% and latency by 44% compared with the conventional write scheme. When combined with partial data flip, the IZV variant (IZV-PF) yields 12% performance improvement, 23% energy saving and 22% latency reduction compared with the state-of-the-art FNW.
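The effect of skipping zero data and packing the residues can be sketched in Python as a simple count of serial write slots per cache line. This is only an illustration under stated assumptions, not the sFPC, RW or WP designs from the paper: the byte-level granularity, the first-fit packing rule, the 8-byte write unit, and all function names are hypothetical.

```python
def split_units(line: bytes, unit_bytes: int):
    """Split a cache line into fixed-size write units."""
    return [line[i:i + unit_bytes] for i in range(0, len(line), unit_bytes)]


def baseline_slots(line: bytes, unit_bytes: int) -> int:
    """Conventional scheme: every write unit takes one serial slot."""
    return len(split_units(line, unit_bytes))


def zero_skip_slots(line: bytes, unit_bytes: int) -> int:
    """Skip units that are entirely zero (assumed to be flagged by prefix bits)."""
    return sum(1 for unit in split_units(line, unit_bytes) if any(unit))


def packed_slots(line: bytes, unit_bytes: int) -> int:
    """Pack non-zero bytes from different units into shared slots.

    Each slot may write at most unit_bytes non-zero bytes, standing in for
    the power constraint. Units are placed first-fit, largest first.
    """
    costs = sorted(
        (sum(1 for b in unit if b) for unit in split_units(line, unit_bytes)),
        reverse=True,
    )
    free = []  # remaining byte budget of each slot opened so far
    for cost in costs:
        if cost == 0:
            continue  # all-zero unit, nothing to write
        for i, room in enumerate(free):
            if room >= cost:
                free[i] -= cost
                break
        else:
            free.append(unit_bytes - cost)
    return len(free)


# Assumed 64-byte cache line: one full unit, two half-zero units, five zero units.
line = bytes([0xFF] * 8 + [1, 2, 3, 4, 0, 0, 0, 0] + [0, 0, 0, 0, 5, 6, 7, 8] + [0] * 40)
slots = (baseline_slots(line, 8), zero_skip_slots(line, 8), packed_slots(line, 8))
```

For this made-up line, the three counts show how skipping zero units and then sharing slots between partially zero units each reduce the number of serial write steps.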

16:00	End of session
Coffee Break in Exhibition Area

Visit us at DATE 2016