STAIR: High Reliable STT-MRAM Aware Multi-Level I/O Cache Architecture by Adaptive ECC Allocation

Mostafa Hadizadeha, Elham Cheshmikhanib and Hossein Asadic

Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
amhadizdeh@ce.sharif.edu
belham.cheshmikhani@sharif.edu
casadi@sharif.edu

ABSTRACT

Hybrid Multi–Level Cache Architectures (HCAs) are promising solutions for the growing need of high-performance and cost-efficient data storage systems. HCAs employ a high endurable memory as the first-level cache and a Solid–State Drive (SSD) as the second-level cache. Spin–Transfer Torque Magnetic RAM (STT-MRAM) is one of the most promising candidates for the first-level cache of HCAs because of its high endurance and DRAM-comparable performance along with non-volatility. However, STT-MRAM faces with three major reliability challenges named Read Disturbance, Write Failure, and Retention Failure. To provide a reliable HCA, the reliability challenges of STT-MRAM should be carefully addressed. To this end, this paper first makes a careful distinction between clean and dirty pages to classify and prioritize their different vulnerabilities. Then, we investigate the distribution of more vulnerable pages in the first-level cache of HCAs over 17 storage workloads. Our observations show that the protection overhead can be significantly reduced by adjusting the protection level of data pages based on their vulnerability. To this aim, we propose a STT–MRAM Aware Multi–Level I=O Cache Architecture (STAIR) to improve HCA reliability by dynamically generating extra strong Error–Correction Codes (ECCs) for the dirty data pages. STAIR adaptively allocates under-utilized parts of the first-level cache to store these extra ECCs. Our evaluations show that STAIR decreases the data loss probability by five orders of magnitude, on average, with negligible performance overhead (0.12% hit ratio reduction in the worst case) and 1.56% memory overhead for the cache controller.

Keywords: Data Storage Systems, Hybrid Multi-Level Cache Architecture, STT-MRAM, Error-Correction Code (ECC)



Full Text (PDF)