4.4 Digital processing with emerging memory technologies

Date: Tuesday, March 26, 2019
Time: 17:00 - 18:30
Location / Room: Room 4

Shahar Kvatinsky, Technion, IL

Elena-Ioana Vatajelu, TIMA, FR

This session looks at how emerging memory technologies improve processing in digital systems, with applications including processing-in-memory, graph processing, binary neural networks, and non-volatile processors.

Roberto Giorgio Rizzo, Politecnico di Torino, IT
Valerio Tenace1, Roberto Giorgio Rizzo1, Debjyoti Bhattacharjee2, Anupam Chattopadhyay2 and Andrea Calimera1
1Politecnico di Torino, IT; 2Nanyang Technological University, SG
A memristor is a two-terminal device that can serve as a non-volatile memory element with built-in logic capabilities. Arranged in a crossbar structure, memristive arrays can represent complex Boolean logic functions that adhere to the logic-in-memory paradigm, where data and logic gates are glued together on the same piece of hardware. Novel, ad-hoc CAD solutions are required to achieve practical and feasible hardware implementations. Existing techniques aim at optimal mapping strategies for Boolean logic functions described by means of 2-input NOR and NOT gates, thus overlooking the optimization capabilities that a smart, dedicated, technology-aware logic synthesis can provide. In this paper, we introduce a novel library-free supergate-aided (SAID) logic synthesis approach with a dedicated mapping strategy tailored to MAGIC crossbars. Supergates are obtained with a Look-Up Table (LUT)-based synthesis that splits a complex logic network into smaller Boolean functions. Those functions are then mapped onto the crossbar array so as to minimize latency. The proposed SAID flow makes it possible to (i) maximize supergate-level parallelism, thus reducing the total number of computing cycles, and (ii) relax mapping constraints, allowing an easy and fast mapping of Boolean functions onto memristive crossbars. Experimental results obtained on several benchmarks from the ISCAS'85 and IWLS'93 suites demonstrate that our solution outperforms other state-of-the-art techniques in terms of speedup (3.89× in the best case), at the expense of a very low area overhead.
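To make the NOR/NOT baseline mentioned in the abstract concrete, here is a minimal Python sketch of a function expressed only with NOR gates, together with a simple level count. It is not the SAID flow. The netlist, the helper names (`nor`, `evaluate`, `gate_levels`), and the cost model (one cycle per logic level, with all gates in a level running in parallel) are assumptions made for illustration only.

```python
from itertools import product


def nor(*inputs):
    """n-input NOR: returns 1 only when every input is 0."""
    return int(not any(inputs))


# Hypothetical NOR-only netlist for XOR(a, b), listed in topological order.
# Each entry maps a gate name to the names of its inputs.
XOR_NETLIST = {
    "n1": ("a", "b"),
    "n2": ("a", "n1"),
    "n3": ("b", "n1"),
    "n4": ("n2", "n3"),  # n4 = XNOR(a, b)
    "out": ("n4",),      # 1-input NOR acts as NOT, so out = XOR(a, b)
}


def evaluate(netlist, inputs):
    """Evaluate every gate in order and return all signal values."""
    values = dict(inputs)
    for gate, fanin in netlist.items():
        values[gate] = nor(*(values[name] for name in fanin))
    return values


def gate_levels(netlist, primary_inputs):
    """Assign each gate the length of its longest path from an input."""
    level = {name: 0 for name in primary_inputs}
    for gate, fanin in netlist.items():
        level[gate] = 1 + max(level[name] for name in fanin)
    return level


if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        print(a, b, evaluate(XOR_NETLIST, {"a": a, "b": b})["out"])
    depth = max(gate_levels(XOR_NETLIST, ("a", "b")).values())
    print("gates:", len(XOR_NETLIST), "levels:", depth)
```

Under the assumed cost model, the five gates need four levels because `n2` and `n3` can be evaluated together. Exposing this kind of parallelism at a coarser, supergate granularity is what the abstract describes as reducing the number of computing cycles.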
Deliang Fan, University of Central Florida, US
Shaahin Angizi1, Jiao Sun2, Wei Zhang2 and Deliang Fan3
1Department of Electrical and Computer Engineering, University of Central Florida, US; 2Department of Computer Science, University of Central Florida, US; 3University of Central Florida, US
In this work, we present the GraphS architecture, which transforms current Spin Orbit Torque Magnetic Random Access Memory (SOT-MRAM) into massively parallel computational units capable of accelerating graph processing applications. GraphS can greatly reduce the energy consumed by the underlying adjacency matrix computations, eliminating unnecessary off-chip accesses and providing ultra-high internal bandwidth. Device-to-architecture co-simulation on three social network data sets indicates roughly 3.6× higher energy efficiency and a 5.3× speed-up over a recent ReRAM crossbar design. It also achieves ~4× higher energy efficiency and a 5.1× speed-up over recent processing-in-DRAM acceleration methods.
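The abstract does not say which adjacency matrix kernels GraphS targets. As one illustrative possibility, the sketch below counts triangles by AND-ing adjacency rows stored as bit vectors, the kind of bulk bitwise row operation that in-memory logic tends to suit. The choice of kernel, the function names, and the sample graph are assumptions, not details from the paper.

```python
def build_adjacency(num_vertices, edges):
    """Store each adjacency-matrix row as an integer bitmask."""
    rows = [0] * num_vertices
    for u, v in edges:
        rows[u] |= 1 << v
        rows[v] |= 1 << u
    return rows


def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def count_triangles(num_vertices, edges):
    """Count triangles in an undirected graph.

    For each edge (u, v), the AND of rows u and v marks the common
    neighbours of u and v. Every triangle is found once per edge,
    so the total is divided by 3.
    """
    rows = build_adjacency(num_vertices, edges)
    total = sum(popcount(rows[u] & rows[v]) for u, v in set(edges))
    return total // 3


if __name__ == "__main__":
    # Hypothetical 4-vertex graph: one triangle (0, 1, 2) plus a tail edge.
    sample_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
    print(count_triangles(4, sample_edges))
```

In software the row AND and the bit count both run on the CPU. The point of the sketch is only that the inner operation is a wide bitwise AND over two stored rows, which needs no off-chip transfer if it can be computed where the rows reside.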
Liang Chang, Beihang University, CN
Liang Chang1, Xin Ma2, Zhaohao Wang1, Youguang Zhang1, Weisheng Zhao1 and Yuan Xie3
1Beihang University, CN; 2University of California, Santa Barbara, US; 3UC Santa Barbara, US
Binary Neural Networks (BNNs) have attracted great attention because they reduce memory usage and power consumption while achieving satisfactory recognition accuracy on image classification. In terms of computation, the multiply-accumulate operations of Convolutional Neural Networks (CNNs) are replaced with bit-wise operations (XNOR and pop-count). Such bit-wise operations are well suited to hardware accelerators such as in-memory computing (IMC). However, an additional digital processing unit (DPU) is required for the pop-count operation, which induces considerable data movement between the Processing Engines (PEs) and the data buffers, reducing the efficiency of the IMC. In this paper, we present a BNN computing accelerator, namely CORN, which combines a Non-Volatile Memory (NVM)-based data buffer that performs the majority operation (replacing the pop-count process) with NVM-based IMC to accelerate the computing of BNNs. CORN can naturally implement the XNOR operation in the NVM memory array and feed the results to the computing data buffer for the majority write operation. Such a design removes the pop-counter implemented by the DPU and reduces data movement between the data buffer and the memory array. Based on the evaluation results, CORN achieves 61% and 14% power savings with 1.74× and 2.12× speedups compared to the FPGA and DPU-based IMC architectures, respectively.
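The replacement of pop-count by a majority operation rests on a simple identity, sketched below in Python. With bits encoding +1 (bit 1) and -1 (bit 0), the dot product of two n-bit vectors equals 2 × popcount(XNOR) - n, so its sign is positive exactly when most XNOR bits are 1. The function names and the sign-at-zero threshold are assumptions for illustration. The code shows only the arithmetic, not CORN's circuit or its majority write mechanism.

```python
from itertools import product


def xnor_bits(x, w):
    """Element-wise XNOR of two equal-length bit lists."""
    return [1 - (a ^ b) for a, b in zip(x, w)]


def popcount_activation(x, w):
    """Binarized neuron via XNOR + pop-count, thresholded at zero."""
    n = len(x)
    dot = 2 * sum(xnor_bits(x, w)) - n  # dot product in the +1/-1 encoding
    return 1 if dot > 0 else 0


def majority_activation(x, w):
    """Same neuron expressed as a majority vote over the XNOR bits."""
    bits = xnor_bits(x, w)
    ones = sum(bits)
    zeros = len(bits) - ones
    return 1 if ones > zeros else 0


if __name__ == "__main__":
    x = [1, 0, 1, 1, 0]
    w = [1, 1, 1, 0, 0]
    print(popcount_activation(x, w), majority_activation(x, w))
    # Exhaustive check over all 5-bit input/weight pairs.
    vectors = list(product((0, 1), repeat=5))
    print(all(popcount_activation(a, b) == majority_activation(a, b)
              for a in vectors for b in vectors))
```

Because only the sign of the sum is needed at the neuron output, the exact count never has to be produced, which is what allows a majority operation to stand in for the pop-counter.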
Robert Perricone, University of Notre Dame, US
Robert Perricone1, Zhaoxin Liang2, Meghna Mankalale2, Michael Niemier1, Sachin S. Sapatnekar3, Jian-Ping Wang2 and X. Sharon Hu1
1University of Notre Dame, US; 2University of Minnesota, US; 3University of Minnesota, US
As we approach the limits of CMOS scaling, researchers are developing "beyond-CMOS" technologies to sustain the technological benefits associated with device scaling. Spintronic technologies have emerged as a promising beyond-CMOS option due to their inherent benefits over CMOS, such as high integration density, low leakage power, radiation hardness, and non-volatility. These benefits make spintronic devices an attractive successor to CMOS, especially for memory circuits. However, spintronic devices generally suffer from slower switching speeds and higher write energy, which limits their usability. In an effort to close the energy-delay gap between CMOS and spintronics, device concepts such as CoMET (Composite-Input Magnetoelectric-based Logic Technology) have been introduced, which collectively leverage material phenomena such as the spin-Hall effect and the magnetoelectric effect to enable fast, energy-efficient device operation. In this work, we propose a non-volatile flip-flop (NVFF) based on CoMET technology that is capable of achieving up to two orders of magnitude less write energy than CMOS. This low write energy (~2 aJ) makes our CoMET NVFF especially attractive to architectures that require frequent backup operations, e.g., energy-harvesting non-volatile processors.
18:30 End of session
Exhibition Reception in Exhibition Area

The Exhibition Reception will take place on Tuesday in the exhibition area, where free drinks will be offered to all conference delegates and exhibition visitors. All exhibitors are also welcome to provide drinks and snacks for attendees.