Comparative study of power-gating architectures for nonvolatile FinFET-SRAM using spintronics-based retention technology

Yusuke Shuto, Shuu’ichirou Yamamoto, and Satoshi Sugahara
Imaging Science and Engineering Laboratory, Tokyo Institute of Technology, Yokohama, Japan
shuto@isl.titech.ac.jp

Abstract—Power-gating (PG) architectures employing nonvolatile state/data retention are expected to be a highly efficient energy reduction technique for high-performance CMOS logic systems. Recently, two types of PG architectures using nonvolatile retention have been proposed. One is nonvolatile PG (NVPG), which uses nonvolatile bistable circuits such as nonvolatile SRAM (NV-SRAM) and nonvolatile flip-flops (NV-FF). In NVPG, nonvolatile retention is not used during the normal SRAM/FF operation mode. It is used only for shutdown periods that are energetically meaningful, i.e., longer than the break-even time (BET). The other architecture employs nonvolatile retention even during the normal SRAM/FF operation mode. In this type of architecture, even a short standby period can be replaced by a shutdown period, so it is called normally-off (NOF) rather than PG. In this paper, these two PG architectures for a FinFET-based high-performance NV-SRAM cell employing spintronics-based nonvolatile retention were systematically analyzed using HSPICE with a magnetoresistive-device macromodel. The NVPG architecture shows effective reduction of energy dissipation without performance degradation. In contrast, the NOF architecture causes severe performance degradation, and its energy efficiency cannot exceed that of the NVPG architecture.

Keywords— power-gating; nonvolatile SRAM; break-even time; FinFET

I. INTRODUCTION

Power-gating (PG) is the most attractive architecture for reducing static power in advanced CMOS logic systems such as microprocessors and SoCs [1]. However, state/data retention during PG requires data transfer. This transfer incurs a performance overhead and limits the energy performance of PG. Thus, nonvolatile memory circuits, such as nonvolatile caches, register files, and registers, have a great impact on highly energy-efficient PG for microprocessors and SoCs [2-6]. Although various types of nonvolatile SRAM/FF cells have been proposed [2,3,5-14], the microarchitectures for PG using nonvolatile state/data retention can be divided into two categories. Yamamoto et al. proposed a nonvolatile PG (NVPG) architecture using nonvolatile bistable circuits such as nonvolatile SRAM (NV-SRAM) and nonvolatile flip-flops (NV-FF) [7]. In NVPG, the bistable memory circuits in a logic system, such as caches, register files, and registers, are configured with these NV-SRAM and NV-FF circuits. In this type of architecture, operation is divided into a normal operation mode and a shutdown (power-off) mode (Fig. 1(a)). In the normal operation mode, nonvolatile retention is not used even when a relatively short standby period exists. In other words, the NV-SRAM and NV-FF circuits execute only ordinary SRAM/FF operations in this mode. A shutdown period may be longer than the break-even time (BET), which is an important index of the energy performance of PG and gives the minimum worthwhile shutdown period. In that case, the circuit domains (or the whole system) are shut down with nonvolatile state/data retention. In the NVPG architecture, the NV-SRAM/NV-FF cells need the ability to electrically separate the nonvolatile retention mode from the normal SRAM/FF operation mode. This separation avoids degradation of circuit performance during the normal operation mode and an increase in BET [2,5-9]. The other type of PG architecture is the so-called “normally-off (NOF)” architecture.
In this type of architecture, nonvolatile retention is employed even during the normal operation mode [10-12]. The circuit domains can therefore be powered off even during the normal operation mode; that is, normally-off operation is possible (Fig. 1(b)). However, because nonvolatile retention is used in the normal operation mode, fatal problems such as an increase in run-time energy dissipation arise owing to the high store energy of the nonvolatile memory elements. Therefore, the NOF architecture is not suitable for always-on applications of microprocessors and SoCs (e.g., wearable devices) [15]. As its name implies, it is applicable to normally-off applications such as specific microcontrollers with very long standby intervals between occasional operations. However, the energy efficiency of the NOF architecture remains controversial [4,10-12].

Fig. 1. Time evolution of power dissipation for (a) nonvolatile power-gating (NVPG) and (b) normally-off (NOF) architectures.
The NVPG and NOF architectures have been investigated individually, using different NV-SRAM/NV-FF cells with different leakage currents, noise margins, and operation speeds. In this paper, these two types of PG architectures are systematically analyzed for a high-performance FinFET-based NV-SRAM cell employing spintronics technology. In this cell, spin-transfer-torque magnetic tunnel junctions (MTJs) serve as the nonvolatile memory elements. The analysis uses HSPICE with an accurate MTJ macromodel [7], and it compares the energy performance and BET of the two architectures.

The NVPG architecture can effectively reduce energy dissipation without degrading circuit performance. In contrast, the NOF architecture has difficulty achieving effective energy reduction. It also degrades circuit performance owing to the store operation to the MTJs during the normal operation mode.

II. CIRCUIT CONFIGURATION AND SIMULATION METHOD

Figure 2 shows the circuit configuration of the FinFET-based NV-SRAM cell employing the pseudo-spin-FinFET (PS-FinFET) architecture [2-9]. The PS-FinFET architecture is a circuit that reproduces spin-transistor functions using a FinFET and an MTJ [3]. The PS-FinFETs in the cell can electrically separate the MTJs from the ordinary 6T-SRAM part during the normal SRAM operation mode [2,8]. In general, FinFET-based SRAM cells are designed by choosing the fin numbers of the constituent FinFETs rather than by optimizing channel widths. Here, the fin numbers of the load transistors, driver transistors, pass transistors, and PS-FinFETs in the NV-SRAM cell are denoted by $N_{FL}$, $N_{FD}$, $N_{FP}$, and $N_{FPS}$, respectively. The design of $N_{FL}$ and $N_{FD}$ is highly important, since it determines the occupied area and the static noise margins (SNMs) of the cell [16]. The base design of $(N_{FL}, N_{FD}) = (1,1)$ is beneficial for minimizing cell area, although it lowers cell stability. However, bias assist techniques such as wordline undervoltage can achieve sufficiently stable operation even with this aggressive design. In this study, the NV-SRAM cell with the $(N_{FL}, N_{FD}, N_{FP}, N_{FPS}) = (1,1,1,1)$ design is used. For simplicity, no bias assist technique is employed for the normal SRAM operations. The two PS-FinFETs increase the area of the NV-SRAM cell. Nevertheless, owing to the similarity in cell configuration, this area overhead would be comparable to that of dual-port SRAM cells.

For the shutdown mode, a virtual-$V_{DD}$ (virtual supply voltage) architecture is employed, as shown in Fig. 2. In this architecture, a header power switch shuts off the supply voltage to the cell. In this paper, the same cell and power-switch configuration is used for both the NVPG and NOF architectures.

The circuit operation and energy performance of the cell under the two architectures were analyzed using HSPICE with a 20-nm-technology FinFET PTM [17] and an MTJ macromodel [7]. This macromodel can closely fit the experimentally observed electrical characteristics of MTJs within an error of 1.5% [7]. The device and circuit parameters used in this study are listed in Table I. They were determined with reference to an optimized FinFET-based 6T-SRAM design [3,16] and recently reported characteristics of perpendicular CoFeB/MgO/CoFeB MTJs [18,19].

III. CELL AND ARRAY ARCHITECTURES

The normal SRAM operation mode of the NV-SRAM cell can be performed by turning off the PS-FinFETs, and thus the normal SRAM operations can be achieved in the same manner as those of standard 6T-SRAM cells. The leakage current during the normal SRAM operation mode can be minimized by applying $V_{CTRL}$, as shown in Fig. 3(a). This $V_{CTRL}$ control is also effective at reducing leakage currents during the sleep mode (low-voltage retention mode). In this study, $V_{CTRL} = 0.07V$ and 0.04V were used for the normal operation and sleep modes, respectively.

For the shutdown and wake-up modes, the cell executes the store and restore operations. The store operation is divided into two steps [9]. In the first step, $V_{SR}$ is applied to activate the PS-FinFETs, and then H-level data on the storage node (Q or QB) is stored into the MTJ connected to the node (H-store operation) by current-induced magnetization switching (CIMS). In the second step, $V_{CTRL}$ is applied with $V_{SR}$ fixed, and then L-level data of the other storage node is stored in the other MTJ (L-store operation). After these store operations, the cell can be shut down without losing its data.
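The two-step store sequence can be summarized at the logic level. The following minimal sketch is not the authors' simulation setup. It keeps only the bias values given in the text ($V_{SR}$ = 0.65 V, store $V_{CTRL}$ = 0.5 V). The step-1 CTRL bias `v_ctrl_step1` is a hypothetical placeholder. The sketch also assumes that the H-store switches the MTJ on the H node from P to AP and that the L-store switches the MTJ on the L node from AP to P, following the current labels of Figs. 3(b) and (c).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class StoreStep:
    """One step of the two-step store operation (logic-level abstraction)."""
    name: str          # "H-store" or "L-store"
    v_sr: float        # SR-line voltage [V]
    v_ctrl: float      # CTRL-line voltage [V]
    target_node: str   # storage node whose MTJ is written in this step
    mtj_after: str     # assumed MTJ state after the step ("P" or "AP")


def store_sequence(q_is_high: bool, v_sr: float = 0.65,
                   v_ctrl_store: float = 0.5,
                   v_ctrl_step1: float = 0.0) -> List[StoreStep]:
    """Return the two store steps for a cell holding Q (QB is its complement).

    v_ctrl_step1 is a hypothetical CTRL bias for step 1 (not given in the text).
    """
    h_node, l_node = ("Q", "QB") if q_is_high else ("QB", "Q")
    return [
        # Step 1: V_SR activates the PS-FinFETs; the H node drives the H-store.
        StoreStep("H-store", v_sr, v_ctrl_step1, h_node, "AP"),
        # Step 2: V_CTRL is applied with V_SR held; the L node's MTJ is written.
        StoreStep("L-store", v_sr, v_ctrl_store, l_node, "P"),
    ]


def mtj_states_after_store(q_is_high: bool) -> Dict[str, str]:
    """Map each storage node to the state of its MTJ after both steps."""
    return {s.target_node: s.mtj_after for s in store_sequence(q_is_high)}


if __name__ == "__main__":
    for step in store_sequence(q_is_high=True):
        print(step)
    print(mtj_states_after_store(True))  # {'Q': 'AP', 'QB': 'P'}
```

In this encoding, the node that held H ends with its MTJ in the high-resistance AP state. This is the information the restore operation later reads back.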

Figures 3(b) and (c) show the store currents $I_{MTJ}^{P\rightarrow AP}$ and $I_{MTJ}^{AP\rightarrow P}$ for the H-store and L-store operations as functions of $V_{SR}$ and $V_{CTRL}$, respectively. To ensure CIMS of the MTJs, the store currents need a sufficient margin. However, high store

---

**TABLE I. DEVICE AND CIRCUIT PARAMETERS FOR HSPICE SIMULATIONS**

<table>
<thead>
<tr>
<th>Device</th>
<th>Parameter</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr><td>FinFET</td><td>Channel length: $L$</td><td>20 nm</td></tr>
<tr><td></td><td>Supply voltage: $V_{DD}$</td><td>0.9 V</td></tr>
<tr><td></td><td>Fin width</td><td>15 nm</td></tr>
<tr><td></td><td>Fin height</td><td>28 nm</td></tr>
<tr><td>NV-SRAM cell</td><td>Fin numbers (load, driver, access, PS-FinFET)</td><td>(1, 1, 1, 1)</td></tr>
<tr><td></td><td>$V_{DD}$</td><td>0.9 V</td></tr>
<tr><td></td><td>$V_{CTRL}$ (store)</td><td>0.5 V</td></tr>
<tr><td></td><td>$V_{SR}$</td><td>0.65 V</td></tr>
<tr><td></td><td>$V_{QL}$</td><td>0.5 V</td></tr>
<tr><td></td><td>Supply voltage to the cell</td><td>0.9 V</td></tr>
<tr><td></td><td>$V_{half}$</td><td>0.5 V</td></tr>
<tr><td>MTJ</td><td>Critical current density: $J_C$</td><td>5 $\times$ 10$^{6}$ A/cm$^2$</td></tr>
<tr><td></td><td>Resistance-area product: RA (parallel magnetization)</td><td>2 $\Omega\cdot\mu$m$^2$</td></tr>
<tr><td></td><td>$R_P(0)$ (parallel magnetization)</td><td>6.36 k$\Omega$</td></tr>
<tr><td></td><td>$R_{AP}(0)$ (antiparallel magnetization)</td><td>12.7 k$\Omega$</td></tr>
<tr><td></td><td>Device diameter</td><td>20 nm</td></tr>
<tr><td></td><td>CIMS critical current: $I_C$</td><td>15.7 $\mu$A</td></tr>
</tbody>
</table>

---

2015 Design, Automation & Test in Europe Conference & Exhibition (DATE)

Figures 3(b) and (c), i.e., \( V_{SR} = 0.65 \) V and \( V_{CTRL} = 0.5 \) V can be chosen to ensure store currents of \( 1.5 \times I_C \).
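The MTJ entries in Table I are tied together by simple geometry: $R_P = \mathrm{RA}/A$ and $I_C = J_C A$, where $A = \pi d^2/4$ is the area of a circular junction of diameter $d$. The sketch below recomputes these quantities and the $1.5 \times I_C$ store-current target. It assumes an ideal circular junction, which Table I does not state explicitly. With $d$ = 20 nm, RA = 2 Ω·μm², and $J_C$ = 5 × 10⁶ A/cm², it reproduces $R_P(0) \approx$ 6.36 kΩ and $I_C \approx$ 15.7 μA to within rounding, giving a target store current of about 23.6 μA.

```python
import math


def mtj_parameters(diameter_nm: float = 20.0,
                   ra_ohm_um2: float = 2.0,
                   jc_a_per_cm2: float = 5e6,
                   r_ap_ohm: float = 12.7e3,
                   margin: float = 1.5) -> dict:
    """Derive MTJ resistance, TMR, and critical current from Table I values."""
    radius_um = diameter_nm * 1e-3 / 2.0   # nm -> um
    area_um2 = math.pi * radius_um ** 2    # assumed circular junction
    area_cm2 = area_um2 * 1e-8             # 1 um^2 = 1e-8 cm^2
    r_p = ra_ohm_um2 / area_um2            # R_P = RA / A
    i_c = jc_a_per_cm2 * area_cm2          # I_C = J_C * A
    return {
        "area_um2": area_um2,
        "R_P_ohm": r_p,
        "TMR": (r_ap_ohm - r_p) / r_p,     # (R_AP - R_P) / R_P
        "I_C_A": i_c,
        "I_store_target_A": margin * i_c,  # store current with margin
    }


if __name__ == "__main__":
    for key, value in mtj_parameters().items():
        print(f"{key}: {value:.4g}")
```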

When the cell wakes up from the shutdown mode, the PS-FinFETs are turned on by applying \( V_{SR} \) at the initial stage of the wake-up. As the virtual \( V_{DD} \) of the cell is pulled up, the data stored in the MTJs are automatically restored to the storage nodes of the bistable circuit. This restoration is driven by the difference in current drivability of the PS-FinFETs [9].
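The restore step can be captured by a logic-level rule. The following is a minimal sketch under an assumption that the text implies but does not spell out. As the virtual \( V_{DD} \) ramps up with the PS-FinFETs on, the storage node connected to the lower-resistance (P) MTJ sinks more current, so the cross-coupled inverters latch that node at L. The resistances are the Table I zero-bias values and are used here only to rank the two paths. The function name `restore` is hypothetical.

```python
from typing import Dict

# Table I zero-bias resistances; only their ordering matters in this sketch.
R_MTJ_OHM = {"P": 6.36e3, "AP": 12.7e3}


def restore(mtj_states: Dict[str, str]) -> Dict[str, str]:
    """Logic-level restore: the node whose MTJ has lower resistance settles at L.

    Assumption: during the virtual-V_DD ramp, the lower-resistance path pulls
    its storage node down faster, and the bistable circuit latches that state.
    """
    r_q = R_MTJ_OHM[mtj_states["Q"]]
    r_qb = R_MTJ_OHM[mtj_states["QB"]]
    if r_q == r_qb:
        raise ValueError("both MTJs are in the same state; data cannot be restored")
    q_high = r_q > r_qb
    return {"Q": "H" if q_high else "L", "QB": "L" if q_high else "H"}


if __name__ == "__main__":
    print(restore({"Q": "AP", "QB": "P"}))  # {'Q': 'H', 'QB': 'L'}
```

Because the resistance difference, rather than its absolute value, decides the outcome, a larger TMR gives the restore a larger margin.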

Figure 4 shows the virtual-\( V_{DD} \) voltage (\( V_{VDD} \)) as a function of the fin number (\( N_{FSW} \)) of the FinFET power switch per cell, for the normal SRAM operation and store operation modes. During the store operation, \( V_{VDD} \) degrades with decreasing \( N_{FSW} \). This degradation occurs because the electrical connection of the MTJs to the bistable circuit part lowers the cell impedance. \( N_{FSW} \) can therefore be chosen to ensure the store operation to the MTJs. In this study, a fin number of 7 is used to minimize the influence of the power switch (\( V_{VDD} \) retains 97% of \( V_{DD} \)). This hypothetical power-switch design is effective at extracting the nature of the NVPG and NOF architectures. In practical designs, a much smaller \( N_{FSW} \) is applicable, since a sufficient noise margin for the store operation can be obtained even for \( N_{FSW} = 1 \) [3].
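To first order, the trend in Fig. 4 follows a resistive divider between the power switch, whose on-resistance scales as 1/\( N_{FSW} \) for parallel fins, and the effective cell resistance. That cell resistance drops during the store operation, when the MTJ paths are connected. The sketch below is not the HSPICE model used in this study. The per-fin on-resistance and cell resistances are hypothetical values chosen only to illustrate the trend, and `min_fins` is a hypothetical helper that finds the smallest fin count that meets a target \( V_{VDD}/V_{DD} \) ratio.

```python
def virtual_vdd(n_fins: int, v_dd: float = 0.9,
                r_fin_on_ohm: float = 10e3,
                r_cell_ohm: float = 40e3) -> float:
    """Virtual-V_DD of one cell modeled as a resistive divider.

    Hypothetical values: r_fin_on_ohm is the on-resistance of one switch fin;
    r_cell_ohm is the effective cell resistance to ground (lower during store).
    """
    if n_fins < 1:
        raise ValueError("need at least one fin")
    r_switch = r_fin_on_ohm / n_fins  # fins in parallel
    return v_dd * r_cell_ohm / (r_cell_ohm + r_switch)


def min_fins(target_fraction: float, max_fins: int = 64, **kwargs) -> int:
    """Smallest fin count whose virtual-V_DD reaches target_fraction * V_DD."""
    v_dd = kwargs.get("v_dd", 0.9)
    for n in range(1, max_fins + 1):
        if virtual_vdd(n, **kwargs) >= target_fraction * v_dd:
            return n
    raise ValueError("target not reachable within max_fins")


if __name__ == "__main__":
    for n in (1, 3, 7):
        normal = virtual_vdd(n, r_cell_ohm=1e6)  # MTJs separated: high impedance
        store = virtual_vdd(n)                   # MTJs connected: lower impedance
        print(f"N_FSW={n}: normal {normal:.3f} V, store {store:.3f} V")
    print("fins for 97% during store:", min_fins(0.97))
```

With these illustrative numbers, the store-mode voltage is the one that limits the design, which matches the text's choice of sizing \( N_{FSW} \) for the store operation.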

The power domains examined in this study consist of NV-SRAM cell arrays with \( N \) word lines and an \( M \)-bit word length. The supply voltage for the \( M \)-bit cells connected to a common word line is managed simultaneously through the power switches. In practical cache applications, the entire NV-SRAM array consists of a number of power domains. However, the store operation does not need to be executed for the whole SRAM array. Owing to spatial and temporal data locality, only the part of the data required for restart needs to be stored. This data size could be several tens of kilobytes (kB) or less. Therefore, the NVPG and NOF architectures can be evaluated using a single power domain with an appropriate array size. Note that the NV-SRAM array needs additional driver units for the SR and CTRL lines. For simplicity, the influence of these peripheral circuits on the energy performance of the cell is excluded from the following analysis.
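The relation between the array dimensions and the domain capacity is simple bookkeeping: an \( N \times M \)-bit domain holds \( NM/8 \) bytes. The helpers below compute this capacity. They also count how many such domains would cover a given amount of restart data. The helper names are hypothetical, and the 32 kB figure in the usage example is only an example of "several tens of kB".

```python
def domain_size_bytes(n_wordlines: int, word_bits: int) -> int:
    """Capacity of an N x M-bit power domain in bytes."""
    return n_wordlines * word_bits // 8


def domains_needed(data_bytes: int, n_wordlines: int, word_bits: int) -> int:
    """Number of equal power domains needed to hold data_bytes (ceiling)."""
    size = domain_size_bytes(n_wordlines, word_bits)
    return -(-data_bytes // size)  # integer ceiling division


if __name__ == "__main__":
    for n in (32, 256, 2048):
        print(n, "word lines x 32 bit =", domain_size_bytes(n, 32), "B")
    print("domains for 32 kB with N=2048, M=32:", domains_needed(32 * 1024, 2048, 32))
```

With \( M \) = 32, the range \( N \) = 32 to 2048 gives domains of 128 B to 8 kB.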

Fig. 3. (a) Leakage current \( I_{NV} \) during the normal SRAM operation mode as a function of \( V_{CTRL} \); the leakage current of an equivalent volatile FinFET-based 6T-SRAM cell is also shown. (b) Store current \( I_{MTJ} \) as a function of \( V_{SR} \). (c) \( I_{MTJ} \) as a function of \( V_{CTRL} \) for optimized \( V_{DD} \).

Fig. 4. Virtual-\( V_{DD} \) voltage as a function of the fin number (\( N_{FSW} \)) of the FinFET power switch per cell during the normal operation and store operation modes.

Fig. 5. Sequence diagrams of benchmark testing for (a) OSR, (b) NVPG, and (c) NOF architectures.

In this study, simplified benchmark sequences are employed to extract the nature of the NVPG and NOF architectures. Figures 5(a)-(c) show sequence diagrams of benchmark testing for the ordinary SRAM (OSR), NVPG, and NOF architectures. In these sequences, \( t_{SL} \) is the duration of the sleep mode (low-voltage retention mode) for the OSR and NVPG architectures. For the NOF architecture, this sleep period is replaced by a short shutdown period of the same duration. \( t_{SD} \) is the duration of the long shutdown mode for the NVPG and NOF architectures. For the OSR architecture, it is replaced by a long sleep period of the same duration. The benchmark sequence for the NVPG architecture (Fig. 5(b)) is as follows. All the bit cells are read and written in series, and then the array enters the short sleep mode. These processes are repeated \( n_{RW} \) times. Subsequently, after the store operation to the MTJs, the array is shut down for \( t_{SD} \). Finally, the array wakes up through the restore operation. For the OSR and NOF architectures, the procedure is modified as shown in Figs. 5(a) and (c). In general, the read operation would be repeated much more often than the write operation. Nevertheless, the same repetition number is mainly used for the read and write operations for simplicity. The effect of different repetitions is also discussed briefly.
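The benchmark sequences can be written down as ordered lists of modes. The sketch below is a hypothetical reading of Fig. 5 and does not reproduce the figure itself. Its key assumption concerns the NOF sequence. Each cycle is read, write, store, short shutdown (\( t_{SL} \)), restore. The final long shutdown (\( t_{SD} \)) then extends the last short shutdown without an extra store, because the data are already in the MTJs. This assumption keeps the number of store operations equal for NVPG and NOF at \( n_{RW} = 1 \), as stated in Section IV. Durations of `None` mark operations whose length is set by the operation itself (e.g., 10 ns for a store).

```python
from typing import List, Optional, Tuple

# (mode, duration in seconds); None means "set by the operation itself".
Step = Tuple[str, Optional[float]]


def benchmark_sequence(arch: str, n_rw: int, t_sl: float, t_sd: float) -> List[Step]:
    """Hypothetical encoding of the Fig. 5 benchmark sequences."""
    if arch not in ("OSR", "NVPG", "NOF"):
        raise ValueError(f"unknown architecture: {arch}")
    if n_rw < 1:
        raise ValueError("n_rw must be at least 1")
    seq: List[Step] = []
    for _ in range(n_rw):
        seq += [("read", None), ("write", None)]
        if arch == "NOF":
            # NOF: every cycle writes back to the MTJs and powers off for t_SL.
            seq += [("store", None), ("shutdown", t_sl), ("restore", None)]
        else:
            seq.append(("sleep", t_sl))
    if arch == "OSR":
        seq.append(("sleep", t_sd))  # long sleep instead of shutdown
    elif arch == "NVPG":
        seq += [("store", None), ("shutdown", t_sd), ("restore", None)]
    else:
        # Assumption: data are already in the MTJs, so the long shutdown
        # is inserted before the final restore without another store.
        seq.insert(len(seq) - 1, ("shutdown", t_sd))
    return seq


def count(seq: List[Step], mode: str) -> int:
    """Number of occurrences of a mode in a sequence."""
    return sum(1 for m, _ in seq if m == mode)


if __name__ == "__main__":
    for arch in ("OSR", "NVPG", "NOF"):
        s = benchmark_sequence(arch, n_rw=2, t_sl=100e-9, t_sd=1e-6)
        print(arch, "stores:", count(s, "store"), s)
```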

IV. ENERGY PERFORMANCE AND BREAK-EVEN TIME

Figures 6(a)-(c) show the time variation of power consumption per cell. The data are for the 6T-SRAM cell with benchmark sequence (a) in Fig. 5 and for the NV-SRAM cell with sequences (b) and (c) in Fig. 5. The duration of the store operation to the MTJs in the NV-SRAM cell is set to 10 ns to ensure complete magnetization switching of the MTJs. (It is well known that the store time cannot easily be reduced while keeping the CIMS error rate sufficiently low: at least several ns is required, and a shorter store time requires a higher store current.) The NV-SRAM cell with the NVPG architecture can have the same read/write speed as the 6T-SRAM cell. This is because the cell does not use nonvolatile retention, and its bistable circuit part is electrically separated from the MTJs during the normal SRAM operation mode. Note that this separation is also effective at achieving large static noise margins in the normal SRAM operation mode [2,3]. In contrast, the cell executing the NOF architecture suffers from a degraded read/write cycle speed, because it executes an MTJ store (write-back) and a cell shutdown/wake-up in every cycle.

Using the above-described leakage reduction technique (\( V_{CTRL} \) control) [2], the static power of the NV-SRAM cell is comparable to that of the 6T-SRAM cell during the normal operation and sleep (\( V_{DD} = 0.7 \text{V} \)) modes, as shown in Fig. 6(c). The static power of the NV-SRAM cell during the shutdown mode can be dramatically reduced by the supercutoff technique [20], as shown in Fig. 6(c).

Figure 7(a) shows $E_{cyc}$ as a function of $n_{RW}$. Here, $E_{cyc}$ is defined as the sum of the energies for all the operation modes (i.e., read, write, sleep, store, shutdown, and restore modes) per single cycle ($n_{cyc} = 1$) of the benchmark sequences. In this calculation, $t_{SD}$ is set to zero and $t_{SL}$ is varied. For the NVPG architecture, this means that the restore operation is executed immediately after the store operation. This condition is effective for evaluating the effect of the store and restore operations. As $n_{RW}$ increases, $E_{cyc}$ for the NVPG architecture asymptotically approaches that for the OSR architecture. In other words, the effect of the store and restore operations on the energy consumption recedes with increasing $n_{RW}$. This clearly shows a remarkable advantage of the NVPG architecture. In contrast, $E_{cyc}$ for the NOF architecture increases monotonically with increasing $n_{RW}$ and is much higher than that for the OSR architecture. Note that $E_{cyc}$ of the NVPG architecture is almost the same as that of the NOF architecture for $n_{RW} = 1$, since both architectures execute the same number of store operations when $n_{RW} = 1$. Also note that these features remain unchanged when the ratio of read to write repetitions is increased to 10 or more.
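The qualitative trends in Fig. 7(a) follow from simple energy accounting along the benchmark sequences. The following is a minimal linear sketch, not the HSPICE model. It assumes hypothetical per-cell values for the read/write, store, and restore energies and for the sleep and shutdown powers, and it encodes the NOF sequence as a per-cycle store, short shutdown, and restore. With $t_{SD} = 0$, the NVPG overhead is a fixed store-plus-restore energy, so its ratio to OSR tends to 1 as $n_{RW}$ grows. The NOF overhead grows in proportion to $n_{RW}$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellEnergy:
    """Per-cell energy/power parameters (all values hypothetical)."""
    e_rw: float = 2e-15       # one read + one write [J]
    e_store: float = 2e-13    # H-store + L-store [J]
    e_restore: float = 1e-14  # restore [J]
    p_sleep: float = 5e-9     # static power in sleep mode [W]
    p_sd: float = 1e-11       # static power in shutdown mode [W]


def e_cyc(arch: str, n_rw: int, t_sl: float, t_sd: float,
          c: CellEnergy = CellEnergy()) -> float:
    """Energy of one benchmark cycle under a linear, per-cell model."""
    if arch == "OSR":
        return n_rw * (c.e_rw + c.p_sleep * t_sl) + c.p_sleep * t_sd
    if arch == "NVPG":
        return (n_rw * (c.e_rw + c.p_sleep * t_sl)
                + c.e_store + c.e_restore + c.p_sd * t_sd)
    if arch == "NOF":
        return (n_rw * (c.e_rw + c.e_store + c.e_restore + c.p_sd * t_sl)
                + c.p_sd * t_sd)
    raise ValueError(f"unknown architecture: {arch}")


if __name__ == "__main__":
    t_sl, t_sd = 100e-9, 0.0
    for n in (1, 10, 100, 1000):
        osr = e_cyc("OSR", n, t_sl, t_sd)
        print(f"n_RW={n}: NVPG/OSR={e_cyc('NVPG', n, t_sl, t_sd) / osr:.2f}, "
              f"NOF/OSR={e_cyc('NOF', n, t_sl, t_sd) / osr:.2f}")
```

The design choice here is deliberate simplicity. Every mode contributes either a fixed energy or a power multiplied by a duration, which is enough to show why amortizing a single store over many cycles favors NVPG.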

Figure 7(b) shows $E_{cyc}$ per cell as a function of $n_{RW}$ with $t_{SD} = 0$ and $t_{SL} = 100$ ns. Here, $M$ is fixed at 32 bits and $N$ is varied from 32 to 2048, so the power-domain size ranges from 128 B to 8 kB (a suitable range for cache applications). The NVPG architecture retains the advantage of the $n_{RW}$-dependent energy reduction. Note that for very small $n_{RW}$, $E_{cyc}$ for the NVPG architecture with larger $N$ ($\geq 256$) is higher than that for the NOF architecture. When the domain size is large, the store mode of the NVPG architecture takes longer. Therefore, the extra static energy dissipated during the store mode becomes prominent for the NVPG architecture when $n_{RW}$ is small.
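Why the store mode lengthens with $N$ can be illustrated under one explicit assumption. Because the supply is managed per word line, the sketch assumes that the word lines are stored one after another, each taking the 10 ns store time used in this study. Every cell in the domain then stays powered for the whole store mode. The leakage power `p_leak` is a hypothetical value. Under these assumptions, the extra static energy per cell grows linearly with $N$.

```python
def store_mode_duration(n_wordlines: int, t_store_row: float = 10e-9) -> float:
    """Store-mode length if word lines are stored one after another (assumed)."""
    return n_wordlines * t_store_row


def store_static_energy_per_cell(n_wordlines: int, p_leak: float = 5e-9,
                                 t_store_row: float = 10e-9) -> float:
    """Static energy a cell dissipates while the whole domain is being stored.

    Assumption: every cell leaks at p_leak (hypothetical) for the full store
    mode; the cell's own store-current energy is not included here.
    """
    return p_leak * store_mode_duration(n_wordlines, t_store_row)


if __name__ == "__main__":
    for n in (32, 256, 2048):
        t = store_mode_duration(n)
        e = store_static_energy_per_cell(n)
        print(f"N={n}: store mode {t * 1e6:.2f} us, extra static {e * 1e15:.1f} fJ/cell")
```

For $N$ = 2048, the sequential store alone would last about 20 μs. This suggests why the overhead is noticeable when it is amortized over only a few read/write cycles.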

Figure 7(c) shows the effect of $t_{SD}$ on $E_{cyc}$. For nonzero $t_{SD}$, a nonlinear $n_{RW}$ dependence appears for all the architectures. $E_{cyc}$ increases with increasing $t_{SD}$ owing to the leakage currents during the shutdown mode for the NVPG and NOF architectures and during the standby (sleep) mode for the OSR architecture. However, the rate of increase depends on the architecture. When $t_{SD} \gtrsim 10\ \mu s$, $E_{cyc}$ for the NVPG architecture is lower than that for the OSR architecture over the whole $n_{RW}$ range. This implies that the cell has a BET of several tens of $\mu s$. (BET is defined as the shutdown period at which the extra energy required for NVPG execution equals the static energy saved during the shutdown period; i.e., BET gives the minimum shutdown period [5,6].) $E_{cyc}$ for the NOF architecture can also become lower than that for the OSR architecture, but this condition strongly depends on $n_{RW}$, as shown in Fig. 7(c). To clarify these situations, $E_{cyc}$ is plotted as a function of $t_{SD}$ in Figs. 8(a) and (b). The intersection of the $E_{cyc}$ curves of the NVPG (NOF) and OSR architectures gives the BET of the NVPG (NOF) architecture. The NVPG architecture has a sufficiently short BET (several tens of $\mu s$). In contrast, the NOF architecture requires a much longer BET, which means its energy efficiency is lower than that of the NVPG architecture. Note that various types of NV-SRAM cells using MTJs have been proposed. Even so, it would be difficult for the NOF architecture to gain the benefit of static energy reduction, because it executes the store operation to the MTJs during the normal operation mode.
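The BET definition above can be turned into a worked calculation. Suppose NVPG pays a fixed overhead of store plus restore energy and then saves $(P_{sleep} - P_{SD})$ per unit time relative to OSR sleeping through the same period. Then $E_{NVPG} - E_{OSR} = E_{store} + E_{restore} - (P_{sleep} - P_{SD})\,t_{SD}$. BET is the $t_{SD}$ at which this difference reaches zero. The following sketch computes BET in closed form and checks it by bisection, mirroring how the intersection in Figs. 8(a) and (b) is read off. All energy and power values are hypothetical. The `store_free` flag drops the store energy to mimic the store-free shutdown discussed below.

```python
from typing import Callable

# Illustrative per-cell values (hypothetical, not taken from this study).
E_STORE, E_RESTORE = 2e-13, 1e-14  # J
P_SLEEP, P_SD = 5e-9, 1e-11        # W


def bet_closed_form(e_store: float, e_restore: float, p_sleep: float,
                    p_sd: float, store_free: bool = False) -> float:
    """BET = extra NVPG energy / static power saved per unit shutdown time."""
    if p_sleep <= p_sd:
        raise ValueError("shutdown must save power relative to sleep")
    overhead = e_restore + (0.0 if store_free else e_store)
    return overhead / (p_sleep - p_sd)


def extra_energy_nvpg(t_sd: float, store_free: bool = False) -> float:
    """E_NVPG - E_OSR as a function of the shutdown (vs. sleep) time t_sd."""
    overhead = E_RESTORE + (0.0 if store_free else E_STORE)
    return overhead + (P_SD - P_SLEEP) * t_sd


def crossing_time(f: Callable[[float], float], lo: float, hi: float,
                  iters: int = 200) -> float:
    """Bisection for the root of a decreasing f with f(lo) > 0 > f(hi)."""
    if not (f(lo) > 0 > f(hi)):
        raise ValueError("root is not bracketed")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for sf in (False, True):
        bet = bet_closed_form(E_STORE, E_RESTORE, P_SLEEP, P_SD, store_free=sf)
        bet_num = crossing_time(lambda t: extra_energy_nvpg(t, sf), 0.0, 1.0)
        print(f"store_free={sf}: BET = {bet * 1e6:.2f} us (bisection {bet_num * 1e6:.2f} us)")
```

Because BET scales with the one-time overhead, removing the store energy shortens it roughly in proportion to the store's share of that overhead.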

Fig. 7. $E_{cyc}$ as a function of $n_{RW}$.

Fig. 8. $E_{cyc}$ as a function of $t_{SD}$.

The topmost curves in Fig. 9(a) show BET as a function of $N$ for the NVPG architecture, with $n_{RW}$ varied from 10 to 1000. BET depends on $N$ and $n_{RW}$, and short or moderate BET values can be achieved for array sizes ($N \times M$) of ~10 kB. When $n_{RW}$ or $N$ is large, the normal SRAM operation mode lasts longer. In that case, BET is determined by the leakage current of the cell and is proportional to this duration [5,6]. Therefore, BET increases with increasing \( N \) or \( n_{RW} \).

In many cases, the data already stored in the MTJs of the NV-SRAM cells from a previous store operation are still the data required after the next wake-up. In this situation, the store operation can be skipped before shutdown, which dramatically reduces the store energy. This “store-free shutdown” operation [8] can reduce BET to several \( \mu \)s, as shown by the middle and bottom curves in Fig. 9(a). Note that store-free shutdown is also effective for the NOF architecture. However, it is difficult for the NOF architecture to reduce BET to the same level as the NVPG architecture. It is worth noting that the above analyses assume a relatively low read/write speed for the normal operation mode and a high \( J_C \) of the MTJs (see Table I). When the read/write speed and \( J_C \) are set to 1 GHz and \( 1 \times 10^6 \) A/cm², respectively, a much shorter BET and a larger domain …


V. SUMMARY

Two types of PG architectures (NVPG and NOF) employing nonvolatile state/data retention for the NV-SRAM cell using PS-FinFETs were systematically analyzed using HSPICE with an accurate MTJ macromodel. The comparative study of energy performance and BET revealed that the NVPG architecture was able to achieve effective reduction of the static energy dissipation without degradation of the circuit performance. This is due to the electrical separation of the normal operation mode and the nonvolatile retention mode. This important feature originates from the NV-SRAM cell structure using PS-FinFETs.

REFERENCES


