Characterizing Power Delivery Systems with On/Off-Chip Voltage Regulators for Many-Core Processors

Xuan Wang, Jiang Xu, Zhe Wang, Kevin J. Chen, Xiaowen Wu, Zhehui Wang
Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology
Email: {eexwang, jiang.xu, eekjchen}@ust.hk

Abstract—The design of the power delivery system has a great influence on power management in many-core processor systems. Moving voltage regulators from off-chip to on-chip is attracting growing interest in power delivery system design, because on-chip regulators can provide fast voltage scaling and multiple power domains. Previous works propose power-efficient implementations of on-chip regulators. It is also important to analyze the characteristics of the entire power delivery system in order to explore the tradeoff between the promising properties and the costs of employing on-chip regulators. In this work, we develop an analytical model to evaluate important characteristics of the power delivery system, including on-chip/off-chip voltage regulators and the passive on-chip/on-board parasitics. Compared with SPICE simulations, our model achieves fast system-level evaluation with comparable accuracy. Based on the model, geometric programming is used to find the optimal power efficiency of different power delivery system architectures under constraints on output voltage stability and area. Experiments show that, compared with the conventional architecture using off-chip regulators, the hybrid architecture using both on-chip and off-chip voltage regulators achieves a 1.0% power efficiency improvement and a 68% area reduction of voltage regulators on average. We conclude that the hybrid architecture has potential for high power efficiency and small area at heavy workload, but the overhead of on-chip regulators must be carefully accounted for.

I. INTRODUCTION

Due to power dissipation limits, many-core processor systems have become a promising way to improve performance and power efficiency beyond what feature size scaling alone can offer. The power delivery system is a key subsystem of such processors, and voltage regulators are its essential components. Traditionally, regulators are placed at the board level with large inductors and capacitors. However, the cost and size of off-chip regulators severely limit their use for multiple power domains. The large inductors and capacitors also slow down their feedback control. Hence, there is interest in developing fully integrated on-chip voltage regulators [1].

In recent years, there has been a surge of interest in implementing on-chip integrated voltage regulators [1] [2]. Owing to its high switching frequency, an on-chip regulator allows the filter components to be integrated entirely on chip or on package. It is able to provide fast voltage scaling and multiple on-chip power domains. However, these potential benefits are tempered by the lower power efficiency caused by the high switching frequency and low-quality inductors, and by the increased susceptibility to load transients. The design of voltage regulators is difficult because of the wide design space. Modeling and optimization provide an effective way to find the optimal design variables and to investigate the characteristics of voltage regulators [3] [4]. Previous works focus on the implementation of on-chip voltage regulators. It is also important to characterize the entire power delivery system, including on/off-chip regulators and the passive on-chip/on-board parasitics, and to explore the tradeoff between the promising characteristics and the costs of using on-chip regulators in many-core processor systems.

In this paper, an analytical model of the power delivery system is proposed to investigate important characteristics, e.g. power efficiency and load transient response. Compared with SPICE simulations, it achieves fast evaluation with comparable accuracy. Based on our model, the characteristics of different power delivery system architectures are optimized and compared under design constraints. The hybrid architecture shows the potential for high power efficiency and small area, but the cost of on-chip regulators must be carefully accounted for.

II. MODELING OF POWER DELIVERY SYSTEM

With the development of on-chip voltage regulators, engineers have more choices for building customized power delivery systems for many-core processors. On-chip regulators are recommended for power delivery of many-core processors, because they can provide multiple power domains and fast voltage scaling. The conventional design, which uses only off-chip voltage regulators, steps the power supply voltage directly down to the core voltage. In contrast, a multi-stage power delivery system using both off-chip and on-chip regulators, shown in Fig. 1, is becoming promising. Off-chip regulators perform the initial step-down to an intermediate voltage. The intermediate power supply then drives on-chip regulators, which further step the voltage down to the core voltage. The analysis and comparison of different power delivery systems are discussed in Section IV.

Tight steady-state and dynamic tolerance requirements pose a big challenge for powering high-performance processors. The interleaved multi-phase buck converter has become popular for supplying high-current processors [5]. Each phase of the regulator is implemented with a fixed switching frequency and pulse width modulation, which eliminates undesirable noise in certain frequency bands. Type-III feedback control is adopted. Identical phases of the regulator operate in parallel with a common output capacitance, as shown in Fig. 2. By applying a 360°/N phase difference between adjacent phases, where N is the number of parallel phases, the output ripple can be canceled out while a fast transient response is maintained. Because we focus on high-performance many-core processors, continuous-mode operation of the regulators is assumed.
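The ripple cancellation obtained from interleaving can be illustrated numerically. The sketch below is not part of the original design flow: it assumes idealized, zero-mean triangular inductor ripple normalized to 1 A peak-to-peak per phase, and the helper names `phase_ripple` and `interleaved_peak_to_peak` are hypothetical.

```python
def phase_ripple(t, duty):
    """Zero-mean triangular ripple of one phase (1 A peak-to-peak).

    t is time in units of the switching period; duty is the duty ratio.
    """
    t %= 1.0
    if t < duty:
        # Switch on: inductor current ramps up.
        return t / duty - 0.5
    # Switch off: inductor current ramps down.
    return 0.5 - (t - duty) / (1.0 - duty)


def interleaved_peak_to_peak(n_phases, duty, steps=1000):
    """Peak-to-peak of the summed ripple of n_phases shifted by 360/N degrees."""
    totals = [
        sum(phase_ripple(k / steps + p / n_phases, duty) for p in range(n_phases))
        for k in range(steps)
    ]
    return max(totals) - min(totals)


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, interleaved_peak_to_peak(n, duty=0.3))
```

In this idealized model, two phases at a duty ratio of 0.5 cancel the summed ripple completely; at other duty ratios the cancellation is only partial.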

Power efficiency is one of the most important features of regulators, and it directly influences the power efficiency of the entire power delivery system. Generally speaking, there are several important kinds of power loss, e.g. the switching loss and resistive loss of the power bridge, the switching loss of the bridge driver circuit, the conduction loss of the inductor, the power of the control circuit, and the static power [6]. The power losses in one phase are estimated as follows:

\[
P_{\text{driver}} = (C_L + C_G)V_{\text{driver}}^2 f_{\text{sw}}
\]

(1)

\[
P_{\text{bridge}} = (D R_{ds,p} + (1 - D)R_{ds,n}) (I_{\text{ind}}^2 + \frac{\Delta I_{\text{ind}}^2}{12})
\]

(2)

\[
P_{\text{ind}} = R_{\text{ind}} (I_{\text{ind}}^2 + \frac{\Delta I_{\text{ind}}^2}{12})
\]

(3)

\[
P_{\text{control}} = I_{\text{per}} V_{\text{driver}}
\]

(4)

where \( C_L \) and \( C_G \) are the parasitic load capacitances of the bridge and drivers, and \( f_{\text{sw}} \) is the switching frequency. \( V_{\text{driver}} \) is the supply voltage of the drivers and control logic. \( R_{ds,p} \) and \( R_{ds,n} \) are the on-resistances of the PMOS and NMOS transistors of the bridge, respectively, and \( R_{\text{ind}} \) is the resistance of the inductor. \( D \) is the duty ratio of the gate signal. \( I_{\text{per}} \) stands for the load current of the control circuit per phase. \( I_{\text{ind}} \) and \( \Delta I_{\text{ind}} \) are the average and peak-to-peak values of the inductor current. The detailed device models are discussed in Section III for efficient design space exploration. The other variables can be derived according to the principles of buck converters.

\[
D = \frac{V_{\text{out}}}{V_{\text{in}}} \cdot \frac{R_{\text{ind}} + D R_{ds,p} + (1 - D) R_{ds,n} + R_{\text{load}}}{R_{\text{load}}}
\]

(5)

\[
\Delta I_{\text{ind}} = \frac{(V_{\text{in}} - V_{\text{out}}) D}{f_{\text{sw}} L_{\text{ind}}}
\]

(6)

Here \( L_{\text{ind}} \) is the filter inductance of one phase.
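The per-phase loss estimates can be collected into a short numerical sketch. It assumes that Eq. 5 is an implicit relation in \( D \) with \( R_{\text{load}} \) in the denominator, solved here by fixed-point iteration, and that the ripple current in Eq. 6 is divided by the phase inductance. All component values and function names are hypothetical and chosen only for illustration.

```python
def duty_ratio(v_out, v_in, r_ind, r_ds_p, r_ds_n, r_load, iterations=50):
    """Solve the implicit duty-ratio relation (Eq. 5) by fixed-point iteration."""
    d = v_out / v_in  # ideal lossless converter as the starting point
    for _ in range(iterations):
        r_series = r_ind + d * r_ds_p + (1.0 - d) * r_ds_n
        d = (v_out / v_in) * (r_series + r_load) / r_load
    return d


def ripple_current(v_in, v_out, d, f_sw, l_ind):
    """Peak-to-peak inductor current ripple (Eq. 6)."""
    return (v_in - v_out) * d / (f_sw * l_ind)


def phase_losses(c_l, c_g, v_driver, f_sw, d, r_ds_p, r_ds_n, r_ind,
                 i_ind, delta_i, i_per):
    """Per-phase loss terms of Eqs. 1-4, in watts."""
    i_rms_sq = i_ind ** 2 + delta_i ** 2 / 12.0
    return {
        "driver": (c_l + c_g) * v_driver ** 2 * f_sw,
        "bridge": (d * r_ds_p + (1.0 - d) * r_ds_n) * i_rms_sq,
        "inductor": r_ind * i_rms_sq,
        "control": i_per * v_driver,
    }


# Hypothetical single-phase operating point (illustrative values only).
V_IN, V_OUT, R_LOAD = 1.8, 1.0, 0.25
R_DS_P, R_DS_N, R_IND = 0.02, 0.01, 0.02
F_SW, L_IND = 100e6, 10e-9

d = duty_ratio(V_OUT, V_IN, R_IND, R_DS_P, R_DS_N, R_LOAD)
i_ind = V_OUT / R_LOAD
delta_i = ripple_current(V_IN, V_OUT, d, F_SW, L_IND)
losses = phase_losses(100e-12, 100e-12, 1.8, F_SW, d, R_DS_P, R_DS_N, R_IND,
                      i_ind, delta_i, i_per=1e-3)
p_out = V_OUT * i_ind
efficiency = p_out / (p_out + sum(losses.values()))
print(f"D = {d:.3f}, ripple = {delta_i:.3f} A, efficiency = {efficiency:.1%}")
```

Static power is omitted here because the text gives no expression for it; it would simply add one more term to the loss sum.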

The direct path loss is neglected, because dead-time control for gating signals usually makes it negligible [7]. Besides the power efficiency, there are other important characteristics, e.g. the area and output voltage stability. Output ripple is one criterion of the output voltage stability. Assuming that all of the inductor currents flow through the filter capacitor \( C_{\text{out}} \), the output ripple decreases due to the interleaving technique [8].

\[
\Delta V_{\text{out,ripple}} = \frac{\Delta I_{\text{ind}}}{8 f_{\text{sw}} C_{\text{out}}} \frac{0.25}{D(1 - D)} \frac{1}{N^2}
\]

(7)
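As a small illustration of Eq. 7, the function below evaluates the ripple estimate exactly as the equation is written. The function name and the example values are hypothetical.

```python
def output_ripple(delta_i_ind, f_sw, c_out, duty, n_phases):
    """Output voltage ripple estimate of Eq. 7, in volts."""
    single_phase = delta_i_ind / (8.0 * f_sw * c_out)
    duty_factor = 0.25 / (duty * (1.0 - duty))
    return single_phase * duty_factor / n_phases ** 2


# Hypothetical values: 0.4 A ripple current, 100 MHz, 50 nF output capacitor.
for n in (1, 2, 4):
    ripple_v = output_ripple(0.4, 100e6, 50e-9, duty=0.5, n_phases=n)
    print(n, ripple_v)
```

Under this estimate, doubling the number of phases reduces the ripple by a factor of four, so a smaller output capacitor can meet the same ripple limit.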

Load transient response is another important criterion of output voltage stability. If a regulator is integrated on-chip, the output voltage drops much more in response to a load current step [1], because the on-chip output capacitor is much smaller than that of an off-chip regulator. The voltage drop is estimated in Eq. 8 [9]. When the response of the feedback control is sluggish, the worst-case voltage drop tends to approach the open-loop value. \( \alpha \) is a user-specified empirical factor that brings the open-loop estimate into agreement with the actual voltage drop. It is about 0.6 for voltage-mode feedback control at switching frequencies of hundreds of MHz [2].

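\[
\Delta V_{\text{out,tr}} = \alpha \cdot \Delta I_{\text{out,tr}} \cdot \sqrt{\frac{L_{\text{ind}}}{C_{\text{out}}}}
\]

(8)

Here \( \Delta I_{\text{out,tr}} \) is the load current step, and \( L_{\text{ind}} \) and \( C_{\text{out}} \) are the filter inductance and output capacitance.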
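The transient estimate can be turned into a quick sizing check. This sketch reads Eq. 8 as the load current step multiplied by the characteristic impedance \( \sqrt{L_{\text{ind}}/C_{\text{out}}} \) of the output filter and scaled by \( \alpha \); that reading, the function names and the numerical values are assumptions made for illustration.

```python
import math


def transient_droop(alpha, delta_i_load, l_ind, c_out):
    """Estimated output voltage drop for a load current step, in volts."""
    return alpha * delta_i_load * math.sqrt(l_ind / c_out)


def min_output_capacitance(alpha, delta_i_load, l_ind, v_droop_max):
    """Smallest output capacitance that keeps the droop within v_droop_max."""
    return l_ind * (alpha * delta_i_load / v_droop_max) ** 2


# Hypothetical on-chip case: 2 A load step, 10 nH inductor, 100 mV budget.
ALPHA = 0.6
c_min = min_output_capacitance(ALPHA, 2.0, 10e-9, 0.1)
print(f"minimum C_out = {c_min * 1e9:.0f} nF")
print(f"droop at C_min = {transient_droop(ALPHA, 2.0, 10e-9, c_min) * 1e3:.1f} mV")
```

Because the droop scales with the inverse square root of the capacitance, halving the allowed droop requires four times the output capacitance, which is why the transient constraint weighs heavily on on-chip area.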

In order to characterize the entire power delivery system, we also pay attention to the parasitics of the power delivery network. A ladder RLC network is used to capture the parasitics of the PC board, socket, package, and off-chip decoupling capacitors [10] [11]. The parasitics of the power delivery network are linearly scaled to be consistent with the power consumption of the processors.
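A ladder RLC network of this kind can be evaluated stage by stage. The sketch below is a simplified illustration rather than the network of [10] [11]: each stage is a series R-L branch followed by an ideal shunt capacitor, the supply is treated as an ideal voltage source, and the inductance and capacitance values are hypothetical.

```python
import math


def ladder_impedance(stages, freq_hz):
    """Impedance seen by the load, looking back toward an ideal supply.

    stages: (series_r, series_l, shunt_c) tuples ordered from supply to die.
    freq_hz must be greater than zero.
    """
    omega = 2.0 * math.pi * freq_hz
    z = 0j  # an ideal voltage source behaves as an AC short
    for series_r, series_l, shunt_c in stages:
        z += complex(series_r, omega * series_l)      # series R-L branch
        z_cap = 1.0 / complex(0.0, omega * shunt_c)   # shunt decoupling capacitor
        z = z * z_cap / (z + z_cap)                   # parallel combination
    return z


STAGES = [
    (0.1e-3, 20e-12, 1e-3),   # PCB trace and bulk decoupling (L and C hypothetical)
    (1.3e-3, 5e-12, 50e-6),   # package and package decoupling (L and C hypothetical)
]

for f in (1e3, 1e6, 1e8):
    print(f"{f:.0e} Hz: |Z| = {abs(ladder_impedance(STAGES, f)) * 1e3:.3f} mOhm")
```

At low frequency the impedance approaches the sum of the series resistances, which is the quantity that sets the conduction loss of the network.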

III. DESIGN OPTIMIZATION OF POWER DELIVERY SYSTEMS AND MODEL VALIDATION

The analytical model of the power delivery system is derived in Section II, including the voltage regulators and the parasitics of the power delivery network. Because of the wide design space, a method based on geometric programming (GP) is adopted to find the optimal regulator design [3]. Without loss of generality, the optimization of the hybrid architecture using on/off-chip regulators shown in Fig. 1 is illustrated.

Algorithm 1 Optimization of the hybrid architecture

**Require:** workload of power domains, design specs, parameters of device models, intermediate voltage levels

1: minimize $P_{in}$

2: subject to

3: $W_{min,on} \leq W_{PMOS/NMOS,on,ij} \leq W_{max,on}$

4: $f_{sw,on,min} \leq f_{sw,on,ij} \leq f_{sw,on,max}$

5: $L_{ind,on,min} \leq L_{ind,on,ij} \leq L_{ind,on,max}$

6: $\frac{V_{out,ij}}{V_{b,i}} \cdot \frac{R_{load,on,ij} + R_{ind,on,ij} + R_{pin,on,ij} + R_{ds,on,ij}}{R_{load,on,ij}} \leq D_{on,ij}$

7: $\frac{(V_{b,i} - V_{out,ij}) D_{on,ij}}{f_{sw,on,ij} L_{ind,on,ij}} = \Delta I_{ind,on,ij}$

8: $N_{on,ij} \left[ (C_{L,on,ij} + C_{G,on,ij}) V_{b,i}^2 f_{sw,on,ij} + (R_{ds,on,ij} + R_{ind,on,ij} + R_{pin,on,ij}) \left( I_{ind,on,ij}^2 + \frac{\Delta I_{ind,on,ij}^2}{12} \right) + V_{b,i} I_{per,on,ij} \right] + P_{stat,on,ij} + Z_{grid,ij} I_{out,ij}^2 + P_{out,ij} \leq V_{b,i} I_{on,ij}$

9: $\frac{\Delta I_{ind,on,ij}}{8 f_{sw,on,ij} C_{out,on,ij}} \cdot \frac{0.25}{D_{on,ij}(1 - D_{on,ij})} \cdot \frac{1}{N_{on,ij}^2} \leq V_{ripple,on,max}$

10: $\alpha_{on} \cdot \Delta I_{out,tr,on,ij} \cdot \sqrt{\frac{L_{ind,on,ij}}{C_{out,on,ij}}} \leq V_{tr,on,max}$

11: $A_{bridge,on} \sum_j (W_{PMOS,on,ij} + W_{NMOS,on,ij}) + A_{per,on} \sum_j N_{on,ij} + A_{ind,on} \sum_j L_{ind,on,ij} N_{on,ij}^2 + A_{cap,on} \sum_j C_{out,on,ij} \leq A_{on,max}$

12: $\sum_j M_{ij} I_{on,ij} \leq I_{off,i}$

13: $W_{min,off} \leq W_{PMOS/NMOS,off,i} \leq W_{max,off}$

14: $f_{sw,off,min} \leq f_{sw,off,i} \leq f_{sw,off,max}$

15: $L_{ind,off,min} \leq L_{ind,off,i} \leq L_{ind,off,max}$

16: $\frac{V_{b,i}}{V_{in}} \cdot \frac{R_{load,off,i} + R_{ind,off,i} + R_{ds,off,i}}{R_{load,off,i}} \leq D_{off,i}$

17: $\frac{(V_{in} - V_{b,i}) D_{off,i}}{f_{sw,off,i} L_{ind,off,i}} = \Delta I_{ind,off,i}$

18: $N_{off,i} \left[ (C_{L,off,i} + C_{G,off,i}) V_{driver}^2 f_{sw,off,i} + (R_{ds,off,i} + R_{ind,off,i} + R_{pin,off,i}) \left( I_{ind,off,i}^2 + \frac{\Delta I_{ind,off,i}^2}{12} \right) + V_{driver} I_{per,off,i} \right] + P_{stat,off,i} + (Z_{package} + Z_{PCB}) I_{off,i}^2 + V_{b,i} I_{off,i} \leq V_{in} I_{in,i}$

19: $\frac{\Delta I_{ind,off,i}}{8 f_{sw,off,i} C_{out,off,i}} \cdot \frac{0.25}{D_{off,i}(1 - D_{off,i})} \cdot \frac{1}{N_{off,i}^2} \leq V_{ripple,off,max}$

20: $\alpha_{off} \cdot \Delta I_{out,tr,off,i} \cdot \sqrt{\frac{L_{ind,off,i}}{C_{out,off,i}}} \leq V_{tr,off,max}$

21: $A_{bridge,off} \sum_i (W_{PMOS,off,i} + W_{NMOS,off,i}) + A_{per,off} \sum_i N_{off,i} + A_{ind,off} \sum_i L_{ind,off,i} N_{off,i}^2 + A_{cap,off} \sum_i C_{out,off,i} \leq A_{off,max}$

22: $\sum_i I_{in,i} \cdot V_{in} \leq P_{in}$

A. Design Optimization of Power Delivery System

The optimization determines the design parameters and conversion ratios of the on/off-chip voltage regulators, given the supply voltage of the power delivery system $V_{in}$, the driver voltage $V_{driver}$, the load distribution of the on-chip power domains $V_{out,ij}$ and $I_{out,ij}$, the design specs, and the parameters of the device models. The assignment of on-chip regulators to off-chip regulators is based on the principle of load balance. The design specs include the maximum output ripples $V_{ripple,on,max}$ and $V_{ripple,off,max}$, the maximum transient voltage drops $V_{tr,on,max}$ and $V_{tr,off,max}$, the area constraints $A_{on,max}$ and $A_{off,max}$, and the bounds of the design parameters. In order to apply GP, the device models have to be GP-compatible. Our model leverages the transistor models in [3] and the inductor model in [6]. All parameters are found for each process technology by numerical fitting to SPICE simulation data. The drivers of the power bridge are designed with a fan-out of 4.

The optimization of power delivery systems selects not only the design parameters of the regulators, but also the conversion ratios in a multi-stage power delivery system. The intermediate voltage level, $V_b$, trades off the power efficiency between the stages, but it is incompatible with GP as an optimization variable. Therefore, for each given combination of $V_b$, the power efficiency of the power delivery system is optimized using GP as shown in Alg. 1. The formulation for optimizing on-chip regulators is described in Lines 3-11 and follows Section II. The area is constrained in Line 11, where $A_{bridge,on}$, $A_{per,on}$, $A_{ind,on}$ and $A_{cap,on}$ are the unit areas of the power bridge and drivers, control logic, inductor, and capacitor, respectively. The formulation for optimizing off-chip regulators in Lines 12-21 is similar. The optimal power efficiency under different $V_b$ is then compared to find the highest power efficiency of the system and the corresponding conversion ratios.
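The two-level search described above, an outer enumeration of the intermediate voltage with an inner optimization for each candidate, can be sketched as follows. The inner GP solve is replaced by a hypothetical stand-in, `toy_inner_optimum`, whose efficiency expressions are invented purely to give the sweep something to evaluate; they are not the model of this paper.

```python
def best_intermediate_voltage(candidates, inner_optimum):
    """Return (v_b, efficiency) for the candidate with the highest efficiency.

    inner_optimum(v_b) stands for the inner optimization at a fixed v_b
    and must return the optimal system efficiency for that voltage.
    """
    best = None
    for v_b in candidates:
        efficiency = inner_optimum(v_b)
        if best is None or efficiency > best[1]:
            best = (v_b, efficiency)
    return best


def toy_inner_optimum(v_b):
    """Placeholder for the GP solve (hypothetical expressions, illustration only)."""
    eta_off = 1.0 - 0.2 / v_b ** 2         # less off-chip current at higher v_b
    eta_on = 0.95 - 0.04 * (v_b - 1.0)     # larger on-chip conversion ratio costs more
    return eta_off * eta_on


v_b, efficiency = best_intermediate_voltage([1.2, 1.8, 2.4, 3.3], toy_inner_optimum)
print(f"best V_b = {v_b} V, efficiency = {efficiency:.3f}")
```

Keeping $V_b$ outside the inner problem means each inner solve sees it as a constant, so terms such as $V_b - V_{out}$ stay constant coefficients rather than differences of variables.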

B. Analytical Model Validation

The analytical model is compared with SPICE simulations to evaluate the accuracy of the optimized results. The parameters of the transistor, inductor and capacitor models are discussed in Section IV. According to the specs in Table I, we optimize the design parameters of a two-phase interleaved buck converter in a 130nm technology [12]. The SPICE model is then built with the optimized design parameters. Table I compares the SPICE simulation and our model in terms of power efficiency, output ripple, and transient voltage drop due to a load step of 50% of the load current. The optimized results based on the analytical model match the SPICE simulations well, within 5% difference.

IV. QUANTITATIVE ANALYSIS AND COMPARISON OF DIFFERENT POWER DELIVERY SYSTEMS

The analytical model provides a fast and accurate evaluation of the characteristics of power delivery systems. The evaluation and comparison of different architectures will be conducted to explore the tradeoff of using on-chip regulators.

The transistor models of the power bridge of off-chip regulators are estimated according to [13], and those of the control logic and drivers use a 1.5µm process [12]. The transistors of on-chip regulators are implemented in a 130nm process [12]. The parameters of the inductor and capacitor of on-chip regulators are derived based on [2]. The bulky inductor and capacitor of off-chip regulators are estimated from commercial products. The overhead of the control logic is estimated based on [2] [3].

TABLE I. Comparison between SPICE simulation and the analytical model

<table>
<thead>
<tr>
<th>$V_{out}/V_{in}$ (V/V)</th>
<th>$R_{load}$ (Ω)</th>
<th>SPICE Efficiency</th>
<th>Model Efficiency</th>
<th>SPICE $V_{ripple}$ (mV)</th>
<th>Model $V_{ripple}$ (mV)</th>
<th>SPICE $V_{tr}$ (mV)</th>
<th>Model $V_{tr}$ (mV)</th>
</tr>
</thead>
<tbody>
<tr>
<td>1/1.8</td>
<td>0.25</td>
<td>86.4%</td>
<td>85.2%</td>
<td>9.0</td>
<td>8.9</td>
<td>104</td>
<td>100</td>
</tr>
<tr>
<td>1/1.8</td>
<td>0.125</td>
<td>86.4%</td>
<td>87.3%</td>
<td>8.3</td>
<td>7.9</td>
<td>100</td>
<td>99</td>
</tr>
<tr>
<td>1/2.2</td>
<td>0.25</td>
<td>84.1%</td>
<td>84.7%</td>
<td>1.2</td>
<td>1.3</td>
<td>97</td>
<td>100</td>
</tr>
<tr>
<td>1/2.2</td>
<td>0.125</td>
<td>84.5%</td>
<td>85.1%</td>
<td>0.6</td>
<td>0.7</td>
<td>97</td>
<td>100</td>
</tr>
</tbody>
</table>

The core voltages and currents of high-performance processors are approaching 1V and 130A [5]. As a case study, the power delivery system is designed to support a 64-core homogeneous processor with 128W average power consumption, stepping down from 12V to 1V. The driver voltage of the off-chip regulators is 5V. The parasitic resistance of the on-chip power grid is 0.1mΩ for the whole processor, and those of the package and the PCB trace are 1.3mΩ and 0.1mΩ, respectively [10]. The maximum output ripple is 10% of the output voltage. The maximum transient voltage drop is also 10%, for a load step of 50% of the load current [2]. The maximum areas of the on-chip and off-chip regulators are 110mm² and 2700mm², respectively [14].

The design parameters of the different architectures are optimized at a workload of 128W to maximize the power efficiency under the constraints on output ripple, transient voltage drop and area. Different configurations of the same architecture are also evaluated, e.g. the hybrid architecture with 1 off-chip regulator and 8 on-chip regulators per off-chip regulator, labeled as hybrid 1*8 – domain. The power efficiency curves of the optimized power delivery systems are shown in Fig. 3. The hybrid architecture achieves a flat power efficiency curve over a large range of workloads.

The conduction loss in the package and PCB plays an important role. It increases quadratically with the workload, and it cannot be alleviated by off-chip regulators. The hybrid architecture, in contrast, decreases this loss by reducing the current flowing off-chip, and improves the power efficiency by decreasing the conversion ratio of the on-chip regulators. Compared with the conventional architecture, the hybrid one achieves a 1.0% power efficiency improvement and a 68% area reduction of voltage regulators on average at 128W workload. The power efficiency of different configurations of the same architecture is similar, because they support a homogeneous processor.
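The quadratic dependence of the package and PCB conduction loss on the delivered current can be illustrated with a back-of-the-envelope calculation. The resistances are the package and PCB values quoted in this section; the 2V intermediate voltage is a hypothetical example, and the sketch ignores the losses of the on-chip regulators, which would raise the current drawn at the intermediate voltage.

```python
def delivery_conduction_loss(power_w, voltage_v, resistance_ohm):
    """I^2 * R loss when power_w is delivered at voltage_v through resistance_ohm."""
    current = power_w / voltage_v
    return current ** 2 * resistance_ohm


R_PATH = 1.3e-3 + 0.1e-3  # package plus PCB trace resistance, in ohms

# Conventional: the full core current crosses the package at 1 V.
loss_conventional = delivery_conduction_loss(128.0, 1.0, R_PATH)
# Hybrid: the same power crosses the package at a hypothetical 2 V.
loss_hybrid = delivery_conduction_loss(128.0, 2.0, R_PATH)

print(f"conventional: {loss_conventional:.2f} W, hybrid: {loss_hybrid:.2f} W")
```

Doubling the voltage at which power crosses the package halves the current and therefore cuts this loss term to one quarter, which is the effect the hybrid architecture exploits.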

The hybrid architecture also relaxes the design requirements of the off-chip regulators by decreasing their load [14]. Fig. 4 shows the minimum area of the regulators in different architectures under different power efficiency constraints at 128W workload. Compared with the conventional architecture, the regulator area of the hybrid architecture is reduced by 82% to 85% as the power efficiency requirement becomes tighter.

V. CONCLUSION

In this paper, an analytical model is proposed to quickly evaluate important characteristics of the power delivery system with accuracy comparable to SPICE simulations. Based on our model, geometric programming is used to optimize the power efficiency of different architectures, and to explore the tradeoff between the promising characteristics and the costs of using on-chip regulators in many-core processor systems. We conclude that the hybrid architecture using both on-chip and off-chip regulators has potential for high power efficiency and small area, but the overhead of on-chip regulators must be carefully accounted for.

REFERENCES