# Low-Jitter Active Deskewing Through Injection-Locked Resonant Clocking

# Zheng Xu and K. L. Shepard

Columbia University, Department of Electrical Engineering, New York, New York, USA. Email: zheng@cisl.columbia.edu, shepard@ee.columbia.edu

# **Abstract**

Active deskewing is an important technique for managing variability in clock distributions but introduces latency and power-supply-noise sensitivity to the resulting networks. In this paper, we demonstrate how active deskewing can be achieved with resonant distributions without introducing significant jitter. The prototype network operates at a nominal 2-GHz frequency in a 0.18-$\mu$m CMOS technology with more than 25 pF/mm² of clock loading.

## 1. Introduction

Global clock distributions for large-scale digital chips have traditionally been implemented with balanced trees or tree-driven grids. Rendering these networks low-skew and low-jitter in the presence of process, voltage, and temperature (PVT) variations has become an increasingly difficult task. Static mismatches can lead to significant clock skew between different regions of the clock network. This skew is mitigated by the presence of a clock grid, but rigid, dense grids introduce significant clock loading and consume wiring resources, favoring skew compensation through the driving tree. Active deskewing techniques [1] have been effectively applied to trees to improve immunity to variability but add latency to the clock distributions, rendering traditional clock networks more susceptible to power-supply-induced jitter.

Resonant clock distributions[2, 3] have demonstrated dramatically reduced power-supply-noise sensitivity over traditional clock distributions while consuming significantly less power. In this work, we demonstrate that resonant clock networks can incorporate active deskewing without introducing significant power-supply-noise sensitivity. We prototype a large-area resonant global clock distribution that incorporates robust active deskewing. Automatic amplitude control assures full-rail operation with minimal energy consumption. Both control loops benefit from nearly all-digital implementations.

This paper is organized as follows. In Section 2, we describe the general design issues associated with active deskewing of resonant distributions with an injection-lock source, including residual skew and power-supply-noise sensitivity. Section 3 describes the details of the prototype system. Measurement results are presented in Section 4. Section 5 concludes.

## 2. Resonant clock networks with active deskewing

Fig. 1 shows the general configuration of an injection-locked resonant clock distribution. A distributed differential oscillator (DDO) resonant network is assumed [3]. This consists of a differential clock grid distributing two phases ($\phi$ and $\overline{\phi}$), rendered resonant with symmetric inductors connecting the two phases and distributed throughout the grid. Gain elements, to compensate for losses and sustain oscillation, are also distributed through the grid. Injection locking is achieved with a differential injection-locked *tree* incorporating digitally controlled delay lines (DCDLs), which entrains this resonant grid. The strength of this injection-lock source is also digitally programmable at the final buffer stage, with an injection strength given by $S_c = I_{inj}/(I_{inj} + I_{ind})$. $I_{inj}$ is the peak current supplied to the clock network by these final buffer stages and $I_{ind}$ is the peak current flowing through the spiral inductors in that region of the grid.

Fig. 1. Differential resonant clock distribution system with active deskewing.

This work was supported by the MARCO/DARPA C2S2 Focus Center (www.c2s2.org) and by the SRC-GRC.

Due to capacitive loading variation or gain-element variability, the natural resonant frequency of one sector of the clock grid may be significantly different from other areas of the chip; skew in the clock network will result. The DCDLs introduced into the injection-lock tree form part of the deskewing loop that also includes phase detectors that compare phases between adjacent regions of the grid (shown in Fig. 1). One of the clock regions is used as a reference region. The phase of each non-reference region is compared with the phase of the neighbor closest to the reference. The variable delay on the injection source to each non-reference region is adjusted to bring its phase closer to that of the neighbor against which it is being compared until alignment is achieved.

In the case of only two regions, a discrete time-domain analysis finds that the phase of the non-reference region at time step $k+1$, $\phi[k+1]$, is given by the following, where $\phi_{ref}$ is the phase of the reference region:

$$\phi[k+1] = \phi[k] + \alpha \cdot \operatorname{sgn}\left(\phi_{ref} - \phi[k]\right) \tag{1}$$

Here $\alpha$ is the phase adjustment that results from a single-step correction of the DCDL. The resulting system is first-order, similar to a delay-locked loop. Since clock regions share a common grid, which also influences the phase $\phi$, a phase shift of $K\alpha$ ($K > 1$) is needed in the injection clock in order to achieve a phase shift of $\alpha$ in the resonant clock. As a result, deskewing causes the phase difference across the final (inverting) injection buffer to become less than $\pi$, reducing the effective injection strength and resulting in static power dissipation in this driver. Large $K$ values and large skews in the injection-lock tree should consequently be avoided in the design of the tree, grid, and DCDLs. Because of the quantization of the phase correction, dithering will occur in lock, limited to $\pm \alpha T_i/2\pi$, where $T_i$ is the period of the injected clock reference.
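As a rough illustration of Eqn. 1, the first-order loop can be iterated directly. The sketch below is a behavioral model only: the step size, the starting phase, and the helper names `sgn` and `simulate_deskew` are illustrative assumptions rather than values or circuits from the test chip. The 500-ps period corresponds to the nominal 2-GHz clock.

```python
import math

def sgn(x):
    """Sign function modeling the binary (bang-bang) phase detector."""
    return (x > 0) - (x < 0)

def simulate_deskew(phi0, phi_ref, alpha, steps):
    """Iterate Eqn. 1: phi[k+1] = phi[k] + alpha * sgn(phi_ref - phi[k])."""
    phi = [phi0]
    for _ in range(steps):
        phi.append(phi[-1] + alpha * sgn(phi_ref - phi[-1]))
    return phi

# Illustrative values only (assumptions, not measured data).
alpha = 0.02                 # phase step per correction, in radians
period_ps = 500.0            # injected clock period T_i at 2 GHz
trace = simulate_deskew(phi0=0.31, phi_ref=0.0, alpha=alpha, steps=40)

locked = trace[20:]          # samples taken after the loop has reached lock
max_err = max(abs(p) for p in locked)

# Dither limit alpha * T_i / (2*pi), converted from phase to time.
dither_limit_ps = alpha * period_ps / (2.0 * math.pi)
```

With these inputs the phase steps toward the reference and then toggles around it, with an error in lock no larger than one step $\alpha$, which is the dithering behavior described in the text.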

Residual skew due to dithering. Consider the general m-by-n injection-locked clock grid shown in Fig. 1. Because of this dithering, residual skew will exist in the network in lock. For this distributed system, the residual skew increases with distance from the reference region. Let $(1,1)$ in Fig. 1 be the reference region. The probability distribution function (PDF) for the residual phase difference between $(1,1)$ and $(i,j)$, $\Delta \phi = \phi_{(i,j)} - \phi_{ref}$, can be approximated as the uniform sum distribution for $i+j-2$ random numbers, where $i+j-2$ is the path length between the two regions. Each of these random numbers is uniformly distributed between $-\alpha$ and $\alpha$ with zero mean. $\Delta \phi$ is bounded by $-\alpha(i+j-2) < \Delta \phi < \alpha(i+j-2)$, while the PDF is given by [4]:

$$P(\Delta\phi) = \frac{1}{4\alpha\,(i+j-3)!} \sum_{k=0}^{i+j-2} (-1)^k \binom{i+j-2}{k} \left( \frac{\Delta\phi}{2\alpha} + \frac{i+j-2}{2} - k \right)^{i+j-3} \operatorname{sgn}\left( \frac{\Delta\phi}{2\alpha} + \frac{i+j-2}{2} - k \right) \tag{2}$$

Residual phase error can be estimated by calculating the expectation value from this PDF:

$$E\{|\Delta\phi|\} = 2\int_{0}^{\alpha(i+j-2)} u\,P(u)\,du \tag{3}$$
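The expectation value of the residual phase error is straightforward to evaluate numerically. The sketch below assumes, as stated above, that the residual phase is the sum of $n = i+j-2$ independent variables uniform on $(-\alpha, \alpha)$. It writes the uniform sum density with a prefactor chosen so that the density integrates to one, and integrates with a simple midpoint rule. The function names and the sample count are illustrative assumptions; the 0.70-ps step is the resonant clock step reported in the measurements.

```python
import math

def uniform_sum_pdf(dphi, n, alpha):
    """PDF of the sum of n independent Uniform(-alpha, alpha) variables."""
    x = dphi / (2.0 * alpha) + n / 2.0      # map onto the standard uniform sum variable
    total = 0.0
    for k in range(n + 1):
        d = x - k
        sign = (d > 0) - (d < 0)
        total += (-1) ** k * math.comb(n, k) * d ** (n - 1) * sign
    return total / (4.0 * alpha * math.factorial(n - 1))

def expected_abs_skew(n, alpha, samples=2000):
    """Midpoint-rule estimate of E{|dphi|} = 2 * integral_0^{n*alpha} u P(u) du."""
    h = n * alpha / samples
    acc = 0.0
    for s in range(samples):
        u = (s + 0.5) * h
        acc += u * uniform_sum_pdf(u, n, alpha)
    return 2.0 * acc * h

# Expected residual error versus path length n = i + j - 2.
alpha_ps = 0.70
for n in range(1, 7):
    print(n, expected_abs_skew(n, alpha_ps))
```

For a single hop the result reduces to $\alpha/2$, and for two hops (a triangular density) to $2\alpha/3$, which gives a quick sanity check on the normalization.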

A more accurate estimation comes from a detailed time-domain simulation of the injection-locked grid beginning from randomly determined initial conditions. In Fig. 2, the simulated maximum residual phase error for an m-by-n grid ($\Delta \phi = \phi_{(m,n)} - \phi_{ref}$) is plotted as a function of $m+n-2$, a parameterization of system size. The approximation represented by the expectation value of Eqn. 3 is also plotted for comparison.

Fig. 2. Residual phase error magnitude predicted from detailed simulation and $E\{|\Delta\phi|\}$ using the PDF of Eqn. 2.

Fig. 3. z-domain model for injection-locking resonant clock grid.

Power-supply noise sensitivity. Active deskewing as employed in traditional clock trees [1] or tree-driven grids has the potential to introduce significant jitter into the clock network. In the case of active deskewing from an injection-lock source, however, any jitter that does exist in the injection source is attenuated by the low-pass transfer function of entrainment. The z-domain model for this is shown in Fig. 3, leading to a jitter transfer function given by [5]:

$$\frac{\sigma_{res}}{\sigma_{inj}} = \frac{S_c}{z - (1 - S_c)} \tag{4}$$

where $\sigma_{res}$ is the rms jitter of the resonant distribution and $\sigma_{inj}$ is the rms jitter of the injected clock reference. The system has a single pole at $p_0 \cong \ln(1-S_c)/T_i$.
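To make the low-pass behavior of Eqn. 4 concrete, the sketch below evaluates the magnitude of the transfer function on the unit circle, $z = e^{j 2\pi f T_i}$, and converts the pole $p_0$ into a corner frequency. It assumes one update per injected clock period ($T_i = 0.5$ ns at 2 GHz) and uses the $S_c$ range quoted for the test chip. The function names are illustrative.

```python
import cmath
import math

def jitter_gain(f_hz, s_c, t_i):
    """|S_c / (z - (1 - S_c))| evaluated at z = exp(j*2*pi*f*T_i)."""
    z = cmath.exp(2j * math.pi * f_hz * t_i)
    return abs(s_c / (z - (1.0 - s_c)))

def corner_frequency_hz(s_c, t_i):
    """Corner frequency implied by the pole p0 = ln(1 - S_c) / T_i."""
    return -math.log(1.0 - s_c) / (2.0 * math.pi * t_i)

t_i = 0.5e-9  # injected clock period at 2 GHz
for s_c in (0.062, 0.08, 0.145):
    f_c = corner_frequency_hz(s_c, t_i)
    print(f"S_c={s_c}: corner ~ {f_c / 1e6:.1f} MHz, "
          f"gain at corner = {jitter_gain(f_c, s_c, t_i):.3f}")
```

The gain is unity at DC and falls to roughly $1/\sqrt{2}$ at the corner frequency. A larger $S_c$ moves the corner to a higher frequency, so more of the injected jitter passes through to the resonant clock.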

## 3. Test chip design

To test the efficacy of this deskewing approach and the resulting power-supply sensitivity of the clock network, we designed, fabricated, and tested the simple resonant clock grid shown in Fig. 4. Fig. 4(a) shows the schematic of the test chip, and Fig. 4(b) shows the die photo of the 3-mm-by-3-mm chip as fabricated in a 0.18-$\mu$m CMOS technology. In order to emulate the effects of skew across a larger distribution, the clock grid is divided into four regions (labeled A through D), the coupling between which is made deliberately weak by narrowing the grid wires between these sectors by a factor of six. The capacitance of each of these four regions can be varied from 23 pF/mm² to 28 pF/mm² through switchable MOS capacitors, allowing for both digital tuning of the natural resonance of the network and the introduction of static capacitive mismatch; 2 pF/mm² of this capacitance is wiring capacitance. The injection strength $S_c$ can be varied from 0.062 to 0.145 on the test chip.

Fig. 4. System diagram and die photo of the test chip. Different clock regions are highlighted, along with the injection tree.

Deskewing control logic and phase detectors. The control logic in each clock region is independent, deriving its own sampling clock by dividing the local resonant clock as shown in Fig. 5. Binary phase detectors are used in the deskewing control loop. To avoid wire delay mismatches in sampling the clock waveforms, phase detectors are physically placed at midpoints between the clock regions being compared. The phase detector is implemented using an SR detector latch, a metastability filter, and sampling latches. Since the detector SR latch resets when both input clocks go low, another SR latch is used to hold the result of the comparison. The output of the second SR latch is then synchronized using a D-type flip-flop.

Fig. 5. System diagram for deskewing control with phase detectors.

DCDLs, control counter, and injection buffers. The DCDLs are implemented as chains of CMOS inverters, each with a variable load controlled by a counter as shown in Fig. 6. The DCDLs have a programmable delay range of 800 ps to 1.1 ns in steps of 4.5 ps. The counter combines both binary and thermometer codes to achieve both high linearity and high dynamic range. The relatively poor power-supply noise rejection of this implementation is mitigated by the filtering properties of injection locking. The injection signal is rendered differential before it reaches the injection buffers. The injection buffers on the test chip are composed of a set of tristate buffers with programmable, three-bit binary-coded injection strength.

Fig. 6. Implementation of the control counter, DCDL and injection buffers.

Automatic amplitude control. Depending on the Q of the resonator and the strength of the negative resistance elements, when operating full-rail (voltage-limited), it is possible to supply more power to the resonator than what is needed to sustain oscillation, resulting in wasted energy and enhanced power-supply-noise sensitivity. Fig. 7 shows the block diagram of the automatic amplitude control (AAC) system designed to achieve optimal biasing of the gain elements, consisting of a peak detector, clocked comparator, counter, and control logic. Reduced-swing differential clocks can be programmed with this AAC loop. After achieving the desired amplitude, the system goes into a standby mode in which it consumes negligible power.

Fig. 7. System diagram for automatic amplitude control.
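The AAC loop can be pictured as a counter that is stepped until the comparator reports that the detected peak has reached the target amplitude. The sketch below is a purely behavioral model under stated assumptions: the counting direction (up from the weakest bias), the linear amplitude model `toy_peak_detector`, the 1.8-V rail, and the code range are all hypothetical and are not taken from the test chip.

```python
def run_aac(measure_peak, target_v, max_code):
    """Step the bias code up until the detected peak reaches the target, then stop.

    measure_peak: callable mapping a bias code to a detected peak amplitude (V).
    Returns the final code; the real loop would then enter standby.
    """
    code = 0
    while code < max_code and measure_peak(code) < target_v:
        code += 1            # one counter step per clocked comparison
    return code

def toy_peak_detector(code, vdd=1.8, volts_per_code=0.125):
    """Hypothetical amplitude model: linear in the bias code, clipped at the rail."""
    return min(vdd, volts_per_code * code)

# Full-rail operation versus a programmed reduced swing (illustrative targets).
full_rail_code = run_aac(toy_peak_detector, target_v=1.8, max_code=31)
reduced_swing_code = run_aac(toy_peak_detector, target_v=1.2, max_code=31)
```

Stopping at the first code that meets the target is what keeps the gain elements from being biased more strongly than full-rail (or the programmed reduced-swing) operation requires.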

On-chip skew and jitter measurement circuits. A high-precision jitter and skew measurement system is included on chip to characterize the clock network, as shown in Fig. 8. Period jitter is measured using a circuit consisting of two delay lines nominally differing in delay by a clock period and a differential sense-amplifier flip-flop [6]. The delay lines are constructed from self-biased differential delay elements, reducing their power-supply noise sensitivity [7]. The number of counts from the latch indicates what fraction of the data signal distribution arrives before the clock; the derivative of the resulting cumulative distribution function (CDF) yields a jitter histogram. The same circuit is modified to support skew measurement by conducting two separate jitter measurements against a fixed reference clock of the same frequency. The temporal distance between the two resulting jitter histograms is a measure of the skew between the two clock waveforms. On the test chip, the measurement circuitry is placed in the middle of the die so that it is equidistant from all four clock regions for skew measurement.

Fig. 8. System diagram for on-chip jitter and skew measurement circuits.
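The post-processing described above (differentiate the measured CDF to obtain a jitter histogram, then take the distance between two histograms as the skew) can be sketched as follows. The example assumes the CDF has been sampled on a uniform grid of time offsets; the sweep mechanism is not detailed in the text, so the grid, the synthetic Gaussian data, and the function names are all assumptions made for illustration.

```python
import math

def histogram_from_cdf(offsets, cdf):
    """Finite-difference derivative of a sampled CDF: (bin centers, normalized mass)."""
    centers = [(a + b) / 2.0 for a, b in zip(offsets, offsets[1:])]
    mass = [max(c1 - c0, 0.0) for c0, c1 in zip(cdf, cdf[1:])]
    total = sum(mass)
    return centers, [m / total for m in mass]

def mean_and_rms(centers, mass):
    """Mean position and rms width of a normalized histogram."""
    mean = sum(c * m for c, m in zip(centers, mass))
    var = sum((c - mean) ** 2 * m for c, m in zip(centers, mass))
    return mean, math.sqrt(var)

def gaussian_cdf(t, mu, sigma):
    """Synthetic stand-in for the fraction of edges arriving before offset t."""
    return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))

# Two synthetic measurements against the same reference (values in ps, illustrative).
offsets = [i * 0.25 for i in range(-200, 201)]
cdf_a = [gaussian_cdf(t, 0.0, 2.0) for t in offsets]
cdf_d = [gaussian_cdf(t, 3.0, 2.0) for t in offsets]

mean_a, rms_a = mean_and_rms(*histogram_from_cdf(offsets, cdf_a))
mean_d, rms_d = mean_and_rms(*histogram_from_cdf(offsets, cdf_d))
skew_ps = mean_d - mean_a    # temporal distance between the two histograms
```

Here the rms width of each histogram plays the role of the rms jitter, and the difference of the two means recovers the 3-ps offset built into the synthetic data.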



Fig. 9. Dynamics of deskewing, starting with a static offset, with a second offset introduced at 300ns, measured using on-chip measurement circuits.


## 4. Measurement results

The clock skew and jitter are measured on-chip (using the measurement circuits described above). Jitter is also measured off-chip (with the resonant clock buffered through an open-drain driver), allowing a comparison of the results.

Fig. 9 shows the measured dynamics of skew correction; the time scale of the sample points is set to match that of the on-chip sample clock (500 MHz). By providing region D with an initial capacitance 2 pF/mm² over that of regions A, B, and C, region D has a skew of 22 ps when driven from a balanced injection-lock source with an injection coupling strength of $S_c = 0.08$. The skew correction loop corrects this skew by t = 300 ns, at which point another capacitance offset of 1 pF/mm² is introduced, which is then also corrected. For $S_c = 0.08$, approximately 75 ps of delay in the injection-lock source is required to compensate for each 1 pF/mm² offset. Each 4.5-ps correction step in the injection clock results in a change of 0.70 ps in the resonant clock ($K = 6.4$). A residual skew of approximately 1.5 ps is observed between regions A and D in lock, which is comparable to the residual skew bound of 1.4 ps predicted in Section 2.
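The figures quoted in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the paper (step sizes, the 75 ps per pF/mm² compensation, and the 800 ps to 1.1 ns DCDL range). The one assumption is the path length of two hops between regions A and D, chosen because it reproduces the quoted 1.4-ps bound.

```python
import math

inj_step_ps = 4.5         # DCDL step in the injection clock
res_step_ps = 0.70        # resulting step in the resonant clock
period_ps = 500.0         # clock period at 2 GHz

k_factor = inj_step_ps / res_step_ps                   # ratio K between the two steps
alpha_rad = 2.0 * math.pi * res_step_ps / period_ps    # phase step alpha in radians

delay_per_pf_ps = 75.0    # injection delay per 1 pF/mm^2 of offset at S_c = 0.08
steps_per_pf = delay_per_pf_ps / inj_step_ps           # DCDL steps per 1 pF/mm^2

dcdl_range_ps = 1100.0 - 800.0                         # programmable DCDL range
dcdl_steps = dcdl_range_ps / inj_step_ps               # steps across the full range
max_offset_pf = dcdl_range_ps / delay_per_pf_ps        # full range as a pF/mm^2 offset

path_length = 2           # assumption: region D is two hops from region A
residual_bound_ps = path_length * res_step_ps          # alpha * (i + j - 2) in time

print(f"K = {k_factor:.2f}, bound = {residual_bound_ps:.2f} ps, "
      f"{steps_per_pf:.1f} steps per pF/mm^2")
```

Under these numbers the full DCDL range spans roughly 4 pF/mm² of capacitance offset, which comfortably covers the 3 pF/mm² total introduced in the experiment.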



Fig. 10. Measured jitter transfer function showing the low-pass filtering of jitter from the injection-lock source.

The low-pass response of injection locking (see Eqn. 4) is verified by the on-chip measurement of the jitter transfer function shown in Fig. 10. Jitter is artificially introduced into the injection clock, and the resulting transfer function is plotted for different injection strengths. In general, higher values of $S_c$ show slightly higher jitter levels.

Injection locking from a low-jitter source also helps to reduce jitter caused by power-supply noise in the gain elements in the resonant distribution itself [8]. Fig. 11 shows the rms period jitter measured on-chip for $S_c = 0.094$ when deskewing is enabled, when deskewing is disabled, and when the oscillator is in free-running mode (injection locking disabled). Power-supply noise is introduced through a variable-strength MOS shorting switch triggered with a full-rail square-wave input at the different frequencies shown in Fig. 11; the resulting noise amplitude is measured to be approximately 300 mV through active pico-probing. Fig. 12 shows clock jitter histograms for different power-supply noise amplitudes at a frequency of 50 MHz, comparing on-chip and off-chip measurements. The jitter numbers in Figs. 11 and 12 are slightly different for comparable frequencies, due to differences in amplitude, loading, and injection strength. In all cases, the activation of the deskewing control loop introduces negligible jitter degradation.

Fig. 11. Period rms jitter as a function of power-supply noise frequency (MHz).

Fig. 12. Jitter histogram from on-chip (a) and off-chip (b) measurements. The graph shows rms period jitter at three different levels of power supply noise.

When operating at 2 GHz, the prototype clock network (driving a total clock capacitance of 92 pF) consumes an average power of 500 mW (5.4 mW/pF): 290 mW from the gain elements, 70 mW from the last-stage injection-lock buffers (at $S_c = 0.08$), 70 mW in the rest of the injection-lock tree (including the DCDLs), and 70 mW in the remainder of the AAC and deskewing circuitry. For a conventional tree-driven grid, more than 1 W ($CV^2f$ power of the clock load and clock tree, plus crowbar currents) would be required for the same load, in addition to any power required for active deskewing.
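The power budget above can be tallied directly, and the $CV^2f$ term for the clock load alone can be estimated for comparison. The breakdown values are those quoted in the text. The supply voltage is not stated in the paper, so the 1.8 V used below is an assumption (a typical nominal supply for 0.18-$\mu$m CMOS), and the result covers only the clock load, not the clock tree or crowbar currents.

```python
# Measured breakdown of the resonant network, in mW (values from the text).
breakdown_mw = {
    "gain elements": 290,
    "last-stage injection-lock buffers": 70,
    "rest of injection-lock tree (incl. DCDLs)": 70,
    "AAC and deskewing circuitry": 70,
}
total_mw = sum(breakdown_mw.values())

c_load_pf = 92.0                    # total clock capacitance
mw_per_pf = total_mw / c_load_pf    # power normalized to the clock load

# CV^2 f of the clock load alone for a conventionally driven grid.
vdd_v = 1.8                         # ASSUMPTION: supply voltage is not given in the paper
f_hz = 2e9
cv2f_w = c_load_pf * 1e-12 * vdd_v ** 2 * f_hz

print(f"total = {total_mw} mW, {mw_per_pf:.2f} mW/pF, "
      f"load-only CV^2f = {cv2f_w:.3f} W")
```

Under the assumed supply, the load alone accounts for roughly 0.6 W of $CV^2f$ power before the clock tree and crowbar currents are added, which is consistent in magnitude with the figure of more than 1 W quoted for a conventional tree-driven grid.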

## 5. Conclusions

We have demonstrated how active deskewing can be applied to a resonant clock distribution without the power-supply-noise jitter degradation associated with these techniques in traditional clock distributions. Automatic amplitude control ensures energy-optimal operation. Aggressive on-chip jitter and skew measurement circuits are employed for characterization.

# References

[1] Mahoney, P., et al., ISSCC, pp. 292-293, 2005.

[2] Chan, S. C., Shepard, K. L., Restle, P. J., JSSC 40, pp. 102-109, 2005.

[3] Chan, S. C., Shepard, K. L., Restle, P. J., ISSCC, pp. 518-519, 2005.

[4] Weisstein, Eric W., "Uniform Sum Distribution," MathWorld, http://mathworld.wolfram.com/UniformSumDistribution.html.

[5] Lee, E., Dally, W., JSSC 38, pp. 614-620, 2003.

[6] Jenkins, K. A., Jose, A., Heidel, D. F., ESSCIRC, pp. 157-160, 2005.

[7] Maneatis, J., JSSC 31, pp. 1723-1732, November 1996.

[8] Mesgarzadeh, B., Alvandpour, A., ISCAS, vol. 6, pp. 5464-5468, 2005.
