# Practical Considerations in RLCK Crosstalk Analysis for Digital Integrated Circuits \*

Steven C. Chan
Cadence Design Systems, Inc
San Jose, CA 95134
schan@cadence.com

K. L. Shepard
Department of Electrical Engineering
Columbia University, New York, NY 10027
shepard@ee.columbia.edu

#### Abstract

Inductance and inductive crosstalk have become an important new concern for on-chip wires in deep-submicron integrated circuits. Recent advances in extractors to include inductance make possible the extraction of coupled RLCK interconnect networks from large, complex on-chip layouts. In this paper, we describe the techniques we use in a commercial static noise analysis tool to analyze crosstalk noise due to fully-coupled RLCK networks extracted from layout. Notable are the approaches we use to filter and lump aggressor couplings, as well as the techniques used to handle degeneracies in the modified nodal analysis (MNA) formulation. Furthermore, the nonmonotonicity of interconnect responses in the presence of inductance requires additional "sensitizations" in searching the possible switching events inducing the worst-case noise. Comparisons with silicon indicate the need to include the substrate in the extracted models in certain cases.

#### 1 Introduction

With technology scaling, chips consist of more interconnect wires of smaller cross sections packed closer together. As a result, capacitive coupling is a significant source of on-chip noise. Crosstalk noise can result in functional failures if glitches are captured in latches and disrupt the state of digital circuits. Furthermore, aggressor coupling that is coincident with switching transitions can produce a "noise-on-delay" effect. These trends have resulted in significant effort to model and extract interconnect as coupled RC networks[1] for use in static timing and static noise analysis.

Most recently, however, inductance and inductive coupling [2, 3, 4, 5] have become important in the timing and noise analysis of a growing number of on-chip signal lines. Inductance must be included to accurately predict rise and fall times and delays in timing analysis. If an inductive net is overdriven, an underdamped ring response can be observed, which can result in functional failure in receiving circuits or produce reliability problems through gate oxide stress. Moreover, inductive coupling, along with capacitive coupling, can be a significant source of noise on quiet nets due to the switching of nearby perpetrators.

Static noise analysis [6, 7, 8] has become a mature, commercial technology for calculating noise and verifying the noise immunity of digital integrated circuits, but has so far been limited to considering only capacitive crosstalk [9, 10]. In general, development of full-chip electrical analysis tools for digital ICs that consider inductance has largely been stymied by the lack of a full-chip RLCK<sup>1</sup> extraction capability for on-chip wires. Fortunately, significant progress in this regard has been made recently [11]. This has enabled the extraction of coupled RLCK interconnect networks from large, complex on-chip layouts for analysis.

In this paper, we describe the techniques we use in a commercial static noise analysis tool to analyze crosstalk noise effects due to capacitive and inductive coupling. Notable in our interconnect analysis generally are the approaches we use to filter and lump aggressor coupling and the techniques employed to handle degeneracies in the modified nodal analysis (MNA) formulation. We also consider the unique issues associated with crosstalk analysis with inductance. In particular, we consider the additional "sensitizations" that are required because of the nonmonotonicity of interconnect responses in the presence of inductance. In Section 2, we describe how capacitively and inductively coupled wires are sifted to assemble "net complexes" for analysis. Section 3 considers the degeneracies that may exist in the traditional state-space MNA formulation and how these are handled for robust multi-input, multi-output (MIMO) reduced-order modelling. Section 4 considers the complex potential "superpositions" of switching events which must be analyzed to determine the "worst-case" noise due to capacitive and inductive crosstalk. In Section 5, we present results on a testchip design which was fabricated in a $0.25\mu m$, five-level-metal process. The discrepancy between the measured and simulated crosstalk is also considered there. We offer conclusions and directions for future work in Section 6.

#### 2 Assembling "net complexes" for analysis

References [7, 8] introduce the idea of static noise analysis as a key technology for verifying the functionality of large digital integrated circuits in the presence of noise. Other recent implementations have also been reported [6]. The approach here is one of worst-case analysis; that is, all of the noise sources are bounded and act simultaneously at the worst allowable superposition. In its transistor-level manifestation, the approach involves decomposing the design into a collection of channel-connected components (CCCs), transistors that are connected together through their sources and drains. The maximum noise that is possible on each net is calculated as a time-domain waveshape. This worst-case noise considers all possible noise sources: leakage, charge-sharing noise, coupling through the interconnect, and power-supply noise. This is done with a careful choice of vectors on the driving CCCs, referred to as the sensitization, which produces this worst-case noise. Noise can also propagate from CCC-input to CCC-output (propagated noise). Noise failures are determined by the noise stability, a type of AC noise margin analysis, of each CCC given the worst-case noise appearing at its inputs. This involves calculating the transient sensitivity of the output noise with respect to the dc-level of the input noise. Cell characterizations are also possible [12], allowing static noise analysis to be run from a netlist of standard-cell library elements, rather than at the transistor level. In this paper, we consider only the details of the interconnect analysis. We refer the reader to Reference [8] for more details on static noise analysis generally.

<sup>\*</sup>The work at Columbia University was supported in part by the National Science Foundation under grant CCR-97-34216.

<sup>1</sup>K is the SPICE notation for the mutual inductance $M$ between two inductances $L_1$ and $L_2$, normalized as a dimensionless coupling coefficient, $K = M/\sqrt{L_1L_2}$.

It is convenient to classify noise according to the voltage's relationship to the rails, and we will use this convention in the following discussion: $V_H$ noise reduces an evaluation node voltage below the supply level; $V_H^{\star}$ noise increases an evaluation node voltage above the supply level; $V_L$ noise increases an evaluation node voltage above the ground level; and $V_L^{\star}$ noise decreases an evaluation node voltage below the ground level. The supply and ground reference levels are presumed to be set from the external reference (power supply) to the chip.

Because interconnect crosstalk is an important noise source, interconnect analysis is an important part of static noise analysis. To handle coupled RLCK interconnect networks, the network is decomposed into a set of net complexes. The primary (or victim) net of the complex is the net on which we are trying to calculate the noise; that is, the net which should be statically quiet. The complex also includes secondary (or aggressor) nets with significant coupling to the primary net. (There is redundancy in this decomposition, since a given net can be repeated many times as a secondary net in multiple net complexes.) To determine which secondary nets to include in a complex, we use very crude lumped-element models to estimate the coupling noise from each aggressor.

Aggressors which are determined by this coupling estimation to be significant are included in the net complex of the victim as "named" aggressors; that is, the full distributed RLCK representation of the net is used. Representing every aggressor as a named aggressor is not possible because of the potential explosion in the number of ports in the model, since one port is associated with each aggressor driver. Aggressors determined to be insignificant are lumped together as a "virtual attacker." The virtual attacker concept allows the user to specify the disposition of the small aggressors: they can be ignored, added, or scaled and added. This allows a "statistical" treatment of this noise rather than a worst-case analysis.
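The filtering step can be sketched as follows. The text does not give the form of the crude lumped-element estimate, so this sketch assumes a purely capacitive charge-sharing estimate, $V_{dd} C_c / (C_c + C_g)$, and ignores inductive coupling; the names `Aggressor`, `estimate_noise`, `partition_aggressors`, and the threshold value are hypothetical and are not taken from the tool.

```python
from dataclasses import dataclass


@dataclass
class Aggressor:
    name: str
    c_couple: float  # lumped coupling capacitance to the victim, in farads


def estimate_noise(agg, c_victim_ground, vdd):
    """Assumed crude estimate: capacitive divider between coupling and ground capacitance."""
    return vdd * agg.c_couple / (agg.c_couple + c_victim_ground)


def partition_aggressors(aggs, c_victim_ground, vdd, threshold):
    """Split aggressors into named ones and a lumped virtual-attacker capacitance."""
    named, virtual = [], []
    for agg in aggs:
        if estimate_noise(agg, c_victim_ground, vdd) >= threshold:
            named.append(agg)    # kept with its full distributed model
        else:
            virtual.append(agg)  # lumped into the virtual attacker
    c_virtual = sum(a.c_couple for a in virtual)
    return named, c_virtual


# Hypothetical victim with 100 fF to ground and four coupled neighbours.
aggressors = [
    Aggressor("A", 20e-15),
    Aggressor("B", 15e-15),
    Aggressor("C", 1e-15),
    Aggressor("D", 0.5e-15),
]
named, c_virtual = partition_aggressors(aggressors, 100e-15, vdd=2.5, threshold=0.1)
print([a.name for a in named], c_virtual)
```

With these made-up values, the two strongly coupled neighbours stay named and the two weak ones contribute only their summed coupling capacitance to the virtual attacker, which keeps the port count of the net complex small.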

Figure 1 shows an example of a net complex with two named aggressors and a virtual attacker. The virtual attacker is represented as two sources: a saturated ramp voltage source with slew time $t_{slew}$ applied to the coupling capacitances, and a square pulse current source applied to the mutual inductances. The magnitude of the current pulse matches the lumped-capacitance charging currents of the victims with a width of $t_{slew}$, the fastest possible slew time expected on any aggressor.
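The two virtual-attacker sources can be illustrated with a short waveform sketch. It assumes that the pulse magnitude is the average charging current $C V_{dd} / t_{slew}$ of a lumped capacitance $C$, which is one reading of the description above; the function names and numerical values are hypothetical.

```python
import numpy as np


def saturated_ramp(t, vdd, t_slew, t_start=0.0):
    """Voltage source: 0 before t_start, linear ramp over t_slew, then held at vdd."""
    return vdd * np.clip((t - t_start) / t_slew, 0.0, 1.0)


def square_current_pulse(t, c_lumped, vdd, t_slew, t_start=0.0):
    """Current source: constant C*Vdd/t_slew for a duration t_slew, zero elsewhere (assumed magnitude)."""
    i_peak = c_lumped * vdd / t_slew
    active = (t >= t_start) & (t < t_start + t_slew)
    return np.where(active, i_peak, 0.0)


# Hypothetical values: 2.5 V supply, 100 ps slew, 100 fF lumped capacitance.
t = np.linspace(0.0, 400e-12, 401)
v_attacker = saturated_ramp(t, vdd=2.5, t_slew=100e-12)
i_attacker = square_current_pulse(t, c_lumped=100e-15, vdd=2.5, t_slew=100e-12)
print(v_attacker[-1], i_attacker.max())
```

The voltage ramp drives the lumped coupling capacitances, while the current pulse drives the lumped mutual inductances, so each of the two extra ports of the virtual attacker sees one of these waveforms.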

Once the net complex for a given victim is formed, a reduced-order model is created for circuit simulation. The drivers of the primary and secondary nets are identified as *ports*, while the receivers are designated as *taps*. Two additional ports are associated with the virtual attacker. Ports are places where one is interested in actively connecting other circuit elements (i.e., the drivers), while taps are places where one is only interested in monitoring the voltage (i.e., no current flows into a tap). As a result, ports are characterized with an admittance matrix that gives the port currents as a function of the port voltages. Taps need only be characterized by a voltage-transfer matrix that gives the tap voltages as a function of the port voltages. The receiver circuits are modelled as grounded capacitors at the taps, which are built into the reduced-order model.<sup>3</sup>

Figure 1: An example net complex with victim net, two explicitly represented aggressors, and two virtual aggressors.

We choose admittance, rather than impedance, characterization of the ports because the model can then be directly stamped into the MNA matrices for circuit simulation. Let  $u_p$  denote the voltages at the ports,  $u_t$  the voltages at the taps, and  $i_p$  the port currents. The reduced-order model is a hybrid admittance-transfer multiport model, given in the Laplace domain as:

$$\begin{bmatrix} i_p \\ u_t \end{bmatrix} = \begin{bmatrix} Y_q(s) \\ H_q(s) \end{bmatrix} u_p \tag{1}$$

If the system has $p$ ports and $t$ taps, then $Y_q(s) \in \mathcal{R}^{p \times p}$ is the admittance matrix and $H_q(s) \in \mathcal{R}^{t \times p}$ is the voltage-transfer matrix. $q$ is the order of the reduction, which is generally a multiple of the number of ports. The complete circuit equations required to find this model can be expressed as:

$$\begin{aligned}
s\mathcal{C}x &= -\mathcal{G}x + Bu_p \\
i_p &= B^Tx \\
u_t &= L^Tx
\end{aligned} \tag{2}$$

The state vector $x$ is given by:

$$x = \begin{bmatrix} v_p \\ v_{int} \\ i_L \\ i_p \end{bmatrix} \tag{3}$$

$v_p$ are the port voltages, $v_{int}$ are the internal node voltages, $i_L$ are the inductor branch currents, and $i_p$ are the port currents. $\mathcal{G}$ and $\mathcal{C}$ are the conductance and susceptance matrices, $B$ is the input-output matrix of the admittance, and $L$ is the output matrix of the transfer function. From this, one can write

$$\begin{aligned}
Y(s) &= B^{T}(\mathcal{G} + s\mathcal{C})^{-1}B \\
H(s) &= L^{T}(\mathcal{G} + s\mathcal{C})^{-1}B
\end{aligned} \tag{4}$$

<sup>2</sup>These analysis techniques generally apply to "single-ended" logic threshold detection, although they can be extended to differential detection.

<sup>3</sup>More sophisticated modelling of the receiver generally requires modelling it as a port.

One can now apply one of several MIMO reduction techniques. A block Arnoldi [13] iteration produces a matrix $V_q$ which can be used to congruence transform the system matrices. The resulting PRIMA model is provably passive [14].

$$\begin{aligned}
\tilde{G} &= V_q^T \mathcal{G} V_q \\
\tilde{C} &= V_q^T \mathcal{C} V_q \\
\tilde{B} &= V_q^T B \\
\tilde{L} &= V_q^T L
\end{aligned} \tag{5}$$

yielding the admittance and transfer functions

$$\begin{aligned}
Y_{q}(s) &= \tilde{B}^{T} (\tilde{G} + s\tilde{C})^{-1} \tilde{B} \\
H_{q}(s) &= \tilde{L}^{T} (\tilde{G} + s\tilde{C})^{-1} \tilde{B}
\end{aligned} \tag{6}$$

 $\tilde{G}$ ,  $\tilde{C}$ ,  $\tilde{B}$ , and  $\tilde{L}$  (though dense) are dramatically smaller than the original G, C, B, and L. Other reduced-order modelling approaches, such as MPVL[15], can also be used.

#### 3 Dealing with degeneracy in the MNA formulation

One practical problem of note in applying these reduced-order modelling techniques occurs when Y(s) is not strictly proper; that is, Y(s) does not go to zero as $s \to \infty$ [16]. In general, because of the positive realness (p.r.) of the admittance, $Y(s) \rightarrow Hs^{m-n}$ as $s \to \infty$, where $m-n=1,0,-1$; that is, any zero or pole at infinity must be simple. The dimensions of C determine the order of the system, $r$. When C is nonsingular, the admittance will be strictly proper; that is, $Y(s) \to 0$ as $s \to \infty$ [16]. In the general case that C is singular (a so-called degenerate system), the number of degrees of freedom of the system is actually reduced to $f = \mathrm{rank}(C) < r$, which is referred to as the generalized order. Now, the natural frequencies of the system are the $n$ finite frequencies $s = \lambda$ at which $|sC + G|$ is zero. For a p.r. Y(s), $n = f$ or $n = f - 1$. In the latter case, there is a simple pole at infinity. The degeneracies represented by $f < r$, in general, contribute a constant feedthrough term between port voltages and port currents that is independent of state.

We find that when Y(s) is formulated as a degenerate system, multiport reduced-order models based on expansion around $s=0$ give very pathological frequency-domain behavior as $s \to \infty$ and are often characterized by singular $\tilde{G}$ (in the case of PRIMA) or right-half-plane poles. Figures 2-4 show Nyquist plots of $Y_{11}(j\omega)$ for three simple RC testcases. The original curves correspond to the full unreduced system. The Arnoldi curves result from straight application of block Arnoldi, while Prima is the PRIMA reduction. In Figure 2, the admittance should go from 0.02 mhos at $\omega = 0$ (all capacitors open) to 0.1 mhos at $\omega \to \infty$ (all capacitors short) as shown for the original curve. Both the Prima and Arnoldi curves incorrectly head back to zero admittance at high frequency. In the case of Figure 3, the admittance should go to infinity as $\omega \to \infty$ (ports become shorted by the coupling cap) but both the Prima and Arnoldi curves head to zero. Similar pathological behavior of the reduced-order models is shown in Figure 4 with a coupling capacitor to the port.
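A one-port example makes the non-strictly-proper limit concrete. The circuit below is hypothetical and is not the testcase of Figure 2: it is a 10 Ω series resistor followed by a 40 Ω resistor in parallel with a capacitor, chosen only because it has the same two limits quoted above (0.02 mhos at dc, 0.1 mhos at high frequency).

```python
def y11(s, r_series=10.0, r_shunt=40.0, c_shunt=1e-12):
    """Driving-point admittance of r_series in series with (r_shunt || c_shunt)."""
    z_parallel = r_shunt / (1.0 + s * r_shunt * c_shunt)
    return 1.0 / (r_series + z_parallel)


y_dc = y11(0.0)     # capacitor open: 1 / (10 + 40)
y_hf = y11(1e18j)   # capacitor effectively shorted: approaches 1 / 10
print(y_dc, abs(y_hf))
```

Because the admittance settles at a nonzero constant rather than decaying to zero, any reduced model whose response is forced to zero at high frequency cannot reproduce this behavior, which is the failure described for the Arnoldi and Prima curves.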

Various "kludges" have been used in practical implementations to avoid these problems, including the introduction of "small" resistors at the ports[17] which allows a formulation in which there is an explicit constant direct term and a strictly proper Y(s) for reduction. Instead, we "extract" these direct terms from Y(s) (through G and C), yielding a strictly proper  $\overline{Y}(s)$ [18] with the very general MNA formulation of equations 2 - 4 without the introduction of extraneous elements.

$$Y(s) = \overline{Y}(s) + sD_2 + D_1 \tag{7}$$



Figure 2: Nyquist plot of  $Y_{11}(j\omega)$  for simple testcase shown.



Figure 3: Nyquist plot of  $Y_{11}(j\omega)$  for simple testcase shown.



Figure 4: Nyquist plot of  $Y_{11}(j\omega)$  for simple testcase shown.

The term  $sD_2$  corresponds to the pole at infinity, while the term  $D_1$  represents the constant feedthrough term. We refer to  $D_2$  and  $D_1$  as the *s-proportional* and *constant* direct terms, respectively.

The procedure to extract the direct terms, though a little tedious to expose, is very robust to "pathological" circuit topologies and computationally efficient. With the state vector  $\boldsymbol{x}$  ordered as in Equation 3,  $\boldsymbol{\mathcal{G}}$  and  $\boldsymbol{\mathcal{C}}$ , the conductance and susceptance matrices, are stamped as follows:

$$\mathcal{C} = \begin{bmatrix} C & \mathbf{0}_{(m+p)\times l} & \mathbf{0}_{(m+p)\times p} \\ \mathbf{0}_{l\times(m+p)} & N & \mathbf{0}_{l\times p} \\ \mathbf{0}_{p\times(m+p)} & \mathbf{0}_{p\times l} & \mathbf{0}_{p\times p} \end{bmatrix} \tag{8}$$

where  $C \in \mathcal{R}^{(m+p)\times(m+p)}$  is the capacitance matrix and  $N \in \mathcal{R}^{l\times l}$  is the inductance matrix.  $\mathbf{0}_{(m+p)\times p}$  denotes a  $(m+p)\times p$  matrix of zeros.

$$\mathcal{G} = \begin{bmatrix} G & E & I_p \\ E^T & \mathbf{0}_{l \times l} & \mathbf{0}_{l \times p} \\ I_p & \mathbf{0}_{p \times l} & \mathbf{0}_{p \times p} \end{bmatrix} \tag{9}$$

where  $G \in \mathcal{R}^{(m+p)\times(m+p)}$  is the conductance matrix and  $E \in \mathcal{R}^{(m+p)\times l}$  is the incidence matrix of branch currents.  $I_p$  is the  $p \times p$  identity matrix. Because we have chosen to order the node voltages so that the port voltages  $(v_p)$  are specified before the internal node voltages  $(v_{int})$ , the capacitance, conductance, and branch-incidence matrices can be broken into blocks:

$$C = \begin{pmatrix} C_{port} & C_{port-int} \\ C_{port-int}^T & C_{int} \end{pmatrix} \tag{10}$$

$$G = \begin{pmatrix} G_{port} & G_{port-int} \\ G_{port-int}^T & G_{int} \end{pmatrix} \tag{11}$$

$$E = \begin{pmatrix} E_{port} & E_{port-int} \\ E_{int-port} & E_{int} \end{pmatrix} \tag{12}$$

B is the input-output matrix of the admittance given by:

$$\boldsymbol{B} = \begin{pmatrix} 0_{(m+p+l)\times p} \\ I_p \end{pmatrix} \tag{13}$$

 $G_{port}, E_{port}, C_{port} \in \mathcal{R}^{p \times p}, G_{int}, E_{int}, C_{int} \in \mathcal{R}^{m \times m}.$  The off-diagonal elements  $G_{port-int}, C_{port-int}, E_{port-int} \in \mathcal{R}^{p \times m}$  and  $E_{int-port} \in \mathcal{R}^{m \times p}$ .

We recognize that since  $\overline{Y}(s)$  in Equation 7 is strictly proper,  $\lim_{s\to\infty}\overline{Y}(s)=0$ . To find  $D_1$  and  $D_2$ , we first open circuit all the inductors in the network since as  $s\to\infty$ , these will all have infinite impedance. The G and C matrices are then deflated to remove the rows and columns that correspond to internal nodes with no path to the ports. We next identify all of the remaining capacitances of the network which do not have a capacitive path to the ports. We delete these capacitances from the C matrix and add up the rows and columns of the C matrix corresponding to the nodes connected by these capacitors. Since one internal node is eliminated for each capacitor "shorted" by this process, we perform a corresponding deletion of the row and column corresponding to this node from the C matrix. Following these two deflations, an admittance Y(s), which still contains the correct direct terms, is given by:

$$\left\{
\begin{bmatrix}
G_{port} & G_{port-int}^{d} \\
(G_{port-int}^{d})^{T} & G_{int}^{d}
\end{bmatrix}
+ s \begin{bmatrix}
C_{port} & C_{port-int}^{d} \\
(C_{port-int}^{d})^{T} & C_{int}^{d}
\end{bmatrix}
\right\}
\begin{bmatrix}
I_{p} \\
V_{int}
\end{bmatrix}
= \begin{bmatrix}
Y(s) \\
0
\end{bmatrix} \tag{14}$$

Let us presume that we have deleted $k$ rows and columns from G and C. Each column of $V_{int} \in \mathcal{R}^{(m-k)\times p}$ corresponds to the internal node voltages when the port voltage associated with the column is one and the remaining port voltages are zero. $G_{int}^d \in \mathcal{R}^{(m-k)\times (m-k)}$ and $G_{port-int}^d \in \mathcal{R}^{p\times (m-k)}$. From Equation 14, we find:

$$(G_{int}^d + sC_{int}^d)V_{int} = -\left[ (G_{port-int}^d)^T + s(C_{port-int}^d)^T \right] \tag{15}$$

$$(G_{port} + sC_{port}) + (G_{port-int}^d + sC_{port-int}^d)V_{int} = Y(s) \tag{16}$$

We reorder the internal nodes so that the zero rows and columns of  $C_{int}$  appear last. Equation 15 can then be written in the following form:

$$\left\{
\begin{bmatrix}
\tilde{G}_{int,11} & \tilde{G}_{int,12} \\
\tilde{G}_{int,12}^T & \tilde{G}_{int,22}
\end{bmatrix}
+ s \begin{bmatrix}
\tilde{C}_{int} & 0 \\
0 & 0
\end{bmatrix}
\right\}
\begin{bmatrix}
V_c \\
V_g
\end{bmatrix}
= -\begin{bmatrix}
\tilde{G}_{port-int,1}^T \\
\tilde{G}_{port-int,2}^T
\end{bmatrix}
- s \begin{bmatrix}
\tilde{C}_{port-int}^T \\
0
\end{bmatrix} \tag{17}$$

where the reordered  $V_{int}$  is written as  $(V_c V_g)^T$ . If l rows and columns of  $C_{int}$  are zero, then  $\tilde{G}_{int,11}, \tilde{C}_{int} \in \mathcal{R}^{(m-k-l)\times(m-k-l)}, \tilde{G}_{int,12} \in \mathcal{R}^{(m-k-l)\times l}$  and  $\tilde{G}_{int,22} \in \mathcal{R}^{l\times l}$ .  $V_c \in \mathcal{R}^{(m-k-l)\times p}$  and  $V_g \in \mathcal{R}^{l\times p}$ . Solving Equation 17, we find:

$$\tilde{G}_{int,11}V_{c}+\tilde{G}_{int,12}V_{g}+s\tilde{C}_{int}V_{c}=-\tilde{G}_{port-int,1}^{T}-s\tilde{C}_{port-int}^{T} \tag{18}$$

$$\tilde{G}_{int,12}^{T} V_{c} + \tilde{G}_{int,22} V_{g} = -\tilde{G}_{port-int,2}^{T} \tag{19}$$

$V_g$ is then given by:

$$V_g = -\tilde{G}_{int,22}^{-1}(\tilde{G}_{port-int,2}^T + \tilde{G}_{int,12}^T V_c) \tag{20}$$

Substituting this into Equation 18 gives:

$$\begin{split} & [\tilde{G}_{int,11} - \tilde{G}_{int,12} \tilde{G}_{int,22}^{-1} \tilde{G}_{int,12}^T + s \tilde{C}_{int}] V_c = \\ & - \tilde{G}_{port-int,1}^T + \tilde{G}_{int,12} \tilde{G}_{int,22}^{-1} \tilde{G}_{port-int,2}^T - s \tilde{C}_{port-int}^T \end{split}$$

Solving for  $V_c$  gives:

$$\begin{split} & \boldsymbol{V}_{c} = \left[ \tilde{\boldsymbol{C}}_{int}^{-1} (\tilde{\boldsymbol{G}}_{int,11} - \tilde{\boldsymbol{G}}_{int,12} \tilde{\boldsymbol{G}}_{int,22}^{-1} \tilde{\boldsymbol{G}}_{int,12}^{T}) s^{-1} + \boldsymbol{I} \right]^{-1} \\ & \times \tilde{\boldsymbol{C}}_{int}^{-1} \left[ s^{-1} (-\tilde{\boldsymbol{G}}_{port-int,1}^{T} + \tilde{\boldsymbol{G}}_{int,12} \tilde{\boldsymbol{G}}_{int,22}^{-1} \tilde{\boldsymbol{G}}_{port-int,2}^{T}) \right. \\ & \left. - \tilde{\boldsymbol{C}}_{port-int}^{T} \right] \end{split}$$

Performing an expansion around  $s^{-1} = 0$  yields:

$$\begin{split} & V_{c} \cong [I - s^{-1} \tilde{C}_{int}^{-1} (\tilde{G}_{int,11} - \tilde{G}_{int,12} \tilde{G}_{int,22}^{-1} \tilde{G}_{int,12}^{T})] \tilde{C}_{int}^{-1} \\ & \times [s^{-1} (-\tilde{G}_{port-int,1}^{T} + \tilde{G}_{int,12} \tilde{G}_{int,22}^{-1} \tilde{G}_{port-int,2}^{T}) - \tilde{C}_{port-int}^{T}] \end{split}$$

To order  $s^{-1}$ , one finds, therefore, that:

$$\begin{split} V_{c} \cong {} & -\tilde{C}_{int}^{-1} \tilde{C}_{port-int}^{T} \\ & + s^{-1} \tilde{C}_{int}^{-1} (-\tilde{G}_{port-int,1}^{T} + \tilde{G}_{int,12} \tilde{G}_{int,22}^{-1} \tilde{G}_{port-int,2}^{T}) \\ & + s^{-1} \tilde{C}_{int}^{-1} (\tilde{G}_{int,11} - \tilde{G}_{int,12} \tilde{G}_{int,22}^{-1} \tilde{G}_{int,12}^{T}) \tilde{C}_{int}^{-1} \tilde{C}_{port-int}^{T} \end{split} \tag{21}$$

Using Equation 14, the direct terms are then given by:

$$D_2 = C_{port} - \tilde{C}_{port-int} \tilde{C}_{int}^{-1} \tilde{C}_{port-int}^T \tag{22}$$

$$\begin{split} D_{1} = {} & G_{port} - \tilde{G}_{port-int,1} \tilde{C}_{int}^{-1} \tilde{C}_{port-int}^{T} \\ & + \tilde{C}_{port-int} \left[ \tilde{C}_{int}^{-1} (-\tilde{G}_{port-int,1}^{T} + \tilde{G}_{int,12} \tilde{G}_{int,22}^{-1} \tilde{G}_{port-int,2}^{T}) \right. \\ & \left. \quad + \tilde{C}_{int}^{-1} (\tilde{G}_{int,11} - \tilde{G}_{int,12} \tilde{G}_{int,22}^{-1} \tilde{G}_{int,12}^{T}) \tilde{C}_{int}^{-1} \tilde{C}_{port-int}^{T} \right] \\ & - \tilde{G}_{port-int,2} \tilde{G}_{int,22}^{-1} \left[ \tilde{G}_{port-int,2}^{T} - \tilde{G}_{int,12}^{T} \tilde{C}_{int}^{-1} \tilde{C}_{port-int}^{T} \right] \end{split} \tag{23}$$
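The direct-term formulas can be checked numerically on a small network. The sketch below is an illustration under stated assumptions: it treats a network with no inductors, takes the nodes as already deflated and ordered as ports, then capacitive internal nodes ($V_c$), then resistive-only internal nodes ($V_g$), and evaluates $D_1$ by substituting the leading terms of Equations 20 and 21 into Equation 16. The three-node circuit, its unit element values, and the function names are hypothetical.

```python
import numpy as np


def direct_terms(G, C, p, nc):
    """Return (D1, D2); p = number of ports, nc = number of capacitive internal nodes."""
    a, b = p, p + nc
    Gp, Gp1, Gp2 = G[:a, :a], G[:a, a:b], G[:a, b:]
    G11, G12, G22 = G[a:b, a:b], G[a:b, b:], G[b:, b:]
    Cp, Cpi, Ci = C[:a, :a], C[:a, a:b], C[a:b, a:b]

    X = np.linalg.solve(Ci, Cpi.T)                    # C_int^-1 C_port-int^T
    S = G11 - G12 @ np.linalg.solve(G22, G12.T)       # Schur complement of G_int,22
    R = -Gp1.T + G12 @ np.linalg.solve(G22, Gp2.T)    # constant part of the right-hand side
    Vc0 = -X                                          # s^0 term of V_c (Equation 21)
    Vc1 = np.linalg.solve(Ci, R + S @ X)              # s^-1 coefficient of V_c (Equation 21)
    Vg0 = -np.linalg.solve(G22, Gp2.T + G12.T @ Vc0)  # s^0 term of V_g (Equation 20)

    D2 = Cp + Cpi @ Vc0                               # Equation 22
    D1 = Gp + Gp1 @ Vc0 + Gp2 @ Vg0 + Cpi @ Vc1       # constant term of Equation 16
    return D1, D2


def admittance(G, C, p, s):
    """Exact Y(s): Schur complement of the internal block of G + sC (Equations 15 and 16)."""
    A = G + s * C
    return A[:p, :p] - A[:p, p:] @ np.linalg.solve(A[p:, p:], A[p:, :p])


# Hypothetical network: node 0 = port, node 1 = capacitive node, node 2 = resistive-only node.
# Unit resistors port-1, port-2, 1-2, 2-ground; unit capacitors port-1 and 1-ground.
G = np.array([[2.0, -1.0, -1.0],
              [-1.0, 2.0, -1.0],
              [-1.0, -1.0, 3.0]])
C = np.array([[1.0, -1.0, 0.0],
              [-1.0, 2.0, 0.0],
              [0.0, 0.0, 0.0]])

D1, D2 = direct_terms(G, C, p=1, nc=1)
s = 1e6
residual = admittance(G, C, 1, s) - (s * D2 + D1)
print(D1, D2, residual)
```

For this circuit $D_2$ is the series combination of the two capacitors, as expected when both become shorts at high frequency, and the remainder $Y(s) - sD_2 - D_1$ decays as $1/s$, consistent with a strictly proper $\overline{Y}(s)$.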

The major cost of this direct term calculation is the LU factorization of the $\tilde{C}_{int}$ matrix and the LU factorization of the $\tilde{G}_{int,22}$ matrix, but with all the deletions in the original system, these matrices are typically quite small. Once we have calculated $D_1$ and $D_2$, we must form new C and G matrices, C' and G', by adjusting $C_{port}$ and $G_{port}$.

$$\begin{aligned}
C'_{port} &= C_{port} - D_2 \\
G'_{port} &= G_{port} - D_1
\end{aligned} \tag{24}$$

Equation 4 with this adjustment, then, becomes:

$$\begin{aligned}
Y(s) &= B^{T}(G' + sC')^{-1}B + sD_{2} + D_{1} \\
H(s) &= L^{T}(G' + sC')^{-1}B
\end{aligned} \tag{25}$$

A trivial deflation of B, L,  $\tilde{\mathcal{G}}$ ,  $\tilde{\mathcal{C}}$  removes state-variables that are eliminated by this transformation (all zero rows and associated columns). The transformation of equation 4 to equation 25 is similar to that proposed by Fettweis [18].

Figures 2 - 4 show the reduced order model (of the same order as the **Arnoldi** and **Prima** curves) that results from PRIMA-based model-order reduction after direct-term extraction (noted as **ExtractDirect**). These curves show excellent agreement with the unreduced system, in most cases indistinguishable from the unreduced system.

#### 4 Superposition to determine worst-case crosstalk noise

As a result of the analysis of Section 2, we stamp $Y_q(s)$ into the circuit simulator engine along with the (nonlinear) transistor models at the ports of the net complex. Once the port voltages are determined by circuit simulation, recursive convolution [17] is used to determine the tap voltages from $H_q(s)$.
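One common form of recursive convolution can be sketched as follows. It assumes that a scalar entry of $H_q(s)$ has already been written in pole-residue form, $\sum_i k_i/(s - p_i)$, that the port voltage is piecewise linear between time points, and that the network starts at rest; these assumptions and the function name are illustrative and are not taken from the tool or from Reference [17].

```python
import numpy as np


def recursive_convolution(poles, residues, t, u):
    """Tap response y(t) for H(s) = sum_i k_i / (s - p_i) driven by piecewise-linear u(t)."""
    poles = np.asarray(poles, dtype=complex)
    residues = np.asarray(residues, dtype=complex)
    state = np.zeros_like(poles)          # one running convolution integral per pole
    y = np.zeros(len(t))
    for n in range(1, len(t)):
        h = t[n] - t[n - 1]
        slope = (u[n] - u[n - 1]) / h
        e = np.exp(poles * h)
        # Exact update over one step for an input that is linear within the step.
        state = e * state + residues * (
            u[n - 1] * (e - 1.0) / poles
            + slope * (e - 1.0 - poles * h) / poles**2
        )
        y[n] = state.sum().real
    return y


# Hypothetical single-pole low-pass, H(s) = (1/tau) / (s + 1/tau), driven by a unit step.
tau = 1.0
t = np.linspace(0.0, 5.0, 501)
u = np.ones_like(t)
y = recursive_convolution([-1.0 / tau], [1.0 / tau], t, u)
print(y[-1])
```

Each time step costs one multiply-add per pole regardless of how long the simulation has run, which is why this approach is attractive once the port voltages are available from circuit simulation.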

In simulating the crosstalk noise, the static noise analysis tool must determine the worst-case vector set for the aggressors. We assume that the nonlinearities introduced by the drivers are small enough that superposition can be used to determine the skew of switching events required to produce the worst-case noise. The noise produced by each aggressor driver is determined individually. Switching times are chosen so that the individual peak noises align. Timing windows, when known, are used to impose "orthogonality" constraints on switching events; that is, switching events can only occur simultaneously when the arrival time windows (as defined by the earliest and latest arrival times) overlap. More details on this can be found in References [1, 8].
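The peak-alignment step can be sketched for positive-going noise. This is a simplified illustration that assumes all single-aggressor waveforms share one time grid and that no timing-window constraints apply; the Gaussian pulses and the function name are hypothetical. For negative-going noise types the same idea applies with the minimum in place of the maximum.

```python
import numpy as np


def align_and_sum(t, waveforms):
    """Delay each single-aggressor waveform so all peaks coincide, then superpose them."""
    peak_times = [t[np.argmax(w)] for w in waveforms]
    t_align = max(peak_times)             # align everything to the latest peak
    total = np.zeros_like(t)
    shifts = []
    for w, t_peak in zip(waveforms, peak_times):
        shift = t_align - t_peak          # extra delay applied to this aggressor's switching
        total += np.interp(t - shift, t, w, left=0.0, right=0.0)
        shifts.append(shift)
    return total, shifts


# Two hypothetical single-aggressor noise pulses peaking at different times.
t = np.linspace(0.0, 10.0, 1001)
w1 = 0.3 * np.exp(-((t - 2.0) / 0.5) ** 2)
w2 = 0.2 * np.exp(-((t - 5.0) / 0.5) ** 2)
total, shifts = align_and_sum(t, [w1, w2])
print(total.max(), shifts)
```

Without alignment the combined peak would be close to the larger individual peak; with alignment it approaches the sum of the individual peaks, which is the worst case that superposition is used to find.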

The unique difference when inductance and inductive coupling are included in the coupled interconnect extraction is that the noise is not monotonic in response to a switching event. In the presence of only resistance and capacitances,  $V_H$  and  $V_L^*$  noise can be produced only by coupled high-to-low transitions. Similarly,  $V_L$  and  $V_H^*$  noise can be produced only by coupled low-to-high transitions. With inductance and inductive coupling, either transition can be responsible for a given noise type. As a result, to determine the maximum noise, twice as many "single aggressor" noise simulations must be performed.
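The doubled search over transition directions can be sketched as a per-aggressor selection. The sketch assumes that the two single-aggressor simulations per aggressor have already been run and are available as sampled waveforms; the data layout, the function name, and the sample values are hypothetical.

```python
import numpy as np


def worst_directions(single_aggressor_noise, polarity):
    """Pick, per aggressor, the transition whose noise peak is largest in the given polarity.

    polarity = +1 for noise above a rail (V_L, V_H*), -1 for noise below a rail (V_H, V_L*).
    """
    choice = {}
    for name, by_direction in single_aggressor_noise.items():
        peak = {d: float(np.max(polarity * np.asarray(w))) for d, w in by_direction.items()}
        best = max(peak, key=peak.get)
        choice[name] = (best, peak[best])
    return choice


# Hypothetical samples: a capacitively dominated neighbour and an inductively dominated one.
noise = {
    "near": {"rise": [0.0, 0.20, 0.05, -0.02], "fall": [0.0, -0.20, -0.05, 0.02]},
    "far":  {"rise": [0.0, -0.05, 0.01, 0.0],  "fall": [0.0, 0.05, -0.01, 0.0]},
}
print(worst_directions(noise, polarity=+1))
```

In this made-up data the nearby aggressor produces its largest positive noise on a rising transition while the distant one does so on a falling transition, so the worst-case vector mixes the two directions.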

#### 5 Results

To further illustrate the crosstalk calculation, including both inductive and capacitive coupling, we work with a specific example, the die photo of which is shown in Figure 5. This is a testchip fabricated in a TSMC $0.25 \mu m$ 5M1P process. The chip contains a 4-mm-long, 16-bit bus snaking through a power-ground distribution. This structure is evident in the top, left corner of the chip. The spacing of the power-ground grid is $100 \mu m$, and the bus is routed entirely within this spacing (i.e., power-ground is not interdigitated with the signal lines). In addition to the bus, the chip includes on-chip measurement circuits to noninvasively measure the time-domain waveforms on the bus [19]. Only the inductances associated with the 16-bit bus structure survive the inductance filtering in the extractor. In its current form, the extraction ignores potential current returns through the substrate in calculating the inductance. The distant power-ground gives fairly large extracted inductances and mutual inductances under these conditions.

Figure 6(c) shows the worst-case $V_H$ and $V_H^*$ noise waveforms for bit 7 of the 16-bit bus. No timing windows (and, therefore, no timing orthogonality constraints) were used. Figure 6(a) shows all of the "single aggressor" noise waveforms on bit 7 for high-to-low aggressor transitions. For bits 6 and 8 (the adjacent lines), the coupling is dominantly capacitive, while for the other wires, the coupling is inductive. Figure 6(b) shows all of the "single aggressor" noise waveforms on bit 7 for low-to-high aggressor transitions. Once again, for bits 6 and 8, the coupling is capacitive, while for the other aggressors it is primarily inductive. The worst-case $V_H^*$ noise switches bits 6 and 8 from low to high, but switches the other aggressors (bits 1 through 5 and 9 through 16) from high to low. The high-to-low aggressors are switched approximately 150 psec after the switching of bits 6 and 8. The tool does this to superimpose the $V_H^*$ noise peak of Figure 6(a) for bits 1-5 and 9-16 with the $V_H^*$ noise peak of Figure 6(b) for bits 6 and 8. This result is not too surprising, given conventional understanding of coupled transmission line behavior [20]. At the far end, capacitive and inductive currents are in opposite directions for a given aggressor switch. To achieve the worst noise, one must choose one switch direction for aggressors dominated by capacitive coupling (nearest neighbors) and choose the opposite switch direction for aggressors dominated by inductive coupling (more distant aggressors).

Figure 5: Die photo of test chip.

Figure 6: Noise waveforms on the far end of bit 7: (a) Constituent "single aggressor" noise waveforms due to the high-to-low transitioning of aggressors, acting individually; (b) constituent "single aggressor" noise waveforms due to the low-to-high transitioning of aggressors, acting individually; and (c) worst total peak $V_H$ and $V_H^*$ noise waveforms due to the switching of all aggressors.

The  $V_H$  curve of Figure 6(c) switches bits 6 and 8 from high to low, but switches the other aggressors (bits 1 through 5 and 9 through 16) from low to high. In this case, the low-to-high aggressors are switched approximately 150 psec after the switching of bits 6 and 8.

As shown in Figure 6(c), the $V_H$ and $V_H^*$ noise is substantial, but unfortunately, it fails to match the measured results in silicon. Figure 7 shows the measured crosstalk on bit 7 (far-end) for all the other bits switching simultaneously in the same direction (in this case all from high to low). [Our measurement circuits are not able to switch the bits in the pattern of Figure 6(c).] On the same figure, we show simulation results for both an RC and an RLCK model. The RC model matches the measurement result fairly precisely, while the RLCK model is significantly different. We attribute this discrepancy to current returns in the substrate, which are ignored in the analysis from extracted layout. If a significant fraction of the current returns through the substrate, the inductances will be significantly decreased (because of the very distant power-ground lines) and the loss will be increased.<sup>4</sup> In our efforts to drive up the inductance effects in this example (with a very loose power-ground grid), we have, in fact, made substrate current returns more favorable.

#### 6 Conclusions and future work

Inductance is rapidly becoming an important concern on-chip. While self-inductance has received most of the attention in the literature, mutual inductance (particularly in the context of simultaneously switching buses) is probably a far more serious concern.

Taking advantage of recent advances in extraction to include inductance, we have described practical issues in capacitive and inductive crosstalk analysis in a commercial static noise analysis tool. This involves techniques for estimating the noise produced by a switching aggressor, forming net complexes including named aggressors and a virtual attacker, creating reduced-order models and stamping them into circuit simulation, and performing the noise analysis. We have also discussed techniques we have used to handle degenerate interconnect networks to produce a robust reduced-order modelling algorithm. We have been humbled by measured silicon results that indicate that returns through the substrate can be a significant influence on the interconnect response in metal-sparse environments.

<sup>4</sup>This test structure is not representative of the "metal-rich" environment typical in most digital integrated circuits, which generally precludes the need to consider substrate returns.

<sup>5</sup>This technology has a lightly-doped $7\,\mu\mathrm{m}$ epitaxial layer on top of a heavily-doped substrate, typical for digital integrated circuits.



Figure 7: Circles represent the measured crosstalk on bit 7 (far-end) due to the simultaneous switching of all of the other bits of the bus. The solid curve is the RC-only simulation. The dashed curve is the RLCK simulation.

#### 7 Acknowledgments

We gratefully acknowledge the significant contributions of Kevin Chou, Charlie Huang, and Vinod Narayanan to this work.

### References

- [1] K. L. Shepard, V. Narayanan, P. C. Elmendorf, and Gutuan Zheng. Global Harmony: Coupled Noise Analysis for Full-Chip RC Interconnect Networks. In Proceedings of the IEEE International Conference on Computer-Aided Design, 1997.
- [2] A. Deutsch, G. V. Kopcsay, P. J. Restle, H. H. Smith, G. Katopis, W. D. Becker, P. W. Coteus, C. W. Surovic, B. J. Rubin, R. P. Dunne, Jr., T. Gallo, K. A. Jenkins, L. M. Terman, R. H. Dennard, G. A. Sai-Halasz, B. L. Krauter, and D. R. Knebel. When are transmission-line effects important for on-chip interconnections. *IEEE Transactions on Microwave Theory and Techniques*, 45(10):1836-1846, October 1997.
- [3] Y. I. Ismail, E. G. Friedman, and J. L. Neves. Figures of merit to characterize the importance of on-chip inductance. In 35<sup>th</sup> ACM/IEEE Design Automation Conference, pages 560-565, June 1998.
- [4] S. Lin, N. Chang, and S. Nakagawa. Quick on-chip self- and mutual-inductance screen. In Proceedings of the International Symposium on Quality Electronic Design, pages 513-520, March 2000.
- [5] B. Krauter, S. Mehrotra, and V. Chandramouli. Including inductive effects in interconnect timing analysis. In *Proceedings of the CICC*, pages 445-452, 1999.
- [6] R. Levy, D. Blaauw, G. Braca, A. Dasgupta, A. Grinshpon, C. Oh, B. Orshav, S. Sirichotiyakul, and V. Zolotov. Clarinet: A noise analysis tool for deep submicron design. In 37<sup>th</sup> ACM/IEEE Design Automation Conference, pages 233-238, 2000.

- [7] K. L. Shepard and V. Narayanan. Noise in deep submicron digital design. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, pages 524-531, San Jose, CA, November 1996.
- [8] K. L. Shepard, V. Narayanan, and R. Rose. Harmony: Static noise analysis for deep-submicron digital integrated circuits. *IEEE Trans. CAD*, pages 1132-1150, August 1999.
- [9] A. Devgan. Efficient coupled noise estimation for on-chip interconnects. In Proceedings of the International Conference on Computer-Aided Design, pages 147-151, 1997.
- [10] M. Kuhlmann, S. S. Sapatnekar, and K. K. Parhi. Efficient crosstalk estimation. In Proceedings of the International Conference on Computer Design, pages 143-148, 1999.
- [11] K. L. Shepard, D. Sitaram, and Y. Zheng. Full-chip, three-dimensional, shapes-based RLC extraction. In Proceedings of the International Conference on Computer-Aided Design, pages 142-149, 2000.
- [12] K. L. Shepard and K. Chou. Cell characterization for noise stability. In *Proceedings of the CICC*, pages 91-94, 2000.
- [13] L. Miguel Silveira, Mattan Kamon, Ibrahim Elfadel, and Jacob White. A Coordinate-Transformed Arnoldi Algorithm for Generating Guaranteed Stable Reduced-Order Models of RLC Circuits. In IEEE/ACM International Conference on Computer-Aided Design, pages 288-294, San Jose, CA, November 1996.
- [14] A. Odabasioglu, M. Celik, and L. Pileggi. PRIMA: Passive reduced-order interconnect macromodeling algorithm. *IEEE Trans. CAD*, August 1998.
- [15] R. W. Freund and P. Feldmann. Reduced-order modelling of large linear passive multi-terminal circuits using matrix-Pade approximation. In Proceedings of Design, Automation, and Test in Europe, pages 530-537, 1998.
- [16] George C. Verghese, Bernard C. Levy, and Thomas Kailath. A generalized state-space for singular systems. *IEEE Transactions on Automatic Control*, AC-26(4):811-831, August 1981.
- [17] Vivek Raghavan, J. Eric Bracken, and Ronald A. Rohrer. AWE-Spice: A General Tool for the Accurate and Efficient Simulation of Interconnect Problems. In 29<sup>th</sup> ACM/IEEE Design Automation Conference, pages 87-92, Anaheim, California, June 1992.
- [18] Alfred Fettweis. On the algebraic derivation of the state equations. IEEE Transactions on Circuit Theory, CT-16(2):171-175, May 1969.
- [19] K. L. Shepard and Y. Zheng. On-chip oscilloscopes for noninvasive time-domain measurement of waveforms. In Proceedings of the International Conference on Computer Design, 2001. To be published.
- [20] H. B. Bakoglu. Circuits, Interconnects, and Packaging for VLSI. Addison Wesley, 1990.