Design Techniques for High-Speed I/Os: Challenges and Opportunities

This editorial examines design techniques for high-speed serial data links over wire channels. The state of the art of serial links over wire channels is briefly reviewed. The imperfections of wire channels at high frequencies and their effect on multi-Gbps serial links are examined, followed by a close examination of modulation schemes effective in combating the finite bandwidth of wire channels. Channel equalization, both pre-emphasis and post-equalization, is investigated with an emphasis on adaptive decision feedback equalization. Challenges and opportunities in combating inter-symbol interference (ISI) are explored.


Introduction
Impedance discontinuities of channels, typically occurring at vias, connectors, and packages, cause reflection. Crosstalk at both the near end and the far end of channels, arising from capacitive and inductive coupling between the channels and neighboring devices, also undermines signal integrity. All of these contribute to ISI, which reduces the opening of data eyes and yields a poor Bit Error Rate (BER). The frequency response of a wire channel typically consists of a smooth roll-off section at low frequencies and, at high frequencies, deepening troughs from capacitive impedance discontinuities and crests from inductive impedance mismatches. As a result, data symbols at the receiver end typically consist of pre-cursors, a main cursor, and post-cursors. For severely lossy channels, the number of post-cursors is significantly larger than that of the pre-cursors. For reflective channels with strong impedance discontinuities located far away from the transmitters, large post-cursors exist both near and far away from the main cursor, with a large number of insignificant small post-cursors in between. The characteristics and behavior of these channels differ fundamentally from those of channels with only high dispersion, and therefore require different compensation schemes in channel equalization.
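As a rough illustration of the cursor terminology above, the following sketch splits a pulse response sampled once per UI into pre-cursors, a main cursor, and post-cursors, and computes a peak-distortion style worst-case margin for 2PAM. The sample values and function names are hypothetical, and the main cursor is assumed to be the largest-magnitude sample.

```python
def split_cursors(pulse_response):
    """Split a UI-spaced pulse response into (pre-cursors, main cursor, post-cursors).

    Assumption: the main cursor is the sample with the largest magnitude.
    """
    main_index = max(range(len(pulse_response)),
                     key=lambda i: abs(pulse_response[i]))
    pre = pulse_response[:main_index]
    main = pulse_response[main_index]
    post = pulse_response[main_index + 1:]
    return pre, main, post


def worst_case_margin(pulse_response):
    """Main-cursor amplitude minus the sum of all ISI magnitudes.

    A non-positive result means the worst-case data pattern closes the eye.
    """
    pre, main, post = split_cursors(pulse_response)
    isi = sum(abs(c) for c in pre) + sum(abs(c) for c in post)
    return abs(main) - isi


# Hypothetical lossy-channel pulse response, one sample per UI.
h = [0.05, 0.60, 0.25, 0.10, 0.05]
pre, main, post = split_cursors(h)
margin = worst_case_margin(h)
```

In this made-up example there is one pre-cursor and three post-cursors, which mirrors the observation that lossy channels produce more post-cursors than pre-cursors.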

Modulation Schemes
Although a large number of modulation schemes exist for wireless communications, only a handful have been successfully deployed for multi-Gbps data communications over wire channels to combat channel imperfections. Perhaps the most widely used modulation scheme in backplane applications is Pulse Amplitude Modulation (PAM), with 2PAM the most power- and silicon-efficient. Although 2PAM enjoys the largest noise margins and consequently the best BER, it suffers from poor spectral efficiency. As compared with 2PAM, 4PAM trades voltage spacing for spectral efficiency. Both transmitters and receivers of 4PAM data links are considerably more complex than those of 2PAM data links, mainly due to the need for a Digital-to-Analog Converter (DAC) at the transmitter, an Analog-to-Digital Converter (ADC) at the receiver, and complex clock recovery schemes [17-24]. High-order PAM has also been used for wireline communications to improve spectral efficiency. For example, Foley and Flynn demonstrated a 1.3 Gb/s serial link in 0.5 µm CMOS using 8PAM [25]. Song, et al. [26] proposed a 10 Gb/s transceiver with dual-mode 10PAM. The main difficulties encountered in deploying high-order PAM include reduced noise margins and consequently a poor BER, the need for high-speed power-hungry ADCs, and complex clock recovery circuitry.
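The trade between voltage spacing and spectral efficiency can be made concrete with a small calculation. The sketch below assumes equally spaced levels and the same peak swing (normalized to ±1) for every PAM order; the function names are illustrative only.

```python
import math


def pam_levels(m):
    """Equally spaced M-PAM levels normalized to a peak swing of +/-1."""
    return [-1 + 2 * i / (m - 1) for i in range(m)]


def level_spacing(m):
    """Voltage spacing between adjacent M-PAM levels."""
    levels = pam_levels(m)
    return levels[1] - levels[0]


def spacing_penalty_db(m):
    """Reduction in level spacing relative to 2PAM at the same peak swing."""
    return 20 * math.log10(level_spacing(2) / level_spacing(m))


def bits_per_symbol(m):
    """Number of bits carried by one M-PAM symbol."""
    return math.log2(m)


for m in (2, 4, 8):
    print(m, bits_per_symbol(m), round(spacing_penalty_db(m), 2))
```

Under these assumptions 4PAM doubles the bits per symbol at a spacing penalty of about 9.5 dB, and 8PAM triples them at about 16.9 dB, which is one way to see why noise margins shrink quickly as the PAM order grows.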
Not only can data be modulated spatially to improve spectral efficiency, they can also be modulated temporally, i.e., data are represented by pulses whose width is modulated by the data (Pulse Width Modulation or PWM) [27-31]. As compared with PAM, PWM is less spectrally efficient. For example, the worst-case symbol time of 4PWM is 4 UIs, where UI is the unit interval, while the symbol time of 4PAM is constant at one UI. When both edges of PWM pulses are used to carry data, the symbol time of 4PWM is reduced to 2 UIs [31]. Want, et al. proposed a dual-edge PWM to improve channel efficiency, especially over highly lossy channels [31]. PWM has also been used as pre-emphasis to boost the high-frequency components of data symbols prior to their transmission [27,29]. PWM that utilizes the orthogonal characteristics of Walsh codes has also been used in serial links to transmit multiple Walsh-code encoded data symbols via the same channel simultaneously [32]. Pulse-width-amplitude modulation (PWAM), which combines the advantages of PAM and PWM to further improve data rates by modulating data symbols both spatially and temporally, has become attractive [28,33-38]. For example, 4PAM and 4PWM transmit 2 bits per symbol while 4PWAM transmits 4 bits per symbol. Note that the symbol time of 4PWAM is the same as that of 2PWM.
Modulation schemes have also found applications in parallel data links. For example, in [39], the transmitter utilized Analog Multi-Tone (AMT) to achieve 24 Gbps in 90 nm CMOS. A three-level differential encoding was used to obtain higher I/O pin efficiency [40]. A Code-Division Multiple Access (CDMA) based transceiver was proposed for parallel links to improve crosstalk rejection [41]. Kossel, et al. showed that the use of Tomlinson-Harashima pre-coding, initially proposed in [42,43], at transmitters can remove post-cursor ISI [44].
The preceding investigation demonstrates that an effective means of combating ISI caused by channel imperfections is to utilize advanced modulation schemes that alleviate the bandwidth constraint on channels. For example, the orthogonal characteristics of CDMA enable the transmission of multiple data streams via parallel links with minimum crosstalk among them. The orthogonality of Walsh codes allows multiple Walsh-code encoded data symbols to be transmitted over the same channel simultaneously without interfering with each other. More research is clearly needed in deploying advanced modulation and data-encoding schemes popular in wireless communications for Gbps data links.
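The orthogonality argument can be illustrated with a toy example. The sketch below builds Walsh codes with the Sylvester (Hadamard) construction, superimposes four ±1 symbols on one shared, ideal (noise-free, ISI-free) channel, and recovers each by correlation. It is a minimal illustration of the principle, not the transceiver of [32] or [41]; all function names are hypothetical.

```python
def walsh_codes(order):
    """Return 2**order mutually orthogonal +/-1 codes of length 2**order."""
    codes = [[1]]
    for _ in range(order):
        # Sylvester step: H -> [[H, H], [H, -H]]
        codes = ([row + row for row in codes]
                 + [row + [-c for c in row] for row in codes])
    return codes


def encode(symbols, codes):
    """Spread each symbol by its own code and sum the chips on one channel."""
    length = len(codes[0])
    return [sum(s * code[i] for s, code in zip(symbols, codes))
            for i in range(length)]


def decode(chips, codes):
    """Recover each symbol by correlating the received chips with its code."""
    length = len(codes[0])
    return [sum(c * k for c, k in zip(chips, code)) / length
            for code in codes]


codes = walsh_codes(2)          # four codes of length four
symbols = [1, -1, -1, 1]        # one +/-1 symbol per code
chips = encode(symbols, codes)  # multi-level composite signal
recovered = decode(chips, codes)
```

Note that the composite signal is multi-level, so a practical link would pay for the simultaneous transmission with a transmitter and receiver that resolve more than two amplitudes.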

Channel Equalization
Due to channel imperfections, data symbols received at the far end of the channel consist of pre-cursors, a main cursor, and post-cursors. The main cursor represents the transmitted data and is used for data recovery, while pre-cursors and post-cursors must be removed by means of channel equalization. Boosting the high-frequency components [45,46] or attenuating the low-frequency components [47,48] of data symbols prior to their transmission, known as pre-emphasis, is effective. Since the former worsens crosstalk, the latter is generally preferred. Near-end channel equalization is usually implemented using a finite impulse response filter that introduces zeros at the locations of the dominant poles of the channel so as to cancel the poles [20,49]. The order of pre-emphasis filters is low, usually limited to 4, indicating that typical wire channels can be adequately modeled as a 4th-order low-pass filter when reflection and crosstalk are not accounted for [50,51]. Since the characteristics of the channel are not known prior to data transmission, the optimal tap coefficients of pre-emphasis filters cannot be obtained a priori, revealing the rigidity of pre-emphasis. Another intrinsic limitation of pre-emphasis is its inability to remove ISI caused by reflection and crosstalk, which are particularly important when the data rate is high and channels contain multiple impedance discontinuities. Far-end channel equalization combats ISI by amplifying the high-frequency components of received data symbols or by subtracting the estimated post-cursors from data symbols prior to clock and data recovery, often both simultaneously. Similar to pre-emphasis, Continuous-Time Linear Equalizers (CTLEs) provide zeros to cancel the dominant poles of the channel so that the equalized channel exhibits the desired all-pass characteristic. CTLE is often deployed in conjunction with nonlinear post-equalization [51,52].
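To make the transmit-side pre-emphasis filter described earlier in this section concrete, the sketch below applies a 2-tap FIR de-emphasis filter to a ±1 symbol stream. It is a minimal sketch under stated assumptions: the tap form (1 - alpha, -alpha), the value alpha = 0.25, and the function names are illustrative choices, not values taken from the cited works.

```python
def fir_filter(symbols, taps):
    """Apply a UI-spaced FIR filter: y[n] = sum_k taps[k] * x[n - k]."""
    out = []
    for n in range(len(symbols)):
        acc = 0.0
        for k, c in enumerate(taps):
            if n - k >= 0:
                acc += c * symbols[n - k]
        out.append(acc)
    return out


def de_emphasis_taps(alpha):
    """2-tap filter with |c0| + |c1| = 1 so the peak swing is unchanged."""
    return [1.0 - alpha, -alpha]


def dc_gain(taps):
    """Gain seen by a long run of identical symbols (low frequency)."""
    return sum(taps)


def nyquist_gain(taps):
    """Gain seen by an alternating +1/-1 pattern (highest data frequency)."""
    return sum(c * (-1) ** k for k, c in enumerate(taps))


taps = de_emphasis_taps(0.25)
bits = [1, 1, 1, -1, -1, 1]
tx = fir_filter(bits, taps)
```

With these assumed taps, repeated symbols are driven at half amplitude while transitions keep the full swing, which is the low-frequency attenuation variant of pre-emphasis.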
CTLE is also used on its own for channels with negligible reflection and crosstalk [53,54]. Because the received symbol is severely attenuated, input offset voltage compensation is typically required in CTLE [55,56]. Unlike CTLE, nonlinear post-equalization removes post-cursors by subtracting estimated post-cursors directly from data symbols prior to slicing. Perhaps the most widely used nonlinear equalization is Decision Feedback Equalization (DFE) [57]. Unlike CTLE, DFE does not worsen crosstalk. Also, since the taps of DFE can be adjusted in accordance with the opening of data eyes or the power of the error between the input and output of the slicer, it is highly effective and robust in eliminating ISI caused by finite channel bandwidth, reflection, and crosstalk. As DFE has no effect on pre-cursors, it is usually deployed together with pre-emphasis to remove both pre-cursors and post-cursors.
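The subtract-then-slice operation of DFE can be sketched in a few lines. The example below is a behavioral model only: the channel has no pre-cursors, the main cursor is normalized to 1, the tap weights are assumed to equal the post-cursors exactly, and all names and values are hypothetical.

```python
def slicer(value):
    """2PAM decision: map the equalized sample to +1 or -1."""
    return 1 if value >= 0 else -1


def channel(symbols, pulse):
    """UI-spaced channel model; pulse[0] is the main cursor, pulse[1:] post-cursors."""
    out = []
    for n in range(len(symbols)):
        out.append(sum(pulse[k] * symbols[n - k]
                       for k in range(len(pulse)) if n - k >= 0))
    return out


def dfe(received, taps):
    """Subtract post-cursor ISI estimated from past decisions, then slice."""
    decisions = []
    for sample in received:
        feedback = sum(w * decisions[-1 - k]
                       for k, w in enumerate(taps) if k < len(decisions))
        decisions.append(slicer(sample - feedback))
    return decisions


data = [1, 1, -1, 1, -1, -1, 1, 1, 1, -1]
pulse = [1.0, 0.7, 0.5]                 # post-cursors sum to more than the main cursor
received = channel(data, pulse)
without_dfe = [slicer(v) for v in received]
with_dfe = dfe(received, pulse[1:])     # ideal taps equal to the post-cursors
```

In this toy case slicing the raw samples produces errors because the eye is closed, while the DFE recovers the data. The model also shows why a wrong decision matters: it would feed an incorrect correction into the following symbols.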
DFE operations include data slicing, multiplication, and subtraction. For tap 1, all three must be completed within one UI. Loop unrolling has proven to be an effective technique for meeting the timing constraint of tap 1 of DFE [19,58-60]. High-order loop unrolling has also emerged for data rates exceeding 10 Gb/s, at the cost of high silicon consumption [10,11,61]. To relax the timing constraint and at the same time lower the power consumption of the remaining DFE taps, the half-rate approach is widely used [59]. The quarter-rate approach has also been deployed to further relax the timing constraint, again at the cost of high silicon consumption [10,11,49,61]. Since DFE operation is based on the correct recovery of data, an error occurring in data slicing will propagate through the delay chain of the equalizer and affect subsequent data recovery decisions. To mitigate this, the input of the slicer must be sufficiently large and disturbance-free. Current integration has proven effective in eliminating transient disturbances in data symbols prior to slicing [62,63]. Using CTLE prior to slicing to boost data symbols is also effective [7,52,64]. To minimize the kickback of slicers, slicers should also use a multistage configuration [65,66]. Current-steering multiplication is widely used to speed up multiplication. For the same reason, current-mode summation, including current-integrating summers that offer the key advantages of high speed and low power consumption, is widely preferred [52,67-69].
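Loop unrolling for tap 1 can be illustrated behaviorally: instead of subtracting the tap-1 correction before slicing, two slicers with offset thresholds resolve both hypotheses for the previous bit in parallel, and the previous decision merely selects one result. The sketch below compares this with the direct form; the names, the tap weight, and the assumed initial previous decision of +1 are illustrative assumptions.

```python
def slicer(value, threshold=0.0):
    """2PAM decision against a programmable threshold."""
    return 1 if value >= threshold else -1


def dfe_tap1_direct(received, w1, initial=1):
    """Direct form: subtract w1 * previous decision, then slice."""
    prev, out = initial, []
    for sample in received:
        prev = slicer(sample - w1 * prev)
        out.append(prev)
    return out


def dfe_tap1_unrolled(received, w1, initial=1):
    """Unrolled form: slice against +w1 and -w1, then select with the previous bit."""
    prev, out = initial, []
    for sample in received:
        if_prev_high = slicer(sample, +w1)   # speculates previous bit was +1
        if_prev_low = slicer(sample, -w1)    # speculates previous bit was -1
        prev = if_prev_high if prev == 1 else if_prev_low
        out.append(prev)
    return out


data = [1, -1, -1, 1, 1, -1, 1]
w1 = 0.6
# Received samples with a single post-cursor of weight w1; the bit before
# the first one is assumed to be +1 to match the default initial state.
received = [d + w1 * p for d, p in zip(data, [1] + data[:-1])]
direct = dfe_tap1_direct(received, w1)
unrolled = dfe_tap1_unrolled(received, w1)
```

Both forms make the same decisions; the point of the unrolled form is that the critical feedback path shrinks to a selection, at the cost of doubling the slicers for each unrolled tap.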
The variation of the characteristics of wire channels requires that the tap coefficients of DFE be set adaptively in accordance with the characteristics of the channels. Least Mean Square (LMS) adaptation updates tap coefficients in such a way that the power of the error between the output and input of the slicer is minimized [70]. Sign-Sign LMS (SS-LMS), where only the signs of the error and the data symbol are used in the search for optimal tap coefficients, is widely used due to its ease of implementation and fast convergence. Alternatively, the opening of data eyes can be used to guide the DFE search [71,72]. Eye opening can be captured using an Eye-Opening Monitor (EOM) [73-81]. Jitter-based eye-opening monitors that minimize timing jitter at the edges of data eyes have also emerged [82]. Dual-mode adaptive DFE, consisting of a data-DFE to maximize vertical opening and a jitter-DFE to minimize timing jitter, outperforms EOM-based DFE [49]. For highly reflective channels, large post-cursors might exist at taps both close to and far away from the main cursor, with a large number of small post-cursors between them. Equalizing these channels using conventional DFE requires a large number of taps, resulting in high power and silicon consumption even though the taps corresponding to the insignificant post-cursors contribute little to channel equalization. Floating-tap DFE, proposed in [7,64,83], is an elegant technique for eliminating reflection-induced post-cursors located far away from the main cursor.
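A minimal behavioral sketch of SS-LMS adaptation for DFE taps follows. It assumes 2PAM data, a main cursor normalized to 1 (so the slicer target is ±1), an initially open eye so that decisions are reliable, and a hypothetical step size; the channel values and names are made up for illustration.

```python
import random


def sign(value):
    """Return +1, -1, or 0 for positive, negative, or zero input."""
    return (value > 0) - (value < 0)


def ss_lms_dfe(received, num_taps, mu):
    """Decision-directed DFE whose taps are adapted with sign-sign LMS."""
    taps = [0.0] * num_taps
    history = [0] * num_taps              # most recent decision first; 0 = none yet
    decisions = []
    for sample in received:
        equalized = sample - sum(w * d for w, d in zip(taps, history))
        decision = 1 if equalized >= 0 else -1
        error = equalized - decision      # slicer input minus slicer output
        for k in range(num_taps):
            # Only the signs of the error and of the past decision are used.
            taps[k] += mu * sign(error) * sign(history[k])
        history = [decision] + history[:-1]
        decisions.append(decision)
    return taps, decisions


rng = random.Random(1)                    # fixed seed for repeatability
data = [rng.choice((-1, 1)) for _ in range(5000)]
pulse = [1.0, 0.3, 0.15]                  # hypothetical main cursor and post-cursors
received = [sum(pulse[k] * data[n - k] for k in range(len(pulse)) if n - k >= 0)
            for n in range(len(data))]
taps, decisions = ss_lms_dfe(received, num_taps=2, mu=0.002)
```

In this setup the taps settle near the post-cursor values 0.3 and 0.15 and then dither by roughly the step size, which reflects the usual trade between convergence speed and steady-state accuracy in sign-based updates.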
The preceding study shows that the adaptivity of DFE should address the following key issues in DFE-based channel equalization: (i) Optimal tap coefficients to provide the complete cancellation of post-cursors. (ii) Optimal number of taps to minimize power and silicon consumption. (iii) Optimal distribution of taps to remove noncritical taps so as to minimize the number of taps and achieve the best channel equalization without sacrificing power and silicon resources. Although SS-LMS is widely used in the search for these optimal parameters, EOM-based and jitter-based DFE demonstrate promising performance and power efficiency, especially because they unify DFE and clock and data recovery (CDR) into one operation.
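Issue (iii), the distribution of taps, can be illustrated with a simple selection rule: keep a few fixed taps next to the main cursor and spend the remaining tap budget on the largest later post-cursors. This is only a sketch of the idea, assuming the post-cursor magnitudes are already known; it is not the adaptation scheme of [7,64,83], and the pulse values and names are hypothetical.

```python
def select_taps(post_cursors, num_fixed, num_floating):
    """Return 1-based tap positions: fixed taps first, then floating taps."""
    fixed = list(range(1, min(num_fixed, len(post_cursors)) + 1))
    candidates = range(num_fixed + 1, len(post_cursors) + 1)
    ranked = sorted(candidates,
                    key=lambda pos: abs(post_cursors[pos - 1]),
                    reverse=True)
    floating = sorted(ranked[:num_floating])
    return fixed + floating


def residual_isi(post_cursors, positions):
    """Sum of post-cursor magnitudes left uncancelled by the chosen taps."""
    kept = set(positions)
    return sum(abs(c) for pos, c in enumerate(post_cursors, start=1)
               if pos not in kept)


# Hypothetical reflective channel: large post-cursors near the main cursor
# and again at positions 8 and 9, with small ones in between.
post = [0.30, 0.12, 0.02, 0.01, 0.01, 0.02, 0.01, 0.18, 0.09, 0.01]
floating_positions = select_taps(post, num_fixed=2, num_floating=2)
conventional_positions = [1, 2, 3, 4]
floating_residual = residual_isi(post, floating_positions)
conventional_residual = residual_isi(post, conventional_positions)
```

With the same budget of four taps, the floating arrangement in this example leaves far less uncancelled ISI than four contiguous taps, because the contiguous taps are spent on insignificant post-cursors.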

Conclusions
The imperfections of wire channels and their impact on multi-Gbps data links were examined, followed by a close examination of modulation schemes effective in combating the effect of channel imperfections. Channel equalization, both pre-emphasis and post-equalization, was investigated with an emphasis on adaptive decision feedback equalization. Challenges and opportunities in combating ISI were explored. We showed that two research directions that could improve the performance of data links are advanced modulation and data-encoding schemes, and adaptive channel equalization.