Low power/low voltage techniques for analog CMOS circuits

Cassia, Marco

Publication date:
2004

Document Version
Early version, also known as pre-print

LOW POWER / LOW VOLTAGE
TECHNIQUES FOR
ANALOG CMOS CIRCUITS

Marco Cassia

This thesis is submitted in partial fulfillment
of the requirements for obtaining the Ph.D. degree at:

Ørsted•DTU
Technical University of Denmark (DTU)

31 July, 2004
ABSTRACT

This work presents two separate case studies that shed light on different aspects of low-power and low-voltage design.

In the first example, a low-voltage folded cascode operational transconductance amplifier was designed to achieve 1-V power supply operation. This is made possible by a novel current driven bulk (CDB) technique, which reduces the MOST threshold voltage by forcing a constant current through the transistor bulk terminal. A prototype was fabricated in a standard CMOS process; measurements show a 69-dB dc gain over a 2-MHz bandwidth, and compatible input and output voltage levels at a 1-V power supply. Limitations and improvements of this CDB technique are discussed.

The second part of the work is concerned with analog RF circuits. A previously unknown intrinsic non-linearity of standard $\Sigma\Delta$ fractional-N synthesizers is identified. A general analytical model for $\Sigma\Delta$ fractional-N phase-locked loops (PLLs) that includes the effect of the non-linearity is derived, and an improvement to the standard synthesizer topology is discussed. A new methodology for behavioral simulation is also presented: it is based on an object-oriented event-driven approach and allows very fast and accurate simulations; the theoretical models developed validate the simulation results. A case study for EGSM/DCS modulation is used to demonstrate the applicability of the simulation methodology to the analysis of real situations.

A novel method to calibrate the frequency response of a Phase-Locked Loop concludes the research. The method requires just an additional digital counter to measure the natural frequency of the PLL; moreover it is capable of estimating the static phase offset. The measured value can be used to tune the PLL response to the desired value. The method is demonstrated mathematically on a typical PLL topology and it is extended to $\Sigma\Delta$ fractional-N PLLs.
RESUMÉ

The thesis presents two different case studies to illustrate different aspects of low-power / low-voltage design.

The first case study concerns a low-voltage folded cascode operational transconductance amplifier designed to operate at a supply voltage of 1 V. This is made possible by a new biasing technique in which the bulk terminal of the transistor is controlled by a constant current in the forward direction. This reduces the threshold voltage of the transistor. A prototype has been fabricated in a standard CMOS technology, and measurements show a gain of 69 dB, a bandwidth of 2 MHz and compatible input and output voltages at a supply voltage of 1 V. Limitations and improvements of the developed biasing method are discussed.

The second case study concerns RF circuits. A previously undescribed intrinsic non-linearity of a standard Sigma-Delta fractional-N synthesizer has been identified. A general analytical model for a Sigma-Delta fractional-N phase-locked loop (PLL) that takes this non-linearity into account has been developed, and an improvement to the normal synthesizer topology has been proposed. In addition, a new method for system simulation of the phase-locked system is presented. This simulation uses an object-oriented, event-driven method and allows very fast and accurate simulations. The method is demonstrated on a practical example for EGSM/DCS modulation.

ACKNOWLEDGMENTS

This Ph.D. project was carried out at Ørsted•DTU, Technical University of Denmark, under the supervision of Professor Erik Bruun, Ørsted•DTU. It was financed by a scholarship from DTU.

I would like to express my gratitude to everyone who has helped and supported me over the last three years. First of all, I would like to thank my supervisor Erik Bruun for making the study possible and for his guidance. For CAD support, OS setup and several other practical issues, thanks to Allan Jørgensen for his ability to make things run more smoothly.

Special thanks are due to my good friend Peter Shah for arranging my stay at Qualcomm and for his supervision. I would also like to thank the whole QCT department of Qualcomm, San Diego, for the great support during the internship.

For proofreading this thesis and for the many technical discussions, my deepest gratitude goes to my colleague and friend Jannik H. Nielsen. Finally, I would like to acknowledge my brother Fabio for the final readings.
# CONTENTS

1 Introduction

I Low Voltage Amplifier

2 Limits to low-voltage low-power design
  2.1 Low-voltage supply limits
  2.2 Threshold voltage
  2.3 Sub-threshold region
  2.4 Transistor speed
  2.5 Power limitations
  2.6 Analog switches
  2.7 Transistor stacking and cascoding
  2.8 Dynamic range
  2.9 Summary

3 CMOS bulk techniques for Low-Voltage analog design
  3.1 Bulk-driven MOS
  3.2 MOS threshold voltage
    3.2.1 DTMOS
  3.3 Current-Driven Bulk approach
  3.4 Summary

4 Current Driven Bulk MOS
  4.1 CDB MOS analysis
    4.1.1 Output impedance
    4.1.2 Parasitic capacitances
    4.1.3 Slew-rate
    4.1.4 Frequency behavior
      4.1.4.1 Increased bias currents
      4.1.4.2 Cascode transistor
      4.1.4.3 Decoupling capacitor
    4.1.5 Parasitic bipolar gain
  4.2 CDB noise performance
    4.2.1 Supply noise coupling
  4.3 CDB MOS summary

5 1 V Operational Transconductance Amplifier
  5.1 OTA architecture
## LIST OF FIGURES

1.1 Factors driving the low-power low-voltage trend.
1.2 Power supply voltage and scaling trends.
2.1 100% current efficient transconductor for single pole realization.
2.2 Conductance and charge injection for analog switches.
3.1 Bulk driven input pair and bulk driven current mirror.
3.2 Dynamic threshold MOS topology (a) and current-driven bulk MOS (b).
4.1 Current Driven Bulk MOS model and PMOS cross-section.
4.2 Threshold voltage vs. bulk bias current.
4.3 CDB compared with normal MOS output characteristics.
4.4 CDB Source Follower and CDB Common Source configurations.
4.5 Source follower step response.
4.6 Common source AC response.
4.7 Common-source small signal schematic.
4.8 SF stage with increased bias current and CS stage with cascode MOS.
4.9 Source follower step response with decoupling capacitor.
4.10 Common source AC response with decoupling capacitor.
4.11 CDB MOS bias circuitry and CDB layout.
5.1 1V cascode amplifier.
5.2 DC responses at different common-mode input levels.
5.3 Transfer characteristic for different values of the decoupling capacitor.
5.4 AC response.
5.5 Layout of the CDB amplifier.
5.6 OTA bias circuitry.
5.7 Measured DC OTA responses at different common-mode input voltages.
5.8 DC gain for different input common-mode voltages.
5.9 Simulated (-) and measured (X) CDB OTA AC responses.
5.10 Measured DC transfer function at $V_{DD} = 0.75$ V with and without (two bottom traces) bulk current.
6.1 Integer-$N$ synthesizer.
6.2 Standard fractional-$N$ architecture.
6.3 $\Sigma \Delta$ fractional-$N$ architecture.
6.4 First order $\Sigma \Delta$ modulator.
6.5 MASH architecture.
6.6 NTF for different MASH order.
6.7 Candy architecture.
6.8 Effects of LSB dithering.
6.9 Non-uniform sampling.
6.10 S/H ΣΔ fractional-N synthesizer.
6.11 Sample-and-hold response.
6.12 Possible implementation of the PFD and of the CP with S/H.
6.13 Linear model of S/H portion.
6.14 PLL waveforms.
6.15 Complete linearized S/H ΣΔ fractional-N PLL.
6.16 Non-linearity model.
6.17 Phase noise PSD for different ΣΔ modulator orders (dashed: without S/H, solid: with S/H).
7.1 Simulation model.
7.2 Simulation structure.
7.3 Simulation steps.
7.4 Phase noise PSD: S/H PLL vs. regular PLL.
7.5 Phase noise PSD: effects of the truncation error.
7.6 S/H Synthesizer phase noise PSD.
7.7 Phase noise PSD with VCO noise added.
7.8 Phase noise PSD for different charge-pump current mismatches.
8.1 Mixer based modulation: (a) heterodyne, (b) homodyne.
8.2 Open-loop modulation (top figure) and indirect VCO modulation.
8.3 Transmitter architecture.
8.4 Gaussian transfer function and modulated data.
8.5 Transmitter architecture linear model.
8.6 PSD and RMS of phase error for variable VCO gain.
8.7 EGSM transmitted power spectrum.
8.8 DCS transmitted power spectrum.
8.9 Phase error RMS value for variable delay and CP mismatch.
8.10 Voltage Power Spectral Density for different divider delays with GSM modulation.
9.1 Integer-N PLL with calibration resistors.
9.2 Phase error with corresponding UP/DOWN pulses.
9.3 Auxiliary phase-frequency detector architecture.
9.4 Integer-N PLL linear model.
9.5 Phase error curves.
9.6 ΣΔ fractional-N linear model.
9.7 Counter behavior vs. time.
9.8 Counter maximum vs. frequency steps.
LIST OF TABLES

5.1 OTA transistor dimensions.
5.2 OTA main parameters.
5.3 Output range for different input common-mode voltages.
5.4 DC gain at different input common-mode voltages at 0.7 V and 0.8 V power supply.
5.5 Main parameters for the amplifier for reduced supply voltage.
5.6 DC gain for different input common-mode voltages.
5.7 CDB OTA vs. state-of-the-art current mirror OTA.
7.1 Design parameters.
7.2 Loop parameters.
8.1 EGSM and GSM specifications.
8.2 PLL design variables.
8.3 Simulated errors for MASH and Candy modulators.
8.4 DCS spurious performance for various offset currents.
# List of Abbreviations and Acronyms

<table>
<thead>
<tr>
<th>Abbreviation</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>A/D</td>
<td>Analog to digital</td>
</tr>
<tr>
<td>BDDP</td>
<td>Bulk driven differential pair</td>
</tr>
<tr>
<td>BJT</td>
<td>Bipolar junction transistor</td>
</tr>
<tr>
<td>CMOS</td>
<td>Complementary metal oxide semiconductor</td>
</tr>
<tr>
<td>CP</td>
<td>Charge pump</td>
</tr>
<tr>
<td>CS</td>
<td>Common-source</td>
</tr>
<tr>
<td>D/A</td>
<td>Digital to analog</td>
</tr>
<tr>
<td>DAC</td>
<td>Digital to analog converter</td>
</tr>
<tr>
<td>DCS</td>
<td>Digital cellular system</td>
</tr>
<tr>
<td>DMD</td>
<td>Dual-modulus divider</td>
</tr>
<tr>
<td>DPA</td>
<td>Digital phase accumulator</td>
</tr>
<tr>
<td>DTMOS</td>
<td>Dynamic threshold metal oxide semiconductor</td>
</tr>
<tr>
<td>EGSM</td>
<td>Extended group mobile standard</td>
</tr>
<tr>
<td>JFET</td>
<td>Junction field effect transistor</td>
</tr>
<tr>
<td>ICMR</td>
<td>Input common mode range</td>
</tr>
<tr>
<td>I/Q</td>
<td>In-phase/quadrature</td>
</tr>
<tr>
<td>LF</td>
<td>Loop Filter</td>
</tr>
<tr>
<td>LO</td>
<td>Local oscillator</td>
</tr>
<tr>
<td>LSB</td>
<td>Least significant bit</td>
</tr>
<tr>
<td>MASH</td>
<td>Multi-stage noise shaping</td>
</tr>
<tr>
<td>MMD</td>
<td>Multi-modulus divider</td>
</tr>
<tr>
<td>MOS</td>
<td>Metal oxide semiconductor</td>
</tr>
<tr>
<td>MOSFET</td>
<td>Metal oxide semiconductor field-effect transistor</td>
</tr>
<tr>
<td>NTF</td>
<td>Noise transfer function</td>
</tr>
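<tr>
<td>OSR</td>
<td>Oversampling ratio</td>
</tr>
<tr>
<td>OTA</td>
<td>Operational transconductance amplifier</td>
</tr>
<tr>
<td>PA</td>
<td>Power amplifier</td>
</tr>
<tr>
<td>PFD</td>
<td>Phase-frequency detector</td>
</tr>
<tr>
<td>PLI</td>
<td>Programming language interface</td>
</tr>
<tr>
<td>PLL</td>
<td>Phase-locked loop</td>
</tr>
<tr>
<td>PM</td>
<td>Phase modulation</td>
</tr>
<tr>
<td>PRBS</td>
<td>Pseudo-random binary sequence generator</td>
</tr>
<tr>
<td>PSD</td>
<td>Power spectral density</td>
</tr>
<tr>
<td>PSRR</td>
<td>Power supply rejection ratio</td>
</tr>
<tr>
<td>RF</td>
<td>Radio frequency</td>
</tr>
<tr>
<td>RISC</td>
<td>Reduced instruction set computer</td>
</tr>
<tr>
<td>RMS</td>
<td>Root mean square</td>
</tr>
<tr>
<td>SF</td>
<td>Source-follower</td>
</tr>
<tr>
<td>S/H</td>
<td>Sample and hold</td>
</tr>
<tr>
<td>SNR</td>
<td>Signal to noise ratio</td>
</tr>
<tr>
<td>STF</td>
<td>Signal transfer function</td>
</tr>
<tr>
<td>TCXO</td>
<td>Temperature compensated crystal oscillator</td>
</tr>
<tr>
<td>VCO</td>
<td>Voltage controlled oscillator</td>
</tr>
</tbody>
</table>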
CHAPTER 1

INTRODUCTION

The last few decades have seen the proliferation and worldwide diffusion of computers, digital communication systems and consumer electronics. The driving force behind this trend is the ability of the industry to produce faster and more power-efficient circuits, which is mainly due to the continuous scaling of CMOS technology [1]. The interaction among the different factors leading to the continuous growth of electronic devices is illustrated in fig. 1.1.

The increasing demand for portable systems of ever-growing complexity places new challenges on the semiconductor industry. New solutions to satisfy the strict requirements posed by different factors, such as integration, low power and the coexistence of different communication standards, are the subject of massive research both at the physical level (device scaling, alternative dielectric materials) and at the system level (new architectures, novel circuit techniques). At the same time, the possibilities offered by the industry push the market demand for even more complex, portable and higher-performing devices.

The transformation of the mobile phone in the last few years is a perfect example of the ongoing situation: from the initial bulky device, the mobile phone has progressively shrunk to its present compact size and has simultaneously started to offer more services than simple wireless conversation, such as internet applications, high-rate data exchange, music and even games.

The continuous scaling of the MOS channel length increases the maximum number of transistors per unit area; according to Moore’s law [2], the number of components per chip doubles every 24 months. The growing packing density leads to integrated solutions: mixed-mode systems are implemented on the same chip with fewer external components. Integration has not only driven the cost per function down, but has also led to increased operation speed and power savings; in fact, in modern digital circuits, power and speed are limited by heat dissipation and by interconnection delays respectively, rather than by the information exchange rate between different chips and external components.

Another direct consequence of MOS scaling is the strong progress in MOS RF performance [3]; owing to the smaller dimensions and the reduced parasitics, the maximum cut-off frequency $f_T$ has greatly improved. Consequently, CMOS has become an attractive option for analog RF applications and RF systems-on-chip. Besides the advantage of low production costs compared to Bi-CMOS processes, CMOS technology offers the unique possibility of integrating both RF analog and baseband digital circuits on the same chip.

The demand for high-performance and low-power systems has important consequences at circuit level. The two main reasons for reducing power are the limited heat dissipation per unit area, especially when a dedicated heat sink is not available, and the demand for long-life autonomous portable equipment. The principal forces driving voltage scaling are the reliability of MOS performance and, to a lesser extent, the reduced power consumption of the digital section of the chip. Note that behind the progress of ICs there is always one driving factor dictated by technology limitations and another related to market demand. According to the scaling trend, the power supply will decrease to sub-1 V levels for the coming technology nodes (e.g. 60 nm). The near-term ITRS forecast is plotted in fig. 1.2; it shows that the supply voltage decreases slowly over the years.

A reduced supply voltage has a large impact on the power consumption. In analog circuits, scaling the supply voltage by a factor \( k \) while preserving a constant signal-to-noise ratio \( \times \) bandwidth product does not, to a first-order approximation, change the power consumption; nevertheless, several factors, discussed at a later stage, lead to an increased power consumption. For digital circuits, instead, scaling the supply voltage reduces the dynamic power consumption. The latter has historically been the greatest source of power dissipation; however, with continued scaling the magnitude of the leakage currents increases, and a growing share of the power consumption is determined by the stand-by power [4]. Hence, for further supply voltage scaling, the power saving is no longer going to be linearly related to the voltage reduction.

Reducing both power and supply voltage sets new challenges at the architectural level as well as at the circuit performance level, especially for RF applications, where the sensitivity of the blocks is a crucial element. Rather than surveying the different approaches developed in recent years for low-voltage and low-power design, this work presents two specific examples, analyzed from different perspectives. Since the analog amplifier is one of the main building blocks in analog design, a low-voltage amplifier is the first case examined. The research focus has been placed on the challenges and constraints met in transistor-level design.

The second example finds its application in the context of RF circuits; since the frequency synthesizer is one of the most important blocks in integrated transceivers, it was chosen to show how power consumption can be decreased by a proper choice at the architecture level. The frequency synthesizer is responsible for generating the carrier frequency for both the transmitter and the receiver; depending on the system, the same synthesizer can be shared to generate the transmit and the receive carriers. The frequency synthesizer is also one of the most critical components: a poor design can severely affect the performance and the battery lifetime. Moreover, single-chip integration is a challenging task because of the stringent performance demands: on-chip passive elements typically show poorer quality than the equivalent discrete components, and interference between different chip sections may cause performance degradation.

Recently, ΣΔ fractional-N architectures have risen in popularity; besides offering high frequency resolution, this type of synthesizer enables indirect digital VCO modulation. Low-power operation can be achieved since only a minimum number of analog blocks is required for transmission; for example, there is no need for DACs or an analog transmit filter.

This thesis is hence divided into two parts. The first part, dedicated to the low-voltage amplifier, starts with a general description of the limitations and constraints dictated by reduced supply voltage on analog design: chapter 2 shows that the minimum power consumption is established by fundamental limits and that the power consumption has to be increased with voltage down-scaling in order to maintain bandwidth and dynamic range. One of the greatest limitations is related to the MOS threshold voltage $V_{th}$; unfortunately for analog design, this value does not scale linearly with the supply voltage.

Chapter 3 focuses on approaches which directly or indirectly target the low-voltage goal by limiting the effects of the threshold voltage. In the first method, the MOS is bulk driven to remove $V_{th}$ from the signal path. The other two approaches exploit the bulk effect to achieve a threshold voltage modulation; it is shown how $V_{th}$ can be decreased by forcing a small bias current through the bulk terminal.

A detailed analysis of the novel Current Driven Bulk (CDB) technique is the subject of chapter 4. The chapter discusses how the frequency and transient behavior of the CDB MOS compare with those of the standard MOS configuration; its second part shows a simple circuit solution that addresses the drawbacks introduced by the CDB technique.

In order to demonstrate the applicability of the CDB technique to standard analog design, a general operational transconductance amplifier capable of sub-1V power supply operation has been fabricated in a 0.5 $\mu$m process. The details of the implementation and the measurement results are discussed in chapter 5. This concludes the first part of the thesis.

The second part of the work opens with a discussion of PLL based techniques for frequency synthesis. Chapter 6 presents an accurate derivation of a general analytical model for a fractional-$N$ PLL that also includes a newly identified non-linear issue, causing down-folding of high power noise into baseband. A novel architecture to overcome this issue is presented. Simulation of $\Sigma\Delta$ synthesizers is challenging due to the non-periodic steady-state behavior and other factors explained in chapter 7; a new methodology entirely based on event-driven simulation is introduced and detailed implementation aspects are discussed throughout chapter 7.

The research on $\Sigma\Delta$ fractional-$N$ PLLs was carried out at Qualcomm CDMA Technology in San Diego; the objective of the research was to investigate the feasibility of $\Sigma\Delta$ fractional-$N$ PLLs in the context of indirect GSM modulation for industrial applications. The complete case study for a GSM/DCS transmitter is the topic of chapter 8. Since the investigation was targeted at industrial applications, a large variety of cases has been analyzed to establish the system reliability and robustness. Chapter 8 presents a brief summary of the main results and conclusions drawn from a vast set of simulations.

The PLL calibration issue is discussed in chapter 9. Because of the phase noise requirements, there is a fundamental trade-off between the loop bandwidth and the modulator order, and hence the power of the quantization noise. Since bandwidth extension leads to increased in-band quantization noise, $\Sigma\Delta$ fractional-$N$ PLLs are not suitable for wide-band modulation schemes. A feasible way to extend the modulation bandwidth is to pre-distort the data by means of an inverse filter matching the synthesizer transfer function; however, because of the inevitable variation of analog components, a calibration method is necessary to avoid modulation errors. Chapter 9 presents a technique to estimate the bandwidth of the synthesizer by applying a frequency step. The method is justified mathematically, and the drawbacks of this approach are discussed in the rest of the chapter.

Conclusions of the work are drawn in chapter 10.
Part I

Low Voltage Amplifier
CHAPTER 2
LIMITS TO LOW-VOLTAGE LOW-POWER DESIGN

The previous chapter has provided the basis for understanding the constantly growing demand for low-power low-voltage systems. The next few sections deal with the circuit-level consequences of a lower supply voltage. The focus is placed especially on the relationship between low-power and low-voltage requirements. As we shall see, for analog circuits a reduced supply voltage only brings new design challenges and no added benefits.

2.1 Low-voltage supply limits

As mentioned in the introduction, the lowering of the supply voltage is also a direct consequence of physical limitations: as the transistor dimensions are scaled down, the supply voltage must be reduced to avoid power and reliability issues. The gate oxide thickness of a MOS transistor scales with the channel length $L$ and is approximately equal to $L/50$; the oxide can tolerate about 800 V/$\mu m$ before breakdown [4]. The minimum channel length has decreased rapidly over the last decades to the current sub-100 nm range: the maximum voltage that such a MOS device can tolerate is slightly above 1.2 V [5]. This means that integrated circuits have to operate with a supply voltage that is a fraction of this limit.
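The figures quoted above can be checked with a short calculation. The sketch below is only an illustration: it assumes the two rules of thumb given in the text (oxide thickness of roughly $L/50$ and a breakdown field of about 800 V/$\mu m$), while the function name and the sample channel lengths are chosen here and do not come from the original work.

```python
# Rule-of-thumb estimate of the oxide breakdown voltage vs. channel length.
# Assumptions taken from the text: t_ox ~ L/50, breakdown field ~ 800 V/um.

BREAKDOWN_FIELD_V_PER_UM = 800.0  # approximate oxide breakdown field
OXIDE_SCALING_RATIO = 50.0        # oxide thickness is roughly L / 50


def max_gate_voltage(channel_length_um: float) -> float:
    """Return the approximate oxide breakdown voltage in volts."""
    oxide_thickness_um = channel_length_um / OXIDE_SCALING_RATIO
    return BREAKDOWN_FIELD_V_PER_UM * oxide_thickness_um


if __name__ == "__main__":
    # Sample channel lengths (illustrative values only).
    for length_nm in (500, 180, 100, 80):
        v_max = max_gate_voltage(length_nm / 1000.0)
        print(f"L = {length_nm:4d} nm -> V_max ~ {v_max:.2f} V")
```

Under these assumptions an 80 nm device gives about 1.28 V, in line with the limit of slightly above 1.2 V quoted for sub-100 nm devices.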

When reducing the supply voltage, a number of limitations in the design and in the performance of analog circuits arise; in the next sections we shall see what the consequences and the impacts are on the following issues:

1. Threshold voltage.
2. Sub-threshold region.
3. MOS transistor speed.
4. Analog switches.
5. Transistor stacking.
6. Power consumption.
7. Dynamic range.
2.2 Threshold voltage

The threshold voltage dictates perhaps the most serious constraint in low voltage design. The minimum supply voltage is usually required to be at least equal to:

\[ V_{\text{min}} = V_{\text{tn}} + |V_{\text{tp}}| \tag{2.1} \]

where \( V_{\text{tn}} \) and \( V_{\text{tp}} \) are the threshold voltages of the n-type and of the p-type transistors. This limitation occurs when the gates of the n-type and of the p-type MOS are connected together, as in a basic inverter configuration. If the supply voltage is below the limit set by eq. 2.1, a dead zone occurs in the middle of the input range.

Even when such configurations are avoided, the threshold voltage seriously limits the available signal swing. In fact, first of all, the power supply must be able to turn the MOS on; assuming strong inversion, this condition can be expressed as:

\[ V_{\text{DD}} - V_{\text{SS}} \geq V_{\text{GS}} = V_{\text{DS,sat}} + |V_{\text{th}}| \tag{2.2} \]

Moreover, if the transistor is gate driven, the signal-swing needs to be added on the top of the turn-on voltage, leading to:

\[ V_{\text{DD}} - V_{\text{SS}} \geq V_{\text{GS}} = V_{\text{DS,sat}} + |V_{\text{th}}| + V_{\text{signal}} \tag{2.3} \]

Finally, consider that even the most basic analog configuration (e.g. a source-follower) requires, in addition to the limit set by eq. 2.3, headroom for at least one more drain-source saturation voltage. Assuming a 1 V power supply, a threshold voltage equal to 0.7 V (which is the typical value for a 3.3 V process) and a \( V_{\text{DS,sat}} \) of approximately 100 mV, the allowed signal swing is at most 100 mV, under the assumption that the input signal is limited by the supply voltage. Hence, the threshold voltage is a strong limitation for the signal swing and, unfortunately, it does not scale down at the same rate as the MOS channel length [4]. This also means that \( V_{\text{th}} \) does not decrease linearly with the supply voltage: the main reason to avoid low-\( V_{\text{th}} \) transistors is the increased sub-threshold leakage current, which would degrade performance (especially from a power consumption point of view).
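The headroom budget in the example above can be written out explicitly. The following is a minimal sketch, assuming that every additional stacked device costs exactly one $V_{DS,sat}$; the function name and the second, lower threshold value are hypothetical and serve only as a comparison.

```python
# Headroom estimate for a gate-driven stage at low supply voltage.
# Values follow the example in the text; the function name is illustrative.


def max_signal_swing(v_supply, v_th, v_ds_sat, extra_sat_drops=1):
    """Return the signal swing (V) left after the turn-on voltage.

    The turn-on voltage is V_DS,sat + |V_th|; each additional stacked
    device is assumed to need one more V_DS,sat of headroom.
    """
    turn_on = v_ds_sat + abs(v_th)
    swing = v_supply - turn_on - extra_sat_drops * v_ds_sat
    return max(swing, 0.0)  # zero means the stage cannot be biased


if __name__ == "__main__":
    # Case from the text: 1 V supply, 0.7 V threshold, 100 mV saturation voltage.
    print(f"V_th = 0.7 V: swing = {max_signal_swing(1.0, 0.7, 0.1) * 1e3:.0f} mV")
    # Hypothetical lower threshold, for comparison only.
    print(f"V_th = 0.4 V: swing = {max_signal_swing(1.0, 0.4, 0.1) * 1e3:.0f} mV")
```

With the values from the text the function returns the 100 mV swing stated above; the clamp at zero simply flags supply voltages at which the stage cannot be biased at all.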

2.3 Sub-threshold region

Due to the smaller turn-on voltage, weak inversion bias helps in reducing the requirements established by eq. 2.3. Moreover, for a given bias current \( I_{\text{D}} \), the MOS transconductance \( g_{\text{m}} \) is maximized and approximately equal to [6]:

\[ g_{\text{m}} = \frac{I_{\text{D}}}{nV_{\text{T}}} \tag{2.4} \]

where \( V_{\text{T}} \) is the thermal voltage \( (V_{\text{T}} \simeq 26 \text{ mV}) \) and \( n \) is the slope factor of the gate voltage \( V_{\text{G}} \) versus pinch-off voltage \( V_{\text{P}} \), defined as the voltage that should be applied to the equipotential channel ($V_D = V_S$) to cancel the effect of the gate voltage [6]. When only small biasing currents are available, operating the MOS in saturation region would result in a smaller transconductance.
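To give a feeling for the difference, the sketch below compares the weak-inversion transconductance $I_D/(nV_T)$ from the text with the standard square-law expression $g_m = 2I_D/(V_{GS} - V_{th})$ for strong inversion. The square-law formula, the slope factor $n = 1.3$, the 200 mV overdrive and the 1 µA bias current are assumptions made here for illustration; they are not taken from the original text.

```python
# Transconductance for a fixed bias current: weak vs. strong inversion.
# n, the overdrive voltage and the bias current are illustrative assumptions.

THERMAL_VOLTAGE = 0.026  # V, approximate room-temperature value from the text


def gm_weak_inversion(i_d, n=1.3):
    """gm = I_D / (n * V_T), the weak-inversion expression from the text."""
    return i_d / (n * THERMAL_VOLTAGE)


def gm_strong_inversion(i_d, v_overdrive):
    """gm = 2 * I_D / (V_GS - V_th), standard square-law approximation."""
    return 2.0 * i_d / v_overdrive


if __name__ == "__main__":
    i_bias = 1e-6  # 1 uA bias current (assumed)
    gm_wi = gm_weak_inversion(i_bias)
    gm_si = gm_strong_inversion(i_bias, 0.2)
    print(f"weak inversion:   gm = {gm_wi * 1e6:.1f} uS")
    print(f"strong inversion: gm = {gm_si * 1e6:.1f} uS")
    print(f"ratio: {gm_wi / gm_si:.2f}")
```

For the same bias current, the weak-inversion device delivers the larger transconductance as long as the overdrive voltage exceeds $2nV_T$, which is the case for the assumed 200 mV.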

Another advantage of sub-threshold operation is a reduced input-referred noise contribution with respect to the saturation-region counterpart, due to the larger transconductance. However, the relative output noise current is maximized: this prevents the use of sub-threshold MOS transistors in biasing circuits (i.e. circuits not directly involved in signal processing).

The main unwanted issues can be summarized as:

- Lack of accuracy in setting the transistor current: the poor transistor matching limits the use of the sub-threshold region whenever current precision is required (as in current mirrors).
- Increased leakage currents (increasing the power consumption).
- Larger transistor sizes with increased parasitics (compared to the saturated MOS).

2.4 Transistor speed

The supply voltage directly affects the maximum operation frequency of the transistor. The transition frequency $f_T$ for a MOS in strong inversion can be approximately expressed as [7]:

$$f_T = \frac{\mu V_p}{L^2} \tag{2.5}$$

where $V_p$ is the pinch-off voltage and $\mu$ is the carrier mobility. If the threshold voltage gets scaled with the supply-voltage for a fixed technology, then the maximum frequency $f_T$ decreases.
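The linear dependence of $f_T$ on the pinch-off voltage can be illustrated numerically. The sketch below evaluates the approximate expression above in SI units; the mobility, channel length and pinch-off voltages are assumed values picked for illustration and do not describe any specific process.

```python
# Approximate transition frequency f_T = mu * V_p / L^2 (strong inversion).
# All numerical values below are illustrative assumptions.


def transition_frequency(mobility_m2_per_vs, v_pinch_off, channel_length_m):
    """Return the approximate transition frequency in Hz."""
    return mobility_m2_per_vs * v_pinch_off / channel_length_m ** 2


if __name__ == "__main__":
    mobility = 0.04   # m^2/(V*s), i.e. 400 cm^2/(V*s), assumed
    length = 0.5e-6   # 0.5 um channel length, assumed
    for v_p in (0.4, 0.2, 0.1):
        f_t = transition_frequency(mobility, v_p, length)
        print(f"V_p = {v_p:.1f} V -> f_T ~ {f_t / 1e9:.0f} GHz")
```

For a fixed technology (fixed $\mu$ and $L$), halving the pinch-off voltage halves the estimated $f_T$.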

2.5 Power limitations

Unfortunately, in analog circuits, lowering the supply voltage does not lead to reduced power consumption. In analog signal processing circuits, power is consumed to maintain the energy of the signal above the thermal noise: therefore, what matters is the signal-to-noise ratio (SNR) and the desired bandwidth. These two parameters set the minimum power consumption: in fact, it can be shown that the minimum power required to implement a single pole is given by [7]:

$$P = 8 \cdot kT \cdot f \cdot \frac{\text{SNR} (V_{DD} - V_{SS})}{V_{pp}}$$  \hspace{1cm} (2.6)

where $T$ is the absolute temperature, $k$ is the Boltzmann constant, $f$ is the signal frequency and $V_{pp}$ is the peak-to-peak signal voltage amplitude, as shown in fig. 2.1. The above equation has been derived assuming a 100% current-efficient integrator.

Since no assumptions have been made on the technology or on the power supply, equation 2.6 places a fundamental limit. In a real design, several factors boost the power consumption beyond this limit, such as additional noise sources ($1/f$ noise, supply noise) or circuit branches not directly used for signal processing (e.g. level shifters). The bias circuitry directly adds to the power consumption and should therefore be minimized; on the other hand, a poor biasing scheme could increase the noise of the circuit.

As equation 2.6 states, power dissipation is increased if the signal at a node that realizes a pole has a peak to peak voltage amplitude smaller than the available supply voltage. This means that the input signal should be amplified in the first stage of the design.
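
As a rough numerical illustration of equation 2.6, the following Python sketch evaluates the bound for a hypothetical operating point. The function name `min_pole_power`, the 80 dB SNR, the 1 MHz signal frequency, the 1 V supply and the 300 K temperature are assumptions chosen only for illustration; they are not taken from the designs described in this work.

```python
K_BOLTZMANN = 1.380649e-23  # Boltzmann constant (J/K)


def min_pole_power(snr_db, f_hz, v_pp, v_supply, temp_k=300.0):
    """Minimum power (W) needed to realize a single pole, following eq. 2.6.

    snr_db   : required signal-to-noise ratio in dB (power ratio)
    f_hz     : signal frequency in Hz
    v_pp     : peak-to-peak signal amplitude at the node (V)
    v_supply : total supply voltage V_DD - V_SS (V)
    """
    snr = 10.0 ** (snr_db / 10.0)  # convert dB to a linear power ratio
    return 8.0 * K_BOLTZMANN * temp_k * f_hz * snr * v_supply / v_pp


# Assumed operating point: 80 dB SNR at 1 MHz from a 1 V supply.
p_full_swing = min_pole_power(80.0, 1e6, v_pp=1.0, v_supply=1.0)
p_half_swing = min_pole_power(80.0, 1e6, v_pp=0.5, v_supply=1.0)

print(f"rail-to-rail swing: {p_full_swing * 1e6:.2f} uW")
print(f"half-supply swing:  {p_half_swing * 1e6:.2f} uW")
```

Under these assumptions, halving the signal swing at the node doubles the bound, which is why the signal should be amplified as early as possible in the chain.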

Finally, observe that in circuits requiring a timing signal (e.g. switched-capacitor circuits), the clock must operate at no less than twice the maximum frequency of the processed signal in order to avoid aliasing (Nyquist theorem). Therefore, for certain applications, the power consumption of the clock circuitry might dominate.

2.6 Analog switches

The use of analog switches in low-voltage design faces several challenges. A first obvious problem is the increased switch on-resistance due to the reduced turn-on voltage available. To compensate for this effect, transistors must be designed with larger dimensions; on the other hand, this leads to a larger clock feed-through and increased static power dissipation [8].

A complementary switch exhibits a non-conducting region centered around \((V_{DD} - V_{SS})/2\) when the supply voltage is below a critical voltage \(V_{crit}\) given by [9]:

\[ V_{crit} = \frac{2V_{th}}{2 - n} \]  \hspace{1cm} (2.7)

where \(n\) is the same slope factor as defined in eq. 2.4. The critical voltage value \(V_{crit}\) has been derived under the assumption that the NMOS and the PMOS parameters of the complementary switch are the same. This gap, illustrated in fig. 2.2, limits the voltage range of the op-amp connected to the switch. An approach to solve the issue is the boot-strap technique [10].
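
The expression for the critical voltage can be checked numerically with a short Python sketch. The threshold voltage of 0.6 V and the slope factor of 1.3 are assumed, illustrative values, and the helper name `critical_supply` is hypothetical; the sketch also keeps the equal-parameter assumption for the NMOS and the PMOS stated above.

```python
def critical_supply(v_th, n):
    """Supply voltage below which a complementary switch shows a conduction gap.

    v_th : threshold voltage, assumed equal in magnitude for NMOS and PMOS (V)
    n    : slope factor, expected between 1 and 2
    """
    if not 1.0 <= n < 2.0:
        raise ValueError("slope factor n is expected in the range [1, 2)")
    return 2.0 * v_th / (2.0 - n)


V_TH = 0.6     # assumed threshold voltage (V)
N_SLOPE = 1.3  # assumed slope factor

v_crit = critical_supply(V_TH, N_SLOPE)
print(f"V_crit = {v_crit:.2f} V")

for v_supply in (3.3, 1.8, 1.0):
    has_gap = v_supply < v_crit
    print(f"supply {v_supply:.1f} V -> non-conducting gap: {has_gap}")
```

With these assumed values the gap appears for supplies below roughly 1.7 V, so a 1 V design cannot rely on a plain complementary switch.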

A well-known issue related to analog switches is the charge injection problem; when the switch is turned off, the charge in the MOS channel flows out from the channel region to the drain and the source junctions. Consider fig. 2.2; the fraction of charge \( \Delta Q \) released into capacitor \( C \) causes an absolute voltage error equal to:

\[ \Delta V = \frac{\Delta Q}{C} \]  \hspace{1cm} (2.8)

The relative voltage error across the capacitor is given by:

\[ \frac{\Delta V}{V} = \frac{\Delta Q}{(V_{DD} - V_{SS}) C} \]  \hspace{1cm} (2.9)

where \( (V_{DD} - V_{SS}) C \) is the maximum charge available on the capacitor. Eq. 2.9 indicates that, for a given injected charge, the relative voltage error grows in inverse proportion to the supply voltage.
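
A small Python sketch makes the trend concrete. The injected charge of 10 fC and the 1 pF capacitor are assumed values used only for illustration, and the function name is hypothetical.

```python
def injection_error(delta_q, cap, v_supply):
    """Return (absolute error in V, relative error) caused by charge injection.

    delta_q  : charge released into the capacitor (C)
    cap      : sampling capacitor (F)
    v_supply : total supply voltage V_DD - V_SS (V)
    """
    delta_v = delta_q / cap           # absolute voltage error
    return delta_v, delta_v / v_supply  # error relative to the full supply range


DELTA_Q = 10e-15  # assumed injected charge (C)
CAP = 1e-12       # assumed capacitor (F)

for v_supply in (3.3, 1.8, 1.0):
    dv, rel = injection_error(DELTA_Q, CAP, v_supply)
    print(f"supply {v_supply:.1f} V: dV = {dv * 1e3:.1f} mV, relative error = {rel * 100:.2f} %")
```

The absolute error stays the same in all three cases; only its weight relative to the available voltage range increases as the supply is lowered.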

2.7 Transistor stacking and cascoding.

Clearly, when the supply voltage is lowered, the allowed voltage swings of the circuit nodes become narrower, restraining the use of standard configurations such as stacked transistors.

For deep sub-micron CMOS technology, the intrinsic transistor gain \( A = g_m r_{ds} \) is usually lower than 20 dB. The standard configuration for gain boost is the cascoding configuration that, in low-voltage design, is not readily available due to output swing limitations. Cascoding could be replaced by cascading; however, cascading structures augment the power consumption and demand frequency compensation, since the gain boosting is achieved through several amplifying stages.

2.8 Dynamic range

The impact of low-power low-voltage design on the dynamic range of an analog amplifier can be briefly summarized as follows:

- To maintain a respectable signal swing, extra power must be consumed; otherwise, circuits experience a drastic reduction of the dynamic range.
- To achieve a rail-to-rail output swing, an op-amp must include a common-source stage at the output. As a consequence, the input range of the common-source stage is limited by its gain: in other words, the signal has to be kept small until it reaches the last stage.

The standard approach to maximize the input swing is to use complementary pairs. However, in order to obtain a constant transconductance over the entire input range [11], extra circuitry (and hence more power dissipation) is required.

2.9 Summary

Lowering the supply voltage does not decrease the power consumption in analog circuits and introduces several design constraints, limiting the use of standard circuit configurations. The limits placed by the decreased power supply are fundamental and can only be approached by proper design choices, usually at the expense of increased power consumption. In particular, providing high gain and high output swing becomes very challenging in low-voltage applications; it appears clear that trade-offs between performance and power dissipation must be accepted as a natural consequence.
CHAPTER 3

CMOS BULK TECHNIQUES FOR LOW-VOLTAGE ANALOG DESIGN

Different approaches have been proposed and new solutions are constantly investigated to address the challenges of designing reliable analog building blocks operating with a low supply voltage.

Generic low-voltage amplifiers (down to sub-1 V power supplies) have been designed using floating gates [12, 13], charge pumps [14] or switched-capacitor techniques [15, 8]. Each of these approaches shows its own limitations, such as low input-stage gain, the need for trimming or discrete-time signal processing. Solutions based on voltage doublers are penalized in terms of power consumption, power-supply noise and voltage stress (i.e. device reliability).

A complete analysis of all the available low-voltage analog techniques is beyond the scope of this work; this chapter focuses on a particular category of low-voltage approaches, namely techniques based on bulk-driving the MOS transistor [16, 17, 18, 19].

3.1 Bulk-driven MOS

The operational amplifier is perhaps the most important basic building block in analog and mixed-mode circuits. The minimum supply voltage is usually imposed by the differential input pair and, as discussed in the previous chapter, is equal to the threshold voltage plus, at least, two overdrive voltages. To minimize the supply requirements, the terminals of the input pair must operate very close to one of the supply rails, penalizing the input common-mode range (ICMR) of the amplifier.

An innovative method to overcome the constraint imposed by the threshold voltage \( V_{th} \) is based on bulk driving the MOS transistor [16] to remove the gate-source turn-on voltage from the signal path (eq. 2.3).

Operating the MOS through the bulk terminal allows the design of circuits with extremely low supply voltages. The behavior of the bulk-driven MOSFET is very close to that of a junction field-effect transistor (JFET): the signal is applied between the bulk and the source, and the current flowing from the source to the drain is modulated by the reverse bias applied to the bulk. The gate-source voltage of the MOS is fixed and is set to turn the MOSFET on. The characteristics of the bulk-driven MOS can be summarized as:

- large input common-mode range, allowing a wide range of bias voltages, including small positive values.

- the small-signal bulk transconductance $g_{mb}$ can even be larger than the MOSFET gate transconductance $g_m$: for $V_{BS} > 0.5\, \text{V}$ the bulk transconductance exceeds the MOS transconductance, at the risk of significant current injection.

- high input impedance.

Figure 3.1: Bulk driven input pair and bulk driven current mirror.

The limits of this technique are a lower transition frequency $f_T$, due to the larger input capacitance, and increased noise, due to the added thermal noise of the bulk sheet resistance. An approximate expression to compare the bulk-driven $f_T$ with the standard MOS $f_T$ can be found in [16]:

$$\frac{f_{T,\text{bulk-driven}}}{f_{T,\text{gate-driven}}} \approx \frac{\eta \sqrt{S}}{3.8}$$  \hspace{1cm} (3.1)

where $\eta$ is the ratio of $g_{mb}$ to $g_m$ ($\eta = 0.2$) and $S$ is the scaling factor. The ratio set by eq. 3.1 is going to be reduced by future scaling.

Figure 3.1 shows two examples of the bulk-driven technique applied to standard analog blocks to decrease the voltage requirements. Bulk-driven transistors $M_1$ and $M_2$ can be used in an input pair configuration to achieve a large common-mode voltage $V_{CM}$ input range. As reported in [20], within a 1-V supply, $V_{CM}$ can move rail-to-rail, without the risk of forward-biasing the bulk-source junction. Given the same bias current and load, the voltage gain of the Bulk Driven Differential Pair (BDDP) is slightly smaller than that of the standard differential pair; another drawback is the requirement of a negative supply rail in order to achieve rail-to-rail swing. Note that the minimum supply voltage can theoretically be as low as three saturation voltages $V_{DS,\text{sat}}$ (i.e. three stacked transistors from ground to $V_{DD}$), while the differential pair still maintains a compatible input range, i.e. the input common-mode voltage has a rail-to-rail range.

A second standard building block is shown on the right side of figure 3.1; this current mirror uses bulk-drain connections rather than the standard gate-drain diode connection. According to the measured results in [16], the bulk-driven current mirror shows matching, bandwidth and impedance characteristics comparable with the ones of the standard diode-connected current mirror. Also in this case, the supply voltage requirements are lowered, since the threshold voltage is not in the signal path: the voltage drop input-ground has been reported to be less than 0.5 V even for appreciable currents [16].

In chapter 5 the performance of a 1-V supply amplifier based on the bulk-driven MOS will be discussed and compared with the 1-V supply OTA developed in this work.

3.2 MOS threshold voltage

Another possibility to overcome the voltage limits set by the threshold voltage is to decrease $V_{th}$ either through technology scaling or through circuit techniques. As previously mentioned, the $V_{th}$ scaling leads to a degradation of the sub-threshold characteristics [4] and a degradation of the noise margin [21], together with an increased stand-by power (more relevant for digital circuits).

The circuit approach relies on exploiting the MOS body effect to modulate the value of $V_{th}$ [22]. The threshold voltage is defined as the applied gate voltage required to achieve the threshold inversion point; this condition is reached when an inversion layer of holes (for an n-type semiconductor) or an inversion layer of electrons (for a p-type semiconductor) is created at the oxide-semiconductor interface [22]. Once the layer (channel) is created, the current can flow between drain and source.

For a PMOS transistor the current is carried by holes; by applying a negative voltage to the PMOS gate (with respect to the source potential), the holes are collected under the gate and the electrons are pushed away from the gate toward the substrate (or the well). Usually the substrate (or the well) terminal is tied to the proper supply rail, so that the bulk-source diode is always reverse-biased. A voltage difference between the bulk terminal and the source terminal directly affects the threshold voltage value, according to the following formula [22]:

$$V_{th} = V_{th0} + \gamma \cdot (\sqrt{2\phi_F - V_{BS}} - \sqrt{2\phi_F})$$  \hspace{1cm} (3.2)

where $V_{th0}$ is the zero bias threshold voltage, $\gamma$ is the bulk effect factor, $\phi_F$ is the Fermi potential and $V_{BS}$ is the bulk-source voltage.
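
The size of the effect can be estimated with a short Python sketch of the formula above. To sidestep the sign conventions, which differ between NMOS and PMOS, the sketch works with magnitudes and takes the forward bias of the bulk-source junction as a positive number `v_fwd` (negative values model reverse bias). The parameter values ($|V_{th0}| = 0.65$ V, $\gamma = 0.5\ \text{V}^{1/2}$, $2\phi_F = 0.7$ V) and the function name are assumptions for illustration only, not data from the process used in this work.

```python
import math


def vth_magnitude(v_fwd, vth0=0.65, gamma=0.5, two_phi_f=0.7):
    """Threshold-voltage magnitude (V) versus bulk-source junction bias.

    v_fwd     : forward bias of the bulk-source junction (V); negative = reverse bias
    vth0      : assumed zero-bias threshold magnitude (V)
    gamma     : assumed bulk effect factor (V^0.5)
    two_phi_f : assumed value of twice the Fermi potential (V)
    """
    if v_fwd >= two_phi_f:
        raise ValueError("forward bias must stay below 2*phi_F for this model")
    return vth0 + gamma * (math.sqrt(two_phi_f - v_fwd) - math.sqrt(two_phi_f))


for v_fwd in (-1.0, 0.0, 0.2, 0.4):
    shift_mv = (vth_magnitude(v_fwd) - vth_magnitude(0.0)) * 1e3
    print(f"junction bias {v_fwd:+.1f} V -> threshold shift {shift_mv:+.0f} mV")
```

With these assumed parameters, a forward bias of a few hundred millivolts lowers the threshold magnitude by more than 100 mV, while reverse bias raises it.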

In a standard 3.3 V CMOS process the typical PMOS $V_{th}$ is about $-0.6$ V to $-0.7$ V; however, according to eq. 3.2, $V_{th}$ can be changed by modulating the bulk-source voltage. By applying a positive bulk-source voltage $V_{BS}$, the width of the channel-body depletion layer increases, resulting in an increase in the density of the trapped carriers in the depletion layer [22]; to maintain charge neutrality, the channel charge must decrease. This means that a higher gate voltage (in absolute value) is now required to achieve the inversion point.

By applying a small negative bulk-source voltage $V_{BS}$, the magnitude of the threshold voltage can be decreased. However, the allowed range for negative $V_{BS}$ is quite narrow, since the bulk-source diode must be kept in cutoff. This is required in order to avoid parasitic currents inside the transistor body that would deteriorate the transistor performance. Thus, for a PMOS, the condition $V_{BS} > 0$ should always be ensured.

On the other hand, a diode does not conduct a significant current for small forward-biasing voltages, typically less than $500 \text{ mV}$ [23]. Two issues still remain:

- A precise value for the forward-biasing limit cannot be established.
- Due to the exponential behavior of the diode I-V characteristic, a small variation in the applied voltage (or a temperature change) can cause a large current in the diode.
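
The second issue can be quantified with the ideal diode law. The Python sketch below assumes an ideal exponential characteristic with unity ideality factor and uses the thermal voltage of about 26 mV quoted earlier; the function name is hypothetical.

```python
import math

V_THERMAL = 0.026  # thermal voltage at room temperature (V)


def diode_current_ratio(delta_v, ideality=1.0, v_thermal=V_THERMAL):
    """Factor by which the diode current grows when the forward voltage rises by delta_v.

    Assumes the ideal exponential law I ~ exp(V / (ideality * v_thermal)).
    """
    return math.exp(delta_v / (ideality * v_thermal))


for delta_mv in (10, 30, 60, 120):
    ratio = diode_current_ratio(delta_mv * 1e-3)
    print(f"+{delta_mv} mV of forward bias -> current x {ratio:.1f}")
```

Under this assumption an increase of about 60 mV in the forward voltage multiplies the junction current by roughly ten, which is why a voltage source on the bulk offers little control over the parasitic current.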

3.2.1 DTMOS

A good example of actively using the bulk effect is given by the Dynamic Threshold Voltage MOS (DTMOS) [17]. As shown in fig. 3.2a, the DTMOS is formed by connecting the MOS gate to its well terminal. When the device is turned on, the connection causes $V_{th}$ to decrease, increasing the gate overdrive. The opposite occurs when the DTMOS is off: the threshold voltage is increased, reducing the sub-threshold leakage. Because of the limits discussed in the previous section, this technique is restricted to very low supply voltages [24]: if the bulk-source diode gets forward-biased, large body-to-source/drain junction capacitances and currents will result, degrading speed and increasing static power dissipation (critical especially for digital applications).

3.3 Current-Driven Bulk approach

The key to fully exploiting the bulk effect to lower the MOS $V_{th}$ is to find an efficient and robust method to achieve the maximum allowable forward biasing, without having large parasitic currents flowing in the well or in the substrate. For the reasons discussed in the previous section, applying a voltage source to the MOS body is not a feasible solution.

Controlling the parasitic current by means of a current source, on the contrary, solves the above-mentioned issues. The current driven bulk transistor consists of a MOS with the bulk terminal connected to a current source (fig. 3.2b). Setting the biasing voltage by controlling the MOS parasitic body currents shows the following features:

- The parasitic current can be set to the desired value.
- Given a parasitic current, the maximum forward biasing of the bulk-source junction is obtained.

Observe that the voltage $V_{BS}$ is unknown; nevertheless this is not a concern for the method applicability, since the $V_{BS}$ voltage is maximized compatibly with the set parasitic current.

### 3.4 Summary

Techniques based on using the MOS as a four-terminal device have demonstrated the capability of low-voltage operation. Driving the MOS through its bulk removes the minimum supply voltage constraint imposed by the threshold voltage $V_{th}$, which, for several reasons, does not scale linearly with the supply voltage. Another promising approach is to reduce the $V_{th}$ by means of a small positive source-bulk voltage $V_{SB}$; this can be achieved in a robust way through direct control of the parasitic current.
CHAPTER 4

CURRENT DRIVEN BULK MOS

Controlling the bulk MOS current appears to be an efficient solution to achieve a robust source-bulk forward biasing. On the other hand, the effects of the parasitic bipolars can no longer be neglected and can potentially constitute a serious problem for the CDB MOS performance. Fortunately, as we shall see, simple circuit techniques can be used to overcome these issues.

Results from simulations demonstrate that the relevant characteristics of the CDB are comparable to the standard MOS ones.

4.1 CDB MOS analysis

In order to establish the magnitude of the bulk parasitic current precisely, the physical structure of the MOS transistor has to be considered. As shown in fig. 4.1, two parasitic bipolar junction transistors (BJT) exist in the PMOS inner structure [25]: a lateral PNP transistor and a vertical PNP transistor. For both BJTs, the emitter terminal corresponds to the MOS source terminal and the base is formed by the n-well. For the lateral bipolar, the collector terminal corresponds to the PMOS drain terminal; the collector of the vertical bipolar coincides with the substrate.

Since the bulk current flows through the common bases of the BJTs, the total parasitic current of the MOS is given by the sum of the base currents multiplied by the corresponding gain \( \beta \). In order to account for the presence of the parasitic bipolar transistors, the CDB model shown in figure 4.1 is used in both the analysis and the simulations, since the parasitic BJTs are not included in the BSIM3 model [26]. Furthermore, for proper simulation, the MOS source area and perimeter have to be set to minimum sizes, otherwise no current will flow through the bipolar emitters.

Assuming the gain of the vertical bipolar and of the lateral bipolar to be, respectively \( \beta_{CS} \) and \( \beta_{CD} \), the total parasitic PMOS current is given by:

$$I_E = (1 + \beta_{CD} + \beta_{CS}) \cdot I_{BB}$$  \hspace{1cm} (4.1)

where \( I_E \) and \( I_{BB} \) are, respectively, the total emitter current and the base current (refer to fig. 4.1). The previous equation can be used to dimension the bulk current source; in order to keep the parasitic current negligible, the ratio of the emitter current to the PMOS bias current has been set to 1/10 in all the CDB transistors.
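
The dimensioning step can be sketched in a few lines of Python. The 10 µA bias current is an assumed value and the function name is hypothetical; the gains of 100 are the same arbitrary values adopted for simulation purposes in this chapter, and the last lines show how the emitter current would change if the actual gains were lower for the same bulk current.

```python
def size_bulk_current(i_drain, beta_cd, beta_cs, emitter_ratio=0.1):
    """Return (I_E, I_BB) in amperes, following eq. 4.1.

    i_drain       : MOS bias current (A)
    beta_cd       : assumed current gain of the lateral bipolar
    beta_cs       : assumed current gain of the vertical bipolar
    emitter_ratio : ratio of total emitter current to MOS bias current
    """
    i_emitter = emitter_ratio * i_drain
    i_bulk = i_emitter / (1.0 + beta_cd + beta_cs)
    return i_emitter, i_bulk


I_DRAIN = 10e-6  # assumed MOS bias current (A)

i_e, i_bb = size_bulk_current(I_DRAIN, beta_cd=100.0, beta_cs=100.0)
print(f"I_E  = {i_e * 1e6:.2f} uA")
print(f"I_BB = {i_bb * 1e9:.2f} nA")

# Same bulk current, but with hypothetical actual gains of 50 instead of 100.
i_e_low_gain = (1.0 + 50.0 + 50.0) * i_bb
print(f"I_E with gains of 50 = {i_e_low_gain * 1e6:.2f} uA")
```

The bulk current source therefore has to deliver only a few nanoamperes, and the resulting emitter current depends directly on gains that are not known in advance.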

Simulations based on the CDB model of fig. 4.1 predict the threshold voltage modulation.

Figure 4.1: Current Driven Bulk MOS model and PMOS cross-section.

According to the graph, the threshold voltage can be decreased by more than 150 mV with a parasitic current in the \( \mu A \) range. Unfortunately, current driving the MOS bulk also adds a set of unwanted effects which might limit the applicability of the CDB technique:

- lowering of output impedance.
- increasing impact of parasitic capacitances.
- unknown parasitic bipolars gain.
- increasing transistor noise.

### 4.1.1 Output impedance

Since in the CDB MOS current flows through the parasitic bipolars, the first obvious consequence is the lowering of the MOS output impedance: in fact, the MOS drain-source resistance \( R_{DS} \) is placed...

By ensuring that the bipolar current is significantly smaller (e.g., one order of magnitude) than the MOS current, the CDB MOS output impedance is not affected by the bipolar emitter-collector resistance $R_0$, since the magnitude of $R_0$ is much larger than the drain-source resistance $R_{DS}$.

If this is not the case, another possibility is to bias the parasitic bipolar transistors in their active region to achieve a resistance $R_0$ of the same order of magnitude as $R_{DS}$: the bipolar base-emitter junction needs to be forward-biased, while the base-collector junction operates in the cut-off region. The latter condition is achieved when $V_{SD} > 200 \text{ mV}$. A comparison between the output characteristics of the MOS and the CDB MOS is shown in Fig. 4.3.

**Figure 4.3: CDB compared with normal MOS output characteristics.**

### 4.1.2 Parasitic capacitances

Since the parasitic bipolar transistors are now active devices, the impact of their capacitances on the transient and on the frequency behavior cannot be neglected. The main capacitances, shown in Fig. 4.1, are respectively:

- $C_{BS}$: MOS bulk-source diffusion capacitance corresponding to the bipolar transistor diffusion capacitance $C_\pi$
- $C_{BD}$: MOS bulk-drain diffusion capacitance corresponding to the lateral bipolar junction capacitance $C_\mu$
- $C_{BSS}$: MOS bulk-substrate parasitic capacitance corresponding to the vertical bipolar junction capacitance $C_\mu$

The effects of these capacitances can be characterized by the behavior of the CDB transistor in two basic configurations, the common-source stage and the source-follower stage.

### 4.1.3 Slew-rate

The basic source-follower configuration is shown on the left side of fig. 4.4. When a positive input voltage step $V_{in}$ is applied to the MOS transistor gate, the source node voltage increases, trying to maintain a constant gate-source voltage $V_{GS}$.

Since the voltage drop across the base-emitter junction of the parasitic BJTs changes, the current flowing in the bipolars starts to increase. Thus, the quiescent bias current $I_{BIAS}$ flows entirely through the bipolars: this causes the bulk-drain capacitor $C_{BD}$ and the bulk-substrate capacitor $C_{BSS}$ to be charged by a current equal to the quiescent current divided by the base-emitter current gain, namely $I_{BIAS}/(\beta_{CD} + \beta_{CS} + 1)$. The bulk voltage begins to rise, reducing the voltage drop across the base-emitter junction, and the charging current across $C_{BD}$ and $C_{BSS}$ rapidly decreases.

When a negative voltage step is applied, the MOS source voltage falls, shutting down the base-emitter junction. The capacitors are then forced to discharge, but the only current available for this operation is the base current $I_{BB}$, which is a very small current. As a consequence, the slew rate of the CDB MOS is very poor.

Fig. 4.5 shows the transient response of the CDB and of the standard MOS to an input step; note that, as predicted, the negative step shows the longest transient time, since the discharging current is around ten times smaller than the charging current.
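
The factor of ten follows from the biasing choice made earlier, as the Python sketch below illustrates. The bias current of 10 µA and the capacitance values are assumptions for illustration; the gains of 100 and the 1/10 emitter-to-bias current ratio are the values used in this chapter, and the function names are hypothetical.

```python
def follower_bulk_currents(i_bias, beta_cd, beta_cs, emitter_ratio=0.1):
    """Return (charging, discharging) currents in amperes for the bulk-node capacitors.

    Charging (positive step): the whole bias current flows in the bipolars,
    so the bulk node receives i_bias / (1 + beta_cd + beta_cs).
    Discharging (negative step): only the base current I_BB is available.
    """
    gain = 1.0 + beta_cd + beta_cs
    i_charge = i_bias / gain
    i_discharge = emitter_ratio * i_bias / gain  # quiescent base current I_BB
    return i_charge, i_discharge


def slew_rate(current, capacitance):
    """Slew rate in V/s for a capacitor driven by a constant current."""
    return current / capacitance


I_BIAS = 10e-6   # assumed quiescent bias current (A)
C_BD = 20e-15    # assumed bulk-drain capacitance (F)
C_BSS = 100e-15  # assumed bulk-substrate capacitance (F)

i_chg, i_dis = follower_bulk_currents(I_BIAS, beta_cd=100.0, beta_cs=100.0)
sr_up = slew_rate(i_chg, C_BD + C_BSS)
sr_down = slew_rate(i_dis, C_BD + C_BSS)

print(f"charging current    = {i_chg * 1e9:.2f} nA, slew = {sr_up * 1e-6:.3f} V/us")
print(f"discharging current = {i_dis * 1e9:.2f} nA, slew = {sr_down * 1e-6:.3f} V/us")
```

Whatever the actual gains are, the ratio between the two currents is set by the emitter-to-bias current ratio alone.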

### 4.1.4 Frequency behavior

Besides causing slew-rate issues, the drain-bulk capacitance also affects the AC performance of the CDB, as fig. 4.6 shows.

Figure 4.5: Source follower step response.

Figure 4.6: Common source AC response.

The frequency response of the common-source stage reveals the presence of a low-frequency pole and a low-frequency zero. The origin of this pole-zero pair can be determined with the aid of small-signal analysis. Fig. 4.7 presents the small-signal model corresponding to the common-source stage configuration of fig. 4.6. A few simplifications have been introduced in the schematic, considering that:

- The effect of the vertical bipolar can be disregarded since both emitter and collector are at AC ground.
- The body effect is modeled by the lateral bipolar transconductance $g_{mb}$.
- The output resistance $R_{OUT}$ is the parallel combination of the CDB MOS output resistance, lateral BJT output resistance and MOS biasing current source resistance.

The purpose of the decoupling capacitor $C_{DEC}$ will be explained at a later stage.

The frequency of the lowest pole can be calculated with the time-constant method [27]. The contribution of the gate-source capacitance and of the gate-drain capacitance can be disregarded in the analysis, since they only affect the high-frequency circuit behavior. The time constants $\tau_{BS}$ and $\tau_{BD}$ associated with the bulk-source capacitance $C_{BS}$ and with the bulk-drain capacitance $C_{BD}$ can be expressed as:

$$\tau_{BS} = (R_{IB} \parallel r_{\pi}) \cdot (C_{BS} + C_{BSS})$$  \hspace{1cm} (4.2)

$$\tau_{BD} = [(R_{IB} \parallel r_{\pi}) \cdot (1 + g_{mb}R_{OUT}) + R_{OUT}] \cdot C_{BD}$$  \hspace{1cm} (4.3)

The largest time constant is the one associated to the $C_{BD}$ capacitance; to cancel this low frequency pole several solutions are available:

- increased bias currents.
- use of a cascode configuration.
- application of a decoupling capacitor.
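
To give a feeling for the orders of magnitude involved, the Python sketch below evaluates the two time constants of eq. 4.2 and eq. 4.3. It reads the first resistive term as the parallel combination of the bulk current source resistance and the bipolar input resistance, which is an interpretation of the expressions above. All component values are assumptions chosen for illustration, not extracted from the fabricated design, and the function names are hypothetical.

```python
import math


def parallel(r1, r2):
    """Parallel combination of two resistances (ohm)."""
    return r1 * r2 / (r1 + r2)


def cdb_time_constants(r_ib, r_pi, g_mb, r_out, c_bs, c_bss, c_bd):
    """Return (tau_BS, tau_BD) in seconds for the common-source CDB stage."""
    r_in = parallel(r_ib, r_pi)  # resistance seen at the bulk node
    tau_bs = r_in * (c_bs + c_bss)
    tau_bd = (r_in * (1.0 + g_mb * r_out) + r_out) * c_bd
    return tau_bs, tau_bd


# Assumed, illustrative values.
tau_bs, tau_bd = cdb_time_constants(
    r_ib=100e6,    # bulk current source output resistance (ohm)
    r_pi=5e6,      # bipolar input resistance (ohm)
    g_mb=20e-6,    # lateral bipolar transconductance (S)
    r_out=1e6,     # stage output resistance (ohm)
    c_bs=50e-15,   # bulk-source capacitance (F)
    c_bss=100e-15, # bulk-substrate capacitance (F)
    c_bd=20e-15,   # bulk-drain capacitance (F)
)
f_pole = 1.0 / (2.0 * math.pi * tau_bd)

print(f"tau_BS = {tau_bs * 1e6:.2f} us")
print(f"tau_BD = {tau_bd * 1e6:.2f} us")
print(f"pole set by tau_BD near {f_pole / 1e3:.0f} kHz")
```

Even though $C_{BD}$ is the smallest capacitance in this example, the multiplication by $(1 + g_{mb}R_{OUT})$ makes its time constant the largest, placing the pole at a low frequency.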

4.1.4.1 Increased bias currents

The slew rate issue can be eliminated by providing more current for charging and discharging the bulk-drain capacitance. This solution can be implemented by adding and subtracting a DC bias current at the bulk terminal, as shown on the left side of fig. 4.8. Unfortunately, this approach is not very robust: in fact, the current responsible for the forward biasing is, in this case, given by the difference between the source current $I_1$ and the sink current $I_2$. To maintain an accurate control of the bipolars' base current, the ratio between the parasitic current and the DC bias currents should not be larger than one order of magnitude. Moreover, this extra biasing current is not used for any signal processing (i.e. it is wasted current from a power-consumption point of view).

4.1.4.2 Cascode transistor

The poor AC performance of the CDB MOS is caused by voltage changes across the drain-bulk capacitance. In the common source stage the MOS source is kept at a fixed voltage and changes of the $C_{BD}$ voltages are a consequence of the variations of the drain potential. By cascoding the CDB transistor (right side of fig. 4.8), the drain voltage is almost fixed and the bulk-drain capacitance is not charged or discharged anymore.

4.1.4.3 Decoupling capacitor.

This solution consists in placing a capacitor across the bulk and the source terminals (dashed capacitor in fig. 4.7) to keep the $V_{BS}$ voltage constant. Both the slew-rate limiting effects and the low-frequency pole-zero pair are canceled: in fact, the decoupling capacitor provides a path for the bulk-drain capacitance to get more current in the charge and discharge phases and, at the same time, shifts the pole toward lower frequencies ($\tau_{BS}$ increases, since the decoupling capacitor is in parallel with $C_{BS}$ and $C_{BSS}$). Simultaneously, the zero is moved closer to the pole, so that the two cancel each other. The effects of the decoupling capacitor are presented in figures 4.9 and 4.10, which show the improved CDB transient and frequency behavior.

4.1.5 Parasitic bipolar gain

So far, the gains of the vertical bipolar and of the lateral bipolar have been set to the same arbitrary but reasonable value ($\beta_{CS} = \beta_{CD} = 100$) for simulation purposes, but the real value of these two parameters is actually unknown. If the gain values cannot be estimated, the parasitic currents of the MOS remain undetermined.

A way to fix the bulk current is to use the biasing circuitry proposed in fig. 4.11. The basic operation is as follows: transistor $M_1$ is biased to drive a current $I_{S,E}$ that comprises both the current flowing in the source of the driven bulk transistor $M_3$ and the current running through the two bipolars. Transistor $M_2$ is biased to drive a current $I_{D,C}$ that includes the MOS current $I_D$ and the collector current $I_{CD}$ of the lateral BJT. The feedback connection will set a bulk-bias current $I_{BB}$ and a bias voltage \( V_{\text{bias}} \) for transistor \( M_4 \), so that, regardless of the actual values of the \( \beta \)'s, the following relation holds:

\[
I_{S,E} = I_D + I_{BB}(1 + \beta_{CS} + \beta_{CD})
\]

In order to keep the magnitude of \( I_{BB} \) negligible compared to the MOS drain current, \( I_{D,C} \) and \( I_{S,E} \) can be fixed so that \( I_{S,E} \approx 1.1 I_D \) and \( I_{D,C} \approx 1.2 I_D \). For the biasing scheme to work properly, the following conditions must be ensured:

- \( V_{\text{BE}} < V_{\text{th},N} \), otherwise no drain-source voltage is available for transistor \( M_4 \).
- \( |V_{\text{th},P}| < V_{\text{th},N} \), otherwise no drain-source voltage is available for the current source \( M_2 \).

The requirements can be relaxed by including the dashed level-shifter in fig. 4.11. The main concern with the parasitic bipolars is that they can have quite high base-collector current gains [28]; this places some limitations on the applicability of the CDB technique. To minimize the gain of the lateral bipolar, a MOS with a channel length longer than the minimum size should be used. The gain of the vertical bipolar is instead more difficult to reduce: a solution is to use the layout presented in fig. 4.11; as shown, the bulk connection is completely surrounded by the source junction, therefore minimizing the base of the vertical bipolar.

### 4.2 CDB noise performance

The overall CDB MOS noise comprises, in addition to the standard MOS noise sources, a contribution from the parasitic bipolars. The BJT noise sources are typically the shot noise of collector and base currents, the flicker noise of base current and the thermal noise of base resistance [29]. These different noise sources can be modeled as two independent noise sources: a current noise source placed between the emitter and the base terminals and a voltage noise source placed in series with the base.

Figure 4.9: Source follower step response with decoupling capacitor.

The Power Spectral Density (PSD) of the current source is given by [29]:

$$i^2(f) = 2q \left( I_B + \frac{K \cdot I_B}{f} + \frac{I_C}{|\beta(f)|^2} \right)$$  \hspace{1cm} (4.4)

The PSD of the noise voltage source can be expressed as [29]:

$$v^2(f) = 4kT \left( r_b + \frac{1}{g_{m,BJT}} \right)$$  \hspace{1cm} (4.5)

The current source models both the flicker noise and the shot noise contributions; as equation 4.4 shows, the overall noise is, to a first approximation, directly proportional to the bipolar base current. Since the magnitude of this current is very small, the flicker and shot noise contribution is insignificant with respect to the MOS shot and flicker noise counterparts.

The voltage source models the contribution of the thermal noise and, as equation 4.5 states, is inversely proportional to the bipolar transconductance \( g_{m,BJT} \). Once more, the bipolar bias current is small, resulting in a poor BJT transconductance; hence the thermal noise contribution can be neglected. To a first-order approximation, the ratio \( r \) between the bipolar transconductance and the MOS transconductance can be expressed as:

$$r = \frac{g_{m,BJT}}{g_{m,MOS}} \approx \frac{I_C}{nI_D} \approx \frac{1}{10n}$$  \hspace{1cm} (4.6)

Eq. 4.6 has been derived under the assumption that the bipolar current is set to 1/10 of the MOS bias current and the MOS operates in sub-threshold. The thermal noise contribution of the bipolar is an order of magnitude smaller than the MOS thermal noise. Depending on the type of application, this contribution can impact the SNR performance of the circuit; for low-frequency applications, the thermal noise contribution is not so important, since the MOS flicker noise dominates.

A final observation: the noise contribution of the base resistance $r_b$ can be reduced by decreasing the resistance with the CDB layout proposed in fig. 4.11.

### 4.2.1 Supply noise coupling

Another potential CDB limit may be dictated by noise coupling from the supply rails, degrading the Power-Supply Rejection Ratio (PSRR) performance of the design. Possible coupling paths are the collector terminal (substrate contact) of the vertical bipolar and the current source MOS that sets the bulk current.

In the first case, once the condition $V_{SD} > 200 \text{ mV}$ is ensured, the bipolar base-collector junction operates in cut-off mode and therefore variations on the collector voltage do not largely affect the bipolar current (i.e. the current variation is limited to the Early effect).

In the second case, assuming the current source is implemented with a MOS in saturation (e.g. transistor $M_5$ in fig. 4.11), noise on the negative supply rail directly modulates the bias current; this effect can be modeled as a voltage source of magnitude $V_n$ connected to the MOS gate. The noise current due to the noise voltage $V_n$ can then be expressed as:

$$I_{noise} = g_m V_n$$

(4.7)

Since the transconductance $g_m$ of the MOS current source is very poor (due to the small bias current)
4.3  CDB MOS SUMMARY

The Current Driven Bulk MOS is based on a new technique whose purpose is to decrease the MOS threshold-voltage. The main characteristics of the CDB MOS can be summarized as:

- reduced threshold voltage.
- easy integration with standard basic analog building blocks.
- transient and frequency behavior comparable to standard MOS behavior.
- increased thermal noise.

The impact of the drawbacks on the MOS performance has been shown not to be critical; simple circuit techniques can be used to solve most of the issues. Depending on the application, only the noise performance of the CDB MOS can be worse than that of the standard MOS, due to the active parasitic bipolars. Finally, supply noise coupling does not appear to be a problem.
CHAPTER 5

1 V OPERATIONAL TRANSCONDUCTANCE AMPLIFIER

This chapter demonstrates the applicability of the CDB technique on a standard cascode Operational Transconductance Amplifier (OTA): CDB MOS transistors are used in the input-pair as well as in the current mirror. The modified architecture can operate from a power-supply below 1 V. A prototype OTA has been fabricated in a standard 0.5 μm CMOS process to verify the low-voltage operation capability of the CDB technique.

5.1 OTA architecture.

The low-voltage amplifier schematic is shown in fig. 5.1. It is a standard differential-input single-ended output folded cascode transconductance amplifier [27] with a CDB differential pair (and a decoupling capacitor $C_{DEC}$) and a CDB output current mirror, formed by transistors $M_7$ to $M_{11}$ [30]. For simplicity, a straightforward bias circuit is shown.

The bias current $2I_0$ provided by transistor $M_5$ is split equally between the input pair transistors $M_1$ and $M_2$. Transistors $M_3$ and $M_4$ work as current sources and are biased to sink the same current $I_1$. The currents flowing into $M_6$ and $M_7$ are therefore equal, each given by the difference between the current $I_1$ and the input pair current $I_0$.

When a small differential voltage $\Delta V_{in}$ is applied to the input pair, the variation of the drain current of $M_1$ and $M_2$ can be expressed as:

$$\pm \Delta i_0 = \pm g_{m1} \Delta V_{in} / 2$$  \hspace{1cm} (5.1)

where $g_{m1}$ is the transconductance of the input transistor $M_1$. Since the currents flowing in $M_3$ and $M_4$ are constant, the same current variation will occur in transistors $M_6$ and $M_7$. The wide swing current mirror (transistors $M_8$ through $M_{11}$) mirrors the current change of $M_6$ into $M_9$. This leads to an output voltage variation equal to:

$$\Delta V_{out} = \pm g_{m1} \Delta V_{in} R_0$$  \hspace{1cm} (5.2)

where $R_0$ is the output resistance which, at a first order, can be approximated as:

$$R_0 = g_{m7}r_{o7}(r_{o4} \parallel r_{o2}) \parallel (g_{m9}r_{o9}r_{o11})$$  \hspace{1cm} (5.3)
where $r_{on}$ is the drain-source resistance of transistor $M_n$.

The dominant pole of the amplifier is determined by the time constant associated with the output load capacitance $C_L$. The high-frequency poles are determined by the stray capacitances loading the low-impedance nodes, indicated with 1, 2 and 3 in figure 5.1. The impedance seen at node 1 is approximately $1/g_{m6}$, at node 2 is about $1/g_{m7} + (g_{m9}r_{o9}r_{o11})/(g_{m7}r_{o7})$ and at node 3 is around $1/g_{m10}$.

Ignoring the contribution of the high frequency poles, the approximate small signal transfer function is given by:

$$A(s) = \frac{g_{m1}R_0}{1 + sR_0C_L}$$

from which the OTA unity-gain frequency is found to be:

$$\omega_t = \frac{g_{m1}}{C_L}$$

The larger the compensation capacitor $C_L$ is, the greater the phase margin of the operational amplifier. Maximizing the transconductance of the input transistors maximizes both the bandwidth and the DC gain.
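The single-pole model above can be evaluated numerically. The following is a minimal sketch, not the design flow used in this work: the function name `ota_single_pole` is hypothetical, the transconductance is the measured OTA 1 value reported later in table 5.2, the load is the 20 pF used in the design, and the output resistance of 12 MΩ is an assumed value chosen only to give a DC gain near 69 dB.

```python
import math


def ota_single_pole(gm1, r0, c_load):
    """Evaluate A(s) = gm1*R0 / (1 + s*R0*C_L).

    Returns (dc_gain_db, dominant_pole_hz, unity_gain_hz).
    """
    dc_gain_db = 20.0 * math.log10(gm1 * r0)
    dominant_pole_hz = 1.0 / (2.0 * math.pi * r0 * c_load)
    unity_gain_hz = gm1 / (2.0 * math.pi * c_load)  # omega_t = gm1 / C_L
    return dc_gain_db, dominant_pole_hz, unity_gain_hz


GM1 = 242e-6   # A/V, measured OTA 1 transconductance (table 5.2)
C_L = 20e-12   # F, capacitive load used in the design
R0 = 12e6      # ohm, ASSUMED output resistance (illustrative only)

gain_db, pole_hz, ft_hz = ota_single_pole(GM1, R0, C_L)
print(f"DC gain        : {gain_db:.1f} dB")
print(f"Dominant pole  : {pole_hz:.0f} Hz")
print(f"Unity-gain freq: {ft_hz / 1e6:.2f} MHz")
```

With these numbers the unity-gain frequency depends only on $g_{m1}$ and $C_L$, so it is insensitive to the assumed $R_0$; only the DC gain and the pole position move when $R_0$ changes.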

Table 5.1: OTA transistor dimensions.

<table>
<thead>
<tr>
<th>MOS</th>
<th>M1 - M2</th>
<th>M3 - M4</th>
<th>M5</th>
<th>M6 - M7</th>
<th>M8 - M9</th>
<th>M10 - M11</th>
<th>M12</th>
<th>M13</th>
</tr>
</thead>
<tbody>
<tr>
<td>W (µm)</td>
<td>400</td>
<td>20</td>
<td>80</td>
<td>20</td>
<td>40</td>
<td>40</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>L (µm)</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>50</td>
<td>50</td>
</tr>
<tr>
<td>I_D (µA)</td>
<td>10</td>
<td>20</td>
<td>20</td>
<td>10</td>
<td>10</td>
<td>10</td>
<td>≈ 0.01</td>
<td>≈ 0.01</td>
</tr>
</tbody>
</table>

While the amplifier is slewing, one input pair transistor is turned off and all the bias current provided by \( M_5 \) flows in the other input MOS. The current of the sink transistor connected to the off transistor is entirely diverted to the output load. Therefore, for a large positive (negative) differential input, transistor \( M_1 (M_2) \) turns off and the output voltage increases (decreases) linearly with a Slew-Rate (SR) given by:

\[
SR = \frac{I_{M_3}}{C_L}
\]

5.2 OTA simulation

The OTA transistor dimensions are presented in table 5.1. The bias current of the input pair is set to 10 µA and transistors \( M_3 \) and \( M_4 \) are biased to drive a current of 20 µA. The current in the mirror transistors and in the cascode transistors \( M_6 \) and \( M_7 \), set by the difference of the previous currents, is equal to 10 µA.

In order to keep the parasitic currents of the bulk driven transistors at negligible values, the total emitter current is set to 1 µA. To dimension the CDB current sources \( M_{12} \) and \( M_{13} \), the gain of the parasitic bipolar has been assumed to be \( \beta_{CS} = \beta_{CD} = 100 \). According to the bipolar gain, the current through the common bulk terminal node of the input pair is approximately 10 nA. Since this small current is obtained through current mirroring (not shown in fig. 5.1), transistors \( M_{12} \) and \( M_{13} \) are designed with very long channels.

Finally, the capacitive load is set to 20 pF.

Common mode range

Assuming a standard strong inversion design with \( V_{DD} = 1 \text{ V}, V_{SS} = 0 \text{ V}, |V_{th}| = 0.6 \text{ V} \) (typical value for a standard 0.5 µm CMOS process) and an overdrive voltage \( V_{DS, sat} \) of approximately 100 mV, the common-mode voltage input range is roughly given by:

\[
V_{DS, sat} - |V_{th}| \simeq -0.5 \text{ V} \leq V_{cm} \leq 0.2 \text{ V} \simeq V_{DD} - |V_{th}| - 2V_{DS, sat}
\]

Figure 5.2: DC responses at different common-mode input levels.

If the input pair is biased in the sub-threshold region, the turn-on gate-source voltage \( V_{GS} \) can be reduced, directly improving the input common-mode range. As an additional benefit, the input transconductance is maximized for the given bias current, leading to an increased OTA DC gain and to bandwidth extension. Furthermore, assuming it is possible to reduce the threshold voltage of the PMOS input transistors to \( V_{\text{th}} = -0.4 \text{ V} \) by current driving the bulk, the common mode range is modified to:

\[
2V_{\text{DS, sat}} - (|V_{\text{th}}| - V_{\text{DS, sat}}) \simeq -0.1 \text{ V} \leq V_{\text{cm}} \leq 0.6 \text{ V} \simeq V_{\text{DD}} - V_{\text{DS, sat}} - (|V_{\text{th}}| - V_{\text{DS, sat}})
\]

(5.8)

The variation of the common-mode range becomes relevant when compared with the output voltage range, approximately given by:

\[
2V_{\text{DS, sat}} \simeq 0.2 \text{ V} \leq V_{\text{out}} \leq 0.8 \text{ V} \simeq V_{\text{DD}} - 2V_{\text{DS, sat}}
\]

(5.9)

Note that a 0.4 V overlap in the valid input-output range is now available.
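The range arithmetic above can be checked with a few lines of code. This is a sketch under the stated assumptions ($V_{DD} = 1$ V, $V_{DS,sat} \approx 100$ mV, $|V_{th}| = 0.6$ V for the standard design and $0.4$ V with the bulk current driven); the function names are hypothetical and only mirror the expressions for the strong-inversion range, equation 5.8 and equation 5.9.

```python
def cm_range_strong_inversion(vdd, vth_abs, vds_sat):
    """Input common-mode range of the standard strong-inversion design."""
    return vds_sat - vth_abs, vdd - vth_abs - 2.0 * vds_sat


def cm_range_cdb(vdd, vth_abs, vds_sat):
    """Input common-mode range of equation 5.8 (sub-threshold CDB input pair)."""
    v_sg = vth_abs - vds_sat  # gate-source drop assumed in equation 5.8
    return 2.0 * vds_sat - v_sg, vdd - vds_sat - v_sg


def output_range(vdd, vds_sat):
    """Output voltage range of equation 5.9."""
    return 2.0 * vds_sat, vdd - 2.0 * vds_sat


def overlap(range_a, range_b):
    """Width of the intersection of two (low, high) ranges, 0 if disjoint."""
    low = max(range_a[0], range_b[0])
    high = min(range_a[1], range_b[1])
    return max(0.0, high - low)


VDD, VDS_SAT = 1.0, 0.1
out = output_range(VDD, VDS_SAT)
standard = cm_range_strong_inversion(VDD, 0.6, VDS_SAT)
cdb = cm_range_cdb(VDD, 0.4, VDS_SAT)

print("standard input range:", standard, "overlap:", round(overlap(standard, out), 3))
print("CDB input range     :", cdb, "overlap:", round(overlap(cdb, out), 3))
```

The standard design leaves essentially no common region between input and output ranges, whereas the CDB input pair yields the 0.4 V overlap quoted in the text.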

In order to get more voltage headroom for the current mirror, transistors \( M_{10} \) and \( M_{11} \) are also bulk driven; note that in the standard OTA design there is just enough voltage for the cascode current mirror to function.

The simulated OTA DC transfer curves are plotted in fig. 5.2; the steepest transition occurs for a common mode voltage of 300 mV. The transient response to an input step for three different values of the input decoupling capacitor is shown in fig. 5.3. When the decoupling capacitor \( C_{\text{DEC}} \) is not used, a knee appears in the output curve, probably due to the limited charging/discharging currents, and it decreases the dynamic range of the amplifier. The final size of \( C_{\text{DEC}} \) was set to 10 pF.

5.2.1 Frequency response

The magnitude and the phase of the OTA transfer function are plotted in figure 5.4. The OTA open-loop gain is found to be around 70 dB and the unity gain frequency is approximately 1.8 MHz, with
5.3 Measurements

A prototype amplifier has been fabricated in a standard 0.5 μm CMOS process. In order to evaluate the impact of the decoupling capacitor on the OTA performance, two separate versions of the amplifier have been laid out on the same chip. In the rest of the chapter, the amplifiers are indicated according to the following conventions:

- OTA1: CDB amplifier with decoupling capacitor.
- OTA2: CDB amplifier without decoupling capacitor.

The OTA layout is depicted in fig. 5.5. The large block on the top-left side of the layout is the transistor array implementing the wide input pair; being current driven, the input transistors are laid out according to the approach discussed in 4.1.5. The total amplifier area is 150 μm × 130 μm; the capacitor, not shown in fig. 5.5, occupies an area of about 290 μm × 322 μm.

The biasing circuit is shown in fig. 5.6. Bias voltages $V_{bias\,1}$ and $V_{bias\,2}$ are generated through the current mirror formed by bias transistors $M_{b\,1}$ and $M_{b\,2}$ and the diode connected transistor $M_{b\,3}$. Note that $V_{bias\,1}$ and $V_{bias\,2}$ set all the OTA bias currents, except for the bulk parasitic currents of the CDB transistors. An off-chip current source $I_{bias}$ is used to control the OTA DC currents.

Instead of regulating the bulk current with the biasing circuitry proposed in chapter 4, a second off-chip current source, $I_{bulk}$, is used to set the bias voltage $V_{bias\,3}$ for the CDB current sources. This makes it possible to evaluate the impact of bulk current variations on the OTA behavior.

Figure 5.4: AC response.

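<table>
<thead>
<tr>
<th></th>
<th>Unity gain</th>
<th>Transconductance</th>
<th>Slew-rate</th>
<th>Phase margin</th>
</tr>
</thead>
<tbody>
<tr>
<td>OTA 1</td>
<td>1.931 MHz</td>
<td>242 µA/V</td>
<td>0.61 V/µs</td>
<td>58°</td>
</tr>
<tr>
<td>OTA 2</td>
<td>&gt; 2.1 MHz</td>
<td>&gt; 261 µA/V</td>
<td>0.52 V/µs</td>
<td>n.a.</td>
</tr>
</tbody>
</table>

Table 5.2: OTA main parameters.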

The transistor dimensions were already presented in table 5.1; observe that the OTA has been designed with a total bias current of approximately \( 40 \mu A \) to drive a 20 pF off-chip capacitive load with a gain-bandwidth product in the 1-MHz range (a version for on-chip applications can be obtained in a straightforward way by transistor scaling). The measured OTA parameters for a 1 V supply voltage are presented in tables 5.2 and 5.3.

5.4 Analysis of the measured data

The data presented in the tables is based on the mean value of different chip measurements. The first obvious conclusion from the measured values is that the CDB technique can be easily implemented in a standard CMOS process and applied to typical analog building blocks to produce low supply voltage design.

<table>
<thead>
<tr>
<th>$V_{cm}$ (V)</th>
<th>-0.2</th>
<th>-0.1</th>
<th>0 - 0.6</th>
<th>0.7</th>
</tr>
</thead>
<tbody>
<tr>
<td>OTA 1 output range (V)</td>
<td>0.37 - 0.7</td>
<td>0.14 - 0.82</td>
<td>0.11 - 0.86</td>
<td>0.19 - 0.74</td>
</tr>
<tr>
<td>OTA 2 output range (V)</td>
<td>0.34 - 0.68</td>
<td>0.13 - 0.82</td>
<td>0.14 - 0.86</td>
<td>0.18 - 0.74</td>
</tr>
</tbody>
</table>

Table 5.3: Output range for different input common-mode voltages.

Figure 5.5: Layout of the CDB amplifier.

**Input transconductance**

The measured values of the input pair transconductance (table 5.2) are compatible with the operating region of transistors $M_1$ and $M_2$. Since these transistors are biased in the sub-threshold region, their transconductance $g_m$ can be calculated according to equation 2.4. Considering that the total parasitic current through the bipolars is set to one-tenth of the input MOS bias current, substituting the values into equation 2.4 yields:

$$g_m = \frac{I_D}{nV_T} \approx \frac{9 \mu A}{1.2 \cdot 26 \text{ mV}} = 288.4 \mu \text{A/V} \quad (5.10)$$

Figure 5.6: OTA bias circuitry.

The calculated $g_m$ is very close to the measured transconductance; the measured slew-rate in table 5.2 also matches the expected theoretical value:

$$SR = \frac{I_{D,M4}}{C_L} \approx \frac{10 \mu A}{20 \text{ pF}} = 0.5 \text{ V/\mu s} \quad (5.11)$$
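Equations 5.10 and 5.11 can be reproduced and set against the measured values of table 5.2. The snippet below is a small sketch; the function names are hypothetical, while the numerical inputs ($I_D = 9$ µA, $n = 1.2$, $V_T = 26$ mV, 10 µA slewing current, 20 pF load) are the ones used in the two equations.

```python
def gm_subthreshold(i_d, n, v_t):
    """Sub-threshold transconductance g_m = I_D / (n * V_T), as in equation 5.10."""
    return i_d / (n * v_t)


def slew_rate(i_slew, c_load):
    """Slew rate in V/s for a current i_slew charging c_load, as in equation 5.11."""
    return i_slew / c_load


gm = gm_subthreshold(9e-6, 1.2, 26e-3)  # A/V
sr = slew_rate(10e-6, 20e-12)           # V/s

# Measured values from table 5.2 (OTA 1, OTA 2)
measured_gm = {"OTA 1": 242e-6, "OTA 2": 261e-6}
measured_sr = {"OTA 1": 0.61e6, "OTA 2": 0.52e6}

print(f"calculated gm = {gm * 1e6:.1f} uA/V, SR = {sr / 1e6:.2f} V/us")
for name in measured_gm:
    gm_ratio = measured_gm[name] / gm
    sr_ratio = measured_sr[name] / sr
    print(f"{name}: measured/calculated gm = {gm_ratio:.2f}, SR = {sr_ratio:.2f}")
```

The ratios give a quick feel for how close the first-order expressions are to the measurements, without implying anything about the source of the residual difference.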

Common-mode input range

The measured OTA DC responses at different common-mode input voltages are shown in fig. 5.7 and the dc gain as a function of the common-mode input voltage is plotted in fig. 5.8: it shows that, for both amplifier configurations, a gain of at least 62 dB is achieved over an input range of about 0.65 V.

Comparing the constant gain range with equation 5.8, it is possible to extrapolate a rough estimation of the input pair threshold voltage. Assuming that the current source transistor $M_5$ requires a drain-source voltage of around 100 mV, the gate-source voltage drop of transistor $M_1$ is about 0.25 V when the common-mode voltage is 0.65 V. As a consequence, the threshold voltage is roughly equal to 0.35 V; assuming as a typical value for the PMOS threshold voltage $V_{th} \approx -0.6 \text{ V}$, a reduction of around 250 mV (in absolute value) is achieved.
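The estimate above can be retraced step by step. The following is a sketch that assumes $V_{DD} = 1$ V and, as in equation 5.8, a sub-threshold gate-source drop equal to $|V_{th}| - V_{DS,sat}$ with $V_{DS,sat} \approx 100$ mV; the symbols $V_{SG,M1}$ and $V_{cm,max}$ are introduced here only for illustration:

\[
\begin{aligned}
V_{SG,M1} &\approx V_{DD} - V_{DS,M5} - V_{cm,max} = 1 - 0.1 - 0.65 = 0.25 \text{ V} \\
|V_{th}| &\approx V_{SG,M1} + V_{DS,sat} = 0.25 + 0.1 = 0.35 \text{ V} \\
\Delta |V_{th}| &\approx 0.6 - 0.35 = 0.25 \text{ V}
\end{aligned}
\]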

The measured minimum common-mode voltage is slightly greater than the limit predicted by equation 5.8, probably due to a larger effective voltage required by transistors $M_3$ and $M_4$.

Frequency performance

The measured phase and magnitude of the OTA transfer function are compared in fig. 5.9 with the corresponding curves from simulation. The measured and simulated amplitude characteristics agree very well. The measured phase margin is reduced with respect to the value predicted from simulations, but the stability is not compromised.

**Figure 5.7:** Measured DC OTA responses at different common-mode input voltages.


**Figure 5.8:** DC gain for different input common-mode voltages.

**Effect of the decoupling capacitor**

The primary reason for including the decoupling capacitor was to ensure that the low frequency pole-zero pair was removed. From the measured transfer characteristic, the CDB technique, with or without the decoupling capacitor, clearly does not affect the frequency response of the amplifier.

In subsection 4.1.4.2, it was shown that cascoding the CDB transistor is a solution to eliminate the unwanted effects of the $C_{BD}$ bulk-drain parasitic capacitance. Consider now the schematic of fig. 5.1: transistors $M_{10}$ and $M_{11}$ are cascoded and their sources are tied to a fixed potential.
<table>
<thead>
<tr>
<th>Common-mode voltage</th>
<th>$V_{DD} = 0.8$ V</th>
<th>$V_{DD} = 0.7$ V</th>
</tr>
</thead>
<tbody>
<tr>
<td>0.0 V</td>
<td>52.1 dB, 284-506 mV</td>
<td>36.1 dB, 195-471 mV</td>
</tr>
<tr>
<td>0.1 V</td>
<td>55.2 dB, 284-528 mV</td>
<td>37.5 dB, 190-487 mV</td>
</tr>
<tr>
<td>0.2 V</td>
<td>50.7 dB, 268-534 mV</td>
<td>36.2 dB, 221-465 mV</td>
</tr>
<tr>
<td>0.3 V</td>
<td>48.8 dB, 290-518 mV</td>
<td>33.1 dB, 195-354 mV</td>
</tr>
<tr>
<td>0.4 V</td>
<td>46.1 dB, 245-490 mV</td>
<td>4.7 dB, 112-146 mV</td>
</tr>
<tr>
<td>0.5 V</td>
<td>36.2 dB, 309-484 mV</td>
<td>n.a.</td>
</tr>
</tbody>
</table>

Table 5.4: DC gain at different input common-mode voltages at 0.7 V and 0.8 V power supply.

<table>
<thead>
<tr>
<th>Voltage supply (V)</th>
<th>Unity gain (MHz)</th>
<th>PM (degrees)</th>
<th>slew-rate (V/\mu s)</th>
</tr>
</thead>
<tbody>
<tr>
<td>0.8</td>
<td>0.847</td>
<td>54</td>
<td>0.4</td>
</tr>
<tr>
<td>0.7</td>
<td>1.3</td>
<td>48</td>
<td>0.13</td>
</tr>
</tbody>
</table>

Table 5.5: Main parameters for the amplifier for reduced supply voltage.

Transistors \( M_6 \) and \( M_7 \) form a folded-cascode configuration with the input pair; moreover, when the input is a differential signal, the source potential of the input pair does not vary. Since both the source and drain potentials of the CDB MOS are constant, the unwanted effects introduced by the \( C_{BD} \) capacitance are eliminated.

### 5.5 Sub-1V supply voltage

The OTA was also tested in the sub-1 V supply range. Tables 5.4 and 5.5 summarize the results for 0.8 V and 0.7 V supply voltages. As shown, gain is still available with a power supply down to $0.7$ V. In figure 5.10 the DC transfer function at different common-mode voltages is shown. The lower curves in the graph are the OTA responses when the bulk current is switched off: no gain is available if the OTA does not use the CDB technique.

Figure 5.10: Measured DC transfer function at $V_{DD} = 0.75$ V with and without (two bottom traces) bulk current.

### 5.6 Comparison

To evaluate the performance of the CDB OTA, the measured figures of merit are compared with the corresponding parameters of two selected low supply voltage amplifiers. Table 5.6 shows the performance of the CDB OTA and of the Bulk-Driven OTA [31] based on the technique discussed in chapter 2. Both OTAs operate from a $1$ V voltage supply and exploit the bulk terminal to achieve low-voltage operation, but they are based on different topologies: the bulk-driven op-amp is a two stage amplifier [31], while the CDB OTA is a single stage amplifier. The CDB OTA shows better performance in terms of DC gain, unity-gain frequency and power dissipation. The greatest advantage of the bulk-driven amplifier is the rail-to-rail operation.

Table 5.7 compares the CDB OTA with the state-of-the-art current mirror OTA presented in [32]. The CDB figures of merit are comparable to those of the current mirror OTA, except for power consumption. The key point is that, even though the CDB OTA is built in an old $0.5 \mu m$ CMOS process, the performance is still very respectable.

<table>
<thead>
<tr>
<th></th>
<th>1V bulk-driven Op. Amp.</th>
<th>CDB OTA</th>
</tr>
</thead>
<tbody>
<tr>
<td>DC-open loop gain</td>
<td>48.8 dB ($V_{cm} =$ mid-supply)</td>
</tr>
<tr>
<td>Supply current</td>
<td>287$\mu$A</td>
</tr>
<tr>
<td>Input common mode range</td>
<td>-395 mV to 470 mV</td>
</tr>
<tr>
<td>Output swing</td>
<td>-475 mV to 498 mV</td>
</tr>
<tr>
<td>Unity-gain frequency</td>
<td>1.3 MHz</td>
</tr>
<tr>
<td>Phase margin</td>
<td>68$^\circ$</td>
</tr>
<tr>
<td>Pos. Slew-rate</td>
<td>0.7 V/µs</td>
</tr>
<tr>
<td>Neg. Slew-rate</td>
<td>1.6 V/µs</td>
</tr>
</tbody>
</table>

Table 5.6: CDB OTA vs. 1 V bulk-driven op-amp [31].

<table>
<thead>
<tr>
<th></th>
<th>Current mirror OTA</th>
<th>CDB OTA</th>
</tr>
</thead>
<tbody>
<tr>
<td>Technology</td>
<td>0.25 $\mu$m CMOS</td>
<td>0.5 $\mu$m CMOS</td>
</tr>
<tr>
<td>DC-open loop gain</td>
<td>52 dB</td>
<td>46 - 53 dB</td>
</tr>
<tr>
<td>Supply current</td>
<td>9.9 $\mu$A</td>
<td>40 $\mu$A</td>
</tr>
<tr>
<td>Input common mode range</td>
<td>n.a.</td>
<td>0 V to 0.4 V</td>
</tr>
<tr>
<td>Output swing</td>
<td>600 mV</td>
<td>500 mV</td>
</tr>
<tr>
<td>Unity-gain frequency</td>
<td>1.2 MHz</td>
<td>0.8 MHz</td>
</tr>
<tr>
<td>Phase margin</td>
<td>60$^\circ$</td>
<td>54$^\circ$</td>
</tr>
<tr>
<td>Pos. Slew-rate</td>
<td>0.2 V/µs</td>
<td>0.4 V/µs</td>
</tr>
<tr>
<td>Neg. Slew-rate</td>
<td>0.2 V/µs</td>
<td>0.4 V/µs</td>
</tr>
</tbody>
</table>

Table 5.7: CDB OTA vs. state-of-the-art current mirror OTA.

5.7 Summary

The applicability of the Current Driven Bulk (CDB) technique to standard analog blocks has been verified through the design of a low-voltage OTA. The measurements of a prototype implemented in a 0.5 $\mu$m CMOS process are all in agreement with the values predicted by simulations; the OTA performance is very fair, even compared to more recent low-voltage architectures.

The effectiveness of the CDB technique has been experimentally verified by further reducing the supply voltage into the sub-1 V range, where the standard OTA is not capable of providing any gain.
Part II

Σ∆ synthesizer
The frequency synthesizer is one of the main building blocks of integrated transceivers. In RF applications, the carrier frequency is normally synthesized through phase-locked loop based techniques. The standard basic integer-N architecture has been replaced by a more flexible fractional-N topology, usually controlled through ΣΔ modulation. In the last few years the number of publications on this type of synthesizers has rapidly increased [33, 34, 35, 36, 37, 38, 39].

As we shall see, the use of high-order multi-bit ΣΔ modulators introduces the issue of high-frequency quantization noise down-folding. This chapter provides the mathematical basis to quantify this effect and presents a new topology to correct it.

6.1 Phase-Locked Loop

The building blocks of a Phase-Locked Loop (PLL) are presented in figure 6.1. A signal with a stable frequency \( F_{TCXO} \) is generated by means of a Temperature Compensated Crystal Oscillator (TCXO) and it is divided down to a lower frequency signal, whose frequency is known as the comparison frequency \( F_{COMP} \) or, in some cases, as the reference frequency \( F_{REF} \). The phase-detector (PD) produces an output proportional to the phase error between the reference signal \( REF \) and the feed-back signal \( DIV \). Most PLLs use a phase-frequency detector (PFD) block, able to resolve not only phase differences, but also frequency errors.

The PFD is normally followed by a Charge-Pump (CP) [40]. The block formed by the PFD and the CP comprises three different operation states: frequency detection, phase detection and lock detection. When the input signal frequencies are different, the PFD operates in frequency detect mode and the output of the CP is a constant current: if \( F_{REF} > F_{DIV} \) a constant current will be sourced to the Loop-Filter; the opposite will occur when \( F_{DIV} > F_{REF} \).

Once the frequency of the two input signals is equalized, the PFD enters the phase acquisition mode. The PFD is active only for part of the comparison frequency cycle and UP/DOWN current pulses will be produced. The UP pulse controls the amount of current injected into the Loop-Filter (LF); the DOWN pulse sinks a controlled current quantity from the LF. The PFD pulses get progressively narrower as the phase difference between \( REF \) and \( DIV \) decreases: when the phase error is zero, the locked state condition is reached. Ideally, no pulses are produced from the PFD; in reality, due to the finite speed of the circuit, narrow current spikes will be produced [41]. The effects of these spikes will be discussed later.

The LF can be implemented with passive elements or as an active filter. In Fig. 6.1 the simplest passive low-pass filter implementation is shown. The Voltage-Controlled Oscillator (VCO) operates as a voltage-to-frequency converter with a proportionality gain $K_{\text{VCO}}$; the PLL output frequency $F_{\text{OUT}}$ is hence controlled by the LF output voltage. The divider in the feedback path produces a signal whose frequency is equal to $F_{\text{OUT}}$ divided down by the divider modulus $N$. The control voltage produced by the LF tunes the VCO frequency so that the phase difference between $F_{\text{REF}}$ and $F_{\text{DIV}}$ is minimized.

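In lock state, with $R$ denoting the division ratio of the reference divider (so that $F_{\text{COMP}} = F_{\text{TCXO}}/R$), the relationship between $F_{\text{TCXO}}$ and $F_{\text{OUT}}$ can be expressed as: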

$$F_{\text{OUT}} = \frac{N}{R} \cdot F_{\text{TCXO}}$$  \hspace{1cm} (6.1)

When the PLL is switching to a new frequency (e.g. by changing $F_{\text{REF}}$ or by selecting a new division ratio $N$) or during initial power-up, the PLL undergoes a transient response during which relationship 6.1 is no longer valid.

In integer-N PLLs the divider modulus $N$ is fixed for any given output frequency; this condition forces the comparison frequency $F_{\text{COMP}}$ to be chosen equal to the channel spacing of the adopted RF transmit/receive standard. If the channel spacing is in the order of a few kHz and the output frequency in the MHz range, then the divider ratio is in the thousands, adding several decibels to the phase detector noise floor [42]. If the divider ratio could be chosen independently from the channel spacing, then the noise floor could be drastically reduced.
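The link between channel spacing, divider ratio and noise floor can be illustrated numerically. The sketch below rests on assumptions that are not taken from the text: the 900 MHz output, 200 kHz channel spacing and 13 MHz alternative comparison frequency are illustrative values, and the noise penalty is modeled with the usual first-order rule that noise referred to the phase detector input appears at the output multiplied by $N$, i.e. increased by $20\log_{10}N$ dB.

```python
import math


def divider_ratio(f_out_hz, f_comp_hz):
    """Divider modulus N needed to lock f_out to f_comp."""
    return f_out_hz / f_comp_hz


def noise_multiplication_db(n):
    """First-order increase of the phase detector noise floor at the output (ASSUMED model)."""
    return 20.0 * math.log10(n)


F_OUT = 900e6            # Hz, illustrative output frequency
CHANNEL_SPACING = 200e3  # Hz, illustrative; integer-N forces F_COMP to this value
F_COMP_LARGE = 13e6      # Hz, illustrative comparison frequency if N were unconstrained

n_integer = divider_ratio(F_OUT, CHANNEL_SPACING)
n_large = divider_ratio(F_OUT, F_COMP_LARGE)

print(f"integer-N : N = {n_integer:.0f}, +{noise_multiplication_db(n_integer):.1f} dB")
print(f"large FCOMP: N = {n_large:.1f}, +{noise_multiplication_db(n_large):.1f} dB")
print(f"difference: {noise_multiplication_db(n_integer) - noise_multiplication_db(n_large):.1f} dB")
```

Under this model the improvement depends only on the ratio of the two comparison frequencies, which is why decoupling $F_{\text{COMP}}$ from the channel spacing is attractive.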

Note that the comparison frequency sets the maximum synthesizer bandwidth: since the PFD samples the input signals, the PLL bandwidth can be at maximum half the sampling frequency $F_{\text{COMP}}$. In typical designs, the loop bandwidth is roughly one-tenth of the comparison frequency to guarantee stability.
6.2 Fractional-N synthesis

Fractional-N indicates a category of frequency synthesizers [43] whose minimum output frequency step can be a fraction of the reference frequency. The synthesized frequency is related to the reference frequency by the following equation:

\[ F_{\text{OUT}} = F_{\text{REF}} \left( N + \frac{k}{M} \right) \]  

(6.2)

where \( k \) and \( M \) are integer numbers. \( M \) is an indicator of the fractionality (and hence of the frequency resolution) that the synthesizer can achieve; \( k \) can be any integer between 0 and \( M \). The average division ratio (usually indicated with \( N.f \)) is generated by toggling the frequency divider modulus between two or more values. For example, if a division ratio of 8 is chosen for 3 clock cycles and a division ratio of 9 is chosen for 1 clock cycle, the average division ratio is equal to \( (8 \cdot 3 + 9 \cdot 1)/4 = 8.25 \).

In fractional-N PLLs, the divider modulus is periodically changed between two or more values; this eliminates the integer-N constraint on the comparison frequency, allowing \( F_{\text{COMP}} \) to be chosen much larger than the channel spacing. This in turn means that a larger PLL bandwidth can be chosen, improving the locking-time of the PLL. Also, since \( F_{\text{COMP}} \) and the channel spacing are independent, the divider ratio can be chosen much smaller than the integer-N case, giving better phase noise performance.

The way the divider is controlled is critical since any periodic control sequence gives rise to unwanted fractional tones. If the fractional tone falls outside the PLL bandwidth, then it is attenuated by the loop-filter; however for given combinations of comparison frequency and output frequency the fractional tone (and/or its harmonics) can fall in-band, making the synthesizer unsuitable for many applications.

In its simplest implementation, a fractional-N synthesizer comprises a dual-modulus divider (DMD) controlled by a Digital Phase Accumulator (DPA), as shown in fig. 6.2. A DPA consists of an accumulator and a register, clocked by the PLL reference signal. At the clock rising edge, the content of the register is incremented by the input value, which is an \( m \)-bit word. The carry-out of the adder is the 1-bit quantization of the input word and it is used to control the DMD [44]: when the carry-out is high, the DMD divides by \( N+1 \); when low, the division ratio is equal to \( N \). As an example, to obtain the division ratio mentioned above, a three bit accumulator and a decimal input equal to 2 can be chosen. This gives an overflow every 4 cycles.
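The accumulator example can be reproduced with a short behavioral model. This is a minimal sketch with hypothetical function names, assuming the register starts from zero; it is not the event-driven simulator developed in this work.

```python
def dpa_carry_sequence(k, m_bits, n_cycles):
    """Carry-out sequence of an m-bit phase accumulator with constant input k."""
    modulus = 1 << m_bits
    acc = 0  # register content, assumed to start at zero
    carries = []
    for _ in range(n_cycles):
        total = acc + k
        carries.append(1 if total >= modulus else 0)  # adder carry-out
        acc = total % modulus                          # value stored in the register
    return carries


def average_division_ratio(n_int, carries):
    """Average modulus of a dual-modulus divider: N when carry is 0, N+1 when carry is 1."""
    ratios = [n_int + c for c in carries]
    return sum(ratios) / len(ratios)


carries = dpa_carry_sequence(k=2, m_bits=3, n_cycles=8)
print("carry-out sequence:", carries)
print("average division ratio:", average_division_ratio(8, carries))
```

With a three bit accumulator and an input of 2, the carry-out goes high once every four reference cycles, so a divider toggling between 8 and 9 averages to 8.25.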

Though very simple, the DPA synthesizer architecture is unfortunately not particularly suitable for RF applications. In fact, the output sequence of the DPA is periodic, resulting in a sawtooth phase error at the PFD output. If the error is unfiltered, tones will appear in the synthesizer output spectrum: for a 1/4 fractionality, tones appear at \( F_{\text{clk}}/4 \) and at its harmonics (\( 2F_{\text{clk}}/4, 3F_{\text{clk}}/4,... \)).

A possible solution to cancel the spurs is referred to as analog compensation [45]. The idea behind this method is based on the fact that the phase error can easily be predicted at any given time, being proportional to the content of the accumulator. The phase error can then be canceled by subtracting an analog signal proportional to the content of the digital accumulator from the VCO control voltage. The subtraction is accomplished by a current-type DAC, whose control signal comes from the accumulator output bits. The DAC currents are scaled to the charge-pump current according to the fractionality used: for instance, with a fractionality of 8, the currents are sized to one-eighth of the CP current. For high fractionality, the compensation becomes more difficult due to the inability to match the DAC currents precisely.

6.3 $\Sigma\Delta$ fractional-N PLL

In $\Sigma\Delta$ fractional-N PLLs [46, 47] the modulus of the divider is controlled by means of a digital $\Sigma\Delta$ modulator (fig. 6.3). Using $\Sigma\Delta$ modulators brings the advantage of randomizing the divider control sequence, thereby eliminating fractional tones or reducing their magnitude; on the other hand, high frequency quantization noise is now injected in the loop and can severely impact the overall synthesizer phase noise.

$\Sigma\Delta$ modulators have been widely analyzed in the context of oversampled A/D converters with a 1-bit quantizer. The basic concept behind the oversampling approach is to improve the signal-to-noise ratio by spreading out the noise power of a band-limited signal over a larger bandwidth and by filtering out the high frequency components. $\Sigma\Delta$ modulators include an integrator in the feed-forward path that shifts the quantization error components to higher frequencies (i.e. it works as a high-pass filter) without altering the spectrum of the input signal. By decreasing the quantization noise power at low frequency, the SNR is improved more efficiently.

In the context of $\Sigma\Delta$ fractional-N synthesizers, the modulator is clocked at the PFD comparison frequency. The Over Sampling Ratio (OSR) is in this case defined as the ratio of the comparison frequency to twice the PLL bandwidth:

$$\text{OSR} = \frac{F_{\text{COMP}}}{2 \cdot F_{\text{BW}}}$$

Reducing the PLL bandwidth results in lower in-band quantization noise; if the bandwidth is increased, the loop filter must be designed to suppress the high frequency quantization noise. Note that the closed loop bandwidth of the synthesizer acts as a low pass filter.

A block level schematic and the linear model of a digital version of a first order $\Sigma\Delta$ modulator are shown in fig. 6.4. Observe that the first order $\Sigma\Delta$ is completely equivalent to the digital accumulator of fig. 6.2 with the carry-out signal as output signal.

The noise shaping property of a $\Sigma\Delta$ modulator can be better understood by calculating its transfer function. Consider the modulator in fig. 6.4, where the quantization noise is modeled as an
The choice of the modulator architecture largely affects the design of the synthesizer: besides the desired quantization noise shaping, it is fundamental to guarantee the purity of the output spectrum. When the synthesizer operates in receiving mode, a constant value is fed at the modulator input; this may result in the $\Sigma\Delta$ cycling through periodic states and consequently, output tones will be produced. Both the modulator order and architecture must be selected to obtain a random output sequence, but, since a digital $\Sigma\Delta$ is a finite state machine, a completely random sequence can not be generated. Nevertheless, a pseudo-random output sequence can be a sufficient condition for a
In this work, two different topologies have been examined: a 4th order MASH topology and a 4th order CANDY topology [49], shown in fig. 6.5 and fig. 6.7 respectively. The output of each stage in fig. 6.5 is given by:

\[
\begin{aligned}
Y_1(z) &= X(z) + (1 - z^{-1})E_1(z) \\
Y_2(z) &= z^{-1}E_1(z) + (1 - z^{-1})E_2(z) \\
Y_3(z) &= z^{-1}E_2(z) + (1 - z^{-1})E_3(z) \\
Y_4(z) &= z^{-1}E_3(z) + (1 - z^{-1})E_4(z)
\end{aligned}
\]

The modulator's total output is then given by:

\[
Y(z) = Y_1(z) + (1 - z^{-1}) \left[ Y_2(z) + (1 - z^{-1}) \left[ Y_3(z) + (1 - z^{-1})Y_4(z) \right] \right]
\]

The final NTF, which is the same for the Candy loop architecture, is a 4th order high-pass filter:

\[
H_{NTF}(z) = (1 - z^{-1})^4 \big|_{z = e^{j2\pi f T_{REF}}}
\]
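The high-pass character of the NTF can be made concrete numerically. The sketch below evaluates $(1 - z^{-1})^n$ on the unit circle and compares it with the closed form $(2\,|\sin(\pi f T_{REF})|)^n$; the 26 MHz comparison frequency is an assumed example value and the function names are hypothetical.

```python
import cmath
import math


def ntf_magnitude(f, t_ref, order):
    """|(1 - z^-1)^order| evaluated at z = exp(j*2*pi*f*t_ref)."""
    z = cmath.exp(1j * 2 * math.pi * f * t_ref)
    return abs((1 - 1 / z) ** order)


def ntf_magnitude_closed_form(f, t_ref, order):
    """Equivalent closed form: (2*|sin(pi*f*t_ref)|)^order."""
    return (2 * abs(math.sin(math.pi * f * t_ref))) ** order


if __name__ == "__main__":
    t_ref = 1 / 26e6  # assumed comparison frequency of 26 MHz
    for f in (1e3, 1e5, 1e6, 13e6):
        print(f, ntf_magnitude(f, t_ref, 4), ntf_magnitude_closed_form(f, t_ref, 4))
```

The magnitude is zero at dc and reaches $2^n$ at half the comparison frequency, which is why the loop bandwidth must stay well below $F_{COMP}/2$.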

For multistage MASH topologies it has been mathematically demonstrated that the quantization error is smooth and white, even for a constant input value, provided that the number of stages is equal to or greater than two and that the input is no-overload dithered by an independent identically distributed process [50]. Similar conditions apply to the Candy loop architecture: spur-free operation is guaranteed if an $n$-bit quantizer is used (where $n$ is the order of the loop) and the input signal has a random component [51]. The input random component can be generated by adding a Pseudo Random Binary Sequence (PRBS) generator to dither the Least Significant Bit (LSB) of the digital input value.

Figure 6.7: Candy architecture.

The dither condition has been a source of confusion: setting the LSB to '1' has been considered sufficient to provide enough input activity [52]. Unfortunately this is not always the case: it strongly depends on the number of states that the $\Sigma\Delta$ cycles through before repetition occurs. In turn, this number depends on the combination of accumulator bit size, $\Sigma\Delta$ order and input fractional value. Note that, by turning the LSB on, a frequency offset equal to the modulator resolution is introduced at the output. In contrast, the PRBS sequence can be chosen to have a zero mean value (i.e. no frequency offset).

Since the modulator is entirely digital, the effects of dithering can be easily and accurately verified through simulations. Fig. 6.8 presents the Power Spectral Density (PSD) of a third order MASH with 11-bit accumulators and a constant input of $1/64$. On the left side of the plot, no dithering is applied to the LSB and high power tones appear in the output spectrum. Setting the LSB to one (right side of the plot) helps in randomizing the output, but tones are still present. In contrast, dithering the LSB completely eliminates the spurs, at the expense of introducing a dithering noise floor. This noise floor can be lowered by decimating the PRBS output (i.e. taking a sample every $n$ time steps to decrease the power of the PRBS sequence).
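A minimal bit-true sketch of such a simulation is given below for a third order MASH built from three cascaded accumulators, using the 11-bit size and the $1/64$ input quoted above. The accumulator arrangement, the function names and the $\pm 1$ LSB random dither (a stand-in for a PRBS generator) are assumptions made for illustration; they are not the code used to produce fig. 6.8.

```python
import random


def mash111(k, bits, n_steps, dither=False, seed=1):
    """Output sequence of a third order MASH made of three `bits`-wide accumulators.

    k is the constant input word; with dither=True a zero-mean +/-1 LSB
    random sequence (a stand-in for a PRBS) is added to the input.
    """
    modulus = 1 << bits
    rng = random.Random(seed)
    a1 = a2 = a3 = 0
    c2_d1 = c3_d1 = c3_d2 = 0  # delayed carries for the digital differentiators
    out = []
    for _ in range(n_steps):
        d = rng.choice((-1, 1)) if dither else 0
        c1, a1 = divmod(a1 + k + d, modulus)
        c2, a2 = divmod(a2 + a1, modulus)  # stage 2 accumulates the residue of stage 1
        c3, a3 = divmod(a3 + a2, modulus)  # stage 3 accumulates the residue of stage 2
        # y = c1 + (1 - z^-1) c2 + (1 - z^-1)^2 c3
        out.append(c1 + (c2 - c2_d1) + (c3 - 2 * c3_d1 + c3_d2))
        c2_d1, c3_d2, c3_d1 = c2, c3_d1, c3
    return out


def state_period(k, bits, max_steps=1 << 20):
    """Steps before the undithered accumulator state repeats (None if above max_steps)."""
    modulus = 1 << bits
    a1 = a2 = a3 = 0
    for n in range(1, max_steps + 1):
        a1 = (a1 + k) % modulus
        a2 = (a2 + a1) % modulus
        a3 = (a3 + a2) % modulus
        if a1 == a2 == a3 == 0:
            return n
    return None


if __name__ == "__main__":
    k, bits = 32, 11  # 32 / 2**11 = 1/64
    y = mash111(k, bits, 6400)
    print("mean output:", sum(y) / len(y))
    print("state period without dither:", state_period(k, bits))
```

The output sequence can be passed to any FFT routine to inspect the tones; the state period gives a quick indication of how many states the undithered modulator cycles through before repeating.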

Note that the random component is only necessary when the synthesizer operates in receive mode; in fact, as will be discussed in chapter 8, $\Sigma\Delta$ synthesizers can be used for indirect VCO modulation by feeding transmit data into the modulator and, in this case, the input sequence has sufficient activity to create a white quantization noise sequence.

So far, only a Dual-Modulus Divider (DMD) has been assumed in the synthesizer. However, as previously said, when the modulator order is equal to or greater than two and the spur-free conditions are satisfied, the output is a multi-bit sequence, requiring a Multi-Modulus Divider (MMD) [53]. The design of the MMD is critical for linearity: as we shall see, it is crucial that the propagation delay through the divider remains constant, independent of the instantaneous division ratio (N-8 to N+7 in the case of 4 bits). Note that the number of output bits sets the phase error amplitude at the PFD input, which in turn affects the CP linearity requirement: a bigger phase error amplitude requires the CP to be linear over a wider range. To relax the CP linearity specification, a filter can be placed after the $\Sigma\Delta$ modulator.
As discussed in the previous section, an important factor in the choice of the $\Sigma\Delta$ modulator is tone generation. Unfortunately, a smooth modulator spectrum does not guarantee a spur-free synthesizer output. In fact, the commonly used combination of PFD and CP shows limited linearity. The main issue arises from the dead-zone problem [55], i.e. a zero response for phase errors smaller than a threshold value that depends on the propagation delay in the phase-frequency detector logic. In [39] it was shown how the dead zone of the PFD generates spurs in the VCO output spectrum. Since it is difficult to predict the non-linearity of the PFD plus CP block, a possible solution is to shift the bias point of the PFD/CP by means of a small offset current connected to the LF. In this way the PFD is forced to work on one side only of its characteristic [54].

Another source of tone generation is given by coupling among different chip sections. The coupling can either take place on die or on board: cross-modulation of the VCO output signal with the CP current (especially when both loop filter and VCO are integrated) can give rise to fractional tones, due to the non-integer ratio between the output frequency and the comparison frequency.

6.5 S/H \( \Sigma \Delta \) Fractional \( N \) PLL topology

A non-linear operation is intrinsic to standard \( \Sigma \Delta \) fractional-\( N \) synthesizers. As known, the phase-frequency detector inputs are two continuous signals: the REF signal and the DIV signal. The PFD produces pulses whose width is equal to the time difference between the rising edge of REF and DIV: an UP pulse is produced if the REF frequency \( F_{\text{REF}} \) leads the DIV frequency \( F_{\text{DIV}} \) and a DOWN pulse is produced if \( F_{\text{REF}} \) lags \( F_{\text{DIV}} \). Therefore the output of the PFD is a Pulse-Width Modulated (PWM) signal; the phase error information, which is proportional to the width of the pulse, is available at the rising edge of REF in case of a DOWN pulse or at the rising edge of DIV in case of an UP pulse (fig. 6.9).

This means that the PFD samples the phase error at instants that are spread out in time around the reference clock edge rather than at regular intervals. This effectively constitutes non-uniform sampling, as illustrated in fig. 6.9.

As is well known, non-uniform sampling is a highly non-linear phenomenon and causes the down-folding of high frequency noise. Since high-frequency, high-power quantization noise is present in \( \Sigma \Delta \) synthesizers, the contribution of the down-folded noise to the overall output phase noise can be significant.

To solve the non-uniform sampling problem, the topology shown in fig. 6.10 is adopted [56]. The structure is similar to ordinary \( \Sigma \Delta \) fractional-\( N \) synthesizers except for the presence of a Sample-Hold (S/H) block between the charge-pump and the loop filter. By re-sampling the charge-pump output at regular time intervals, the above-mentioned non-linearity is eliminated. The Sample-Hold has another beneficial effect: it prevents the modulation of the loop filter voltage by the reference clock, hence ideally it eliminates reference spurs in the VCO output spectrum. As previously stated, when the PFD enters the phase locked state, its output consists of narrow spikes with a strong periodicity related to the PFD sampling time. The S/H filters the spikes out: this operation can be better understood by considering the S/H Laplace transform:

\[
H_{SH}(s) = \frac{1 - e^{-sT_{REF}}}{s} \tag{6.6}
\]

The magnitude of the transfer function is plotted in fig. 6.11. As shown, the zeros of \(H_{SH}(s)\) are placed at the reference frequency and at its harmonics, canceling the tones. However, low level spurs may appear at the output due to the charge feed-through in the control switch.
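The position of the zeros can be verified numerically from eq. 6.6 written with the delay term $e^{-sT_{REF}}$. The sketch below is illustrative only: the 26 MHz reference frequency and the function name are assumptions.

```python
import cmath
import math


def sh_response(f, t_ref):
    """S/H transfer function (1 - exp(-s*t_ref)) / s evaluated at s = j*2*pi*f."""
    s = 1j * 2 * math.pi * f
    return (1 - cmath.exp(-s * t_ref)) / s


if __name__ == "__main__":
    t_ref = 1 / 26e6  # assumed reference frequency of 26 MHz
    for f in (1e3, 13e6, 26e6, 52e6):
        print(f, abs(sh_response(f, t_ref)) / t_ref)  # magnitude normalized to t_ref
```

Close to dc the normalized magnitude is one, while at the reference frequency and at its harmonics it drops to zero (within numerical precision), which is the tone cancellation described above.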

The use of sample-hold detectors is known [43, 57] to give good spurious performance; sampled PLL circuits have been already used in clock and data-recovery circuits [58]. A sampled feed-forward network has been recently proposed in a clock-generator PLL architecture [59]. However, the sample-and-hold approach has not been previously used to compensate the non-uniform
6.6 Linear model derivation

Despite the increasing use of $\Sigma\Delta$ synthesizers, a general model has only recently been published [60]. The model presented in this section is similar to [60], but the derivation is more straightforward and provides more intuitive insight. The starting point of the analysis is the S/H portion of the synthesizer. A possible implementation is shown in fig. 6.12. This circuit uses a switched-capacitor integrator to carry out both the S/H function and the integrator function that is usually performed by the loop filter. Note that the S/H block is in series with the loop filter: both the integral and the proportional loop corrections are sampled and held for each PFD sampling interval. To derive the transfer function we start by considering the charge deposited on the capacitance $C_1$:

$$Q_{C_1} = \frac{\Delta \varphi(t)}{2\pi} T_{\text{REF}} \cdot I_{CP} \tag{6.7}$$

where $\Delta \varphi(t)$ is the phase error waveform produced by the PFD. After a certain delay $\tau_{SH}$ the charge is transferred to $C_2$ and added to the charge previously stored:

$$Q_{C_2}(t) = Q_{C_2}(t - T_{\text{REF}}) + Q_{C_1}(t - \tau_{SH}) \tag{6.8}$$

In voltage terms and inserting the expression for $Q_{C_1}$:

$$V_{C_2}(t) = V_{C_2}(t - T_{\text{REF}}) + \frac{I_{CP}}{2\pi \cdot C_2} \cdot T_{\text{REF}} \cdot \Delta \varphi(t - \tau_{SH}) \tag{6.9}$$

Taking the Laplace transform yields:

$$\frac{V_{C_2}(s)}{\Delta \varphi(s)} = T_{\text{REF}} \cdot \frac{I_{CP}}{2\pi \cdot C_2} \cdot \frac{e^{-s\tau_{SH}}}{1 - e^{-sT_{\text{REF}}}} \tag{6.10}$$
At this point the effect of the PFD needs to be considered. The PFD output is a PWM signal with a non-linear influence on the PLL dynamics; however, since each PFD pulse is very narrow (compared to the filter response), the PFD output sequence can be represented as an impulse sequence. Therefore, in the previous equation $V_{C_2}(s)$ is still modeled in the discrete-time domain, i.e. as a train of delta-functions. In reality the output voltage is a staircase function and, as a consequence, eq. 6.10 is further modified by a zero-order hold network that converts the impulse train into the staircase waveform. The transfer function of the zero-order hold network is given by:

$$H_{ZOH}(s) = \frac{1}{T_{REF}} \cdot \frac{1 - e^{-sT_{REF}}}{s} \tag{6.11}$$

The actual transfer function from phase difference (PFD input) to integrator output is then given by:

$$\frac{V_O(s)}{\Delta \varphi(s)} = H_{ZOH}(s) \cdot \frac{V_{C_2}(s)}{\Delta \varphi(s)} = e^{-s\tau_{SH}} \cdot \frac{I_{CP}}{2 \pi \cdot s \cdot C_2} \tag{6.12}$$

Consequently, the circuit in fig. 6.12 can be modeled as shown in fig. 6.13. Note that in fig. 6.13 the integration $1/sC_2$ has been absorbed in the loop filter transfer function $F(s)$. Compared to the continuous time approximation, the only difference introduced in the linear model by the S/H is the delay $\tau_{SH}$. Note that the sampling now always occurs at regular time intervals, namely at the negative edge of the reference clock.
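The cancellation that leads from eq. 6.10 and eq. 6.11 to eq. 6.12 can be checked numerically. In the sketch below all component values (charge-pump current, $C_2$, reference frequency) are hypothetical and only serve to evaluate the three expressions at a few frequencies.

```python
import cmath
import math

# Hypothetical example values, not taken from this work.
I_CP = 100e-6          # charge-pump current [A]
C2 = 10e-12            # integrating capacitor [F]
T_REF = 1 / 26e6       # reference period [s]
TAU_SH = T_REF / 2     # S/H delay of half a reference period


def sampled_integrator(s):
    """Eq. 6.10: discrete-time accumulation of the charge packets."""
    return T_REF * I_CP / (2 * math.pi * C2) * cmath.exp(-s * TAU_SH) / (1 - cmath.exp(-s * T_REF))


def zero_order_hold(s):
    """Eq. 6.11: impulse train to staircase conversion."""
    return (1 - cmath.exp(-s * T_REF)) / (s * T_REF)


def delayed_integrator(s):
    """Eq. 6.12: continuous-time integrator with the delay tau_SH."""
    return cmath.exp(-s * TAU_SH) * I_CP / (2 * math.pi * s * C2)


if __name__ == "__main__":
    for f in (1e3, 1e5, 3e6, 1.1e7):
        s = 1j * 2 * math.pi * f
        print(f, abs(zero_order_hold(s) * sampled_integrator(s)), abs(delayed_integrator(s)))
```

The product of the first two functions matches the third at every frequency that is not a multiple of the reference frequency, where eq. 6.10 has its poles.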

In the setup shown in fig. 6.12 the delay $\tau_{SH}$ is equal to half a reference period. The delay is necessary to allow the charge-pump current to be completely integrated before the sampling operation takes place. Note also that the sampling switch must be open while the charge pump is active. The control logic of fig. 6.12 takes into account the fact that the rising edge of the DOWN pulse occurs before the rising edge of the reference clock and ensures that the switch is still in the off condition.

If an offset current is used in the charge-pump to compensate for current mismatches (i.e. only UP pulses are generated in the lock state) then it is sufficient to invert the reference clock signal to generate a proper $S/H_{CTRL}$ signal.
6.6.1 Divider

The derivation of the linear model for the divider with dithering starts in the time domain. The first step is to find the timing deviations with the aid of fig. 6.14. According to the timing diagram we can write:

$$\Delta t(n+1) = \Delta t(n) + (N + b(n)) \cdot T_{VCO} - T_{REF}$$  (6.13)

where $N$ is the nominal division ratio and $b(n)$ is the modulus control signal. Indicating with $\mu_b$ the average value of $b(n)$ ( $\mu_b$ is the fractional divider value ), the reference period $T_{REF}$ can be expressed as:

$$T_{REF} = (N + \mu_b)T_{VCO}$$  (6.14)

In deriving eq. 6.14 we are making the important approximation that $T_{VCO}$ is constant. This assumption is reasonable for receive-transmit synthesizers with narrow modulation bandwidth. In these cases the relative frequency variation of the VCO is small, which means that $T_{VCO}$ is nearly constant.

Defining $b'(n) = b(n) - \mu_b$ and substituting $T_{VCO}$ from eq. 6.14 into eq. 6.13 yields:

$$\Delta t(n+1) = \Delta t(n) + \frac{T_{REF}}{N + \mu_b} b'(n)$$  (6.15)

By converting the time error into phase error we have:

\[ \Delta \varphi (n) = \frac{2 \pi}{T_{\text{REF}}} \Delta t(n) \tag{6.16} \]

Finally an expression for the additive noise caused by dithering the divider ratio can be derived:

\[ \Delta \varphi (n + 1) = \Delta \varphi (n) + \frac{2 \pi}{N + \mu_b} b'(n) \tag{6.17} \]

The Laplace transform yields:

\[ \Delta \varphi (s) = \frac{2 \pi}{N + \mu_b} \cdot \frac{e^{-s T_{\text{REF}}}}{1 - e^{-s T_{\text{REF}}}} b'(s) \tag{6.18} \]

Setting \( z = e^{s T_{\text{REF}}} \), eq. 6.18 can be equivalently written in the digital domain (Z-transform):

\[ \Delta \varphi (z) = \frac{2 \pi}{N + \mu_b} \cdot \frac{z^{-1}}{1 - z^{-1}} b'(z) \tag{6.19} \]

The previous equation shows that the \( \Sigma \Delta \) noise undergoes an integration but is otherwise shaped by the loop in exactly the same way as the reference clock phase noise. Observe also that equation 6.17 reveals the discrete nature of the phase error, as discussed in the previous section.
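The time-domain recursion behind this result can be reproduced with exact rational arithmetic. The sketch below compares the timing error obtained directly from eq. 6.13 and eq. 6.14 with the accumulated dither of eq. 6.15; the nominal ratio $N = 100$, the 26 MHz reference and the first order modulator used to generate $b(n)$ are assumptions chosen for illustration.

```python
from fractions import Fraction


def carry_sequence(k, bits, n_steps):
    """Modulus control b(n) from a first order accumulator (average value k / 2**bits)."""
    modulus, acc, out = 1 << bits, 0, []
    for _ in range(n_steps):
        acc += k
        out.append(acc // modulus)
        acc %= modulus
    return out


def timing_error_direct(b, n_nom, mu_b, t_ref):
    """Eq. 6.13 with a constant VCO period taken from eq. 6.14."""
    t_vco = t_ref / (n_nom + mu_b)
    dt = [Fraction(0)]
    for bn in b:
        dt.append(dt[-1] + (n_nom + bn) * t_vco - t_ref)
    return dt


def timing_error_dither(b, n_nom, mu_b, t_ref):
    """Eq. 6.15: accumulation of the zero-mean dither b'(n) = b(n) - mu_b."""
    dt = [Fraction(0)]
    for bn in b:
        dt.append(dt[-1] + t_ref / (n_nom + mu_b) * (bn - mu_b))
    return dt


if __name__ == "__main__":
    mu_b = Fraction(5, 16)
    t_ref = Fraction(1, 26_000_000)
    b = carry_sequence(5, 4, 32)
    dt = timing_error_dither(b, 100, mu_b, t_ref)
    print([float(x / t_ref) for x in dt[:8]])  # timing error in units of T_REF
```

Both recursions give identical sequences, and the timing error remains bounded because the dither has zero mean.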

The final linear model is shown in fig. 6.15. The NL block in the model indicates the non-linear effect that occurs in the PLL if the sample-hold block is not used. An analytical derivation of this effect is presented in the next section. The closed-loop transfer function \( H_\theta(s) \) is given by (fig. 6.15):

\[ H_\theta(s) = \frac{\frac{I_{\text{CP}}}{2\pi} e^{-s \tau_{\text{SH}}} \cdot F(s) \frac{K_{\text{vco}}}{s}}{1 + \frac{I_{\text{CP}}}{2\pi} e^{-s \tau_{\text{SH}}} \cdot F(s) \frac{K_{\text{vco}}}{s} \frac{1}{N + \mu_b}} \tag{6.20} \]

The phase noise properties can now be predicted from straightforward linear systems analysis [61]. Also, although fig. 6.15 indicates \( \Sigma \Delta \) modulation, the linear model has been derived with no assumption on the type of modulation used to dither the divider modulus (i.e. it is valid for any fractional-N topology).
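As an illustration of how eq. 6.20 can be evaluated, the sketch below computes the closed-loop response for a hypothetical loop filter consisting of the integrator $1/sC_2$, one stabilizing zero and one pole. All numerical values and the filter shape are assumptions, with $K_{vco}$ expressed in rad/(s·V) and the S/H delay written as `tau_sh`.

```python
import cmath
import math


def example_filter(s, c2=1e-9, f_zero=10e3, f_pole=150e3):
    """Hypothetical F(s): integrator 1/(s*C2), one zero and one pole."""
    return (1 + s / (2 * math.pi * f_zero)) / (s * c2 * (1 + s / (2 * math.pi * f_pole)))


def closed_loop(f, i_cp, k_vco, tau_sh, n_div, loop_filter):
    """Closed-loop transfer function of eq. 6.20 at s = j*2*pi*f (n_div = N + mu_b)."""
    s = 1j * 2 * math.pi * f
    forward = i_cp / (2 * math.pi) * cmath.exp(-s * tau_sh) * loop_filter(s) * k_vco / s
    return forward / (1 + forward / n_div)


if __name__ == "__main__":
    params = dict(i_cp=100e-6, k_vco=2 * math.pi * 50e6, tau_sh=0.5 / 26e6,
                  n_div=100.3125, loop_filter=example_filter)
    for f in (10.0, 1e4, 1e5, 1e6, 1e7):
        print(f, 20 * math.log10(abs(closed_loop(f, **params))))
```

Well inside the loop bandwidth the magnitude approaches $N + \mu_b$, while far outside it the response rolls off, which is the low-pass behavior seen by the reference and $\Sigma\Delta$ noise.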

**Contribution of \( \Sigma \Delta \) modulation**

The \( \Sigma \Delta \) modulation can be modeled as an additive phase contribution (also shown in fig. 6.15). As an example, a \( \Sigma \Delta \) MASH architecture of order \( n \) is used in the analysis. As previously discussed, the quantizer causes quantization noise \( e(n) \) which is added to the output divider control signal. The noise is spread out over a bandwidth of \( f_{\text{REF}} = 1/T_{\text{REF}} \) and is high-pass shaped by the \( \Sigma \Delta \) modulator with a noise transfer function (NTF) given by:

\[ H_{\text{NTF}}(z) = (1 - z^{-1})^n \bigg|_{z = e^{j 2\pi f T_{\text{REF}}}} \tag{6.21} \]

Assuming that the quantization noise is independent of the input signal, the Power Spectral Density of the bit stream can be expressed as:
From the linear model of fig. 6.15 we can find the transfer function from the output of the NTF to the VCO output phase $\phi_{VCO}$:

$$H_n(s) = \frac{2\pi}{N + \mu_b} \cdot \frac{e^{-sT_{REF}}}{1 - e^{-sT_{REF}}} H_\theta(s) \tag{6.23}$$

Finally the output phase noise Power Spectral Density due to the $\Sigma\Delta$ quantization noise $e(n)$ is simply given by:

$$S_{\varphi VCO}(f) = |H_n(j2\pi f)|^2 S_e(f) \tag{6.24}$$

The effect of quantization at the $\Sigma\Delta$ input (i.e. due to finite input word length) can be evaluated in the same way. The PSD is given by:

$$S_{\Sigma\Delta in}(f) = \frac{T_{ref}}{12} \cdot 2^{-2b_{res}} \cdot |H_{STF}(f)|^2 \tag{6.25}$$

where $b_{res}$ is the number of bits below the decimal point in the $\Sigma\Delta$ input. The calculation of the Power Spectral Density of the PLL phase error due to the $\Sigma\Delta$ input quantization is then straightforward (fig. 6.15):

$$S_{\varphi, \Sigma\Delta in}(f) = |H_n(j2\pi f)|^2 S_{\Sigma\Delta in}(f) \tag{6.26}$$

The output phase noise due to other noise sources, such as charge-pump noise or VCO noise, can be evaluated in a similar way.
6.7 Analytical evaluation of the intrinsic non-linearity

As previously mentioned, in \( \Sigma \Delta \) synthesizers an intrinsic non-linearity affects the close-in phase noise. It will now be demonstrated that in standard \( \Sigma \Delta \) synthesizers the charge-pump output \( i_{out}(t) \) contains an additional noise term, which is caused by the non-uniform pulse stretching shown in fig. 6.14 [62].

We begin by taking the Fourier transform of the charge-pump output:

\[
I_{out}(f) = \int_{-\infty}^{\infty} i_{out}(t)e^{-j2\pi ft}dt \tag{6.27}
\]

With the aid of fig. 6.14 the previous equation can be written as:

\[
I_{out}(f) = \begin{cases}
\sum_{n=-\infty}^{\infty} \int_{nT_{REF}}^{nT_{REF}+\Delta t(n)} I_{CP}\,e^{-j2\pi ft}dt, & \text{if } \Delta t(n) > 0 \\
\sum_{n=-\infty}^{\infty} \int_{nT_{REF}+\Delta t(n)}^{nT_{REF}} -I_{CP}\,e^{-j2\pi ft}dt, & \text{if } \Delta t(n) < 0
\end{cases}
\tag{6.28}
\]

which, since reversing the integration limits changes the sign of the second integral, simplifies to:

\[
I_{out}(f) = \sum_{n=-\infty}^{\infty} \int_{nT_{REF}}^{nT_{REF}+\Delta t(n)} I_{CP}e^{-j2\pi ft}dt \tag{6.29}
\]

By solving the integral, equation 6.29 becomes:

\[
I_{out}(f) = I_{CP} \sum_{n=-\infty}^{\infty} \frac{-1}{j2\pi f} e^{-j2\pi fnT_{REF}} \left( e^{-j2\pi f\Delta t(n)} - 1 \right) \tag{6.30}
\]

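We now perform a 2nd order Taylor series expansion of the \( e^{-j2\pi f\Delta t(n)} \) term:

\[
I_{out}(f) = I_{CP} \sum_{n=-\infty}^{\infty} \frac{-1}{j2\pi f} e^{-j2\pi fnT_{REF}} \cdot
\left[ \left( 1 - j2\pi f\Delta t(n) + \frac{1}{2} (j2\pi f\Delta t(n))^2 \right) - 1 \right] \tag{6.31}
\]

Carrying out the multiplication gives:

\[
I_{out}(f) = I_{CP} \sum_{n=-\infty}^{\infty} \Delta t(n)\, e^{-j2\pi fnT_{REF}} - I_{CP}\, \frac{j2\pi f}{2} \sum_{n=-\infty}^{\infty} \Delta t^2(n)\, e^{-j2\pi fnT_{REF}} \tag{6.32}
\]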
Equation 6.32 contains two terms. The first one is simply a linearly filtered version of the quantization noise, as predicted by the linear model previously derived. The second term quantifies the undesired non-linear effect caused by the non-uniform pulse stretching. As can be seen, it is essentially the Fourier transform of the filtered quantization noise squared, followed by a differentiation.
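The quality of the second order expansion can be checked for a single charge-pump pulse by comparing it with the exact pulse transform of eq. 6.30 (with $n = 0$). The pulse width, offset frequency and current in the sketch below are hypothetical example values.

```python
import cmath
import math


def pulse_exact(f, dt, i_cp):
    """Exact Fourier transform of one pulse of height i_cp and width dt starting at t = 0."""
    w = 2 * math.pi * f
    return i_cp * (1 - cmath.exp(-1j * w * dt)) / (1j * w)


def pulse_linear(f, dt, i_cp):
    """First order term only: the pulse is replaced by an impulse of area i_cp*dt."""
    return i_cp * dt + 0j


def pulse_second_order(f, dt, i_cp):
    """First and second order terms: i_cp * (dt - (j*w/2) * dt^2)."""
    w = 2 * math.pi * f
    return i_cp * (dt - 0.5j * w * dt ** 2)


if __name__ == "__main__":
    f, dt, i_cp = 100e3, 1e-9, 100e-6  # assumed offset frequency, pulse width, current
    exact = pulse_exact(f, dt, i_cp)
    print("linear error      :", abs(pulse_linear(f, dt, i_cp) - exact) / abs(exact))
    print("second order error:", abs(pulse_second_order(f, dt, i_cp) - exact) / abs(exact))
```

The residual of the linear term alone is proportional to $2\pi f \Delta t$, which is the contribution captured by the term containing $\Delta t^2(n)$.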

The NL block in fig. 6.15 symbolizes the non-linear effect and, according to the above analysis, it can be modeled as shown in fig. 6.16. Based on the previous analysis we can write an analytical expression for the power spectral density of the excess noise that occurs in a standard $\Sigma\Delta$ PLL (i.e. without S/H):

\[ S_{\theta_{\text{out, excess}}} (f) = 4\pi^2 (2\pi f)^2 \frac{1}{4} \left( S_{\Delta t}(f) \ast S_{\Delta t}(f) \right) \left| H_\theta(j2\pi f) \right|^2 \tag{6.33} \]

where “\( \ast \)” denotes convolution and \( H_\theta(s) \) is given by equation 6.20. \( S_{\Delta t}(f) \) is given by:

\[ S_{\Delta t}(f) = \left| \frac{T_{\text{REF}}}{N + \mu_b} \cdot \frac{1}{1 - e^{-j2\pi fT_{\text{REF}}}} \right|^2 S_e(f) \tag{6.34} \]

with \( S_e(f) \) given by eq. 6.22.

Fig. 6.17 shows the equivalent phase noise at the phase-frequency detector input (top row) and at the PLL output (bottom row) for both the S/H and the non-S/H topology and for different \( \Sigma \Delta \) modulator orders. The values of the parameters used in the plots can be found in table 8.2. If a non-S/H PLL is used, an excess noise appears and the total noise becomes as shown by the dashed curve. Of course the regular \( \Sigma \Delta \) quantization noise also increases with frequency, so at high frequency offsets the excess noise becomes insignificant in comparison with the \( \Sigma \Delta \) noise. Notice also that the excess phase noise effect is more noticeable for high-order \( \Sigma \Delta \) modulators. This is because the high-frequency quantization noise is stronger, so that more noise is down-folded. On top of this, the low-frequency quantization noise is lower, which makes the excess noise more significant in comparison.

The contribution of the excess noise might not always be significant with respect to other PLL noise sources, such as the charge-pump noise, which usually dominates at low frequency. However, it is still valuable to quantify and model the effect of the non-linearity in order to ensure correct performance of the PLL in all cases.

### 6.8 Summary

This chapter has presented the theory of \( \Sigma \Delta \) fractional-N synthesizers. A brief overview of the impact of the \( \Sigma \Delta \) modulator on the synthesizer design and performance has been given. A linear model for the synthesizers has been derived based on the assumption of constant VCO period. Moreover, a non-linear effect previously unknown has been discovered and a new topology to correct it was proposed. As we shall see in the next chapter, results from simulations fully validate the derived model.
Accurate simulations of $\Sigma\Delta$ fractional-$N$ synthesizers are difficult for many reasons [63]; simulation time tends to be long since a large number of samples are necessary in order to retrieve the statistical behavior of the system. The dithering applied on the divider modulus makes the behavior of the synthesizers non-periodic in steady-state; therefore known methods for periodic steady-state simulations [64] cannot be applied to $\Sigma\Delta$ fractional-$N$ synthesizers.

Traditional time sampling simulations based on fixed or adaptive time-steps quantize the location of the edges of the digital signals, introducing quantization noise that can mask the real phase noise performance of the synthesizer. Adaptive time-steps also introduce non-uniform sampling, which is a highly non-linear phenomenon and leads to down-folding of high frequency noise. A constant time-step cannot reveal the non-uniform sampling performed by the PFD, unless an extremely small step is chosen.

Different techniques to solve the quantization issue have been proposed [63, 65]. In [63] an area conservation principle approach allows the use of uniform time-steps in the simulation. In [65] a simple event-driven approach is used in combination with iterative methods to calculate the loop filter response for integer-$N$ PLLs. Event-driven simulators offer an alternative approach for simulating fractional-$N$ synthesizers in a fast and accurate manner, and have so far been unexplored for this application area.

7.1 Event-driven object oriented methodology

The use of Event-Driven simulators is very attractive: besides providing precise time-steps, as explained later in the section, event-driven simulations are also very fast and highly efficient. In fact, the number of calculations is kept to a minimum because synthesizer signals and variables are calculated only when a transition occurs.

The simulation method proposed in [63] ensures extremely high computation speed because, instead of simulating the true time domain behavior, it effectively operates in a sub-sampled manner on the merged VCO-divider block. This makes the method in [63] very attractive too. However, the same idea could equally well be used in the event-driven approach, speeding up the simulation tremendously. In this case the VCO would sample the loop-filter once for every reference cycle. This sub-sampling operation implicitly relies on the assumption that the noise at high frequency offset does not contribute significantly to the overall synthesizer noise when aliased to low frequencies. Thus, if the assumption holds, the event-driven approach would be as fast as the method in [63]. However, even without the VCO-divider merging approach, the event-driven method is already so fast that it is hardly worthwhile to use this merging technique.

Figure 7.1: Simulation model.

A unique strength of the event-driven methodology proposed in this work is that it is exact: it does not require assumptions or approximations.

7.2 Simulation Core

The PLL simulation set-up is structured in an object-oriented way: PLL blocks are connected through signals that are responsible for timing and for data exchange, as shown in fig. 7.1. Note that IN/OUT signals can operate also as implicit update signals (e.g. the UP/DOWN signals from the charge-pump). Whenever a block is called from the simulator, a specific operation is performed and an event may be posted. As shown in fig. 7.2, the simulator inserts the event in the event queue in the proper time order and extracts from the queue the next event that needs to be executed, resulting in the update of the signals/variables of a block.

This means that each PLL block can be coded as an independent unit, without worrying about the interaction and the sequencing with the other blocks. Since each block is self-contained, the behavior of a single block can be changed and refined without affecting the coding of the other PLL units. The simulator itself keeps track of the succession of the events with the event queue. A simple pseudo code implementation of this simulation structure is presented in algorithm 1; it can be easily coded with a few lines of a high-level programming language.

Algorithm 1 Simulation structure pseudo code.

SIMULATOR CORE
while ( event_queue not empty and current_sim_time < end_time )
    get next element (event) from queue
    set current_sim_time = event.time_stamp
    if event.name = "event1" then
        call block_a
        call block_b
    end if
    if event.name = "event2" then
        call block_c
        call block_d
    end if
end while

BLOCKS
function block_a
begin
    <block-related calculations, etc>
    post_event ( current_sim_time + propagation_delay, "event1" )
    ....
end
function block_c
begin
    <block-related calculations, etc>
    post_event ( current_sim_time + propagation_delay, "event2" )
    ....
end

EVENT POSTING
function post_event ( time_to_execute_event, event_name )
    insert event in time-order in queue
end

Figure 7.2: Simulation structure.

Fig. 7.3 presents a simple example of the initial simulation steps with the operations executed and the evolution of the event-queue for each step. The simulation begins with the system initialization at step 0; there the simulator core inserts the special event "start" in the event-queue. At step 1, "start" is extracted and it is executed at step 2. As a result, both the VCO block and the reference clock block get activated. A detailed explanation of the VCO block coding and operations will be given later. The execution of the VCO block results in the change of the VCO waveform (by inverting the current logic value) and in the posting of two events: Loop_update and VCO. These events are inserted in the event-queue in the proper order, determined by their associated time stamp. For the sake of simplicity, we assume that the reference block only posts the updating of the reference clock.

At step 3, the first event in the queue is the update of the loop filter; the event gets extracted from the queue and in the next step a new control voltage for the VCO gets calculated (step 4). The next event (step 5) is the execution of the VCO, which performs the same operations previously described. Note the evolution of the event-queue: since the VCO period is much smaller than the reference clock period, the two events posted by the VCO block are placed on the top of the event-queue. Thus, the time progression is determined dynamically as the simulator progresses.

The advantage of maintaining a simulation event-queue is that the simulation time points occur exactly at the moment of the execution of the event. Thus, the simulation time points are always aligned with the edges of the signals, providing 100% accurate time-steps.
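A runnable counterpart of the simulation structure can be sketched in a few lines using a priority queue. The class and function names below are hypothetical, and the two blocks are plain free-running clocks with integer half-periods in arbitrary time units; they only illustrate the event mechanism, not the PLL blocks themselves.

```python
import heapq
import itertools


class Simulator:
    """Minimal event-driven core: a time-ordered queue and a table of blocks per event."""

    def __init__(self):
        self.now = 0
        self._queue = []
        self._order = itertools.count()  # tie-breaker: equal time stamps keep posting order
        self._handlers = {}
        self.log = []

    def connect(self, event_name, block):
        self._handlers.setdefault(event_name, []).append(block)

    def post_event(self, time, event_name):
        heapq.heappush(self._queue, (time, next(self._order), event_name))

    def run(self, end_time):
        while self._queue and self._queue[0][0] <= end_time:
            self.now, _, name = heapq.heappop(self._queue)  # time jumps to the event
            self.log.append((self.now, name))
            for block in self._handlers.get(name, []):
                block(self)


def make_clock(event_name, half_period):
    """Self-updating block: toggles its output and schedules its own next call."""
    state = {"level": 0, "edges": 0}

    def block(sim):
        state["level"] ^= 1
        state["edges"] += 1
        sim.post_event(sim.now + half_period, event_name)

    return block, state


def demo():
    sim = Simulator()
    vco, vco_state = make_clock("VCO", 1)
    ref, ref_state = make_clock("REF_clock", 8)
    for name, block in (("start", vco), ("start", ref), ("VCO", vco), ("REF_clock", ref)):
        sim.connect(name, block)
    sim.post_event(0, "start")
    sim.run(end_time=16)
    return sim, vco_state, ref_state


if __name__ == "__main__":
    sim, vco_state, ref_state = demo()
    print(vco_state["edges"], ref_state["edges"], sim.log[:5])
```

Because the simulation time is always set to the time stamp of the extracted event, no time grid is involved and the blocks are evaluated only at signal edges.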

Coding the behavior of the synthesizer digital blocks is straightforward; the description of the voltage controlled oscillator and of the loop filter requires particular attention, as discussed next.

### 7.2.1 VCO model

The VCO is modeled as a self-updating block, as can be seen in fig. 7.1. The pseudo-code describing the VCO behavior is presented in algorithm 2. The update takes place at discrete time instants, namely every half VCO cycle. Every half-period the VCO receives the updated control voltage from the loop filter; on the basis of the received value, the new VCO period is calculated.

The VCO completes its execution by posting two events; the first event is the execution of the loop filter block at the next time point when the VCO update will take place. This ensures that the
### 7.2. Simulation Core

#### Simulation Steps

<table>
<thead>
<tr>
<th>Simulation Steps</th>
<th>Event_queue</th>
<th>Operations executed</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Step 0</strong></td>
<td>EMPTY</td>
<td><strong>Initialization</strong></td>
</tr>
<tr>
<td></td>
<td></td>
<td>Insert &quot;start&quot; event in the queue</td>
</tr>
<tr>
<td></td>
<td></td>
<td><strong>Loop update block</strong></td>
</tr>
<tr>
<td></td>
<td></td>
<td><strong>VCO block</strong></td>
</tr>
<tr>
<td></td>
<td></td>
<td>Return to simulator core</td>
</tr>
<tr>
<td><strong>Step 1</strong></td>
<td>t1=0 start</td>
<td>Extract first event in the queue</td>
</tr>
<tr>
<td></td>
<td></td>
<td><strong>Initialization block</strong></td>
</tr>
<tr>
<td></td>
<td></td>
<td>Update VCO waveform</td>
</tr>
<tr>
<td></td>
<td></td>
<td>Post Loop_update @ t2=t1+half_VCO_period</td>
</tr>
<tr>
<td></td>
<td></td>
<td>Post VCO @ t2=t1+half_VCO_period</td>
</tr>
<tr>
<td></td>
<td></td>
<td>Post REF_clock @ t3=t1+half_REF_period</td>
</tr>
<tr>
<td></td>
<td></td>
<td>Return to simulator core</td>
</tr>
<tr>
<td></td>
<td></td>
<td><strong>Loop update block</strong></td>
</tr>
<tr>
<td></td>
<td></td>
<td>Calculate control_voltage</td>
</tr>
<tr>
<td></td>
<td></td>
<td>Return to simulator core</td>
</tr>
<tr>
<td><strong>Step 2</strong></td>
<td>EMPTY</td>
<td>Extract first event in the queue</td>
</tr>
<tr>
<td></td>
<td></td>
<td><strong>Loop update block</strong></td>
</tr>
<tr>
<td></td>
<td></td>
<td>Calculate control_voltage</td>
</tr>
<tr>
<td></td>
<td></td>
<td>Return to simulator core</td>
</tr>
<tr>
<td><strong>Step 3</strong></td>
<td>t2=t1+half_VCO_period loop update</td>
<td>Extract first event in the queue</td>
</tr>
<tr>
<td></td>
<td>t2=t1+half_VCO_period VCO</td>
<td><strong>VCO block</strong></td>
</tr>
<tr>
<td></td>
<td>t3=t1+half_REF_period REF_clock</td>
<td>Update VCO waveform</td>
</tr>
<tr>
<td></td>
<td></td>
<td>Post Loop_update @ t4=t2+half_VCO_period</td>
</tr>
<tr>
<td></td>
<td></td>
<td>Post VCO @ t4=t2+half_VCO_period</td>
</tr>
<tr>
<td></td>
<td></td>
<td>Return to simulator core</td>
</tr>
<tr>
<td><strong>Step 4</strong></td>
<td>t2=t1+half_VCO_period VCO</td>
<td><strong>VCO block</strong></td>
</tr>
<tr>
<td></td>
<td>t3=t1+half_REF_period REF_clock</td>
<td>Update VCO waveform</td>
</tr>
<tr>
<td></td>
<td></td>
<td>Post Loop_update @ t4=t2+half_VCO_period</td>
</tr>
<tr>
<td></td>
<td></td>
<td>Post VCO @ t4=t2+half_VCO_period</td>
</tr>
<tr>
<td></td>
<td></td>
<td>Return to simulator core</td>
</tr>
<tr>
<td><strong>Step 5</strong></td>
<td>t2=t1+half_VCO_period VCO</td>
<td><strong>VCO block</strong></td>
</tr>
<tr>
<td></td>
<td>t3=t1+half_REF_period REF_clock</td>
<td>Update VCO waveform</td>
</tr>
<tr>
<td></td>
<td></td>
<td>Post Loop_update @ t4=t2+half_VCO_period</td>
</tr>
<tr>
<td></td>
<td></td>
<td>Post VCO @ t4=t2+half_VCO_period</td>
</tr>
<tr>
<td></td>
<td></td>
<td>Return to simulator core</td>
</tr>
<tr>
<td><strong>Step 6</strong></td>
<td>t3=t1+half_REF_period REF_clock</td>
<td><strong>VCO block</strong></td>
</tr>
<tr>
<td></td>
<td></td>
<td>Update VCO waveform</td>
</tr>
<tr>
<td></td>
<td></td>
<td>Post Loop_update @ t4=t2+half_VCO_period</td>
</tr>
<tr>
<td></td>
<td></td>
<td>Post VCO @ t4=t2+half_VCO_period</td>
</tr>
<tr>
<td></td>
<td></td>
<td>Return to simulator core</td>
</tr>
<tr>
<td><strong>Step 7</strong></td>
<td>t4=t2+half_VCO_period loop update</td>
<td>Extract first event in the queue</td>
</tr>
<tr>
<td></td>
<td>t4=t2+half_VCO_period VCO</td>
<td><strong>VCO block</strong></td>
</tr>
<tr>
<td></td>
<td>t3=t1+half_REF_period REF_clock</td>
<td>Update VCO waveform</td>
</tr>
<tr>
<td></td>
<td></td>
<td>Post Loop_update @ t4=t2+half_VCO_period</td>
</tr>
<tr>
<td></td>
<td></td>
<td>Post VCO @ t4=t2+half_VCO_period</td>
</tr>
<tr>
<td></td>
<td></td>
<td>Return to simulator core</td>
</tr>
</tbody>
</table>

Figure 7.3: Simulation steps.
**Algorithm 2** VCO pseudo code.

```plaintext
MODULE VCO
input control_voltage
output VCO_clk
  F_OUT = F_FreeRUN + K_VCO * control_voltage // Update the instantaneous frequency
  VCO_semiperiod = 0.5 / F_OUT // Calculate the new semiperiod
  VCO_clk = NOT (VCO_clk) // Update the VCO_clk signal
  POST_EVENT (current_sim_time + VCO_semiperiod, update_loop)
  POST_EVENT (current_sim_time + VCO_semiperiod, execute VCO)
END MODULE
```

The first event requests an update of the loop filter, so that the control voltage used to calculate the semi-period of the VCO is always up to date. The second event is simply the scheduling of the next VCO block call.
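The behavior of the VCO block can be sketched with a small time-ordered event queue. The following Python fragment is only an illustration of the mechanism, not the implementation used in this work: the names `post_event` and `run`, the numeric values of `F_FREE_RUN` and `K_VCO`, and the constant control voltage (no loop filter is attached) are assumptions made for the example.

```python
import heapq
import itertools

F_FREE_RUN = 1.9e9   # hypothetical free-running frequency [Hz]
K_VCO = 100e6        # hypothetical VCO gain [Hz/V]

_counter = itertools.count()  # tie-breaker: equal-time events keep posting order


def post_event(queue, time, name):
    """Insert an event into the time-ordered queue."""
    heapq.heappush(queue, (time, next(_counter), name))


def run(v_ctr, n_half_cycles):
    """Execute the VCO block n_half_cycles times; return (time, clock) pairs."""
    queue = []
    clock = 0
    toggles = []
    post_event(queue, 0.0, "vco")
    while queue and len(toggles) < n_half_cycles:
        now, _, name = heapq.heappop(queue)
        if name == "update_loop":
            continue  # a loop filter block would recompute v_ctr here
        f_out = F_FREE_RUN + K_VCO * v_ctr   # instantaneous frequency
        semiperiod = 0.5 / f_out             # new semi-period
        clock ^= 1                           # toggle the VCO clock
        toggles.append((now, clock))
        post_event(queue, now + semiperiod, "update_loop")
        post_event(queue, now + semiperiod, "vco")
    return toggles
```

Because the loop update is posted before the VCO event at the same time stamp, it is extracted first, which mirrors the ordering of the two `POST_EVENT` calls in the pseudo code.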

Due to the finite number representation of the simulator, number truncation represents a potential problem in the calculation of the VCO period. In order to avoid the accumulation of the truncation error, the calculation of the VCO semi-period can be implemented as a 1<sup>st</sup> order \( \Sigma \Delta \) modulator. In this way the accumulated error is always driven to zero on average.
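The difference between plain truncation and first-order error feedback can be illustrated with a few lines of Python. This is a sketch under stated assumptions: time is counted in integer ticks of `RESOLUTION`, truncation is modeled with `math.floor`, and the function names are hypothetical; it is not the Verilog code used in this work.

```python
import math

RESOLUTION = 1e-15  # assumed simulator time step (1 fs)


def quantize_truncate(ideal_ticks, n):
    """Truncate every semi-period independently; the error accumulates."""
    return [math.floor(ideal_ticks) for _ in range(n)]


def quantize_error_feedback(ideal_ticks, n):
    """First-order error feedback: carry the truncation residue forward."""
    steps = []
    residue = 0.0
    for _ in range(n):
        wanted = ideal_ticks + residue   # ideal value plus previous residue
        step = math.floor(wanted)        # value actually used by the simulator
        residue = wanted - step          # truncation error carried to next step
        steps.append(step)
    return steps


def accumulated_error(steps, ideal_ticks):
    """Total timing error in ticks after all steps."""
    return sum(steps) - ideal_ticks * len(steps)
```

With plain truncation the timing error grows linearly with the number of half cycles, whereas with the error feedback it stays below one tick regardless of the simulation length.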

### 7.2.2 Loop filter model

A simple method based on a state-space description is proposed. The way the loop filter is modeled can be visualized with the help of fig. 7.1. Every time the VCO and the charge-pump are executed, they post events requiring the update of the loop filter state. When these events are extracted from the event-queue to be executed, the simulator calls the loop filter to update its state and to calculate a new control voltage according to the current input value. The event posted from the charge-pump indicates that a change has occurred at the loop filter input; the VCO event is posted to obtain the current control voltage.

To describe the loop filter behavior in mathematical terms we start from its transfer function and we derive its State-Space Formulation. We assume the loop filter transfer function to be given by the following equation:

\[
F(s) = \frac{1 + \frac{s}{s_0}}{sC \cdot \left(1 + \frac{s}{p_0}\right) \cdot \left(1 + \frac{s}{p_1}\right) \cdot \left(1 + \frac{s}{p_2}\right)} \tag{7.1}
\]

Note that equation 7.1 also includes the charge-pump integrating capacitance \( C \). With a partial fraction expansion, equation 7.1 can be decomposed into four parallel blocks, namely an integrator and three 1<sup>st</sup> order RC blocks. Equation 7.1 becomes:

\[
F(s) = \frac{1}{sC} + \frac{A_0}{\left(1 + \frac{s}{p_0}\right)} + \frac{A_1}{\left(1 + \frac{s}{p_1}\right)} + \frac{A_2}{\left(1 + \frac{s}{p_2}\right)} \tag{7.2}
\]

where \( A_0, A_1, A_2 \) are gain factors. For each term of the previous sum the state equation can be written in the form:
\[
\dot{V}(t) = -\frac{1}{\tau} V(t) + K \cdot I_{\text{in}}(t) \tag{7.3}
\]

where \( V(t) \) is the state variable (that in this situation is equal to the output variable), \( I_{\text{in}} \) is the input variable, \( K \) is a gain factor and \( \tau \) is the time constant of the block. The solution to the state-equation is given by:
\[
V(t) = e^{-\frac{t-t_0}{\tau}} \cdot V(t_0) + K \cdot \int_{t_0}^{t} e^{-\frac{t-\alpha}{\tau}} \cdot I_{\text{in}}(\alpha) \, d\alpha \tag{7.4}
\]

To solve this equation it is necessary to know the initial state at time \( t_0 \) and the input value \( I_{\text{in}} \).

**Algorithm 3** Loop filter pseudo code.

MODULE LOOP FILTER
input Update_signal, Charge_Pump_current;
output VCO_ctr_voltage;
every time an Update_signal is received
begin
  \( T_{\text{ACTUAL}} = \text{current simulation time} \)
  \( I_{\text{CP}} = \text{Charge\_Pump\_current} \)
  \( T_{\text{DIFF}} = T_{\text{ACTUAL}} - T_{\text{OLD}} \)  // time elapsed from previous update
  \( V_{\text{CAP}} = V_{\text{CAP}} + \frac{I_{\text{OLD}}}{C} \cdot T_{\text{DIFF}} \)  // Integrator block update
  \( V_1 = V_1 + (A_1 I_{\text{OLD}} - V_1) \left( 1 - \exp\left(-\frac{T_{\text{DIFF}}}{\tau_1}\right) \right) \)  // 1\(^{st}\) RC block update
  \( V_2 = V_2 + (A_2 I_{\text{OLD}} - V_2) \left( 1 - \exp\left(-\frac{T_{\text{DIFF}}}{\tau_2}\right) \right) \)  // 2\(^{nd}\) RC block update
  \( V_3 = V_3 + (A_3 I_{\text{OLD}} - V_3) \left( 1 - \exp\left(-\frac{T_{\text{DIFF}}}{\tau_3}\right) \right) \)  // 3\(^{rd}\) RC block update
  \( V_{\text{CTR}} = V_1 + V_2 + V_3 + V_{\text{CAP}} \)  // VCO control voltage update
  \( T_{\text{OLD}} = T_{\text{ACTUAL}} \)
  \( I_{\text{OLD}} = I_{\text{CP}} \)
end
end module

Noting that between the update times the input to the loop filter is constant (i.e. \( I_{\text{in}} \) appears as a staircase to the loop filter), the solution of the state equation for each of the three RC blocks reduces to:
\[
V_x(t_1) = V_x(t_0) + (A_x I_{\text{in}}(t_0) - V_x(t_0)) \left( 1 - e^{-\frac{t_1-t_0}{\tau_x}} \right) \tag{7.5}
\]

The equation that describes the integrating block is given by:
\[
V_C(t_1) = V_C(t_0) + \frac{I_{\text{in}}(t_0)}{C} (t_1 - t_0) \tag{7.6}
\]

The VCO control voltage is then given by:
\[
V_{\text{CTR}}(t) = V_1(t) + V_2(t) + V_3(t) + V_C(t) \tag{7.7}
\]
The model for the loop filter is then simply given by a set of equations which describe exactly the behavior of the loop filter. This representation of the loop filter can be directly converted into simulation code. The pseudo-code is presented in algorithm 3. It is important to underline that the filter behavior is modeled with no approximation. Also, the loop filter update takes place only when required by other blocks: the update time intervals are not uniform. This makes the simulation methodology very efficient, since the calculations occur only at the required time steps.
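Equations 7.5 to 7.7 translate almost line by line into code. The following Python class is a minimal sketch of such a loop filter block; the class name, its interface and any numeric values passed to it are assumptions made for illustration. The exponent carries a negative sign, as in equation 7.5.

```python
import math


class LoopFilter:
    """Integrator plus parallel first-order RC blocks, updated at arbitrary times."""

    def __init__(self, cap, gains, taus):
        self.cap = cap                 # integrating capacitance C [F]
        self.gains = list(gains)       # gain factors A_x [V/A]
        self.taus = list(taus)         # time constants tau_x [s]
        self.v_cap = 0.0               # integrator state
        self.v_rc = [0.0] * len(self.gains)
        self.t_old = 0.0               # time of the previous update
        self.i_old = 0.0               # input current held since then

    def update(self, t_now, i_cp):
        """Advance the state to t_now, then latch the new input current."""
        dt = t_now - self.t_old
        self.v_cap += self.i_old / self.cap * dt                       # eq. 7.6
        for k, (gain, tau) in enumerate(zip(self.gains, self.taus)):
            target = gain * self.i_old
            self.v_rc[k] += (target - self.v_rc[k]) * (1.0 - math.exp(-dt / tau))  # eq. 7.5
        self.t_old = t_now
        self.i_old = i_cp
        return self.v_cap + sum(self.v_rc)                             # eq. 7.7
```

Since the update is the exact solution for a piecewise-constant input, inserting an extra update between two events (with the same input current) leaves the result unchanged; this is the property that allows non-uniform update intervals.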

### 7.3 Verilog Implementation

The PLL topology presented in the previous chapter is simulated with a standard event-driven simulator, Verilog XL, but the simulation methodology can be applied to any event-driven simulator. For example, the simulation structure can be easily implemented from the pseudo code presented in algorithm 1 with a few lines of C code.

The choice of Verilog is a matter of convenience: its integration in the Cadence Environment allows easier debugging, schematic capture and plotting capabilities. Moreover, the Cadence Environment offers the possibility to directly use the Verilog code together with Spice-like simulators to run mixed-mode simulations. However, simulations in a mixed-mode environment require long simulation times. As a comparison, an event-driven simulation of 2 million VCO cycles (equivalent to 1 ms), recording 4 million data points in a file, executes in less than 15 minutes on a RISC 8500 processor (this reduces to only 5 minutes if the VCO simulation time points are not written to a file). The same simulation in a mixed-mode environment takes more than 20 hours, without reaching the same accuracy. A fully analogue simulator such as SPICE would probably require a simulation time at least one order of magnitude longer.

Verilog is essentially a digital simulator; basic mathematical functions are not supported directly. Through the use of PLI (Programming Language Interface [66]) routines, Verilog can be customized to support virtually any kind of function. This allows the use of the exponential functions required to describe the loop filter. Variables can be passed between blocks through the Verilog system functions `$bitstoreal` and `$realtobits`, which convert floating-point variables into 64-bit buses and vice versa. If Verilog blocks are used in a mixed-mode simulation, the conversion requires the insertion of specific D/A converters to interface a block coded in Verilog (e.g. the Loop-Filter) with a block (e.g. the VCO) described with an analog simulator (VerilogA, Spice). The use of a simulator based on matrix solving methods and adaptive time-steps will slow down the simulation; the time points will still be set by the digital blocks, but extra time points (depending on the accuracy) will be required for the calculations performed by the analog blocks.
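The idea of passing a real value between blocks over a 64-bit bus can be illustrated outside Verilog. The Python sketch below packs a double-precision number into its 64-bit pattern and back; the function names are hypothetical and only mimic the role of the two system functions, they are not a model of the Verilog implementation.

```python
import struct


def real_to_bits(value):
    """Pack a double into a 64-bit unsigned integer (IEEE 754 bit pattern)."""
    return int.from_bytes(struct.pack(">d", value), "big")


def bits_to_real(bits):
    """Recover the double from its 64-bit pattern."""
    return struct.unpack(">d", bits.to_bytes(8, "big"))[0]
```

The round trip is lossless, so no precision is lost when a control voltage or a current value crosses a block boundary in this form.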

It is more convenient to proceed with behavioral simulation, since it is very fast and it can be done at a very detailed level. For example, the waveform produced by the charge-pump can be simulated with an analog simulator for one reference clock cycle. The rise/fall time can be extracted from the waveform and included in the Verilog code.

As previously mentioned, the effect of number truncation represents a potential problem in the calculation of the VCO period. In order to avoid the accumulation of the truncation error, the calculation of the VCO semi-period is implemented as a 1<sup>st</sup> order \( \Sigma \Delta \) modulator. In this way the accumulated error is always driven to zero on average. Part of the VCO Verilog code is presented in algorithm 4. Variable declarations and initializations have been omitted for the sake of simplicity.

**Algorithm 4** VCO implementation.

`timescale 1s / 1fs
MODULE VCO (V_out, V_in )
...
wire [64:1] V_in;
integer VCO_semiperiod_int;
...
initial assign V_ctr=$bitstoreal (V_in);
ALWAYS # (VCO_semiperiod_act)
BEGIN
  VCO_clk = ~VCO_clk;
  inst_freq = free_run_freq + V_ctr * VCO_gain;
  VCO_semiperiod=0.5/inst_freq;
  // first order Sigma-Delta
  diff = VCO_semiperiod - VCO_semiperiod_act;
  acc_err=acc_err+diff;
  VCO_semiperiod_int=acc_err*1e15; // femto-second resolution
  VCO_semiperiod_act=VCO_semiperiod_int * 1e-15;
  // Sigma-Delta end
END
assign {V_out}=VCO_clk;
END MODULE

The `always` statement is an implicit signaling to the block itself, causing the update of the semi-period every half VCO cycle. The calculation of the semi-period undergoes a quantization with femtosecond resolution (multiplication by the term \(10^{15}\)). This is the minimum step that can be set and it is a Verilog limitation. It is important to notice that this limitation is independent of the event-driven methodology. If the simulator is implemented in C, arbitrarily fine resolution (up to double floating point precision) can be reached in the time representation.

The effects of noise sources can be easily evaluated in the simulation, in a similar manner as described in [60]; the noise is directly incorporated inside the block code. The charge-pump white noise is obtained from a random number generator. Since the object-oriented structure of the simulation can be easily expanded, the VCO noise can be generated by adding a filter block (coded in the same way as the loop filter) with another random number generator. Another option is to read the noise data from a file; in this way it is possible to use data from other simulations or from real measurements.

It is also easy to insert non-idealities into the blocks, such as non-uniform divider propagation delay, variable VCO duty-cycle or charge-pump mismatch. An example of the latter is presented in algorithm 5.

**Algorithm 5** Charge-pump with mismatch.

MODULE CP (I_CP, UP, DOWN)

wire [64:1] I_CP;

initial I_UP = ....;
initial I_DOWN = .....;

ALWAYS @ (posedge (UP) or negedge (UP) or posedge (DOWN) or negedge (DOWN))
BEGIN
    I_INST = (UP * I_UP - DOWN * I_DOWN); // instantaneous current
END
assign {I_CP} = $realtobits (I_INST);

END MODULE

The way the block operates is straightforward. Every time a transition occurs at the block input (UP/DOWN) signals, the instantaneous current \( I_{\text{INST}} \) is calculated. The current mismatch can be included by setting different values for the \( I_{\text{UP}} \) and \( I_{\text{DOWN}} \) currents. Notice that the UP/DOWN signals are logic values (i.e. can be either 1 or 0).

\begin{table}[h]
\centering
\begin{tabular}{|c|c|c|c|c|c|}
\hline
$F_{out}$ & $N$ & $\mu_b$ & $I_{CP}$ & $K_{VCO}$ & $\tau_{SH}$ \\
\hline
1907.75 MHz & 73 & 0.375 & 10 $\mu$A & $2\pi \cdot 100$ MHz/V & $0.5\,T_{REF}$ \\
\hline
\end{tabular}
\caption{Table 7.1: Design parameters.}
\end{table}
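The charge-pump block of algorithm 5 reduces to a single expression. The Python sketch below mirrors it; the 10 µA nominal current is taken from the design parameters, while the 2 % mismatch and the names used are assumptions chosen only for illustration.

```python
def charge_pump_current(up, down, i_up, i_down):
    """Instantaneous charge-pump current for logic inputs up/down (0 or 1)."""
    return up * i_up - down * i_down


I_UP = 10.0e-6                 # nominal charge-pump current [A]
I_DOWN = I_UP * (1.0 + 0.02)   # hypothetical 2 % mismatch on the DOWN source
```

With matched currents the output is zero while both UP and DOWN are active; with the mismatch above a residual current of about -0.2 µA flows during that overlap, which is the mechanism that disturbs the loop.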

### 7.4 Results

The main parameters of the simulated PLL are summarized in table 7.1. The \(\Sigma\Delta\) modulator is a 4<sup>th</sup> order MASH and the parameters of the loop filter are presented in table 7.2.

We now present several simulation results obtained by the event-driven methodology in order to:

- validate the linear theory developed.
- evaluate the effect of the Sample/Hold block.
- show the effect of the truncation error in the simulation.
- evaluate the effects of non-idealities, such as VCO noise and charge-pump current mismatch.

\begin{table}[h]
\centering
\begin{tabular}{|c|c|c|c|c|}
\hline
$C_{CP}$ & $\omega_1$ & $\omega_3$ & $\omega_4$ & $\omega_5$ \\
\hline
34.522 pF & $2\pi \cdot 50$ kHz & $2\pi \cdot 500$ kHz & $2\pi \cdot 1$ MHz & $2\pi \cdot 5$ MHz \\
\hline
\end{tabular}
\caption{Table 7.2: Loop parameters.}
\end{table}

We start by showing the effects of the non-uniform sampling at the PFD and of the truncation error in the calculation of the VCO semi-period. The effect of other noise sources will be discussed later. Fig. 7.4 shows the power spectral density of the output phase noise $\varphi_{VCO}$ due to the $\Sigma\Delta$ quantization for two different synthesizer topologies: the PSD of the S/H PLL is compared with the PSD of the standard PLL. The S/H PLL has a lower overall phase noise and does not present spurs. By contrast the standard PLL (i.e. without S/H) has greatly increased close-in phase-noise as well as reference spurs.

The benefit of using a $1^{\text{st}}$ order $\Sigma\Delta$ modulator in the VCO module algorithm (to avoid the effects of the accumulation of the truncation error) is illustrated in fig. 7.5. Without the $\Sigma\Delta$ modulator, the effects of the truncation error mask the real noise shape of the synthesizer. The drawback of using the $1^{\text{st}}$ order $\Sigma\Delta$ modulator is the introduction of a noise floor around -205 dBc/Hz.

After getting rid of the two sources of non-linearity just described, we can expect a highly linear behavior from the simulation.

In fig. 7.6 the PSDs from simulations are compared with the predicted theoretical curves. The lower curves represent the ideal condition: a floating point number representation in the $\Sigma\Delta$, i.e. no input quantization. The upper curves show the result of a 16-bit digital number representation in the $\Sigma\Delta$, i.e. 16 bits of quantization below the decimal point. Clearly, the curves obtained from the simulation match very well with the PSD described by the equations derived in the previous chapter.

The low frequency noise floor (“dithering noise floor” in fig. 7.6) is due to a very small amount of dithering applied on the $\Sigma\Delta$ modulator input. In absence of modulated data, dithering is necessary to avoid the presence of fractional spurs. Note that no reference spurs appear in the output spectrum.

Figure 7.5: Phase noise PSD: effects of the truncation error.

Figure 7.6: S/H Synthesizer phase noise PSD.

Figure 7.7: Phase noise PSD with VCO noise added.

Figure 7.8: Phase noise PSD for different charge-pump current mismatches.

The accuracy can be evaluated through the RMS output phase errors. The calculation over a bandwidth of 300 kHz gives identical values for the linear model and the simulation: 0.0027° for the floating-point case and 0.3574° for 16-bit input quantization.

In the previous figures, only the effect of the \( \Sigma \Delta \) quantization noise on the output phase noise has been considered. The effect of other noise sources can be easily evaluated; as an example, fig. 7.7 shows the PSD of the output phase noise due to the contribution of \( \Sigma \Delta \) quantization noise and VCO phase noise (in this example the VCO phase noise is about \(-140 \text{ dBc/Hz } @ 1 \text{ MHz offset}\)). Together with the simulation result, fig. 7.7 presents the predicted contribution of the single noise sources; the typical VCO phase noise (-20 dB/decade characteristic) causes an increased close-in phase noise.

As previously mentioned, it is easy to incorporate non-idealities in the blocks. It was previously shown how to implement current mismatches in the charge-pump block (algorithm 5). Fig. 7.8 presents the PSD of the phase noise for different current mismatches. Even with a small difference in the UP/DOWN currents, the close-in phase noise is greatly increased.

### 7.5 Summary

A new approach entirely based on event-driven simulation has been developed for \( \Sigma \Delta \) fractional-N synthesizers. The characteristics of the simulation can be summarized as:

- fast simulation time.
- high accuracy (depending only on the simulator accuracy).
- natural capability of simulating non-ideal effects.
- easy redefinition of the block behavior.
- object-oriented nature.

It is important to stress once more that the simulation is not based on a linearized model; rather it simulates the true behavior of the synthesizer, including the non-linear behavior, occurring, for instance, during switching times.

## Chapter 8: $\Sigma\Delta$ synthesizers for direct GSM modulation

The event-driven simulation methodology and the linear model previously derived are applied to the study case of $\Sigma\Delta$ synthesizers for GSM modulation. The indirect modulation capability makes the $\Sigma\Delta$ synthesizer architecture a compact, low-power, frequency-agile transceiver solution for the new communication standards.

As discussed in the previous chapters, the fast channel switching is due to a higher comparison frequency with respect to integer-N architectures. The indirect modulation capability is the topic of the next section; the results of several simulations will be summarized throughout the chapter to demonstrate the capability of GSM operation.

### 8.1 Transmitter architectures

The transmitter task is to perform modulation, up-conversion and power amplification. In some cases, the first two operations are merged in a single step. The baseband modulation data is first filtered to limit its spectrum and then is up-converted to the transmitter frequency. The last operation is a translation of the baseband spectrum to the transmit frequency; depending on how the up-conversion is done, transmitters can be divided into several categories:

- mixer based [67, 68]
- open loop VCO modulation [69, 70]
- indirect VCO modulation [33, 71]

The mixer based approach is shown in fig. 8.1. Two mixers and two baseband DACs are required to form the in-phase/quadrature (I/Q) channels. A heterodyne transmitter requires one or more frequency synthesizers; the low frequency modulation is up-converted in steps, requiring several filtering blocks, which are difficult to integrate on-chip.

A homodyne approach requires just one frequency synthesizer to generate the LO (Local Oscillator) carrier frequency, but suffers from Power-Amplifier (PA) leakage; to prevent the PA and the synthesizer from operating at the same frequency, a transmitter with an offset frequency synthesizer can be used [72].
CHAPTER 8. $\Sigma \Delta$ SYNTHESIZERS FOR DIRECT GSM MODULATION

In the open-loop modulation technique, a frequency synthesizer generates the carrier frequency. Once the lock is reached, the loop is disconnected and the VCO is directly modulated, as shown in fig. 8.2. The advantage of this approach is the reduction in the number of blocks used: no mixers are necessary and only one D/A converter is needed. Thus great power saving can be achieved. However, the open-loop modulation introduces a severe drawback. In fact, when the synthesizer operates in open-loop mode, due to leakage currents, the VCO output frequency tends to drift away from its nominal frequency (frequency droop). Also, the VCO is very sensitive to perturbations and therefore strong isolation is required, making the one-chip solution unfeasible.

Note that the direct modulation technique just discussed could be applied also with closed-loop synthesizers, by superimposing the modulation on the VCO control voltage. In this case, the PLL has to be designed with a narrow loop bandwidth, otherwise the PLL tracks out the modulation. It is therefore unsuitable for large bandwidth modulation schemes.

The last modulation technique is based on proper control of a parameter that sets the VCO frequency; the control signal is proportional to the modulation data and can be applied directly on the divider modulus. Observe that the synthesizer always operates in closed loop mode, therefore eliminating the issue of the open-loop frequency drift. The isolation requirements are not as severe as in the case of open-loop modulation, giving the possibility of an integrated solution. Controlling the divider modulus has the great advantage of a direct digital input, eliminating the need for a DAC. Indirect VCO modulation is therefore the simplest and the most compact solution among the transmitter architectures presented.

In $\Sigma \Delta$ synthesizers the modulation data is directly applied by feeding the digital data in the $\Sigma \Delta$ modulator, therefore the transmit filter can be implemented digitally. Since the instantaneous frequency is set by a digital word, the modulation index is exact to the precision of the PLL reference frequency (which is usually set by a crystal); the reduction of analogue blocks in the transmit path (DAC, transmit filter) allows the possibility of good modulation accuracy.
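How a digital word sets the average output frequency can be made concrete with a short calculation. The sketch below assumes a fractional word of 16 bits and a 26 MHz reference; the function names and the word width are illustrative assumptions, and the ideal relation `(N + fraction) * F_REF` ignores any dithering or modulation data.

```python
F_REF = 26e6      # reference (comparison) frequency [Hz]
FRAC_BITS = 16    # hypothetical width of the fractional word


def output_frequency(n_int, frac_word):
    """Average output frequency for integer modulus n_int and a fractional word."""
    return (n_int + frac_word / 2 ** FRAC_BITS) * F_REF


def frequency_resolution():
    """Smallest frequency step that one LSB of the fractional word produces."""
    return F_REF / 2 ** FRAC_BITS
```

For instance, `output_frequency(73, 24576)` corresponds to a division ratio of 73.375 and gives 1907.75 MHz, while one LSB of the 16-bit word moves the carrier by about 397 Hz. The step size depends only on the reference frequency and on the word width, which is why the modulation index is as accurate as the crystal.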

$\Sigma \Delta$ synthesizers enable wide loop bandwidth; this makes the PLL robust toward interference and suppresses the close-in VCO noise, making the implementation of on-chip VCO feasible. The fast channel switching saves power due to the reduced warm-up time. The arbitrarily fine frequency resolution allows the possibility of digital frequency correction and allows a greater flexibility in choosing the crystal frequency.

Figure 8.1: Mixer based modulation: (a) heterodyne, (b) homodyne.

### 8.2 System architecture for EGSM/DCS

The target of the study case analyzed in this work is a dual band EGSM (Extended GSM, Global System for Mobile communications) and DCS (Digital Cellular System) system.

The complete system block is shown in fig. 8.3 and the linear model is presented in fig. 8.5. The GSM transmit data is a 1 bit signal and it is assumed that the bit-stream to be transmitted can be modeled as white noise (i.e. it has a flat power spectral density). The bit-stream is passed through a digital Gaussian transmit filter to produce a signal which represents the desired phase variation as a function of time; to enable a wide bandwidth modulation [33], the PLL transfer function is compensated by a digital equalizer filter (pre-warping filter), which can be merged with the Gaussian filter. The pre-warp filter has a transfer function matching the inverse of the PLL transfer function, so that ideally the transfer function from the input of the pre-warp filter to the output is constant and equal to one, well beyond the PLL bandwidth. Errors and limitations in the transfer function matching determine a modulation error which will be discussed later.

As said, the synthesizer supports two operating bands: the VCO runs at twice the standard transmit frequency in DCS mode and at four times the transmit frequency in EGSM mode. As the block model of fig. 8.3 shows, the PLL is followed by a divide-by-2 (DCS band) or a divide-by-4 block (EGSM band); to compensate for the division, the input data to the modulator is multiplied by two (DCS) or by four (EGSM).

The divider block serves two functions: since the VCO is not running at the PA frequency, the pulling effect is reduced; moreover, since the divider block is placed outside the loop, the phase noise is improved by 6 dB (for the divide-by-2 mode) and by 12 dB (for the divide-by-4 mode). The EGSM transmit band, 880 MHz-915 MHz, and the DCS transmit band, 1710 MHz-1785 MHz, set the VCO operating frequency: 3520 MHz-3660 MHz (EGSM) and 3420 MHz-3570 MHz (DCS). Since the VCO frequency range is similar for both bands, it might be possible to use one VCO with band switching for both bands.
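The frequency plan and the phase-noise figures quoted above follow from simple arithmetic, sketched here in Python. The dictionary layout and function names are assumptions for illustration; the improvement of an ideal divide-by-N stage is taken as 20·log10(N).

```python
import math

BANDS = {
    "EGSM": {"tx_mhz": (880.0, 915.0), "divider": 4},
    "DCS": {"tx_mhz": (1710.0, 1785.0), "divider": 2},
}


def vco_range_mhz(band):
    """VCO tuning range needed to cover the transmit band through the divider."""
    low, high = BANDS[band]["tx_mhz"]
    ratio = BANDS[band]["divider"]
    return (low * ratio, high * ratio)


def phase_noise_improvement_db(band):
    """Ideal phase-noise reduction of a divide-by-N stage: 20*log10(N)."""
    return 20.0 * math.log10(BANDS[band]["divider"])
```

The two VCO ranges overlap between 3520 MHz and 3570 MHz, and together they span 3420 MHz to 3660 MHz, i.e. about 7 % around the center frequency.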

The complete EGSM/DCS specifications can be found in [73]. The main characteristics are here briefly summarized:

- constant envelope phase transmit modulation.
- required modulation accuracy smaller than 5° RMS for entire burst transmission.
- maximum instantaneous phase error smaller than 17°.
- up to 5 exceptions allowed.

The EGSM/DCS mask specifications are summarized in table 8.1. The masks take into account both transmission band and receiving band emissions. The values have been derived for the entire transmitter system including the PA; a maximum transmitter output power of 30 dBm has been assumed.

### 8.3 Modulation accuracy

The modulation inaccuracy is caused by close-in phase noise, power amplifier non-linearity (which causes PM-to-PM conversion) and a non-ideal PLL transfer function, mainly caused by inaccurate VCO gain. In case of variation in the PLL transfer function, the digital pre-warping filter no longer accurately compensates the PLL transfer function.

Figure 8.4: Gaussian transfer function and modulated data.

Ideally the modulated output signal is given as:

$$\varphi_{\text{out,ideal}}(s) = H_G(s) \cdot b_{tx}(s) \quad (8.1)$$

where $b_{tx}(s)$ is the one bit symbol stream and $H_G(s)$ is the transfer function of the Gaussian transmit filter. The magnitude of the transfer function is plotted in fig. 8.4 together with an example of modulated data. The actual modulation is given by:

$$\varphi_{\text{out,real}}(s) = H_G(s)H_{eq}(s)H_\theta(s)b_{tx}(s) \quad (8.2)$$

where $H_{eq}(s)$ is the Laplace transform of the pre-warp filter. The modulation error can be calculated by subtracting equation 8.1 from equation 8.2:

$$\Delta \varphi_{\text{error}}(s) = \varphi_{\text{out,real}} - \varphi_{\text{out,ideal}} = H_G(s)[H_{eq}(s)H_\theta(s) - 1]b_{tx}(s) \quad (8.3)$$
For convenience, the Laplace variable $s$ has been used for all the transfer functions, including the digital ones ($H_G$ and $H_{eq}$). The $z$ variable is simply replaced with $e^{sT_{COMP}}$. The $z^{-4}$ block is just a four-sample delay and therefore does not need to be compensated. The condition for no modulation error can be stated mathematically as:

$$H_{eq}(s)H_{\theta}(s) = 1 \quad (8.4)$$

Observe again that $H_{eq}$ is digitally implemented, while $H_{\theta}$ is an analogue filter, depending on variable parameters (above all the VCO gain). This makes the ideal matching condition in eq. 8.4 impossible to meet exactly and introduces a residual modulation error. The PSD of the modulation error is given by:

$$S_{\Delta\phi_{\text{OUT}}}(f) = \left| H_G(2\pi f) \right|^2 \left| H_{eq}(2\pi f)H_{\theta}(2\pi f) - 1 \right|^2 S_{b_{tx}}(f) \quad (8.5)$$

As previously stated, the RMS phase error, defined as:

$$\Delta\phi_{\text{outRMS}} = \sqrt{\int_{-300 \text{ kHz}}^{300 \text{ kHz}} S_{\Delta\phi_{\text{OUT}}}(f)\,df} \quad (8.6)$$

must be smaller than $5^\circ$ RMS, according to the GSM standard.
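The RMS phase error can be evaluated numerically from any PSD of the modulation error. The following Python sketch assumes a two-sided PSD expressed in rad²/Hz, takes the square root of the integrated PSD so that the result is an RMS value, and converts it to degrees; the function name and the trapezoidal integration are choices made for this illustration.

```python
import math


def rms_phase_error_deg(psd, f_max=300e3, n_points=2001):
    """Integrate a two-sided phase PSD [rad^2/Hz] over [-f_max, f_max].

    Returns the RMS phase error in degrees (trapezoidal rule).
    """
    step = 2.0 * f_max / (n_points - 1)
    freqs = [-f_max + i * step for i in range(n_points)]
    values = [psd(f) for f in freqs]
    variance = sum(
        0.5 * (values[i] + values[i + 1]) * step for i in range(n_points - 1)
    )
    return math.degrees(math.sqrt(variance))
```

As an order-of-magnitude check, a hypothetical flat PSD of 1e-12 rad²/Hz over the ±300 kHz band gives a variance of 6e-7 rad², i.e. an RMS error of roughly 0.044°, far below the 5° limit.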

The pre-warp filter is designed to match the synthesizer transfer function $H_{\theta}(s)$ under nominal conditions, but in practice the PLL transfer function deviates from the nominal one. The effects of the VCO gain variation are plotted in fig. 8.6. With the chosen design parameters, VCO gain variations up to $\pm30\%$ are tolerable.
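The sensitivity of the matching condition to the VCO gain can be explored with a strongly simplified model. The sketch below is not the transfer function derived in this work: it assumes a second-order type-II loop described only by a natural frequency of 100 kHz and a damping factor of 1, neglects the higher-order poles and the S/H block, and models a VCO gain error as a scaling of the open-loop gain. All names are hypothetical.

```python
import math

F_N = 100e3   # assumed natural frequency [Hz]
ZETA = 1.0    # assumed damping factor


def closed_loop(f, gain_scale=1.0):
    """Second-order type-II closed-loop response with scaled loop gain."""
    s = 1j * 2.0 * math.pi * f
    w_n = 2.0 * math.pi * F_N
    forward = gain_scale * (2.0 * ZETA * w_n * s + w_n ** 2)
    return forward / (s ** 2 + forward)


def residual_error(f, gain_scale):
    """|H_eq * H_theta - 1| with H_eq fixed to the inverse of the nominal loop."""
    return abs(closed_loop(f, gain_scale) / closed_loop(f, 1.0) - 1.0)
```

In this simplified model the residual vanishes at low frequency, where the loop forces unity gain regardless of the VCO gain, and approaches the relative gain error (e.g. 0.3 for a +30 % deviation) well above the loop bandwidth, where the pre-warp filter alone determines the response.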

### 8.4 Design example

Table 8.2 summarizes the PLL design variables. The capacitor values are rather large in order to ensure low noise; they can actually be scaled if the VCO gain is reduced. This would introduce the necessity of coarse band switching in the VCO.

The transmit power spectrum for the EGSM nominal case is plotted in fig. 8.7; fig. 8.8 presents the nominal DCS case. It can be seen that for both standards the transmit spectra lie within the standard mask with only one exception; the modulation error calculated with equation 8.6 is found to be around 0.2° RMS for both transmit bands. Except for spur cancellation, the difference between the S/H topology and the standard ΣΔ PLL architecture is negligible in transmit mode.

In a circuit implementation several non-ideal elements will affect the synthesizer performance, easily increasing the phase noise over the mask specifications.

With the aid of the simulations, the effects of the following non-linearities have been thoroughly investigated:

- Variations of the VCO duty-cycle. This effect becomes important if the divider operates on both VCO rising and falling edges to divide the output frequency down to the comparison frequency. This non-linearity can be easily seen as a variable divider propagation delay.

- Variations of the divider moduli propagation delay. It is fundamental to ensure that, when switching between division ratios, the propagation delay is constant. To simulate cases close to real implementation, two distinct sets of simulations have been run: in the first case, a Gaussian distributed delay has been assigned to all the divider ratios. In the second case, the effect of the delay on a single ratio has been analyzed. The divider value with the highest frequency in the modulator output sequence has been chosen.

- Mismatches in the charge-pump currents. For this issue, the possibility of compensating the non-linearity by means of an offset current has been analyzed.

<table>
<thead>
<tr>
<th>Synthesizer Output Frequency</th>
<th>3624 MHz</th>
</tr>
</thead>
<tbody>
<tr>
<td>Reference Frequency</td>
<td>26 MHz</td>
</tr>
<tr>
<td>Modulator order</td>
<td>4</td>
</tr>
<tr>
<td>Charge Pump Current</td>
<td>500 $\mu$A</td>
</tr>
<tr>
<td>VCO gain</td>
<td>100 MHz/V</td>
</tr>
<tr>
<td>Natural Frequency</td>
<td>100 kHz</td>
</tr>
<tr>
<td>Damping Factor</td>
<td>1</td>
</tr>
<tr>
<td>Loop Filter Zero</td>
<td>50 kHz</td>
</tr>
<tr>
<td>Loop Filter Integration Capacitor</td>
<td>818 pF</td>
</tr>
<tr>
<td>Second loop filter capacitor</td>
<td>91 pF</td>
</tr>
<tr>
<td>Zero setting resistor</td>
<td>3.9 k$\Omega$</td>
</tr>
<tr>
<td>Third loop filter pole</td>
<td>500 kHz</td>
</tr>
<tr>
<td>Fourth loop filter pole</td>
<td>1 MHz</td>
</tr>
<tr>
<td>Fifth loop filter pole</td>
<td>5 MHz</td>
</tr>
</tbody>
</table>

Table 8.2: PLL design variables.

<table>
<thead>
<tr>
<th>Phase error (RMS)</th>
<th>VCO error</th>
<th>PFD error</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>min</td>
<td>max</td>
</tr>
<tr>
<td>MASH with S/H</td>
<td>0.0027$^\circ$</td>
<td>-0.242$^\circ$</td>
</tr>
<tr>
<td>standard</td>
<td>0.0517$^\circ$</td>
<td>-0.481$^\circ$</td>
</tr>
<tr>
<td>Candy with S/H</td>
<td>0.0027$^\circ$</td>
<td>-0.236$^\circ$</td>
</tr>
<tr>
<td>standard</td>
<td>0.0516$^\circ$</td>
<td>-0.542$^\circ$</td>
</tr>
</tbody>
</table>

Table 8.3: Simulated errors for MASH and Candy modulators.

8.4.1 Impact of non-linearities on the synthesizer performance.

This section is a brief summary of the main conclusions extracted from simulations; the entire set of simulated cases consists of more than 1300 runs. The choice of the modulator order was based on a compromise between synthesizer bandwidth and quantization noise, in order to satisfy the EGSM/DCS spectral masks; the architecture choice was dictated by the requirement of a spur-free output spectrum. The phase errors for MASH and Candy topologies due to the quantization error are reported in table 8.3; the values are essentially equal for both architectures. Given that both topologies have equal spectra and the same error distribution, the MASH structure was chosen for all the simulations with modulation, due to its inherent stability.

We start by considering the effect of a variable divider ratio propagation delay: if the delay is not constant for all the ratios, then the time error (eq. 6.19) contains a non-linear term. This corresponds to variable time step quantization at the divider (since the VCO period is constant, the divider counts at fixed time steps), leading, once more, to noise down-folding. A VCO with a variable duty-cycle leads to the same issue: the variation in the duty cycle can always be seen as a variation of the divider propagation delay. The effect of this non-linearity, for a single ratio delay and S/H topology, is plotted in fig. 8.10: as the delay increases, a larger portion of the transmit power spectrum lies outside the mask specification, as a consequence of the down-folding issue. Observe that, even if the mask is progressively violated as the delay increases, the modulation error is still acceptable. This can be seen in the left side of fig. 8.9: even for a delay of 100 ps, the RMS phase error is smaller than 3°. In case the delay is Gaussian distributed, the error is below 1.5°; however it is difficult to draw a conclusion since the simulation has been run with one statistical sample of a Gaussian distribution. The RMS errors just reported are calculated for the EGSM band; similar trends are found for the DCS band, but the error magnitudes are doubled, due to the different divider ratio at the VCO output.

Concerning phase noise, no noticeable difference between the S/H synthesizer topology and the standard synthesizer architecture has been revealed by the simulations. The effects of the S/H on the phase noise are masked by the power of the low-frequency modulation data; however, the reference tone suppression becomes even more evident in case of CP current mismatches or an offset compensation current.

The increase of phase noise due to CP mismatches has already been shown in the previous chapter. In order to compensate for mismatches, an offset current can be added to the synthesizer. The effect of this small current is to shift the CP operating point: when the current magnitude is properly set, the PFD produces only one type of pulse (UP or DOWN). By always activating the same current source at each comparison instant, the mismatch issue is canceled. The disadvantage of using an offset current is an increased magnitude of the reference tone (and harmonics), as reported in table 8.4. In this case, the S/H architecture is not penalized, due to the tone suppression provided by the sample-and-hold operation.

The general conclusions for both EGSM and DCS bands can be summarized as follows:

- It is important to ensure equal propagation delay for all divider moduli: the maximum acceptable delay variation has to be smaller than 10 ps.

<table>
<thead>
<tr>
<th>offset current</th>
<th>0 μA</th>
<th>0.1 μA</th>
<th>0.2 μA</th>
<th>0.3 μA</th>
<th>0.4 μA</th>
<th>0.5 μA</th>
<th>0.6 μA</th>
</tr>
</thead>
<tbody>
<tr>
<td>DCS 1st spur (dBc/Hz)</td>
<td>-146.14</td>
<td>-115.35</td>
<td>-109.31</td>
<td>-105.8</td>
<td>-103.31</td>
<td>-101.38</td>
<td>-99.82</td>
</tr>
<tr>
<td>DCS 2nd spur (dBc/Hz)</td>
<td>-162.74</td>
<td>-139.61</td>
<td>-133.71</td>
<td>-130.24</td>
<td>-127.77</td>
<td>-125.86</td>
<td>-124.34</td>
</tr>
<tr>
<td>DCS 3rd spur (dBc/Hz)</td>
<td>-166.04</td>
<td>-153.2</td>
<td>-147.51</td>
<td>-144</td>
<td>-141.59</td>
<td>-139.71</td>
<td>-138.30</td>
</tr>
</tbody>
</table>

Table 8.4: DCS spurious performance for various offset currents.

---

**Figure 8.9:** Phase error RMS value for variable delay and CP mismatch.

---

**Figure 8.10:** Voltage Power Spectral Density for different divider delays with GSM modulation.
- Even a small mismatch in the charge-pump currents results in a large close-in phase noise increase: the maximum acceptable mismatch is as little as 1% of the CP current. The mismatches can be compensated with an offset current source: a mismatch up to 5% can be acceptable (from mask and spur requirement points of view) with a 2% CP offset current. However, the greater the required compensation, the larger the increase in the magnitude of the reference tone. This does not apply to the S/H PLL.

- For receive synthesizers, the S/H topology greatly reduces the close-in phase noise. In transmit mode, the increased close-in phase noise integrates up to a relatively small RMS phase error; consequently, it is acceptable to use the standard topology. However, the sample/hold eliminates the reference spur issue.
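As a cross-check on Table 8.4, the short sketch below compares the tabulated first DCS spur with a simple scaling rule. The rule (spur amplitude proportional to the offset current, i.e. about 6 dB per doubling) is an assumption made here for illustration and is not stated in the text; the 0.1 μA entry is used as the reference point.

```python
import math

# First DCS reference spur from Table 8.4, keyed by offset current in uA.
first_spur = {0.1: -115.35, 0.2: -109.31, 0.3: -105.8,
              0.4: -103.31, 0.5: -101.38, 0.6: -99.82}

def predicted_spur(i_offset_ua, i_ref_ua=0.1):
    """Spur level extrapolated from the reference entry, assuming the spur
    amplitude is proportional to the offset current (illustrative assumption)."""
    return first_spur[i_ref_ua] + 20.0 * math.log10(i_offset_ua / i_ref_ua)

# Difference between the extrapolated and the tabulated level, in dB.
deviations = {i: predicted_spur(i) - level for i, level in first_spur.items()}

for i in sorted(first_spur):
    print(f"{i:.1f} uA: table {first_spur[i]:8.2f}  "
          f"predicted {predicted_spur(i):8.2f}  diff {deviations[i]:+.2f} dB")
```

Under this assumption the tabulated values are reproduced to within a small fraction of a dB, which suggests that, over this range, the penalty grows by roughly 20 dB per decade of offset current.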
CHAPTER 9
CALIBRATION

An accurate PLL response is required in many situations, especially when ΣΔ PLLs are used for indirect modulation [47]. As previously mentioned, in these types of PLLs, the data fed into the ΣΔ modulator often undergoes a pre-filtering process in order to cancel the low-pass PLL transfer function and thereby extend the modulation bandwidth [33]. The pre-distortion filter presents a transfer function equal to the inverse of the PLL transfer function and it is usually implemented digitally. Consequently, a tight matching between the pre-distortion filter and the analogue PLL transfer function is necessary to avoid distortion of the transmitted data.

Especially for on-chip Voltage Controlled Oscillators (VCO), the gain $K_{VCO}$ is typically the parameter with the poorest accuracy among the PLL analog components. Other sources of variability are the resistor and the integrating capacitance of the loop filter (LF). If the LF is implemented off-chip, then both resistance and capacitance can be determined with good accuracy. If the LF is implemented on-chip, a possible implementation by means of switched capacitors reduces the variability to the capacitor value only. In this case, in order to establish an accurate PLL transfer function, only the product $K_{VCO} \times I_{CP}/C$ needs to be accurate [74]. The PLL can then be calibrated by adjusting the charge-pump current; the problem is how to measure the accuracy of the PLL transfer function.

A continuous calibration technique is presented in [75]. The transmitted data is digitally compared with the input data and the charge-pump current is then adjusted to compensate the detected error. This method offers the possibility of continuous calibration at the expense of increased circuit complexity; since the error detection is based on the cross-correlation between input and transmitted data, this approach will not work on unmodulated synthesizers.

An alternative approach is found in [76], where a method based on the detection of pulse skipping is described. The presence of one or several pulse skips can be used as an indication of the bandwidth. This method requires an input frequency step large enough to push the PLL into its non-linear operating region and only offers a rough estimation of the actual PLL bandwidth.

The next sections show a novel approach that makes it possible to determine the characteristics of the PLL transfer function by simply adding a digital counter; moreover the approach can be used to obtain an estimate of the static phase error of the PLL.
9.1 Measurement scheme

A two step calibration cycle is required by this method. In the first step the natural frequency $\omega_n$ of the transfer function is retrieved; the second step is used to determine the damping factor $\zeta$. In order to explain the basic idea behind the method, it is going to be initially applied to integer-N PLLs and afterward it will be applied to $\Sigma\Delta$ PLLs. The block schematic of the synthesizer used for calibrating is shown in fig. 9.1. The only difference compared to a standard integer-N architecture is the presence of two switches in the loop filter. To start the calibration, the switches to $R_{cal1}$ and $R_{cal2}$ are closed and the calibration resistors are connected to the resistor $R$. Since $R_{cal1}$ and $R_{cal2}$ are placed in parallel with the filter resistance $R$, the total resistance is reduced, resulting in under-damped characteristics of the loop transfer function.

By changing the division ratios $M$ or $N$, a frequency step can be applied to the PLL. The PLL reacts to compensate the phase error detected at the PFD. The time behavior of the phase error is a damped oscillation, whose natural frequency can be indirectly measured by counting the UP/DOWN pulses produced by the PFD. If the counter counts 1 up for each UP pulse and 1 down for each DOWN pulse generated from the time the frequency step was applied, then the maximum counter value is a measure of the natural frequency of the PLL transfer function. This can be seen in fig. 9.2, where the expected behavior of the phase error is presented together with the counter value: as long as the phase error is positive (i.e. the reference signal leads the divider signal in phase), UP pulses are generated by the PFD; DOWN pulses are produced while the phase error is negative.

Once the natural frequency is determined, the calibration step is repeated after changing the damping characteristics of the transfer function, i.e. by opening the switch to $R_{cal2}$. By comparing the values of the oscillation frequency in the two steps, it is possible to estimate the variation of the damping factor $\zeta$; this information can be used to adjust the filter resistor $R$ to obtain the desired damping factor.

The presence of a leakage current in the charge-pump will induce a static phase error at the PFD input. This, in turn, means an increased number of pulses in one direction (e.g., UP pulses). However, as explained later, the value of $\omega_n$ and $\zeta$ can still be measured.

The auxiliary PFD in fig. 9.1 might be required to generate stable UP/DOWN pulses for the digital counter, depending on the synthesizer PFD implementation. A possible circuit implementation that works together with a standard three-state PFD is shown in fig. 9.3. The two set-reset flip-flops (SR-FF) are used to establish which of the UP/DOWN pulses occurs first. This is necessary because the UP and DOWN pulses are simultaneously high for a length equal to the delay in the PFD reset path [74]. If the UP pulse rises before the DOWN pulse, then a logical 'ONE' appears at the input of the top edge-triggered resettable D flip-flop (fig. 9.3) and a logical 'ZERO' appears at the input of the bottom D flip-flop. The UP pulse delayed through a couple of inverters clocks the flip-flop and the negative transition of the REF clock resets the flip-flop. Hence the flip-flop produces an UP$_{stable}$ pulse whose length is approximately equal to the REF semi-period.

The opposite happens if the DOWN pulse occurs before the UP pulse. If the PFD produces aligned UP and DOWN pulses (this is the case if the input phase error is smaller than the dead-zone of the PFD) then the UP$_{stable}$ and the DOWN$_{stable}$ signals are high at the same time.

### 9.2 Mathematical derivation

To justify the calibration method mathematically, we start by deriving the PLL loop transfer function with the aid of the linear model of fig. 9.4:

$$ H_{loop}(s) = \frac{I_{CP}\,(R\,C_p\,s + 1)\,K_{VCO}}{2\pi\,C_p\,s^2\,N} \quad (9.1) $$

The transfer function from phase input to phase error is given by:

$$ \frac{\Phi_{err}(s)}{\Phi_{in}(s)} = \frac{1}{1 + H_{loop}(s)} = \frac{s^2}{s^2 + 2\zeta \omega_n s + \omega_n^2} \quad (9.2) $$

The natural frequency $\omega_n$ is given by:

$$ \omega_n = \sqrt{\frac{I_{CP}\,K_{VCO}}{2\pi\,C_p\,N}} \quad (9.3) $$

and the damping factor $\zeta$ is defined as:

$$ \zeta = \frac{R}{2} \sqrt{\frac{I_{CP}\,C_p\,K_{VCO}}{2\pi\,N}} \quad (9.4) $$
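As a numerical illustration, the natural frequency and damping factor expressions (eqs. 9.3 and 9.4) can be evaluated with the design values of Table 8.2. The sketch below rests on two assumptions that are not stated in the text: the division ratio is taken as the average ratio $f_{out}/f_{ref}$, and the VCO gain is converted from Hz/V to rad/s/V. The second loop-filter capacitor and the higher-order poles are ignored.

```python
import math

# Design values taken from Table 8.2
f_out = 3624e6      # synthesizer output frequency, Hz
f_ref = 26e6        # reference frequency, Hz
i_cp = 500e-6       # charge-pump current, A
k_vco_hz = 100e6    # VCO gain, Hz/V
c_p = 818e-12       # loop filter integration capacitor, F
r = 3.9e3           # zero-setting resistor, ohm

# Assumptions of this sketch: average division ratio, gain in rad/s/V
n_div = f_out / f_ref
k_vco = 2.0 * math.pi * k_vco_hz

# Natural frequency (eq. 9.3) and damping factor (eq. 9.4)
omega_n = math.sqrt(i_cp * k_vco / (2.0 * math.pi * c_p * n_div))
zeta = 0.5 * r * math.sqrt(i_cp * c_p * k_vco / (2.0 * math.pi * n_div))

# Zero of the loop filter set by R and C_p
f_zero = 1.0 / (2.0 * math.pi * r * c_p)

print(f"f_n    = {omega_n / (2.0 * math.pi) / 1e3:.1f} kHz")
print(f"zeta   = {zeta:.2f}")
print(f"f_zero = {f_zero / 1e3:.1f} kHz")
```

Under these assumptions the results land close to the 100 kHz natural frequency, unity damping factor and 50 kHz zero listed in Table 8.2, which gives some confidence that the symbols are being read consistently.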

A unit input frequency step corresponds to an input phase ramp, with Laplace transform given by \( \Phi_{in}(s) = \frac{1}{s^2} \). The Laplace transform of the phase error is then given by:

$$ \Phi_{err}(s) = \frac{s^2}{s^2 + 2\zeta\omega_n s + \omega_n^2} \cdot \frac{1}{s^2} = \frac{1}{s^2 + 2\zeta\omega_n s + \omega_n^2} \quad (9.5) $$

In the time domain, equation 9.5 corresponds to the impulse response of a second-order system, which for \( \zeta < 1 \) is a damped sinusoid:

$$ \phi_{err}(t) = \frac{1}{\omega_0}\, e^{-\zeta\omega_n t} \sin (\omega_0 t) \quad (9.6) $$

where the oscillation frequency \( \omega_0 \) is defined as:

$$ \omega_0 = \omega_n \sqrt{1 - \zeta^2} \quad (9.7) $$

Note that the natural frequency is independent of the filter resistor \( R \), but the actual oscillation frequency \( \omega_0 \) depends on \( R \) through the damping factor. Once the natural frequency is retrieved, \( R \) can be adjusted to obtain the correct damping factor, without affecting the value of \( \omega_n \).
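For completeness, the step from the Laplace-domain phase error to the damped sinusoid can be spelled out by completing the square in the denominator. This is a standard transform-pair argument, written here for the under-damped case \( 0 < \zeta < 1 \), with \( \omega_0 = \omega_n\sqrt{1-\zeta^2} \):

$$
\frac{1}{s^2 + 2\zeta\omega_n s + \omega_n^2}
= \frac{1}{(s + \zeta\omega_n)^2 + \omega_n^2 (1 - \zeta^2)}
= \frac{1}{\omega_0} \cdot \frac{\omega_0}{(s + \zeta\omega_n)^2 + \omega_0^2}
\;\;\longleftrightarrow\;\;
\frac{1}{\omega_0}\, e^{-\zeta\omega_n t} \sin(\omega_0 t)
$$

The envelope decays at the rate \( \zeta\omega_n \), which is practically indistinguishable from \( \zeta\omega_0 \) when \( \zeta \) is small, as is the case during calibration. The first zero crossing after \( t = 0 \) occurs where \( \sin(\omega_0 t) = 0 \), that is at \( t = \pi/\omega_0 \), independently of the envelope.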

As previously said, when the phase error function \( ϕ_{err}(t) \) is positive, the PFD generates UP pulses. If the error becomes negative, DOWN pulses are generated. Assuming a positive frequency step, an initial sequence of UP pulses is produced by the PFD and the counter value increases monotonically. When the phase error crosses the zero-error phase, occurring at \( t_{cross} = \frac{π}{ω_0} \) according to equation 9.6, DOWN pulses start to appear decreasing the counter value. Hence at the crossing time the counter reaches its maximum value, \( V_{max} \); the crossing point can also be expressed as:

\[ t_{cross} = \frac{π}{ω_0} = V_{max} \cdot T_{REF} \] \hspace{1cm} (9.8)

where \( T_{REF} \) is the period of the REF signal (fig. 9.1).

Since, due to stability reasons [74], the PLL dynamics is always much slower than the REF clock, the error introduced by quantizing \( t_{cross} \) with \( T_{REF} \) is then negligible. By solving equation 9.8, \( ω_0 \) can be expressed as:

\[ ω_0 = \frac{π}{V_{max} \cdot T_{REF}} \] \hspace{1cm} (9.9)

By making the damping factor \( ζ \) small, the oscillation frequency \( ω_0 \) is roughly equal to the natural frequency \( ω_n \) (equation 9.7). The value of \( ω_0 \) retrieved with the second step can be used to calculate the relative \( ζ \) variation; in this way it is possible to adjust the resistor \( R \) to obtain the desired damping factor. Note also that the oscillation frequency estimation is independent of the applied frequency step size.
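The measurement principle can be illustrated with a small behavioral sketch: the phase error of eq. 9.6 is sampled once per reference period, an up/down counter follows its sign, and the oscillation frequency is recovered from the counter peak through eq. 9.9. The numerical values (100 kHz natural frequency, damping factor 0.1) and the function name are assumptions chosen for illustration only; the 26 MHz reference is the one of Table 8.2.

```python
import math

def simulate_counter(omega_n, zeta, t_ref, n_cycles):
    """Up/down counter driven by the sign of the phase error of eq. 9.6.

    Returns the counter value after each reference cycle.
    """
    omega_0 = omega_n * math.sqrt(1.0 - zeta ** 2)
    counter, trace = 0, []
    for k in range(1, n_cycles + 1):
        t = k * t_ref
        phi_err = math.exp(-zeta * omega_n * t) * math.sin(omega_0 * t) / omega_0
        if phi_err > 0:
            counter += 1      # PFD issues an UP pulse
        elif phi_err < 0:
            counter -= 1      # PFD issues a DOWN pulse
        trace.append(counter)
    return trace

T_REF = 1.0 / 26e6                 # reference period (26 MHz)
OMEGA_N = 2.0 * math.pi * 100e3    # assumed natural frequency, rad/s
ZETA = 0.1                         # assumed under-damped calibration setting

trace = simulate_counter(OMEGA_N, ZETA, T_REF, n_cycles=300)
v_max = max(trace)

omega_0_est = math.pi / (v_max * T_REF)                 # eq. 9.9
omega_0_true = OMEGA_N * math.sqrt(1.0 - ZETA ** 2)     # eq. 9.7

print("V_max =", v_max)
print("estimated / true omega_0 =", omega_0_est / omega_0_true)
```

With these values the counter peaks at roughly 130 counts, so the quantization of the crossing time by the reference period contributes an error below one percent, in line with the argument that the PLL dynamics are much slower than the REF clock.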

So far all the equations have been derived under the assumption that the PLL is operating in its linear region. In case of a large frequency step (this is usually the case if the crystal oscillator divider is changed), the PLL might lose its frequency lock. In this case, the previous equations are no longer valid; however, it is equally possible to use the calibration method by extracting the final counter value from simulations. Another possibility is resetting the counter whenever a pulse skip is detected: this condition occurs when two edges of the same input signal (reference clock or divider feedback signal) appear at the PFD input without an edge of the other signal occurring in between. This indicates that the frequencies of the two signals are different, i.e. the PLL is operating in frequency acquisition mode. Once frequency lock is achieved, the PLL enters the phase acquisition mode: the counter is no longer reset and the behavior of the PLL can be modeled with the described linear equations.
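The pulse-skip condition described above can be sketched as a check on the ordering of the edges seen at the PFD inputs. The function name and the edge-time lists below are hypothetical and serve only to illustrate the detection rule; a real implementation would operate on the PFD signals directly.

```python
def pulse_skip_times(ref_edges, div_edges):
    """Return the times at which a pulse skip is detected.

    A skip is flagged when two consecutive edges come from the same
    signal (REF or DIV) with no edge of the other signal in between.
    """
    merged = sorted([(t, "REF") for t in ref_edges] +
                    [(t, "DIV") for t in div_edges])
    skips = []
    for (_, prev_src), (t, src) in zip(merged, merged[1:]):
        if src == prev_src:
            skips.append(t)   # the counter would be reset at this instant
    return skips

# Divider slower than the reference: frequency acquisition, one skip
ref = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
div_unlocked = [0.5, 1.8, 3.1, 4.4]

# Divider at the reference frequency with a constant phase lag: no skip
div_locked = [0.1, 1.1, 2.1, 3.1, 4.1]

print(pulse_skip_times(ref, div_unlocked))
print(pulse_skip_times(ref, div_locked))
```

In the first case the reference produces two edges (at 2.0 and 3.0) before the next divider edge arrives, so a skip is reported at 3.0; in the locked case the edges alternate and the counter would be left running.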

9.3 Estimation of the static phase offset

Every real PLL implementation is affected by a static phase offset; its presence is due to different factors, such as leakage currents, mismatches in the charge-pump UP/DOWN currents or different CP currents turn-on/turn-off timing. If the phase offset can be measured, a small offset current can be added to null the static phase offset, therefore improving the PLL spurs performance. Notice that knowing the phase offset is not sufficient for proper spur reduction: if the offset is due to a leakage current or to a lossy loop filter, then an offset current source can be adjusted to eliminate completely (in principle) the spur. On the contrary, if the offset is caused by a current mismatch, adding a compensation current will help, but will not remove the spur completely: in fact, a spike current will be replaced with a saw-tooth shaped current.

As previously mentioned, a static phase offset will alter the number of UP or DOWN pulses produced during the transient response. This can be visualized with the aid of fig. 9.5, showing the phase error curves for a positive and a zero static phase offset (for a positive frequency step) together with the relative counter curves. It can be seen that the effect of the phase offset is a positive translation of the zero phase error curve; as a consequence, the oscillation period measured with the counter will differ from the zero-offset case.
In the case shown in fig. 9.5 the oscillation period will be overestimated, since the PFD will produce UP pulses for a longer time interval, till the intersection of the offset phase error curve with the zero phase error line.

Consider now the sequence of DOWN pulses following the UP pulses: in this case the length of the sequence is shorter than the value expected under zero phase offset condition. As a consequence, at the end of the first oscillation period, the counter value will be different from zero.

Let $V_{\text{max}}$ indicate the maximum number of UP pulses and let $V_{\text{min}}$ indicate the minimum number of DOWN pulses; by comparing $V_{\text{min}}$ with $V_{\text{max}}$, it is not only possible to determine the real oscillation period, but it is also possible to extract information about the phase offset.

In fact, under zero phase offset condition, the magnitude of $V_{\text{max}}$ is equal to the magnitude of $V_{\text{min}}$; this means that if a phase offset is present, the correct number of pulses is approximately the average of the magnitudes of $V_{\text{max}}$ and $V_{\text{min}}$. The real oscillation frequency is then given by:

$$ \omega_0 = \frac{2\pi}{(|V_{\text{max}}| + |V_{\text{min}}|) \cdot T_{\text{REF}}} \quad (9.10) $$

Furthermore, the sign of the difference between $V_{\text{max}}$ and $V_{\text{min}}$ is equal to the polarity of the phase offset. Finally, it is possible to obtain a rough estimate of the magnitude of the phase offset. Let $t_{\text{meas}}$ denote the semi-period of the offset phase error curve; the phase offset can then be obtained by evaluating equation 9.6 for $t = t_{\text{meas}}$. This is visualized in fig. 9.5: the offset curve $\phi_{\text{offset}}(t)$ can be written as:

$$ \phi_{\text{offset}}(t) = \phi_{\text{err}}(t) + \phi_{\text{static\_offset}} \quad (9.11) $$

By observing that \( \phi_{\text{offset}}(t) \) is equal to zero for \( t = t_{\text{meas}} \), the static phase offset is given by:

$$ \phi_{\text{static\_offset}} = -\phi_{\text{err}}(t_{\text{meas}}) \quad (9.12) $$

**Figure 9.5:** Phase error curves.
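The offset estimation can be tried out on a simple model: a damped sinusoid with the shape of eq. 9.6 plus a constant offset, sampled once per reference period. The sketch below counts the UP pulses of the first positive lobe and the DOWN pulses of the following negative lobe, then applies eq. 9.9, eq. 9.10 and the zero-crossing condition of eq. 9.11. All numerical values are assumptions for illustration, and the transient is evaluated with the model's own parameters, which a real calibration would only know approximately.

```python
import math

T_REF = 1.0 / 26e6                 # reference period (26 MHz)
OMEGA_N = 2.0 * math.pi * 100e3    # assumed natural frequency, rad/s
ZETA = 0.02                        # assumed (small) damping factor
OMEGA_0 = OMEGA_N * math.sqrt(1.0 - ZETA ** 2)
AMPLITUDE = 1.0                    # assumed peak of the transient term, rad
PHI_STATIC = 0.1                   # assumed static phase offset, rad

def phi_err(t):
    """Transient phase error (shape of eq. 9.6, scaled to AMPLITUDE)."""
    return AMPLITUDE * math.exp(-ZETA * OMEGA_N * t) * math.sin(OMEGA_0 * t)

def count_first_period(max_cycles=2000):
    """Return (V_max, V_min): UP pulses in the first positive lobe and
    DOWN pulses in the following negative lobe of phi_err + offset."""
    v_max = v_min = 0
    for k in range(1, max_cycles + 1):
        phi = phi_err(k * T_REF) + PHI_STATIC
        if phi > 0 and v_min == 0:
            v_max += 1
        elif phi < 0:
            v_min += 1
        else:
            break  # error is positive again: first period completed
    return v_max, v_min

v_max, v_min = count_first_period()

omega_single = math.pi / (v_max * T_REF)                 # eq. 9.9
omega_avg = 2.0 * math.pi / ((v_max + v_min) * T_REF)    # eq. 9.10
offset_sign = 1 if v_max > v_min else -1 if v_max < v_min else 0

# Offset magnitude: the transient evaluated at the measured crossing time
phi_static_est = -phi_err(v_max * T_REF)

print("V_max, V_min:", v_max, v_min)
print("relative error, single lobe:", omega_single / OMEGA_0 - 1.0)
print("relative error, averaged   :", omega_avg / OMEGA_0 - 1.0)
print("offset sign and estimate   :", offset_sign, phi_static_est)
```

In this example the averaged estimate is closer to the true oscillation frequency than the single-lobe estimate, the sign of the offset is recovered, and the magnitude estimate is of the right order but limited by the quantization of the crossing time, which is consistent with the "rough estimate" caveat in the text.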

The accuracy of the above equation depends on many factors; first of all, \( t_{\text{meas}} \) is quantized with a time step equal to the inverse of the PFD comparison frequency \( F_{\text{REF}} \). The higher this frequency is (compared to the PLL bandwidth), the better the resolution. Also, unlike the oscillation frequency estimation technique, the size of the frequency step directly influences the estimation accuracy. This is because a large step will produce a large phase excursion at the PFD input, and the static phase error then only constitutes a small proportion of it. Therefore it is preferable to use a small frequency step for this measurement. A rough estimate of the minimum detectable phase offset can be obtained by evaluating equation 9.6 after one reference period, in the case that \( F_{\text{REF}} \gg F_{\text{bandwidth}} \):

\[ \phi_{\text{static\_offset}}(t) \approx \frac{A}{\omega_0} \sin \left( \frac{\omega_0}{F_{\text{REF}}} \right) \approx \frac{A}{F_{\text{REF}}} \]  

(9.13)

where \( A \) is the step amplitude. Since the argument of the sine is small, the derivation of equation 9.13 uses the approximation \( \sin(x) \approx x \).
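To give a feeling for the orders of magnitude in eq. 9.13, the sketch below evaluates both the sine expression and its small-angle limit. The step amplitude is assumed here to be expressed in rad/s, and the numerical values (10 kHz step, 100 kHz oscillation frequency, 26 MHz reference) are illustrative assumptions rather than values taken from the text.

```python
import math

def min_detectable_offset(step_rad_per_s, omega_0, f_ref):
    """Evaluate eq. 9.13: returns (sine expression, small-angle limit) in rad."""
    exact = step_rad_per_s / omega_0 * math.sin(omega_0 / f_ref)
    approx = step_rad_per_s / f_ref
    return exact, approx

A = 2.0 * math.pi * 10e3          # assumed frequency step, rad/s
OMEGA_0 = 2.0 * math.pi * 100e3   # assumed oscillation frequency, rad/s
F_REF = 26e6                      # comparison frequency, Hz

exact, approx = min_detectable_offset(A, OMEGA_0, F_REF)
print(f"sine expression  : {math.degrees(exact):.4f} deg")
print(f"small-angle limit: {math.degrees(approx):.4f} deg")
```

With these numbers the two expressions agree to better than 0.1%, and the resolution is a fraction of a degree; a ten times larger step would degrade it by the same factor, which is why a small step is preferable for the offset measurement.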

### 9.4 Extension to ΣΔ PLL topologies

The applicability of the measuring technique will now be demonstrated for ΣΔ fractional-N PLLs. As explained before, the output frequency is controlled by means of a ΣΔ modulator. Thus, to measure \( \omega_0 \), the frequency step is, in this case, applied to the modulator input.

The linear model of a ΣΔ fractional-N PLL is shown in fig. 9.6. In the example analyzed in this section, the loop filter transfer function is the same as presented in chapter 8. The mathematics involved in this case is lengthier, but the final transfer function can be reduced to an approximate 2nd order equation.

We start by finding the transfer function from the ΣΔ modulator input to phase error \( \Phi_{\text{err}}(s) \). With the aid of fig. 9.6, considering that the ΣΔ modulator signal transfer function is just adding a delay to the input data, the transfer function is given by:

\[ \frac{\Phi_{\text{err}}(s)}{\Sigma\Delta_{\text{in}}(s)} = \frac{2\pi}{N + \mu_b} \cdot \frac{e^{-sT_{\text{ref}}}}{1 - e^{-sT_{\text{ref}}}} \cdot \frac{1}{1 + H_{\text{loop}}(s)} \]

(9.14)

Indicating with \( \omega_3, \omega_4, \omega_5 \), and \( z_1 \) the high order poles and, respectively, the zero of the loop filter, and by setting:

\[ T_{\text{eq}}^2 = \frac{1}{\omega_3^2} + \frac{1}{\omega_4^2} + \frac{1}{\omega_5^2} + \frac{1}{\omega_3 \cdot \omega_4} + \frac{1}{\omega_3 \cdot \omega_5} + \frac{1}{\omega_4 \cdot \omega_5} + \frac{1}{\omega_3 \cdot z_1} + \frac{1}{\omega_4 \cdot z_1} + \frac{1}{\omega_5 \cdot z_1} \]
it is possible to define an equivalent natural frequency $\omega_{n1}$ and an equivalent damping factor $\zeta_1$:

$$\omega_{n1} := \sqrt{\frac{\omega_n^2}{1 + (\omega_n T_{eq})^2}}$$  \hspace{1cm} (9.15)

$$\zeta_1 = \frac{1}{2} \omega_{n1} \left( \frac{1}{z_1} - \frac{1}{\omega_3} - \frac{1}{\omega_4} - \frac{1}{\omega_5} \right)$$  \hspace{1cm} (9.16)
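The equivalent parameters of eqs. 9.15 and 9.16 can be evaluated numerically. The sketch below uses the nominal pole and zero frequencies of Table 8.2 and a 100 kHz natural frequency purely as an illustration; the under-damped configuration used during calibration would have a different zero, so the resulting numbers should not be read as the thesis' calibration values.

```python
import math

def equivalent_second_order(omega_n, z1, w3, w4, w5):
    """Equivalent natural frequency (eq. 9.15) and damping factor (eq. 9.16)."""
    t_eq_sq = (1.0 / w3**2 + 1.0 / w4**2 + 1.0 / w5**2
               + 1.0 / (w3 * w4) + 1.0 / (w3 * w5) + 1.0 / (w4 * w5)
               + 1.0 / (w3 * z1) + 1.0 / (w4 * z1) + 1.0 / (w5 * z1))
    omega_n1 = math.sqrt(omega_n**2 / (1.0 + omega_n**2 * t_eq_sq))
    zeta_1 = 0.5 * omega_n1 * (1.0 / z1 - 1.0 / w3 - 1.0 / w4 - 1.0 / w5)
    return omega_n1, zeta_1

two_pi = 2.0 * math.pi
omega_n = two_pi * 100e3      # natural frequency (Table 8.2), rad/s
z1 = two_pi * 50e3            # loop filter zero
w3 = two_pi * 500e3           # third loop filter pole
w4 = two_pi * 1e6             # fourth loop filter pole
w5 = two_pi * 5e6             # fifth loop filter pole

omega_n1, zeta_1 = equivalent_second_order(omega_n, z1, w3, w4, w5)
print(f"f_n1   = {omega_n1 / two_pi / 1e3:.1f} kHz")
print(f"zeta_1 = {zeta_1:.2f}")
```

With these inputs the higher-order poles lower the equivalent natural frequency noticeably below $\omega_n$, which shows why the second-order formulas of section 9.2 cannot be applied to the higher-order loop without this correction.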

After proper manipulations, eq. 9.14 can be reduced to the following approximated expression:

$$\frac{\Phi_{\text{err}}(s)}{\Sigma \Delta_{\text{in}}(s)} = G \cdot \frac{s}{s^2 + 2\zeta_1 \omega_{n1} s + \omega_{n1}^2} \quad (9.17)$$

with the gain factor given by $G = \left( \frac{\omega_{n1}}{\omega_n} \right)^2 \frac{2\pi}{N + \mu_b} \frac{1}{T_{ref}}$. The final expression for the phase error, obtained by applying a unit step function (Laplace transform $1/s$) to the $\Sigma \Delta$ modulator, is given by:

$$\Phi_{\text{err}}(s) = \frac{G}{s^2 + 2\zeta_1 \omega_{n1} s + \omega_{n1}^2} \quad (9.18)$$

and, apart from the gain factor $G$, it corresponds directly to equation 9.5. The smaller the loop damping factor $\zeta$ is, the more accurate is the approximation in equation 9.18. At this point, the natural frequency can be calculated as described in section 9.2.

However the use of $\Sigma \Delta$ fractional-N PLLs introduces a new requirement for the correct applicability of the method. The input step to the modulator needs to be large enough to overcome the random effects of the modulator itself. If the input step is too small, then the UP sequence is no longer monotonic and the extracted value of $\omega_0$ is no longer accurate.

9.5 Simulation results

Based on the simulation results, the $\Sigma \Delta$ fractional-$N$ PLL topology can be simulated with a linear simulator such as Simulink; however the system behavior has been also investigated with the event-driven approach discussed in chapter 7, to capture the effects of non-linearities.

As previously discussed, the VCO gain $K_{VCO}$ is the parameter with the poorest accuracy; the $\Sigma \Delta$ PLL was simulated with the nominal $K_{VCO}$ value and with a gain variation of $\pm 30\%$ with respect to the nominal value. The mismatch between the pre-distortion filter and the PLL transfer function due to this variation causes an output error up to 5 degrees RMS. In fig. 9.7 the counter behavior for the 3 different $K_{VCO}$ values is presented. It is apparent that the three curves reach different peaks according to the value of $K_{VCO}$; as time proceeds, the effects of the $\Sigma \Delta$ modulator start to appear.

By substituting the values of the parameters in the equations presented in section 9.4, the theoretical maximum counter values for the three different VCO gains are, respectively, 150, 123, and 106. The values extracted from the Verilog simulation of fig. 9.7 are 148, 122, and 105; these values closely match the predicted ones.
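One way the measured counter peak could be turned into a correction is sketched below. It assumes that the oscillation frequency is close to the natural frequency (small damping), that the natural frequency scales with the square root of the charge-pump current as in eq. 9.3, and that the higher-order poles can be ignored; the function name is hypothetical and the rule is an illustration, not a procedure given in the text.

```python
def cp_current_scale(v_measured, v_target):
    """Factor by which the charge-pump current would be multiplied to move
    the counter peak from v_measured to v_target.

    The peak count is inversely proportional to omega_0 (eq. 9.9), and
    omega_n**2 is proportional to I_CP (eq. 9.3), hence the square.
    """
    return (v_measured / v_target) ** 2

V_NOMINAL = 122   # simulated counter peak for the nominal VCO gain

for v_meas in (148, 122, 105):   # simulated peaks reported for fig. 9.7
    scale = cp_current_scale(v_meas, V_NOMINAL)
    print(f"V_max = {v_meas}: scale I_CP by {scale:.3f}")
```

A peak above the nominal value indicates a loop that is too slow, so the current is increased, and vice versa. The resulting factors are of the same order as the inverse of a 30% gain variation, as expected if the product $K_{VCO} \times I_{CP}$ is to be restored.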

The counter behavior for different input frequency steps is shown in fig. 9.8. As the step is increased, the $\Sigma \Delta$ modulator noise is overcome and the measured values match very well with the predictions. In the same figure the results for both simulators, Verilog and Simulink, are presented.

**Figure 9.8:** Counter maximum vs. frequency steps.

Note that the Simulink curves are very close to the curves obtained with Verilog, which confirms that the linear simulator describes accurately the transient behavior of the ΣΔ PLL even for fairly large input frequency steps.

The estimation of the phase offset with the method described in section 9.3 is more difficult for ΣΔ PLL. In fact the resulting phase error curve for a frequency step is not as smooth as the integer case; this means that in the proximity of the zero phase error line there could be more than one crossing before and after the real crossing time. This affects only marginally the bandwidth estimation since the variation in the number of pulses is small relatively to the total number of pulses. On the contrary, the phase offset estimation can be significantly affected.

A possibility to overcome the problem is to take the average of several step measurements. Alternatively, the ΣΔ can be overloaded (or switched off) before the step is applied in order to operate the PLL in integer mode.

9.6 Summary

A new method to calibrate the PLL transfer-function has been presented. The implementation does not require any additional analogue component. The only necessary extra circuitry is a digital counter. This new approach does not offer continuous calibration and it requires a calibration cycle, but it is very simple and virtually no extra silicon area and no extra power consumption are required. Moreover, this technique works for both linear and non-linear PLL frequency step responses; also, it can be used to estimate and calibrate the static phase offset. The mathematical formulation of the method has been verified with simulations based on a $\Sigma\Delta$ fractional-N PLL topology, run both on Verilog and Simulink. Results from both simulations closely match the theoretical values.
CHAPTER 10

CONCLUSIONS

In this work, two separate cases of low-voltage and low-power systems have been investigated from different perspectives. The first example, namely a low-voltage amplifier, was treated at transistor level to show what the implications of the supply voltage scaling are. The second case, specifically a $\Sigma\Delta$ frequency synthesizer, was analyzed at system level to investigate its feasibility as a low-power transceiver architecture.

A new technique to reduce the threshold voltage of the conventional MOS transistor was the achievement of the first part of the work. The current-driven bulk approach was introduced and it was explained how to lower the threshold voltage (exploiting the transistor body-effect) by forcing a constant current out of the bulk terminal. It was also shown how the current-driven bulk technique can be easily integrated in standard analog design to smooth the constraints imposed by low-voltage design. To verify the applicability of this technique experimentally, a prototype operational transconductance amplifier was implemented in a standard 0.5 $\mu$m process. The target was to achieve 1 V operation; results from measurements have not only confirmed the strength of the CDB approach, but have also demonstrated the amplifier capability of sub-1 V operation.

Detailed analysis of $\Sigma\Delta$ synthesizers was the topic of the second portion of the work. One of the accomplishments of this part is the derivation of an analytical model for noise analysis of $\Sigma\Delta$ synthesizers; moreover the resulting model not only applies to $\Sigma\Delta$ modulation, but is valid for any kind of divider dithering: for example, standard fractional-N PLLs can also be analyzed with the developed model.

The linear model was augmented to include the effect of a previously unknown non-linearity intrinsic to standard $\Sigma\Delta$ synthesizer architectures. It was demonstrated how this non-ideality is responsible for high-frequency noise power down-folding and how the overall baseband contribution becomes progressively significant with the modulator order. A new $\Sigma\Delta$ fractional-N topology was proposed to eliminate the issue introduced by the above mentioned non-linearity: the novel architecture comprises a sample/hold block placed between the phase-frequency detector and the loop filter to align the PFD output sequence to fixed clock edges. The sample/hold architecture has also another beneficial effect: by inserting zeros at the reference frequency multiples, the problem of output spectrum reference tones is eliminated.

A new methodology was developed to address the non-trivial issues dictated by $\Sigma\Delta$ fractional-N simulation. The presented approach is entirely based on an object-oriented event-driven methodology. A unique feature of this new simulation technique is the high degree of accuracy: the model does not require approximations and does not depend on assumptions. Moreover, undesirable time quantization phenomena are avoided: the only limitations depend on the numerical accuracy of the event-driven simulator.

Another advantage of the developed methodology is its capability to naturally predict non-obvious phenomena such as the above mentioned down-folding issue, without having to resort to any special measures. It is worthwhile to remember that the simulation model is based on the behavior of the synthesizer blocks and not on their linear model. Therefore, results from simulation are independent of the analytical model previously derived and can be used to support it; the opposite is also true, specifically the analytical model can be used to establish the accuracy of simulations. Results of multiple simulations have in fact shown good agreement with the theoretical model.

Several examples have been shown to demonstrate that the simulation methodology can be easily applied to the study of the effects of multiple non-idealities. A study case for direct GSM/DCS modulation was investigated in depth; the brief summary presented indicates that the S/H \( \Sigma \Delta \) fractional-N synthesizer is suitable for fulfilling the GSM/DCS standard requirements.

Finally, a new method to calibrate the PLL transfer-function has been presented. It was shown how it can be implemented with a simple digital counter, without demanding any additional analog components. The novel approach does not offer continuous calibration and it requires one calibration cycle, or two if both natural frequency and damping factor are retrieved, but it is extremely simple and virtually no extra silicon area and no extra power consumption are necessary. Moreover, this technique can be equally applied to both linear and non-linear PLL frequency step responses; it can also be used to estimate the static phase offset.

The mathematical formulation of the calibration method was verified at system level with simulations based on a \( \Sigma \Delta \) fractional-\( N \) PLL topology, run both on a behavioral model (Verilog) and a linear model (Simulink). Both simulations closely match the predictions.
Part III

Appendices
APPENDIX A

VERILOG CODE

Reference clock block

// Reference clock
`timescale 1s / 1fs
module clk_block (V_clk);
output V_clk;
reg clk;
initial clk = 0;
// toggle every half period of the 26-MHz reference, minus 1 fs
always # (0.5/(26e6)-1e-15) clk = ~clk;
assign {V_clk} = clk;
endmodule

Charge-Pump block

// Charge-Pump
`timescale 1fs / 1fs
module cp_block (UP, DOWN, I, ctrl);
input UP; // UP from PFD
input DOWN; // DOWN from PFD
output [64:1] I; // output current (real value packed into 64 bits)
output ctrl; // control signal that toggles on current changes
wire [64:1] I;
real i_actual,i_up,i_down,i_old;
reg ctrl;
initial
begin
    i_up = 10.0e-6;
    i_down = 10.0e-6;
    ctrl = 0;
end
always @ (posedge(UP) or negedge(UP) or posedge(DOWN) or negedge(DOWN))
begin
    // net output current: UP sources i_up, DOWN sinks i_down
    i_actual=(1.00*UP*i_up-1.00*DOWN*i_down);
    if (i_actual!=i_old)
    begin
        ctrl=~ctrl;
        i_old=i_actual;
    end
end
assign{I}=$realtobits(i_actual);
endmodule

Divider block

// multi-moduli divider
`timescale 1fs / 1fs
module div_block (VCO_in, mod_ctrl, div_out);
input VCO_in;
input [32:1] mod_ctrl;
output div_out;
wire [32:1] mod_ctrl;
reg div_clk;
integer N, i, N_base, delay, fp;
time t;
initial
begin
    i=0;
    N=134;
    N_base=134;
    t=0;
    div_clk=1;
end
always @ (posedge (VCO_in))
begin
    i=i+1;
    if (i==10) div_clk=0;
    if (i == (N))
    begin
        i=0;
        t=$time;
        N=mod_ctrl; // load the modulus for the next division cycle
        // modulus-dependent output delay (in fs)
        if (N==N_base-7) delay=1e3; //standard delay is 1ps
        if (N==N_base-6) delay=1e3;
        if (N==N_base-5) delay=1e3;
        if (N==N_base-4) delay=1e3;
        if (N==N_base-3) delay=1e3;
        if (N==N_base-2) delay=1e3;
        if (N==N_base-1) delay=1e3;
        if (N==N_base) delay=1e3;
        if (N==N_base+1) delay=1e3;
        if (N==N_base+2) delay=1e3;
        if (N==N_base+3) delay=1e3;
        if (N==N_base+4) delay=1e3;
        if (N==N_base+5) delay=1e3;
        if (N==N_base+6) delay=1e3;
        if (N==N_base+7) delay=1e3;
        if (N==N_base+8) delay=1e3;
        # delay div_clk=1;
    end
end
assign {div_out}=div_clk;
endmodule

Loop-Filter block

`timescale 1fs / 1fs

module loop_block (I_cp, cp_ctrl, VCO_clk, V_ctrl);
input [64:1] I_cp; // current signal from CP block
input VCO_clk; // VCO signal
input cp_ctrl; // ctrl signal from CP block
output [64:1] V_ctrl; // Control voltage for VCO
wire [64:1] V_ctrl;
wire [64:1] I_cp;
integer fp_loop;
real z0,p0,p1,p2,A0,A1,A2,tau0,tau1,tau2,cap;
real I_current, I_old, t_scale, t_diff, V_cap, Vlp0,Vlp1,Vlp2,Vlp;
time t_current, t_old;
initial
begin
    z0=2*3.14*100e3;
    p0=2*3.14*500e3;
    p1=2*3.14*1e6;
    p2=2*3.14*5e6;
    cap=18.158e-12;
    // partial-fraction gains of the three real poles
    A0=p1*p2*(p0-z0)/((p1-p0)*(p2-p0)*z0*cap*p0);
    A1=p0*p2*(z0-p1)/((p1-p0)*(p2-p1)*z0*cap*p1);
    A2=p0*p1*(p2-z0)/((p1-p2)*(p0-p2)*z0*cap*p2);
    tau0=1/p0;
    tau1=1/p1;
    tau2=1/p2;
    t_scale=1e-15;
end
always @(posedge cp_ctrl or negedge cp_ctrl or posedge VCO_clk or negedge VCO_clk)
begin
    I_current=$bitstoreal(I_cp);
    t_current=$time;
    t_diff=(t_current-t_old)*t_scale;
    // integrator term: charge delivered since the last event
    V_cap=V_cap+(t_diff)*(I_old/cap);
    // exact exponential update of each first-order section
    Vlp0=Vlp0+(I_old*A0-Vlp0)*(1.00-$exp((t_diff)/(-tau0)));
    Vlp1=Vlp1+(I_old*A1-Vlp1)*(1.00-$exp((t_diff)/(-tau1)));
    Vlp2=Vlp2+(I_old*A2-Vlp2)*(1.00-$exp((t_diff)/(-tau2)));
    Vlp=(V_cap+Vlp0+Vlp1+Vlp2);
    t_old=$time;
    I_old =I_current;
end
assign{V_ctrl}=$realtobits(Vlp);
endmodule
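
The event-driven update used in the loop-filter block can be sketched outside the simulator. The Python fragment below is a minimal illustration, assuming a filter impedance of the form Z(s) = (1 + s/z0) / (s·cap·(1 + s/p0)(1 + s/p1)(1 + s/p2)), which is split into an integrator plus three first-order sections. The names `partial_fraction_gains`, `impedance` and `LoopFilter` are hypothetical, and `math.pi` replaces the 3.14 used in the listing.

```python
import math


def partial_fraction_gains(z0, poles, cap):
    """Gains A_i such that Z(s) = 1/(s*cap) + sum_i A_i / (1 + s/p_i)."""
    gains = []
    for i, p in enumerate(poles):
        others = [q for j, q in enumerate(poles) if j != i]
        num = p - z0
        den = z0 * cap * p
        for q in others:
            num *= q
            den *= (q - p)
        gains.append(num / den)
    return gains


def impedance(s, z0, poles, cap):
    """Direct evaluation of the assumed loop-filter impedance."""
    z = (1 + s / z0) / (s * cap)
    for p in poles:
        z /= (1 + s / p)
    return z


class LoopFilter:
    """Event-driven filter: states are advanced only when an event occurs."""

    def __init__(self, z0, poles, cap):
        self.cap = cap
        self.gains = partial_fraction_gains(z0, poles, cap)
        self.taus = [1.0 / p for p in poles]
        self.v_cap = 0.0
        self.v_lp = [0.0] * len(poles)

    def advance(self, current, dt):
        """Advance by dt seconds with the input current held constant."""
        self.v_cap += dt * current / self.cap
        for k, (a, tau) in enumerate(zip(self.gains, self.taus)):
            # exact solution of a first-order lag driven by a constant input
            self.v_lp[k] += (current * a - self.v_lp[k]) * (1.0 - math.exp(-dt / tau))
        return self.v_cap + sum(self.v_lp)


Z0 = 2 * math.pi * 100e3
POLES = [2 * math.pi * 500e3, 2 * math.pi * 1e6, 2 * math.pi * 5e6]
CAP = 18.158e-12
```

Because each section is solved exactly between events, the result does not depend on how an interval is subdivided; this is the property that lets the simulation avoid a fixed time step.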

MASH block

`timescale 1fs/1fs
module mash_block (clk, mod_ctrl);
input clk;
output [32:1] mod_ctrl;
wire [32:1] mod_ctrl;
real int_1, int_2, int_3, int_4, bit_res;
integer carry_1, carry_2, carry_3, carry_4, carry_4_old;
integer sum_1, sum_2, sum_3, sum_1_old, sum_2_old;
integer fp1,fp2,opn, count, div_ratio, sd_out, tmp;
integer dither;
real offset, modulation, prew_data, data_in;
real mean;
time t;
initial
begin
    fp1=$fopenr("modulation_data.dat");
    int_1=0;
    int_2=0;
    int_3=0;
    int_4=0;
    sum_1=0;
    sum_2=0;
    sum_3=0;
    sum_1_old=0;
    sum_2_old=0;
    carry_1=0;
    carry_2=0;
    carry_3=0;
    carry_4=0;
    carry_4_old=0;
    bit_res=65000; // input word resolution
    offset=bit_res*0.5;
    div_ratio=134;
    count=0;
end
always @(negedge(clk))
begin
    count=count+1;
    t=$time;
    dither=(({$random} %2)*2-1);
    // start reading modulation data after 0.41 ms (time unit is 1 fs)
    if (t>(0.41e-3*(1e15)))
    begin
        opn=$fscanf(fp1,"%f",prew_data);
        modulation=prew_data*bit_res;
    end
    // data1*4.00: the modulation amplitude is multiplied by 4
    data_in=offset+modulation;
    // integrator_1
    int_1 = int_1+ data_in;
    if (int_1>(bit_res-1))
    begin
        carry_1=1;
        int_1=int_1-bit_res;
    end
    else carry_1=0;
    // integrator_2
    int_2=int_2+int_1;
    if (int_2>(bit_res-1))
    begin
        carry_2=1;
        int_2=int_2-bit_res;
    end
    else carry_2=0;
    // integrator_3
    int_3=int_3+int_2;
    if (int_3>(bit_res-1))
    begin
        carry_3=1;
        int_3=int_3-bit_res;
    end
    else carry_3=0;
    // integrator_4
    int_4=int_4+int_3;
    if (int_4>(bit_res-1))
    begin
        carry_4=1;
        int_4=int_4-bit_res;
    end
    else carry_4=0;
    // noise-cancellation network: combine the four carries
    sum_1=carry_3+carry_4-carry_4_old;
    sum_2=sum_1-sum_1_old+carry_2;
    sum_3=sum_2-sum_2_old+carry_1;
    carry_4_old=carry_4;
    sum_1_old=sum_1;
    sum_2_old=sum_2;
    sd_out=sum_3+div_ratio;
end
assign{mod_ctrl}=sd_out;
endmodule
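
The accumulator chain and carry-combination network of the MASH block can be checked in isolation. The Python sketch below mirrors the four overflowing accumulators and the three difference stages with integer arithmetic; the function name `mash_sequence` and the constant-input test values are illustrative assumptions, and the dither and modulation file input are left out.

```python
def mash_sequence(data_in, modulus, n_steps):
    """Return the MASH output sequence (offset from the base divide ratio).

    data_in must satisfy 0 <= data_in < modulus.
    """
    acc = [0, 0, 0, 0]
    carry4_old = sum1_old = sum2_old = 0
    out = []
    for _ in range(n_steps):
        carries = []
        x = data_in
        for k in range(4):
            # each accumulator integrates the (wrapped) state of the previous one
            acc[k] += x
            if acc[k] > modulus - 1:
                acc[k] -= modulus
                carries.append(1)
            else:
                carries.append(0)
            x = acc[k]
        c1, c2, c3, c4 = carries
        # successive first differences of the higher-order carries
        sum1 = c3 + c4 - carry4_old
        sum2 = sum1 - sum1_old + c2
        sum3 = sum2 - sum2_old + c1
        carry4_old, sum1_old, sum2_old = c4, sum1, sum2
        out.append(sum3)
    return out
```

With a constant input, the long-term mean of the sequence approaches `data_in / modulus`, while the instantaneous values stay between -7 and +8, which corresponds to the sixteen moduli handled by the divider block.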
Phase-Frequency Detector

// phase-frequency detector
`timescale 1fs / 1fs

module PFD_block (V_clk, V_div, UP, DOWN);
input V_clk, V_div; // input from reference-clock and from divider
output UP, DOWN;
wire reset;
// both flip-flops are reset as soon as UP and DOWN are high together
assign {reset}=(UP && DOWN);
dff flip1( 1'b1, V_clk, reset, UP);
dff flip2( 1'b1, V_div, reset, DOWN);
endmodule

Sample/Hold block

`timescale 1fs / 1fs

module sh_block (I_cp, I_cp_ctrl, clk, V_out, SH_ctrl);
input [64:1] I_cp;
input I_cp_ctrl, clk;
output [64:1] V_out;
output SH_ctrl;
wire [64:1] I_cp;
reg [64:1] V_out;
reg SH_ctrl;
time t_current, t_old;
real int_cap, tscale, charge, i_current, i_old, V_sh;
initial
begin
    int_cap=18.833e-12; // integrating capacitance
    tscale=1e-15;
    SH_ctrl=0;
end
always @(posedge (I_cp_ctrl) or negedge (I_cp_ctrl))
begin
    i_current = $bitstoreal(I_cp);
    t_current = $time;
    // accumulate the charge delivered since the last current change
    charge = charge+(t_current-t_old)*tscale*(i_old);
    t_old = t_current;
    i_old = i_current;
end
always @(negedge(clk))
begin
    V_sh=V_sh+charge/int_cap;
end
VCO block

`timescale 1s / 1fs

module VCO_block (V_in,V_div);
output V_div; // output waveform (to the divider)
input [64:1] V_in; // input control voltage (from the loop-filter)
wire [64:1] V_in;
integer int_part,VCO_semiperiod_int;
time int_semiperiod;
reg VCO_clk;
real V_ctr, f_actual, VCO_semiperiod, gain, err_acc, f_run, diff, Wfr;
real VCO_semiperiod_act;
initial
begin
    assign V_ctr=$bitstoreal(V_in);
    VCO_clk=1;
    VCO_semiperiod=1.470e-10;
    VCO_semiperiod_act=1.470e-10;
    gain=100.00e6;
    Wfr=3.400e9;
end
always # (VCO_semiperiod_act-1e-15)
begin
    VCO_clk = ~ VCO_clk;
    f_run=Wfr+V_ctr*gain;
    VCO_semiperiod=1/(2*f_run);
    // sigma-delta first order
    diff=VCO_semiperiod-VCO_semiperiod_act;
    err_acc=err_acc+diff;
    int_semiperiod=err_acc*1e15;
    VCO_semiperiod_act=int_semiperiod/(1.0e15);
    // end of sigma-delta code
    if ((VCO_semiperiod< 5e-12)||(VCO_semiperiod> 1e-9))
        begin
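
The first-order sigma-delta step in the VCO block compensates for the 1-fs time resolution of the simulator: each ideal semi-period is rounded to a whole number of time steps and the rounding error is carried into the next semi-period, so that the error does not accumulate. A minimal Python sketch of this idea follows; the function name `quantize_semiperiods` is hypothetical, and the error-feedback form shown here is an equivalent reformulation rather than a line-by-line copy of the listing.

```python
def quantize_semiperiods(ideal, step=1e-15):
    """Round each ideal semi-period (in seconds) to an integer number of
    time steps, carrying the rounding error forward to the next value."""
    error = 0.0
    quantized_steps = []
    for t in ideal:
        target = t / step + error   # ideal duration plus carried-over error
        n = round(target)           # what the simulator can actually realise
        error = target - n          # residual, bounded by half a step
        quantized_steps.append(n)
    return quantized_steps
```

For a constant 3.4-GHz oscillation the ideal semi-period is about 147058.82 fs, so the quantized sequence alternates between 147058 and 147059 steps while the accumulated time stays within one step of the ideal value.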


VCO_sampling_block

// Signal recorder for VCO and for the ideal VCOid
	`timescale 1fs / 1fs

module VCO_samp_block (VCO_id, VCO);
input VCO_id; // ideal output clock
input VCO; // signal from VCO
time t_VCO, t_VCOid;
integer ph_VCO, ph_VCOid;
real t_rec_start;
initial
begin
    t_rec_start=0.41e-3*(1e15);
    // wait until the recording start time, then open both log files
    #t_rec_start;
    ph_VCO = $fopen("time_VCO");
    ph_VCOid = $fopen("time_VCOid");
end
always @ (posedge(VCO_id) or negedge (VCO_id))
begin
    t_VCOid=$time;
    $fdisplay(ph_VCOid,"%d",t_VCOid);
end
always @ (posedge(VCO) or negedge(VCO))
begin
    t_VCO=$time;
    $fdisplay(ph_VCO,"%d", t_VCO);
end
endmodule
APPENDIX B

OSCILLATION FREQUENCY ESTIMATION

This appendix presents the derivation of an approximate second-order equation to calculate the oscillation frequency of a fifth-order system. The design parameters can be found in Table 8.2.

We start by considering the loop transfer function:

\[ H_{\text{loop}}(s) = \frac{K_{\text{VCO}}}{s^2 \cdot N_b} \cdot \frac{I_{\text{CP}}}{2 \cdot \pi \cdot C_p} \cdot \frac{1 + \frac{s}{z_1}}{(1 + \frac{s}{\omega_3})(1 + \frac{s}{\omega_4})(1 + \frac{s}{\omega_5})} \]

(B.1)

\[ G = \frac{K_{\text{VCO}}}{N_b} \cdot \frac{I_{\text{CP}} \cdot \omega_3 \cdot \omega_4 \cdot \omega_5}{2 \cdot \pi \cdot C_p \cdot z_1} \]

(B.2)

\[ H_{\text{loop}}(s) = G \cdot \frac{z_1 + s}{s^5 + s^4 (\omega_3 + \omega_4 + \omega_5) + s^3 (\omega_5 \omega_3 + \omega_5 \omega_4 + \omega_4 \omega_3) + s^2 \omega_4 \omega_3 \omega_5} \]

The transfer function from \( \Sigma\Delta \) input to PFD input can be expressed as:

\[ H_{\theta_{\Sigma\Delta}}(s) = \frac{2 \cdot \pi}{N_b} \cdot \frac{1}{e^{s \cdot T_{\text{REF}}} - 1} \cdot \frac{1}{1 + H_{\text{loop}}(s)} \]  

(B.3)

Expanding the loop transfer function the following expression is found:

\[ H_{\theta_{\Sigma\Delta}}(s) = \frac{2 \cdot \pi}{N_b \cdot s T_{\text{REF}}} \cdot \frac{s^5 + s^4 (\omega_3 + \omega_4 + \omega_5) + s^3 (\omega_{\text{INT\_PROD}}) + s^2 \omega_4 \omega_3 \omega_5}{s^5 + s^4 (\omega_3 + \omega_4 + \omega_5) + s^3 (\omega_{\text{INT\_PROD}}) + s^2 \omega_4 \omega_3 \omega_5 + G s + G z_1} \]

where the new parameter \( \omega_{\text{INT\_PROD}} \) is defined as \( \omega_{\text{INT\_PROD}} = \omega_5 \cdot \omega_3 + \omega_5 \cdot \omega_4 + \omega_4 \cdot \omega_3 \).

By applying a unit step to the \( \Sigma\Delta \) modulator, the Laplace transform of the phase error is given by:

\[ \Phi_{\text{err}}(s) = H_{\theta_{\Sigma\Delta}}(s) \cdot \frac{1}{s} \]  

(B.4)

After dividing the numerator by the denominator and setting:

\[ \omega_{n1} = \sqrt{\frac{\omega_n^2}{1 + \omega_n^2 \cdot \left( \frac{1}{\omega_3^2} + \frac{1}{\omega_4^2} + \frac{1}{\omega_5^2} + \frac{1}{\omega_3 \omega_4} + \frac{1}{\omega_3 \omega_5} + \frac{1}{\omega_4 \omega_5} - \frac{1}{z_1 \omega_3} - \frac{1}{z_1 \omega_4} - \frac{1}{z_1 \omega_5} \right)}} \]

\[ \zeta_1 = \frac{1}{2} \cdot \omega_{n1} \cdot \left( \frac{1}{z_1} - \frac{1}{\omega_3} - \frac{1}{\omega_4} - \frac{1}{\omega_5} \right) \quad (B.5) \]

A second order approximated transfer function can be derived:

\[ H_{\theta_{\Delta}} (s) = \left( \frac{\omega_{n1}}{\omega_n} \right)^2 \cdot \frac{2 \cdot \pi}{N_b} \cdot \frac{1}{T_{REF}^2} \cdot \frac{1}{s^2 + 2 \cdot \zeta_1 \cdot \omega_{n1} \cdot s + \omega_{n1}^2} \quad (B.6) \]

The oscillation frequency is given by:

\[ \omega_O = \omega_{n1} \cdot \sqrt{1 - \zeta_1^2} \quad (B.7) \]

From the value of the oscillation frequency, the max counter value can be calculated as:

\[ \text{max\_value} = \frac{\pi}{T_{REF} \cdot \omega_O} \quad (B.8) \]
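
As a numerical illustration of (B.5), (B.7) and (B.8), the Python sketch below evaluates the corrected natural frequency, the damping factor and the resulting maximum counter value. Several things here are assumptions rather than values from the thesis: \( \omega_n \) is taken to be the natural frequency of the loop with the three parasitic poles ignored, the correction term is obtained by truncating the expansion of \( (1 + s/z_1) / \prod_i (1 + s/\omega_i) \) after the second-order term, the function names are hypothetical, and the example frequencies are illustrative and do not come from Table 8.2.

```python
import math


def second_order_approx(omega_n, z1, poles):
    """Return (omega_n1, zeta1) for a loop with zero z1 and extra real poles.

    The first- and second-order Taylor coefficients of
    (1 + s/z1) / prod(1 + s/w_i) give the damping and frequency corrections.
    """
    inv = [1.0 / w for w in poles]
    a = 1.0 / z1 - sum(inv)
    b = (sum(x * x for x in inv)
         + sum(inv[i] * inv[j]
               for i in range(len(inv)) for j in range(i + 1, len(inv)))
         - sum(inv) / z1)
    omega_n1 = omega_n / math.sqrt(1.0 + omega_n ** 2 * b)
    zeta1 = 0.5 * omega_n1 * a
    return omega_n1, zeta1


def max_counter_value(omega_n1, zeta1, t_ref):
    """Half an oscillation period expressed in reference-clock cycles."""
    omega_o = omega_n1 * math.sqrt(1.0 - zeta1 ** 2)
    return math.pi / (t_ref * omega_o)


# Illustrative (assumed) values: 50-kHz natural frequency, 100-kHz zero,
# poles at 0.5, 1 and 5 MHz, 26-MHz reference clock.
EXAMPLE_OMEGA_N = 2 * math.pi * 50e3
EXAMPLE_Z1 = 2 * math.pi * 100e3
EXAMPLE_POLES = [2 * math.pi * f for f in (500e3, 1e6, 5e6)]
EXAMPLE_T_REF = 1.0 / 26e6
```

When the additional poles are pushed to very high frequency, the sketch reduces to the familiar second-order result, \( \omega_{n1} \to \omega_n \) and \( \zeta_1 \to \omega_n / (2 z_1) \), which is a quick sanity check on any chosen parameter set.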



APPENDIX C

PUBLICATIONS

During this study, the following journal and conference papers were presented:


The papers are reproduced on the following pages with permission of their respective copyright holders.
1-V Power Supply CMOS Cascode Amplifier
Torsten Lehmann and Marco Cassia

Abstract—In this paper, we design a folded cascode operational transconductance amplifier in a standard CMOS process, which has a measured 69-dB dc gain, a 2-MHz bandwidth, and compatible input- and output voltage levels at a 1-V power supply. This is done by a novel current driven bulk (CDB) technique, which reduces the MOST threshold voltage by forcing a constant current through the transistor bulk terminal. We also look at limitations and improvements of this CDB technique.

Index Terms—1-V OTA, CMOS, current driven bulk, ultra-low voltage.

I. INTRODUCTION

One of the most serious design constraints when making integrated analog circuits for systems with ultra-low supply voltages is the value of the MOS threshold voltage $V_{th}$. A typical 3.3-V process has $V_{th}$ in the range 0.6–0.7 V. Used with a 1-V power supply, this gives a signal swing of at most 100 mV on a transistor gate, if room for two drain–source saturation voltages of 100 mV is needed (which it most certainly is). Several approaches to ultra-low voltage supply circuit design have recently been described; e.g., based on charge pumps [1], bulk drive [2], floating gates [3] or limited common-mode range input circuits [4], [5]. In this paper, we shall look at how to reduce the MOS threshold voltage in a standard CMOS process, and we shall use the reduced-$V_{th}$ transistors to implement a 1-V folded cascode operational transconductance amplifier (OTA) with compatible input- and output levels. The advantages of this technique are that 1) possible voltage stress, increased power consumption, and noise coupling associated with a charge pump are avoided; 2) the input transistor pair gain reduction and input impedance reduction associated with a bulk drive are avoided; 3) special processing and calibration steps associated with floating gate devices are avoided; 4) continuous time signal processing is possible; and 5) standard circuit topologies, such as cascode amplifiers, can be used. In Section II, we look at the current driven bulk technique for reducing $V_{th}$. In Section III, we look at the unwanted effects of this technique and how to overcome these. In Section IV, we implement the OTA. In Section V, we present measurements from an experimental chip, and in Section VI, we draw the conclusions.

Manuscript received November 21, 2000; revised February 26, 2001.
T. Lehmann is with Cochlear Ltd., Lane Cove 2066 NSW, Australia (e-mail: tlehmann@cochlear.com.au).
M. Cassia is with Ørsted DTU, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark (e-mail: marco@it.dtu.dk).
Publisher Item Identifier S 0018–9200(01)$10.00 © 2001 IEEE
Fig. 2. Threshold voltage versus bulk bias current.

Fig. 3. Bias circuit eliminating unknown BJT β’s.

Fig. 4. CDB compared with normal MOST common-source frequency response.

Fig. 5. Cascoded CDB compared with normal MOST common-source frequency response.

In Fig. 2, the simulated threshold voltage as a function of the bulk bias current $I_{BB}$ is shown for a $W/L = 40 \mu m / 10 \mu m$ device in a 0.5-μm process. We use a p-channel transistor as we can access the bulk terminal for this device without fear of latchup in a standard n-well process.

Because of the exponential $I-V$ relation of the diode, the exact value of $I_{BB}$ is not important for the resulting threshold voltage. However, the parasitic bipolar transistor can have quite high base–collector current gains [8]; as we shall see below, this puts some limitations on the applicability of the CDB technique.

To keep the BJT current gains as low as possible, the layout shown in Fig. 1(c) can be used: to keep the substrate–collector gain $\beta_{CS}$ low, the bulk connection is completely surrounded by the source junction; to keep the drain–collector gain $\beta_{CD}$ low, a longer than minimum MOS channel length should be used. In the simulations, we have used current gains in the order of 100.

Another problem with the BJT current gains is that they are usually unknown to the designer. This is solved by “measuring” the current gain using the bias circuit in Fig. 3. A transistor with current driven bulk is set up between a current sink $I_{D,C} = I_D + I_{CD}$ and a current source $I_{S,E} = I_P + I_E + I_R$. The circuit feedback will now cause a bulk bias current $I_{BB}$, and hence a bias voltage $V_{bias}$ such that $I_{S,E} = I_P + I_{BE} + I_R$, regardless of the actual values of the β’s. For an ideal drain current $I_D$, we would probably choose $I_{D,C} \approx 1.1 I_D$, $I_{S,E} \approx 1.3 I_D$ and $I_R \approx 0.1 I_D$. For the bias circuit to work, we must have $V_{BE} < V_{thN} + I_R R$ and $|V_{thP}| + V_{DSsat} < V_{thN} + I_R R$, where $V_{thP}$ is the threshold voltage of the current driven bulk device. If the MOST threshold voltages are high compared with a $V_{BE}$ and if $|V_{thP}| \geq V_{thN}$, the level shifter $I_R R$ can be omitted.

III. CDB UNWANTED EFFECTS

Current driving the bulk introduces a number of unwanted effects in the resulting device. The first obvious one is the parallel connection of the BJT emitter/collector with the MOS source/drain; this must lower the device output impedance. If the BJT emitter current is much smaller than the MOS source current, the effect is negligible. If not, to get a reasonable output impedance, the BJT must be in the active region; i.e., the device drain–source voltage must be less than about −200 mV (simulations can be found in [9]). Noise from the BJT would also enter the circuit, but again, if the current in this device is low, we would expect only a small amount of added noise.

The largest current available for discharging the bulk–drain capacitance is $I_{BB}$; likewise, the largest current available for charging it is the source quiescent current divided by the base–emitter current gain $\beta_{BE}/(\beta_{CS} + \beta_{CD} + 1)$. These are small currents, which means that slewing-rate effects might occur if the bulk–drain voltage is changed.

Together with the bulk transconductance and the base–emitter impedance, the drain–bulk capacitance also causes a low frequency pole-zero pair. This can be seen in Fig. 4, which shows the simulated frequency response of a common-source amplifier ($W/L = 40 \mu m / 1 \mu m$ device with $I_{BB} = 30 \mu A$ and $I_D = 10 \mu A$).

It is evident that the drain–bulk capacitance has a major impact on high-frequency circuit performance. Fortunately, there are several ways to cancel its effect. First, one can put a decoupling capacitor between bulk and source: as both the slew-rate
and the pole-zero pair are caused by the bulk transconductance through a nonconstant bulk–source voltage, both effects can be canceled this way. If the source is at a constant potential, a cascode can be used to keep the drain potential constant, thus eliminating the current in the capacitor. In Fig. 5, a cascode has been added to the common-source amplifier, and we see how the frequency response is greatly improved. A third way to reduce the slew-rate limitation (for instance in a CDB differential pair) is proposed in [9]: the type II CDB technique. By adding a third collector to the BJT and shorting this to its base, the current available for slewing can be increased by a base–collector current gain.

IV. 1-V FOLDED CASCODE OTA

Fig. 6 shows our OTA. It is a standard folded cascode transconductance amplifier with a CDB differential pair, and a CDB output current mirror (for simplicity, a straightforward bias circuit is shown). Assuming a standard strong inversion design with $V_{DD} = 1$ V, $V_{SS} = 0$ V, $V_{th} = 0.6$ V, and $V_{DSsat} = 0.1$ V, the range of the input common-mode voltage would be

$$V_{DSsat} - |V_{th}| = -0.5 \ V \lesssim V_{CM} \lesssim 0.2 \ V$$

which is not compatible with the output voltage range

$$2V_{DSsat} = 0.2 \ V \lesssim V_{out} \lesssim 0.8 \ V = V_{DD} - 2V_{DSsat}. \qquad (3)$$

Note that there is only just enough voltage for the cascode current mirror to function, which does not make a particularly good design.

Reducing the threshold voltage of the differential pair ($M_1-M_2$) directly improves the common-mode input range. Also, operating the input pair in subthreshold reduces the gate–source voltage, and improves the common-mode input range. Note that we use only one current source for the common bulk terminal in the pair (rather than individual current drives for each transistor); otherwise, we would have mismatch problems in the pair. Also note that any noise injected because of the current drive will enter the amplifier as a common-mode signal and thus be rejected. Assuming we can reduce the threshold voltage until $|V_{th}| = 0.4$ V by current driving the bulk, we now get

$$2V_{DSsat} - |V_{th}| = -0.2 \ V \lesssim V_{CM} \lesssim 0.6 \ V = V_{DD} - |V_{th}| - V_{DSsat}. \qquad (4)$$

Thus, we now have a 0.4-V overlap in the valid input and output ranges. To get more voltage room for the current mirror, we also current drive the bulk of the transistors in.
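
The overlap figures quoted above follow from simple interval arithmetic. The short Python sketch below reproduces them from the numbers stated in the text; the helper name `overlap` and the variable names are illustrative only.

```python
def overlap(range_a, range_b):
    """Width of the intersection of two (low, high) voltage ranges, in volts."""
    low = max(range_a[0], range_b[0])
    high = min(range_a[1], range_b[1])
    return max(0.0, high - low)


V_DD, V_DSSAT = 1.0, 0.1

# Output range of the cascoded stage: two saturation voltages from each rail.
output_range = (2 * V_DSSAT, V_DD - 2 * V_DSSAT)   # 0.2 V to 0.8 V

# Input common-mode ranges as stated in the text.
cm_range_standard = (-0.5, 0.2)   # |V_th| = 0.6 V, strong inversion
cm_range_cdb = (-0.2, 0.6)        # |V_th| = 0.4 V, current driven bulk
```

With these numbers the standard design has no usable overlap between input and output ranges, whereas the reduced threshold voltage gives the 0.4-V overlap mentioned above.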

As the drains of all the current driven bulk transistors are cascoded, we do not expect any parasitic poles from the CDB technique. However, when a large common-mode input signal change is applied, the bulk–drain voltage of the input pair will change, which might cause slewing in this stage. We reduce this slewing effect by adding a coupling capacitor $C_X$ between the bulk and the source of the pair. In such a low-voltage design, it is an advantage to operate the cascoding transistors ($M_3-M_6$) in subthreshold, as that makes it easier to generate the bias voltages $V_{bias3}$ and $V_{bias4}$; it is, however, not critical. The other transistors ($M_7-M_{13}$) should work in strong inversion as good matching gives the lowest overall offset error. The transistor dimensions used in our amplifier are shown in Table I.

<table>
<thead>
<tr>
<th>MOST</th>
<th>$W/\mu m$</th>
<th>$L/\mu m$</th>
<th>$I_D/\mu A$</th>
</tr>
</thead>
<tbody>
<tr>
<td>$M_1-M_2$</td>
<td>400</td>
<td>1</td>
<td>10</td>
</tr>
<tr>
<td>$M_3-M_4$</td>
<td>20</td>
<td>1</td>
<td>10</td>
</tr>
<tr>
<td>$M_5-M_6$</td>
<td>40</td>
<td>1</td>
<td>10</td>
</tr>
<tr>
<td>$M_7-M_8$</td>
<td>40</td>
<td>1</td>
<td>10</td>
</tr>
<tr>
<td>$M_9-M_{10}$</td>
<td>20</td>
<td>1</td>
<td>20</td>
</tr>
<tr>
<td>$M_{11}$</td>
<td>80</td>
<td>1</td>
<td>20</td>
</tr>
<tr>
<td>$M_{12}$</td>
<td>1</td>
<td>50</td>
<td>$\sim 0.01$</td>
</tr>
<tr>
<td>$M_{13}$</td>
<td>1</td>
<td>50</td>
<td>$\sim 0.01$</td>
</tr>
</tbody>
</table>
Fig. 7. Measured CDB OTA dc responses at $V_{DD} = 1.0$ V at different common-mode voltages.

Fig. 8. Measured CDB OTA dc responses at $V_{DD} = 0.75$ V with and without (two bottom traces) bulk current.

V. MEASUREMENTS

An experimental amplifier has been fabricated in a standard 0.5-$\mu$m CMOS process. It has been designed with a quite high total bias current, 40 $\mu$A, such that it can drive a 20-pF off-chip capacitive load while having a 1-MHz-range gain bandwidth (a version for on-chip applications is straightforward to do by transistor scaling). The coupling capacitor $C_X$ can be chosen to 10 or 0 pF. The nominal value of the bulk current is $I_{BB} = 10$ nA, which (given a BJT current gain of about 100) gives a 10% increase in the differential pair quiescent current. The strong inversion transistors have been designed to operate with effective gate–source voltages around 100 mV.

Fig. 7 shows the measured dc transfer function of our amplifier for different common-mode input voltages, using a 1-V power supply; the high-gain region is readily identified. We also notice that the input referred offset error of the amplifier is less than 1 mV. Fig. 8 shows the dc transfer function using a 0.75-V power supply; this figure also shows the transfer function when no transistors are driven with a bulk current. It is evident that our CDB technique enables us to use this ultra-low power supply.

Fig. 9 shows the dc gain as a function of the common-mode input voltage. We see that at a 1-V power supply, we have a 0.65-V common-mode input range in which the amplifier has at least a 62-dB gain; and an overlap of about 0.3 V in the input and output voltage ranges. Fig. 10 compares the measured and simulated ac characteristics of the amplifier when loaded with 20 pF. The measured and simulated amplitude characteristics agree very well. The measured phase margin is somewhat worse than the simulated one. This is probably due to inaccurate modeling of the BJT. Quantitatively, the measurements are the same for all common-mode input voltages; also, they are the same regardless of whether $C_X$ is present or not. The slew rate is also virtually independent of the presence of $C_X$, and the CDB-induced slewing shows only at a 0.7-V power supply, which could be because the BJT current gain is low. It has a gain bandwidth of 2 MHz and a phase margin of 57°. These are all respectable data for any 1-V amplifier, which means that the CDB technique can be applied wherever low-voltage LF analog signal processing is required (e.g., in hearing aids, implants, watches, or similar battery operated devices). Table II summarizes the amplifier characteristics.

VI. CONCLUSION

In this paper, we implemented an ultra-low supply voltage folded cascode OTA in a standard CMOS process. At a 1-V power supply, it has a 0.3-V overlap in the allowed input common-mode range and the output voltage range, a dc gain of 69 dB and a 2-MHz bandwidth. The amplifier works with a power supply of less than 0.8 V (with a somewhat degraded performance, though). This design was made possible by a new technique to lower the MOST threshold voltage, current driven
bulk, where we force a constant current out of the bulk terminal. Initially, the drain–bulk capacitance gives these circuits poor high-frequency performance, but its effect can be compensated for using cascodes as experimentally verified—or by using additional circuits in the CDB structure.

### TABLE II

<table>
<thead>
<tr>
<th>Supply voltage</th>
<th>1.0 V</th>
<th>0.8 V</th>
<th>0.7 V</th>
</tr>
</thead>
<tbody>
<tr>
<td>Common-mode input range</td>
<td>0.0 V–0.65 V</td>
<td>0.0 V–0.4 V</td>
<td>0.0 V–0.3 V</td>
</tr>
<tr>
<td>High gain output range</td>
<td>0.35 V–0.75 V</td>
<td>0.25 V–0.5 V</td>
<td>0.2 V–0.4 V</td>
</tr>
<tr>
<td>Output saturation limits</td>
<td>0.1 V–0.9 V</td>
<td>0.15 V–0.65 V</td>
<td>0.1 V–0.6 V</td>
</tr>
<tr>
<td>DC gain</td>
<td>62 dB–69 dB</td>
<td>46 dB–53 dB</td>
<td>33 dB–36 dB</td>
</tr>
<tr>
<td>Gain-bandwidth</td>
<td>2.0 MHz</td>
<td>0.8 MHz</td>
<td>1.3 MHz</td>
</tr>
<tr>
<td>Slew-rate</td>
<td>0.5 V/μs</td>
<td>0.4 V/μs</td>
<td>0.1 V/μs</td>
</tr>
<tr>
<td>Phase margin</td>
<td>57°</td>
<td>54°</td>
<td>48°</td>
</tr>
</tbody>
</table>

### REFERENCES


**Torsten Lehmann** was born in Bagsværd, Denmark, in 1967. He received the M.Sc. degree in electrical engineering from the Technical University of Denmark, Lyngby, in 1991, and the Ph.D. degree in 1995 for the work “hardware learning in analog VLSI neural networks.” He spent two years as a Research Fellow funded by the European Union at the University of Edinburgh, Scotland, where he worked with biologically inspired neural systems using pulse stream techniques. From 1997 to 2000, he was an Assistant Professor in microelectronics at the Technical University of Denmark, working with low-power low-noise low-voltage analog and mixed analog-digital circuits. He worked briefly as an ASIC design engineer with Microtronics A/S, Denmark. He is currently with Cochlear Ltd., Australia, developing cochlea implants. His main research interests are in solid-state circuits and systems (analog and digital), medical electronics and artificial neural networks.

**Marco Cassia** was born in Bergamo, Italy, in 1974. He received the M.Sc. degree in engineering from the Technical University of Denmark, Lyngby, Denmark, in May 2000 and the M.Sc. degree in electrical engineering from Politecnico di Milano, Italy, in July 2000. He is currently working toward the Ph.D. degree at the Technical University of Denmark. His main research interests are in low-voltage low-power analog systems.
1 V OTA USING CURRENT DRIVEN BULK CIRCUITS

TORSTEN LEHMANN
Cochlear Ltd., 14 Mars Road,
Lane Cove 2066 NSW, Australia
tlehmann@cochlear.com.au

MARCO CASSIA
Ørsted DTU, Building 348, Ørsteds plads,
Technical University of Denmark,
DK-2800 Kgs. Lyngby, Denmark
mca@oersted.dtu.dk

Received 21 March 2000
Revised 18 November 2001

We show how the MOST threshold voltage can be reduced simply by forcing a constant current through the transistor bulk terminal. We characterize two versions of the resulting current driven bulk device by simulations, and conclude that this is a good method for improving circuit performance when the voltage supply is very low. Finally we show how the technique can be used to implement a 1 V folded cascode OTA with compatible input and output voltage ranges.

1. Introduction

Small, battery-powered systems such as hearing aids or watches are usually powered by a single-element battery (e.g., a zinc–air battery) with a terminal voltage range of 1 V–1.2 V. Designing analogue blocks, such as oscillators, amplifiers or analogue-to-digital converters, that work from such a low supply voltage is hard, and the use of voltage doublers often becomes necessary, e.g., in Ref. 1. Avoiding such voltage doublers, though, is advantageous from power consumption, power supply noise and voltage stress points of view. Recently, several papers have also been published on generic low voltage supply amplifiers, e.g., using bulk drive,$^{2}$ floating gates$^{3}$ or limited common-mode range input circuits.$^{4,5}$ Each of these methods has its limitations, such as low input stage gain, necessary trimming or discrete time signal processing.

*This paper was recommended by Eby G. Friedman.
In this paper, we explore a different approach to low voltage circuit design, namely reduction of the MOS transistor threshold voltage, $V_{th}$, whose value is one of the most serious design constraints for analogue circuits in standard CMOS processes. A typical 3.3 V process has $V_{th}$ in the range 0.6 V–0.7 V. Used with a 1 V power supply, this gives a signal swing of at most 100 mV on a transistor gate, if room for two drain–source saturation voltages, $V_{DS,sat}$ of 100 mV is needed (which it most certainly is). We shall look at how to reduce the MOS threshold voltage in a standard CMOS process, and we shall use the reduced-$V_{th}$ transistors to implement a 1 V folded cascode Operational Transconductance Amplifier (OTA) with compatible input and output levels. In Sec. 2 we look at the current driven bulk technique for reducing $V_{th}$; in Sec. 3 we look at the limitations of this technique; in Sec. 4 we present a better, type II, technique; and in Sec. 5 we implement the OTA.

2. Current Driven Bulk

The threshold voltage of a MOS transistor as a function of the bulk-source voltage $V_{BS}$ is given by

$$V_{th} = V_{th0} + \gamma \left( \sqrt{2\phi_F - V_{BS}} - \sqrt{2\phi_F} \right), \qquad (1)$$

where $V_{th0}$ is the zero bias threshold voltage, $\gamma$ is the bulk effect factor, $\phi_F$ is the Fermi potential. For p-channel transistors, $2\phi_F \approx -0.7$ V and $\gamma \approx -0.5 \sqrt{V}$, typically, and a bulk bias $V_{BS}$ is normally $> 0$ V, which numerically increases the threshold voltage. However, by biasing $V_{BS} < 0$ V we can actually (numerically) decrease the threshold voltage.$^{6,7}$
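
To give a feeling for the numbers in Eq. (1), the Python sketch below evaluates the threshold voltage of a p-channel device for a few bulk biases. It is only an illustration under stated assumptions: the square roots are taken of the magnitudes of their arguments (needed for the p-channel sign convention, where $2\phi_F$ is negative), the zero-bias threshold $V_{th0} = -0.65$ V is an assumed value inside the 0.6 V–0.7 V range quoted earlier, and the function name `threshold_voltage` is hypothetical.

```python
import math


def threshold_voltage(v_bs, v_th0=-0.65, gamma=-0.5, two_phi_f=-0.7):
    """Threshold voltage (V) of a p-channel MOST versus bulk-source voltage.

    Square roots are applied to magnitudes so that the expression is
    defined for the negative p-channel parameters (an assumption).
    """
    return v_th0 + gamma * (math.sqrt(abs(two_phi_f - v_bs))
                            - math.sqrt(abs(two_phi_f)))


# Forward biasing the bulk-source diode by about one diode drop (assumed 0.6 V)
# reduces |V_th|; a reverse bias of 1 V increases it.
vth_forward = threshold_voltage(-0.6)
vth_zero = threshold_voltage(0.0)
vth_reverse = threshold_voltage(1.0)
```

Under these assumptions a forward bias of 0.6 V lowers the threshold magnitude from 0.65 V to roughly 0.39 V, while a 1-V reverse bias raises it to about 0.88 V.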

To reduce the threshold voltage as much as possible, we obviously want the bulk bias $|V_{BS}|$ as high as possible. We will, however, forward bias the bulk-source diode, i.e., the base-emitter diode of the associated parasitic bipolar transistor (BJT), thereby turning on this BJT; thus, $|V_{BS}|$ is limited by how much current we can tolerate in the BJT. Now, this is the idea of the new current driven bulk (CDB) circuits, see Fig. 1: instead of voltage driving the bulk, where we would need a considerable safety margin to hold the current level in the bipolar transistor below a certain value, $I_{\text{max}}$, we simply force a current, $I_{\text{BB}} = I_{\text{max}}/(\beta_{\text{CS}} + \beta_{\text{CD}} + 1)$, through the diode, where the $\beta$’s are the two base–collector current gains of the BJT; this will always give us the largest possible bulk bias (namely the diode forward voltage). In Fig. 2 the simulated threshold voltage as a function of the bulk bias current $I_{\text{BB}}$ is shown for a $W/L = 40 \, \mu\text{m}/10 \, \mu\text{m}$ device in a 0.5 $\mu\text{m}$ process. We use a p-channel transistor as we can access the bulk terminal for this device without fear of latch-up in a standard n-well process.

Fig. 1. Current drive of bulk terminal. (a) circuit, (b) with parasitic BJT, (c) layout.

Because of the exponential $I–V$ relation of the diode, the exact value of $I_{\text{BB}}$ is not important for the resulting threshold voltage. However, the parasitic bipolar transistor can have quite high base–collector current gains; as we shall see below, this puts some limitations on the applicability of the CDB technique. To keep the BJT current gains as low as possible, the layout shown in Fig. 1(c) can be used: to keep the substrate–collector gain ($\beta_{\text{CS}}$) low, the bulk connection is completely surrounded by the source junction; to keep the drain–collector gain ($\beta_{\text{CD}}$) low, a longer than minimum MOS channel length should be used. In the simulations we have used current gains on the order of 100.
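The weak dependence on the exact bias current can be illustrated with an ideal-diode estimate. The following is only a sketch: the function names, the tolerated current of 2 µA, the saturation current and the thermal voltage are assumed values, not taken from the process used in the text.

```python
import math

def bulk_bias_current(i_max, beta_cs, beta_cd):
    """I_BB = I_max / (beta_CS + beta_CD + 1), as given in the text."""
    return i_max / (beta_cs + beta_cd + 1.0)

def diode_forward_voltage(i_bb, i_sat=1e-17, v_t=0.02585):
    """Ideal-diode estimate V = V_T * ln(I / I_S); i_sat and v_t are assumed."""
    return v_t * math.log(i_bb / i_sat)

if __name__ == "__main__":
    # Assumed tolerated BJT current of 2 uA with both gains equal to 100.
    i_bb = bulk_bias_current(i_max=2e-6, beta_cs=100, beta_cd=100)
    for scale in (0.5, 1.0, 2.0):
        v = diode_forward_voltage(scale * i_bb)
        print(f"I_BB = {1e9 * scale * i_bb:5.1f} nA -> bulk bias about {v:.3f} V")
```

In this idealized model, doubling or halving $I_{\text{BB}}$ moves the bulk bias by only $V_T \ln 2$, which is why the bias current does not need to be set precisely.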

![Fig. 2. Threshold voltage versus bulk bias current.](image1)

![Fig. 3. Bias circuit eliminating unknown BJT $\beta$’s.](image2)
Another problem with the BJT current gains is that they are usually unknown to the designer. This is solved by “measuring” the current gain using the bias circuit in Fig. 3: a transistor with current driven bulk is set up with a drain current $I_{D,C}$ and a source current $I_{S,E}$ ($I_{S,E}$ includes the emitter current of the parasitic BJT and $I_{D,C}$ the $I_{CD}$ collector current). The circuit feedback will now cause a bulk bias current $I_{BB}$, and hence a bias voltage $V_{bias}$, such that $I_{S,E} = I_D + I_{BB}(1 + \beta_{CS} + \beta_{CD})$, regardless of the actual values of the $\beta$’s. For an ideal drain current $I_D$, we would probably choose $I_{D,C} \approx 1.1I_D$ and $I_{S,E} \approx 1.2I_D$. For the bias circuit to work, we must have $V_{BE} < V_{thN}$ and $|V_{thP}| < V_{thN}$, where $V_{thP}$ is the threshold voltage of the current driven bulk device; in most processes, the latter is not a problem. The requirement can be relaxed by including the dashed level-shifter in the figure.

3. CDB Limitations

Current driving the bulk introduces a number of unwanted effects in the resulting device. The first obvious one is the parallel connection of the BJT emitter/collector with the MOS source/drain; this must lower the device output impedance. If the BJT emitter current is much smaller than the MOS source current, the effect is negligible. If not, to get a reasonable output impedance, the BJT must be in the active region; i.e., the device drain–source voltage must be less than about $-200$ mV. Fig. 4 shows the output characteristic for a $W/L = 40 \mu m/1 \mu m$ device with $I_{BB} = 10$ nA at different ideal device saturation currents.

The largest current available for discharging the bulk-drain capacitance is $I_{BB}$; likewise, the largest current available for charging it is the source quiescent current divided by the base emitter current gain: $I_S/(\beta_{CS} + \beta_{CD} + 1)$. These are small currents, which means that slew-rate effects might occur if the bulk-drain voltage is changed. This can be seen in Fig. 5, which shows the step response of a source follower ($W/L = 40 \mu m/1 \mu m$ device with $I_{BB} = 30$ nA and $I_D = 10$ $\mu A$).
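To get a feel for the magnitudes involved, the two current limits can be turned into slew-rate estimates with $dV/dt = I/C$. This is a back-of-the-envelope sketch: the drain-bulk capacitance of 100 fF and the combined current gain of 200 are assumed values chosen only for illustration, and the source quiescent current is taken equal to $I_D$.

```python
def slew_rate(current, capacitance):
    """Slew rate dV/dt = I / C, returned in V/s."""
    return current / capacitance

if __name__ == "__main__":
    c_db = 100e-15     # assumed drain-bulk capacitance (not from the text)
    i_bb = 30e-9       # bulk bias current used for the source follower example
    i_s = 10e-6        # source quiescent current, taken equal to I_D
    beta_sum = 200     # assumed beta_CS + beta_CD

    discharge = slew_rate(i_bb, c_db)
    charge = slew_rate(i_s / (beta_sum + 1), c_db)
    print(f"discharge limit: {discharge / 1e6:.2f} V/us")
    print(f"charge limit:    {charge / 1e6:.2f} V/us")
```

Both limits scale inversely with the capacitance, so the estimates shift accordingly for a different device size.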

![Fig. 4. CDB compared with normal MOST output characteristics.](image-url)

Together with the bulk transconductance and the base–emitter impedance, the drain-bulk capacitance also causes a low frequency pole-zero pair. This can be seen in Fig. 6, which shows the frequency response of a common source amplifier (\(W/L = 40 \, \mu m/1 \, \mu m\) device with \(I_{BB} = 30 \, nA\) and \(I_D = 10 \, \mu A\)).

It is evident that the drain-bulk capacitance has a major impact on high-frequency circuit performance. Fortunately, though, there are several ways to cancel its effect. First, one can put a decoupling capacitor between bulk and source: as both the slew-rate and the pole-zero pair are caused by the bulk transconductance through a nonconstant bulk-source voltage, both effects can be canceled this way. If the source is at a constant potential, a cascode can be used to keep the drain potential constant, thus eliminating the current in the capacitor. In Fig. 7 a cascode has been added to the common-source amplifier, and we see how the frequency response is greatly improved.

4. Type II CDB Circuit

The basic CDB MOST can be significantly improved at a very small circuit expense. This Type II CDB technique can be seen in Fig. 8. The idea is to add another collector to the BJT, and then couple the BJT as a current mirror, feeding this a current \( I_Q \) in place of the bulk current \( I_{BB} \). Doing this, we will know approximately what the emitter current is (see figure) and hence what the increase in quiescent current is. Thus, with the CDB type II circuit there is no need to use a special bias circuit for the BJT base current.

![Type II CDB MOST](image)

Fig. 8. Type II CDB MOST. (a) Circuit with parasitics, and (b) principal physical cross-section.

The type II CDB technique will give a current \( I_Q \) for charging the drain-bulk capacitance; i.e., the base–emitter current gain times as much current as with the type I CDB circuit. Thus, the slewing effect will be reduced by the base–emitter current gain. Also, the bulk impedance will be reduced by the base–emitter current gain, and thus the low frequency pole will move up in frequency. Adding another collector to the BJT is best done by adding another MOST, as shown in the figure, whose gate is connected to the source. Assuming all emitter–collector current gains are approximately the same, this structure will add about \( I_E \sim 3I_Q \) to the MOST quiescent current. If, however, the current gain of the added collector is high, \( I_E \) will approach \( I_Q \); the added MOST should therefore be a minimum channel length device. As the added MOST is always in accumulation, there is no depletion layer under its gate, which also allows for a higher base–collector current gain for the added collector compared with the original one. Figure 9 shows simulations which compare a type II and a basic CDB source follower, and Fig. 10 shows the simulated transfer functions for a type II and a basic CDB common source amplifier. It is evident that the transient behavior of the type II circuits is greatly improved compared with the basic circuits, albeit not as good as that of the normal non-CDB equivalents. Note that for these simulations we have used a different 0.5 \( \mu m \) process with \( W/L = 13 \mu m/0.5 \mu m \). We used \( I_{BB} = 20 \) nA and \( I_D = 10 \) \( \mu A \) for the basic CDB circuits, while \( I_Q \) for the type II circuits was chosen to give the same threshold voltage shift as in the basic CDB circuits. Obviously, the techniques proposed in the previous section can be used to further improve the CDB type II performance.
5. 1 V Folded Cascode OTA

To verify the applicability of the current driven bulk techniques, we have designed an operational transconductance amplifier using basic CDB circuits. Figure 11 shows our OTA. It is a standard folded cascode transconductance amplifier with a CDB differential pair, and a CDB output current mirror (for simplicity a straightforward bias circuit is shown). Assuming a standard, strong inversion design with \( V_{DD} = 1 \text{ V}, V_{SS} = 0 \text{ V}, |V_{th}| = 0.6 \text{ V} \) and \( V_{DS,sat} = 0.1 \text{ V} \), the range of the input common-mode voltage would be

\[
V_{DS,sat} - |V_{th}| = -0.5 \text{ V} \lesssim V_{CM} \lesssim 0.2 \text{ V} = V_{DD} - |V_{th}| - 2V_{DS,sat},
\]

(2)

which is not compatible with the output voltage range

\[
2V_{DS,sat} = 0.2 \text{ V} \lesssim v_{out} \lesssim 0.8 \text{ V} = V_{DD} - 2V_{DS,sat}.
\]

(3)

Note that there is only just enough voltage for the cascode current mirror to function, which does not make a particularly good design.

Reducing the threshold voltage of the differential pair directly improves the common-mode input range. Also, operating the input pair in sub-threshold reduces the gate–source voltage, and improves the common-mode input range. Assuming we can reduce the threshold voltage to \( |V'_{th}| = 0.4 \text{ V} \) by current driving the bulk, we now get

\[
2V_{DS,sat} - |V'_{th}| = -0.2 \text{ V} \lesssim V'_{CM} \lesssim 0.5 \text{ V} = V_{DD} - |V'_{th}| - V_{DS,sat};
\]

(4)

thus, we now have a 0.3 V overlap in the valid in- and output ranges. To get more voltage room for the current mirror, we also current drive the bulk of the transistors in the mirror.
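The range arithmetic can be checked with a few lines of code. The sketch below evaluates the symbolic sides of Eqs. (2) to (4) for the supply, threshold and saturation voltages assumed in the text; the function names are hypothetical helpers introduced only for this illustration.

```python
def input_cm_range(vdd, vth, vds_sat, subthreshold=False):
    """Input common-mode limits (low, high).

    Strong inversion follows the expressions of Eq. (2); the sub-threshold
    case follows the expressions of Eq. (4). vth is a magnitude.
    """
    if subthreshold:
        return 2 * vds_sat - vth, vdd - vth - vds_sat
    return vds_sat - vth, vdd - vth - 2 * vds_sat

def output_range(vdd, vds_sat):
    """Output voltage limits (low, high) following Eq. (3)."""
    return 2 * vds_sat, vdd - 2 * vds_sat

def overlap(a, b):
    """Width of the intersection of two (low, high) intervals, 0 if disjoint."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

if __name__ == "__main__":
    vdd, vds_sat = 1.0, 0.1
    out = output_range(vdd, vds_sat)
    standard = input_cm_range(vdd, 0.6, vds_sat)
    cdb = input_cm_range(vdd, 0.4, vds_sat, subthreshold=True)
    print("output range:      ", out)
    print("standard input CM: ", standard, "overlap", overlap(standard, out))
    print("CDB input CM:      ", cdb, "overlap", overlap(cdb, out))
```

Sweeping `vth` in such a script is a quick way to see how much threshold reduction is needed before the input and output ranges start to overlap.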

![Fig. 11. 1 V CDB folded cascode OTA.](image-url)
As pointed out in Sec. 3, the cascodes on the CDB current mirror will eliminate the unwanted effects of the drain-bulk capacitance of these transistors. In such a low-voltage design, it is an advantage to operate the cascoding transistors in sub-threshold, as that makes it easier to generate the bias voltages $V_{\text{bias}3}$ and $V_{\text{bias}4}$; it is, however, not critical. The other transistors should work in strong inversion, as good matching gives the lowest overall offset error. The cascodes on the differential pair will eliminate the unwanted effects of the drain-bulk capacitance of the pair when a differential signal is applied. Common-mode signals will cause a change in the source voltage and hence, as the drain potentials are kept constant by the cascodes, a change in drain-bulk voltage. Therefore, we have inserted a decoupling capacitor, $C_X$.

Figure 12 shows the simulated DC transfer function of our amplifier for different common-mode voltages and Fig. 13 shows the low-frequency gain as a function of the input common-mode voltage, when the output quiescent voltage equals the input common-mode voltage. Over a range of about 0.3 V, the amplifier has a gain of more than 60 dB, in agreement with our prediction. The amplifier uses a total bias current of about 5 $\mu$A. Figure 14 shows the AC characteristic of the amplifier when loaded with 1 pF. It has a gain-bandwidth of about 2 MHz and a phase margin of 75$^\circ$. These are all respectable figures for a 1 V amplifier. A test chip with the proposed amplifier has been designed, and the experimental results agree well with the simulations.

6. Conclusions

In this paper, we introduced a very simple way of reducing the threshold voltage on MOS transistors: by forcing a constant current out of the bulk terminal. The resulting reduction in threshold voltage would typically be about 150 mV–250 mV, possibly doubling the effective supply voltage $V_{DD} - V_{th}$ in low voltage designs. Initially, the drain-bulk capacitance gives these circuits poor high-frequency performance, but its effect can be compensated for using cascodes or decoupling. Also, a type II technique was proposed, which reduces the unwanted effects by a BJT base–collector current gain. We used this current driven bulk technique to implement a 2 MHz, 1 V folded cascode OTA with a 0.3 V overlap in the allowed input common-mode range and the output voltage range.

References


A SPUR-FREE FRACTIONAL-N \( \Sigma \Delta \) PLL FOR GSM APPLICATIONS: LINEAR MODEL AND SIMULATIONS

Marco Cassia\(^1\), Peter Shah\(^2\), and Erik Bruun\(^1\)

\(^1\)Ørsted\(\bullet\)DTU, Technical University of Denmark, DK-2800 Kgs. Lyngby, mca@oersted.dtu.dk
\(^2\)Qualcomm Inc., 5775 Morehouse Drive San Diego, CA 92121 pshah@qualcomm.com

ABSTRACT

A new PLL topology and a new simplified linear model are presented. The new \( \Sigma \Delta \) fractional-N synthesizer presents no reference spurs and lowers the overall phase noise, thanks to the presence of a Sample/Hold block. With a new simulation methodology it is possible to perform very accurate simulations, whose results match closely those obtained with the linear PLL model developed.

1. INTRODUCTION

\( \Sigma \Delta \) modulation in fractional-N synthesizers is a technique that has been successfully demonstrated for high resolution and high speed frequency synthesizers \([1, 2]\). The use of high-order multi-bit \( \Sigma \Delta \) modulators introduces new issues which need to be carefully taken into account, such as down-folding of high frequency quantization noise, derivation of a linear model for noise analysis \([4]\) and efficient techniques for fast and accurate simulations. The paper is organized as follows: in section 2 we present a new PLL topology which prevents high frequency noise down-folding and cancels reference spurs in the output spectrum. In section 3 the derivation of a linear model is presented. The resulting linear model is similar to \([4]\), but the derivation is more straightforward. Section 4 presents a simple event-driven simulation approach, which compared to previous approaches \([5, 3]\), offers greatly increased accuracy and simulation speed. Finally in section 5 we compare the theory developed with the results from simulations.

2. SPUR-FREE PLL TOPOLOGY

The proposed synthesizer is shown in Fig. 1. The structure is similar to ordinary \( \Sigma \Delta \) fractional-N synthesizers except for the presence of a Sample-Hold (S/H) block between the Charge Pump (CP) and the Loop Filter (LF). The S/H serves two purposes: it prevents noise down-folding due to non-uniform sampling, and it cancels reference spurs. The non-uniform sampling is due to the fact that the PFD generates variable length pulses aligned to the first occurring edge of the reference clock signal \( f_{ref} \) and the divider signal \( f_{div} \) (Fig. 1); consequently the PFD output is not synchronized to the reference clock. Non-uniform sampling is a highly nonlinear phenomenon: the contribution of the down-folded noise to the overall output phase noise can be significant, especially since high frequency and high power \( \Sigma \Delta \) quantization noise is present. Another way of looking at it is that the effect is completely equivalent to non-uniform quantization steps in voltage \( \Sigma \Delta \) DACs.

The second advantage of the S/H is its action on the LF voltage. After every UP/DOWN pulse, the S/H samples the voltage across the integrating capacitance and holds it for a reference cycle. This operation prevents the modulation of the LF voltage by the reference clock, hence ideally it eliminates reference spurs \([6]\) in the VCO output. In reality low level spurs may appear at the output due to the charge feedthrough in the control switch. In the next section we derive a linear model for the analysis of the S/H PLL.

3. LINEAR MODEL

We first focus our attention on the S/H portion of the PLL. A possible implementation is shown in Fig. 2. This circuit uses a switched-capacitor integrator to carry out both the S/H function as well as the integrator function that is usually implemented by the Loop Filter. To derive the transfer function we start by considering the charge deposited on the capacitor \( C_1 \):

\[
Q_{C_1} = \frac{\Delta \phi(t)}{2\pi} T_{ref} \cdot I_{CP}
\]

where \( \Delta \phi(t) \) is the phase error waveform produced by the PFD. After a certain delay \( T_{SH} \) the charge is transferred to \( C_2 \) and added to the charge previously stored:

\[
Q_{C_2}(t) = Q_{C_2}(t - T_{ref}) + Q_{C_1}(t - T_{SH})
\]

In voltage terms and inserting the expression for \( Q_{C_1} \):

\[
V_{C_2}(t) = V_{C_2}(t - T_{ref}) + \frac{I_{CP}}{2\pi \cdot C_2} T_{ref} \cdot \Delta \phi(t - T_{SH})
\]

Figure 2: Possible implementation of S/H portion

Taking the Laplace transform yields:

\[ \frac{V_{C_2}(s)}{\Delta \phi(s)} = \frac{I_{CP}}{2\pi \cdot C_2} T_{ref} \cdot \frac{e^{-sT_{SH}}}{1 - e^{-sT_{ref}}} \]

(4)

In the previous equation \( V_{C_2}(s) \) is still modeled in the discrete-time domain, i.e. as a train of delta-functions. In reality the output voltage is a staircase function. As a consequence eq. 4 is further modified by a zero-order hold network that converts the impulse train into the staircase waveform. The transfer function of the zero-order hold network is given by \( H_{ZOH}(s) = \frac{1 - e^{-sT_{ref}}}{sT_{ref}} \).

The actual transfer function from phase difference (PFD input) to integrator output is then given by:

\[ \frac{V_0(s)}{\Delta \phi(s)} = H_{ZOH}(s) \cdot \frac{V_{C_2}(s)}{\Delta \phi(s)} = e^{-sT_{SH}} \cdot \frac{I_{CP}}{2\pi \cdot s \cdot C_2} \]

(5)

Consequently, the circuit in Fig. 2 can be modeled as shown in Fig. 3. Note that in Fig. 3 the integration \( \frac{1}{sC_2} \) has been absorbed in the loop filter transfer function \( F(s) \). Thus the only difference introduced by the S/H is the delay \( T_{SH} \).

If a trickle current is used in the CP (e.g. only UP pulses are present) then it is sufficient to invert the reference clock signal to generate a proper S/H control signal.

Figure 3: Linear model of S/H portion

3.1. Divider

We will now derive a simple linear model for the divider. The first step is to find the timing errors with the aid of Fig. 4. \( N \) is the nominal divider modulus and \( b(n) \) is the dithering value provided by the \( \Sigma \Delta \) modulator. According to the timing diagram we can write:

\[ \Delta t(n + 1) = \Delta t(n) + (N + b(n)) \cdot T_{VCO} - T_{Ref} \]  

(6)

Indicating with \( \mu_b \) the average value of \( b(n) \) (\( \mu_b \) is the fractional divider value), the reference period \( T_{Ref} \) can be expressed as:

\[ T_{Ref} = (N + \mu_b)T_{VCO} \]  

(7)

In deriving eq. 7 we are making the important approximation that \( T_{VCO} \) is constant. This assumption is reasonable for receive-transmit synthesizers with narrow modulation bandwidth. In these cases the relative frequency variation of the VCO is very small, which means that \( T_{VCO} \) is nearly constant.

Defining \( b'(n) = b(n) - \mu_b \) and substituting \( T_{VCO} \) from eq. 7 into eq. 6 yields:

\[ \Delta t(n + 1) = \Delta t(n) + \frac{T_{Ref}}{N + \mu_b} b'(n) \]  

(8)

Converting to the phase domain we have \( \Delta \phi_{\Sigma \Delta}(n) = \frac{2\pi}{T_{Ref}} \Delta t(n) \).

We can finally derive an expression for the additive noise caused by dithering the divider ratio:

\[ \Delta \phi_{\Sigma \Delta}(n + 1) = \Delta \phi_{\Sigma \Delta}(n) + \frac{2\pi}{N + \mu_b} b'(n) \]  

(9)

The Z-transform yields:

\[ \Delta \phi_{\Sigma \Delta}(z) = \frac{2\pi}{N + \mu_b} \cdot \frac{z^{-1} b'(z)}{1 - z^{-1}} \]  

(10)

The previous equation shows that the \( \Sigma \Delta \) noise undergoes an integration but is otherwise shaped by the loop in exactly the same way as the reference clock phase noise.
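The equivalence of eq. 6 and eq. 8 can be checked numerically. In the sketch below a first-order accumulator stands in for the \( \Sigma \Delta \) modulator purely for simplicity (it is not the modulator used in this work), and the divider modulus, fractional value and VCO frequency are assumed example values.

```python
def accumulator_dither(frac, n):
    """First-order accumulator: emits 1 on overflow, 0 otherwise.

    This is a simple stand-in for the sigma-delta modulator.
    """
    acc, out = 0.0, []
    for _ in range(n):
        acc += frac
        if acc >= 1.0:
            acc -= 1.0
            out.append(1)
        else:
            out.append(0)
    return out

def timing_error_eq6(b, n_div, t_vco, t_ref):
    """Timing error from eq. 6: dt(n+1) = dt(n) + (N + b(n)) * T_VCO - T_Ref."""
    dt, out = 0.0, [0.0]
    for bn in b:
        dt += (n_div + bn) * t_vco - t_ref
        out.append(dt)
    return out

def timing_error_eq8(b, n_div, mu_b, t_ref):
    """Timing error from eq. 8: dt(n+1) = dt(n) + T_Ref / (N + mu_b) * b'(n)."""
    dt, out = 0.0, [0.0]
    for bn in b:
        dt += t_ref / (n_div + mu_b) * (bn - mu_b)
        out.append(dt)
    return out

if __name__ == "__main__":
    n_div, mu_b = 64, 0.25           # assumed modulus and fractional value
    t_vco = 1.0 / 1.8e9              # assumed VCO period
    t_ref = (n_div + mu_b) * t_vco   # eq. 7
    b = accumulator_dither(mu_b, 8)
    for e6, e8 in zip(timing_error_eq6(b, n_div, t_vco, t_ref),
                      timing_error_eq8(b, n_div, mu_b, t_ref)):
        print(f"eq. 6: {e6:+.3e} s   eq. 8: {e8:+.3e} s")
```

Because the dithering sequence has zero mean after subtracting \( \mu_b \), the accumulated timing error stays bounded and returns to zero at the end of every dithering period in this example.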

The final linear model is shown in Fig. 5. The closed-loop transfer function \( H_0(s) \) is given by (Fig. 5):

\[ H_0(s) = \frac{I_{CP} e^{-sT_{SH}} \cdot F(s) \frac{K_{VCO}}{s} \frac{1}{N+\mu_b}}{1 + I_{CP} e^{-sT_{SH}} \cdot F(s) \frac{K_{VCO}}{s} \frac{1}{N+\mu_b}} \]

(11)

The phase noise properties can be predicted from linear systems analysis; the \( \Sigma \Delta \) modulation can be modeled as an additive phase contribution (also shown in Fig. 5). The \( \Sigma \Delta \) architecture used was a 4th-order MASH with a 4 bit output signal. The 4 bit quantization causes quantization noise \( n_q \) which is added to the output word. Such noise is spread out over a bandwidth of \( \frac{1}{T_{Ref}} \) and is high-pass shaped by the $\Sigma\Delta$ modulator with a noise transfer function (NTF) given by $H_{NTF}(f) = (1 - e^{-j2\pi f T_{Ref}})^4$. The $\Sigma\Delta$ signal transfer function (STF) is given by $H_{STF}(f) = (e^{-j2\pi f T_{Ref}})^4$. Assuming that the quantization noise is independent of the input signal, the power spectral density of the bit stream can be expressed as:

$$S_{b}(f) = \frac{T_{ac}^{2}}{12} |H_{NTF}(f)|^{2}$$  \hspace{1cm} (12)
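The high-pass shaping of a 4th-order MASH can be evaluated directly. The sketch below assumes the standard MASH noise transfer function $(1 - z^{-1})^4$ with the modulator clocked at the reference rate $1/T_{Ref}$, and compares it with the equivalent closed form obtained from $|1 - e^{-jx}| = 2|\sin(x/2)|$. The function names and the 26 MHz reference are assumptions made for illustration.

```python
import cmath
import math

def mash_ntf_power(f, t_ref, order=4):
    """|NTF(f)|^2 for NTF = (1 - z^-1)^order with z = exp(j*2*pi*f*t_ref)."""
    z_inv = cmath.exp(-2j * math.pi * f * t_ref)
    return abs((1 - z_inv) ** order) ** 2

def mash_ntf_power_closed_form(f, t_ref, order=4):
    """Same quantity written as (2*sin(pi*f*t_ref))^(2*order)."""
    return (2.0 * math.sin(math.pi * f * t_ref)) ** (2 * order)

if __name__ == "__main__":
    t_ref = 1 / 26e6  # assumed reference period
    for f in (10e3, 100e3, 1e6, 13e6):
        p = mash_ntf_power(f, t_ref)
        print(f"f = {f / 1e3:8.0f} kHz  |NTF|^2 = {10 * math.log10(p):7.1f} dB")
```

The steep rise of $|H_{NTF}|^2$ with frequency is what places most of the quantization noise power at high offsets, where down-folding would be most harmful.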

From the linear model of Fig. 5 we can find the transfer function from the output of the NTF to the output phase $\phi_{VCO}$:

$$H_{\phi}(s) = \frac{2\pi}{N + \mu} \cdot e^{-sT_{s}} - H_{0}(s)$$  \hspace{1cm} (13)

Finally the output phase noise Power Spectral Density due to the $\Sigma\Delta$ quantization noise $n_{q}$ is simply given by:

$$S_{\phi_{VCO}}(f) = |H_{\phi}(j2\pi f T)|^{2} S_{b}(f)$$  \hspace{1cm} (14)

The effect of quantization error at $\Sigma\Delta$ input can be evaluated in the same way. The PSD is given by:

$$S_{\phi_{\Sigma\Delta_{in}}}(f) = |H_{\Sigma\Delta_{in}}(j2\pi f T)|^{2} S_{\phi_{\Sigma\Delta_{in}}}(f)$$  \hspace{1cm} (15)

where $b_{es}$ is the number of bits below decimal point in $\Sigma\Delta$ input. The calculation of the power spectral density of the PLL phase error due to the $\Sigma\Delta$ input quantization is then straightforward (fig. 5):

$$S_{\phi_{\Sigma\Delta_{in}}}(f) = |H_{\Sigma\Delta_{in}}(j2\pi f T)|^{2} S_{\phi_{\Sigma\Delta_{in}}}(f)$$  \hspace{1cm} (16)

Figure 5: Complete linearized S/H $\Sigma\Delta$ fractional-N PLL

4. FULLY EVENT DRIVEN SIMULATION

The presence of a multi-bit $\Sigma\Delta$ modulator makes the use of an event-driven simulator beneficial. Methods based on uniform or adaptive time steps quantize the location of the edges of the sampling time in the $\Sigma\Delta$ multi-bit quantizer [3]. This is equivalent to non-uniform sampling and leads to down-folding of high frequency noise. The effect is the same as having non uniform steps in multilevel D/A converters.

Besides providing 100% accurate time steps, event driven simulations are very fast and highly efficient; PLL variables are calculated only when an event occurs. The simulation presented in this section is based on a standard event-driven simulator, Verilog XL, customized through PLI (Programming Language Interface) to support mathematical functions.

The simulation set-up is structured in a modularized way: PLL blocks are connected through signals that are responsible for timing and for data exchange. This means that each PLL block can be coded as an independent unit, without worrying about the interaction with the other blocks. The implementation of the synthesizer digital blocks is trivial; we concentrate on the implementation of the Loop Filter which is the biggest issue in PLL simulations.

The following discussion is focused on the LF modeling, but it applies to any (pseudo) continuous time system modeling. The way the Loop Filter is modeled is shown in Fig. 6. Every time a control signal is issued from the VCO or the CP, the loop filter updates its state and calculates a new control voltage according to the actual input value.

To describe the LF behavior in mathematical terms we start from its transfer function and we derive its state-space formulation. We assume the LF transfer function to be given by the following equation:

$$F(s) = \frac{1 + s\tau_{z}}{s C \cdot \left(1 + s\tau_{1}\right) \cdot \left(1 + s\tau_{2}\right) \cdot \left(1 + s\tau_{3}\right)}$$  \hspace{1cm} (17)

Note that equation 17 also includes the integrating capacitance $C$. With a partial fraction expansion, equation 17 can be decomposed into 4 parallel blocks, namely an integrator with gain $A_{0}/C$ and three $1^{st}$-order RC blocks with gains $A_{i}$ and time constants $\tau_{i}$ ($i = 1, 2, 3$). Noting that between the update times the input to the LF is constant (i.e. $V_{in}$ appears as a staircase to the LF), the equation describing the behavior of each of the three RC blocks is given by (state equation solution):

$$V_{i}(t) = V_{i}(t_{0}) + \left(A_{i} V_{in}(t_{0}) - V_{i}(t_{0})\right) \left(1 - e^{-\frac{t-t_{0}}{\tau_{i}}}\right)$$  \hspace{1cm} (18)

The equation that describes the integrating block is given by:

$$V_{int}(t) = V_{int}(t_{0}) + \frac{A_{0} V_{in}(t_{0})}{C} \left(t - t_{0}\right)$$  \hspace{1cm} (19)

The VCO control voltage is then given by the sum of the four block outputs:

$$V_{c}(t) = V_{int}(t) + \sum_{i=1}^{3} V_{i}(t)$$  \hspace{1cm} (20)

The Verilog model for the LF is then simply given by a set of equations which describe exactly the behavior of the LF. The VCO is modeled as a self-updating block. Such operation can be visualized as shown in Fig. 6. The update takes place at discrete time instances, namely every half-VCO period. The approximation introduced is minimal, since the frequency of the VCO is several orders of magnitude higher with respect to the PLL dynamics. Every half-period the VCO sends an update signal to the LF to obtain the new VCO control voltage; on the basis of the updated value, the new VCO period is calculated. Note that the LF update takes place only when required by other blocks: the update time intervals are not uniform. This makes the simulation methodology very efficient, since the calculations occur only at the required time steps.
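The update scheme described above can be sketched in a few lines. The code below is a Python illustration of the idea, not the Verilog model used in this work: the class and method names are hypothetical, the gains and time constants are arbitrary example values, and the control voltage is assumed to be the sum of the integrator and RC block outputs, in line with the parallel decomposition.

```python
import math

class EventDrivenLoopFilter:
    """Integrator plus first-order RC blocks, advanced exactly between events."""

    def __init__(self, a0_over_c, rc_blocks):
        # rc_blocks: list of (gain A_i, time constant tau_i) pairs.
        self.k_int = a0_over_c
        self.rc = list(rc_blocks)
        self.v_int = 0.0
        self.v_rc = [0.0] * len(self.rc)
        self.v_in = 0.0      # input held constant since the last event
        self.t_last = 0.0

    def advance(self, t):
        """Bring all states to time t using eq. 18 and eq. 19."""
        dt = t - self.t_last
        if dt < 0:
            raise ValueError("events must be processed in time order")
        self.v_int += self.k_int * self.v_in * dt
        for i, (gain, tau) in enumerate(self.rc):
            target = gain * self.v_in
            self.v_rc[i] += (target - self.v_rc[i]) * (1.0 - math.exp(-dt / tau))
        self.t_last = t

    def set_input(self, t, v_in):
        """Charge-pump event: update the states, then apply the new input."""
        self.advance(t)
        self.v_in = v_in

    def output(self, t):
        """VCO event: update the states and return the control voltage."""
        self.advance(t)
        return self.v_int + sum(self.v_rc)

if __name__ == "__main__":
    lf = EventDrivenLoopFilter(1e6, [(0.5, 1e-7), (0.2, 5e-7), (0.1, 2e-6)])
    lf.set_input(0.0, 1.0)
    for t in (0.3e-6, 0.55e-6, 1.0e-6):   # non-uniform update instants
        print(f"t = {t * 1e6:.2f} us  Vc = {lf.output(t):.6f}")
```

Because each block is advanced with the exact solution for a constant input, the result at a given time does not depend on how many intermediate updates were requested, which is what allows the update intervals to be non-uniform.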
5. RESULTS

The methodology described in section 4 was used to simulate the topology presented in section 2. Fig. 7 shows the Power Spectral Density (PSD) of the output phase noise $\phi_{\text{vco}}$ due to the $\Sigma\Delta$ quantization for two different cases. The lower curves represent the ideal condition: the input data to the $\Sigma\Delta$ presents no input quantization. The upper curves are instead the result of a 16-bit quantization below the decimal point. The curves obtained from the simulation (PSD calculated with Matlab) match perfectly the PSD described by equations 14 and 16.

---

Figure 7: Synthesizer Phase Noise PSD

---

The low frequency noise floor is due to a very small amount of dithering applied on the $\Sigma\Delta$ modulator input. In absence of modulated data, dithering is necessary to avoid the presence of fractional spurs. Note that no reference spurs appear in the output spectrum.

In figure 8 the PSD of the S/H PLL is compared with the PSD of the standard PLL. The S/H PLL has a lower overall phase noise and does not present reference spurs which appear instead in the spectrum of the standard PLL (i.e. without S/H).

As already discussed in section 2, reference spurs may also appear in the S/H PLL output when a real switch is used to control the S/H. However, the level of such spurs would be much lower than the spur level of a standard PLL. Besides, the spur level for a standard PLL is higher in real situations, when CP mismatches, CP leakage currents and timing mismatches in the PFD are taken into account. Even in ideal conditions, the spurs exceed the GSM mask specifications. Simulations have shown that the S/H PLL does not present spurs even in case of CP mismatches and CP leakage currents.

Real GSM data was fed into the $\Sigma\Delta$ modulator through a digital prewarp filter. The output spectrum lies within the mask specified by the GSM standard and the RMS phase error is smaller than 0.5 deg. This indicates that the S/H PLL is suitable for direct GSM modulation.

6. CONCLUSIONS

This work presented a new $\Sigma\Delta$ fractional-N synthesizer topology and a new simplified theory to describe the PLL performance. Also, a new simulation methodology based entirely on an event-driven approach is shown. The novelty is represented by the introduction of an S/H block to avoid noise down-folding due to the non-uniform sampling operation of the PFD. The S/H also eliminates the problem of reference spurs in the VCO output. We have shown how it is possible to represent the behavior of the loop filter in a way which is suitable for event-driven simulations. Extremely high accuracy can be reached because undesirable time quantization phenomena are avoided, the only limit being the numerical accuracy of the event-driven simulator. The simulations are very fast since they proceed through events, and the implementation is straightforward. The comparisons presented in section 5 demonstrate the advantages of the S/H PLL over a standard PLL and they show the perfect match between the theoretical model and the simulations. Finally, the S/H PLL fulfills the GSM standard both in receiving and transmitting mode.

7. REFERENCES


A novel PLL calibration method

Marco Cassia 1, Peter Shah 2, and Erik Bruun 1

1 Ørsted•DTU, Technical University of Denmark, DK-2800 Kgs. Lyngby, mca@oersted.dtu.dk
2 RF Magic, 10182 Telesis Court, 4th floor, San Diego, CA 92121 pshah@rfmagic.com

ABSTRACT

A novel method to calibrate the frequency response of a Phase-Locked Loop is presented. The method requires just an additional digital counter and an auxiliary Phase-Frequency Detector (PFD) to measure the natural frequency of the PLL. The measured value can be used to tune the PLL response to the desired value. The method is demonstrated mathematically on a typical PLL topology and it is extended to $\Sigma\Delta$ fractional-N PLLs. A set of simulations performed with two different simulators is used to verify the applicability of the method.

1. INTRODUCTION

Phase-Locked Loop (PLL) frequency synthesizers are building blocks of all communication systems. An accurate PLL response is required in many situations, especially when $\Sigma\Delta$ PLLs for direct modulation [1] are used. In these types of PLLs, the data fed into the $\Sigma\Delta$ modulator often undergoes a pre-filtering process in order to cancel the low-pass PLL transfer function and thereby extend the modulation bandwidth [2]. The pre-distortion filter presents a transfer function equal to the inverse of the PLL transfer function and it is usually implemented digitally. Consequently, a tight matching between the pre-distortion filter and the analogue PLL transfer function is necessary to avoid distortion of the transmitted data.

Especially for on-chip Voltage Controlled Oscillators (VCO), the gain $K_{VCO}$ is typically the parameter with the poorest accuracy among the PLL analog components. However to establish an accurate PLL transfer function only the product $K_{VCO} \times I_{CP}/C$ needs to be accurate [4]. The PLL can then be calibrated by adjusting the Charge-Pump current; the problem is how to measure the accuracy of the PLL transfer function.

A continuous calibration technique is presented in [3]. The transmitted data is digitally compared with the input data and the Charge-Pump current is then adjusted to compensate the detected error. This method offers the possibility of continuous calibration at the expense of increased circuit complexity; since the error detection is based on the cross-correlation between input and transmitted data, this approach will not work on unmodulated synthesizers.

In this paper we present a simple and novel approach that makes it possible to determine the characteristics of the PLL transfer function by simply adding a digital counter and an auxiliary Phase Frequency Detector (PFD). The paper is organized as follows: in section 2 we present the basic idea behind the method and in section 3 we discuss its mathematical formulation. The extension of the method to $\Sigma\Delta$ PLLs is presented in section 4. Finally, in section 5, the results from different simulations are compared with the theory developed.

2. MEASURING SCHEME

A calibration cycle is required by this method. Consider the typical PLL topology in fig. 1: to start the calibration, the switch to $R_{cal}$ is closed and the calibration resistor $R_{cal}$ is connected to the resistor $R$. Because the total filter resistance is reduced, the loop transfer function becomes underdamped. Note that, if the loop transfer function is already designed to be underdamped, the calibration switch is not necessary. By changing the ratio $M$ of the $f_{ref}$ divider or the ratio $N$ of the $f_{out}$ divider, a frequency step can be applied to the PLL. The natural frequency of the induced transient response can be measured indirectly by counting the UP/DOWN pulses produced by the PFD. If the counter counts up by 1 for each UP pulse and down by 1 for each DOWN pulse, then the maximum counter value is a measure of the natural frequency of the PLL transfer function. This can be seen in fig. 2, where the expected behavior of the phase error is presented together with the counter value trajectory. The Charge-Pump current can then be adjusted so that the PLL natural frequency is the desired one.

For the calibration method to work properly, the leakage current in the Charge-Pump should be kept small and the trickle current (if any) should be turned off. Any leakage or trickle currents will induce a static phase error at the PLL input. This, in turn, means an increased number of pulses in one direction (e.g. UP pulses). Consequently, the counter value is no longer an accurate representation of the natural frequency.

The auxiliary PFD in fig. 1 is required to generate stable UP/DOWN pulses for the digital counter. A possible circuit implementation that works together with a typical PFD is shown in fig. 3. The two set-reset flip-flops (SR-FF) are used to establish which of the UP and DOWN pulses occurs first. This is necessary because the UP and DOWN pulses are simultaneously high for a period of time equal to the delay in the PFD reset path [4]. If the UP pulse rises before the DOWN pulse, then a logical 'ONE' appears at the input of the top edge-triggered D flip-flop (fig. 3) and a logical 'ZERO' appears at the input of the bottom D flip-flop. The UP pulse, delayed through a pair of inverters, clocks the flip-flop, and the negative transition of the REF clock resets it. Hence the flip-flop produces an UP_stable pulse whose length is approximately equal to half the REF period.

The opposite happens if the DOWN pulse occurs before the UP pulse. If the PFD produces aligned UP and DOWN pulses (this is the case if the input phase error is smaller than the dead-zone of the PFD) then the UP_stable and the DOWN_stable signals are high at the same time.

3. MATHEMATICAL DERIVATION

The mathematical formulation is based on the PLL topology of fig. 1; however, the method also applies to other topologies, as will be demonstrated in the next section. We start by deriving the PLL loop transfer function with the aid of the linear model of fig. 4:

\[ H_{\text{loop}}(s) = \frac{I_{cp}(R \cdot C_p s + 1)K_{VCO}}{2\pi C_p s^2 N} \]  

The transfer function from phase input to phase error is given by:

\[ \frac{\Phi_{\text{err}}(s)}{\Phi_{\text{in}}(s)} = \frac{1}{1 + H_{\text{loop}}(s)} = \frac{s^2}{s^2 + 2\zeta \omega_n s + \omega_n^2} \]  

where \( \omega_n = \sqrt{\frac{I_{cp}K_{VCO}}{2\pi C_p N}} \) and \( \zeta = \frac{R}{2}\sqrt{\frac{I_{cp}C_p K_{VCO}}{2\pi N}} \). The two parameters \( \omega_n \) and \( \zeta \) represent, respectively, the natural frequency and the damping factor of the PLL. A unit input frequency step corresponds to an input phase ramp, with Laplace transform given by \( \Phi_{\text{in}}(s) = \frac{1}{s^2} \). The Laplace transform of the phase error is then given by:

\[ \Phi_{\text{err}}(s) = \frac{s^2}{s^2 + 2\zeta \omega_n s + \omega_n^2} \frac{1}{s^2} \]  

The behavior in the time domain of equation 3 is the impulse response of a second order system:

\[ \phi_{\text{err}}(t) = \frac{1}{\omega_n \sqrt{1 - \zeta^2}} e^{-\zeta \omega_n t} \sin(\omega_0 t) \]  

where \( \omega_0 = \omega_n \sqrt{1 - \zeta^2} \). As long as the phase error function \( \phi_{\text{err}}(t) \) is positive, the PFD generates UP pulses. If the error becomes negative, DOWN pulses are generated. Assuming a positive frequency step, an initial sequence of UP pulses is produced by the PFD and the counter value increases monotonically. When the phase error crosses zero, which occurs at \( t_{\text{cross}} = \frac{\pi}{\omega_0} \) according to equation 4, DOWN pulses start to appear, decreasing the counter value. Hence at the crossing time the counter reaches its maximum value, \(V_{\text{max}}\) (fig. 2).
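The expressions for \( \omega_n \), \( \zeta \), \( \omega_0 \) and \( t_{\text{cross}} \) can be evaluated with a few lines of Python. The sketch below is only illustrative: the function names and the component values are hypothetical, and \( K_{VCO} \) is assumed to be expressed in rad/s per volt so that it matches the \( 2\pi \) factor in \( H_{\text{loop}}(s) \).

```python
import math

def second_order_params(i_cp, k_vco, c_p, r, n):
    """Natural frequency, damping factor and damped frequency of the loop.

    k_vco is assumed to be in rad/s per volt (matches the 2*pi in H_loop).
    """
    omega_n = math.sqrt(i_cp * k_vco / (2 * math.pi * c_p * n))
    zeta = 0.5 * r * math.sqrt(i_cp * c_p * k_vco / (2 * math.pi * n))
    omega_0 = omega_n * math.sqrt(1 - zeta**2)  # only defined for zeta < 1
    return omega_n, zeta, omega_0

def phase_error(t, omega_n, zeta):
    """Time-domain phase error for a unit frequency step (underdamped loop)."""
    omega_0 = omega_n * math.sqrt(1 - zeta**2)
    return math.exp(-zeta * omega_n * t) * math.sin(omega_0 * t) / omega_0

# Hypothetical component values, chosen only to give an underdamped loop.
omega_n, zeta, omega_0 = second_order_params(
    i_cp=100e-6, k_vco=2 * math.pi * 50e6, c_p=1e-9, r=500.0, n=100)
t_cross = math.pi / omega_0  # first zero crossing of the phase error
print(f"omega_n={omega_n:.4g} rad/s, zeta={zeta:.3g}, t_cross={t_cross:.3g} s")
```

The phase error is positive before `t_cross` and negative just after it, which is the sign change that turns the UP sequence into a DOWN sequence.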

The crossing time can also be expressed as \(t_{\text{cross}} = V_{\text{max}} \cdot T_{\text{ref}}\), where \(T_{\text{ref}}\) is the period of the REF clock (fig. 1). For stability reasons [4], the PLL dynamics are always much slower than the REF signal: the error introduced by quantizing \(t_{\text{cross}}\) is then insignificant. \(\omega_0\) can then be approximated as:

\[
\omega_0 = \frac{\pi}{V_{\text{max}} \cdot T_{\text{ref}}}
\]  
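The counting scheme can be mimicked with an idealized model. The sketch below assumes one PFD pulse per REF cycle, with its direction given by the sign of the linear-model phase error at the REF edge; dead zone, leakage and \(\Sigma\Delta\) noise are ignored. The function names and the numerical values (26 MHz reference, 100 kHz natural frequency) are hypothetical.

```python
import math

def max_counter_value(omega_n, zeta, t_ref, n_cycles):
    """Count +1 per UP pulse and -1 per DOWN pulse; return the peak count."""
    omega_0 = omega_n * math.sqrt(1 - zeta**2)
    count, peak = 0, 0
    for k in range(1, n_cycles + 1):
        t = k * t_ref
        phi = math.exp(-zeta * omega_n * t) * math.sin(omega_0 * t)
        count += 1 if phi > 0 else -1  # sign of phase error selects UP or DOWN
        peak = max(peak, count)
    return peak

def omega_0_from_counter(v_max, t_ref):
    """Estimate of omega_0 from the peak counter value."""
    return math.pi / (v_max * t_ref)

# Hypothetical numbers: the loop is much slower than the reference clock.
t_ref = 1 / 26e6
omega_n, zeta = 2 * math.pi * 100e3, 0.3
v_max = max_counter_value(omega_n, zeta, t_ref, n_cycles=1000)
estimate = omega_0_from_counter(v_max, t_ref)
print(v_max, estimate)
```

Because \(t_{\text{cross}}\) spans many REF periods, rounding it to an integer number of cycles changes the estimate only slightly, as argued above.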

4. EXTENSION TO \(\Sigma\Delta\) PLL TOPOLOGIES

The applicability of the measuring technique will be now demonstrated for \(\Sigma\Delta\) fractional-\(N\) PLLs. These PLLs use high-order multi-bit \(\Sigma\Delta\) modulators to dither the divider modulus; direct modulation is achieved by feeding the data into the modulator. Thus, to measure \(\omega_0\), the frequency step is, in this case, applied to the \(\Sigma\Delta\) modulator input.

The linear model of a \(\Sigma\Delta\) fractional-\(N\) PLL is shown in fig. 5. A complete derivation of the linear model can be found in [5]. The Loop Filter is typically a high-order structure to attenuate the high-frequency quantization noise. In the example analyzed in this section, the Loop Filter transfer function presents 4 poles (including the Charge-Pump integration) and 1 zero. The mathematics involved in this case is lengthier, but the final transfer function can be reduced to an equivalent 2nd order equation.

We start by finding the transfer function from the \(\Sigma\Delta\) modulator input to phase error \(\Phi_{\text{err}}(s)\). With the aid of fig. 5, considering that the effect of the \(\Sigma\Delta\) modulator Signal Transfer Function (STF) is just adding a delay to the input data, the transfer function is given by:

\[
\Phi_{\text{err}}(s) = \frac{2\pi}{N + \mu_b} \frac{e^{-sT_{\text{ref}}}}{1 - e^{-sT_{\text{ref}}}} \frac{1}{1 + H_{\text{loop}}(s)}
\]  

where \(N + \mu_b\) is the instantaneous divider ratio.

Figure 5: \(\Sigma\Delta\) fractional-\(N\) PLL linear model.

Indicating with \(\omega_3, \omega_4, \omega_5\) the high-order poles and with \(z_1\) the zero of the Loop Filter, and by setting:

\[
T_{\text{eq}}^2 = \frac{1}{\omega_3^2} + \frac{1}{\omega_4^2} + \frac{1}{\omega_5^2} + \frac{1}{\omega_3 \cdot \omega_4} + \frac{1}{\omega_3 \cdot \omega_5} + \frac{1}{\omega_4 \cdot \omega_5}
\]

it is possible to define an equivalent natural frequency \(\omega_{n1}\) and an equivalent damping factor \(\zeta_1\):

\[
\omega_{n1} := \sqrt{\frac{\omega_4^2}{1 + \left(\frac{\omega_4 T_{\text{eq}}}{\omega_5}\right)^2}}
\]

\[
\zeta_1 = \frac{1}{2} \omega_{n1} \left(\frac{1}{\omega_3} - \frac{1}{\omega_4} - \frac{1}{\omega_5}\right)
\]

After proper manipulations, eq. 6 can be reduced to the following approximate expression:

\[
\Phi_{\text{err}}(s) = \frac{G}{s^2 + 2\zeta_1\omega_{n1} s + \omega_{n1}^2}
\]

with the gain factor given by

\[
G = \left(\frac{\omega_4}{\omega_5}\right)^2 2\pi T_{\text{ref}} \frac{1}{N + \mu_b}
\]

The final expression for the phase error, obtained by applying a step function to the \(\Sigma\Delta\) modulator, is given by:

\[
\Phi_{\text{err}}(s) = \frac{G}{s^2 + 2\zeta_1\omega_{n1} s + \omega_{n1}^2}
\]

and it corresponds directly to equation 3. Thus, the natural frequency can be calculated as described in section 3.

However the use of \(\Sigma\Delta\) fractional-\(N\) PLLs introduces a new requirement for the correct applicability of the method. The input step to the modulator needs to be large enough to overcome the random effects of the modulator itself. If the input step is too small, then the UP sequence is no longer monotonic and the extracted value of \(\omega_0\) is no longer accurate. On the other hand, the equations derived so far are based on the assumption that the PLL is working in its linear region. If a large input step is applied, the PLL may be pushed out of its linear region. In this case, the previous equations are no longer valid, but the final counter value can be extracted from simulations; the calibration method will still work.

5. SIMULATION RESULTS

The main parameters of the simulated PLL are summarized in table 1. The \(\Sigma\Delta\) modulator is a 4th-order MASH and the parameters of the Loop Filter are presented in table 2. Based on the simulation results, the \(\Sigma\Delta\) fractional-\(N\) PLL topology can be simulated with a linear simulator such as Simulink; however, the system behavior has also been investigated through a Verilog implementation. The use of Verilog makes it possible to simulate a model closer to a real PLL implementation, capable of capturing the non-linear behavior of the system [5].

As already discussed in the introduction, the VCO gain \(K_{\text{VCO}}\) is the parameter with the poorest accuracy; the \(\Sigma\Delta\) PLL was simulated with the nominal \(K_{\text{VCO}}\) value and with a gain variation of \(\pm 30\%\) with respect to the nominal value. In fig. 6 the counter behavior for the 3 different \(K_{\text{VCO}}\) values is presented. It is apparent that the three curves reach different peaks according to the value of \(K_{\text{VCO}}\); as time proceeds the effects of the \(\Sigma\Delta\) modulator start to appear. By substituting the values of the parameters in the equations presented in section 3, the theoretical maximum counter values for the three different VCO gains are, respectively, 150, 123, and 106. The values extracted from the Verilog simulation of fig. 6 are 148, 122, and 105; these values closely match the predicted ones. The counter behavior for different input frequency steps is presented in fig. 7. As the step is increased, overcoming the $\Sigma \Delta$ modulator noise, the measured values match the predicted ones very well. In the same figure the results for both simulators, Verilog and Simulink, are presented. Notice that the Simulink curves are very close to the curves obtained with Verilog, which confirms that the linear simulator accurately describes the transient behavior of the $\Sigma \Delta$ PLL even for fairly large input frequency steps.
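How a measured peak could be turned into a Charge-Pump correction can be sketched as follows. This is an assumption-laden illustration, not the tuning rule used in the paper: it takes \(\omega_0\) to be proportional to \(\sqrt{I_{CP} K_{\text{VCO}}}\) (i.e. the damping is small enough that \(\omega_0 \approx \omega_n\)) and \(V_{\text{max}}\) to be inversely proportional to \(\omega_0\). The function name is hypothetical, and the middle simulated value (122) is assumed to be the nominal target.

```python
def charge_pump_scale(v_measured, v_target):
    """Factor by which I_CP would be multiplied to move the peak to v_target.

    Assumes V_max ~ 1/omega_0 and omega_0 ~ sqrt(I_CP * K_VCO); the change
    of the damping factor with I_CP is neglected.
    """
    if v_measured <= 0 or v_target <= 0:
        raise ValueError("counter values must be positive")
    return (v_measured / v_target) ** 2

# Simulated peaks quoted above; 122 is assumed to be the nominal case.
for v in (148, 122, 105):
    print(v, round(charge_pump_scale(v, v_target=122), 3))
```

A peak above the target indicates a loop that is too slow, so the factor is larger than one; a peak below the target gives a factor smaller than one. In practice the correction could be applied iteratively over several calibration cycles.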

As previously discussed, the presence of a leakage current will result in an average phase error different from zero. If the leakage current is too large, even in lock condition of the PLL, the PFD will only produce UP pulses (for a negative leakage current) or DOWN pulses (for a positive leakage current). The simulations show that the calibration method is very robust to leakage currents: a $\pm 1\%$ leakage current will produce less than $6\%$ deviation from the nominal $\omega_0$.

6. CONCLUSIONS

A new method to calibrate the PLL transfer-function has been presented. The implementation does not require any additional analogue components. The only extra circuitry necessary is an auxiliary PFD and a digital counter. This new approach does not offer continuous calibration and it requires a calibration cycle, but it is very simple and virtually no extra silicon area and no extra power consumption is required. The mathematical formulation of the method has been verified with simulations based on a $\Sigma \Delta$ fractional-N PLL topology, run both on Verilog and Simulink. Results from both simulations closely match the theoretical values.

Acknowledgment

The authors would like to thank the QCT department of Qualcomm CDMA Technologies for the valuable help and support in this work.

7. REFERENCES


Analytical Model and Behavioral Simulation Approach for a $\Sigma\Delta$ Fractional-$N$ Synthesizer Employing a Sample-Hold Element

Marco Cassia, Peter Shah, Member, IEEE, and Erik Bruun, Senior Member, IEEE

Abstract—A previously unknown intrinsic nonlinearity of standard $\Sigma\Delta$ fractional-$N$ synthesizers is identified. A general analytical model for $\Sigma\Delta$ fractional-$N$ phased-locked loops (PLLs) that includes the effect of the nonlinearity is derived and an improvement to the synthesizer topology is discussed. Also, a new methodology for behavioral simulation is presented: the proposed methodology is based on an object-oriented event-driven approach and offers the possibility to perform very fast and accurate simulations, and the theoretical models developed validate the simulation results. We show a GSM example to demonstrate the applicability of the simulation methodology to real study cases.

Index Terms—Linear systems, nonlinearities, phase-locked loops, phase noise, $\Sigma\Delta$ modulation, simulation.

I. INTRODUCTION

Delta–Sigma modulation in fractional-$N$ synthesizers is a technique that has been successfully demonstrated for high-resolution and high-speed frequency synthesizers [1], [2]. These synthesizers use high-order multibit $\Sigma\Delta$ modulators [8] to dither the divider modulus, introducing the issue of high-frequency quantization noise down-folding. For this reason, the derivation of analytical models for noise analysis and the development of efficient techniques for fast and accurate simulations become very important.

Simulation of $\Sigma\Delta$ fractional-$N$ synthesizers is difficult for many reasons [3]: simulation time tends to be long since a large number of samples is necessary in order to retrieve the statistical behavior of the system. The dithering applied on the divider modulus makes the behavior of the synthesizers nonperiodic in steady state; therefore, known methods for periodic steady-state simulations [6] cannot be applied to $\Sigma\Delta$ fractional-$N$ synthesizers.

Traditional time sampling simulations based on fixed time-steps or adaptive time-steps quantize the location of the edges of the digital signals. This causes quantization noise and more severely, nonuniform sampling, which is a highly nonlinear phenomenon and leads to down-folding of high-frequency noise.

Different techniques to solve the quantization issue have been proposed in [3] and [5]. In [3], an approach based on an area conservation principle allows uniform time-steps to be used in the simulation. In [5], a simple event-driven approach is used in combination with iterative methods to calculate the loop filter response for integer-$N$ phase-locked loops (PLLs). Event-driven simulators offer an alternative approach for simulating fractional-$N$ synthesizers in a fast and accurate manner, and have so far been unexplored for this application area.

In this paper, we present and discuss a new simulation methodology based on an object-oriented event-driven approach [18]. This methodology, besides being accurate and highly efficient, prevents nonlinear time quantization from appearing in the simulation. In addition, it allows easy modification and augmentation of individual blocks separately without having to worry about interaction with other blocks.

Before discussing the simulation methodology, we identify a previously unknown intrinsic nonlinear phenomenon in the standard $\Sigma\Delta$ PLL topology [18], which causes down-folding of high-frequency quantization noise and hence increased close-in phase-noise. In Section II, we propose a simple enhancement to the synthesizer topology to eliminate the intrinsic nonlinearity; in Section III we derive a linear model, and in Section IV we extend the model to incorporate the nonlinear effect.

In Section V, we present and discuss the simulation methodology. Finally in Section VI, we compare results from simulations with the theory developed. Also we demonstrate the applicability of the simulation methodology to a direct GSM modulation synthesizer.

II. SAMPLE-HOLD TOPOLOGY

Before deriving a linear model for $\Sigma\Delta$ fractional-$N$ synthesizers, we address a nonlinear issue intrinsic to the standard $\Sigma\Delta$ synthesizer topology. The phase frequency detector samples the phase error in a nonuniform manner. The phase frequency detector produces UP and DOWN pulses of variable length occurring, respectively, after and before the sampling point. The sampling is thus spread out over time around the reference clock edge and that effectively constitutes nonuniform sampling. This is illustrated in Fig. 1.

Nonuniform sampling is a highly nonlinear phenomenon and causes the down-folding of high-frequency noise. The contribution of the down-folded noise to the overall output phase noise can be significant, especially since high-frequency and high-power $\Sigma\Delta$ quantization noise is present.

To solve the nonuniform sampling problem, we adopt the topology [18] shown in Fig. 2. The structure is similar to ordinary \( \Sigma \Delta \) fractional-\( N \) synthesizers except for the presence of a sample-hold block between the charge-pump and the loop filter. By resampling the charge pump output at regular time intervals, the nonlinearity previously discussed is eliminated. The sample-hold has another beneficial effect: it prevents the modulation of the loop filter voltage by the reference clock; hence, ideally, it eliminates reference spurs in the voltage-controlled oscillator (VCO) output. In reality, low-level spurs may appear at the output due to the charge feedthrough in the control switch.

The use of sample–hold detectors is known [12], [16] to give good spurious performance; sampled PLL circuits have been already used in clock and data-recovery circuits [13]. A sampled feed-forward network has been recently proposed in a clock generator PLL architecture [14]. However, to the knowledge of the authors, the sample-hold technique has not been used before in \( \Sigma \Delta \) fractional-\( N \) synthesizers for the purpose of compensating the nonuniform sampling operation of the phase-frequency detector (PFD).

In Section III, we present a derivation of a linear model of the S/H \( \Sigma \Delta \) fractional-\( N \) synthesizer. The resulting linear model is similar to [4], but the derivation is more straightforward and provides more intuitive insight.

III. LINEAR MODEL DERIVATION

The starting point is the sample-hold portion of the synthesizer. A possible implementation is shown in Fig. 3. This circuit uses a switched-capacitor integrator to carry out both the sample-hold function and the integrator function that is usually implemented by the loop filter. Note that the sample-hold block is in series with the loop filter: both the integral and the proportional loop corrections are sampled and held for each PFD sampling interval. To derive the transfer function we start by considering the charge deposited on the capacitance \( C_1 \)

\[
Q_{C_1}(t) = \frac{\Delta \varphi(t)}{2\pi} T_{Ref} \cdot I_{CP} \tag{1}
\]

where \( \Delta \varphi(t) \) is the phase error waveform into the PFD. After a certain delay \( \tau_{SH} \) the charge is transferred to \( C_2 \) and added to the charge previously stored

\[
Q_{C_2}(t) = Q_{C_2}(t - T_{Ref}) + Q_{C_1}(t - \tau_{SH}) \tag{2}
\]

In voltage terms and inserting the expression for \( Q_{C_1} \)

\[ V_{C_2}(t) = V_{C_2}(t - T_{Ref}) + \frac{I_{CP}}{2\pi \cdot C_2} \cdot T_{Ref} \cdot \Delta \varphi(t - \tau_{SH}) \tag{3} \]

Taking the Laplace transform yields

\[ \frac{V_{C_2}(s)}{\Delta \varphi(s)} = \frac{I_{CP}}{2\pi \cdot C_2} \cdot T_{Ref} \cdot \frac{e^{-s\tau_{SH}}}{1 - e^{-sT_{Ref}}}. \tag{4} \]

In (4), \( V_{C_2}(s) \) is still modeled in the discrete-time domain, i.e. as a train of delta-functions. In reality, the output voltage is a staircase function. As a consequence, (4) is further modified by a zeroth-order hold network that converts the impulse-train into the staircase waveform. The transfer function of the zeroth-order hold network is given by

\[ H_{ZH}(s) = \frac{1}{T_{Ref}} \cdot \frac{1 - e^{-sT_{Ref}}}{s}. \tag{5} \]

The actual transfer function from phase difference (PFD input) to integrator output is then given by

\[ \frac{V_{O}(s)}{\Delta \varphi(s)} = H_{ZH}(s) \cdot \frac{V_{C_2}(s)}{\Delta \varphi(s)} = e^{-s\tau_{SH}} \cdot \frac{I_{CP}}{2\pi \cdot s \cdot C_2}. \tag{6} \]

Consequently, the circuit in Fig. 3 can be modeled as shown in Fig. 4. Note that in Fig. 4 the integration \( 1/sC_2 \) has been absorbed in the loop filter transfer function \( F(s) \). Thus, the only difference introduced in the linear model by the sample-hold is the delay \( \tau_{SH} \). Note that the sampling now always occurs at regular time intervals, namely at the negative edge of the reference clock.
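The statement that the sample-hold only adds a delay can be checked numerically. The sketch below assumes that the sampled integrator obtained from the difference equation for \( V_{C_2} \) is \( \frac{I_{CP} T_{Ref}}{2\pi C_2} \frac{e^{-s\tau_{SH}}}{1 - e^{-sT_{Ref}}} \) and that the zeroth-order hold is \( \frac{1 - e^{-sT_{Ref}}}{sT_{Ref}} \); the function names and component values are hypothetical.

```python
import cmath
import math

def sampled_integrator(s, i_cp, c2, t_ref, tau_sh):
    """Discrete-time accumulation on C2, delayed by tau_sh."""
    return (i_cp * t_ref / (2 * math.pi * c2)
            * cmath.exp(-s * tau_sh) / (1 - cmath.exp(-s * t_ref)))

def zero_order_hold(s, t_ref):
    """Hold network turning the impulse train into a staircase."""
    return (1 - cmath.exp(-s * t_ref)) / (s * t_ref)

def delayed_integrator(s, i_cp, c2, tau_sh):
    """Continuous-time integrator with a pure delay tau_sh."""
    return cmath.exp(-s * tau_sh) * i_cp / (2 * math.pi * s * c2)

# Hypothetical values; tau_sh is half a reference period as in Fig. 3.
i_cp, c2, t_ref = 100e-6, 10e-12, 1 / 26e6
tau_sh = t_ref / 2
for f in (1e3, 1e5, 5e6):
    s = 2j * math.pi * f
    product = zero_order_hold(s, t_ref) * sampled_integrator(s, i_cp, c2, t_ref, tau_sh)
    print(f, abs(product - delayed_integrator(s, i_cp, c2, tau_sh)))
```

The printed differences are at the level of floating-point rounding, since the periodic factor \( 1 - e^{-sT_{Ref}} \) cancels exactly in the product.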

In the setup shown in Fig. 3, the delay \( \tau_{SH} \) is equal to half a reference period. The delay is necessary to allow the charge-pump current to be completely integrated before the sampling operation takes place. Note also that the sampling switch needs to be opened while the charge pump is active. The control logic of Fig. 3 takes into account the fact that the rising edge of the DOWN pulse occurs before the rising edge of the reference clock.

If a trickle current is used in the charge-pump (e.g., only UP pulses are generated in the lock state) then it is sufficient to invert the reference clock signal to generate a proper signal.

A. Divider

We will now derive a simple linear model for the divider with dithering. The first step is to find the timing deviations with the aid of Fig. 5. \( N \) is the nominal divider modulus and \( b(n) \) is the dithering value provided by the \( \Sigma \Delta \) modulator. Note that the UP and DOWN pulses have variable length and occur, respectively, after and before the reference signal. As already stated, the sampling is spread out over time before and after the sampling point.

According to the timing diagram we can write

\[
\Delta t(n+1) = \Delta t(n) + (N + b(n)) \cdot T_{\text{VCO}} - T_{\text{Ref}}. \tag{7}
\]

Indicating with \( \mu_b \) the average value of \( b(n) \) (\( \mu_b \) is the fractional divider value), the reference period \( T_{\text{Ref}} \) can be expressed as

\[
T_{\text{Ref}} = (N + \mu_b)T_{\text{VCO}}. \tag{8}
\]

In deriving (8), we are making the important approximation that \( T_{\text{VCO}} \) is constant. This assumption is reasonable for receive–transmit synthesizers with narrow modulation bandwidth. In these cases, the relative frequency variation of the VCO is small, which means that \( T_{\text{VCO}} \) is nearly constant.

Defining \( \hat{b}(n) = b(n) - \mu_b \) and substituting \( T_{\text{VCO}} \) from (8) into (7) yields

\[
\Delta t(n+1) = \Delta t(n) + \frac{T_{\text{Ref}}}{N + \mu_b} \hat{b}(n). \tag{9}
\]
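The recursion (9) can be iterated directly. The sketch below uses a first-order accumulator to generate \( b(n) \); this is an illustrative choice only (the paper uses higher-order modulators), and the function name and numerical values are hypothetical.

```python
def timing_error_sequence(n_int, frac_num, frac_den, t_ref, steps):
    """Iterate dt(n+1) = dt(n) + T_ref / (N + mu_b) * (b(n) - mu_b).

    b(n) is the carry of an integer accumulator, so its mean is
    mu_b = frac_num / frac_den (first-order modulator, illustration only).
    """
    mu_b = frac_num / frac_den
    t_vco = t_ref / (n_int + mu_b)  # constant T_VCO, as assumed in (8)
    acc, dt, out = 0, 0.0, []
    for _ in range(steps):
        acc += frac_num
        b = 1 if acc >= frac_den else 0  # divider modulus is N + b(n)
        acc -= b * frac_den
        dt += t_vco * (b - mu_b)
        out.append(dt)
    return out, t_vco

seq, t_vco = timing_error_sequence(
    n_int=64, frac_num=13, frac_den=64, t_ref=1 / 26e6, steps=640)
print(max(abs(x) for x in seq) / t_vco)
```

For this first-order case the accumulated timing error never exceeds one VCO period and returns to zero every `frac_den` reference cycles.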

B. \( \Sigma \Delta \) Modulation

The \( \Sigma \Delta \) modulation can be modeled as additive phase contribution (also shown in Fig. 6). As an example, a \( \Sigma \Delta \) MASH architecture [9] of order \( n \) is used in the analysis. The \( \Sigma \Delta \) quantizer causes quantization noise \( q_n \) which is added to the output signal.

Converting to phase domain we have

\[
\Delta \varphi = \frac{2\pi \Delta t}{T_{\text{Ref}}}. \tag{10}
\]

We can finally derive an expression for the additive noise caused by dithering the divider ratio

\[
\Delta \varphi(n+1) = \Delta \varphi(n) + \frac{2\pi}{N + \mu_b} \hat{b}(n). \tag{11}
\]

The Laplace transform yields

\[
\Delta \varphi(s) = \frac{2\pi}{N + \mu_b} \cdot \frac{e^{-sT_{\text{Ref}}}}{1 - e^{-sT_{\text{Ref}}}} \hat{b}(s). \tag{12}
\]

Setting \( z = e^{sT_{\text{Ref}}} \), (12) can be equivalently written in the digital domain (Z-transform)

\[
\Delta \varphi(z) = \frac{2\pi}{N + \mu_b} \cdot \frac{z^{-1}}{1 - z^{-1}} \hat{b}(z). \tag{13}
\]

The previous equation shows that the \( \Sigma \Delta \) noise undergoes a phase shift but is otherwise shaped by the loop in exactly the same way as the reference clock phase noise.

The final linear model is shown in Fig. 6, where STF and NTF are the signal transfer function and the noise transfer function of the \( \Sigma \Delta \) modulator, respectively [8]. The NL block in the model indicates the nonlinear effect that occurs in the PLL if the sample-hold block is not used. An analytical derivation of this effect is presented in Section IV. The closed-loop transfer function \( H_\theta(s) \) is given by (Fig. 6)

\[
H_\theta(s) = \frac{\frac{I_{CP}}{2\pi} e^{-s\tau_{SH}} \cdot F(s) \frac{K_{\text{VCO}}}{s}}{1 + \frac{I_{CP}}{2\pi} e^{-s\tau_{SH}} \cdot F(s) \frac{K_{\text{VCO}}}{s} \cdot \frac{1}{N + \mu_b}}. \tag{14}
\]

The phase noise properties can now be predicted from straightforward linear systems analysis [11]. Also, although Fig. 6 indicates \( \Sigma \Delta \) modulation, the linear model has been derived with no assumption on the type of modulation used to dither the divider modulus (e.g., it is valid for any fractional-\( N \) topology [7]).
word. Such noise is spread out over a bandwidth of \( f_{\text{ref}} = 1/T_{\text{ref}} \) and is high-pass shaped by the \( \Sigma\Delta \) modulator with a noise transfer function (NTF) given by

\[
H_{\text{NTF}}(z) = (1 - z^{-1})^n |_{z = e^{j2\pi f T_{\text{ref}}}}. \tag{15}
\]

The \( \Sigma\Delta \) STF is given by

\[
H_{\text{STF}}(z) = \left( z^{-1} \right)^n |_{z = e^{j2\pi f T_{\text{ref}}}}. \tag{16}
\]

Assuming that the quantization noise is independent of the input signal, the power spectral density of the bit stream can be expressed as

\[
S_{\text{nt}}(f) = \frac{T_{\text{ref}}}{12} |H_{\text{NTF}}(f)|^2. \tag{17}
\]
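The shaping in (15) and (17) is easy to evaluate numerically. In the sketch below the function names and the 26 MHz reference are hypothetical; the check against \( (2\sin(\pi f T_{\text{ref}}))^n \) uses the standard identity \( |1 - e^{-j\theta}| = 2|\sin(\theta/2)| \).

```python
import cmath
import math

def ntf_magnitude(f, t_ref, order):
    """|(1 - z^-1)^order| evaluated at z = exp(j*2*pi*f*T_ref)."""
    z_inv = cmath.exp(-2j * math.pi * f * t_ref)
    return abs((1 - z_inv) ** order)

def quantization_psd(f, t_ref, order):
    """White quantization noise of variance 1/12 shaped by the NTF."""
    return t_ref / 12 * ntf_magnitude(f, t_ref, order) ** 2

t_ref = 1 / 26e6  # hypothetical reference period
for order in (2, 3, 4):
    f = 0.1 / t_ref
    closed_form = (2 * math.sin(math.pi * 0.1)) ** order
    print(order, ntf_magnitude(f, t_ref, order), closed_form)
```

The noise vanishes at dc and peaks at half the reference frequency, where the magnitude reaches \( 2^n \); a higher order lowers the close-in noise at the cost of more high-frequency noise.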

From the linear model of Fig. 6 we can find the transfer function from the output of the NTF to the output phase \( \varphi_{\text{VCO}} \)

\[
H_n(s) = \frac{2\pi}{N + \mu_b} \frac{e^{-sT_{\text{ref}}}}{1 - e^{-sT_{\text{ref}}}} H_\theta(s). \tag{18}
\]

Finally the output phase noise power spectral density (PSD) due to the \( \Sigma\Delta \) quantization noise \( n_b \) is simply given by

\[
S_{\varphi_{\text{VCO}}}(f) = |H_n(j2\pi f T)|^2 S_{\text{nt}}(f). \tag{19}
\]

The effect of quantization at the \( \Sigma\Delta \) input (i.e. due to finite word length) can be evaluated in the same way. The PSD is given by

\[
S_{\varphi_{\text{in}}} = \frac{T_{\text{ref}}}{12} \cdot 2^{-2 b_{\text{in}}} \cdot |H_{\text{STF}}(f)|^2 \tag{20}
\]

where \( b_{\text{in}} \) is the number of bits below the decimal point in the \( \Sigma\Delta \) input. The calculation of the PSD of the PLL phase error due to the \( \Sigma\Delta \) input quantization is then straightforward (Fig. 6)

\[
S_{\varphi_{\Sigma\Delta_{\text{in}}}} = |H_n(j2\pi f T)|^2 S_{\varphi_{\text{in}}}. \tag{21}
\]

The output phase noise due to other noise sources, such as charge-pump noise or VCO noise can be evaluated in a similar way.

IV. ANALYTICAL EVALUATION OF THE INTRINSIC NONLINEARITY

As previously mentioned, in \( \Sigma\Delta \) synthesizers an intrinsic nonlinearity affects the close-in phase noise. We will now show that in standard \( \Sigma\Delta \) synthesizers, the charge-pump output \( i_{\text{out}}(t) \) contains an additional noise term, which is caused by the nonuniform pulse stretching shown in Fig. 5.

We begin by taking the Fourier transform of the charge-pump output

\[
i_{\text{out}}(f) = \int_{-\infty}^{\infty} i_{\text{out}}(t)e^{-j2\pi ft} dt. \tag{22}
\]

With the aid of Fig. 5, the previous equation can be written as

\[
i_{\text{out}}(f) = \sum_{n=0}^{\infty} \begin{cases} 
\int_{nT_{\text{ref}}}^{nT_{\text{ref}} + \Delta t(n)} I_{CP} e^{-j2\pi ft} dt, & \text{if } \Delta t(n) > 0 \\
\int_{nT_{\text{ref}} + \Delta t(n)}^{nT_{\text{ref}}} -I_{CP} e^{-j2\pi ft} dt, & \text{if } \Delta t(n) < 0 
\end{cases} \tag{23}
\]

which simplifies to

\[
i_{\text{out}}(f) = \sum_{n=0}^{\infty} \int_{nT_{\text{ref}}}^{nT_{\text{ref}} + \Delta t(n)} I_{CP} e^{-j2\pi ft} dt. \tag{24}
\]

By solving the integral, (24) becomes

\[
i_{\text{out}}(f) = I_{CP} \sum_{n=0}^{\infty} e^{-j2\pi f nT_{\text{ref}}} \cdot \frac{1 - e^{-j2\pi f \Delta t(n)}}{j2\pi f}. \tag{25}
\]

We now perform a second-order Taylor series expansion of the term \( e^{-j2\pi f \Delta t(n)} \)

\[
e^{-j2\pi f \Delta t(n)} \approx 1 - j2\pi f \Delta t(n) - \frac{(2\pi f)^2}{2} \Delta t^2(n) \tag{26}
\]

which, inserted into (25), gives

\[
i_{\text{out}}(f) \approx I_{CP} \sum_{n=0}^{\infty} \Delta t(n) e^{-j2\pi f nT_{\text{ref}}} - I_{CP} \frac{j2\pi f}{2} \sum_{n=0}^{\infty} \Delta t^2(n) e^{-j2\pi f nT_{\text{ref}}}. \tag{27}
\]

Equation (27) contains two terms. The first one is simply a linearly filtered version of the quantization noise, as predicted by the linear model in the paper. The second term quantifies the undesired nonlinear effect caused by the nonuniform pulse stretching. As can be seen, it is essentially the Fourier transform of the filtered quantization noise squared, followed by a differentiation.

The NL block in Fig. 6 symbolizes the nonlinear effect and, according to the above analysis, it can be modeled as shown in Fig. 7.

Based on the previous analysis we can write an analytical expression for the PSD of the excess noise that occurs in a standard \( \Sigma\Delta \) PLL (i.e. without sample-hold)

\[
S_{\text{excess}}(f) = 4\pi^2 (2\pi f)^2 \frac{1}{4} \left( S_{\Delta t}(f) \otimes S_{\Delta t}(f) \right) |H_{\theta}(j2\pi f T)|^2 \tag{28}
\]

where "\(\otimes\)" denotes the convolution, \(H_\theta\) is given by (14), and \(S_{\Delta t}(f)\) is given by

\[
S_{\Delta t}(f) = \left| \frac{T_{\text{ref}}}{N + \mu_b} \cdot \frac{1}{1 - e^{-j2\pi fT_{\text{ref}}}} \right|^2 S_{\text{nt}}(f) \tag{29}
\]

with \(S_{\text{nt}}(f)\) given by (17).

Fig. 8 shows the equivalent phase noise at the phase-frequency detector input (top row) and at the PLL output (bottom row) for both the sample-hold and the nonsample-hold topology and for different \(\Sigma\Delta\) modulator orders. The values of the parameters used in the plots can be found in Section VI. If a nonsample-hold PLL is used, an excess noise appears and the total noise becomes as shown by the dashed curve. Of course, the regular \(\Sigma\Delta\) quantization noise also gets worse with increasing frequency, so at high offset frequencies the excess noise becomes insignificant in comparison with the \(\Sigma\Delta\) noise. Notice also that the excess phase noise effect is more noticeable for high-order \(\Sigma\Delta\) modulators. This is because the high-frequency quantization noise is stronger, so that more noise is down-folded. On top of this, the low-frequency quantization noise is lower, which makes the excess noise more significant in comparison.

The contribution of the excess noise might not always be significant with respect to other PLL noise sources, such as the charge-pump noise, which usually dominates at low frequency. However it is still valuable to quantify and to model the effect of the nonlinearity in order to ensure correct performance of the PLL in all cases.

V. EVENT-DRIVEN OBJECT ORIENTED METHODOLOGY

As discussed in the introduction, the use of event-driven simulators is very attractive. Besides providing precise time-steps, as explained later in the section, event-driven simulations are also very fast and highly efficient. In fact, the number of calculations is kept to a minimum because synthesizer signals and variables are calculated only when a transition occurs.

The simulation method proposed in [3] ensures extremely high computation speed because, instead of simulating the true time-domain behavior, it effectively operates in a subsampled manner on the merged VCO-divider block, which makes the method very attractive. However, the same idea could equally well be used in the event-driven approach, speeding up the simulation tremendously. In this case, the VCO would sample the loop filter once every reference cycle. This subsampling operation implicitly relies on the assumption that the noise power at high-frequency offsets does not contribute significantly when aliased to low frequency. If the assumption holds, the event-driven approach would be as fast as the method in [3]. However, even without the VCO-divider merging approach, the event-driven method is already so fast that it is hardly worthwhile to use this merging technique.

A unique strength of the event-driven methodology we propose is that it is exact and does not require assumptions or approximations. The simulation setup is structured in an object-oriented way: PLL blocks are connected through signals that are responsible for timing and for data exchange, as shown in Fig. 9. Note that IN/OUT signals can operate also as implicit update signals (e.g. the UP/DOWN signals from the charge-pump). Whenever a block is called from the simulator, a specific operation is performed and an event may be posted. As shown in Fig. 10, the simulator inserts the event in the event queue in the proper time order and extracts from the queue the next event that needs to be executed, resulting in the update of the signals/variables of a block.

This means that each PLL block can be coded as an independent unit, without worrying about the interaction and the sequencing with the other blocks. Because each block is self-contained, the behavior of a single block can be changed and refined without affecting the coding of the other PLL units. The simulator itself keeps track of the succession of the events with the event queue. A more detailed explanation of this concept can be found in [17].

The advantage of maintaining a simulation event queue is that the simulation time points occur exactly at the moment of the execution of the event. Thus, the simulation time points are always aligned with the edges of the signals, providing 100\% accurate time-steps.
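The event-queue mechanism described above can be illustrated with a minimal sketch. It is written in Python purely for illustration (the simulations in this work use Verilog), and the names `Simulator`, `post_event`, and `Clock` are hypothetical, not taken from [17]. Each block is a self-contained callable that may post new events; the kernel only orders and dispatches them, so every simulation time point coincides with a signal edge.

```python
import heapq
import itertools


class Simulator:
    """Minimal event-driven kernel: a time-ordered queue of block calls."""

    def __init__(self):
        self.now = 0.0
        self._queue = []
        self._order = itertools.count()  # tie-breaker keeps insertion order

    def post_event(self, time, action):
        # Insert the event at its proper place in the time-ordered queue.
        heapq.heappush(self._queue, (time, next(self._order), action))

    def run(self, stop_time):
        # Extract and execute events in time order until stop_time.
        while self._queue and self._queue[0][0] <= stop_time:
            self.now, _, action = heapq.heappop(self._queue)
            action(self)


class Clock:
    """Hypothetical self-rescheduling block that toggles every half period."""

    def __init__(self, period):
        self.half_period = period / 2.0
        self.level = 0
        self.edge_times = []

    def __call__(self, sim):
        self.level ^= 1                      # toggle the output signal
        self.edge_times.append(sim.now)      # time point equals the edge time
        sim.post_event(sim.now + self.half_period, self)


sim = Simulator()
ref = Clock(period=1.0)
sim.post_event(0.0, ref)
sim.run(stop_time=2.0)
print(ref.edge_times)
```

Because the kernel never advances time on a fixed grid, blocks with very different rates (reference clock, VCO, loop filter updates) can coexist without any time-step quantization.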

Coding the behavior of the synthesizer digital blocks is straightforward; the description of the loop filter and of the VCO requires particular attention, as discussed next. More details about the methodology implementation can be found in [17].

A. Loop Filter

We propose a simple method based on a state-space description. The way the loop filter is modeled can be visualized with the help of Fig. 9. Every time the VCO and the charge-pump are executed, they post events requiring the update of the loop filter state. When these events are extracted from the event queue to be executed, the simulator calls the loop filter to update its state and to calculate a new control voltage according to the current input value. The event posted from the charge-pump indicates that a change has occurred at the loop filter input; the VCO event is posted to obtain the up-to-date control voltage.

To describe the loop filter behavior in mathematical terms we start from its transfer function and we derive its State-Space Formulation. We assume the loop filter transfer function to be given by the following equation:

\[ F(s) = \frac{1 + \frac{s}{Z_0}}{sC \cdot \left( 1 + \frac{s}{P_0} \right) \cdot \left( 1 + \frac{s}{P_1} \right) \cdot \left( 1 + \frac{s}{P_2} \right)}. \] (30)

Note that (30) also includes the integrating capacitance. With a partial fraction expansion, (30) can be decomposed into four parallel blocks, namely an integrator and three first-order RC blocks. Noting that between the update times the input to the loop filter is constant (i.e., \( V_{in} \) appears as a staircase to the loop filter), the equation describing the behavior of each of the three RC blocks is given by (state equation solution)

\[ V_x(t_1) = V_x(t_0) + (A_x V_{in}(t_0) - V_x(t_0)) \left( 1 - e^{-(t_1 - t_0)/\tau_x} \right), \] (31)

where \( \tau_x \) is the time constant of the block (the reciprocal of its pole frequency) and \( A_x \) is its dc gain.

The equation that describes the integrating block is given by

\[ V_C(t_1) = V_C(t_0) + \frac{V_{in}(t_0)}{C}(t_1 - t_0). \] (32)

The VCO control voltage is then given by

\[ V_{ctrl}(t) = V_1(t) + V_2(t) + V_3(t) + V_C(t). \] (33)

The model for the loop filter is then simply given by a set of equations which describe exactly the behavior of the loop filter.

This representation of the loop filter can be directly converted into simulation code. It is important to underline that the filter behavior is modeled with no approximation. Also, the loop filter update takes place only when required by other blocks: the update time intervals are not uniform. This makes the simulation methodology very efficient, since the calculations occur only at the required time steps. Further implementation details can be found in [17].
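As a rough illustration of how (31)-(33) translate into code, the following Python sketch updates the filter state only when another block asks for it. The class name `LoopFilter`, its methods, and the numerical values are hypothetical; the exponent in the RC update is taken to be the elapsed time divided by the branch time constant, which is an assumption about the intended form of (31).

```python
import math


class LoopFilter:
    """Integrator plus parallel first-order RC branches, updated on demand."""

    def __init__(self, cap, branches):
        self.cap = cap                    # integrating capacitance C
        self.branches = branches          # list of (A_x, tau_x) pairs
        self.v_c = 0.0                    # integrator state
        self.v_x = [0.0 for _ in branches]
        self.v_in = 0.0                   # input held constant between updates
        self.t_last = 0.0

    def advance(self, t_now):
        """Move the state to t_now and return the control voltage, as in (33)."""
        dt = t_now - self.t_last
        self.v_c += self.v_in / self.cap * dt                 # integrator, (32)
        for i, (gain, tau) in enumerate(self.branches):       # RC branch, (31)
            target = gain * self.v_in
            self.v_x[i] += (target - self.v_x[i]) * (1.0 - math.exp(-dt / tau))
        self.t_last = t_now
        return self.v_c + sum(self.v_x)

    def set_input(self, t_now, v_in):
        """Called when the charge-pump changes the filter input."""
        v_ctrl = self.advance(t_now)
        self.v_in = v_in
        return v_ctrl


# One large step and two unequal steps give the same result,
# because the update is the exact solution for a constant input.
one = LoopFilter(cap=1.0, branches=[(2.0, 0.5)])
one.set_input(0.0, 1.0)
v_one = one.advance(1.0)

two = LoopFilter(cap=1.0, branches=[(2.0, 0.5)])
two.set_input(0.0, 1.0)
two.advance(0.3)
v_two = two.advance(1.0)
print(v_one, v_two)
```

The comparison at the end shows why nonuniform update intervals are harmless here: the result does not depend on how the interval is subdivided.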

B. VCO Model

The VCO is modeled as a self-updating block. Such operation can be visualized as shown in Fig. 9. The pseudocode describing the VCO behavior is presented in Algorithm 1. The update takes place at discrete time instants, namely every half VCO cycle. Every half-period the VCO receives the updated control voltage from the loop filter; on the basis of the received value, the new VCO period is calculated.

The VCO completes its execution by posting two events. The first event is the execution of the loop filter block at the next time point when the VCO update will take place. This ensures that the value used to calculate the semiperiod of the VCO is always updated. The second event is simply the scheduling of the next VCO block call.

Due to the finite number representation of the simulator, number truncation is a potential problem in the calculation of the VCO period. In order to avoid the accumulation of the truncation error, the calculation of the VCO semiperiod can be implemented as a first-order \( \Sigma \Delta \) modulator. In this way, the accumulated error is always driven to zero on average.
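A minimal sketch of this idea, assuming the simulator time is quantized to a fixed resolution: the rounding error of each semiperiod is fed back into the next calculation (a first-order error-feedback structure), so the total timing error stays bounded instead of growing. The function name `make_semiperiod_quantizer` and the unit-tick resolution are hypothetical choices for illustration.

```python
def make_semiperiod_quantizer(resolution):
    """Return a function that rounds semiperiods to the simulator resolution."""
    residue = 0.0  # truncation error carried over to the next call

    def quantize(ideal_semiperiod):
        nonlocal residue
        wanted = ideal_semiperiod + residue       # add back the previous error
        ticks = round(wanted / resolution)
        quantized = ticks * resolution
        residue = wanted - quantized              # error fed to the next call
        return quantized

    return quantize


quantize = make_semiperiod_quantizer(resolution=1.0)   # time unit: one tick
steps = [quantize(2.3) for _ in range(10)]
drift = sum(steps) - 10 * 2.3
print(steps, drift)
```

With plain rounding each 2.3-tick semiperiod would become 2 ticks and the error would grow by 0.3 ticks per call; with the feedback the accumulated drift never exceeds half a tick.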

<table>
<thead>
<tr>
<th colspan="2">TABLE I DESIGN PARAMETERS</th>
</tr>
</thead>
<tbody>
<tr>
<td>\( f_{LO} \)</td>
<td>1907.75 MHz</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th colspan="2">TABLE II LOOP PARAMETERS</th>
</tr>
</thead>
<tbody>
<tr>
<td>\( C_{CP} \)</td>
<td>34.522 pF</td>
</tr>
</tbody>
</table>

Algorithm 1 VCO pseudocode

```plaintext
MODULE VCO
input V_ctrl
output VCO_clock
// Update the instantaneous frequency from the latest control voltage
f_inst = f_FreeRUN + K_VCO * V_ctrl
// Calculate the new semiperiod
VCO_semiperiod = 0.5 / f_inst
// Update the VCO_clock signal
VCO_clock = NOT (VCO_clock)
// Request a loop filter update at the next VCO update time
POST_EVENT(update_loop @ current_sim_time + VCO_semiperiod)
// Schedule the next VCO block call
POST_EVENT(execute VCO @ current_sim_time + VCO_semiperiod)
END MODULE
```

VI. RESULTS

The PLL topology presented in Section II is simulated with Verilog-XL, but the simulation methodology can be applied to any kind of event-driven simulator. For example, the simulation core can be easily implemented with a few lines of C code. The choice of Verilog is a matter of convenience: its integration in the Cadence environment allows easier debugging, schematic capture, and plotting. Moreover, the Cadence environment offers the possibility to use the Verilog code directly together with SPICE-like simulators to run mixed-mode simulations. However, simulations in a mixed-mode environment require long simulation times. As a comparison, an event-driven simulation of 2 million VCO cycles (equivalent to 1 ms) that records 4 million data points in a file executes in less than 15 min on a RISC 8500 processor (this reduces to only 5 min if the VCO simulation time points are not written to a file). The same simulation in a mixed-mode environment takes more than 20 h, without reaching the same accuracy. A fully analogue simulator such as SPICE would probably require a simulation time at least one order of magnitude longer.

The main parameters of the simulated PLL are summarized in Table I. The $\Sigma \Delta$ modulator is a fourth-order MASH and the parameters of the loop filter are presented in Table II.

We now present several simulation results obtained by the event-driven methodology in order to

- validate the theory developed and evaluate the effect of the sample-hold block;
- evaluate the effects of nonidealities and nonuniform delay in the divider moduli;
- demonstrate the applicability of the simulation methodology to a real study case, namely direct GSM modulation.

We start by showing the effects of the nonuniform sampling at the PFD. The effect of other noise sources will be discussed later. Fig. 11 shows the PSD of the output phase noise $\varphi_{VCO}$ due to the $\Sigma \Delta$ quantization for two different synthesizer topologies: the PSD of the sample-hold PLL is compared with the PSD of the standard PLL. The sample-hold PLL has a lower overall phase noise and does not present spurs. By contrast the standard PLL (i.e., without sample-hold) has greatly increased close-in phase noise as well as reference spurs.

In the same figure, the PSD from simulations is compared with the predicted theoretical curves. Clearly, the curves obtained from the simulation match very well with the PSD described by (19) [for the sample-hold synthesizer topology] and (28) [for the standard synthesizer topology].

The low-frequency noise floor ("dithering noise floor" in Fig. 11) is due to a very small amount of dithering applied on the $\Sigma \Delta$ modulator input. In absence of modulated data, dithering is necessary to avoid the presence of fractional spurs.

In the previous figure, only the effect of the $\Sigma \Delta$ quantization noise on the output phase noise has been considered. The effects of other noise sources can be easily evaluated in the simulation, in a similar manner as described in [3]. Due to the object-oriented nature of the simulation it is easy to add new blocks that generate noise: the charge-pump white noise is obtained from a random number generator block and the VCO noise can be generated with another random number generator block followed by a filter block (coded in the same way as the loop filter). Another option is to read the noise data from a file; in this way it is possible to use data from other simulations or from real measurements. As an example, Fig. 12 shows the PSD of the output phase noise due to the contribution of $\Sigma \Delta$ quantization noise and VCO phase noise (in this example, the VCO phase noise is about $-140$ dBc/Hz @ 1-MHz offset). Together with the simulation result, Fig. 12 presents the predicted contribution of the single noise sources; the typical VCO phase noise ($-20$ dB/decade characteristic) determines an increased close-in phase noise.
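The noise blocks mentioned above can be sketched as small independent objects. The following Python fragment is only an illustration under stated assumptions: the class names are hypothetical, the "filter" for the VCO noise is taken to be a plain integrator (white frequency noise integrated into phase gives the $-20$ dB/decade characteristic), and the replay source stands in for data read from a file.

```python
import math
import random


class WhiteNoiseSource:
    """Random-number-generator block producing white Gaussian samples."""

    def __init__(self, sigma, seed=None):
        self.sigma = sigma
        self._rng = random.Random(seed)

    def sample(self):
        return self._rng.gauss(0.0, self.sigma)


class ReplayNoiseSource:
    """Block replaying stored samples (e.g., loaded from a file beforehand)."""

    def __init__(self, samples):
        self._samples = list(samples)
        self._index = 0

    def sample(self):
        value = self._samples[self._index % len(self._samples)]
        self._index += 1
        return value


class IntegratingFilter:
    """Integrates a frequency error (Hz) into a phase error (rad)."""

    def __init__(self):
        self.phase = 0.0

    def update(self, freq_error_hz, dt):
        self.phase += 2.0 * math.pi * freq_error_hz * dt
        return self.phase


source = ReplayNoiseSource([1.0, -1.0, 0.5])
vco_noise = IntegratingFilter()
phases = [vco_noise.update(source.sample(), dt=0.25) for _ in range(3)]
print(phases)
```

Since both sources expose the same `sample()` method, a generated source can be swapped for recorded data without touching the rest of the simulation.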
A. Simulation Example: Direct GSM Modulation

The event-driven methodology and the linear model were applied to the study case of ΣΔ synthesizers for direct GSM modulation. The effects of nonidealities such as charge-pump mismatches, variation in the VCO gain, and variable delay in the divider modulus can be easily evaluated with the aid of the simulations. A brief account of the results will be given here; more results can be found in [17].

To evaluate the dynamic behavior of the simulator real GSM data was fed into the ΣΔ modulator through a digital prewarp filter [15], which compensates for the PLL transfer function. The transmitted output spectrum lies within the mask specified by the GSM standard and the rms phase error is smaller than 0.5° rms in the ideal condition.

As an example, the effects of a variable delay on a single divider modulus can be seen in Fig. 13. When the delay increases, the transmit power spectrum lies outside the mask specification. In fact, nonuniform propagation delay for the divider moduli is equivalent to nonuniform quantization in multibit ΣΔ DACs and causes down-folding of high-frequency noise.
The small reference spur in Fig. 13 is caused by a small dc content in the input data. In fact, the input data of the \( \Sigma \Delta \) modulator is not ideal, but it is taken from a real implementation (e.g., the length of the Gaussian filter is finite).

The conclusions from the simulation on the study case can be summarized as follows.

- Identical results are achieved with a \( \Sigma \Delta \) modulator based on a MASH or on a Candy architecture [10].
- It is important to ensure equal propagation delay for all divider moduli, otherwise the transmit power will exceed the GSM mask specification.
- Even a small mismatch in the charge-pump currents results in a large close-in phase noise increase. To compensate the charge-pump current mismatches a fixed trickle current source can be used, in order to have pulses in only one direction (e.g., only UP pulses) under lock condition. The penalty of this choice is an increased spur level in the output spectrum for the standard PLL, but not for the sample-hold PLL.
- For receive synthesizers, the sample-hold topology greatly reduces close-in phase noise. In transmit mode, the increased close-in phase-noise integrates up to a relatively small rms phase error; consequently, it is acceptable to use the standard topology. However, the sample-hold eliminates the spur problems; this means that a trickle current can be used in the charge-pump to compensate for current mismatches.

The same conclusions are obtained in the study of a \( \Sigma \Delta \) synthesizer targeting the DCS specification. This indicates that the sample-hold PLL is suitable for direct modulation in both GSM and DCS.

**VII. CONCLUSION**

This work identified an intrinsic nonlinearity of standard \( \Sigma \Delta \) synthesizers and presented a sample-hold topology to solve this issue. The sample-hold also eliminates the problem of reference spurs in the output spectrum. A general analytical model was derived for the \( \Sigma \Delta \) fractional-N synthesizer and was augmented to include the effects of the discussed nonlinearity. Moreover the model is valid for any kind of divider dithering, not just \( \Sigma \Delta \) modulation; thus, regular fractional-N PLLs can also be analyzed using this model.

We also proposed a new simulation approach based on an object-oriented event-driven methodology. The simulation methodology is very accurate because it does not require approximations and undesirable time quantization phenomena are avoided, the only limit being the numerical accuracy of the event-driven simulator. One of the advantages of this approach is its capability to naturally predict nonobvious phenomena such as noise down-folding, without having to resort to any special measures.

The comparisons presented in Section VI demonstrate a very good match between the theoretical model and the simulations. The examples provided show that the simulation methodology can be applied to the study of the effects of multiple nonlinearities. As an example, a study case for direct GSM/DCS modulation was briefly presented and a summary of the results was shown, which indicate that the sample–hold \( \Sigma \Delta \) fractional-N synthesizer is suitable for fulfilling the GSM/DCS standard.

**ACKNOWLEDGMENT**

The authors thank the QCT Department of Qualcomm CDMA Technologies for the valuable help and support in this work, and also thank P. Andreani for insightful discussions and the reviewers for their useful comments.

**REFERENCES**


Marco Cassia was born in Bergamo, Italy, in 1974. He received the M.Sc. degree in engineering from the Technical University of Denmark, Lyngby, in 2000, and the M.Sc. degree in electrical engineering from Politecnico di Milano, Italy, in July 2000. He is currently working toward the Ph.D. degree at the Technical University of Denmark, Lyngby.

From July 2001 to July 2002, he was with the QCT Department of Qualcomm CDMA Technologies, San Diego, CA, working with direct modulation synthesizers. His main research interests include low-power low-voltage RF systems.

Peter Shah (M’89) was born in Copenhagen Denmark, in 1966. He received the M.Sc.E.E. and Ph.D degrees from The Technical University of Denmark, Lyngby, in 1990 and 1993, respectively.

From 1993 to 1995, he was a Post-Doctoral Research Assistant with the Imperial College in London, England, where he worked on switched-current circuits. In 1996, he joined PCSi, San Diego, CA, (which was subsequently acquired by Conexant Systems, San Diego, CA) as an RFIC Design Engineer, working on transceiver chips for the PHS cellular phone system. In 1998, he joined Qualcomm, San Diego, CA, where he worked on RFICs for CDMA mobile phones and for GPS. In December 2002, he joined RFMagic, San Diego, CA, where he is currently working on RFICs for consumer electronics. His research interests include RFIC architecture and design, including sigma-delta PLLs, A/D, D/A converters, LNAs, mixers, and continuous-time filters.

Erik Bruun (M’72–SM’02) received the M.Sc. and Ph.D. degrees in electrical engineering from the Technical University of Denmark, Lyngby, in 1974 and 1980, respectively, the B.Com. degree from the Copenhagen Business School, Denmark, in 1980, and the Dr.Techn. degree from the Technical University of Denmark in 2000.

In 1974, and again, from 1980 to 1984, he was with Christian Rovsing A/S, Denmark, working on the development of space electronics and test equipment for space electronics. From 1974 to 1980, he was with the Laboratory for Semiconductor Technology, Technical University of Denmark, working in the fields of nMOS memory devices, FET devices, bipolar analog circuits, and custom integrated circuits. From 1984 to 1989, he was Managing Director of Danmos Microsystems ApS, Denmark. Since 1989, he has been a Professor of analog electronics with the Technical University of Denmark, where from 1995 to 2001, he served as Head of the Sector of Information Technology, Electronics, and Mathematics. Since 2001, he has been Head of Ørsted DTU. His current research interests include RF integrated circuit design and integrated circuits for mobile phones.
A CALIBRATION METHOD FOR PLLS BASED ON TRANSIENT RESPONSE

Marco Cassia 1, Peter Shah 2, and Erik Bruun 1

1 Ørsted•DTU, Technical University of Denmark,DK-2800 Kgs. Lyngby, mca@oersted.dtu.dk
2 RF Magic, 10182 Telesis Court, 4th floor, San Diego, CA 92121 pshah@rfmagic.com

ABSTRACT

A novel method to calibrate the frequency response of a Phase-Locked Loop is presented. The method requires just an additional digital counter and an auxiliary Phase-Frequency Detector (PFD) to measure the natural frequency of the PLL. The measured value can be used to tune the PLL response to the desired value. The method is demonstrated mathematically on a typical PLL topology and it is extended to ΣΔ fractional-N PLLs. A set of simulations performed with two different simulators is used to verify the applicability of the method.

1. INTRODUCTION

Phase-Locked Loop (PLL) frequency synthesizers are building blocks of all communication systems. An accurate PLL response is required in many situations, especially when ΣΔ PLLs for direct modulation [1] are used. In these types of PLLs, the data fed into the ΣΔ modulator is often undergoing a pre-filtering process in order to cancel the low-pass PLL transfer function and thereby to extend the modulation bandwidth [2]. The pre-distortion filter presents a transfer function equal to the inverse of the PLL transfer function and it is usually implemented digitally. Consequently, a tight matching between the pre-distortion filter and the analogue PLL transfer function is necessary to avoid distortion of the transmitted data.

Especially for on-chip Voltage Controlled Oscillators (VCO), the gain $K_{VCO}$ is typically the parameter with the poorest accuracy among the PLL analog components. However to establish an accurate PLL transfer function only the product $K_{VCO} \times I_{CP}/C$ needs to be accurate [3]. The PLL can then be calibrated by adjusting the Charge-Pump current; the problem is how to measure the accuracy of the PLL transfer function.

A continuous calibration technique is presented in [4]. The transmitted data is digitally compared with the input data and the Charge-Pump current is then adjusted to compensate the detected error. This method offers the possibility of continuous calibration at the expense of increased circuit complexity; since the error detection is based on the cross-correlation between input and transmitted data, this approach will not work on unmodulated synthesizers. An alternative approach is found in [5], where a method based on the detection of pulse skipping is described. The presence of one or several pulse skips can be used as an indication of the bandwidth. This method requires an input frequency step large enough to push the PLL into its non-linear operating region and only offers a rough estimation of the actual PLL bandwidth.

In this paper we present a simple and novel approach that makes it possible to determine the characteristics of the PLL transfer function by simply adding a digital counter and an auxiliary Phase Frequency Detector (PFD).

The paper is organized as follows: in section 2 we present the basic idea behind the method and in section 3 we discuss its mathematical formulation. The extension of the method to ΣΔ PLLs is presented in section 4. Finally, in section 5, the results from different simulations are compared with the theory developed.

2. MEASURING SCHEME

A calibration cycle is required by this method. Consider the typical PLL topology in fig. 1: to start the calibration, the switch to $R_{cal}$ is closed and the calibration resistor $R_{cal}$ is connected to the resistor $R$. By reducing the total filter resistance, the loop transfer function is given under-damped characteristics. Note that, if the loop transfer function is already designed with under-damped characteristics, then the calibration switch is not necessary. By changing the ratio $M$ of the $f_{ref}$ divider or the ratio $N$ of the $f_{out}$ divider, a frequency step can be applied to the PLL. The natural frequency of the induced transient response can be indirectly measured by counting the UP/DOWN pulses produced by the PFD. If the counter counts up by 1 for each UP pulse and down by 1 for each DOWN pulse, then the maximum counter value is a measure of the natural frequency of the PLL transfer function. This can be seen in fig. 2, where the expected behavior of the phase error is presented together with the counter value trajectory. The Charge-Pump current can then be adjusted so that the PLL natural frequency takes the desired value.

For the calibration method to work properly, the leakage current in the Charge-Pump should be kept small and the trickle current (if any) should be turned off. Any leakage or trickle currents will induce a static phase error at the PFD input. This, in turn, means an increased number of pulses in one direction (e.g. UP pulses). Consequently, the counter value is no longer an accurate representation of the natural frequency.

The auxiliary PFD in fig. 1 is required to generate stable UP/DOWN pulses for the digital counter. A possible circuit implementation that works together with a typical PFD is shown in fig. 3. The two set-reset flip-flops (SR-FF) are used to establish which pulses are UP and DOWN pulses and the negative transition of the REF clock resets the flip-flop. If the UP pulse occurs before the DOWN pulse, the UP_stable signal is set high.

The opposite happens if the DOWN pulse occurs before the UP pulse. If the PFD produces aligned UP and DOWN pulses (this is the case if the input phase error is smaller than the dead-zone of the PFD) then the UP_stable signal and the DOWN_stable signal are high at the same time.

3. MATHEMATICAL DERIVATION

The mathematical formulation will be based on the PLL topology of fig. 1; however, the applicability of the method extends to other topologies, as will be demonstrated in the next section. We start by deriving the PLL loop transfer function with the aid of the linear model of fig. 4:

\[ H_{\text{loop}}(s) = \frac{I_{\text{cp}}(R \cdot C_p s + 1)K_{\text{vco}}}{2\pi C_p s^2 N} \tag{1} \]

The transfer function from phase input to phase error is given by:

\[ \frac{\Phi_{\text{err}}(s)}{\Phi_{\text{in}}(s)} = \frac{1}{1 + H_{\text{loop}}(s)} = \frac{s^2}{s^2 + 2\zeta \omega_n s + \omega_n^2} \tag{2} \]

where \( \omega_n = \sqrt{\frac{I_{\text{cp}} K_{\text{vco}}}{2\pi C_p N}} \) and \( \zeta = \frac{R}{2} \sqrt{\frac{I_{\text{cp}} K_{\text{vco}} C_p}{2\pi N}} \) represent, respectively, the natural frequency and the damping factor of the PLL. A unit input frequency step corresponds to an input phase ramp, with Laplace transform given by \( \Phi_{\text{in}}(s) = \frac{1}{s^2} \). The Laplace transform of the phase error is then given by:

\[ \Phi_{\text{err}}(s) = \frac{s^2}{s^2 + 2\zeta \omega_n s + \omega_n^2} \cdot \frac{1}{s^2} = \frac{1}{s^2 + 2\zeta \omega_n s + \omega_n^2} \tag{3} \]
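As a short worked step, the natural frequency and damping factor can be read off by comparing denominators. Writing the charge-pump current in (1) as \( I_{\text{cp}} \) and assuming the loop transfer function has exactly the form of (1):

\[
1 + H_{\text{loop}}(s) = \frac{s^2 + \dfrac{I_{\text{cp}} K_{\text{vco}} R}{2\pi N}\, s + \dfrac{I_{\text{cp}} K_{\text{vco}}}{2\pi C_p N}}{s^2}
\]

\[
\omega_n^2 = \frac{I_{\text{cp}} K_{\text{vco}}}{2\pi C_p N}, \qquad 2\zeta\omega_n = \frac{I_{\text{cp}} K_{\text{vco}} R}{2\pi N} \;\Rightarrow\; \zeta = \frac{R\, C_p}{2}\, \omega_n
\]

Since \( \zeta \) is proportional to \( R \), lowering the total filter resistance reduces the damping, which is how the calibration resistor makes the response under-damped. Likewise, \( \omega_n^2 \) depends on the product \( K_{\text{vco}} \cdot I_{\text{cp}} / C_p \), so an error in \( K_{\text{vco}} \) can be compensated by adjusting \( I_{\text{cp}} \).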

The behavior in the time domain of equation 3 is the impulse response of a second order system:

\[ \phi_{\text{err}}(t) = \frac{1}{\omega_n \sqrt{1 - \zeta^2}} e^{-\zeta \omega_n t} \sin(\omega_0 t) \tag{4} \]

where \( \omega_0 = \omega_n \sqrt{1 - \zeta^2} \). As long as the phase error function \( \phi_{\text{err}}(t) \) is positive, the PFD generates UP pulses. If the error becomes negative, DOWN pulses are generated. Assuming a positive frequency step, an initial sequence of UP pulses is produced by the PFD and the counter value increases monotonically. When the phase error crosses zero, which occurs at \( t_{\text{cross}} = \frac{\pi}{\omega_0} \) according to equation 4, DOWN pulses start to appear, decreasing the counter value. Hence at the crossing time the counter reaches its maximum value, \( V_{\text{max}} \); the crossing point can also be expressed as \( t_{\text{cross}} = V_{\text{max}} \cdot T_{\text{ref}} \), where \( T_{\text{ref}} \) is the period of the REF clock (fig. 1). For stability reasons [3], the PLL dynamics are always much slower than the REF signal: the error introduced by quantizing \( t_{\text{cross}} \) is then insignificant. \( \omega_0 \) can then be approximated as:

\[ \omega_0 \approx \frac{\pi}{V_{\text{max}} \cdot T_{\text{ref}}} \tag{5} \]
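The counting scheme can be checked numerically with a small sketch. It assumes an idealized PFD that emits exactly one UP or DOWN pulse per reference cycle according to the sign of the phase error in equation 4; the function names and the numerical values (a 26 MHz reference, a 100 kHz natural frequency, a damping factor of 0.3) are hypothetical and chosen only for illustration.

```python
import math


def phase_error(t, omega_n, zeta):
    """Phase error after a unit frequency step (under-damped case, eq. 4)."""
    omega_0 = omega_n * math.sqrt(1.0 - zeta ** 2)
    return math.exp(-zeta * omega_n * t) * math.sin(omega_0 * t) / omega_0


def max_counter_value(omega_n, zeta, t_ref, n_cycles):
    """Count +1 per UP pulse and -1 per DOWN pulse; return the peak value."""
    counter = 0
    peak = 0
    for k in range(1, n_cycles + 1):
        err = phase_error(k * t_ref, omega_n, zeta)
        if err > 0.0:
            counter += 1      # UP pulse
        elif err < 0.0:
            counter -= 1      # DOWN pulse
        peak = max(peak, counter)
    return peak


def estimate_omega_0(v_max, t_ref):
    """Recover the oscillation frequency from the peak counter value (eq. 5)."""
    return math.pi / (v_max * t_ref)


t_ref = 1.0 / 26e6
omega_n = 2.0 * math.pi * 100e3
zeta = 0.3
v_max = max_counter_value(omega_n, zeta, t_ref, n_cycles=1000)
true_omega_0 = omega_n * math.sqrt(1.0 - zeta ** 2)
print(v_max, estimate_omega_0(v_max, t_ref), true_omega_0)
```

Because the zero crossing is resolved only to within one reference period, the estimate carries a quantization error of roughly one part in \( V_{\text{max}} \), which is small when the loop is much slower than the reference.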

Fig. 2. Phase error with corresponding UP/DOWN pulses.

Fig. 3. Auxiliary circuit and digital counter.

Fig. 4. Classical PLL linear model.
4. EXTENSION TO Σ∆ PLL TOPOLOGIES

The applicability of the measuring technique will be now demonstrated for Σ∆ fractional-N PLLs. These PLLs use high-order multi-bit Σ∆ modulators to dither the divider modulus; direct modulation is achieved by feeding the data into the modulator. Thus, to measure \( \omega_n \), the frequency step is, in this case, applied to the ΣΔ modulator input.

The linear model of a ΣΔ fractional-N PLL is shown in fig. 5. A complete derivation of the linear model can be found in [6]. The Loop Filter is typically a high-order structure to attenuate the high-frequency quantization noise. In the example analyzed in this section, the Loop Filter transfer function presents 4 poles (including the Charge-Pump integration) and 1 zero. The mathematics involved in this case is lengthier, but the final transfer function can be reduced to an equivalent second-order equation.

We start by finding the transfer function from the ΣΔ modulator input to phase error \( \Phi_{err}(s) \). With the aid of fig. 5, considering that the effect of the ΣΔ modulator Signal Transfer Function (STF) is just adding a delay to the input data, the transfer function is given by:

\[
\Phi_{err}(s) = \frac{2\pi}{N + \mu_b} \cdot \frac{e^{-sT_{ref}}}{1 - e^{-sT_{ref}}} \cdot \frac{1}{1 + H_{loop}(s)} \tag{6}
\]

where \( N + \mu_b \) is the instantaneous divider ratio.

![Diagram](image)

**Fig. 5. ΣΔ fractional-N PLL linear model.**

Indicating with \( \omega_3, \omega_4, \omega_5 \) the high-order poles and with \( z_1 \) the zero of the Loop Filter, and by setting:

\[
T_{eq}^2 = \frac{1}{\omega_3^2} + \frac{1}{\omega_4^2} + \frac{1}{\omega_5^2} + \frac{1}{\omega_3 \cdot \omega_4} + \frac{1}{\omega_4 \cdot \omega_5} + \frac{1}{\omega_3 \cdot \omega_5} - \frac{1}{\omega_3 \cdot z_1} - \frac{1}{\omega_4 \cdot z_1} - \frac{1}{\omega_5 \cdot z_1} \tag{7}
\]

it is possible to define an equivalent natural frequency \( \omega_{n1} \) and an equivalent damping factor \( \zeta_1 \):

\[
\omega_{n1} = \sqrt{\frac{\omega_n^2}{1 + \omega_n^2 T_{eq}^2}} \tag{8}
\]

\[
\zeta_1 = \frac{\omega_{n1}}{2} \left( \frac{1}{z_1} - \frac{1}{\omega_3} - \frac{1}{\omega_4} - \frac{1}{\omega_5} \right) \tag{9}
\]

After proper manipulations, eq. 6 can be reduced to the following approximated expression:

\[
\Phi_{err}(s) \approx \frac{G \cdot s}{s^2 + 2\zeta_1 \omega_{n1} s + \omega_{n1}^2} \tag{10}
\]

with the gain factor given by \( G = \left( \frac{\omega_{n1}}{\omega_n} \right)^2 \frac{2\pi}{N + \mu_b} \frac{1}{T_{ref}} \). The final expression for the phase error, obtained by applying a step function to the ΣΔ modulator, is given by:

\[
\Phi_{err}(s) = \frac{G}{s^2 + 2\zeta_1 \omega_{n1} s + \omega_{n1}^2} \tag{11}
\]

and it corresponds directly to equation 3. The smaller the loop damping factor \( \zeta \), the more accurate the approximation in equation 10. Thus, the natural frequency can be calculated as described in section 3.
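One way to see where the equivalent second-order parameters come from is the following sketch. It assumes, for illustration only, that the loop transfer function can be written in the normalized form \( H_{loop}(s) = \omega_n^2 (1 + s/z_1) / \left[ s^2 (1 + s/\omega_3)(1 + s/\omega_4)(1 + s/\omega_5) \right] \); this form is an assumption and is not taken from [6]. Expanding the high-order factors up to second order in \( s \):

\[
\frac{1 + s/z_1}{\prod_{i=3}^{5} (1 + s/\omega_i)} \approx 1 + \left( \frac{1}{z_1} - \sum_{i=3}^{5} \frac{1}{\omega_i} \right) s + T_{eq}^2\, s^2
\]

\[
\frac{1}{1 + H_{loop}(s)} \approx \frac{s^2}{\left( 1 + \omega_n^2 T_{eq}^2 \right) s^2 + \omega_n^2 \left( \dfrac{1}{z_1} - \sum_{i=3}^{5} \dfrac{1}{\omega_i} \right) s + \omega_n^2}
\]

Here \( T_{eq}^2 \) collects the second-order terms: the \( 1/\omega_i^2 \) and \( 1/(\omega_i \omega_j) \) contributions minus the \( 1/(\omega_i z_1) \) ones. Dividing numerator and denominator by \( 1 + \omega_n^2 T_{eq}^2 \) gives a constant term equal to \( \omega_n^2 / (1 + \omega_n^2 T_{eq}^2) \), which is the square of the equivalent natural frequency \( \omega_{n1} \). At frequencies well below the reference frequency, \( e^{-sT_{ref}} / (1 - e^{-sT_{ref}}) \approx 1/(sT_{ref}) \), which removes one power of \( s \) from the numerator.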

However, the use of ΣΔ fractional-N PLLs introduces a new requirement for the correct applicability of the method. The input step to the modulator needs to be large enough to overcome the random effects of the modulator itself. If the input step is too small, then the UP sequence is no longer monotonic and the extracted value of \( \omega_0 \) is no longer accurate. On the other hand, the equations derived so far are based on the assumption that the PLL is working in its linear region. If a large input step is applied, the PLL may be pushed out of its linear region. In this case, the previous equations are no longer valid, but, as mentioned in [7], the calibration method will still work, since the final counter value can be extracted from simulations. Alternatively, the counter can be reset whenever a pulse skip is detected: in this way the final value in the counter represents the number of UP (or DOWN) pulses that occurred during the linear part of the step response. In other words, the counter starts to operate properly when the PLL leaves the frequency acquisition mode and enters the phase acquisition mode.

5. SIMULATION RESULTS

The main parameters of the simulated PLL are summarized in table 1. The ΣΔ modulator is a fourth-order MASH and the parameters of the Loop Filter are presented in table 2. Based on the simulation results, the ΣΔ fractional-N PLL topology can be simulated with a linear simulator such as Simulink; however, the system behavior has also been investigated through a Verilog implementation. The use of Verilog makes it possible to simulate a model closer to a real PLL implementation, capable of capturing the non-linear behavior of the system [6].

As already discussed in the introduction, the VCO gain \( K_{VCO} \) is the parameter with the poorest accuracy; the ΣΔ PLL was simulated with the nominal \( K_{VCO} \) value and with a gain variation of ±30% with respect to the nominal value. In fig. 6 the counter behavior for the three different \( K_{VCO} \) values is presented. It is apparent that the three curves reach different peaks according to the value of \( K_{VCO} \); as time proceeds the effects of the ΣΔ modulator start to appear.

By substituting the values of the parameters in the equations presented in section 3, the theoretical maximum counter values for the three different VCO gains are, respectively, 150, 123, and 106. The values extracted from the Verilog simulation of fig. 6 are 148, 122, and 105; these values closely match the predicted ones. The counter behavior for different input frequency steps is presented in fig. 7. As the step is increased, overcoming the ΣΔ modulator noise, the measured values match the predicted ones very well. In the same figure the results for both simulators, Verilog and Simulink, are presented. Notice that the Simulink curves are very close to the curves obtained with Verilog, which confirms that the linear simulator accurately describes the transient behavior of the ΣΔ PLL even for fairly large input frequency steps.

As previously discussed, the presence of a leakage current will result in an average phase error different from zero. If the leakage current is too large, even in lock condition of the PLL, the PFD will only produce UP pulses (for a negative leakage current) or DOWN pulses (for a positive leakage current). The simulations show that the calibration method is very robust to leakage currents: a ±1% leakage current will produce less than ±6% deviation from the nominal ω₀. By observing the difference between the maximum number of UP (DOWN) pulses and the maximum number of following DOWN (UP) pulses, it is actually possible to obtain the polarity and a magnitude estimate of the static phase offset. In fact, with zero static phase offset the lengths of the two sequences would be the same; if a positive phase offset is present, the phase error curve for a positive frequency step is shifted up with respect to the zero offset curve: in this case the monotonic sequence of UP pulses will be longer than the following sequence of DOWN pulses.

6. CONCLUSIONS

A new method to calibrate the PLL transfer-function has been presented. The implementation does not require any additional analogue component. The only extra circuitry necessary is an auxiliary PFD and a digital counter. This new approach does not offer continuous calibration and it requires a calibration cycle, but it is very simple and virtually no extra silicon area and no extra power consumption is required. Moreover, this technique works for both linear and non-linear PLL frequency step responses; also, it can be used to estimate the static phase offset. The mathematical formulation of the method has been verified with simulations based on a ΣΔ fractional-N PLL topology, run both on Verilog and Simulink. Results from both simulations closely match the theoretical values.

Acknowledgment

The authors would like to thank the QCT department of Qualcomm CDMA Technologies for the valuable help and support in this work.

7. REFERENCES

sigma modulation in fractional-N frequency synthesis,” Jour-
CMOS Fractional-N Synthesizer using Digital Compensation
for 2.5 Mbit/s GFSK Modulation", Journal of Solid-State Cir-
[4] D. R. McMahill and C. G. Sodini, "A 2.5-Mb/s GFSK 5.0-
Mb/s 4-FSK automatically calibrated Σ−Δ frequency syn-
1, pp. 18-26, Jan. 2002.
a Phase-Locked Loop", US Patent 6,049,255, Apr. 11, 2000
N ΣΔ PLL for GSM Applications: Linear Model and Sim-
ulations", Proc. IEEE International Symposium on Circuits
and Systems, Vol. 1, pp. 1065-1068, Bangkok, Thailand, May
2003.
Method", Proc. 21st IEEE NORCHIP Conference, pp. 252-
A Novel Calibration Method for Phase-Locked Loops

MARCO CASSIA1, PETER SHAH2 AND ERIK BRUUN1
1Technical University of Denmark Ørsted DTU; 2RFmagic
E-mail: mca@oersted.dtu.dk; pshah@rfmagic.com; eb@oersted.dtu.dk

Received February 11, 2004; Revised May 11, 2004; Accepted June 3, 2004

Abstract. A novel method to calibrate the frequency response of a Phase-Locked Loop is presented. The method requires just an additional digital counter to measure the natural frequency of the PLL; moreover it is capable of estimating the static phase offset. The measured value can be used to tune the PLL response to the desired value. The method is demonstrated mathematically on a typical PLL topology and it is extended to \(\Sigma \Delta\) fractional-\(N\) PLLs. A set of simulations performed with two different simulators is used to verify the applicability of the method.

Key Words: phase-locked loops, bandwidth tuning, \(\Sigma \Delta\) PLLs, calibration, static phase offset

1. Introduction

Phase-Locked Loop (PLL) frequency synthesizers are building blocks of all communication systems. An accurate PLL response is required in many situations, especially when \(\Sigma \Delta\) PLLs for direct modulation [1] are used. In these types of PLLs, the data fed into the \(\Sigma \Delta\) modulator is often pre-filtered in order to cancel the low-pass PLL transfer function and thereby extend the modulation bandwidth [2]. The pre-distortion filter has a transfer function equal to the inverse of the PLL transfer function and is usually implemented digitally. Consequently, tight matching between the pre-distortion filter and the analogue PLL transfer function is necessary to avoid distortion of the transmitted data.

Especially for on-chip Voltage Controlled Oscillators (VCO), the gain \(K_{\text{VCO}}\) is typically the parameter with the poorest accuracy among the PLL analog components. Provided that the value of the filter resistor can be determined with sufficient accuracy, in order to establish an accurate PLL transfer function only the product \(K_{\text{VCO}} \times I_{\text{CP}}/C\) needs to be accurate [3]. The PLL can then be calibrated by adjusting the Charge-Pump current; the problem is how to measure the accuracy of the PLL transfer function.

A continuous calibration technique is presented in [4]. The transmitted data is digitally compared with the input data and the Charge-Pump current is then adjusted to compensate for the detected error. This method offers the possibility of continuous calibration at the expense of increased circuit complexity; since the error detection is based on the cross-correlation between input and transmitted data, this approach will not work on unmodulated synthesizers. An alternative approach is found in [5], where a method based on the detection of pulse skipping is described. The presence of one or several pulse skips can be used as an indication of the bandwidth. This method requires an input frequency step large enough to push the PLL into its non-linear operating region and only offers a rough estimate of the actual PLL bandwidth.

In this paper we present a simple and novel approach that makes it possible to determine the characteristics of the PLL transfer function by simply adding a digital counter; moreover this approach can be used to obtain an estimate of the static phase error.

The paper is organized as follows: in Section 2 we present the basic idea behind the method and in Section 3 we discuss its mathematical formulation. In Section 4 we show how the method can be used to obtain information about the PLL static phase offset. The extension of the method to \(\Sigma \Delta\) PLLs is presented in Section 5. Finally, in Section 6, the results from different simulations are compared with the theory developed.
2. Measurement Scheme

A two-step calibration cycle is required by this method. In the first step the natural frequency $\omega_n$ of the transfer function is retrieved; the second step is used to determine the damping factor $\zeta$. Consider the typical PLL topology in Fig. 1: to start the calibration, the switches to $R_{cal1}$ and $R_{cal2}$ are closed and the calibration resistors are connected to the resistor $R$. Because the total filter resistance is reduced, the loop transfer function becomes under-damped. By changing the ratio $M$ of the $f_{ref}$ divider or the ratio $N$ of the $f_{out}$ divider, a frequency step can be applied to the PLL. The natural frequency of the induced transient response can be indirectly measured by counting the UP/DOWN pulses produced by the Phase-Frequency Detector (PFD). If the counter counts 1 up for each UP pulse and counts 1 down for each DOWN pulse, then the maximum counter value is a measure of the natural frequency of the PLL transfer function. This can be seen in Fig. 2, where the expected behavior of the phase error is presented together with the counter value trajectory. The Charge-Pump current can be adjusted so that the PLL natural frequency is the desired one. Once the natural frequency is determined, the calibration step is repeated after changing the damping characteristics of the transfer function (e.g. by opening the switch to $R_{cal2}$). By comparing the values of the oscillation frequency in the two steps, it is possible to estimate the variation of the damping factor $\zeta$; this information can be used to adjust the filter resistor $R$ to obtain the desired damping factor.

The presence of a leakage current in the Charge-Pump or of a trickle current will induce a static phase error at the PFD input. This, in turn, means an increased number of pulses in one direction (e.g. UP pulses). However, as explained later, the value of $\omega_n$ and $\zeta$ can still be measured. Depending on the type of PFD used in the PLL, an auxiliary PFD might be required to stabilize the UP/DOWN pulses [7].

3. Mathematical Derivation

The mathematical formulation will be based on the PLL topology of Fig. 1; however, the applicability of the method extends to other topologies, as will be demonstrated in Section 5. We start by deriving the PLL loop transfer function with the aid of the linear model of Fig. 3:

$$H_{loop}(s) = \frac{I_{cp}(R \cdot C_p s + 1)K_{VCO}}{2\pi C_p s^2 N} \tag{1}$$

The transfer function from phase input to phase error is given by:

$$\frac{\Phi_{err}(s)}{\Phi_{in}(s)} = \frac{1}{1 + H_{loop}(s)} = \frac{s^2}{s^2 + 2\zeta\omega_n s + \omega_n^2} \tag{2}$$

where

$$\omega_n = \sqrt{\frac{I_{cp}K_{VCO}}{2\pi C_p N}}$$

and

$$\zeta = \frac{R}{2}\sqrt{\frac{I_{cp}C_pK_{VCO}}{2\pi N}}$$

and $\omega_n$ and $\zeta$ represent, respectively, the natural frequency and the damping factor of the PLL. A unit input frequency step corresponds to an input phase ramp, with Laplace transform given by $\Phi_{in}(s) = \frac{1}{s^2}$. The Laplace transform of the phase error is then given by:

$$\Phi_{err}(s) = \frac{s^2}{s^2 + 2\zeta\omega_n s + \omega_n^2} \cdot \frac{1}{s^2} = \frac{1}{s^2 + 2\zeta\omega_n s + \omega_n^2} \tag{3}$$

The behavior in the time domain of Eq. (3) is the impulse response of a second order system:

$$\phi_{err}(t) = \frac{1}{\omega_n\sqrt{1 - \zeta^2}} e^{-\zeta\omega_n t} \sin(\omega_0 t) \tag{4}$$

where $\omega_0 = \omega_n\sqrt{1 - \zeta^2}$. Note that the natural frequency $\omega_n$ is independent of the filter resistor $R$, but the actual oscillation frequency $\omega_0$ depends on $R$ through the damping factor $\zeta$. As long as the phase error function $\phi_{err}(t)$ is positive, the PFD generates UP pulses. If the error becomes negative, DOWN pulses are generated. Assuming a positive frequency step, an initial sequence of UP pulses is produced by the PFD and the counter value increases monotonically. When the phase error crosses zero, which occurs at $t_{\text{cross}} = \frac{\pi}{\omega_0}$ according to Eq. (4), DOWN pulses start to appear, decreasing the counter value. Hence at the crossing time the counter reaches its maximum value, $V_{\text{max}}$; the crossing point can also be expressed as $t_{\text{cross}} = V_{\text{max}} \cdot T_{\text{ref}}$, where $T_{\text{ref}}$ is the period of the REF clock (Fig. 1). For stability reasons [3], the PLL dynamics are always much slower than the REF signal: the error introduced by quantizing $t_{\text{cross}}$ is then insignificant. $\omega_0$ can then be approximated as:

$$\omega_0 = \frac{\pi}{V_{\text{max}} \cdot T_{\text{ref}}} \tag{5}$$

By making $\zeta$ small, $\omega_0$ is roughly equal to $\omega_n$. The values of $\omega_0$ retrieved with the two steps can be used to calculate the $\zeta$ variation; in this way it is possible to adjust the resistor $R$ to obtain the desired damping factor.

So far all the equations have been derived under the assumption that the PLL is operating in its linear region. In case of a large frequency step (this is usually the case if the crystal oscillator divider is changed), the PLL might lose its frequency lock. In this case, the previous equations are no longer valid; however, the calibration method can still be used by extracting the final counter value from simulations. Another possibility is resetting the counter whenever a pulse skip is detected: this condition occurs when two edges of the same input signal (Reference Clock or Divider Feedback signal) appear at the PFD input without an edge of the other signal occurring in between. This indicates that the frequencies of the two signals are different; the PLL is operating in frequency acquisition mode. Once frequency lock is achieved, the PLL enters the phase acquisition mode: the counter is no longer reset and the behavior of the PLL can be modeled with the described linear equations.
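The counting procedure behind Eq. (5) can be tried out numerically. The following Python sketch is illustrative only: it samples the standard under-damped second-order response once per reference period, counts up or down according to the sign of the phase error, and then applies Eq. (5). The numerical values (13 MHz reference, 50 kHz natural frequency, $\zeta = 0.1$) and all function names are assumptions chosen for the example; the charge-pump rescaling uses only the fact that $\omega_n^2$ is proportional to $I_{cp}$.

```python
import math


def phase_error(t, omega_n, zeta):
    """Unit-step phase error of an under-damped second-order loop (0 < zeta < 1)."""
    omega_0 = omega_n * math.sqrt(1.0 - zeta ** 2)
    return math.exp(-zeta * omega_n * t) * math.sin(omega_0 * t) / omega_0


def peak_counter_value(omega_n, zeta, t_ref, n_cycles):
    """Count +1 / -1 once per reference period from the sign of the phase error."""
    count = peak = 0
    for k in range(1, n_cycles + 1):
        # Only the sign matters here, so the amplitude scaling is irrelevant.
        count += 1 if phase_error(k * t_ref, omega_n, zeta) > 0.0 else -1
        peak = max(peak, count)
    return peak


def omega_0_from_peak(v_max, t_ref):
    """Eq. (5): oscillation frequency from the maximum counter value."""
    return math.pi / (v_max * t_ref)


def rescaled_charge_pump_current(i_cp, omega_measured, omega_target):
    """New charge-pump current, using omega_n**2 proportional to I_cp."""
    return i_cp * (omega_target / omega_measured) ** 2


if __name__ == "__main__":
    f_ref = 13e6                     # assumed reference frequency, Hz
    omega_n = 2 * math.pi * 50e3     # assumed natural frequency, rad/s
    zeta = 0.1                       # assumed (deliberately small) damping
    v_max = peak_counter_value(omega_n, zeta, 1 / f_ref, 300)
    omega_est = omega_0_from_peak(v_max, 1 / f_ref)
    print(v_max, omega_est / (2 * math.pi))
```

With these assumed values the estimate differs from the true $\omega_0$ by well under 1%, the residual being the quantization of $t_{\text{cross}}$ to a whole number of reference periods.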

4. Estimation of the Static Phase Offset

Every real PLL implementation is affected by a static phase offset; its presence is due to different causes, such as leakage currents or mismatches in the Charge-Pump UP/DOWN currents. If the phase offset can be measured, a small offset current can be added to null the static phase offset, thereby improving the PLL spur performance.

As previously mentioned, a static phase offset will alter the number of UP or DOWN pulses produced during the transient response. This can be visualized with the aid of Fig. 4, showing the phase error curves for a positive and a zero static phase offset (for a positive frequency step) together with the relative counter curves. It can be seen that the effect of the phase offset is a positive translation of the zero phase error curve; as a consequence, the oscillation period measured with the counter will differ from the zero-offset case. In the case shown in Fig. 4 the oscillation period will be overestimated, since the PFD will produce UP pulses for a longer time interval, until the offset phase error curve intersects the zero phase error line.

![Fig. 4. Phase error curves.](image-url)

Consider now the sequence of DOWN pulses following the UP pulses: in this case the length of the sequence is shorter than the value expected under zero phase offset condition. Indicating with $V_{\text{max}}$ the maximum number of UP pulses and with $V_{\text{min}}$ the minimum number of DOWN pulses, by comparing $V_{\text{min}}$ with $V_{\text{max}}$ it is possible not only to determine the real oscillation period, but also to extract information about the phase offset.

In fact, under zero phase offset condition, the magnitude of $V_{\text{max}}$ is equal to the magnitude of $V_{\text{min}}$; this means that if a phase offset is present, the correct number of pulses is the average of the magnitudes of $V_{\text{max}}$ and $V_{\text{min}}$. The real oscillation frequency is then given by:

$$\omega_0 = \frac{2\pi}{(|V_{\text{max}}| + |V_{\text{min}}|) \cdot T_{\text{ref}}} \quad (6)$$

Furthermore, the sign of the difference (in magnitude) between $V_{\text{max}}$ and $V_{\text{min}}$ is equal to the polarity of the phase offset. Indicating with $t_{\text{meas}}$ the period of the offset phase error curve, the phase error offset can be obtained by evaluating Eq. (4) for $t = t_{\text{meas}}$. This can be visualized in Fig. 4: since the offset curve $\phi_{\text{offset}}(t)$ is equal to $\phi_{\text{err}}(t) + \phi_{\text{offset}}$ and is equal to zero for $t = t_{\text{meas}}$, the following relation holds:

$$\phi_{\text{offset}} = -\phi_{\text{err}}(t_{\text{meas}}) \quad (7)$$
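Eq. (7) and the polarity rule can be illustrated with a small Python sketch. It is a toy model under stated assumptions: the loop parameters $\omega_n$ and $\zeta$ are taken as already known from the calibration, the step amplitude, offset and reference frequency are made-up values, and every function name is hypothetical. The sketch shifts the second-order response by a static offset, measures the first UP run and the DOWN run that follows, and then evaluates Eq. (7) at $t_{\text{meas}} = V_{\text{max}} \cdot T_{\text{ref}}$.

```python
import math


def shifted_phase_error(t, omega_n, zeta, amplitude, offset):
    """Second-order step response scaled by the step amplitude, plus an offset."""
    omega_0 = omega_n * math.sqrt(1.0 - zeta ** 2)
    ringing = math.exp(-zeta * omega_n * t) * math.sin(omega_0 * t) / omega_0
    return amplitude * ringing + offset


def first_up_and_down_runs(omega_n, zeta, amplitude, offset, t_ref, n_cycles):
    """Lengths of the first UP run and of the DOWN run that follows it."""
    up = down = 0
    for k in range(1, n_cycles + 1):
        err = shifted_phase_error(k * t_ref, omega_n, zeta, amplitude, offset)
        if err > 0.0:
            if down:
                break            # second UP run begins: stop counting
            up += 1
        elif up:
            down += 1
    return up, down


def offset_polarity(v_max, v_min):
    """Sign of |V_max| - |V_min|, taken as the polarity of the phase offset."""
    diff = abs(v_max) - abs(v_min)
    return (diff > 0) - (diff < 0)


def offset_estimate(v_max, t_ref, omega_n, zeta, amplitude):
    """Eq. (7): offset = -phi_err(t_meas), with t_meas = v_max * t_ref."""
    return -shifted_phase_error(v_max * t_ref, omega_n, zeta, amplitude, 0.0)


if __name__ == "__main__":
    f_ref = 13e6                      # assumed reference frequency, Hz
    omega_n = 2 * math.pi * 50e3      # assumed natural frequency, rad/s
    zeta = 0.05                       # assumed damping factor
    step = 2 * math.pi * 20e3         # assumed frequency step, rad/s
    true_offset = 0.04                # assumed static phase offset, rad
    up, down = first_up_and_down_runs(omega_n, zeta, step, true_offset,
                                      1 / f_ref, 600)
    print(up, down, offset_polarity(up, down),
          offset_estimate(up, 1 / f_ref, omega_n, zeta, step))
```

In this sketch the recovered offset stays within one quantization step of the assumed value, which is the resolution limit that follows from sampling the crossing time at $f_{\text{ref}}$.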

The accuracy of the above equation depends on many factors; first of all, $t_{\text{meas}}$ is quantized with a time step equal to the inverse of the PFD comparison frequency $f_{\text{ref}}$. The higher the frequency is (compared to the PLL bandwidth), the better the resolution. Also, unlike previously, the size of the frequency step directly influences the estimation accuracy. This is because a large step will produce a large phase excursion at the PFD input, of which the static phase error then constitutes only a small proportion. Therefore it is preferable to use a small frequency step for this measurement. A rough estimate of the minimum detectable phase offset can be obtained by evaluating Eq. (4) in the case that $f_{\text{ref}} \gg f_{\text{bandwidth}}$:

$$\phi_{\text{err}}\left(\frac{1}{f_{\text{ref}}}\right) \approx \frac{A}{\omega_n \sqrt{1 - \zeta^2}} \sin \left( \frac{\omega_0}{f_{\text{ref}}} \right) \approx \frac{A}{f_{\text{ref}}} \quad (8)$$

where $A$ is the step amplitude.
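As a quick numerical check of Eq. (8), the sketch below compares the sine expression with the simplified ratio $A/f_{\text{ref}}$. The values used (a 20 kHz step, a 13 MHz reference, a 50 kHz natural frequency) are assumptions for illustration, and the function name is hypothetical; the decay of the exponential over one reference period is neglected, as in Eq. (8).

```python
import math


def min_detectable_offset(step_amplitude, omega_n, zeta, f_ref):
    """Return (sine form, A / f_ref) of the Eq. (8) estimate, in radians."""
    omega_0 = omega_n * math.sqrt(1.0 - zeta ** 2)
    # 1 / omega_0 equals 1 / (omega_n * sqrt(1 - zeta**2)).
    sine_form = step_amplitude / omega_0 * math.sin(omega_0 / f_ref)
    return sine_form, step_amplitude / f_ref


if __name__ == "__main__":
    sine_form, ratio = min_detectable_offset(
        step_amplitude=2 * math.pi * 20e3,   # assumed step, rad/s
        omega_n=2 * math.pi * 50e3,          # assumed natural frequency, rad/s
        zeta=0.05,                           # assumed damping factor
        f_ref=13e6,                          # assumed reference frequency, Hz
    )
    print(math.degrees(sine_form), math.degrees(ratio))
```

For these assumed numbers both forms give roughly half a degree, which suggests the order of magnitude of the smallest offset the counter can resolve with a step of this size.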

5. Extension to $\Sigma\Delta$ PLL Topologies

The applicability of the measuring technique will now be demonstrated for $\Sigma\Delta$ fractional-$N$ PLLs. These PLLs use high-order multi-bit $\Sigma\Delta$ modulators to dither the divider modulus; direct modulation is achieved by feeding the data into the modulator. Thus, to measure $\omega_0$, the frequency step is, in this case, applied to the $\Sigma\Delta$ modulator input.

The linear model of a $\Sigma\Delta$ fractional-$N$ PLL is shown in Fig. 5. A complete derivation of the linear model can be found in [6]. The Loop Filter is typically a high-order structure to attenuate the high-frequency quantization noise. In the example analyzed in this section, the Loop Filter transfer function presents four poles (including the Charge-Pump integration) and one zero. The mathematics involved in this case is lengthier, but the final transfer function can be reduced to an approximate 2nd order equation.

We start by finding the transfer function from the $\Sigma\Delta$ modulator input to phase error $\Phi_{\text{err}}(s)$. With the aid of Fig. 5, considering that the effect of the $\Sigma\Delta$ modulator Signal Transfer Function (STF) is just adding a delay to the input data, the transfer function is given by:

$$\frac{\Phi_{\text{err}}(s)}{\Sigma\Delta_{\text{in}}(s)} = \frac{2\pi}{N + \mu_b} \cdot \frac{e^{-sT_{\text{ref}}}}{1 - e^{-sT_{\text{ref}}}} \cdot \frac{1}{1 + H_{\text{loop}}(s)} \quad (9)$$

Fig. 5. $\Sigma\Delta$ fractional-$N$ PLL linear model.

where $N + \mu_b$ is the instantaneous divider ratio. Indicating with $\omega_3$, $\omega_4$, $\omega_5$ the high-order poles and with $z_1$ the zero of the Loop Filter, and by setting:

$$T_{eq}^2 = \frac{1}{\omega_3^2} + \frac{1}{\omega_4^2} + \frac{1}{\omega_5^2} + \frac{1}{\omega_3 \omega_4} + \frac{1}{\omega_3 \omega_5} + \frac{1}{\omega_4 \omega_5} - \frac{1}{\omega_3 z_1} - \frac{1}{\omega_4 z_1} - \frac{1}{\omega_5 z_1}$$

it is possible to define an equivalent natural frequency $\omega_{n1}$ and an equivalent damping factor $\zeta_1$:

$$\omega_{n1} = \sqrt{\frac{\omega_n^2}{1 + (\omega_n T_{eq})^2}}$$

$$\zeta_1 = \frac{1}{2} \omega_{n1} \left( \frac{1}{z_1} - \frac{1}{\omega_3} - \frac{1}{\omega_4} - \frac{1}{\omega_5} \right)$$

After proper manipulations, Eq. (9) can be reduced to the following approximate expression:

$$\frac{\Phi_{err}(s)}{\Sigma\Delta_{in}(s)} = G \cdot \frac{s}{s^2 + 2 \zeta_1 \omega_{n1} s + \omega_{n1}^2} \tag{12}$$

with the gain factor given by $G = \left( \frac{\omega_{n1}}{\omega_n} \right)^2 \frac{2\pi}{(N + \mu_b)\, T_{\text{ref}}}$. The final expression for the phase error, obtained by applying a step function to the $\Sigma\Delta$ modulator, is given by:

$$\Phi_{err}(s) = \frac{G}{s^2 + 2 \zeta_1 \omega_{n1} s + \omega_{n1}^2} \tag{13}$$

and it corresponds directly to Eq. (3). The smaller the loop damping factor $\zeta$, the more accurate the approximation in Eq. (13). Thus, the natural frequency can be calculated as described in Section 3.

However, the use of $\Sigma\Delta$ fractional-$N$ PLLs introduces a new requirement for the method to apply correctly. The input step to the modulator needs to be large enough to overcome the random effects of the modulator itself. If the input step is too small, the UP sequence is no longer monotonic and the extracted value of $\omega_0$ is no longer accurate.
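The reduction to an equivalent second-order system can be sketched numerically. The Python fragment below is a hedged illustration, not the authors' implementation: it assumes a loop gain of the form $\omega_n^2 (1 + s/z_1) / \left[s^2 (1 + s/\omega_3)(1 + s/\omega_4)(1 + s/\omega_5)\right]$ and keeps terms up to second order in $s$, which gives $T_{eq}^2$ as the sum of the inverse squared poles, plus the inverse pole pair products, minus the inverse pole-zero products. The pole and zero values in the example are made up, and the function name is hypothetical.

```python
import math
from itertools import combinations


def reduce_to_second_order(omega_n, poles, zero):
    """Return (t_eq_sq, omega_n1, zeta_1) for the assumed loop gain

        H(s) = omega_n**2 * (1 + s/zero) / (s**2 * prod(1 + s/p for p in poles)).
    """
    inv_sum = sum(1.0 / p for p in poles)
    inv_squares = sum(1.0 / p ** 2 for p in poles)
    inv_pairs = sum(1.0 / (p * q) for p, q in combinations(poles, 2))
    t_eq_sq = inv_squares + inv_pairs - inv_sum / zero
    scale = 1.0 + omega_n ** 2 * t_eq_sq
    if scale <= 0.0:
        raise ValueError("second-order reduction is not valid for these values")
    omega_n1 = omega_n / math.sqrt(scale)              # equivalent natural frequency
    zeta_1 = 0.5 * omega_n1 * (1.0 / zero - inv_sum)   # equivalent damping factor
    return t_eq_sq, omega_n1, zeta_1


if __name__ == "__main__":
    two_pi = 2 * math.pi
    omega_n = two_pi * 50e3                                # assumed, rad/s
    poles = [two_pi * 400e3, two_pi * 800e3, two_pi * 1.6e6]   # assumed
    zero = two_pi * 100e3                                  # assumed
    _, omega_n1, zeta_1 = reduce_to_second_order(omega_n, poles, zero)
    print(omega_n1 / omega_n, zeta_1)
```

With these assumed values the equivalent natural frequency moves by a few percent relative to $\omega_n$ and the equivalent damping factor stays well below 1, so the under-damped counting procedure of Section 3 still applies.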

6. Simulation Results

The main parameters of the simulated PLL, based on a GSM study case, are summarized in Table 1. The $\Sigma\Delta$ modulator is a 4th-order MASH and the parameters of the Loop Filter are presented in Table 2. Based on the simulation results, the $\Sigma\Delta$ fractional-$N$ PLL topology can be simulated with a linear simulator such as Simulink; however, the system behavior has also been investigated through a Verilog implementation. Verilog makes it possible to simulate a model closer to a real PLL implementation, capable of capturing the non-linear behavior of the system [6].

As already discussed in the introduction, the VCO gain $K_{VCO}$ is the parameter with the poorest accuracy; the $\Sigma\Delta$ PLL was simulated with the nominal $K_{VCO}$ value and with a gain variation of $\pm 30\%$ with respect to the nominal value. The mismatch between the pre-distortion filter and the PLL transfer function due to this variation causes an output error of up to 5 degrees rms. In Fig. 6 the counter behavior for the three different $K_{VCO}$ values is presented. It is apparent that the three curves reach different peaks according to the value of $K_{VCO}$; as time proceeds the effects of the $\Sigma\Delta$ modulator start to appear.

![Fig. 6. Counter behavior vs. time.](image)

By substituting the values of the parameters in the equations presented in Section 3, the theoretical maximum counter values for the three different VCO gains are, respectively, 150, 123, and 106. The values extracted from the Verilog simulation of Fig. 6 are 148, 122, and 105; these values closely match the predicted ones.
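The trend in these counter values can be cross-checked with a first-order scaling argument. Since the peak count is inversely proportional to $\omega_0$ (Eq. (5)) and $\omega_n$ grows with the square root of $K_{VCO}$, the peak should scale roughly as $1/\sqrt{K_{VCO}}$. The Python sketch below applies that scaling to the nominal value of 123; it deliberately ignores the accompanying change in damping and the higher-order filter terms, so it is only expected to land near, not on, the reported 150 and 106. The function name is hypothetical.

```python
import math


def scaled_peak_count(nominal_count, kvco_ratio):
    """Peak counter value after scaling K_VCO by `kvco_ratio` (first-order only)."""
    if kvco_ratio <= 0.0:
        raise ValueError("kvco_ratio must be positive")
    return nominal_count / math.sqrt(kvco_ratio)


if __name__ == "__main__":
    for ratio in (0.7, 1.0, 1.3):    # -30 %, nominal, +30 % VCO gain
        print(ratio, round(scaled_peak_count(123, ratio)))
```

The simple scaling gives about 147 and 108 for the two extreme gains, a few counts away from the values obtained from the full equations.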

As previously discussed, the presence of a leakage current will result in an average phase error different from zero. If the leakage current is too large, even in lock condition of the PLL, the PFD will only produce UP pulses (for a negative leakage current) or DOWN pulses (for a positive leakage current). The simulations show that the calibration method is very robust to leakage currents: a ±1% leakage current will produce less than ±6% deviation from the nominal $\omega_0$.

The estimation of the phase offset with the method described in Section 4 is more difficult for ΣΔ PLLs. In fact the resulting phase error curve for a frequency step is not as smooth as in the integer case; this means that in the proximity of the zero phase error line there could be more than one crossing before and after the real crossing time. This affects the bandwidth estimation only marginally, since the variation in the number of pulses is small relative to the total number of pulses. The phase offset estimation, on the contrary, can be significantly affected.

One way to overcome the problem is to take the average of several step measurements. Alternatively, the ΣΔ modulator can be overloaded (or switched off) before the step is applied in order to operate the PLL in integer mode.

7. Conclusions

A new method to calibrate the PLL transfer-function has been presented. The implementation does not require any additional analogue component. The only extra circuitry necessary is a digital counter. This new approach does not offer continuous calibration and it requires a calibration cycle, but it is very simple and virtually no extra silicon area and no extra power consumption is required. Moreover, this technique works for both linear and non-linear PLL frequency step responses; also, it can be used to estimate and calibrate the static phase offset. The mathematical formulation of the method has been verified with simulations based on a ΣΔ fractional-N PLL topology, run both on Verilog and Simulink. Results from both simulations closely match the theoretical values.

Acknowledgments

The authors would like to thank the QCT department of Qualcomm CDMA Technologies for the valuable help and support in this work.

References


Marco Cassia was born in Bergamo, Italy, 1974. He received the M.Sc. degree in engineering from the Technical University of Denmark, Lyngby, Denmark, in May 2000 and the M.Sc. degree in electrical engineering from Politecnico di Milano, Italy, in July 2000.

From July 2001 to July 2002 he was with the QCT department of Qualcomm CDMA Technologies, San Diego, working in the field of direct modulation synthesizers. He is currently working toward the Ph.D. degree at the Technical University of Denmark.

His main research interests are in the areas of low-power low-voltage RF systems.

Peter Shah was born in Copenhagen, Denmark, in 1966. He completed his MScEE and Ph.D. at the Technical University of Denmark in 1990 and 1993 respectively. From 1993 to 1995 he was a post-doctoral research assistant at Imperial College in London, England, working on switched-current circuits. In 1996 he joined PCSI in San Diego (subsequently acquired by Conexant) as an RFIC design engineer, working on transceiver chips for the PHS cellular phone system. In 1998 he joined Qualcomm, also in San Diego, where he worked on RFICs for CDMA mobile phones and for GPS. In December 2002 he joined RFMagic where he is currently working on RFICs for consumer electronics. His research interests lie mainly in RFIC architecture and design, including sigma-delta PLLs and A/D and D/A converters, LNAs, mixers, and continuous-time filters.

Erik Bruun received the M.Sc. and Ph.D. degrees in electrical engineering in 1974 and 1980, respectively, from the Technical University of Denmark. In 1980 he received the B.Com. degree from Copenhagen Business School. In 2000 he also received the dr. techn. degree from the Technical University of Denmark.

From January 1974 to September 1974 he was with Christian Rovsing A/S, working on the development of space electronics and test equipment for space electronics. From 1974 to 1980 he was with the Laboratory for Semiconductor Technology at the Technical University of Denmark, working in the fields of MNOS memory devices, F/L devices, bipolar analog circuits, and custom integrated circuits. From 1980 to 1984 he was with Christian Rovsing A/S. From 1984 to 1989 he was the managing director of Danmos Microsystems ApS. Since 1989 he has been a Professor of analog electronics at the Technical University of Denmark where he has served as head of the Sector of Information Technology, Electronics, and Mathematics from 1995 to 2001. Since 2001 he has been head of Ørsted-DTU. His current research interests are in the areas of RF integrated circuit design and integrated circuits for mobile phones.