Comparison between Trigonometric and Traditional DDS, in 90 nm Technology

The Direct Digital frequency Synthesizer (DDS) is an architecture widely used to generate digital sine and/or cosine waveforms in many applications. In this work, the authors compare two DDS architectures: the traditional architecture, based on the exploitation of quarter-wave symmetry, and Symons' DDS (trigonometric DDS), presented in 2002. The two configurations have been implemented in 90 nm technology and compared in terms of area, speed and power consumption. Comparisons have been performed in terms of circuit complexity on architectures having the same Spurious Free Dynamic Range (SFDR) and phase resolution. Experiments show that the trigonometric architecture is very efficient in terms of area.


Introduction
Digital sine and cosine waveform generation is a very common operation and finds applications in several fields, such as audio, communications, Digital Signal Processing (DSP), etc. [1]-[4]. The most widely used hardware architecture for this task is the Direct Digital frequency Synthesis (DDS). DDS plays a role of growing importance in modern digital communications thanks to fast frequency switching, fine frequency resolution, large bandwidth, good spectral purity and the fast evolution of Digital-to-Analog Converter (DAC) technology. The DDS is typically implemented with a circuit composed of two main blocks, as shown in Figure 1:
a. A Phase Generator, realized with an N-bit accumulator.
b. A Look-Up Table (LUT), which is a ROM (Read Only Memory) storing sine and/or cosine samples, for the phase-to-amplitude conversion.
The outputs of the LUTs provide sine and/or cosine waves whose frequency is given by equation 1:
$f_{out} = \frac{k \cdot f_{clk}}{2^N}$ (1)
where $f_{clk}$ is the system clock frequency, k is the tuning word that selects the frequency of the sine and cosine waves by changing the phase accumulator step, $2^N$ is the number of samples used to represent the sine/cosine functions in the range [0-2π) and N is the number of bits of the accumulator output. Normally, the maximum frequency that can be obtained while maintaining good signal quality is $f_{clk}/4$, depending on the quality factor of the low-pass filter at the output.

DDSs are characterized by two main parameters:
Frequency resolution: the resolution is the minimum distance between two adjacent synthesizable frequencies within the Nyquist bandwidth, and it determines the number of distinct frequencies that the circuit can generate. It depends on the clock frequency and on the word length used in the phase accumulation loop:
$\Delta f = \frac{f_{clk}}{2^N}$
Spurious Free Dynamic Range (SFDR): the SFDR provides information about the spectral purity of the generated signal.

In a traditional DDS, as shown in Figure 1, the resolution depends on the number of bits (N) used for phase generation and the SFDR on the ROM size (in terms of both number of locations and word size). Given the exponential relationship between N and the ROM size, increasing N implies an exponential increase of the circuit complexity in terms of area. To reduce the ROM size, several solutions have been proposed in the literature. Among these, the most common strategies are:
a. Truncation of the phase accumulator output.
b. Exploitation of quarter-wave symmetry.
The first solution reduces the ROM size without any limitation on the frequency resolution. However, phase truncation introduces a periodic amplitude error that degrades the SFDR. The second strategy reduces the ROM size without negatively affecting the SFDR. Nevertheless, this technique reduces the ROM only by a factor of 4 and is consequently not efficient for large values of N.
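The relation between tuning word, clock frequency and output frequency described in the introduction can be sketched in a few lines of Python. This is only an illustration: the function names and the numeric values (100 MHz clock, 24-bit accumulator) are assumptions chosen for the example and do not come from the experiments.

```python
def output_frequency(k: int, f_clk: float, n_bits: int) -> float:
    """Output frequency of a DDS: f_out = k * f_clk / 2^N."""
    return k * f_clk / (1 << n_bits)


def frequency_resolution(f_clk: float, n_bits: int) -> float:
    """Smallest frequency step, obtained for a tuning word k = 1."""
    return f_clk / (1 << n_bits)


def tuning_word(f_target: float, f_clk: float, n_bits: int) -> int:
    """Nearest integer tuning word for a requested output frequency."""
    return round(f_target * (1 << n_bits) / f_clk)


if __name__ == "__main__":
    f_clk, n_bits = 100e6, 24  # illustrative values, not taken from the paper
    k = tuning_word(1e6, f_clk, n_bits)
    print("tuning word:", k)
    print("actual f_out [Hz]:", output_frequency(k, f_clk, n_bits))
    print("resolution [Hz]:", frequency_resolution(f_clk, n_bits))
```

Because k must be an integer, the synthesized frequency generally differs from the requested one by less than one resolution step.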
A common approach among designers is to apply these two techniques at the same time. In this way, it is possible to maintain a high phase resolution while reducing the area required for the LUT implementation. However, this solution cannot achieve high resolution and high SFDR simultaneously. Despite this limitation, most commercial DDSs are essentially based on the combination of the two techniques previously described [5]-[11].
Other solutions have been proposed. However, such strategies require additional circuit elements besides the phase accumulator and the amplitude converter, and consequently a larger dedicated area in the final layout [12]-[14]. In 2002 P.R. Symons proposed a ROM mapping technique allowing a strong memory reduction in the DDS architecture [15]. Symons' work provided a formal, theoretical description of this technique, called Trigonometric DDS. However, he did not implement the proposed architecture in hardware.
In 2011 the authors presented an FPGA implementation of the Symons DDS [16]. This implementation showed that the trigonometric DDS fits well in modern FPGAs that include DSP blocks or embedded multipliers. The results showed that the trigonometric DDS achieves a very high SFDR with reduced area occupation. These results confirm the advantage of the trigonometric DDS on FPGA, but they do not provide any information about its performance in an ASIC implementation.
In this paper, the traditional DDS exploiting quarter-wave symmetry and the trigonometric DDS are compared in terms of area, speed and power consumption. These comparisons have been performed in 90 nm technology. Results show that the trigonometric DDS, unlike the traditional one, makes it possible to obtain high resolution without negatively affecting area and SFDR. The paper is organized as follows: in Sect. II the

Material and Methods
In a traditional DDS, the LUTs used for the phase-to-amplitude conversion are mapped with $2^N$ sine and $2^N$ cosine samples in the range [0-2π), where N is the number of bits of the accumulator. As discussed in the introduction, the main limitation of this architecture is the exponential relationship between N and the ROM size. For this reason, quarter-wave symmetry and phase truncation are commonly used.
Exploiting the quarter-wave symmetry makes it possible to generate complete sine and cosine waves while storing only a quarter of the period in the LUTs. This approach reduces the number of memory locations from $2^N$ to $2^{N-2}$. For large values of N this reduction is not sufficient and, in order to further reduce the memory size, phase truncation is also used. The phase is generated with an N-bit accumulator, so as not to alter the frequency resolution, but only N' bits are used to address the ROM, as shown in Figure 2. However, this solution has the main disadvantage of negatively affecting the SFDR, as shown in Figure 3.
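The combination of phase truncation and quarter-wave symmetry can be sketched as follows. This is a minimal software model, not the implemented hardware: the function names are hypothetical, and the half-sample phase offset used to build the table is an assumption of this sketch (it keeps the mirrored quarters exactly symmetric), not something stated in the text.

```python
import math


def build_quarter_lut(addr_bits, word_bits):
    """Quarter-period sine table with 2^(N'-2) entries (N' = addr_bits)."""
    size = 1 << (addr_bits - 2)
    scale = (1 << (word_bits - 1)) - 1  # largest positive signed amplitude
    # Assumption: sample at (i + 0.5) so that mirroring the table is exact.
    return [round(scale * math.sin(2 * math.pi * (i + 0.5) / (1 << addr_bits)))
            for i in range(size)]


def sine_from_quarter(phase, n_bits, addr_bits, lut):
    """Sine sample for an N-bit phase using a quarter-wave LUT."""
    # Phase truncation: only the N' most significant bits address the ROM.
    addr = phase >> (n_bits - addr_bits)
    quadrant = addr >> (addr_bits - 2)             # two MSBs select the quadrant
    index = addr & ((1 << (addr_bits - 2)) - 1)    # remaining bits index the LUT
    if quadrant & 1:                               # quadrants 1 and 3: mirror
        index = (1 << (addr_bits - 2)) - 1 - index
    value = lut[index]
    return -value if quadrant & 2 else value       # quadrants 2 and 3: negate


if __name__ == "__main__":
    lut = build_quarter_lut(addr_bits=8, word_bits=10)
    print(len(lut), "entries instead of", 1 << 8)
    print([sine_from_quarter(p << 4, 12, 8, lut) for p in (0, 64, 128, 192)])
```

The accumulator keeps all N bits, so the frequency resolution is unchanged; only the ROM address is shortened, which is where the SFDR degradation originates.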
In the trigonometric DDS, the N-bit phase word coming from the accumulator is divided into two components: an integer part I and a fractional part F, with N=I+F. I and F are the MSBs and the LSBs of the phase accumulator, respectively. This creates two phase elements: the first consists of the coarse values of the phase and the second contains the fine phase values. The sine and cosine samples can then be calculated by performing four multiplications, one addition and one subtraction.
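The operation count quoted above matches the angle-sum identities. Writing a for the coarse angle (addressed by the I MSBs) and b for the fine angle (addressed by the F LSBs), the outputs can be expressed as shown below. The symbols a and b follow the figure captions; their exact scaling is not restated here.

```latex
\begin{aligned}
\sin(a+b) &= \sin a \,\cos b + \cos a \,\sin b \\
\cos(a+b) &= \cos a \,\cos b - \sin a \,\sin b
\end{aligned}
```

Each line needs two products, and the two lines need one sum and one difference, which gives the four multipliers, one adder and one subtractor.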
In a traditional DDS exploiting the quarter-wave symmetry, two $2^{N-2}$-entry LUTs are required for the sine and cosine functions, for a total of $2 \times 2^{N-2} = 2^{N-1}$ N-bit entries. In contrast, using the approach of [16] with I=F=N/2, we need four $2^{N/2}$-entry LUTs (one coarse and one fine LUT for each of sine and cosine), for a total of $4 \times 2^{N/2} = 2^{N/2+2}$ N-bit entries. For example, for N=10 and I=F=5, the traditional method requires $10 \times 2^9 = 5120$ bits of LUT, while for the proposed method the LUT size is $10 \times 2^7 = 1280$ bits.
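The memory counts for the two architectures can be checked numerically. The sketch below assumes, as in the N=10 example, that the stored word length equals N and that I=F=N/2; the function names are hypothetical.

```python
def traditional_lut_bits(n_bits, word_bits=None):
    """Two quarter-wave LUTs (sine and cosine) of 2^(N-2) entries each."""
    word_bits = n_bits if word_bits is None else word_bits
    return word_bits * 2 * (1 << (n_bits - 2))


def trigonometric_lut_bits(n_bits, word_bits=None):
    """Four LUTs (coarse/fine, sine/cosine) of 2^(N/2) entries each."""
    if n_bits % 2:
        raise ValueError("this sketch assumes I = F = N/2, so N must be even")
    word_bits = n_bits if word_bits is None else word_bits
    return word_bits * 4 * (1 << (n_bits // 2))


if __name__ == "__main__":
    for n in (10, 16, 24):
        print(n, traditional_lut_bits(n), trigonometric_lut_bits(n))
```

The traditional count grows as $2^{N-1}$ while the trigonometric count grows as $2^{N/2+2}$, so the gap widens quickly with N.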

ROM Mapping
As discussed above, the trigonometric DDS uses four different LUTs to store sine and cosine samples. Figure 5 shows the mapping of the coarse LUTs. The two coarse LUTs (one for the sine and one for the cosine) are mapped according to the relations given in equation 4.
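As an illustration of how such a mapping can work, the sketch below builds four LUTs and recombines them with the angle-sum identities. The mapping used here is an assumption chosen to be consistent with the description (coarse entries at angles 2π·i/2^I over a full period, fine entries at angles 2π·j/2^N), and it is not necessarily the exact form of equation 4. The model uses floating-point values, so it shows the addressing and arithmetic only, not the fixed-point word lengths.

```python
import math


def build_luts(i_bits, f_bits):
    """Return (sin_coarse, cos_coarse, sin_fine, cos_fine) tables.

    Assumed mapping: coarse angle = 2*pi*i / 2^I, fine angle = 2*pi*j / 2^N.
    """
    n_bits = i_bits + f_bits
    coarse = [2 * math.pi * i / (1 << i_bits) for i in range(1 << i_bits)]
    fine = [2 * math.pi * j / (1 << n_bits) for j in range(1 << f_bits)]
    return ([math.sin(a) for a in coarse], [math.cos(a) for a in coarse],
            [math.sin(b) for b in fine], [math.cos(b) for b in fine])


def trig_dds_sample(phase, i_bits, f_bits, luts):
    """Sine and cosine of an N-bit phase from the four LUTs."""
    sin_c, cos_c, sin_f, cos_f = luts
    i = phase >> f_bits                  # I MSBs address the coarse LUTs
    f = phase & ((1 << f_bits) - 1)      # F LSBs address the fine LUTs
    # Four multiplications, one addition, one subtraction.
    sine = sin_c[i] * cos_f[f] + cos_c[i] * sin_f[f]
    cosine = cos_c[i] * cos_f[f] - sin_c[i] * sin_f[f]
    return sine, cosine


if __name__ == "__main__":
    luts = build_luts(5, 5)
    print(trig_dds_sample(256, 5, 5, luts))  # phase of a quarter period
```

With this mapping the fine angles stay below 2π/2^I, so the fine cosine entries are close to 1 and the fine sine entries close to 0 when N is large.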

Hardware Implementation
As previously discussed, the trigonometric DDS allows high frequency resolution without the need for phase truncation and consequently without a negative impact on the SFDR. The goal of our experiment is to compare the traditional DDS exploiting quarter-wave symmetry and the trigonometric DDS in terms of area, speed and power consumption. Comparisons are performed on architectures that provide the same SFDR level for different frequency resolutions (number of bits N). We consider the conventional DDS without phase truncation. In this case, the SFDR depends only on the word length of the amplitude stored in the memory (in the experiments, we selected a word length that guarantees a given range of SFDR).
Experiments were performed on the traditional DDS exploiting quarter-wave symmetry (without phase truncation) and on the trigonometric one. Both methods have been implemented in hardware and compared in terms of area, speed and power consumption. After a fixed-point optimization performed in MATLAB and Simulink that guarantees an SFDR between 70 and 80 dB, the two systems were coded in VHDL at RTL level. The fixed-point analysis of the two systems shows that, in order to guarantee the same SFDR for the two architectures, the trigonometric DDS requires an additional output bit. This is due to the truncation of the multiplier outputs, which sets the output word length equal to the LUT width. For each of the two DDSs, several implementations with different values of N have been realized. The synthesis was performed with Synopsys Design Compiler using the STM 90 nm standard cell library.

Results and Discussion
The synthesis of LUTs in standard cells is done by multi-level logic mapping. This solution is suitable for relatively small tables, below $2^{12}$-$2^{16}$, depending on the technology. Larger tables are normally implemented in ROM or PLA arrays. Since the LUT size of the trigonometric DDS is compatible with the multi-level logic range for standard cell synthesis, we also implemented the LUTs of the traditional DDS in multi-level logic in order to compare the two methods.

Table 1 and Figure 8 show that the traditional DDS is the best choice for small values of N. The reason is the extra area required for the four multipliers, the adder and the subtractor, which is not negligible compared to the area required for the four LUTs. Figure 8 shows that the area curve of the trigonometric DDS presents a knee in the proximity of N=16. The reason for this knee is that, as N increases, the fine LUTs of the sine and the cosine can be reduced. This is because, as N increases, the stored fine values are approximately 1 for the cosine and 0 for the sine. As a result, each of the two LUTs stores values that are similar to each other (many zeros in the sine LUT and many ones in the cosine LUT), and consequently the synthesizer can simplify the logic, reducing the area. This simplification is not possible for small values of N.

Table 2 shows the total power consumption (dynamic plus static) of the two architectures. Table 3 shows the maximum clock frequencies reachable by the two architectures. For all the values of N considered in our experiments, the traditional DDS can reach about 1 GHz, while the trigonometric DDS reaches about 500 MHz. This is due to the presence of the multipliers, the adder and the subtractor, which introduce additional levels of logic with respect to the traditional DDS. This limitation can be overcome by introducing pipeline registers between the outputs of the LUTs and the inputs of the multipliers. This solution introduces one clock cycle of latency. However, such latency is not a problem in most DDS applications.

Conclusions
In this paper, the traditional DDS exploiting quarter-wave symmetry and the trigonometric DDS are implemented and compared in terms of area, speed and power consumption, the latter having become a crucial aspect of circuit design in the last few years [17]. These comparisons have been performed in 90 nm standard cell technology. Both DDS architectures have been simulated in MATLAB/Simulink and implemented in VHDL at RTL level.
The synthesis was performed with Synopsys Design Compiler using the STM 90 nm standard cell library. Results show that, as the number of phase accumulator bits N (and hence the phase precision) increases, the trigonometric DDS allows a considerable area reduction. However, this advantage comes at the expense of the maximum clock frequency, which in the trigonometric DDS is reduced by a factor of 2. As previously discussed, this limitation can be avoided by introducing a pipeline stage between the outputs of the LUTs and the inputs of the multipliers. Results show that the trigonometric DDS represents a good choice for all those applications where low power consumption is required, for example the Internet of Things [18], [19].
Some improvements could further enhance the trigonometric DDS. The first consists in exploiting the quarter-wave symmetry on the coarse LUTs. This is possible because these LUTs contain values of sine and cosine over the period [0-2π). Another improvement consists in rounding the outputs of the multipliers instead of truncating them.
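The difference between truncating and rounding a multiplier output can be illustrated with a small fixed-point sketch. The function names and the choice of dropping 4 bits are assumptions made for the example; the word lengths used in the actual designs are not reproduced here.

```python
def truncate(product, drop_bits):
    """Drop the LSBs: an arithmetic right shift, which always rounds down."""
    return product >> drop_bits


def round_half_up(product, drop_bits):
    """Add half an output LSB before shifting, i.e. round to nearest."""
    return (product + (1 << (drop_bits - 1))) >> drop_bits


def mean_error(quantize, drop_bits, products):
    """Average quantization error, in units of the output LSB."""
    errors = [quantize(p, drop_bits) - p / (1 << drop_bits) for p in products]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    products = range(256)  # every 8-bit product value, equally weighted
    print("truncation bias:", mean_error(truncate, 4, products))
    print("rounding bias:  ", mean_error(round_half_up, 4, products))
```

Truncation introduces a bias of almost half an output LSB, whereas rounding leaves only a small residual bias, at the cost of one extra adder per multiplier output.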

Figure 1. Traditional DDS architecture composed of a phase generator and a phase-to-amplitude converter. For a sine and cosine generator, the number of LUTs is two.

Figure 2. Truncation of the phase

Figure 3.

For the N=10 example, the LUTs of the trigonometric DDS total $10 \times 2^7 = 1280$ bits. The sine and cosine values at the output of the trigonometric DDS are estimated according to equation 2. The values of the sine and the cosine of a and b are fetched from the four LUTs (two for the coarse values and two for the fine values), and the sine and cosine values of (a+b) are computed by the four multipliers, the adder and the subtractor. Figure 4 shows the architecture of the trigonometric DDS. Like the traditional DDS, the trigonometric DDS is composed of a phase generator and a phase-to-amplitude converter. The phase generator is realized by an N-bit accumulator, as in the traditional DDS. The phase-to-amplitude converter is composed of four LUTs, four multipliers, one adder and one subtractor. The large memory reduction associated with this mapping technique makes it possible to avoid phase truncation for large values of N, in contrast to a traditional DDS.

Figure 4. Trigonometric DDS composed of an accumulator, four LUTs, four multipliers, one adder and one subtractor

Figure 7. Theory of operation: the sine and cosine values of the angle (a+b) are computed using the values stored in the coarse and fine LUTs

Figure 8. Area occupation of the two architectures: the trigonometric DDS in blue, the traditional DDS in orange

Comparison between Trigonometric and Traditional DDS, in 90 nm Technology (Cardarilli G.C.) 2251

Table 1. Area (equivalent NAND-2 gates)

The LUTs for the traditional DDS method were also implemented in multi-level logic to compare the two methods. The area of the implemented circuits is reported in Table 1 in terms of NAND-2 equivalent gates. Figure 8 shows the area as a function of N for the two architectures. The area for TRAD N=24 is extrapolated from the previous values because a table of this size ($2^{24}$) is poorly synthesized in multi-level logic.