# 國立交通大學 電機與控制工程學系 ## 碩士論文 一種用於上行 LTE 之單載波分頻多工系統的可變長度快速傅 立葉轉換處理器 A Variable FFT /IFFT Processor for SC-FDMA Systems over Uplink LTE Applications 1896 研究生:王星雅 指導教授:蔡尚澕 教授 中華民國一百零一年五月 # 一種用於上行 LTE 之單載波分頻多工系統的可變長度快速傅立葉轉換處理器 A Variable FFT /IFFT Processor for SC-FDMA Systems over Uplink LTE Applications 研究生:王星雅 Student: Hsing-Ya Wang 指導教授:蔡尚澕 Advisor:Shang-Ho Tsai 國立交通大學電機與控制工程學系碩士論文 Submitted to Institute of Electrical Control Engineering College of Electrical Engineering and Computer Science National Chiao Tung University in partial Fulfillment of the Requirements for the Degree of Master in Electrical and Control Engineering May 2012 Hsinchu, Taiwan, Republic of China 中華民國 一百零一 年 五 月 一種用於上行 LTE 之單載波分頻多工系統的可變長度快速傅立葉轉換處理器 學生:王星雅 指導教授:蔡尚澕 國立交通大學電機與控制工程學系(研究所)碩士班 在這篇論文,我們介紹一個可以應用於 LTE 之單載波分頻多工系統中的可變長度快速傳立葉轉換器。這個 2048/1536/1024/512/256/128-point 可變長度快速傳立葉轉換是以radix-2 、radix-3及radix-2 、快速傅立葉轉換演算法,並且利用混和 radix 演算法去執行。為了在單載波分頻多工系統中利用單一個快速傅立葉轉換處理器同時處理傅立葉和反傅立葉運算,我們仔細地設計時間規畫。因此可以減少面積。為了能更完美地設計時間規劃,我們提出了每一級重新設定的控制和輸出重新排列時的早讀。我們利用 Xilinx FPGA Vertex5-XC5VLX110T-FF1136 去模擬操作在 100M 赫茲的快速傅立葉轉換處理器。 # A Variable FFT /IFFT Processor for SC-FDMA Systems over Uplink LTE Applications Student: Hsing-Ya Wang Advisors: Dr. Shang-Ho Tsai Department of Electrical and Control Engineering National Chiao Tung University **ABSTRACT** In this thesis, we present a variable FFT that can support multiple FFT/IFFT size for LTE SC-FDMA systems. The 2048/1024/512/128-point variable FFT is based on radix-2, radix-3 and radix-2<sup>3</sup> FFT algorithm, and we use mixed-radix algorithm to perform them. To handle both FFT and IFFT operations required in the SC-FDMA systems using only one FFT/IFFT processor, we carefully schedule the timing plan. As a result, the required area can be reduced. For the architecture, we propose a ripple-like ON/OFF stage control and an early access of reordering to implement our proposed timing plan. The proposed variable FFT processor is implemented using Xilinx FPGA Vertex5-XC5VLX110T-FF1136 at 100MHz operation frequency. 最先要感謝的是我的指導教授蔡尚澕教授,在我眼中教授是通訊 設計領域的專業者同時是不吝於給學生任何幫助的老師,除了指導我 的專業研究,在我生活遇到瓶頸時能適時的給我幫助和鼓勵,讓我最 終能完成學業。老師不辭辛苦的帶領我們,也希望老師在忙碌之餘能 多愛惜自己的身體。也感謝我的口試委員:林源倍教授和簡鳳村教授的 指導使我的論文更加完整。 另外,感謝 815 實驗室的同學,課業上的幫助及意見的提供讓我 在修課上的疑惑能夠有很大的幫助。各個同學研究領域的專業知識讓 我的研究能適時得到靈感,也在研究生活中創造許多美好的回憶。 最後,我要感謝的是我偉大的母親,從小她就一直認真地栽培我, 當我遇到困難她會給我滿滿的關心與安慰,是我做所有事情的動力, 感謝她在背後的支持,是她成就了現在的我。 將此篇論文獻給所有關心我幫助我的人,感謝你們。 # A Variable FFT/IFFT Processor for SC-FDMA Systems over Uplink LTE Applications #### Abstract In this thesis, we present a variable FFT that can support multiple FFT/IFFT size for LTE SC-FDMA systems. The 2048/1024/512/128-point variable FFT is based on radix-2, radix-3 and radix-2<sup>3</sup> FFT algorithm, and we use mixed-radix algorithm to perform them. To handle both FFT and IFFT operations required in the SC-FDMA systems using only one FFT/IFFT processor, we carefully schedule the timing plan. As a result, the required area can be reduced. For the architecture, we propose a ripple-like ON/OFF stage control and an early access of reordering to implement our proposed timing plan. The proposed variable FFT processor is implemented using Xilinx FPGA Vertex5-XC5VLX110T-FF1136 at 100MHz operation frequency. # Contents | 1 | Inti | roduct | | 1 | |---|-----------------------|---------|-------------------------------------|----| | | 1.1 | Motiv | ation and goal | 1 | | | 1.2 | Contr | ibutions and Features | 2 | | 2 | Bac | kgrou | ration and goal | 4 | | | 2.1 | FFT/ | IFFT for SC-FDMA vs OFDM | 4 | | | 2.2 | FFT a | algorithm | 5 | | | | 2.2.1 | Radix-2 algorithm | 5 | | | | 2.2.2 | Radix- $2^3$ algorithm | 8 | | | | 2.2.3 | Radix-3 algorithm | 10 | | | | 2.2.4 | FFT/IFFT 1896 | 11 | | | 2.3 | Varial | ble FFT | 12 | | | | 2.3.1 | Pipeline FFT processor architecture | 13 | | | | 2.3.2 | Variable FFT processor architecture | 14 | | | | 2.3.3 | Subcarrier Mapping | 15 | | 3 | $\mathrm{Th}\epsilon$ | e prope | osed scalable LTE for LTE systems | 18 | | | 3.1 | Algori | $^{ m thm}$ | 18 | | | 3.2 | Archit | tecture | 23 | | | | 3.2.1 | FFT Processor | 24 | | | | 3.2.2 | Subcarrier mapper | 27 | | | | 3.2.3 | Ripple-like ON/OFF stage control | 31 | | | | 3.2.4 | Early access of reordering | 32 | | | | 3.2.5 | FFT/IFFT hardware sharing | 33 | | | 3.3 | 3.3 Complexity analysis | | |---|-----|------------------------------------------------------------|----| | | 3.4 | Simulation | 36 | | | | 3.4.1 SQNR (Signal to Quantization Noise Ratio) simulation | 36 | | | | 3.4.2 FFT/IFFT loop function test | 38 | | 4 | FP | GA implementation and verification | 41 | | | 4.1 | RTL verification | 43 | | | | 4.1.1 LFSR (Liner feedback shift register) | 44 | | | 4.2 | Synthesis place and route | 46 | | | 4.3 | Xilinx iMPACT | 57 | | | 4.4 | Verification using ChipScope Pro | 64 | | 5 | Cor | TES 1896 | 68 | # List of Figures | 2.1 | SC-FDMA Transmitter and Receiver | 4 | |------|------------------------------------------------------------------------|----| | 2.2 | Simple DIF 8-point FFT | 6 | | 2.3 | Simple DIF 8-point FFT | 7 | | 2.4 | DIF 8-point FFT with radix-2 | 7 | | 2.5 | DIF butterfly with radix- $2^3$ | 9 | | 2.6 | DIF 8-point FFT with radix- $2^3$ | 9 | | 2.7 | DIF butterfly with radix-3 | 10 | | 2.8 | DIF 9-point FFT with radix-3 | 11 | | 2.9 | Block diagram of FFT/IFFT | 12 | | 2.10 | Architecture of R2MDC | 13 | | 2.11 | Architecture of R2SDF | 14 | | 2.12 | Block diagram of a variable FFT for 64-point, 32-point, 16-point | | | | and 8-point operation | 15 | | 2.13 | Subcarrier allocation methods for multiple (3 users, $12$ subcarriers, | | | | and 4 subcarriers per user) | 16 | | 2.14 | Subcarrier mapping example in the frequency domain for 4 sub- | | | | carriers per user, 3 users, and 12 subcarriers | 17 | | 3.1 | The SFG of stage 1 to stage 5 (radix-2) | 21 | | 3.2 | The SFG of stage 6 to stage 7 (radix- $2^3$ ) | 21 | | 3.3 | The SFG of DIF with radix-3 | 23 | | 3.4 | Block diagram of the variable FFT processor in FPGA | 24 | | 3.5 | Block diagram of the variable FFT processor | 25 | | 3.6 | Block diagram of module 1 module 5 (radix-2) | 26 | | 3.7 | Block diagram of module 6 and module 7 $(radix-2^3) \dots \dots$ | 26 | |------------|----------------------------------------------------------------------|----| | 3.8 | Block diagram of module 8 (radix-3) | 26 | | 3.9 | Block diagram of FFT and IFFT with subcarrier mapper | 27 | | 3.10 | Localized subcarrier mapper with N=4 mapping to M=8 $\dots$ | 28 | | 3.11 | Distributed subcarrier mapper with N=4 mapping to M=8 | 28 | | 3.12 | Distributed subcarrier mapper with N=6 mapping to M=8 | 29 | | 3.13 | 8-point DFT and IDFT | 30 | | 3.14 | 4-point is mapping to 8-point by distributed subcarrier mapping | 30 | | 3.15 | Clock cycle of ripple-like ON/OFF stage control | 31 | | 3.16 | Block diagram of FFT/IFFT for SC-FDMA system | 33 | | 3.17 | (a) FFT operation (b) Switch FFT to IFFT operation (early access | | | | of reordering and ripple-like $ON/OFF$ stage control start) (c) IFFT | | | | operation | 34 | | 3.18 | Timing planning of FFT/IFFT for $N=2048$ | 35 | | 3.19 | System model of SQNR | 37 | | 3.20 | SQNR v.s. SNR | 38 | | 3.21 | Block diagram of FFT and IFFT in SC-FDMA system | 38 | | 3.22 | fixed-point v.s. floating-point | 39 | | 3.23 | Input v.s. fixed-point output4 and floating-point output4 | 39 | | 3.24 | Input v.s. fixed-point output2 and fixed-point output4 | 40 | | 4.1 | A design flow of the Xilinx design kit Vertex5-XC5VLX110T-FF1136 | 42 | | 4.1 | Simulation environment for variable FFT | 43 | | 4.2 | Bit-width in all stages | 44 | | 4.4 | LFSR for 16 bits | 45 | | 4.4 | LFSR and FFT processor | 45 | | 4.5<br>4.6 | Timing plan for LESR | 46 | | | | | # List of Tables | 2.1 | FFT size in several OFDM or SC-FDMA systems | 12 | |-----|---------------------------------------------|----| | 3.1 | Distance for each stage | 31 | | 3.2 | A fraction of N-point cab be read early | 32 | | 3.3 | Comparison of hardware requirement | 35 | ## Chapter 1 ### Introduction # 1.1 Motivation and goal LTE (Long Term Evolution) is a standard for wireless communication of high-speed mobile phones and data terminals, which is an evolution of the GSM/UMTS standards. It is a next generation mobile system from 3GPP (3rd Generation Partnership Project) with a focus on wireless broadband. The goal of LTE is to increase the capacity and speed of wireless data networks using new DSP (Digital Signal Processing) techniques and modulations. Downlink and uplink multiple access of LTE use OFDMA (Orthogonal Frequency-Division Multiple Access) and SC-FDMA (Single-carrier FDMA) respectively. SC-FDMA is particularly to reduce power consumption. LTE, Wi-MAX and UMB are called 4G.LTE combines OFDMA/SC-FDMA/MIMO/HARQ. Hence it has good data rate. Some important features of LTE in physical layer as follow: #### • OFDMA techniques: OFDMA is used in many wireless communication standards, which is also applied to LTE downlink. In OFDM-based systems, FFT is key component and hence it is widely studied in recent years. #### • SC-FDMA techniques: SC-FDMA is similar to OFDMA because they both use FFT, but SC-FDMA has lower PAPR (peak-to- average power ratio). Lower PAPR is a great benefit to transmit signal on mobile terminal, because it can reduce power consumption. #### • Variable bandwidth LTE can support variable bandwidth in physical layer. That is, it can adjust the transmission rate via changing the bandwidth. As a result, we need FFT with various sizes. For example, the bandwidth for LTE systems can be 1.4MHz, 3MHz, 5MHz, 10MHz, 15MHz, 20MHz corresponding to the 128-, 256-, 512-, 1024 -, 1536-, and 2048-point FFT. This dynamic adjustment of bandwidth and possible FFT size enables user roaming in different network. Thus, the variable FFT is a key component for OFDM systems with various bandwidth. SC-FDMA is a new multiple access technique that uses single carrier modulation, DFT-spread orthogonal frequency multiplexing, and frequency equalization.[7] It has a similar structure and performance as OFDM, except the addition of a new DFT block. Because of that, it increases area, cost and power consumption, which is a challenging to solve the problems. #### 1.2 Contributions and Features In this thesis, we propose an FFT/IFFT architecture for SC-FDMA, and implemented it using a Xilinx FPGA design board with Vertex5-XC5VLX110T-FF1136. By using time sharing technique, the proposed FFT/IFFT can handle both the FFT and IFFT in SC-FDMA using one FFT processor. Hence, it reduces the hardware complexity. In this thesis, we will show how to perform time sharing. Moreover, the used FPGA design flow will also be described and demonstrated. Detailed architecture specifications of the proposed FFT/IFFT processor are listed as follows: • High radix to reduce the complexity: Because higher-radix algorithm can reduce number of multiplier and power consumption, we employ radix-2<sup>3</sup> algorithm to implement the last two stages. - Subcarrier mapping: By our design, we can do subcarrier mapping in reordering step. We do not need unnecessary latency to perform that. - Make FFT/IFFT work at the same time: SC-FDMA system has both FFT and IFFT, because FFT has similar structure with IFFT, we use timing plan to do FFT and IFFT at the same time. Thus, we almost can reduce a half of the hardware. - Early access of reordering: By our calculation, we can read the FFT output data earlier by using ripple-like ON/OFF stage control. Thus, timing plan can be designed better. ## Chapter 2 ## Background ### 2.1 FFT/IFFT for SC-FDMA vs OFDM SC-FDMA (Single-carrier Frequency-Division Multiple Access) is a technique for high data uplink communication and has been adopted by LTE. Fig. 2.1 shows a block diagram of a SC-FDMA system. SC-FDMA has similar throughput performance and complexity with OFDM, while its PAPR (peak-to-average power ratio) is much smaller than OFDM systems, which makes it suitable for uplink transmission. SC-FDMA is often viewed as FFT coded OFDM. The use of FFT transforms the time-domain symbols to the frequency-domain before passing through the standard OFDM modulation. Thus, SC-FDMA has all the advantages of OFDM. The main advantage of OFDM, as is for SC-FDMA, is its robustness against multipath signal propagation, which is suitable for broadband systems. Figure 2.1: SC-FDMA Transmitter and Receiver. #### 2.2 FFT algorithm FFT algorithm has several types, including radix-2, radix-4, radix-8, radix-2<sup>2</sup> and radix-2<sup>3</sup>. Radix-2 is the most simple and elastic structure of all FFT algorithms. In many researches, we know higher radix has lower operations which results in good efficiency, but it increases complexities. Radix-2<sup>2</sup> and radix-2<sup>3</sup> have simple structure of radix-2 and lower operations of higher radix. Therefore, we use series connection to combine radix-2 and radix-2<sup>3</sup>. Because LTE standard has 1536-point FFT, we use radix-3 for implementation. FFT has two mathematics modes which are DIT (decimation in time) and DIF (decimation in frequency). We use DIF mode in follows reduction. #### 2.2.1 Radix-2 algorithm A basic N-point discrete Fourier transform (DFT) is defined as $$X(k) = \sum_{n=0}^{N-1} x(n)W_N^{kn}, \quad k = 0, 1, \dots, N-1,$$ (2.1) where x(n) and X(k) are complex number. The twiddle factor is $$W_N^{nk} = e^{-j\frac{2\pi nk}{N}} = \cos\frac{2\pi nk}{N} - j\sin\frac{2\pi nk}{N}$$ (2.2) Divide x(n) into even part and odd part $$X(2r) = \sum_{n=0}^{N-1} x(n)W_N^{2rn}, \quad r = 0, 1, \dots, N/2 - 1$$ $$= \sum_{n=0}^{N/2-1} x(n)W_N^{2rn} + \sum_{n=N/2}^{N-1} x(n)W_N^{2rn}$$ $$= \sum_{n=0}^{N/2-1} x(n)W_N^{2rn} + \sum_{n=0}^{N/2-1} x(n+N/2)W_N^{2r(n+N/2)}$$ $$= \sum_{n=0}^{N/2-1} [x(n) + x(n+N/2)]W_N^{2rn} \quad (W_N^{rn} = 1)$$ $$= \sum_{n=0}^{N/2-1} [x(n) + x(n+N/2)]W_{N/2}^{rn} \quad (W_N^{rn} = W_{N/2}^{rn}) \quad (2.3)$$ $$X(2r+1) = \sum_{n=0}^{N-1} x(n)W_N^{(2r+1)n}, \quad r = 0, 1, \dots, N/2 - 1$$ $$= \sum_{n=0}^{N/2-1} x(n)W_N^{(2r+1)n} + \sum_{n=N/2}^{N-1} x(n)W_N^{(2r+1)n}$$ $$= \sum_{n=0}^{N/2-1} x(n)W_N^{(2r+1)n} + \sum_{n=0}^{N/2-1} x(n+N/2)W_N^{(2r+1)(n+N/2)}$$ $$= \sum_{n=0}^{N/2-1} [x(n) - x(n+N/2)]W_N^{(2r+1)n} \quad (W_N^{rN+N/2} = -1)$$ $$= \sum_{n=0}^{N/2-1} [x(n) - x(n+N/2)]W_N^nW_N^{rn}$$ $$= \sum_{n=0}^{N/2-1} [x(n) - x(n+N/2)]W_N^nW_N^{rn}$$ $$= \sum_{n=0}^{N/2-1} [x(n) - x(n+N/2)]W_N^nW_N^{rn}$$ (2.4) Take N=8 for instance, x[0]-x[3] and x[4]-x[7] add and subtract mutually as shown in Fig 2.2, then the subtracted values are multiplied by twiddle factors. Figure 2.2: Simple DIF 8-point FFT Use the same axiom to go on decomposing, shown in Fig 2.3. After several times of decomposing, we can obtain Fig 2.4, a complete DIF 8-point FFT with radix-2. Figure 2.4: DIF 8-point FFT with radix-2 #### 2.2.2 Radix- $2^3$ algorithm $8^N$ -point (N is an integer) FFT usually uses radix-8 or radix- $2^3$ algorithm for implemention. They have advantage in trivial twiddle factor, but radix-8 structure demands more complexities than radix- $2^3$ . Hence we usually use radix- $2^3$ instead of radix-8 in implemention. Radix- $2^3$ algorithm can be written as follows. write $$k = l_1 + 2l_2 + 4l_3 + 8k_1$$ (2.5) $$X(8k_1 + 4l_3 + 2l_2 + l_1)$$ $$= \sum_{n=0}^{N/8-1} \{ [x(n) + x(n + \frac{2N}{8}) + W_4^{l_1}x(n + \frac{4N}{8}) + W_4^{2l_1} + x(n + \frac{6N}{8})W_4^{-1}] + [x(n + \frac{N}{8}) + x(n + \frac{3N}{8}) + W_4^{l_1}x(n + \frac{5N}{8}) + W_4^{2l_1}x(n + \frac{7N}{8})W_4^{-1}]W_8^{-1} \} W_N^{nl}W_{N/8}^{nk_1}$$ $$= \sum_{n=0}^{N/8-1} \{ [x(n) + W_2^{l_1}x(n + \frac{4N}{8}) + W_2^{l_2}W_4^{l_1}x(n + \frac{2N}{8}) + W_2^{l_1}x(n + \frac{6N}{8})] + [x(n + \frac{N}{8}) + W_2^{l_1}x(n + \frac{5N}{8}) + W_2^{l_2}W_4^{l_1}x(n + \frac{3N}{8}) + W_2^{l_1}x(n + \frac{7N}{8})]W_8^{2l_2+l_1} \} W_N^{n(4l_3+2l_2+l_1)}W_{N/8}^{nk_1}$$ $$l_1, l_2, l_3 = 0, 1$$ $k = 0, 1, \dots, N/8 - 1$ (2.6) Fig 2.5 shows DIF butterfly with radix- $2^3$ . From Fig 2.6, we can see that radix- $2^3$ has the simple basic stucture of radix-2, but the twiddle factor was replaced by trivial twiddle factor $(W_4^N, W_8^N)$ . Thus we say radix- $2^3$ algorithm is good for implemention. Figure 2.5: DIF butterfly with radix- $2^3$ Figure 2.6: DIF 8-point FFT with radix- $2^3$ #### 2.2.3 Radix-3 algorithm $$X(3k_1+l) = \sum_{n=0}^{N/3-1} [X(n) + x(n+\frac{N}{3})W_3^l + x(n+\frac{2N}{3})W_3^{2l}]W_N^{nl}W_{N/3}^{nk_1}$$ $$l = 0, 1, 2 \qquad k = 0, 1, \dots, N/3 - 1$$ (2.7) Fig. 2.7 shows a radix-3 butterfly. Figure 2.7: DIF butterfly with radix-3 #### 1896 From 2.7, the input should be multiplied by extra twiddle factors and then passed to the butterfly. More specifically, $$y(n) = x(n)W_N^0 + x(n+1)W_N^0 + x(n+2)W_N^0$$ $$y(n+1) = x(n)W_N^0 + x(n+1)W_N^1 + x(n+2)W_N^2$$ $$y(n+2) = x(n)W_N^0 + x(n+1)W_N^2 + x(n+2)W_N^4$$ An example for a 9-point FFT butterfly with radix-3 is shown in 2.8. Figure 2.8: DIF 9-point FFT with radix-3 #### 2.2.4 FFT/IFFT The IFFT of an N-point sequence x(n) is defined as $$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k) W_N^{-nk} \qquad k = 0, 1, \dots, N-1$$ (2.8) In order to implement the IFFT algorithm more efficiently, equation (2.9) can be rewritten as $$x(n) = \frac{1}{N} \left\{ \sum_{k=0}^{N-1} X^*(k) W_N^{nk} \right\}^*$$ (2.9) According to equation (2.10), the IFFT can be performed by adding the complex conjugate of input data and output data without changing any coefficient in the original FFT architecture. We can see Fig. 2.9, it utilizes multiplexer to change the mode that operation of FFT or IFFT. Thus, the hardware implementation can be more efficient. The data sequence is divided into two data paths by commutator, and then properly add and substract mutually for two data paths. The PE (processor element) is implemented by radix-2 algorithm. The numbers of PE unit, multipliers and delay elements are with order $log_2N$ ( $log_2N-2$ ) and $(\frac{3N}{2})-2$ respectively. Figure 2.9: Block diagram of FFT/IFFT. ### 2.3 Variable FFT Many standars need various FFT, such as DAB, DVB-T, VDSL, Wi-MAX and LTE. Their FFT sizes are shown in Table 2.1. Hence, the design of a scalable FFT for different purposes become more important. For variable FFT, we use mixed-radix algorithm which is more elastic and convenient for variable FFT sizes. We utilize the above radix-2,radix-2<sup>3</sup> and radix-3 for mixed-radix to present a variable FFT which can support 64, 32, 16 and 8-point operation. It is very easy to modify our LTE standar designed by 128, 256, 512, 1024, 1536 and 2048-point FFT operation size to create any required length of FFT. Hence, this modification of circuit is simple and convenient to be used for many FFT sizes of other standards. For the others scalable FFT, please refer to [2]. | Communication system | FFT size | |----------------------|----------------------------| | LTE | 128,256,512,1024,1536,2048 | | Wi-MAX [6] | 128,512,1024,2048 | | VDSL [3] | 8192,4096,2048,1024,512 | | DAB [4] | 2048,1024,512,256 | | DVB-T [5] | 8192,2048 | Table 2.1: FFT size in several OFDM or SC-FDMA systems #### 2.3.1 Pipeline FFT processor architecture The traditional radix-2 pipeline FFT architectures can be roughly classified into MDC (multi path delay commutator) and SDF (single path delay feedback) [1]. Fig. 2.10 shows a radix-2 multi-path delay commutator (R2MDC) architecture with N=8. The data sequence is divided into two data paths by commutator, and then properly add and substract mutually for two data paths. The PE (processor element) is implemented by radix-2 algorithm. The numbers of PE unit, multipliers and delay elements are with order $log_2N$ ( $log_2N-2$ ) and ( $\frac{3N}{2}$ ) – 2 respectively. Fig. 2.11 shows a radix-2 single path delay feedback (R2SDF) architecture with N=8. The use of delay elements in R2SDF is more efficient than R2MDC by sharing the memory. The numbers of PE units, multipliers and delay elements for R2MDC are N-1, ( $log_2N-1$ ) and $log_2N$ . The SDF FFT and MDC FFT are described as follows. - a) MDC FFT. Because the MDC FFT utilizes multi path to increase data path, the MDC FFT can increase the throughput rate. Its drawback is that the memory size become much more. - b) SDF FFT: Because the SDF FFT utilizes feedback to reuse memory, the SDF FFT can reduce the memory usage. Its drawback is that the throughput rate is low. Figure 2.10: Architecture of R2MDC. Figure 2.11: Architecture of R2SDF. #### 2.3.2 Variable FFT processor architecture We can see a variable FFT which can achieve 64-point, 32-point, 16-point and 8-point operation in Fig. 2.12. The structure uses mixed-radix algorithm for implemention, where stage 1 to stage 3 are radix-2 and stage 4 is radix-2<sup>3</sup>. It utilizes above SDF structure and multiplexors to perform the variable FFT. There are four different FFT lengths. For the 64-point FFT, all stages are active. For the 32-point FFT, the input data will skip the first stage and go to the second stage directly. For the 16-point FFT, the input data will skip the first stage and the second stage and go to the third stage. Multiplexors are used to control different FFT size operation. Thus, we can use that to perform variable FFT easily. Figure 2.12: Block diagram of a variable FFT for 64-point, 32-point, 16-point and 8-point operation #### 2.3.3 Subcarrier Mapping SC-FDMA is a promising technique for high data rate uplink communications. It is a modified form of OFDMA, and has similar throughput performance with OFDMA. A pinciple advantage is the peak-to-average power (PAPR), which is lower than OFDMA. In the SC-FDMA system structure, subcarrier mapping is a important factor which affects PAPR and throughput performance. There are many accurate analyses of throughput and PAPR with subcarrier mapping in SC-FDMA [8]. There are two ways to apportion subcarriers among terminals, which are localized SC-FDMA (LFDMA) and distributed SC-FDMA. One realization of distributed SC-FDMA is interleaved FDMA (IFDMA). The LFDMA and IFDMA are described as follows. • a) LFDMA (localized SC-FDMA): Each terminal utilizes a set of adjacent subcarriers to transmit its symbols. Thus, the bandwidth of LFDMA is limited to a fraction of all the system bandwidth. Fig. 2.10 (a) • b) IFDMA (distributed SC-FDMA): The subcarriers used by a terminal are spread over the entire signal band. The occupied subcarriers of users are allocated uniformly as shown in Fig. 2.10 (b). Figure 2.13: Subcarrier allocation methods for multiple (3 users, 12 subcarriers, and 4 subcarriers per user) An example of subcarrier mapping is in Fig. 2.14. We assume that 12 subcarriers are shared by 3 users, so each user has four data symbols to transmit at the same time. The DFT output has four complex data symbols in frequency domain, and the four data symbols are mapped on 12 subcarriers using different mapping methods. In [8], the authors analyzed the effects of subcarrier mapping on throughput and PAPR. They find LFDMA with channel-dependent scheduling (CDS) has higher throughput than IFDMA, but the PAPR performance of IFDMA is better than LFDMA. Figure 2.14: Subcarrier mapping example in the frequency domain for 4 subcarriers per user, 3 users, and 12 subcarriers ## Chapter 3 # The proposed scalable LTE for LTE systems ### 3.1 Algorithm From (2.1), the N-point DFT operation can be decomposed to $N_1 \times N_2 \times ... \times N_k$ point DFT operation. We use the radix-2<sup>3</sup> as many as possible reduce the multipliers. The mathematical representation is shown in equation (3.1). $$N = 2048 = \underbrace{8 \times 8 \times 2 \times 2 \times 2 \times 2 \times 2}_{128\text{-point}} \times \underbrace{256\text{-point}}_{512\text{-point}} \times \underbrace{1024\text{-point}}_{2048\text{-point}} \underbrace{1024\text{-point}}_{2048\text{-poin$$ From (3.1), $N = 2048 = N_1 \times N_2 = 2 \times 1024$ . Define the following indices $$n = N_2 n_1 + n_2 = 1024 n_1 + n_2,$$ $$\begin{cases} n_1 = 0, 1 \\ n_2 = 0, 1, \dots, 1023 \end{cases}$$ and $$k = k_1 + N_1 k_2 = k_1 + 2k_2,$$ $$\begin{cases} k_1 = 0, 1 \\ k_2 = 0, 1, \dots, 1023 \end{cases}$$ (2.1) can be rewritten as $$X(k_1 + 2k_2)$$ $$= \sum_{n_2=0}^{1023} \sum_{n_1=0}^{1} x(1024n_1 + n_2) W_{2048}^{(1024n_1 + n_2)(k_1 + 2k_2)}$$ $$= \sum_{n_2=0}^{1023} \left\{ \sum_{n_1=0}^{1} x(1024n_1 + n_2) W_2^{n_1k_1} \underbrace{W_{2048}^{n_2k_1}}_{\text{twiddle factor}} \right\} W_{1024}^{n_2k_2}$$ $$= \sum_{n_2=0}^{1024} B_2^{(1)}(n_2) W_{1024}^{n_2k_2},$$ $$= \sum_{n_2=0}^{1024} B_2^{(1)}(n_2) W_{1024}^{n_2k_2},$$ $$(3.3)$$ where $B_r^{(k)}$ denotes the radix-r algorithm at stage k. Now $N_2 = 1024 = N_2 \times N_3 = 2 \times 512$ . Define the indices $n_2$ and $k_2$ as $$n_2 = N_3 n_2 + n_3 = 512 n_2 + n_3, \qquad \begin{cases} n_2 = 0, 1 \\ n_3 = 0, 1, \dots 511 \end{cases},$$ $$k_2 = k_2 + N_2 k_3 = k_2 + 2k_3, \qquad \begin{cases} k_2 = 0, 1 \\ k_3 = 0, 1, \dots, 511 \end{cases}$$ We have: and $$= \sum_{n_3=0}^{511} \sum_{n_2=0}^{1} B_2^{(1)} (512n_2 + n_3) W_{1024}^{(512n_2+n_3)(k_2+2k_3)}$$ $$= \sum_{n_3=0}^{511} \left\{ \sum_{n_2=0}^{1} B_2^{(1)} (512n_2 + n_3) W_2^{n_2k_2} \underbrace{W_{1024}^{n_3k_2}}_{\text{twiddle factor}} \right\} W_{512}^{n_3k_3}$$ $$= \sum_{n_3=0}^{511} B_2^{(2)} (n_3) W_{512}^{n_3k_3}$$ $$= \sum_{n_3=0}^{511} B_2^{(2)} (n_3) W_{512}^{n_3k_3}$$ $$= (3.4)$$ In this way, we can obtain the result is given by $$X(k_1 + 2k_2 + 4k_3 + 8k_4 + 16k_5 + 32k_6)$$ $$= \sum_{n_6=0}^{63} B_2^{(5)}(n_6) W_{64}^{n_6k_6}, \qquad (3.5)$$ where $k_6 = 0, 1, ..., 63$ . By decomposing the 64-point DFT into the 8-point DFT, we can achieve the 2048-point mixed-radix FFT algorithm. $$X(k_1 + 2k_2 + 4k_3 + 8k_4 + 16k_5 + 32k_6 + 256k_7)$$ $$= \sum_{n_7=0}^{7} \left\{ \sum_{n_6=0}^{7} B_2^{(5)} (8n_6 + n_7) W_8^{n_6 k_6} W_{64}^{n_7 k_6} \right\} W_8^{n_7 k_7}$$ (3.6) Because the radix-8 butterfly unit is inefficient in the use of adders and multipliers. we use the radix-2<sup>3</sup> FFT algorithm [1] to replace the radix-8 FFT algorithm. In this case, we can further reduce the complexity of the butterfly by using the radix-2 butterfly three times. The SFG of the 2048-point mixed-radix FFT algorithm, is as shown in Fig. 3.1 and Fig. 3.2. Let $n_6 = 4\alpha_1 + 2\alpha_2 + \alpha_3$ and $k_6 = \beta_1 + 2\beta_2 + 4\beta_3$ , we can obtain the form with radix-2<sup>3</sup> in equation (3.9). $$X(k_1 + 2k_2 + 4k_3 + 8k_4 + 16k_5 + 32(\beta_1 + 2\beta_2 + 4\beta_3) + 256k_7)$$ $$= \sum_{n_7=0}^{7} B_8^{(6)}(n_7, k_7) W_8^{n_7 k_7},$$ (3.7) where #### 1896 $$B_8^{(6)}(n_7, k_7) = \sum_{\alpha_3=0}^{1} \sum_{\alpha_2=0}^{1} \sum_{\alpha_1=0}^{1} B_2^{(5)}(8(4\alpha_1 + 2\alpha_2 + \alpha_3) + n_7) W_2^{\alpha_1\beta_1}$$ $$W_4^{\alpha_2\beta_1} W_2^{\alpha_2\beta_2} W_8^{\alpha_3(\beta_1 + 2\beta_2)} W_2^{\alpha_3\beta_3} W_{64}^{n_7(\beta_1 + 2\beta_2 + 4\beta_3)}$$ (3.8) Figure 3.1: The SFG of stage 1 to stage 5 (radix-2). Figure 3.2: The SFG of stage 6 to stage 7 (radix- $2^3$ ). LTE standard has 1536-point FFT, and the above equations only perform 128-point, 256-point, 512-point, 1024-point and 2048-point algorithm. Hence we derive the 1536-point algorithm as follows. $$N = 1536 = \underbrace{8 \times 8 \times 2 \times 2 \times 2 \times 3}_{128\text{-point}}$$ $$\underbrace{256\text{-point}}_{1536\text{-point}}$$ $$\underbrace{512\text{-point}}_{1536\text{-point}}$$ From (3.9), $N = 1536 = N_1 \times N_2 = 3 \times 512$ . Define the following indices $$n = N_2 n_1 + n_2 = 512 n_1 + n_2,$$ $$\begin{cases} n_1 = 0, 1, 2 \\ n_2 = 0, 1, \dots, 511 \end{cases}$$ and $$k = k_1 + N_1 k_2 = k_1 + 3k_2,$$ $$\begin{cases} k_1 = 0, 1, 2 \\ k_2 = 0, 1, \dots, 511 \end{cases}$$ (2.1) can be rewritten as $$= \sum_{n_{2}=0}^{511} \sum_{n_{1}=0}^{1} x(512n_{1} + n_{2}) W_{1536}^{(512n_{1}+n_{2})(k_{1}+3k_{2})}$$ $$= \sum_{n_{2}=0}^{511} \left\{ \sum_{n_{1}=0}^{1} x(512n_{1} + n_{2}) W_{2}^{n_{1}k_{1}} \underbrace{W_{1536}^{n_{2}k_{1}}}_{\text{twiddle factor}} \right\} W_{512}^{n_{2}k_{2}}$$ $$= \underbrace{\sum_{n_{2}=0}^{511} \left\{ \sum_{n_{1}=0}^{1} x(512n_{1} + n_{2}) W_{2}^{n_{1}k_{1}} \underbrace{W_{1536}^{n_{2}k_{2}}}_{\text{twiddle factor}} \right\} W_{512}^{n_{2}k_{2}}}_{512-\text{point}}$$ $$= \underbrace{\sum_{n_{2}=0}^{511} \left\{ \sum_{n_{1}=0}^{1} x(512n_{1} + n_{2}) W_{2}^{n_{1}k_{1}} \underbrace{W_{1536}^{n_{2}k_{1}}}_{\text{twiddle factor}} \right\} W_{512}^{n_{2}k_{2}}}_{512-\text{point}}$$ $$= \underbrace{\sum_{n_{2}=0}^{511} \left\{ \sum_{n_{1}=0}^{1} x(512n_{1} + n_{2}) W_{2}^{n_{1}k_{1}} \underbrace{W_{1536}^{n_{2}k_{1}}}_{\text{twiddle factor}} \right\} W_{512}^{n_{2}k_{2}}}_{512-\text{point}}$$ $$= \sum_{n_2=0}^{511} B_3^{(1)}(n_2) W_{512}^{n_2 k_2}, \tag{3.11}$$ Figure 3.3: The SFG of DIF with radix-3 #### Architecture 3.2 The proposed architecture of the variable FFT/IFFT processor is shown in Fig. 3.4. In uplink transmission, SC-FDMA needs both FFT and IFFT. Thus we share FFT and IFFT hardware to reduce area and power. For FFT/IFFT sharing, we plan a suitable timing achedule, so that the FFT/IFFT is operated at 100MHZ and can handle a 25MHZ input throughput rate. We use LFSR (Linear Feedback Shift Register) to create variable data rate for simulation convenience. The variable FFT for LTE system provide 2048/1536/1024/512/256/128-point FFT/IFFT operations, and we design subcarrier mapper to map N point FFT to M point IFFT. After the FFT/IFFT, we use RAM to bitreverse the FFT/IFFT. Figure 3.4: Block diagram of the variable FFT processor in FPGA. #### 3.2.1 FFT Processor Fig. 3.5 shows a variable FFT processor for 128/256/512/1024/1536 and 2048-point. From subsection 3.2.2, we can skip some stages to achieve variable length FFT. Figure 3.5: Block diagram of the variable FFT processor. #### • Module 1 to Module 5 (radix-2 FFT algorithm) Module 1 to module 5 contain one butterfly unit, one multiplier, ROM table and one memory bank per module, which is shown in Fig. 3.6. Their memory bank sizes are with order 1024, 512, 256, 128 and 64 respectively. #### • Module 6 and Module 7 (radix-2<sup>3</sup> FFT algorithm) Module 6 and module 7 contain three butterfly units and three memory bank per module, which is shown in Fig. 3.7. The size of the memory bank from stage one to stage three of module 6 are with order 32, 16 and 8 respectively. The size of the memory bank from stage one to stage three of module 7 are with order 4, 2 and 1 respectively. The multiplier of trivial twiddle factors such as $w_8^N$ and $w_4^N$ can be replaced by shift adders. #### • Module (radix-3 FFT algorithm) Module 8 contains one butterfly unit, one multiplier, ROM table and two memory banks, which is shown in Fig. 3.8. We can see that the butterfly unit is different from other modules, i.e. module 8 is radix-3 FFT. The input data have to be multiplied by $W_N^1$ , $W_N^2$ and $W_N^4$ first. Figure 3.6: Block diagram of module 1 module 5 (radix-2) Figure 3.7: Block diagram of module 6 and module 7 (radix-2<sup>3</sup>) Figure 3.8: Block diagram of module 8 (radix-3) #### 3.2.2Subcarrier mapper In SC-FDMA uplink system, N-point FFT is mapped to M-point IFFT. We can operate that in reordering step, which is shown in Fig.3.9. Figure 3.9: Block diagram of FFT and IFFT with subcarrier mapper #### • Localized subcarrier mapping We save the index of X in the ROM. Then we read the adresses from the ROM to write the FFT output data into RAM, which is shown in Fig. 3.10 with 4-point FFT output mapping to the input of 8-point IFFT. #### • Distrubuted subcarrier mapping Distributed subcarrier mapping has the same access control with localized subcarrier mapping. The adresses of the distributed subcarrier mapping can be obtained from some mathematical formula described below: When $N=2^x$ -point FFT is mapped to $M=2^y$ -point FFT, we should add y-x bits zero, then bitreverse all bits of the FFT. Take an example for N=4 and M=8 in Fig. 3.11, we add one zero and then bitreverse the three bits. Because 1536-point is mapped to 2048-point which is particular, we find another method to implement that. First we divide the index by 3 and then do floor function. Finally bitreverse all bits of the FFT as above. Fig. 3.12 is an example for N=6 and M=8. Figure 3.10: Localized subcarrier mapper with N=4 mapping to M=8 Figure 3.11: Distributed subcarrier mapper with N=4 mapping to M=8 Figure 3.12: Distrubuted subcarrier mapper with N=6 mapping to M=8 $\,$ # 1896 Please note that there are some easier ways to generate the SC-FDMA signals for distributed subcarrier mapping. For instance, Fig. 3.13 shows that for N=M=8, there is no need to perform FFT and IFFT at all. Fig. 3.14 shows that for N=8 and M=4, the output signal of SC-FDMA can be obtained by repeating the input data twice so that there is no need to perform FFT and IFFT. In fact, there is no need to perform FFT and IFFT for distributed subcarrier mapping. Thus the proposed processor will be active only for the localized subcarrier mapping. Figure 3.13: 8-point DFT and IDFT Figure 3.14: 4-point is mapping to 8-point by distributed subcarrier mapping #### 3.2.3 Ripple-like ON/OFF stage control For FFT/IFFT hardware sharing, we should use ripple-like ON/OFF stage control. This can set each stage of the FFT processor to switch FFT or IFFT. In SC-FDMA uplink transmission, N-point FFT is mapped to M-point IFFT, where we always set M be 2048 and N be 128, 256, 512, 1024, 1536 and 2048. To reduce the hardware, we would like a FFT processing be capable to handle both FFT and IFFT processing. Thus, the mapped reordered FFT output data feed back to the input of the FFT processor. In this case, this FFT can control the reset time. It is shown in Fig. 3.15. | | Stage1-2 | Stage2-3 | Stage3-4 | Stage4-5 | Stage5-6 | Stage6-7 | Stage7-8 | Stage8-9 | Stage9-10 | Stage10-11 | |---------------------|----------|----------|----------|----------|----------|----------|----------|----------|-----------|------------| | Distance<br>(cycle) | 1024 | 512 | 256 | 128 | 64 | 32 | 16 | 8 | 4 | 2 | Table 3.1: Distance for each stage ## 3.2.4 Early access of reordering The mapped and reordered data can actually be accessed earlier and we do not need to wait until the whole mapping and reordering processes finish, Take a 8-point FFT output for instance, the output data are X[0], X[4], X[2], X[6], X[1], X[5], X[3], X[7]. Thus we can read the output in when X[1] is available, i.e. around half of the FFT size. For other FFT sizes, we summarize when the data can be accessed earlier in Table 3.2. | N-point | Time for early access | |---------|-----------------------| | 128 | 7N/8 | | 256 | 15N/16 | | 512 | 15N/16 | | 1024 | 31N/32 | | 1536 | 31N/32 | | 2048 | 31N/32 | Table 3.2: A fraction of N-point cab be read early ## 3.2.5 FFT/IFFT hardware sharing In SC-FDMA system, we need to perform both FFT and IFFT. FFT and IFFT have similar structure except conjugate of input and output data. The Fig. 3.16 shows a timing plan for a N-point SC-FDMA system, we expect each FFT/IFFT or reordering takes N clock cycles. FFT/IFFT latency is N and reordering latency is N, thus we need 4N clock cycles. As a result, if the required data is R, we should use 4R for the proposed FFT/IFFT precessor. Figure 3.16: Block diagram of FFT/IFFT for SC-FDMA system The proposed processor uses pipeline, ripple-like ON/OFF stage control and early access of reordering to reduce the hardware. An operation example for N=2048 is shown in Figs. 3.17 and 3.18. Please note that the (a), (b) and (c) correspond to Figs. 3.17 (a), (b) and (c). When the clock cycles are 2048 + 1984(2048 \* 31/32) = 4033, which is shown in Fig. 3.18, IFFT starts to operate in Fig. 3.17(b). Fig. 3.17(c) is the IFFT operation. Figure 3.17: (a) FFT operation (b) Switch FFT to IFFT operation (early access of reordering and ripple-like ON/OFF stage control start) (c) IFFT operation Figure 3.18: Timing planning of FFT/IFFT for N = 2048 # 3.3 Complexity analysis Because SC-FDMA system has both FFT and IFFT, but we combine them in one. We almost reduce a half of the hardware, the requirement hardware is shown in Table 3.3. | | Complex | Complex | ROM | Memory | |----------|------------|---------|--------------|-------------| | | multiplier | adder | table(bytes) | size(bytes) | | Proposes | 7 | 25 | 6144 | 6142 | | FFT/IFFT | / | 23 | 0144 | 0142 | Table 3.3: Comparison of hardware requirement. # 3.4 Simulation We conduct two simulation in MATLAB before we design our proposed FFT/IFFT, i.e. SQNR (Signal to Quantization Noise Ratio) simulation and FFT/IFFT loop function test. SQNR simulation can determine the bit width of the FFT, and FFT/IFFT loop function test can verify the function correctness of the SC-FDMA system. # 3.4.1 SQNR (Signal to Quantization Noise Ratio) simulation Determining appropriate bit width in the FFT processor is important. Since the bit width affects the hardware cost directly. Bit width can be determine by fixed-point simulation. We use the SQNR (Signal to Quantization Noise Ratio) system model to determine the FFT bit width. The system model of SQNR is shown below. Figure 3.19: System model of SQNR. The SQNR is defined as $$SQNR = \frac{1}{N} \sum_{k=0}^{N-1} \frac{1}{|y_k - u_k| + \beta^2 \sigma_e^2}$$ (3.12) Consider the SNR (Signal to Noise Ratio) is one $(\frac{\sigma_x^2}{\sigma_e^2} = 1)$ , we can rewrite the equation as $$SQNR = \frac{1}{N} \sum_{k=0}^{N-1} \frac{1}{|y_k - u_k| + \beta^2} \quad , \tag{3.13}$$ where $\beta = 10^{\frac{-SNR}{20}}$ . The relationship between SQNR (with final bit-width) and SNR is shown in Fig. 3.20. When SQNR is large, it means that quantization error is small. From the figure, the final bit-width can support SQNR up to 40 dB. Figure 3.20: SQNR v.s. SNR. ## 3.4.2 FFT/IFFT loop function test For verifing SC-FDMA system, we use MATLAB to simulate a loop which cascading FFT and IFFT as shown in Fig. 3.21. Fig. 3.22 is shown the comparison of fixed-point system platform and floating-point system platform. Fig. 3.23 shows the comparison of input and fixed-point system output 4 and floating-point system output 4. Fig. 3.24 shows the comparison of input and fixed-point system output 2 and output 4. Figure 3.21: Block diagram of FFT and IFFT in SC-FDMA system Figure 3.23: Input v.s. fixed-point output4 and floating-point output4 Figure 3.24: Input v.s. fixed-point output2 and fixed-point output4 # Chapter 4 # FPGA implementation and verification In this chapter we state how to build the MATLAB platform of the variable FFT and how to verify the design. The MATLAB platform can be used to verify the function of the RTL platform. Then we use FPGA to verify the output data of the RTL platform. The design flow is illustrate in Fig. 4.1. - RTL verification We use MATLAB to analyse and verify architecture and use verilog to design architecture. - Synthesis and Place and Route We add ICON and ILA of Chipscope and .ucf file which is pin location file in Xilinx synthesis. - Xilinx iMPACT We can use iMPACT to download .bit file or .mcs file in FPGA to verify our design. Figure 4.1: A design flow of the Xilinx design kit Vertex5-XC5VLX110T-FF1136 # 4.1 RTL verification Figure 4.2: Simulation environment for variable FFT We need to build the MATLAB platform and verilog platform. Fig. 4.2 shows the simulation environment for bit-true verification. At first we need to develop a floating-point FFT model in MATLAB environment. Since the FFT function in MATLAB tool is a floating-point function, we need to build the MATLAB quantizable FFT function using the proposed architecture mentioned earlier. When the quantizable FFT function is ready, we can determine the bit-width for individual stages using the procedure mentioned in Sec. 3.2.6. Fig. 4.3 shows the quantized result in all stages. Note that total number of bits in all stage is 16. When the fixed-point FFT model is ready, we use LFSR (Liner feedback shift register) to generate random input data pattern from this model. Figure 4.3: Bit-width in all stages # 4.1.1 LFSR (Liner feedback shift register) Usually, the random input data pattern of MATLAB will be used by Verilog, but we will verify that in FPGA. Thus we use LFSR to implement the random input data both in MATLAB and Verilog. First we confirm the two input data of MATLAB platform and Verilog platform are the same, and then we compare the output data from the verilog platform and the MATLAB platform to verify the result. The LFSR will be inserted in FPGA with FFT. Thus, we can verify the output data in FPGA. Fig. 4.6 shows the used LFSR structure with 16 bits. Figure 4.4: LFSR for 16 bits Since the data rate is R and the operation rate is 4R, we need four LFSR in simulation. which is shown in Fig. 4.5. They generate four input data simultaneously. So there are four input data waiting for the FFT processor. By doing this, the timing diagram of the input data is shown in Fig. 4.6. Figure 4.5: LFSR and FFT processor Figure 4.6: Timing plan for LFSR # 4.2 Synthesis place and route #### • Overview of ISE software [9] When RTL code is ready, we will use ISE software to synthesis. The ISE software controls all aspects of the design flow. Through the Project Navigator interface, we can access all of the design entry and design implementation tools. You can also access the files and documents associated with the project. We will conduct the operation steps as follows. #### • Starting the ISE software To start the ISE software, double-click the ISE Project Navigator icon on your desktop. #### • Creating a new project To create a new project using the New Project Wizard, do the following From Project Navigator, select File $\rightarrow$ New Project. The New Project Wizard appears. In the Location field, browse to the directory in which we installed the project, and write the file name. Click Next and the New Project Wizard page appears. Select the FPGA types in the New Project WizardXDevice Properties page , and click Next. Our FPGA is Vertex5-XC5VLX110T-FF1136 #### • Adding source files HDL files have to be added to the project before they can be synthesized. Select Project $\rightarrow$ Add Source, and Select the following files (.vhd files for VHDL design entry or .v files for Verilog design entry) from the project directory, and click Open. #### • Overview of CORE Generator software [9] The CORE Generator software is a graphical interactive design tool that enables we to create high-level modules such as memory elements, math functions and communications, and I/O interface cores. We can use CORE generator to generate ICON and ILA of Chipscope, and then add source of them. #### • Chipscope LA (logic analyzer) is an important tool of FPGA, but traditional LA has two disadvantages, which are expensive and need more pads. Thus the Xilinx provide a tool called "ChipScope Pro". ChipScope Pro is cheap and can be opearated easily. #### • Setting macro search path If we use CORE generator to generate memory, mutiplier or else module, we have to set macro search path. In the Processes Pane $\rightarrow$ Implement Design, right Click $\rightarrow$ Process Properties. Set macro search path that choose the path in which our CORE generator file and then click OK. # • Create constraint file (.ucf file) Select Project $\to$ New Source $\to$ Implementation Constrains File, and then set file name and location. The constrain file is in current hierarchy. #### • Edit constraint file Setting clock period and location. (Refer to Xilinx tutorial ug347) # Setting DIP switch location. (Refer to Xilinx tutorial ug347) | Table 1-5: DIP Switch Connections (SW8 | | | | |----------------------------------------|----------|--|--| | SW4 | FPGA Pin | | | | GPIO_DIP_SW1 | U25 | | | | GPIO_DIP_SW2 | AG27 | | | | GPIO_DIP_SW3 | AF25 | | | | GPIO_DIP_SW4 | AF26 | | | | GPIO_DIP_SW5 | AE27 | | | | GPIO_DIP_SW6 | AE26 | | | | GPIO_DIP_SW7 | AC25 | | | | GPIO_DIP_SW8 | AC24 | | | | | | | | Setting LED location. (Refer to Xilinx tutorial ug347) Setting press location for reprogramming FPGA. (Refer to Xilinx tutorial ug347) • Synthesis and implement design In the Processes Pane $\rightarrow$ Implement Design, right-click $\rightarrow$ Run ## 4.3 Xilinx iMPACT [9] IMPACT is a file generation and device programming tool. iMPACT enables you to program through several parallel cables, including the Platform Cable USB. iMPACT can create bitstream files, System ACE solution files, PROM files, and SVF/XSVF files. #### • Platform cable USB-II The Platform Cable connects to the USB port of your computer and can be used to facilitate Boundary-Scan functionality. For more information, see the Platform Cable USB-II Data Sheet available from the Xilinx website. #### • Generating the configuration files BIT file: A binary file that contains proprietary header information as well as configuration data.( We choose BIT file to do our verifycation. ) MCS file: An ASCII file that contains PROM configuration information. We choose BIT file in our simulation. 1896 • Generate .bit file In the Processes Pane $\to$ Implement Design $\to$ Generate Programming File, right-click $\to$ Run We can find .bit file in our created project. #### • Download .bit file In the Processes Pane $\to$ Manage Configuration Project, right-click $\to$ Run (Open iMPACT software ) In the ISE iMPACT GUI, Select File $\rightarrow$ New Project. Select Configure devices using Boundary-Scan (JTAG), click OK. Select Device: xc5vlx110t, right-click $\rightarrow$ Assign New Configuration File. Click No. Select Device: xc5vlx110t, right-click $\rightarrow$ Program. The GUI shows Program Succeeded. # 4.4 Verification using ChipScope Pro # • Running ChipScope Pro In the Processes Pane - Analyze Design Using ChipScope, right-click $\to$ Run ( Open ChipScope ) We can see Project, Signals, Trigger Setup, Waveform view. Click play button, you can run ChipScpoe Pro. After editing our signal, we can get internal FPGA signal. # Chapter 5 # **Conclusions** In this thesis, we proposed the variable FFT for uplink of LTE application. In chapter 2 we discuss the several FFT architectures which can be applied with various FFT size and present subcarrier mapping. In chapter 3 we use ripple-like ON/OFF stage control and early access of reordering to implement our timing plan design. In chapter 4 we discuss how to build the system platform in MAT-LAB and verilog and FPGA implement flow. Finally, we use FPGA to verify our design. 1896 # **Bibliography** - [1] Shousheng He and Mats Torkelson, "Designing Pipeline FFT Processor for OFDM (de) Modulation," URSI International Symposium on Signals, Systems and Electronics, pp. 257-262, 1998. - [2] Y. T. Lin, P. Y. Tsai, and T.D.Chiueh, "Low-power Variable-length Fast Fourier Transform Processor," IEEE Proc.-Computer. Digit. Tech., Vol. 152, No. 4, pp 499-506, July 2005. - [3] ETSI TS 101 270-2 (v1.1.1): "Transmssiion and multiplexing (TM); access transmission systems on metallic access cables; very high speed digital subscriber line (VDSL); Part 2: Transceiver specification," Feb. 2001. - [4] ETSI EN 300 401 (v1.3.2): "Radio broadcasting systems; digital audio broadcasting (DAB) to mobile, portable and fixed receivers," Sep. 2000. - [5] ETSI EN 300 744 (v1.2.1): "Digital video broadcasting (DVB); framing structure, channel coding and modulation for digital terrestrial television," Jul. 1999. - [6] Y. W. Lin and C. Y. Lee, "Design of an FFT/IFFT Processor for MIMO-OFDM Systems," IEEE Trans. Circuits Syst. I, Reg. Papers, vol.54, no.4, pp.807-815, Apr. 2007. - [7] IXIA, "SC-FDMASingle Carrier FDMA in LTE," Technical report, can be accessed $http: //www.ixiacom.com/pdfs/library/white_papers/SC FDMA INDD.pdf$ - [8] Hyung G. Myung, Junsung Lim, and David J. Goodman, "Single Carrier FDMA for Uplink Wireless Transmission," IEEE Vehicular Technology Magazine, September 2006. - [9] ISE In-Depth Tutorial UG695 (v13.1)