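# National Chiao Tung University

# Department of Electronics Engineering and Institute of Electronics, Master's Thesis

# Design of Equalizer for Multi-Gbps Transmission Indoor Wireless SC/OFDM Receiver

Student: Fu-Chun Yeh (葉福鈞)

Advisor: Prof. Shyh-Jye Jou (周世傑)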

# Design of Equalizer for Multi-Gbps Transmission Indoor Wireless SC/OFDM Receiver

Student: Fu-Chun Yeh

Advisor: Prof. Shyh-Jye Jou



Submitted to Department of Electronics Engineering & Institute of Electronics

College of Electrical and Computer Engineering

National Chiao Tung University

in Partial Fulfillment of the Requirements

for the Degree of Master of Science

in

Department of Electronics Engineering
July 2011
Hsinchu, Taiwan, Republic of China

July, Year 100 of the Republic of China (2011)

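#### Design of Equalizer for Multi-Gbps Transmission Indoor Wireless SC/OFDM Receiver

Student: Fu-Chun Yeh Advisor: Prof. Shyh-Jye Jou

National Chiao Tung University

Department of Electronics Engineering and Institute of Electronics, Master's Program

#### Abstract

For the dual mode (single carrier and orthogonal frequency-division multiplexing) of the IEEE 802.15.3c standard, this thesis proposes an adaptive least-mean-square (LMS) frequency-domain equalizer with least-squares (LS) frequency-domain channel estimation (LS-LMS FDE), and a multi-path interference cancellation (MPIC) time-domain equalizer with Golay-sequence time-domain channel estimation (Golay-MPIC TDE). Both methods can share hardware between the two modes to reduce hardware complexity. The LS-LMS FDE uses the LMS adaptive algorithm together with LS channel estimation to speed up convergence while keeping the computational complexity low. Simulation results show that at an SNR of 12 dB the uncoded bit error rate reaches 6.01\*10<sup>-1</sup> in single-carrier mode and 9.68\*10<sup>-6</sup> in OFDM mode. Excluding the fast Fourier transform module, the total equivalent gate count is 415K gates, of which 69% is hardware shared by both modes. At an operating frequency of 400 MHz, the power consumption excluding the fast Fourier transform module is only 81.27 mW. The Golay-MPIC TDE uses the MPIC algorithm to reduce hardware complexity and Golay-sequence channel estimation to eliminate noise interference. At an SNR of 12 dB, the uncoded bit error rate reaches 2.53\*10<sup>-1</sup> in single-carrier mode and 4.22\*10<sup>-5</sup> in OFDM mode. The total equivalent gate count is 405K gates, of which 99% is hardware shared by both modes. At an operating frequency of 400 MHz, the power consumption is only 88 mW.

The proposed frequency-domain and time-domain equalizers are each integrated into an indoor wireless baseband receiver. For high speed and area efficiency, the hardware is synthesized in a 65 nm, 1 V, 1P9M CMOS process. The core of the LS-LMS FDE chip occupies 7.81 mm² with 65.91% utilization. It operates at 333 MHz, reaches data rates of 3.52 Gbps in single-carrier mode and 5.28 Gbps in OFDM mode, and consumes 793.98 mW. The memory shared between the symbol boundary synchronization estimator and the LS-LMS FDE accounts for 32.68%. The core of the Golay-MPIC TDE chip occupies 7.95 mm² with 88.93% utilization. It operates at 336.7 MHz, reaches data rates of 3.52 Gbps in single-carrier mode and 5.28 Gbps in OFDM mode, and consumes 1.12 W. The memory shared among the symbol boundary synchronization estimator, the phase noise canceller, and the Golay-MPIC TDE accounts for 37%.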



**Design of Equalizer for Multi-Gbps Transmission Indoor Wireless SC/OFDM Receiver**

Student: Fu-Chun Yeh Advisor: Prof. Shyh-Jye Jou

Department of Electronics Engineering

Institute of Electronics

National Chiao Tung University

**Abstract** 

This thesis proposes an adaptive LS-LMS FDE and a LOS Golay-MPIC TDE that satisfy the dual-mode (SC and HSI) specifications of IEEE 802.15.3c. In both designs the hardware is shared between SC and HSI modes to reduce hardware complexity. The LS-LMS FDE combines the LMS adaptive algorithm with LS channel estimation: the LMS algorithm keeps the computational complexity low, and the LS channel estimate gives it sufficient convergence speed. Simulation results show that the LS-LMS FDE achieves an uncoded BER of 6.01\*10<sup>-4</sup> in SC mode and 9.68\*10<sup>-3</sup> in HSI mode at an SNR of 12 dB. The total area, excluding the 2 FFT blocks, is about 415K gates, of which 69% is shared between SC and HSI modes. The power consumption excluding the FFT is only 81.27 mW at 400 MHz. The Golay-MPIC TDE, on the other hand, uses multi-path interference cancellation (MPIC) equalization with Golay-sequence-aided channel estimation. Compared with a traditional time-domain equalizer, the MPIC algorithm reduces hardware complexity, and the Golay-sequence-aided channel estimation eliminates the AWGN. The Golay-MPIC TDE achieves an uncoded BER of 2.53\*10<sup>-4</sup> in SC mode and 4.22\*10<sup>-5</sup> in HSI mode at an SNR of 12 dB. The total area is about 405K gates, of which 99% is shared between SC and HSI modes. The power consumption is only 88 mW at 400 MHz.

The two proposed architectures, one in each domain, are integrated in two indoor wireless communication baseband receiver systems. For high speed and area efficiency, both systems are implemented in a 65 nm 1P9M CMOS GP process with a 1.0 V supply. The LS-LMS FDE chip occupies 7.81 mm<sup>2</sup> of core area with 65.91% utilization and runs at a clock rate of 333 MHz. It achieves data rates of 3.52 Gbps in SC mode and 5.28 Gbps in HSI mode, with a power consumption of 793.98 mW. The memory shared by the BD and FDE blocks accounts for 32.68% of the baseband system. The Golay-MPIC TDE chip has a core area of 7.95 mm<sup>2</sup> with 88.93% utilization and runs at a clock rate of 336.7 MHz. It achieves data rates of 3.52 Gbps in SC mode and 5.28 Gbps in HSI mode, with a power consumption of 1.12 W. The BD, TDE, and PNC blocks use the same shared memory, which accounts for 37% of the baseband system.


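#### Acknowledgements

Over my two years as a master's student, I am most grateful to Professor 周世傑 (Shyh-Jye Jou). Whether in guidance on research and coursework or in care for our daily life and health, he helped us selflessly, and that is why I was able to complete this research. I would next like to thank the oral defense committee members, Professor 陳紹基, Professor 李鎮宜, and Professor 劉志尉, whose suggestions and guidance during the defense made my research more complete. I also thank my labmates: first the senior students 庭楨, 瑋昌, 盈志, 紹維, 代暘, 為凱, 雅雪, and 祥譽, and also 亦瑋, 以樂, and 佳怡, who gave me many suggestions on my research and brought a lot of fun into daily life. I thank my senior 紳睿 as well, who helped complete the fast Fourier transform module so that the chip could be finished. My thanks also go to all the teachers and friends who have helped me. Finally, I thank my family for their support, which let me complete my studies with peace of mind.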

# **Contents**

- Chapter 1 Introduction
  - 1.1 Indoor Wireless Communication Standards for 60 GHz with Multi-Gbps Transmission
  - 1.2 Motivation
  - 1.3 Thesis Organization
- Chapter 2 Overview of Multi-Gbps Transmission Indoor Wireless Communication Standards
  - 2.1 Comparison of IEEE 802.15.3c and IEEE 802.11ad
  - 2.2 IEEE 802.15.3c Specifications
    - 2.2.1 Basic Specifications
    - 2.2.2 Equalization Related Specifications
    - 2.2.3 Channel Model
- Chapter 3 SC/OFDM Dual-Mode Frequency and Time Domain Equalizer
  - 3.1 Review of Frequency Domain Equalization (FDE) [11]
    - 3.1.1 Channel Estimation
    - 3.1.2 Adaptive Equalization
  - 3.2 Review of Time Domain Equalizer (TDE)
    - 3.2.1 Multi-path Interference Cancellation
    - 3.2.2 Golay-Sequence Aided Channel Estimation [15]
  - 3.3 Proposed Architecture for IEEE 802.15.3c
    - 3.3.1 Proposed Adaptive LS-LMS FDE [11]
    - 3.3.2 Proposed LOS Golay-MPIC TDE
- Chapter 4 Architecture Design and Performance Analysis
  - 4.1 Design Specifications and Architecture
    - 4.1.1 Proposed Adaptive LS-LMS FDE and Baseband Receiver [11]
    - 4.1.2 Proposed LOS Golay-MPIC TDE and Baseband Receiver
  - 4.2 Sub-block Architecture Design
    - 4.2.1 Optimised Golay Correlator (OGC)
    - 4.2.2 Divider Free LS Method [11]
    - 4.2.3 FFT/IFFT Design Specifications
  - 4.3 Synthesis and Simulation Results
    - 4.3.1 Proposed Adaptive LS-LMS FDE
    - 4.3.2 Proposed LOS Golay-MPIC TDE
  - 4.4 Comparison of Adaptive LS-LMS FDE and MPIC Golay-MPIC TDE
- Chapter 5 Baseband Design and Chip Implementation
  - 5.1 Adaptive LS-LMS FDE in Baseband Receiver [34]
    - 5.1.1 Chip Integration and Implementation Result
    - 5.1.2 Measurement Consideration
  - 5.2 LOS MPIC Golay-MPIC TDE in Baseband Receiver
    - 5.2.1 Chip Integration and Implementation Result
    - 5.2.2 Measurement Consideration
- Chapter 6 Conclusion and Future Work
  - 6.1 Architecture Design Summary
  - 6.2 Chip Implementation Summary
  - 6.3 Future Work
- Reference

# List of Tables

- Table 2-1 Comparison of 802.15.3c and 802.11ad
- Table 2-2 SC mode specifications
- Table 2-3 HSI mode specifications
- Table 2-4 Golay sequences
- Table 4-1 System parameters
- Table 4-2 FDE Hardware comparison between SC and HSI mode
- Table 4-3 TDE Hardware comparison between SC and HSI mode
- Table 4-4 Number of calculation for each correlator
- Table 4-5 Specifications of FFT/IFFT in the proposed FDE
- Table 4-6 LS-LMS FDE for SC and HSI mode synthesis result
- Table 4-7 Golay-MPIC TDE for SC and HSI mode synthesis result
- Table 4-8 Synthesis result of LS-LMS FDE and Golay-MPIC TDE
- Table 4-9 Computation complexity of LS-LMS FDE and Golay-MPIC TDE
- Table 4-10 Comparison of LS-LMS FDE and Golay-MPIC TDE
- Table 5-1 Area and power comparison of 32×32 dual port memory and register
- Table 5-2 1<sup>st</sup> chip summary (using LS-LMS FDE)
- Table 5-3 Power measurement of 1<sup>st</sup> chip
- Table 5-4 Comparison of old and new TSMC 65nm GP process libraries
- Table 5-5 2<sup>nd</sup> Chip summary (using Golay-MPIC TDE)

# List of Figures

- Fig. 1-1 Unlicensed band at 60 GHz in different countries
- Fig. 2-1 RF band plan
- Fig. 2-2 CMS frame format
- Fig. 2-3 SC PHY frame format
- Fig. 2-4 SC PHY preamble structure
- Fig. 2-5 SC PHY payload structure
- Fig. 2-6 SC data format
- Fig. 2-7 HSI PHY payload structure
- Fig. 2-8 SC channel impulse response
- Fig. 2-9 HSI channel impulse response
- Fig. 2-10 SC channel frequency response
- Fig. 2-11 HSI channel frequency response
- Fig. 3-1 Structure of fully parallel FDE
- Fig. 3-2 Illustration of adaptive FDE
- Fig. 3-3 FIR filter structure
- Fig. 3-4 Multi-path interference
- Fig. 3-5 Golay Sequences of CES
- Fig. 3-6 Block diagram of the proposed adaptive LS-LMS FDE
- Fig. 3-7 Learning curves
- Fig. 3-8 Block diagram of the proposed LOS Golay-MPIC TDE
- Fig. 3-9 BER of TDE SC mode for 2 channel paths
- Fig. 3-10 BER of TDE HSI mode for 2 channel paths
- Fig. 3-11 TDE SC mode BER for 3 channel paths & 2<sup>nd</sup> Path gain=0.35 and delay=8
- Fig. 3-12 TDE SC mode BER for 3 channel paths & 2<sup>nd</sup> Path gain=0.35 and delay=40
- Fig. 4-1 Proposed block diagram of baseband receiver design
- Fig. 4-2 Block diagram of the proposed adaptive LS-LMS FDE
- Fig. 4-3 Hardware reduction of the proposed LS-LMS FDE
- Fig. 4-4 Proposed block diagram of baseband receiver design
- Fig. 4-5 Block diagram of the proposed Golay-MPIC TDE
- Fig. 4-6 Hardware reduction of the proposed Golay-MPIC TDE
- Fig. 4-7 Efficient Golay Correlator (EGC)
- Fig. 4-8 EGC procedure
- Fig. 4-9 Optimised Golay Correlator (OGC)
- Fig. 4-10 Table of inversed scalar
- Fig. 4-11 Reduced mapping
- Fig. 4-12 Structure of the scalar
- Fig. 4-13 Reduced table
- Fig. 4-14 Block diagram of modified divider
- Fig. 4-15 Area Percentage of each part in LS-LMS FDE
- Fig. 4-16 BER of FDE in SC and HSI mode
- Fig. 4-17 Area Percentage of each part in Golay-MPIC TDE
- Fig. 4-18 BER of TDE in SC and HSI mode
- Fig. 5-1 Proposed block diagram of baseband receiver design
- Fig. 5-2 Area proportion of each block circuit excluding FFT
- Fig. 5-3 Size of 32×32 dual port memory with power rings
- Fig. 5-4 1<sup>st</sup> Chip layout view of the proposed baseband receiver
- Fig. 5-5 Testing diagram for measurement
- Fig. 5-6 Die photo of 1<sup>st</sup> Chip baseband receiver
- Fig. 5-7 Functional test of 1<sup>st</sup> chip
- Fig. 5-8 Agilent 93000 SOC test system
- Fig. 5-9 CQFP 160 pins socket
- Fig. 5-10 Wire connection of CQFP 160 pins socket
- Fig. 5-11 Proposed block diagram of baseband receiver design
- Fig. 5-12 Area proportion of each block circuit
- Fig. 5-13 2<sup>nd</sup> Chip layout view of the proposed baseband receiver
- Fig. 5-14 Testing diagram for measurement



# Chapter 1

## Introduction

This chapter introduces indoor wireless communication at 60 GHz with multi-Gbps transmission in Section 1.1. Section 1.2 gives the motivation for the dual-mode equalizer in this work, and Section 1.3 outlines the thesis organization.

#### 1.1 Indoor Wireless Communication Standards for 60 GHz with Multi-Gbps Transmission

Wireless communication technology has been developed for many years, and improvements in CMOS processes have recently made high-speed wireless transmission a promising technology. For a newly developed wireless communication system, the choice of operating frequency band is an important issue. The 60 GHz RF band is unlicensed in many countries, as shown in Fig. 1-1 [1], so a wireless communication system in this band can be developed without a license. With about 9 GHz of bandwidth, the data rate can be very high. Moreover, the short transmission range helps protect security and allows a very high frequency reuse rate. Based on these benefits, a wireless communication system using the 60 GHz RF band is suitable for indoor, multi-Gbps data transmission.

Three main features of the 60 GHz band are [2] [3]:

- First, the available bandwidth is wide (57.0–66.0 GHz) and can provide multi-Gbps transmission.
- Second, the 60 GHz frequency band is license-free in most countries.
- Finally, reflections of the signal are attenuated quickly, so the transmitter needs to aim at the receiver. Therefore, beamforming is required.



Fig. 1-1 Unlicensed band at 60 GHz in different countries

Two wireless communication standards use the 60 GHz RF band: IEEE 802.15.3c [4] and IEEE 802.11ad [5]. IEEE 802.15.3c was announced in 2003, and its latest version was released in 2009. IEEE 802.11ad is the 60 GHz version of the 802.11 series and seeks compatibility with 802.15.3c in the PHY. Both standards have an orthogonal frequency-division multiplexing (OFDM) mode and a single-carrier (SC) mode, and both focus on indoor wireless transmission at over-Gbps data rates. In the two standards, the data rates of the SC and OFDM modes can exceed 4.6 Gbps and 6.7 Gbps, respectively. A detailed comparison of the two standards is given in Section 2.1.

#### 1.2 Motivation

OFDM has been developed for many years because of its inter-symbol interference (ISI)-free property. With a cyclic prefix (CP), OFDM turns a group of time-domain samples with ISI into ISI-free sub-channels, each of which can be equalized easily by a single-tap equalizer. The orthogonal sub-carriers of OFDM provide high spectral efficiency and can meet high data rate requirements. Although OFDM is able to eliminate ISI, it has the drawback of a high peak-to-average power ratio (PAPR). An OFDM signal is composed of N sinusoidal waves, where N is the number of sub-channels. As N increases, the PAPR gets higher, and the system requires a power amplifier with a large linear region in the RF front end. Furthermore, OFDM suffers from inter-carrier interference (ICI) caused by carrier frequency offset (CFO) or the Doppler effect, which ruins the orthogonality between sub-channels. SC with time-domain equalization, on the other hand, is less affected by PAPR and ICI than OFDM. However, ISI degrades its performance, and the computational complexity of the time-domain equalizer (TDE) becomes very high as the RMS delay spread of the channel increases.
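The single-tap equalization property described above can be illustrated with a toy example. The sketch below is not taken from the thesis: the sizes `N = 8` and `CP = 2`, the 3-tap channel `h`, and the helper names `dft` and `channel` are illustrative assumptions, and noise is omitted. It shows that when the cyclic prefix is at least as long as the channel memory, dividing each sub-carrier by the channel frequency response recovers the transmitted symbols.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) DFT; adequate for a small illustrative N."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
                  for m in range(n))
        out.append(acc / n if inverse else acc)
    return out

def channel(samples, h):
    """Noise-free linear convolution with taps h, truncated to the input length."""
    return [sum(h[j] * samples[i - j] for j in range(len(h)) if i - j >= 0)
            for i in range(len(samples))]

N, CP = 8, 2              # toy sizes (assumption); Table 2-3 lists 512 and 64
h = [1.0, 0.5, 0.2]       # hypothetical channel, memory 2 <= CP
symbols = [1+1j, -1+1j, 1-1j, -1-1j, 1+1j, 1-1j, -1+1j, -1-1j]  # QPSK per sub-carrier

tx_time = dft(symbols, inverse=True)     # OFDM modulation (IDFT)
tx = tx_time[-CP:] + tx_time             # prepend the cyclic prefix
rx = channel(tx, h)[CP:]                 # receiver discards the prefix
H = dft(h + [0.0] * (N - len(h)))        # channel frequency response
equalized = [y / hk for y, hk in zip(dft(rx), H)]  # one complex division per sub-carrier

max_error = max(abs(e - s) for e, s in zip(equalized, symbols))
print(f"largest equalization error: {max_error:.2e}")
```

Because the prefix makes the linear channel convolution look circular over the `N` retained samples, each sub-carrier sees only a single complex gain `H[k]`, which is why one tap per sub-carrier is enough in this noise-free setting.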

A baseband system has three main blocks. The first is a synchronization block, which includes symbol/preamble detection and CFO and SCO estimation. The second is channel estimation and equalization. The third is the channel decoder, which corrects bit errors. This thesis focuses on the design of channel estimation and equalization.

It is preferable to implement the SC and OFDM dual-mode equalization with a single method, so that hardware cost is minimized and hardware sharing is maximized. As mentioned above, SC and OFDM each have advantages and disadvantages, so this thesis proposes two kinds of equalization, one in the time domain and one in the frequency domain. The choice between them depends on the channel condition and the computational complexity overhead. In the IEEE 802.15.3c and IEEE 802.11ad standards, the high sampling rate makes it a challenge to meet the timing requirement while maintaining low hardware complexity. A parallel architecture is proposed to resolve this dilemma between high sampling rate and hardware complexity. Power consumption is also a critical problem when operating at such a high data rate.

#### 1.3 Thesis Organization

The organization of this thesis is as follows. Chapter 2 compares IEEE 802.15.3c with IEEE 802.11ad and then gives an overview of the IEEE 802.15.3c standard. Chapter 3 first reviews the traditional frequency-domain equalizer (FDE) and time-domain equalizer (TDE), and then describes the proposed algorithms for the FDE and TDE. Chapter 4 addresses the hardware architecture design and RTL simulation results, and discusses the advantages and disadvantages of the proposed FDE and TDE. Chapter 5 presents the baseband receiver systems and chip implementations of the proposed FDE and TDE. Finally, Chapter 6 gives the conclusion and future work.

# Chapter 2

## Overview of Multi-Gbps Transmission Indoor Wireless Communication Standards

This chapter introduces the standards for indoor wireless communication. Section 2.1 compares IEEE 802.15.3c with IEEE 802.11ad. Section 2.2 describes the detailed specifications of IEEE 802.15.3c, with special emphasis on the equalization-related parts and the channel model.

#### 2.1 Comparison of IEEE 802.15.3c and IEEE 802.11ad

Both standards have an OFDM mode and a single-carrier (SC) mode, and their system specifications are nearly the same. The two standards are compared in Table 2-1 [3]. IEEE 802.11ad adds a new mode, named low power SC PHY, which targets a low data rate with low power consumption. The low power SC mode uses a Reed-Solomon (RS) code instead of a low-density parity-check (LDPC) code for channel coding to reduce computational complexity and power consumption. In addition, the payload of this new mode differs from that of the other modes. The data block length is still 448, as in the SC mode; however, it is divided into 7 sub-blocks, each composed of 56 data chips and 8 known data chips for the GI. If a single-carrier frequency-domain equalizer (SC-FDE) [6] is adopted in the receiver, a smaller sub-block length reduces the power consumption, because the frequency-domain equalizer needs fewer taps to equalize the received signal.

The following sections focus on the system specifications of IEEE 802.15.3c, since the final version of IEEE 802.11ad has not yet been released.

Table 2-1 Comparison of 802.15.3c and 802.11ad

|                         | 802.15.3c                                                                                                        | 802.11ad                       |
|-------------------------|------------------------------------------------------------------------------------------------------------------|--------------------------------|
| Frequency Band          | 57-66 GHz                                                                                                        |                                |
| Sample (Chip) Rate      | 2640 MHz (OFDM)<br>1760 MHz (SC)                                                                                 |                                |
| Modes                   | Common Mode Signaling (CMS) (SC)                                                                                 | Control PHY (SC)               |
|                         | SC                                                                                                               | SC PHY                         |
|                         | High Speed Interface (HSI) (OFDM)                                                                                | OFDM PHY                       |
|                         | Audio/Visual (AV) (OFDM)                                                                                         | No                             |
|                         | No                                                                                                               | Low power SC PHY               |
| Channel Code            | LDPC                                                                                                             | LDPC,<br>RS (for low power SC) |
| Preamble Structure      | Almost the same                                                                                                  |                                |
| Payload Structure       | OFDM: 512 + 64 (GI), 336 useful subcarriers<br>SC: 448 + 64 (known GI)<br>low power SC: [56 + 8 (known GI)]*7+64 |                                |
| Number of Pilots (OFDM) | Both are 16, but with different locations                                                                        |                                |

#### 2.2 IEEE 802.15.3c Specifications

The IEEE 802.15.3c standard, which is based on SC and OFDM transmission, is developed for Wireless Personal Area Networks (WPAN) to provide short-range (<10 m), very high speed (>2 Gbps) multimedia data services to personal computers and consumer appliances located in rooms, offices, and so on [7]. At the beginning of a transmission, IEEE 802.15.3c uses Common Mode Signaling (CMS), and a preamble is attached in front of the data stream. The CMS is specified to enable interoperability among different PHY modes. After the CMS, the frame payload is transmitted in one of the PHY modes. The preamble aids receiver algorithms related to AGC setting, antenna diversity selection, timing acquisition, frequency offset estimation, frame synchronization, and channel estimation. Inserting a CP reduces the impact of ISI, and channel coding lets the system correct transmission errors. Furthermore, the standard defines interleaving, scrambling, unequal channel coding, and several modulation schemes to achieve better performance.

#### 2.2.1 Basic Specifications

IEEE 802.15.3c has three transmission modes: Single Carrier (SC) mode, High Speed Interface (HSI) mode, and Audio/Visual (AV) mode. The HSI and AV modes use OFDM transmission, and the SC mode uses single-carrier transmission. The detailed specifications of the SC and HSI modes are listed in Table 2-2 and Table 2-3, respectively.

The RF band occupies 9 GHz of bandwidth, and the sampling rate is 1760 MHz, as shown in Fig. 2-1. The standard divides the RF band into four sub-bands such that the Nyquist bandwidth of each sub-band is exactly 1760 MHz. In addition, the sub-bands are separated by 432 MHz of spacing to prevent interference between them. In this way, the RF band can support 4 transmission bands without mutual interference.



Fig. 2-1 RF band plan

Table 2-2 SC mode specifications

| Description                                 | Value                                                                           |
|---------------------------------------------|---------------------------------------------------------------------------------|
| Chip rate (MHz)                             | 1760                                                                            |
| Chip duration (ns)                          | ~0.568                                                                          |
| Subblock length (samples)                   | 512                                                                             |
| Pilot length (samples)                      | 0 0 8 64                                                                        |
| Length of data chips per subblock (samples) | 56 448                                                                          |
| Modulation schemes                          | π/2 BPSK, π/2 QPSK,<br>π/2 8-PSK, π/2 16-QAM                                    |
| FEC types                                   | RS(255,239), LDPC(672,336),<br>LDPC(672,504), LDPC(672,588),<br>LDPC(1440,1344) |
| Transmit center frequency tolerance (ppm)   | ±25                                                                             |
| Required frame error rate (FER)\*           | 8 %                                                                             |

\* FER is determined at the PHY Service Access Point interface after any applied error correction methods. The measurement shall be performed in an AWGN channel with a frame payload length of 2048 octets.

Table 2-3 HSI mode specifications

| Description                               | Value                                                         |
|-------------------------------------------|---------------------------------------------------------------|
| Sampling rate (MHz)                       | 2640                                                          |
| Sampling period (ns)                      | ~0.38                                                         |
| Number of subcarriers/FFT size            | 512                                                           |
| Data/pilot/guard subcarriers              | 336/16/141                                                    |
| Guard interval length (samples)           | 64                                                            |
| Subcarrier frequency spacing (MHz)        | 5.156                                                         |
| Modulation schemes                        | QPSK, 16-QAM, 64-QAM                                          |
| FEC types                                 | LDPC(672,336), LDPC(672,504),<br>LDPC(672,420), LDPC(672,588) |
| Transmit center frequency tolerance (ppm) | ±20                                                           |
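The timing entries in Table 2-2 and Table 2-3 follow directly from the chip and sampling rates. The short sketch below reproduces them as a consistency check. The constant names are illustrative, and the OFDM symbol duration is a derived quantity that assumes one symbol spans the FFT size plus the guard interval.

```python
# Rates and sizes as listed in Table 2-2 and Table 2-3
SC_CHIP_RATE_HZ = 1760e6
HSI_SAMPLE_RATE_HZ = 2640e6
FFT_SIZE = 512
GUARD_INTERVAL = 64

# Chip duration and sampling period are the reciprocals of the rates
chip_duration_ns = 1e9 / SC_CHIP_RATE_HZ
sampling_period_ns = 1e9 / HSI_SAMPLE_RATE_HZ

# Sub-carrier spacing is the sampling rate divided by the FFT size
subcarrier_spacing_mhz = HSI_SAMPLE_RATE_HZ / FFT_SIZE / 1e6

# Derived under the assumption stated above: symbol = FFT size + guard interval
ofdm_symbol_ns = (FFT_SIZE + GUARD_INTERVAL) * sampling_period_ns

print(f"SC chip duration      : {chip_duration_ns:.3f} ns")
print(f"HSI sampling period   : {sampling_period_ns:.3f} ns")
print(f"Sub-carrier spacing   : {subcarrier_spacing_mhz:.3f} MHz")
print(f"OFDM symbol duration  : {ofdm_symbol_ns:.2f} ns")
```

The first three results agree with the rounded table entries (~0.568 ns, ~0.38 ns, and 5.156 MHz).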

#### 2.2.2 Equalization Related Specifications

In this standard, some known data streams are assigned to specific purposes, such as the MAC layer control signal, the piconet coordinate signal, or the performance improving signal. This section introduces the signaling that is directly related to equalization.

#### • Common Mode Signaling (CMS)

The CMS is a low data rate SC mode specified to enable switching among different PHY modes. It is also used to transmit the beacon frame, sync frame, command frame, and training frame in the beamforming procedure. The order of the sequences in time is SYNC, SFD, and CES.



Fig. 2-2 CMS frame format

As shown in Fig. 2-2, the CMS preamble is constructed from the Golay complementary sequences $a_{128}$ and $b_{128}$ of length 128, which are listed in Table 2-4. The SYNC field repeats the codes for more robust frame detection. The main purpose of the SFD is to establish the frame timing as well as the header rate. The CES field is used for channel estimation. The sequences $a_{256}$ and $b_{256}$ inside the CES stream are also Golay complementary and can be decomposed as

$$
\begin{aligned}
a_{256} &= [b_{128}\,a_{128}] \\
b_{256} &= [\overline{b}_{128}\,a_{128}]
\end{aligned}
\tag{2.1}
$$

where $b_{128}$ and $\overline{b}_{128}$ are transmitted first in time, and the binary complement of a sequence $x$ is denoted as $\overline{x}$.

Table 2-4 Golay sequences

| Sequence name    | Sequence value                   |
|------------------|----------------------------------|
| $a_{128}$        | 0536635005C963AFFAC99CAF05C963AF |
| $b_{128}$        | 0A396C5F0AC66CA0F5C693A00AC66CA0 |
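The complementary property behind (2.1) can be checked numerically. The sketch below does not use the Table 2-4 values. It assumes a stand-in Golay pair built by the standard doubling rule, and it assumes bits are mapped to ±1 so that the binary complement becomes a sign flip. The function names are illustrative. It verifies that concatenating a complementary pair as in (2.1) yields another complementary pair of twice the length.

```python
def autocorr(seq, lag):
    """Aperiodic autocorrelation of a +/-1 sequence at a non-negative lag."""
    return sum(seq[i] * seq[i + lag] for i in range(len(seq) - lag))

def is_complementary(a, b):
    """True if the autocorrelations of a and b cancel at every non-zero lag."""
    return all(autocorr(a, k) + autocorr(b, k) == 0 for k in range(1, len(a)))

def golay_pair(length):
    """Build a Golay pair with the doubling rule (a, b) -> ([a b], [a -b])."""
    a, b = [1], [1]
    while len(a) < length:
        a, b = a + b, a + [-x for x in b]
    return a, b

# Stand-in length-128 pair (assumption: NOT the sequences of Table 2-4)
a128, b128 = golay_pair(128)

# Concatenation as in (2.1); index 0 is the element transmitted first
a256 = b128 + a128
b256 = [-x for x in b128] + a128

peak = autocorr(a256, 0) + autocorr(b256, 0)
print("length-128 pair complementary:", is_complementary(a128, b128))
print("length-256 pair complementary:", is_complementary(a256, b256))
print("zero-lag sum:", peak)
```

The zero-lag sum equals twice the sequence length, while every other lag sums to zero. This impulse-like combined autocorrelation is the property that makes Golay pairs attractive for channel estimation.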

#### • SC and HSI PHY preamble

A PHY preamble is added to aid receiver algorithms such as AGC setting, timing acquisition, frame synchronization, and channel estimation. After the CMS, the system switches to the designated mode and begins to transmit the data payload. At the beginning of each transmission, the transmitter sends a PHY preamble, just like the one in the CMS. In SC mode, the preamble is transmitted at a rate of 1760 MHz. The SC PHY frame format and preamble structure are shown in Fig. 2-3 and Fig. 2-4.



Fig. 2-4 SC PHY preamble structure

Like the CMS preamble, the PHY preamble consists of SYNC, SFD, and CES fields. Each field functions like its counterpart in the CMS preamble: the SYNC field is used for frame detection, the SFD field for validating the beginning of the frame, and the CES field for channel estimation.

The preamble of HSI mode is transmitted at the sampling rate of 2640 MHz. Two types of preamble are defined for this mode: the long preamble and the optional short preamble. The former has the same structure as the CMS, and the latter has the same structure as defined for the SC mode.

#### • Data Payload and Pilot Channel Estimation Sequence(PCES)

In SC mode, the data stream is divided into data blocks, each containing 64 sub-blocks, as shown in Fig. 2-5 and Fig. 2-6. Each data block is followed by a PCES. PCES insertion is an optional feature that allows the system to re-acquire the channel information periodically. The PCES is the same as the CES field in the SC PHY preamble and is shown in (2.2).

$$PCES_{SC} = [a_{256}b_{256}a_{256}b_{256}b_{128}]$$
 (2.2)

Since the PCES contains the information of the CES field, this cyclic-prefixed signal can provide the channel information periodically. The pilot words, $c_0a_L$ and $c_1b_L$, are used for timing tracking, compensation for clock drift, and compensation for frequency offset error. Furthermore, the pilot words act as the cyclic prefix and enable frequency-domain equalization.




Fig. 2-5 SC PHY payload structure



Fig. 2-6 SC data format

In HSI mode, one PCES is inserted every 96 ($N_{PCES}$) OFDM symbols, as shown in Fig. 2-7. In each OFDM symbol, a cyclic prefix of 64 samples is added to prevent ISI. The PCES is identical to the CES field of the PHY preamble prepended by $a_{128}$, as shown in (2.3).

$$PCES_{HSI} = [a_{128}a_{256}b_{256}a_{256}b_{256}b_{128}]$$
 (2.3)



Fig. 2-7 HSI PHY payload structure

#### 2.2.3 Channel Model

Waves in the 60 GHz RF band propagate quite differently from those in RF bands below 10 GHz. Due to strong directivity, the wave reflects, diffracts, and scatters only slightly, and its energy is concentrated within a narrow angle. Since oxygen absorbs waves in this RF band, the transmission distance is very short, less than 10 meters, which leads to a negligible multipath effect. Based on these properties, the IEEE 802.15.3c standard targets indoor wireless transmission at multi-Gbps data rates in the 60 GHz RF band. In general, at such a high data rate the channel is strongly influenced by line-of-sight (LOS)/non-line-of-sight (NLOS) conditions, root-mean-square (RMS) delay spread, and the Doppler effect, while the multipath effect is weak when the system operates in the 60 GHz RF band. These properties are listed below:

#### • High Path Loss

As an EM wave passes through a medium, the medium absorbs energy and limits the distance that the wave can travel. The more energy the wave loses, the shorter the distance it can travel. The rate of energy loss depends mainly on the characteristics of the medium and the EM wavelength. The 60 GHz band coincides with an absorption resonance of the oxygen molecule, so wireless communication in the 60 GHz RF band suffers tremendously high path loss. As a result, the transmission distance is limited to about 10 m at most. Moreover, the effect of multipath fading is reduced, since the non-line-of-sight (NLOS) wave travels a longer distance and loses more energy than the line-of-sight (LOS) wave.

#### • Strong Directivity

Strong directivity means that the EM wave energy is concentrated within a narrow angle. Based on the physical principle of diffraction, the beam width is inversely proportional to the operating frequency [8]. As a result, the antenna can only receive the signal from the transmitter antenna within a small angular range. Consequently, the NLOS path has lower path gain than the LOS path, and the multipath fading effect is small.

The channel model is based on the golden set released by the IEEE 802.15.3c group [9] [10]. The golden channel with an RMS delay spread of 3.2 ns is chosen as the simulation channel model. Fig. 2-8 and Fig. 2-10 show the SC channel impulse response and channel frequency response at a sampling rate of 1.76 GHz, respectively. Fig. 2-9 and Fig. 2-11 show the HSI channel impulse response and channel frequency response at a sampling rate of 2.64 GHz, respectively.



Fig. 2-8 SC channel impulse response



Fig. 2-9 HSI channel impulse response



Fig. 2-10 SC channel frequency response



Fig. 2-11 HSI channel frequency response

# Chapter 3

# SC/OFDM Dual-Mode Frequency and Time Domain Equalizer

This chapter reviews frequency and time domain equalization with channel estimation in Sections 3.1 and 3.2, respectively. Section 3.3 presents the proposed frequency and time domain equalizers.

### 3.1 Review of Frequency Domain Equalization (FDE) [11]

A simple illustration of a fully parallel FDE is shown in Fig. 3-1. The input passes through a serial-to-parallel block and is transformed to the frequency domain by an FFT. The frequency domain data is then multiplied by the coefficients W and transformed back to the time domain by an IFFT. Unlike in a TDE, the number of coefficients in an FDE is fixed regardless of the length of the channel impulse response (CIR). A potential problem arises when the CIR is longer than the cyclic prefix (CP). In that case, the circular convolution property is broken and the FDE fails to equalize the channel effect. However, the channel model shows that the maximum length of the CIR is far less than the length of the CP, so this system does not have such a problem.



Fig. 3-1 Structure of fully parallel FDE

Circular convolution becomes a simple multiplication in the frequency domain, where capital letters denote frequency domain signals:

$$\mathbf{R} = \mathbf{H} \cdot \mathbf{D} \tag{3.1}$$

,where  $\mathbf{H}$  is a diagonal matrix,  $\mathbf{R}$  is a received signal vector, and  $\mathbf{D}$  is transmitted data vector. To recover the transmitted data, we multiply the inverse of  $\mathbf{H}$  on both sides of equation:

$$\mathbf{H}^{-1} \cdot \mathbf{R} = \mathbf{H}^{-1} \cdot \mathbf{H} \cdot \mathbf{D} = \mathbf{D} \tag{3.2}$$

, where the inverse of  $\mathbf{H}$  is also a diagonal matrix. After CP removal, we can fully recover the transmitted signal  $\mathbf{D}$ .
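
The ideal case of (3.1) and (3.2) can be illustrated with a short simulation. This is a minimal sketch, assuming a hypothetical two-path CIR shorter than the CP, a conventional cyclic prefix copied from the block tail, and no noise. None of the numerical values are taken from the golden channel set.

```python
import numpy as np

rng = np.random.default_rng(0)
N, CP = 512, 64                              # block and CP lengths
h = np.array([1.0, 0.0, 0.0, 0.3])           # hypothetical CIR, shorter than CP
d = rng.choice([-1.0, 1.0], N) + 1j * rng.choice([-1.0, 1.0], N)  # QPSK block

tx = np.concatenate([d[-CP:], d])            # prepend the cyclic prefix
rx = np.convolve(tx, h)[: CP + N]            # linear convolution with the channel
r = rx[CP:]                                  # CP removal leaves a circular convolution

H = np.fft.fft(h, N)                         # diagonal entries of H in (3.1)
D_hat = np.fft.fft(r) / H                    # multiply by the inverse of H, as in (3.2)
d_hat = np.fft.ifft(D_hat)                   # back to the time domain
```

Because the CIR fits inside the CP, `d_hat` matches `d` up to floating-point error. With a CIR longer than the CP the same code would no longer recover the block.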

The above equations describe the ideal case, with no AWGN and no time variation of the channel. In reality, white noise always exists due to thermal noise, and the channel varies with time due to many effects, such as relative movement, air flow, or moving objects. Thus, the equation should be:

$$\mathbf{R}_{k} = \mathbf{J}_{k}(t) \cdot \mathbf{H}_{k} \cdot \mathbf{D}_{k} + \mathbf{N}_{k} \tag{3.3}$$

, where $\mathbf{J}_k(t)$ is the time-variant effect matrix, $\mathbf{N}_k$ is an AWGN vector, and k is the index of the subchannels. If we simply multiply by a fixed inverse of $\mathbf{H}_k$ all the time, the time-variant effect will corrupt the data. Furthermore, obtaining an accurate inverse of $\mathbf{H}_k$ is difficult under AWGN. To resolve this, the first step is to suppress the AWGN and estimate the inverse of $\mathbf{H}_k$ as accurately as possible. Then, an adaptive algorithm is used to track the changes in the time-variant channel. In this way, the time-variant component $\mathbf{J}_k(t)$ is no longer an issue in the equalization.

#### 3.1.1 Channel Estimation

At the beginning of the transmission, the transmitter sends the time-domain training sequence $a_{256}b_{256}$ (= $u_{512}$) located in the CES field to assist the equalization, as shown in Fig. 2-4. With the training sequence, we can easily estimate the channel matrix $\mathbf{H}_k$, which is the inverse of the coefficients $\mathbf{W}_k$.

$$\mathbf{H}_{k} = \frac{\mathbf{R}_{k}}{\mathbf{U}_{512,k}} \tag{3.4}$$

,where  $U_{512}$  is the frequency domain constant value of  $u_{512}$ , and k is the sub-carrier index.

This solution is known as the zero-forcing (ZF) method. Its benefit is simple implementation, but it suffers from noise enhancement. With AWGN, Eqn. (3.4) is revised as Eqn. (3.5).

$$\mathbf{W}_{k} = \frac{\mathbf{U}_{512,k}}{\mathbf{H}_{k} \cdot \mathbf{U}_{512,k} + \mathbf{N}_{k}}$$
(3.5)

Noise enhancement occurs when the channel gain $\mathbf{H}_k$ is so small that the noise $\mathbf{N}_k$ dominates the received signal. In that case, especially with large $\mathbf{N}_k$, the estimation result is far from the perfect estimate.

Since there are two copies of $U_{512}$ in the CES, the Least-Square (LS) method is a better choice than ZF. The main idea of LS is to minimize the sum of the squared errors. First, the equalization can be described as:

$$\mathbf{R}_{k} = \frac{\mathbf{U}_{512,k}}{\mathbf{W}_{k}} \tag{3.6}$$

Second, apply the error  $\epsilon_i$  caused by AWGN, where i stands for i-th  $U_{512}$  in CES.

$$\mathbf{R}_{k,i} = \frac{\mathbf{U}_{512,k,i}}{\mathbf{W}_k} + \mathbf{\varepsilon}_i$$

$$\mathbf{\varepsilon}_i = \mathbf{R}_{k,i} - \frac{\mathbf{U}_{512,k,i}}{\mathbf{W}_k}$$
(3.7)

Then, we need to minimize the sum of the squares, so let the partial derivative on  $\mathbf{W}_k$  be zero.

$$S = \sum_{i} \varepsilon_{i}^{2} = \sum_{i} (\mathbf{R}_{k,i} - \frac{\mathbf{U}_{512,k,i}}{W_{k}})^{2}$$

$$\frac{\partial S}{\partial \mathbf{W}_{k}} \Big|_{\hat{\mathbf{W}}} = \sum_{i} (2\mathbf{R}_{k,i} \frac{\mathbf{U}_{512,k,i}}{\mathbf{W}_{k}^{2}} - 2\frac{\mathbf{U}_{512,k,i}^{2}}{\mathbf{W}_{k}^{3}}) = 0$$
(3.8)

Finally, the solution of  $\hat{\mathbf{W}}_k$  indicates the system will have minimum of S.

$$\hat{\mathbf{W}}_{k} = \frac{\sum_{i} \mathbf{U}_{512,k,i}^{2}}{\sum_{i} (\mathbf{R}_{k,i} \mathbf{U}_{512,k,i})}$$
(3.9)

Since there are two copies of $U_{512}$ ($i = 1, 2$) in the CES and $U_{512}$ is constant all the time, it can be rewritten as:

$$\hat{\mathbf{W}}_{k} = \frac{2\mathbf{U}_{512,k}^{2}}{(\sum_{i} \mathbf{R}_{k,i})\mathbf{U}_{512,k}} = \frac{\mathbf{U}_{512,k}}{\frac{1}{2}\sum_{i} \mathbf{R}_{k,i}} = \frac{\mathbf{U}_{512,k}}{\frac{1}{2}(\mathbf{R}_{k,1} + \mathbf{R}_{k,2})}$$
(3.10)

Substituting $\mathbf{R}_{k,i} = \mathbf{H}_{k,i}\mathbf{U}_{512,k} + \mathbf{N}_{k,i}$, the channel estimation result is:

$$\hat{\mathbf{W}}_{k} = \frac{\mathbf{U}_{512,k}}{\frac{1}{2} \sum_{i} (\mathbf{H}_{k,i} \mathbf{U}_{512,k} + \mathbf{N}_{k,i})} \approx \frac{\mathbf{U}_{512,k}}{\frac{1}{2} (\mathbf{H}_{k,1} \mathbf{U}_{512,k} + \mathbf{H}_{k,2} \mathbf{U}_{512,k})} \tag{3.11}$$

Because the AWGN has zero mean, summing the two noise terms $\mathbf{N}_{k,i}$ partially cancels them, so the noise enhancement is reduced.
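
The benefit of averaging the two training copies can be seen in a small experiment. The sketch below is illustrative only: `U` is a hypothetical unit-magnitude stand-in for the frequency-domain training values $U_{512,k}$ (not the real Golay CES), the channel is an assumed static two-path response, and the noise level is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 512
U = np.exp(2j * np.pi * rng.random(N))       # stand-in training spectrum, |U_k| = 1
H = np.fft.fft([1.0, 0.0, 0.0, 0.3], N)      # hypothetical static channel

def received(noise_std):
    """One received training copy R = H * U + N in the frequency domain."""
    noise = noise_std * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    return H * U + noise

def ls_coefficients(R1, R2):
    """Equalizer coefficients from two copies, following (3.10)."""
    return U / (0.5 * (R1 + R2))

R1, R2 = received(0.3), received(0.3)
H_single = R1 / U                            # estimate from one copy only
H_ls = 1.0 / ls_coefficients(R1, R2)         # estimate from the averaged copies
mse_single = np.mean(np.abs(H_single - H) ** 2)
mse_ls = np.mean(np.abs(H_ls - H) ** 2)
```

In this setup the averaged estimate has roughly half the mean squared error of the single-copy estimate, since two independent noise samples are averaged.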

#### 3.1.2 Adaptive Equalization

In an OFDM system, pilot subcarriers are designed to indicate the changes of the time-variant channel. However, we do not have any known message in the frequency domain when the system is in SCBT mode. Thus, our FDE requires an adaptive algorithm to cope with the time-variant channel.

Many adaptive algorithms have been developed in the literature. They differ mainly in computational complexity and convergence speed. The widely used algorithms are Minimum-Mean-Square-Error (MMSE), Recursive-Least-Square (RLS), and Least-Mean-Square (LMS) [12],[13]. At a sampling rate of 2640 MHz, algorithms with high computational complexity are not suitable because of their high hardware complexity and power consumption. Furthermore, relying on SNR information is not practical in the hardware design. Based on the above considerations, we will show that LMS is a good choice for the FDE.

Consider the block diagram of the adaptive FDE shown in Fig. 3-2. *R* is the output of the FFT, and the adaptive FDE performs the equalization and updates the filter coefficients W. The FDE output is transformed back to the time domain, and the data decision is made by the demapper. The error E is the difference between the FDE output and the training sequence (or the sliced output when data is transmitted).



Fig. 3-2 Illustration of adaptive FDE

The idea of the LMS algorithm is to use the method of steepest descent to find a set of W that minimizes the cost function. In our design, the FDE equalizes one subblock at a time, so the cost function should involve a block of errors, which is the so-called Block LMS (BLMS) [14]. However, since the equalization of each subchannel is independent, we can consider the cost function $C_k$ of each subchannel independently instead of the whole subblock.

$$C_k = Ex\{|\boldsymbol{E}_k|^2\} \tag{3.12}$$

The notation $Ex\{.\}$ denotes the expected value, to avoid confusion with the error E. Applying steepest descent then requires the partial derivative with respect to the filter coefficients W.

$$\nabla C = \nabla Ex\{EE^*\} = 2Ex\{\nabla EE^*\}$$
(3.13)

Since the equalization of each subchannel is independent, Eqn. (3.13) equals zero when the error E and the coefficient W belong to different subchannels. Then, expressing E in terms of the received signal R, we can rewrite Eqn. (3.13) as

$$\frac{d\mathbf{E}_{k}}{d\mathbf{W}_{k}} = \frac{d(\mathbf{D}_{k} - \mathbf{W}_{k} \mathbf{R}_{k})}{d\mathbf{W}_{k}} = -\mathbf{R}_{k}$$

$$\therefore \frac{dC_{k}}{d\mathbf{W}_{k}} = -2Ex\{\mathbf{R}_{k} \mathbf{E}_{k}^{*}\}$$
(3.14)

, where k is the subchannel index. Now, these derivatives show the steepest ascent of the cost function. To find out the minimum of the cost function, we take a step size of  $\frac{\mu}{2}$  in the opposite direction of the derivatives.

$$\mathbf{W}_{k,n+1} = \mathbf{W}_{k,n} - \frac{\mu}{2} \frac{dC_{k,n}}{d\mathbf{W}_{k,n}} = \mathbf{W}_{k,n} + \mu Ex\{\mathbf{R}_k \mathbf{E}_k^*\}$$
(3.15)

, where n indicates the subblock index or symbol index at SC or OFDM mode.

The expected value can be replaced by its instantaneous value, and the whole LMS algorithm can be expressed as:

$$\text{LMS:}\quad \mathbf{W}_{k,n+1} = \mathbf{W}_{k,n} + \mu \mathbf{R}_k \mathbf{E}_k^* \tag{3.16}$$
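
A per-subchannel LMS loop can be sketched as follows. This is a minimal illustration with known training symbols and a static, noiseless channel. The update is written as $E \cdot R^*$ with $E = D - WR$, which is one common conjugation convention for complex signals. That convention, the gains in `H`, and the step size are assumptions of the sketch and may differ from the convention behind (3.16).

```python
import numpy as np

def lms_update(W, R, D, mu):
    """One LMS step per subchannel with error E = D - W * R."""
    E = D - W * R
    return W + mu * E * np.conj(R), E

rng = np.random.default_rng(2)
K = 8                                               # number of subchannels in the demo
H = np.linspace(0.5, 1.5, K) * np.exp(1j * np.linspace(0.0, 1.0, K))  # hypothetical gains
W = np.zeros(K, dtype=complex)                      # start from all-zero coefficients
for _ in range(200):
    # Unit-power QPSK training symbol on every subchannel.
    D = (rng.choice([-1.0, 1.0], K) + 1j * rng.choice([-1.0, 1.0], K)) / np.sqrt(2)
    W, E = lms_update(W, H * D, D, mu=0.5)
```

Each subchannel needs only one complex multiplication by the error per update, and `W` converges towards `1 / H` as long as the step size keeps `mu * |R|**2` below 2.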

The derivations of MMSE and RLS can be found in [16], [17]:

$$\text{MMSE:}\quad \mathbf{W}_{k} = \frac{\mathbf{H}_{k}^{*}}{\mathbf{H}_{k}\mathbf{H}_{k}^{*} + \frac{\sigma_{n}^{2}}{\sigma_{s}^{2}}} \tag{3.17}$$

$$\text{RLS:}\quad \begin{aligned} Y &= \mathbf{W}R \\ U &= \mathbf{P}Y^* \\ \mathbf{g}_n &= \frac{1}{\lambda + YU}U \\ \mathbf{W}_{n+1} &= \mathbf{W}_n + \mathbf{g}_n \mathbf{E}_n \end{aligned} \tag{3.18}$$

, where n indicates the subblock index or symbol index at SC or OFDM mode,  $\sigma_n^2$  and  $\sigma_s^2$  are variance of noise and signal respectively,  $\mathbf{Y}$  is equalized signal,  $\mathbf{U}$  is the intermediate vector, and  $\mathbf{g}_n$  is the gain vector.
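
The relation between the MMSE coefficient in (3.17) and the zero-forcing coefficient can be made concrete with two example subchannels. The channel gains and variances below are hypothetical values chosen for illustration.

```python
import numpy as np

def zf_coefficient(H):
    """Zero-forcing coefficient: plain channel inverse."""
    return 1.0 / H

def mmse_coefficient(H, noise_var, signal_var):
    """MMSE coefficient of (3.17)."""
    return np.conj(H) / (H * np.conj(H) + noise_var / signal_var)

# One strong subchannel and one deeply faded subchannel (assumed values).
H = np.array([1.0 + 0.5j, 0.05 - 0.02j])
W_zf = zf_coefficient(H)
W_mmse = mmse_coefficient(H, noise_var=0.1, signal_var=1.0)
```

On the faded subchannel the ZF coefficient becomes very large, while the MMSE coefficient stays bounded. The price is that MMSE needs the ratio of noise to signal variance, which is the SNR information discussed in the text.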

Compared with MMSE [15]-[17] and RLS [18], [19], the LMS algorithm has lower computational complexity than RLS, since only one multiplication is needed to update each subchannel. In hardware design, more operations in the update path cause a longer feedback latency. The latency degrades the performance, since the equalizer coefficients cannot be updated immediately. In a high-sampling-rate system, computationally intensive operations require more pipeline stages, so the latency is even longer. Furthermore, low computational complexity leads to low power consumption, which is increasingly important in modern SoC design, so LMS also has the advantage of low power consumption. On the other hand, MMSE also has lower computational complexity than RLS, but it requires SNR information, which is hard to evaluate because of Doppler and channel effects on the received signal. Although some algorithms [15]-[17] attempt SNR evaluation, the result is still not reliable in a practical system. Based on these considerations, LMS is suitable for the FDE in a high-sampling-rate design and, combined with LS channel estimation, can also achieve the required bit error rate (BER), as discussed in Section 3.3.1.

### 3.2 Review of Time Domain Equalizer (TDE)

The basic structure of the TDE is the FIR filter, which performs the convolution between the data stream and the filter coefficients. A simple illustration of the FIR filter, used as a Zero Forcing (ZF) equalizer, is shown in Fig. 3-3. A robust adaptive decision feedback method can be used to enhance the performance, as mentioned in Section 3.1.2. The Least Mean-Square (LMS) coefficient updating method is chosen to minimize the mean-square error, instead of ZF [20].



However, the computational complexity of the convolution in both ZF and LMS is proportional to the length of the filter taps, which is determined by the length of the CIR. From the channel model in Fig. 2-8 and Fig. 2-9, the filter coefficients must satisfy the mathematical property in Eqn. (3.19).

$$h*w = [1 \ 0 \ 0 \ \cdots \ 0]$$
 (3.19)

Although the parallel architecture can increase the throughput of TDE, the complexity grows linearly with the number of coefficients.
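
The property in (3.19) and the dependence of the tap count on the CIR can be illustrated with a least-squares FIR inverse. This is a sketch under assumptions: the two-path CIR and the tap count are hypothetical, and a least-squares fit stands in for whichever coefficient design the FIR-based TDE would actually use.

```python
import numpy as np

def zf_fir_taps(h, num_taps):
    """Least-squares FIR approximation of the inverse of h, targeting (3.19)."""
    rows = len(h) + num_taps - 1
    A = np.zeros((rows, num_taps))
    for col in range(num_taps):
        A[col : col + len(h), col] = h       # convolution (Toeplitz) matrix
    target = np.zeros(rows)
    target[0] = 1.0                          # desired response [1 0 0 ... 0]
    w, *_ = np.linalg.lstsq(A, target, rcond=None)
    return w

h = np.array([1.0, 0.0, 0.0, 0.3])           # hypothetical two-path CIR
w = zf_fir_taps(h, num_taps=16)
residual = np.convolve(h, w)
residual[0] -= 1.0                           # deviation from the ideal impulse
```

Even for this short channel, 16 taps are needed to push the residual below about $10^{-3}$, and every tap costs one multiplication per output sample.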

### 3.2.1 Multi-path Interference Cancellation

Multi-path Interference Cancellation (MPIC) [21] method is an efficient way for suppressing Inter-path Interference (IPI). MPIC is composed of two parts. The first part is multi-path interference replica, and the second part is multi-path interference cancellation [22].

The following lower-case variables are all in the time domain. As shown in Fig. 3-4, during data transmission the dominant data path can be interfered with by other multipath components, where $\tau$ is the multipath delay.



Fig. 3-4 Multi-path interference

Because of the beamforming technique, the channel is assumed to always be LOS, and the LOS channel model provided by the IEEE 802.15.3c standard has only two significant channel path gains. The remaining channel path gains are almost zero. Therefore, the received signal is expressed as

$$\mathbf{y} = \mathbf{h} \cdot \mathbf{x} = \begin{bmatrix} \mathbf{h}_{1,1} & 0 & \cdots & 0 & \mathbf{h}_{1,1+\tau} & 0 & 0 \\ 0 & \ddots & 0 & \cdots & 0 & \ddots & 0 \\ 0 & 0 & \ddots & 0 & \cdots & 0 & \mathbf{h}_{N-\tau,N} \\ 0 & 0 & 0 & \ddots & 0 & \cdots & 0 \\ 0 & 0 & 0 & 0 & \ddots & 0 & \vdots \\ 0 & 0 & 0 & 0 & 0 & \ddots & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & \mathbf{h}_{N,N} \end{bmatrix} \cdot \mathbf{x}$$
(3.20)

, where  $\mathbf{h}$  means channel impulse response matrix,  $\mathbf{x}$  is the transmitted data vector, and  $\mathbf{y}$  is the received signal vector. Define main data path gain vector

$$\begin{aligned} h[t] &= [h_{1,1} \ h_{2,2} \ ... \ h_{N,N}], && t = 1, 2, ..., N \\ h[t] &= 0, && t \le 0 \end{aligned} \tag{3.21}$$

and second data path gain vector

$$\begin{aligned} m[t] &= [h_{1,1+\tau} \ h_{2,2+\tau} \ ... \ h_{N-\tau,N}], && t = 1, 2, ..., N \\ m[t] &= 0, && t \le 0 \end{aligned} \tag{3.22}$$

where *N* is the number of total sub-channels.

A modified MPIC [21] has two stages. In the initial stage, the channel impulse response is obtained from channel estimation. The multipath gain is assumed to be small, so the multipath gain m[t] is ignored. Then, we have

$$\hat{x}[t] = \frac{y[t]}{h[t]}, t = 1, 2, ..., N$$
(3.23)

, where  $\hat{x}[t]$  is initial data vector without considering multi-path effect. Because the second path delay is  $\tau$ , it means the dominant received signal will be affected by the second received signal after time  $\tau$ . In update stage, the received signal cancels the second path interference and the result is divided by the main path gain to get the updated transmitted data x[t].

$$x[t] = \frac{y[t] - m[t - \tau] \times \hat{x}[t - \tau]}{h[t]}, t = 1, 2, ..., N$$
(3.24)

Using (3.23), Eqn. (3.24) can be simplified as

$$x[t] = \hat{x}[t] - \frac{m[t-\tau] \times \hat{x}[t-\tau]}{h[t]}, t = 1, 2, ..., N. \tag{3.25}$$
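
The two MPIC stages can be tried on a toy channel. The sketch below assumes a static, noiseless two-path channel with scalar gains, so that `h_main` and `m_second` play the roles of $h[t]$ and $m[t]$. The gains, the delay, and the function name are hypothetical.

```python
import numpy as np

def mpic_equalize(y, h_main, m_second, tau):
    """Initial stage (3.23) followed by the update stage (3.25)."""
    x_init = y / h_main                                   # ignore the second path
    delayed = np.concatenate([np.zeros(tau, dtype=y.dtype), x_init[:-tau]])
    x_upd = x_init - m_second * delayed / h_main          # cancel the second path
    return x_init, x_upd

rng = np.random.default_rng(3)
N, tau = 512, 8
h_main, m_second = 1.0, 0.3                               # assumed path gains
x = (rng.choice([-1.0, 1.0], N) + 1j * rng.choice([-1.0, 1.0], N)) / np.sqrt(2)
y = h_main * x
y[tau:] += m_second * x[:-tau]                            # delayed second path
x_init, x_upd = mpic_equalize(y, h_main, m_second, tau)
```

With a second path gain of 0.3, the residual interference drops from 0.3 after the initial stage to 0.09 after the update stage, because the cancellation itself uses the imperfect initial estimate. Only one multiplication per cancelled path is needed, independent of the delay.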

If we only consider the high-gain channel paths, which cause most of the interference in the received signal, the hardware complexity can be reduced compared with traditional time domain equalization. The number of filter taps does not increase with the length of the channel impulse response.

## 3.2.2 Golay-Sequence Aided Channel Estimation [15]

A set of complementary series is defined as a pair of equally long, finite sequences of two kinds of elements which have the property that the number of pairs of like elements with any one given separation in one series is equal to the number of pairs of unlike elements with the same given separation in the other series. For example, the two series a = 00010010 and b = 00011101 are complementary [23].

### **Golay Sequence Property**

Golay sequences of length $N = 2^M$ are generated from a delay vector and a weight vector, each of length M, by a recursive algorithm [24], [25]. Binary Golay sequences are generated when the delay vector $D = [D_0...D_{M-1}]$ is chosen as any permutation of $[2^0...2^{M-1}]$ and the weight vector $W = [W_0...W_{M-1}]$ has elements of $\pm 1$. The recursive algorithm for generating Golay sequences is described as follows:

$$a_{N,(0)}(i) = \delta(i), \tag{3.26}$$

$$b_{N,(0)}(i) = \delta(i), \tag{3.27}$$

$$a_{N,(m)}(i) = W_{m-1}a_{N,(m-1)}(i) + b_{N,(m-1)}(i - D_{m-1}), \tag{3.28}$$

$$b_{N,(m)}(i) = W_{m-1}a_{N,(m-1)}(i) - b_{N,(m-1)}(i - D_{m-1}), \tag{3.29}$$

$$a_{N}(i) = a_{N,(M)}(i), \tag{3.30}$$

$$b_{N}(i) = b_{N,(M)}(i). \tag{3.31}$$

Golay sequences are complementary sequences, which have an attractive property that the sum of their autocorrelations has single maximum peak and no side-lobe.

A pair of Golay sequences for i = 0,..., N-1, with  $N = 2^{M}$ :

$$a_N(i) \text{ and } b_N(i). \tag{3.32}$$

Autocorrelation property:

$$R_a(i) + R_b(i) = 2N\delta(i), \qquad (3.33)$$

where

$$R_a(i) = \sum_{k=0}^{N-i-1} a_N(k) a_N^*(k+i), \qquad (3.34)$$

and

$$R_b(i) = \sum_{k=0}^{N-i-1} b_N(k) b_N^*(k+i).$$
 (3.35)

The symbol "\*" denotes complex conjugate.
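
The recursion and the autocorrelation property (3.33) can be checked together. The following sketch assumes a hypothetical delay and weight vector for $N = 128$. They are not the vectors defined by the standard, so the resulting pair is a valid Golay pair but not necessarily the $a_{128}$ and $b_{128}$ of Table 2-4.

```python
import numpy as np

def golay_pair(delays, weights):
    """Generate a Golay pair by the delay/weight recursion (3.26)-(3.31)."""
    N = 2 ** len(delays)
    a = np.zeros(N)
    b = np.zeros(N)
    a[0] = b[0] = 1.0                                     # both start as delta(i)
    for D_m, W_m in zip(delays, weights):
        b_delayed = np.concatenate([np.zeros(D_m), b[:-D_m]])   # b(i - D_m)
        a, b = W_m * a + b_delayed, W_m * a - b_delayed
    return a, b

def autocorr(x):
    """Aperiodic autocorrelation for lags 0..N-1, as in (3.34) and (3.35)."""
    return np.correlate(x, x, mode="full")[len(x) - 1:]

delays = [1, 2, 4, 8, 16, 32, 64]                         # assumed permutation of 2^0..2^6
weights = [1, -1, 1, 1, -1, 1, -1]                        # assumed +-1 weights
a, b = golay_pair(delays, weights)
total = autocorr(a) + autocorr(b)
```

The sum `total` is $2N$ at lag 0 and zero at all other lags, which is the single-peak, no-side-lobe property used for channel estimation.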

### • Estimation Procedure

The received channel estimation sequence (CES) is expressed by

$$r_{CE}(t) = \sum_{t'=0}^{T_{CH}-1} h(t') s_{CE}(t-t') + n(t), \qquad (3.36)$$

where channel estimation sequence  $s_{CE}(t)$  which is shown in Fig. 3-5 can be divided into two parts of Golay sequences, "Part a" and "Part b".



Fig. 3-5 Golay Sequences of CES

Both parts have a common configuration, that is, $N_{RCE}$ repetitions of base sequences with length $N_{\rm CE}$, plus a cyclic prefix and postfix with length $N_{\rm CPCE}$. The base sequences for "Parts a and b" are Golay sequences $a_{N_{\rm CE}}(i)$ and $b_{N_{\rm CE}}(i)$, respectively. Then, the total length of CES is $2(2N_{\rm CPCE}+N_{\rm RCE}N_{\rm CE})$.

First, the Golay correlator calculates the correlation values $\alpha(t)$ and $\beta(t)$ between the received CES and the Golay sequences:

$$\alpha(t) = \frac{1}{N_{CE}} \sum_{d=0}^{N_{CE}-1} r_{CE}(t+d-N_{CE}+1) a_{N_{CE}}^{*}(d), \qquad (3.37)$$

$$\beta(t) = \frac{1}{N_{CE}} \sum_{d=0}^{N_{CE}-1} r_{CE}(t+d-N_{CE}+1) b_{N_{CE}}^{*}(d).$$
 (3.38)

Then, CPs are removed from the correlation values:

$$\hat{\alpha}(t) = \alpha(t + N_{CPCE}), t = 0, ..., N_{RCE}N_{CE} - 1,$$
 (3.39)

$$\hat{\beta}(t) = \beta(t + 3N_{CPCE} + N_{RCE}N_{CE}), t = 0, ..., N_{RCE}N_{CE} - 1.$$
(3.40)

After that, the CIR $\hat{h}(t)$ is estimated by sum and average operations over the $N_{RCE}$ repetitions:

$$\hat{h}(t) = \frac{1}{2N_{RCE}} \sum_{p=0}^{N_{RCE}-1} (\hat{\alpha}(t+pN_{CE}) + \hat{\beta}(t+pN_{CE})). \tag{3.41}$$

The estimated channel impulse response, in which the noise has been suppressed by averaging, can be used in MPIC equalization.
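
A simplified version of the estimation procedure is sketched below. It assumes a single repetition ($N_{RCE} = 1$), no noise, and a cyclic prefix and postfix long enough that the channel acts as a circular convolution on each base sequence. The correlator is therefore written as a circular correlation computed with FFTs, which is a simplification of the sliding form in (3.37) and (3.38). The Golay pair is built by concatenation and the CIR is hypothetical.

```python
import numpy as np

def golay_pair(num_stages):
    """Length-2**num_stages Golay pair built by repeated concatenation."""
    a, b = np.array([1.0]), np.array([1.0])
    for _ in range(num_stages):
        a, b = np.concatenate([a, b]), np.concatenate([a, -b])
    return a, b

def circular_correlate(r, s):
    """(1/N) * sum_d r((t + d) mod N) * conj(s(d)), computed with FFTs."""
    return np.fft.ifft(np.fft.fft(r) * np.conj(np.fft.fft(s))) / len(s)

def estimate_cir(r_a, r_b, a, b):
    """Average the two correlator outputs, as in (3.41) with one repetition."""
    return 0.5 * (circular_correlate(r_a, a) + circular_correlate(r_b, b))

N = 128
a, b = golay_pair(7)
h = np.zeros(N, dtype=complex)
h[0], h[8] = 1.0, 0.3 * np.exp(1j * 0.7)                  # assumed two-path CIR

def through_channel(s):
    """Circular convolution of s with h (the effect of the cyclic extension)."""
    return np.fft.ifft(np.fft.fft(s) * np.fft.fft(h))

h_hat = estimate_cir(through_channel(a), through_channel(b), a, b)
```

Because the side lobes of the two correlations cancel, `h_hat` reproduces `h` exactly in this noiseless setting. With noise, averaging over parts a and b and over the repetitions reduces the estimation error.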

## 3.3 Proposed Architecture for IEEE 802.15.3c

## 3.3.1 Proposed Adaptive LS-LMS FDE [11]

The proposed adaptive LS-LMS FDE operates based on the equations in Sections 3.1.1 and 3.1.2, and its block diagram is shown in Fig. 3-6.



Fig. 3-6 Block diagram of the proposed adaptive LS-LMS FDE

The pseudo code of the system flow is as follows:

- 1. If (received signals == training sequences), the LS channel estimation evaluates the channel coefficients; else the received signals are equalized by the FDE.
- 2. If (mode == SC), the equalized data are transformed to the time domain by the IFFT and sent to the decision circuit; else they are sent to the decision circuit directly.
- 3. If (mode == SC), the errors between the equalized and sliced signals are transformed to the frequency domain by the FFT and sent to the LMS adaptive algorithm; else they are sent to the LMS adaptive algorithm directly.
- 4. While (received signals == valid data), the LMS adaptive algorithm updates the channel coefficients in the FDE.
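
Steps 2 to 4 for SC mode can be sketched as a decision-directed loop. This is an illustration under assumptions, not the proposed hardware: the channel is static and noiseless, the initial coefficients imitate an LS estimate with a 5% gain error, the block length and step size are arbitrary, and the update uses the $E \cdot R^*$ conjugation convention with $E = D - WR$.

```python
import numpy as np

def slice_qpsk(y):
    """Hard decision to the nearest QPSK point (+-1 +-1j)/sqrt(2)."""
    return (np.sign(y.real) + 1j * np.sign(y.imag)) / np.sqrt(2)

def sc_fde_block(r_time, W, mu):
    """Equalize one SC sub-block, slice it, and feed the error back to LMS."""
    R = np.fft.fft(r_time, norm="ortho")
    y_time = np.fft.ifft(W * R, norm="ortho")             # IFFT before slicing
    decisions = slice_qpsk(y_time)
    E = np.fft.fft(decisions - y_time, norm="ortho")      # error back to frequency domain
    return decisions, W + mu * E * np.conj(R)             # LMS coefficient update

rng = np.random.default_rng(4)
N = 64
H = np.fft.fft([1.0, 0.0, 0.0, 0.3], N)                   # hypothetical static channel
W = 1.05 / H                                              # imperfect initial coefficients
err_start = np.max(np.abs(1 - W * H))
for _ in range(50):
    d = slice_qpsk(rng.standard_normal(N) + 1j * rng.standard_normal(N))
    r_time = np.fft.ifft(H * np.fft.fft(d))               # circular channel, no noise
    decisions, W = sc_fde_block(r_time, W, mu=0.05)
err_end = np.max(np.abs(1 - W * H))
```

The loop shows why SC mode needs the extra IFFT and FFT: slicing happens in the time domain, while the coefficient update happens per subchannel. In OFDM mode both transforms inside `sc_fde_block` would be skipped.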

The proposed adaptive LS-LMS FDE needs two additional transforms (one FFT and one IFFT) for the SC/OFDM dual-mode system. In SC mode, the equalized signal must be transformed to the time domain for slicing, and the error between the equalized data and the decision data is transformed back to the frequency domain for the adaptive algorithm. Although the hardware complexity is large, the adaptive LS-LMS FDE can track the changes of the time-variant channel.

LMS has to be trained to convergence before any data can be equalized. To accelerate convergence, a fast LMS algorithm is applied by simply increasing the step size [26]. However, compared with other algorithms, LMS still suffers from slow convergence [27]. Therefore, the training time of LMS is longer than that of other algorithms, and it requires a longer training sequence. According to the standard, the training sequence is available in the CES field of the PHY preamble. However, there are only two $a_{256}b_{256}$ sequences for training before the payload, so the training result of LMS alone is not as good as that of LS channel estimation.

The learning curves are shown in Fig. 3-7. The simulation uses the channel model with an RMS delay spread of 3.2 ns at an SNR of 18 dB. The LMS algorithm takes about 12 extra data subblocks to achieve the same performance as the combined LS-LMS algorithm. The result confirms that the combined algorithm converges faster than the LMS algorithm alone.



Fig. 3-7 Learning curves

### 3.3.2 Proposed LOS Golay-MPIC TDE

The proposed LOS Golay-MPIC TDE operates based on the equations in Sections 3.2.1 and 3.2.2, and its block diagram is shown in Fig. 3-8.



Fig. 3-8 Block diagram of the proposed LOS Golay-MPIC TDE

The pseudo code of the system flow is shown below:


- 1. If (received signals == training sequences), the Golay-sequence aided channel estimation evaluates the channel impulse response; else the received signals are equalized by the MPIC TDE.
- 2. If (mode == HSI), the equalized data are transformed to the frequency domain by the FFT; else they go to the next block directly.
- 3. While (received signals == valid data), the equalized data are sent to the decision circuit and demapper.

The proposed LOS Golay-MPIC TDE has low hardware complexity, since it does not need an additional IFFT/FFT for the SC/OFDM dual-mode system. In each PCES period, the LOS Golay-MPIC TDE updates the channel impulse response again by Golay-sequence aided channel estimation.

The LOS channel model provided by the IEEE 802.15.3c standard has only two significant channel path gains. Also, the second path gain of the LOS channel model is at most 0.3, and the other paths are weaker than the main path [9] [10]. Therefore, the MPIC TDE can be implemented efficiently. To evaluate the influence of the multipath gain and delay, different test patterns with AWGN and test channels are created. The test channels have one main path and one delayed path. The second path gain, normalized to the main path, ranges from 0.1 to 0.5, and the second path delay ranges from 8 to 56 samples. The modulations of SC and HSI mode are pi/2 QPSK and QPSK, respectively. Fig. 3-9 and Fig. 3-10 show that the BER is very sensitive to the second path gain but hardly affected by the second path delay.



Fig. 3-9 BER of TDE SC mode for 2 channel paths



Fig. 3-10 BER of TDE HSI mode for 2 channel paths

If the LOS channel has more than two paths, the proposed Golay-MPIC TDE can still work, but the BER becomes worse as the third path gain becomes larger. Take SC mode with three channel paths as an example. Fig. 3-9 shows that the BER is $4.89*10^{-4}$ and $6.64*10^{-4}$ when the second path gain is 0.35 and the delay is 8 and 40, respectively. When a third channel path is added to these two cases, the BERs are as shown in Fig. 3-11 and Fig. 3-12. In Fig. 3-11 and Fig. 3-12, when the third path gain becomes small, the BER approaches $4.89*10^{-4}$ and $6.64*10^{-4}$, respectively. The performance is almost the same for different delays of the third path.



Fig. 3-11 TDE SC mode BER for 3 channel paths & $2^{nd}$ Path gain=0.35 and delay=8



Fig. 3-12 TDE SC mode BER for 3 channel paths &  $2^{nd}$  Path gain=0.35 and delay=40

The proposed LOS Golay-MPIC TDE can achieve the BER requirement of the IEEE 802.15.3c standard. If the LOS channel has more paths, the computation becomes more complex. Also, the number of registers grows with longer path delays. Therefore, we consider the general case of two high-gain channel paths, which reduces the hardware complexity while achieving the required BER. Section 4.2.1 will describe an efficient architecture design of Golay-sequence aided channel estimation.



# **Chapter 4**

# **Architecture Design and Performance Analysis**

This chapter describes the architecture design of the proposed adaptive LS-LMS FDE and LOS MPIC TDE in Section 4.1. The detailed sub-block design is shown in Section 4.2. Section 4.3 gives the synthesis results and performance of the proposed adaptive LS-LMS FDE and LOS MPIC TDE. The comparison of the two designs is presented in Section 4.4.

### 4.1 Design Specifications and Architecture


The IEEE 802.15.3c and IEEE 802.11ad standards focus on wireless communication at data rates above 1 Gbps. To achieve this target, there are two key features in the standards. The first is the use of the 60 GHz RF band. The unlicensed band is wide enough to support large signal bandwidths. Since the transmission rate is proportional to the bandwidth, using the unlicensed 60 GHz RF band is essential. The second is the ultra-high sampling rate. Although there are many methods to achieve a high data rate, such as higher-order modulation or multi-input multi-output (MIMO) systems [28], raising the sampling rate is the most direct way, since the data rate is proportional to the sampling rate. With a moderate modulation scheme, the data rate can be two or three times the sampling rate. In this way, the target of multi-Gbps data rates can be achieved easily.

In modern CMOS process, the issue of power consumption becomes more and more important. There are many methods to reduce the power consumption when we design the hardware, such as using low computational complexity algorithm, substituting high complexity arithmetic unit with lower one, or sharing the hardware resources. By using these methods, we can reduce the chip area and the switching power consumption. Meanwhile, the leakage power is also reduced when the chip area is reduced.

As indicated in Section 2.2.1, the sampling rate is 1760 MHz in SC mode and 2640 MHz in HSI mode. For the hardware design, this means that a dual-mode chip must sustain a throughput of 2640 Msamples/s. There are two ways to fulfill the requirement: a pipelined or a parallel structure.

A pipelined structure is essential for such a computation-intensive, high-speed digital circuit. Although inserting more flip-flops shortens the computation time per stage and raises the clock rate, the cost is a larger chip area due to the large number of flip-flops. Moreover, each flip-flop adds an insertion delay ( $t_{\text{setup}} + t_{\text{C-Q}}$ ), where  $t_{\text{setup}}$  and  $t_{\text{C-Q}}$  are the setup time and the clock-to-Q delay of the flip-flop. Thus, when the clock rate is very high and the computation path is long, a fully pipelined structure is not a good design. A parallel structure can lower the clock rate while maintaining the throughput. The drawback is that the area is proportional to the number of copies. Although the slower clock rate leads to lower power consumption, too many copies generate more leakage power than expected, which defeats the purpose of a low-power design.

Based on the considerations above, we adopt a combined structure. The system parameters are listed in Table 4-1. For the dual-mode system, we set the clock rate at 330 MHz with 8-way parallelism, based on two considerations: the clock rate and the power consumption.

Table 4-1 System parameters

|                            | SC                                               | HSI         |
|----------------------------|--------------------------------------------------|-------------|
| Sampling rate (MHz)        | 1760                                             | 2640        |
| Clock rate (MHz)           | 220                                              | 330         |
| Modulation                 | π/2 QPSK                                         | QPSK        |
| FFT point/sub-block length | 512 symbols                                      | 512 symbols |
| CP length                  | 64 symbols                                       | 64 symbols  |
| Channel model              | LOS residual model [9], RMS delay spread: 3.2 ns | same as SC  |
| Max. data rate (uncoded)   | 3.52 Gbps                                        | 5.28 Gbps   |
| Level of parallelism       | 8 times                                          | 8 times     |
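The entries of Table 4-1 are related by simple arithmetic, which the following sketch checks. It assumes that the uncoded maximum data rate is the sampling rate times the bits per symbol (2 for QPSK and π/2 QPSK), with CP and pilot overhead ignored; this assumption is consistent with the table values but is not stated explicitly in the text. The helper names are hypothetical.

```python
# Illustrative check of Table 4-1 (helper names are hypothetical).
PARALLELISM = 8        # samples processed per clock cycle
BITS_PER_SYMBOL = 2    # QPSK and pi/2 QPSK carry 2 bits per symbol


def clock_rate_mhz(sampling_rate_mhz, parallelism=PARALLELISM):
    """Clock rate needed to sustain the sampling rate with P-way parallelism."""
    return sampling_rate_mhz / parallelism


def max_uncoded_rate_gbps(sampling_rate_mhz, bits=BITS_PER_SYMBOL):
    """Assumed model: uncoded rate = sampling rate x bits per symbol."""
    return sampling_rate_mhz * bits / 1000.0


for mode, fs in (("SC", 1760), ("HSI", 2640)):
    print(mode, clock_rate_mhz(fs), "MHz,", max_uncoded_rate_gbps(fs), "Gbps")
```

Because the dual-mode hardware must cover the faster HSI mode, the 330 MHz clock with 8 parallel paths is the binding case.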

# 4.1.1 Proposed Adaptive LS-LMS FDE and Baseband Receiver [11]



Fig. 4-1 Proposed block diagram of baseband receiver design

The baseband receiver consists of three main blocks, as shown in Fig. 4-1. The first is the synchronization block, which consists of the SCO compensator, symbol boundary detection (BD), and CFO synchronization. The SCO compensator applies the time interpolation method and the frequency rotation method for the two modes. BD locates the incoming packet and finds a symbol boundary within the ISI-free region. CFO synchronization uses a correlation-based method to estimate the FCFO and can be realized with the same hardware as symbol synchronization. The second block comprises the FFT and the frequency domain equalizer (FDE). The FFT transforms signals from the time domain to the frequency domain, and the FDE removes the channel effect. The final block is the LDPC decoder, which corrects bit errors using the normalized min-sum algorithm with row-based layered scheduling and supports the four code rates of 802.15.3c.

The proposed LS channel estimation combined with the LMS FDE has been described in Section 3.2.1. The block diagram of Fig. 3-6 is redrawn in Fig. 4-2 to reflect hardware design considerations. The following sections discuss our hardware design. The FFT and IFFT are not design targets of this thesis and are implemented according to [29].



Fig. 4-2 Block diagram of the proposed adaptive LS-LMS FDE

The proposed LS-LMS FDE supports both the SC and HSI modes of the IEEE 802.15.3c specification. Fig. 4-2 shows the system architecture for both modes. The signal flows of the two modes differ at the FFT feedback loop. In SC mode, the equalized data must be transformed to the time domain to recover the original data. After slicing, the errors are transformed back to the frequency domain for the LMS algorithm. SC mode therefore has additional feedback delay, so it needs single-port memories to hold the received data until the feedback FFT output is available. The complex multiplier ( $|\cdot|^2$ ) in LS can be shared with the one-tap equalizer, and the complex conjugate multiplier (Conj.) in LS can be shared with LMS. In addition, the four dual-port memories marked "No.1" in the LS channel estimation of Fig. 4-2 are reused by the one-tap equalizer to store the coefficients W. The proposed equalizer equalizes the received data and updates the coefficients at the same time, and the sample rate is too high to use single-port memories with interleaved access. Hence, this work uses dual-port memories. The pilot word recovery block is used in SC mode to insert known Golay sequences behind the sliced data to form one data sub-block. The single-port memories in the LMS adaptive algorithm are shared with the BD block of the baseband receiver. The gray parts, which account for 69% of the hardware excluding FFT/IFFT, are shared by SC and HSI modes. Fig. 4-3 shows the hardware reduction of the proposed LS-LMS FDE, and the hardware used by SC and HSI modes is listed in Table 4-2.
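The equalize, slice, and update loop described above can be illustrated for a single subcarrier. The following is a minimal sketch of a generic decision-directed one-tap LMS update in the HSI (OFDM) signal flow, where no IFFT/FFT sits inside the loop. It is not necessarily the exact update rule of Section 3.2.1; the step size `mu`, the channel value `h`, and all function names are assumptions chosen for illustration.

```python
import math

QPSK = 1 / math.sqrt(2)  # unit-energy QPSK amplitude per axis


def slice_qpsk(y):
    """Hard decision to the nearest unit-energy QPSK point."""
    return complex(math.copysign(QPSK, y.real), math.copysign(QPSK, y.imag))


def lms_fde_step(w, r, mu):
    """One subcarrier, one symbol: equalize, slice, update the coefficient."""
    y = w * r                            # one-tap equalizer output
    d = slice_qpsk(y)                    # sliced decision
    e = d - y                            # decision-directed error
    w_next = w + mu * e * r.conjugate()  # LMS update uses conj(R), as in LS
    return y, d, w_next


def run_demo(h=0.9 + 0.3j, mu=0.1, n_symbols=200):
    """Noiseless demo: w starts 20% off and converges toward 1/h."""
    w = 0.8 / h  # stands in for an imperfect LS initial estimate (assumed)
    symbols = [complex(QPSK, QPSK), complex(-QPSK, QPSK),
               complex(-QPSK, -QPSK), complex(QPSK, -QPSK)]
    for n in range(n_symbols):
        r = h * symbols[n % 4]           # received subcarrier value
        _, _, w = lms_fde_step(w, r, mu)
    return w, 1 / h


w_final, w_ideal = run_demo()
print(abs(w_final - w_ideal))
```

The update reuses the conjugate of the received sample, which matches the observation above that the complex conjugate multiplier can be shared between LS and LMS. In SC mode the slicing would take place in the time domain, so an IFFT and an FFT would sit between the equalizer output and the error term.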



Fig. 4-3 Hardware reduction of the proposed LS-LMS FDE

Table 4-2 FDE Hardware comparison between SC and HSI mode

|                          | SC | HSI |
|--------------------------|----|-----|
| 64x64 Single-Port Memory | 12 | 0   |
| 64x64 Dual-Port Memory   | 8  | 8   |
| FFT                      | 2  | 0   |
| PW Recovery              | 1  | 0   |
| Clockwise pi/2 shifter   | 1  | 0   |

### 4.1.2 Proposed LOS Golay-MPIC TDE and Baseband Receiver



Fig. 4-4 Proposed block diagram of baseband receiver design

The block diagram in Fig. 4-4 differs from the baseband receiver of Section 4.1.1 in three ways:

- The frequency domain equalizer (FDE) is replaced by a time domain equalizer (TDE).
- The system simulation takes the phase noise effect into account, so phase noise cancellation (PNC) is added to the baseband receiver.
- The two additional FFTs are removed.

Fig. 4-4 shows that equalization moves to the time domain, so only one FFT is needed, for OFDM mode. Phase noise cancellation is added after the TDE and the FFT. The inputs of the PNC are connected to the TDE outputs in SC mode and to the FFT outputs in HSI mode.

The proposed Golay-sequence-aided channel estimation combined with the MPIC TDE has been described in Section 3.2.2. The block diagram of Fig. 3-8 is redrawn in Fig. 4-5 to reflect hardware design considerations. The following sections discuss our hardware design. The single FFT is not a design target of this thesis, because it serves OFDM mode and is therefore not an additional overhead for the system.



Fig. 4-5 Block diagram of the proposed Golay-MPIC TDE

The proposed Golay-MPIC TDE supports both the SC and HSI modes of the IEEE 802.15.3c specification. Fig. 4-5 shows the system architecture for both modes. In the Golay-MPIC TDE, the signal flows of the two modes are almost the same. Only one block, the "clockwise pi/2 shifter", is used by SC mode alone, since the modulation in SC mode is pi/2 M-PSK. For channel estimation, we use the Optimised Golay Correlator (OGC) [30] to implement the architecture described in Section 3.2.2; the OGC is discussed in Section 4.2.1. Because the length of the Golay sequence is 256, the computation takes 8 stages ( $\log_2 256 = 8$ ). Unlike the LS-LMS FDE, the proposed architecture has no feedback loop, so it does not need many memories to store the received data. The only memory, 144 bits by 16 rows, is used by channel estimation to store a256. The size of the second path delay register in the MPIC block depends on the maximum cyclic prefix length. The gray parts, which account for 99% of the hardware (everything except the pi/2 phase shifter circuit), are shared by SC and HSI modes, and the single-port memories in the OGC channel estimation are shared with the BD and PNC blocks of the baseband receiver. Fig. 4-6 shows the hardware reduction of the proposed Golay-MPIC TDE, and the hardware used by SC and HSI modes is listed in Table 4-3.



Fig. 4-6 Hardware reduction of the proposed Golay-MPIC TDE


Table 4-3 TDE Hardware comparison between SC and HSI mode

|                           | SC | HSI |
|---------------------------|----|-----|
| 144x16 Single-Port Memory | 2  | 2   |
| Clockwise pi/2 shifter    | 1  | 0   |

### **4.2 Sub-block Architecture Design**

## **4.2.1 Optimised Golay Correlator (OGC)**

Section 3.2.2 described channel estimation by correlation with Golay sequences. The correlation requires a very large amount of computation, so implementing it directly is not practical. The proposed Golay-MPIC TDE architecture of Section 4.1.2 therefore uses an efficient arithmetic structure to reduce the complexity. The Optimised Golay Correlator (OGC) [30][29] is an efficient way to compute the correlation with Golay sequences. The OGC is obtained by reordering some elements of the Efficient Golay Correlator (EGC) [25], which is shown in Fig. 4-7 and Fig. 4-8. The EGC is based on the way in which the sequences are generated.



Fig. 4-7 Efficient Golay Correlator (EGC)



Fig. 4-8 EGC procedure
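Since the EGC mirrors the way the sequences are generated, its behaviour can be illustrated with a short sketch. The code below uses the commonly cited recursive generator of a Golay complementary pair and the matching EGC recursion; the figures are not reproduced here, so the exact form is an assumption. The delays `[1, 2, 4]`, the all-ones weights, and the function names are illustrative values, not the 802.15.3c parameters.

```python
def golay_pair(delays, weights):
    """Generate a complementary pair from a unit impulse:
    a_n[k] = a_{n-1}[k] + W_n * b_{n-1}[k - D_n]
    b_n[k] = a_{n-1}[k] - W_n * b_{n-1}[k - D_n]
    """
    a, b = [1], [1]
    for d, w in zip(delays, weights):
        a_pad = a + [0] * d                    # a_{n-1}[k], zero-extended
        b_del = [0] * d + [w * x for x in b]   # W_n * b_{n-1}[k - D_n]
        a = [x + y for x, y in zip(a_pad, b_del)]
        b = [x - y for x, y in zip(a_pad, b_del)]
    return a, b


def egc(r, delays, weights):
    """Efficient Golay Correlator with real +/-1 weights (assumed form):
    ra_n[k] = ra_{n-1}[k - D_n] + W_n * rb_{n-1}[k]
    rb_n[k] = ra_{n-1}[k - D_n] - W_n * rb_{n-1}[k]
    Returns the correlations of r with a and with b.
    """
    ra, rb = list(r), list(r)
    for d, w in zip(delays, weights):
        ra_del = [0] * d + ra[:len(ra) - d]    # delay line of D_n samples
        wb = [w * x for x in rb]               # one multiplication per stage
        ra = [x + y for x, y in zip(ra_del, wb)]
        rb = [x - y for x, y in zip(ra_del, wb)]
    return ra, rb


DELAYS, WEIGHTS = [1, 2, 4], [1, 1, 1]         # example values, L = 8
a, b = golay_pair(DELAYS, WEIGHTS)
L = len(a)
corr_a, _ = egc(a + [0] * L, DELAYS, WEIGHTS)  # a correlated with a
_, corr_b = egc(b + [0] * L, DELAYS, WEIGHTS)  # b correlated with b
total = [x + y for x, y in zip(corr_a, corr_b)]
print(total)
```

Each stage needs one multiplication, two additions or subtractions, and one delay line, so a length-$L$ correlator uses $\log_2 L$ stages instead of $L$ taps. The sum of the two correlations is a single peak of height $2L$ with zero sidelobes; in this zero-indexed sketch the peak falls at index $L-1$.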

Fig. 4-9 shows the two changes that turn the EGC into the OGC. First, the adders and subtracters are interchanged with the delay and seed blocks. Second, the order of the EGC stages is reversed, placing the largest delay stage at the input and the smallest delay stage at the output. Both changes allow the correlation of two inputs to be obtained simultaneously. The recursive algorithm of the OGC is:

$$a'_0[k] = a[k], \tag{4.1}$$

$$b'_0[k] = b[k], \tag{4.2}$$

where $a[k]$ and $b[k]$ are the received signals, and $a'_i[k]$ and $b'_i[k]$ are partial results.

$$a'_{n}[k] = a'_{n-1}[k - D_{N-n+1}] + b'_{n-1}[k - D_{N-n+1}], \tag{4.3}$$

$$b'_{n}[k] = W_{N-n}a'_{n-1}[k] - W_{N-n}b'_{n-1}[k], \tag{4.4}$$

$$Y[k] = a'_{N}[k] + b'_{N}[k] = 2L\delta[k-L], \tag{4.5}$$

where $Y[k]$ is the sum of the cross-correlations between the input signals and the Golay sequences.



Fig. 4-9 Optimised Golay Correlator (OGC)

To compare the efficiency of the correlation in a signal detection system based on Golay sequences, three architectures are considered: the straightforward correlator, the EGC, and the OGC. The straightforward correlator, like the EGC, uses two instances to perform the correlations against  $a_n[k]$  and  $b_n[k]$  simultaneously. A final adder is included to obtain the sum of the correlations in all three cases. The results are listed in Table 4-4.

|                 | Straightforward | EGC               | OGC               |
|-----------------|-----------------|-------------------|-------------------|
| Multiplications | $2L$            | $2\log_2 L$       | $\log_2 L$        |
| Add/Sub.        | $2(L-1)$        | $4\log_2 L - 1$   | $2\log_2 L - 1$   |
| Delays          | $2(L-1)$        | $2(L-1)$          | $L-1$             |

Table 4-4 Number of operations for each correlator

### 4.2.2 Divider Free LS Method [11]

In Section 3.1.1, Eqn. (3.10) indicates that the LS method needs a complex division. There are two ways to avoid the division. One is to use the phase operation shown in Eqn. (4.6); the other is to multiply both the numerator and the denominator by the conjugate of the divisor, as shown in Eqn. (4.7).

$$\begin{aligned}
W_{k} &= \frac{U_{512,k}}{\frac{1}{6} \sum_{i} R_{k,i}} = \frac{U_{512,k}}{\overline{R}_{k}} \\
&= \frac{\sqrt{|U_{512,k}|^{2}} \angle \theta_{u}}{\sqrt{|\overline{R}_{k}|^{2}} \angle \theta_{r}} \\
&= \sqrt{\frac{|U_{512,k}|^{2}}{|\overline{R}_{k}|^{2}}} \angle (\theta_{u} - \theta_{r})
\end{aligned} \tag{4.6}$$

$$\begin{aligned}
W_{k} &= \frac{U_{512,k}}{\frac{1}{6} \sum_{i} R_{k,i}} = \frac{U_{512,k}}{\overline{R}_{k}} \\
&= \frac{U_{512,k} \overline{R}_{k}^{*}}{\overline{R}_{k} \overline{R}_{k}^{*}} \\
&= \frac{U_{512,k} \overline{R}_{k}^{*}}{\left|\overline{R}_{k}\right|^{2}}
\end{aligned} \tag{4.7}$$

The phase operation replaces the complex division with one square root function, two square functions, one scalar division, and one subtraction. However, the transformation between a phasor and a complex number requires trigonometric functions, as shown in Eqn. (4.8). Although practical designs for these functions exist, the hardware cost is still too high.

$$\begin{aligned}
U_{512} &= U_{512,r} + iU_{512,i} \\
|U_{512}|^2 &= U_{512,r}^2 + U_{512,i}^2 \\
\theta_u &= \tan^{-1} \frac{U_{512,i}}{U_{512,r}}
\end{aligned} \tag{4.8}$$

Eqn. (4.7) transforms one complex division into one complex multiplication, one square function, and one scalar division. This method is generally used to compute a complex division. However, one scalar division remains, and a divider is much more complex than a multiplier [31].

Since division is the inverse of multiplication, multiplying by the inverse of the scalar is a commonly used method. To obtain the inverse, we can use a table holding the inverse of every possible scalar, which is easily implemented with a ROM as illustrated in Fig. 4-10. The bit width is determined by the accuracy of the inverse, and the word width is determined by the word length of the scalar. According to the fixed-point C simulation, the bit width should be 13 bits and the word width 14 bits to maintain the performance. Therefore, the size of the ROM is  $2^{14} \times 13$ , which is about 213k bits. The cost is reduced, but the ROM still takes a large area.

To reduce the size of the ROM, we can try to reduce the bit width and the word width. Since the accuracy is already determined by the bit width, we focus on reducing the word width. Observing the inverse, we find that it is almost the same for nearby words. An example is shown in Eqn. (4.9): the difference between 1/128 and 1/129 is too small to be represented in 13 bits. Therefore, nearby scalars can all map to the same inverse stored in the table, as illustrated in Fig. 4-11.



Fig. 4-10 Table of inversed scalar

This effect is more obvious when the scalar is large, as shown in Eqn. (4.10), where N is the reference scalar and  $\Delta n$  is the difference. Taking the property of the scalar into consideration, we can see that the scalar is always positive since it's the result of the square function. Hence, we can reconsider the scalar structure in Fig. 4-12. We can just look up the table according to the significant bits regardless of sign bits and  $\Delta n$ . From the simulation results, the optimal length of the significant bits is 4. The reduced table is shown in Fig. 4-13 and the size is  $2^{4}*11*13$ , which is only 2288 bits.

$$\begin{aligned}
\frac{1}{128} &= 0.0078125 \\
\frac{1}{129} &= 0.0077519 \\
\frac{1}{128} - \frac{1}{129} &= 0.0000606_{10} = 0.00000000000001_{2}
\end{aligned} \tag{4.9}$$

$$\varepsilon = \frac{1}{N} - \frac{1}{N + \Delta n} = \frac{\Delta n}{N(N + \Delta n)} = \frac{1}{N(\frac{N}{\Delta n} + 1)} \tag{4.10}$$



Fig. 4-11 Reduced mapping



Fig. 4-12 Structure of the scalar

Fig. 4-13 Reduced table

Since the  $\Delta n$  bits are ignored, they are treated as zeros. The scalar can therefore be written in the form of Eqn. (4.11), where SB denotes the significant bits.

$$scalar = SB * 2^{\Delta n} \tag{4.11}$$

These  $\Delta n$  bits correspond to a left shift of SB, so the inverse of the scalar can be represented as in Eqn. (4.12).

$$inverse = \frac{1}{scalar} = \frac{1}{SB * 2^{\Delta n}} = \frac{2^{-\Delta n}}{SB} \tag{4.12}$$

The factor  $2^{-\Delta n}$  is a right shift of the inverse of SB. Since SB has only 4 bits, the table has to store only 16 inverses, which costs 208 bits of storage. Through this procedure, we replace a division with one small ROM and one multiplier. The block diagram is shown in Fig. 4-14. Compared with a real divider from DesignWare, the modified version has a smaller area and satisfies the requirement of high clock rate operation. Furthermore, the method described above reduces the ROM size by about 99.9%, from  $2^{14} \times 13$  bits to 208 bits.



Fig. 4-14 Block diagram of modified divider
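The divider-free procedure can be summarized in a short behavioural model. The sketch below combines the conjugate multiplication of Eqn. (4.7) with the 16-entry inverse table and the shift of Eqn. (4.12). The 13-bit precision and the 4 significant bits come from the text; taking the significant bits from the leading one downward, rounding the table entries, saturating the entry for SB = 1 to 13 bits, and all names are assumptions made for illustration.

```python
FRAC_BITS = 13   # precision of each stored inverse (bit width from the text)
SB_BITS = 4      # significant bits used as the ROM address

# 16-entry ROM: inverse of SB scaled by 2**13, saturated to 13 bits.
# Entry 0 is unused because the scalar |R|^2 is assumed to be positive.
ROM = [0] + [min(round((1 << FRAC_BITS) / sb), (1 << FRAC_BITS) - 1)
             for sb in range(1, 1 << SB_BITS)]


def approx_reciprocal(scalar):
    """Return (rom_value, shift): 1/scalar ~ rom_value * 2**-(13 + shift)."""
    if scalar <= 0:
        raise ValueError("scalar must be a positive integer")
    shift = max(scalar.bit_length() - SB_BITS, 0)  # number of ignored low bits
    sb = scalar >> shift                           # top significant bits
    return ROM[sb], shift


def reciprocal_value(scalar):
    """Floating-point value of the approximated inverse."""
    rom_value, shift = approx_reciprocal(scalar)
    return rom_value * 2.0 ** -(FRAC_BITS + shift)


def ls_coefficient(u_re, u_im, r_re, r_im):
    """W = U * conj(R) / |R|^2 with the division replaced by ROM and shift."""
    num_re = u_re * r_re + u_im * r_im   # real part of U * conj(R)
    num_im = u_im * r_re - u_re * r_im   # imaginary part of U * conj(R)
    mag2 = r_re * r_re + r_im * r_im     # |R|^2, always positive
    inv = reciprocal_value(mag2)
    return complex(num_re * inv, num_im * inv)


print(approx_reciprocal(128), approx_reciprocal(129))
print(ls_coefficient(100, 50, 20, 10))   # exact quotient would be 5 + 0j
```

In this model 128 and 129 map to the same table entry, as in Eqn. (4.9). Truncating to 4 significant bits bounds the relative error of the inverse below 12.5% in this sketch; the thesis reports that 4 bits were found sufficient by simulation.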

### 4.2.3 FFT/IFFT Design Specifications

The IEEE 802.15.3c standard targets ultra-high data rate wireless communication. Realizing a high-throughput digital circuit is a challenge in hardware implementation, especially for components with high computational complexity. The FFT/IFFT has the highest computational complexity and is the most critical component in our design.

In recent years, there has been much research on high-throughput FFTs. Pipeline-based structures and large-radix butterflies are commonly used to meet the requirement. A high-throughput FFT must be realized with a large-radix butterfly and parallel input/output, and this is even more evident for large FFT sizes. The FFT in [32] is a 512-point design with a maximum throughput of 2592 MHz. It is designed for the IEEE 802.15.3c HSI mode, which uses OFDM. That FFT has three modes, 4-way, 8-way, and 16-way, and each mode corresponds to a different throughput. The butterfly is radix-8 and the input/output is up to 16 times parallel. Both the throughput and the design challenge are closely tied to the large radix and the parallel input/output.

In our equalization design, the specifications of the FFT/IFFT are listed in Table 4-5. The FFT/IFFT size is 512 points, which equals the length of the sub-block. Since the sampling rate is 2640 MHz, the throughput is also 2640 MHz. To use the same clock rate as the proposed FDE, the clock rate of the FFT/IFFT is set to 330 MHz and the input/output is 8 times parallel. According to the fixed-point Matlab simulation, the input and output word lengths of the FFT and IFFT are both 14 bits.

Table 4-5 Specifications of FFT/IFFT in the proposed FDE

| Parameter                           | Value        |
|-------------------------------------|--------------|
| Point                               | 512 samples  |
| Throughput                          | 2640 MHz     |
| Clock rate                          | 330 MHz      |
| Parallel input/output               | 8 times      |
| FFT/IFFT word length (input/output) | 14 / 14 bits |

### 4.3 Synthesis and Simulation Results

The following architectures are synthesized in a 65 nm CMOS process. In the system simulation, the signals are corrupted by the channel model and AWGN and are assumed to be perfectly synchronized. To evaluate the performance, we use the channel model of the IEEE 802.15.3c standard group with Jakes' model, as mentioned in Section 2.2. The whole transmitted sequence is composed of preamble, PW, data, and PCES. The preamble is used for training and the PW works as a cyclic prefix. The test environment is built in MATLAB.

### 4.3.1 Proposed Adaptive LS-LMS FDE

The LS-LMS FDE is designed in a 65 nm CMOS low power process, and the maximum operating rate reaches 400 MHz (the required clock rate is 330 MHz). The area percentage of each part of the FDE is shown in Fig. 4-15. The LS channel estimation part takes 2%, including 4 dual-port memories, and the LMS adaptive part takes 5%, including 12 single-port memories. The remaining parts take about 15%. The IFFT/FFT takes 77% and thus occupies most of the FDE. As listed in Table 4-6, the FDE without the 2 FFTs has an area of 415K gates and a power of about 81 mW. The FDE with the 2 FFTs has an area of about 1723K gates and a power of about 211 mW.



Fig. 4-15 Area Percentage of each part in LS-LMS FDE

Table 4-6 LS-LMS FDE for SC and HSI mode synthesis result

|                   | FDE without 2 FFT    | FDE With 2 FFT |
|-------------------|----------------------|----------------|
| Processing        | 65nm CMOS LP Process |                |
| Clock rate (MHz)  | 400                  |                |
| Area (Gate count) | 415K(24%)            | 1723K(100%)    |
| Power (mW)        | 81@1v                | 211@1v         |

The required uncoded BER of IEEE 802.15.3c is about 10<sup>-2</sup>. Fig. 4-16 shows the performance of SC mode with pi/2 QPSK modulation, where the BER is about 6.01\*10<sup>-4</sup> at an SNR of 12 dB. It also shows the performance of HSI mode with QPSK modulation, where the BER is about 9.68\*10<sup>-3</sup> at an SNR of 12 dB. SC mode performs better than OFDM mode because an FDE in OFDM systems usually performs poorly without channel coding [33].



Fig. 4-16 BER of FDE in SC and HSI mode

### 4.3.2 Proposed LOS Golay-MPIC TDE

Like LS-LMS FDE architecture, the Golay-MPIC TDE can achieve 400MHz operating rate (required clock rate is 330 MHz) by using 65 nm CMOS low power process. The area of Golay-MPIC TDE is 405K gate-count and the power consumption is 88mW as shown in Table 4-7. Furthermore, Fig. 4-17 illustrates the total area percentage of each part in TDE. The OGC channel estimation part is 31%, the MPIC equalization part is 56%, and the others are about 13%.

Table 4-7 Golay-MPIC TDE for SC and HSI mode synthesis result

|                   | TDE                  |
|-------------------|----------------------|
| Processing        | 65nm CMOS LP Process |
| Clock rate (MHz)  | 400                  |
| Area (Gate count) | 405K                 |
| Power (mW)        | 88@1v                |



Fig. 4-17 Area Percentage of each part in Golay-MPIC TDE

The required uncoded BER of IEEE 802.15.3c is about 10<sup>-2</sup>. Fig. 4-18 shows the performance of SC mode with pi/2 QPSK modulation, where the BER is about 4.45\*10<sup>-5</sup> at an SNR of 12 dB. It also shows the performance of HSI mode with QPSK modulation, where the BER is about 5.21\*10<sup>-6</sup> at an SNR of 12 dB. Unlike the LS-LMS FDE, HSI mode performs better than SC mode here, because the FFT of HSI mode has a noise-averaging effect.



Fig. 4-18 BER of TDE in SC and HSI mode

# 4.4 Comparison of Adaptive LS-LMS FDE and MPIC Golay-MPIC TDE

Sections 4.1 to 4.3 described the architecture design and simulation results of the equalizers for the IEEE 802.15.3c standard. This section compares the advantages of the LS-LMS FDE and the Golay-MPIC TDE and explains which equalizer suits which environment.

Table 4-8 Synthesis result of LS-LMS FDE and Golay-MPIC TDE

|                   | LS-LMS FDE (without 2 FFT) | LS-LMS FDE (including 2 FFT) | Golay-MPIC TDE (no FFT) |
|-------------------|----------------------------|------------------------------|-------------------------|
| Processing        | 65nm CMOS Process          | 65nm CMOS Process            | 65nm CMOS Process       |
| Clock rate (MHz)  | 400                        | 400                          | 400                     |
| Area (Gate count) | 415K                       | 1723K                        | 405K                    |
| Power (mW)        | 81@1.08v                   | 211@1.08v                    | 88@1.08v                |

As Table 4-8 shows, the area and power consumption of the LS-LMS FDE including its FFTs are much larger than those of the Golay-MPIC TDE. The dominant factor is the FFT: the FDE has a feedback loop that needs FFTs to transform data between the time and frequency domains. In contrast, the data flow of the TDE is feed-forward, and its single FFT serves OFDM mode and is therefore not an overhead for the system. The computational complexity of the LS-LMS FDE and the Golay-MPIC TDE is listed in Table 4-9. The Golay-MPIC TDE uses more complex multipliers in the MPIC stage and more complex adders in the OGC stage, whereas the LS-LMS FDE uses more memory for the FFT feedback loop to store the received signals.

Table 4-9 Computation complexity of LS-LMS FDE and Golay-MPIC TDE

|                                      | LS-LMS FDE             | Golay-MPIC TDE         |
|--------------------------------------|------------------------|------------------------|
| Complex multiplication               | 1 (20 bits x 14 bits)  | 3 (21 bits x 15 bits)  |
| Complex conjugate multiplication     | 1 (18 bits x 15 bits)  | 2 (18 bits x 14 bits)  |
| Modified divider (scalar multiplier) | 1 (13 bits x 11 bits)  | 1 (13 bits x 11 bits)  |
| Complex adders                       | 3 (16 bits + 15 bits)  | 18 (16 bits + 15 bits) |
| Single-port memory                   | 12 (64 bits x 64 bits) | 2 (144 bits x 16 bits) |
| Dual-port memory                     | 8 (64 bits x 64 bits)  | 0                      |

Each of the two equalizers has its own advantages, which are compared in Table 4-10. To reduce hardware complexity, we propose the Golay-MPIC TDE as a new architecture for good channel conditions. Since the TDE architecture is simpler, it can be pipelined much more deeply. If the channel condition allows, the low-cost TDE is the better choice. However, the LS-LMS FDE is superior to the Golay-MPIC TDE when the channel is not LOS or when the channel varies rapidly. Choosing between the LS-LMS FDE and the Golay-MPIC TDE is therefore a tradeoff.

Table 4-10 Comparison of LS-LMS FDE and Golay-MPIC TDE

|                    | LS-LMS FDE | Golay-MPIC TDE |
|--------------------|------------|----------------|
| Area and Power     | ×          | ○              |
| High sampling rate | ×          | ○              |
| Variant Channel    | ○          | ×              |

# Chapter 5

# **Baseband Design and Chip Implementation**

This chapter discusses the baseband receiver chip implementation of the proposed adaptive LS-LMS FDE and LOS MPIC TDE in Section 5.1 and 5.2, respectively.

## 5.1 Adaptive LS-LMS FDE in Baseband Receiver [34]

## 5.1.1 Chip Integration and Implementation Result

The overall block diagram excluding channel decoder of 8x parallel baseband receiver for IEEE 802.15.3c SC/HSI mode is redrawn in Fig. 5-1 where the dashed line represents control signals. All the function units work in digital domain. The simulation models of overall baseband are built with MATLAB and Verilog HDL.



Fig. 5-1 Proposed block diagram of baseband receiver design

The control of the data flow is in sequence block by block. In other words, the executed block will output control signals to trigger the next block and is then turned into sleeping mode to avoid redundant power consumption.

Considering the memory usage requirements, the baseband receiver design is synthesized and implemented by the standard cell design flow using the TSMC 65 nm 1P9M low power (LP) process with a 1 V supply at 667 MHz. The total area of the baseband receiver circuit excluding FFT is about 701k gates. The FFT blocks, which include two FFTs and one IFFT, occupy 1,932k gates. Fig. 5-2 illustrates the area proportion of each block excluding FFT. The boundary detection (BD) block is active only in the preamble period, so it can share memory with the FDE in the data payload period. The shared memories make up 32.68% of the system excluding FFT.



Fig. 5-2 Area proportion of each block circuit excluding FFT

In automatic placement and routing (APR), a total of 144 32-word by 32-bit dual port memories (32x32 dual port memories) are used by the FFT, and they occupy most of the chip area. For better IR drop and EM management, the memory datasheet suggests that the block rings around each memory block be at least 5µm wide.



Fig. 5-3 Size of 32x32 dual port memory with power rings

The area of one 32-word by 32-bit dual port memory is 247.12μm × 37.215μm. As shown in Fig. 5-3, after the memory is surrounded by block rings with default spacing, the occupied area increases to (273.12×63.215) ÷ (247.12×37.215) ≈ 1.88 times the original area. Compared with the registers listed in Table 5-1, the enlarged memory area is larger than the area of the registers. For area efficiency, the 32-word by 32-bit dual port memories are replaced with registers, which saves 28.48% of the memory area for the FFT.

Table 5-1 Area and power comparison of 32×32 dual port memory and register

|                                          | Area (μm²)         | Power (mW)        |
|------------------------------------------|--------------------|-------------------|
| Register                                 | 11838.60           | 3.23              |
| 32×32 dual port memory (with block ring) | 9196.57 (17265.28) | 2.74 / 1.93 (R/W) |

Since the memory elements are replaced with registers, we can use the TSMC 65 nm 1P9M general purpose (GP) process with a 1 V supply for a higher clock rate and more timing margin in APR. The chip layout view without pads is shown in Fig. 5-4. The core size is  $2556\mu m \times 3056\mu m$  with 65.91% utilization density. The post-layout simulation shows that the proposed baseband receiver design can achieve 333 MHz. Table 5-2 is the chip summary. To our knowledge, no previous baseband designs of this kind have been reported for comparison.



Fig. 5-4 1<sup>st</sup> Chip layout view of the proposed baseband receiver

Table 5-2 1<sup>st</sup> chip summary (using LS-LMS FDE)

| Process                    | TSMC 65 nm 1P9M GP process                                   |
|----------------------------|--------------------------------------------------------------|
| Sampling Frequency (MHz)   | 2640                                                         |
| Clock Frequency (MHz)      | 330                                                          |
| Total Gate Count           | 3,463 K (100 %)                                              |
| Gate Count (excluding FFT) | 1,249 K (36.07 %)                                            |
| Gate count (FFT block)     | 2,214 K (63.93 %)                                            |
| Core Area                  | $7.81 \text{ mm}^2 (2.56 \text{ mm} \times 3.05 \text{ mm})$ |
| Utilization                | 65.91 %                                                      |
| Mode                       | SC / HSI                                                     |
| BER (uncoded) @ 12 dB      | 8.92×10<sup>-4</sup> / 1.43×10<sup>-2</sup>                  |
| Data Rate (Gbps)           | 3.52 / 5.28                                                  |
| Power (mW)                 | 793.98                                                       |
| Leakage Power (mW)         | 78.31                                                        |

#### **5.1.2 Measurement Consideration**

Due to the 8x parallelism, the chip has a large number of input and output bits. To reduce the number of pads and to verify functional correctness easily, we use the test plan shown in Fig. 5-5. The input data can be supplied from outside the chip or from a pattern stored inside the chip. The comparator checks the correctness of the computation results for the stored pattern.



Fig. 5-5 Testing diagram for measurement

The measurement results show that the chip functions as expected. Due to the area limit of the shuttle program provided by the foundry, the power rings and power stripes of this chip are very restricted. Fewer power rings and power stripes can cause IR drop and lower the chip operating speed. Although the clock frequency does not reach our target frequency, the power consumption follows the expected trend, as shown in Table 5-3. If the clock frequency is 333 MHz as shown in Table 5-2, the power consumption is about 741.75 mW, estimated from the power consumption trend. Fig. 5-6 is the die photo of the LS-LMS FDE chip.

Table 5-3 Power measurement of 1st chip

| Clock Frequency (MHz) | 1    | 10   | 20   | 30   | 40   |
|-----------------------|------|------|------|------|------|
| Power (mW)            | 33.8 | 51.9 | 65.5 | 78.2 | 89.1 |
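The text does not state how the 741.75 mW estimate is extrapolated from Table 5-3. One reading that reproduces the figure is a proportional scaling of the 40 MHz measurement with clock frequency; the sketch below shows this arithmetic. The scaling rule and the function name are assumptions, and the rule ignores the static offset visible at 1 MHz.

```python
# Measured power from Table 5-3: clock frequency (MHz) -> power (mW)
MEASURED_MW = {1: 33.8, 10: 51.9, 20: 65.5, 30: 78.2, 40: 89.1}


def scale_power(f_target_mhz, f_ref_mhz=40):
    """Assumed rule: power scales proportionally with clock frequency."""
    return MEASURED_MW[f_ref_mhz] * f_target_mhz / f_ref_mhz


print(round(scale_power(333), 2))
```

Under this assumption the estimate at 333 MHz is roughly 741.76 mW, which agrees with the quoted value to within rounding.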



Fig. 5-6 Die photo of 1st Chip baseband receiver

Fig. 5-7 shows the functional test of this baseband receiver. If all results are correct, O\_CHECK\_RESULT is high (1) and O\_ERR\_COUNT is low (0). At the beginning, O\_CHECK\_RESULT is low (0), because the system outputs are not ready and are still in the initial state (0). Once the computed outputs arrive and are compared with the stored results, O\_CHECK\_RESULT becomes active and is set high (1).



Fig. 5-7 Functional test of 1<sup>st</sup> chip

Fig. 5-8, Fig. 5-9, and Fig. 5-10 show the test and setup platform. We use the Agilent 93000 SOC test system with a 160-pin CQFP socket to test the chip. The equipment is provided by the National Chip Implementation Center (CIC).



Fig. 5-8 Agilent 93000 SOC test system



Fig. 5-9 CQFP 160 pins socket



Fig. 5-10 Wire connection of CQFP 160 pins socket



## 5.2 LOS MPIC Golay-MPIC TDE in Baseband Receiver

## **5.2.1** Chip Integration and Implementation Result

The block diagram excluding channel decoder of 8x parallel baseband receiver for IEEE 802.15.3c SC/HSI mode is redrawn in Fig. 5-11 where the dashed line represents control signals. The simulation models of overall baseband are built with MATLAB and Verilog HDL. The control of the data flow is in sequence block by block. In other words, the executed block will output control signals to trigger the next block and is then turned into sleeping mode to avoid redundant power consumption.



Fig. 5-11 Proposed block diagram of baseband receiver design

TSMC 65nm 1P9M general purpose process with voltage supply 1 V has update the library, and the new library becomes faster but larger power consumption as compared with the old library that the version I chip uses. We use the new library to implement this chip for TSMC tape-out flow, and synthesize this chip at 667 MHz by two kinds of library to compare the differences. Table 5-4 shows the new library has 1.42 times power consumption to the old library, but the areas are almost the same.

Table 5-4 Comparison of old and new TSMC 65nm GP process libraries

|                   | Old Library | New Library |
|-------------------|-------------|-------------|
| Area (Gate-count) | 2416K       | 2407K       |
| Power (mW)        | 1169        | 1660        |
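As a quick check of the comparison, the ratios follow directly from the values in Table 5-4 (the symbols $P$ and $A$ for power and gate count are introduced here only for this calculation):

$$
\frac{P_{\text{new}}}{P_{\text{old}}} = \frac{1660}{1169} \approx 1.42,
\qquad
\frac{A_{\text{new}}}{A_{\text{old}}} = \frac{2407\text{K}}{2416\text{K}} \approx 0.996
$$

The power therefore rises by about 42% while the gate count changes by less than 0.4%.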

The total area of the baseband receiver circuit is about 2479K gates. Fig. 5-12 illustrates the area proportion of each block. The shared memory is used by the BD, TDE, and PNC blocks at different times: BD uses it during the preamble period, TDE uses it during the data payload, and PNC uses it during the PCES field.



Fig. 5-12 Area proportion of each block circuit

Since the memory elements are replaced with registers, as explained in Section 5.1.1, we can use the TSMC 65 nm 1P9M general purpose (GP) process with a 1 V supply for a higher clock rate and more timing margin for APR. The chip layout view is shown in Fig. 5-13. The core size is  $2820\mu m \times 2820\mu m$  with 88.93% utilization density. The post-layout simulation shows that the proposed baseband receiver design can achieve 336.7 MHz. Table 5-5 is the chip summary.

Fig. 5-13 2<sup>nd</sup> Chip layout view of the proposed baseband receiver

Table 5-5 2<sup>nd</sup> Chip summary (using Golay-MPIC TDE)

| Item                     | Value                                                           |
|--------------------------|-----------------------------------------------------------------|
| Process                  | TSMC 65 nm 1P9M GP process                                      |
| Sampling Frequency (MHz) | 2640                                                            |
| Clock Frequency (MHz)    | 330                                                             |
| Total Gate Count         | 2915 K                                                          |
| Core Area                | 7.95 mm<sup>2</sup> (2.82 mm × 2.82 mm)                         |
| Utilization              | 88.93 %                                                         |
| Mode                     | SC / HSI                                                        |
| BER (uncoded) @ 12 dB    | 7.36×10<sup>-5</sup> (SC) / 9.30×10<sup>-6</sup> (HSI)          |
| Data Rate (Gbps)         | 3.52 (SC) / 5.28 (HSI)                                          |
| Power (mW)               | 1116.7                                                          |
| Leakage Power (mW)       | 87.42                                                           |
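The clock frequency and core area in Table 5-5 can be cross-checked with simple arithmetic. This assumes that the 8x parallel datapath processes eight samples per clock cycle; the symbols $f_s$, $f_{clk}$, and $A_{core}$ are introduced here only for this check.

$$
f_{clk} = \frac{f_s}{8} = \frac{2640\ \text{MHz}}{8} = 330\ \text{MHz},
\qquad
A_{core} = 2.82\ \text{mm} \times 2.82\ \text{mm} \approx 7.95\ \text{mm}^2
$$

Under the same assumption, the post-layout clock rate of 336.7 MHz corresponds to $8 \times 336.7 = 2693.6$ MSample/s, which is above the 2640 MHz sampling frequency.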

#### **5.2.2** Measurement Consideration

Because of the 8x parallelism, the chip has a large number of input and output bits. We therefore verify this chip with a different method from the one used for the previous version. To test functional correctness, we use a high-SNR data pattern, so the bit width of the input data can be reduced to only 5 bits. In Fig. 5-14, the Pseudo Random Binary Sequence (PRBS) block generates noise that is attached to the end of the input data. In this way, we save the area needed for stored patterns and can import data from outside the chip.
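A PRBS block of this kind is usually a linear feedback shift register (LFSR). The sketch below is illustrative only: the text does not state which polynomial or word widths the chip uses, so the PRBS7 polynomial $x^7 + x^6 + 1$, the function names, and the way the noise bits are appended as extra low-order bits of a 5-bit word are all assumptions.

```python
def prbs7(seed=0x7F, nbits=127):
    """Generate nbits of a PRBS7 sequence (polynomial x^7 + x^6 + 1).

    The polynomial and seed are assumptions for illustration; the register
    must not be seeded with zero, or it would stay at zero forever.
    """
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    bits = []
    for _ in range(nbits):
        # Feedback is the XOR of the two tap positions (bits 7 and 6).
        newbit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | newbit) & 0x7F
        bits.append(newbit)
    return bits


def append_noise(word5, noise_bits):
    """Attach pseudo-random bits after a 5-bit input word (hypothetical layout)."""
    value = word5 & 0x1F
    for b in noise_bits:
        value = (value << 1) | b
    return value
```

For example, `append_noise(sample, prbs7(nbits=3))` widens one 5-bit sample to 8 bits using three generated bits, so only the 5-bit pattern has to be supplied from outside.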



Fig. 5-14 Testing diagram for measurement

# Chapter 6

## **Conclusion and Future Work**

#### **6.1 Architecture Design Summary**

This thesis proposes an adaptive LS-LMS FDE and a LOS Golay-MPIC TDE that satisfy the dual-mode (SC and HSI) specifications of IEEE 802.15.3c. The hardware of both methods can be shared between the SC and HSI modes to reduce hardware complexity. The BER and sampling rate meet the requirements of IEEE 802.15.3c.

The LS-LMS FDE combines the LMS adaptive algorithm with LS channel estimation. The LMS algorithm offers low computational complexity, and the LS channel estimation gives it sufficient convergence speed. The simulation results show that the LS-LMS FDE can achieve 6.01\*10<sup>-4</sup> BER in SC mode and 9.68\*10<sup>-3</sup> BER in HSI mode (both uncoded) at SNR 12 dB. The total area, excluding the 2 FFT modules, is about 415K gates, of which 69% is shared between the SC and HSI modes. The power consumption excluding FFT is only 81.27 mW when working at 400 MHz.
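The idea of seeding an LMS equalizer with an LS channel estimate can be sketched for a one-tap-per-subcarrier frequency-domain equalizer. This is a generic textbook form under stated assumptions, not the exact update rule, step size, or fixed-point arithmetic of the proposed design; the function names and the zero-forcing style initialization $W_k = 1/\hat{H}_k$ are illustrative.

```python
def ls_channel_estimate(rx_train, tx_train):
    """Per-subcarrier LS estimate: H_k = Y_k / X_k for a known training symbol."""
    return [y / x for y, x in zip(rx_train, tx_train)]


def init_taps(h_est):
    """Initial equalizer taps from the LS estimate (assumed W_k = 1 / H_k)."""
    return [1 / h for h in h_est]


def lms_update(taps, rx, desired, mu):
    """One LMS step per subcarrier: W_k <- W_k + mu * e_k * conj(Y_k).

    e_k = D_k - W_k * Y_k is the error between the desired (known or
    decided) symbol D_k and the equalized output.
    """
    new_taps = []
    for w, y, d in zip(taps, rx, desired):
        y = complex(y)
        e = d - w * y
        new_taps.append(w + mu * e * y.conjugate())
    return new_taps
```

Because the taps start from the LS estimate, the LMS loop only has to track the residual error instead of converging from zero, which is the reason for the faster convergence noted above.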

On the other hand, the Golay-MPIC TDE uses MPIC equalization with Golay sequence-aided channel estimation. Compared with a traditional time-domain equalizer, the MPIC algorithm reduces the hardware complexity, and the Golay sequence-aided channel estimation suppresses the AWGN. The Golay-MPIC TDE can achieve 2.53\*10<sup>-4</sup> BER in SC mode and 4.22\*10<sup>-5</sup> BER in HSI mode (both uncoded) at SNR 12 dB. The total area is about 405K gates, of which 99% is shared between the SC and HSI modes. The power consumption is only 88 mW when working at 400 MHz.
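Golay sequence-aided channel estimation relies on the complementary property of a Golay pair: the aperiodic autocorrelations of the two sequences sum to zero at every non-zero lag. The sketch below illustrates that property with a generic recursive construction. It does not use the specific sequences defined in IEEE 802.15.3c, and it assumes the two sequences are received separately without overlap; all function names are hypothetical.

```python
def golay_pair(n_stages):
    """Build a bipolar Golay complementary pair of length 2**n_stages."""
    a, b = [1], [1]
    for _ in range(n_stages):
        # Standard concatenation recursion: a' = [a, b], b' = [a, -b].
        a, b = a + b, a + [-x for x in b]
    return a, b


def convolve(x, h):
    """Linear convolution, modeling transmission through channel taps h."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def correlate(r, s):
    """c[k] = sum_n r[n + k] * s[n] for k = 0 .. len(r) - len(s)."""
    return [sum(r[n + k] * s[n] for n in range(len(s)))
            for k in range(len(r) - len(s) + 1)]


def estimate_channel(rx_a, rx_b, a, b):
    """Sum the two correlator outputs and scale by 2N to recover the taps."""
    scale = 2 * len(a)
    return [(ca + cb) / scale
            for ca, cb in zip(correlate(rx_a, a), correlate(rx_b, b))]
```

In the noiseless case the sidelobes cancel exactly and the channel taps are recovered. With noise, each tap estimate is a sum over 2N chips normalized by 2N, which averages down the AWGN contribution.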

#### **6.2 Chip Implementation Summary**

The proposed frequency-domain and time-domain equalizer architectures are integrated into two indoor wireless communication baseband receiver systems. For high speed and area efficiency, both system designs are implemented in a 65 nm 1P9M CMOS GP process with a 1.0 V supply.

The LS-LMS FDE chip occupies a core area of 7.81 mm<sup>2</sup> with 65.91% utilization, and the clock rate is 333 MHz. The data rates of the SC and HSI modes reach 3.52 Gbps and 5.28 Gbps, respectively, and the power consumption is 793.98 mW. The shared memory, which is used by the BD and FDE blocks, accounts for 32.68% of the baseband system.

The core area of the Golay-MPIC TDE chip is 7.95 mm<sup>2</sup> with 88.93% utilization, and the clock rate is 336.7 MHz. The data rates of the SC and HSI modes reach 3.52 Gbps and 5.28 Gbps, respectively, and the power consumption is 1.12 W. The BD, TDE, and PNC blocks use the same shared memory, which accounts for 37% of the baseband system.

#### 6.3 Future Work

In the future, we will consider modifications to the Golay-MPIC TDE algorithm to deal with the effects of time-varying channels and NLOS channels. For the chip implementation, we will reduce the core area and power consumption. A 10 Gbps data rate is also a future design target; higher-order QAM modulation, deeper pipelining, and more parallel architectures can help reach it.

## Reference

- [1] S.K. Yong and C.C. Chong, "An overview of multigigabit wireless through millimeter wave technology: Potentials and technical challenges," *EURASIP Journal on Wireless Communications and Networking*, vol. 2007, 2007
- [2] L. Caetano, and S. Li, "Benefits of 60 GHz," Sibeam Corp., Nov., 2005.
- [3] T.C. Wei, "Synchronization Design for DVB-T/H and Indoor Wireless Receiver," Master Thesis, Dept. of EE, National Chiao Tung University, Hsinchu, Taiwan, May. 2011.
- [4] IEEE 802.15.3c-2009, IEEE P802.15 Working Group for Wireless Personal Area Networks, Oct. 2009.
- [5] IEEE Std. P802.11 TGad D0.1, "PHY/MAC Complete Proposal Specification," IEEE, May, 2010.
- [6] D. Falconer, S. L. Ariyavisitakul, A. Benyamin-Seeyar, and B. Eidson, "Frequency Domain Equalization for Single-Carrier Broadband Wireless Systems", *IEEE Communications Magazine*, vol. 40, no. 4, 2002, pp. 58-66.
- [7] R, Fisher, "60 GHz WPAN Standardization within IEEE 802.15.3c," *Proc. Signals, Syst. and Electroni. Symp.*, pp. 103-105, Aug. 2007.
- [8] C. Koh, "The Benefits of 60 GHz Unlicensed Wireless Communications," Deployment White Papers, YDI Wireless.
- [9] S. Yong, "TG3c channel modeling sub-committee final report," IEEE P802.15

  Working Group for Wireless Personal Area Networks,

  IEEE802.15-07-0584-00-003c, Jan. 2007.

- [10] channel-model-matlab-code-release [Online], IEEE 802.15 WPAN Millimeter Wave Alternative PHY Task Group 3c (TG3c), Available: <a href="http://www.ieee802.org/15/pub/TG3c\_contributions.html">http://www.ieee802.org/15/pub/TG3c\_contributions.html</a>.
- [11] T.Y. Liu, "Design of Fast Convergent Adaptive Frequency-Domain Equalizer for Single Carrier Indoor Wireless Receiver," Master Thesis, Dept. of EE, National Chiao Tung University, Hsinchu, Taiwan, Oct. 2009.
- [12] M. G. Bellanger, "Adaptive Digital Filters", 2nd ed., New York: Marcel Dekker, 2001.
- [13] P. S. R. Diniz, "Adaptive Filtering: algorithms and practical implementation", 2nd ed., Boston: Kluwer Academic Publishers, 2002.
- [14] Y. Yang, Y. H. Chew, and T. T. Tjhung, "Adaptive frequency-domain equalization for space-time block-coded DS-CDMA downlink," *IEEE International Conference on Communications*, vol. 4, May 2005, pp. 2343 2347.
- [15] R. Kimura, R. Funada, Y. Nishiguchi, M. Lei, T. Baykas, C. S. Sum, J. Wang, A. Rahman, Y. Shoji, H. Harada and S. Kato, "Golay Sequence Aided Channel Estimation for Millimeter-Wave WPAN Systems," Personal, Indoor and Mobile Radio Communications, Sept. 2008.
- [16] K. Ishihara, K. Takeda, and F. Adachi, "Iterative Channel Estimation for Frequency-Domain Equalization of DSSS Signals," *IEICE TRANS. COMMUN.*, vol. E90–B, no.5, MAY 2007.
- [17] K. Amis and D.L. Roux, "Predictive decision feedback equalization for space time block codes with orthogonality in frequency domain Personal," *IEEE 16th International Symposium on Indoor and Mobile Radio Communications*, vol. 2, 11-14 Sept. 2005, pp. 1140 – 1144.
- [18] R. Kumar and M. Khan, "Mitigation of multipath effects in broadband wireless systems using quantized state adaptive equalization methods," *IEEE Aerospace*

- Conference, 2006, pp. 9.
- [19] F. H. Hsiao and Terng-Yin Hsu, "A Frequency Domain Equalizer for WLAN 802.11g Single-Carrier Transmission Mode", *IEEE International Symposium on Circuits and Systems*, vol. 5, May 2005, pp. 4606 4609.
- [20] S. U. H. Qureshi, "Adaptive Equalization," Proceedings of The IEEE, vol. 73, No. 9, Sept. 1985.
- [21] H.Y. Chen, "Design of Baseband Receiver for High-Mobility Wireless Metropolitan Area Network," Master Thesis, Dept. of EE, National Chiao Tung University, Hsinchu, Taiwan, Sep. 2009.
- [22] Z. Gao and Q. Wu, J. Wang, "A Novel Combination Algorithm Based on Chip Equalizer and Multi-path Interference Cancellation," *Signal Processing*, 2004. Proceedings, ICSP '04.
- [23] M. Golay, "Complementary Series," *IRE Transactions on Information Theory*, Apr. 1961.
- [24] S. Budisin, "Efficient Pulse Compressor for Golay Complementary Sequences," Electronics Letters, vol.27, no.3, pp.219-220, Jan. 1991.
- [25] B. Popovic, "Efficient Golay Correlator," Electronics Letters, vol.35, no.17, pp.1427-1428, Aug. 1999.
- [26] S. Haykin, "Adaptive Filter Theory", 4th ed., Upper Saddle River, N.J.: Prentice Hall, 2002.
- [27] A. Burg, S. Haene, W. Fichtner, and M. Rupp, "Regularized Frequency Domain Equalization Algorithm and its VLSI Implementation," *IEEE International Symposium on Circuits and Systems*, May 2007, pp. 3530 3533.
- [28] J. Coon, S. Armour, M. Beach, and J. McGeehan, "Adaptive frequency-domain equalization for single-carrier MIMO systems," *IEEE International Conference on Communications*, vol. 4, June 2004, pp. 2487 2491.

- [29] S.J. Huang and Sau-Gee Chen, "A High-Throughput Radix-16 FFT Processor with Normal Input/Output Ordering for IEEE 802.15.3c," VLSI Design/CAD Symposium, Aug., 2011.
- [30] P.G. Donato, M.A. Funes, M.N. Hadad and D.O. Carrica, "Optimised Golay Correlator", *Electron. Lett.*, 26<sup>th</sup> Mar. 2009 vol.45 No.7
- [31] Y. Zeng and T. S. Ng, "Pilot Cyclic Prefixed Single Carrier Communication: Channel Estimation and Equalization," *IEEE Signal Processing Letters*, vol. 12, issue 1, Jan. 2005, pp. 56 59.
- [32] J. H. Yu, K. J. Hou, and T. D. Chiueh, "Multi-way Baseband Receiver Design for IEEE 802.15.3c HSI-OFDM Mode," *VLSI Design/CAD Symposium*, Session 3-3, 2009.
- [33] H. Sari, G. Karam, and I. Jeanclaude, "Transmission Technique for Digital Terrestrial TV Broadcasting," *IEEE Communications Magzine*, Feb. 1995.
- [34] Y.S. Huang, "Design and Implementation of Synchronization Detection for IEEE 802.15.3c," Master Thesis, Dept. of EE, National Chiao Tung University, Hsinchu, Taiwan, Oct. 2010.