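# National Science Council, Executive Yuan: Research Project Final Report

# Silicon Intellectual Property Research on Thermal Management Architecture Design for System-on-Chip

Project type: Individual project

Project number: NSC92-2215-E-009-042-

Project period: January 1, 2003 to July 31, 2003 (Republic of China year 92)

Institution: Department of Communication Engineering, National Chiao Tung University

Principal investigator: Herming Chiueh (闕河鳴)

Project participants: 張祐誠, 何錫錡, 王志軒, 張智閔, 劉明治, 詹謹鴻, 吳書豪, 林璟輝

Report type: Concise report

Handling: This project report is open to public inquiry

October 24, 2003 (Republic of China year 92)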

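# National Science Council, Executive Yuan: Research Project Final Report

# Silicon Intellectual Property Research on Thermal Management Architecture Design for System-on-Chip

Project type: Individual project

Project number: 92-2215-E-009-042-

Project period: January 1, 2003 to July 31, 2003

Institution: Institute of Communication Engineering, National Chiao Tung University

Principal investigator: Herming Chiueh (闕河鳴)

Project participants: 張祐誠, 何錫錡, 王志軒, 張智閔, 劉明治, 詹謹鴻, 吳書豪, 林璟輝

Report type: Concise report

Handling: This project report is open to public inquiry

October 22, 2003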

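# Silicon Intellectual Property Research on Thermal Management Architecture Design for System-on-Chip

Project number: 92-2215-E-009-042-

Project period: January 1, 2003 to July 31, 2003

Principal investigator: Herming Chiueh, Department of Communication Engineering, National Chiao Tung University

## 1. Abstract (translated from the Chinese)

With the development of VLSI, circuit density and system clock rates have risen steadily, giving rise to local overheating. Local overheating causes local variations on the integrated circuit, such as clock desynchronization and circuit parameter mismatch, which can in turn crash the entire system. For modern microprocessors, system-on-chip designs, and mixed-signal integrated circuits, this effect has become a major constraint on system design. The analysis of thermal effects in SoCs and of the mechanisms to manage them is therefore a topic well worth studying in the SoC era. For these reasons, this project targets SoC thermal management and develops the key components and circuit designs needed to provide a complete thermal management system for today's SoC design platforms. During this project year we completed the thermal management architecture, the SoC interface definition and design, and the temperature sensor design. The related designs are being taped out through CIC, and some of the results have been published in international journals and conferences.

Keywords: system-on-chip design, thermal effects, thermal management, integrated circuit design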

### **Abstract**

Increases in circuit density and clock speed in modern VLSI designs have brought thermal issues into the spotlight of high-speed integrated circuit design. Local overheating in one spot of a high-density circuit, such as a CPU or high-speed mixed-signal circuit, can cause a whole system to crash. Clock synchronization problems, parameter mismatches, and other coefficient changes due to temperature gradients generated by uneven heat-up of on-chip circuitry are the major reasons for system failure. The early stage of this project characterized the local heat-up problem in system-on-chip designs and evaluated the impact of temperature gradients on circuit behavior. A systematic solution to thermal management is proposed. Instead of the worst-case thermal management used in conventional systems, this design targets nominal power dissipation and requires the system to actively manage its thermal activity: it monitors thermal activity and reacts to specified conditions by controlling cooling mechanisms to ensure operation within specification. This work includes the design and implementation of circuits and architectures for several building blocks of SoC thermal management: a thermal management architecture, a system management bus interface for SoC platform design, and an on-chip temperature sensor design for deep sub-micron technology. An intellectual property block for thermal management is proposed and integrated into modern SoC CAD flows. The success of this project offers an opportunity for modern system-on-chip designs to incorporate thermal management techniques to enhance system stability and performance. This design yields intricate control and optimal management with little system overhead and minimum hardware requirements, and it provides the flexibility to support different management algorithms.

Keywords: System-on-chip, VLSI, thermal management, intellectual property

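## 2. Project Background and Objectives

With the development of VLSI, circuit density and system clock rates have risen steadily, giving rise to local overheating. Local overheating causes local variations on the integrated circuit, such as clock desynchronization and circuit parameter mismatch, which can in turn crash the entire system. For modern microprocessors, system-on-chip designs, and mixed-signal integrated circuits, this effect has become a major constraint on system design. The analysis of thermal effects in SoCs and of the mechanisms to manage them is therefore a topic well worth studying in the SoC era.

In the early stage of this research, we carried out a complete analysis of thermal effects on integrated circuits and of their impact on circuit parameters. We then proposed a systematic approach to thermal management within the SoC design flow and realized it as an on-chip subsystem architecture. Next, the design is integrated with the interfaces used in today's computer systems and SoC designs (such as the system management bus, SMBus) so that the result becomes a prototype IP. Designers of SoCs and computer systems can then integrate this IP into their final designs very easily within their own design flows.

The design focuses on using the fewest system resources (circuit complexity, layout area, I/O port requirements, and added power consumption) to detect system temperature and temperature differences in real time and to respond effectively to real-time thermal events (local overheating or an excessive local temperature difference). Unlike traditional minimal thermal management (emergency shutdown to protect the system), this research lets SoC designers manage system temperature and temperature differences effectively with minimal system resources, which in turn improves system stability and performance.

The project was carried out in two parts. The first part is the design and implementation of the thermal management architecture and the SoC interface. Based on the results of the preceding analysis, we completed the thermal management architecture design, integrated it with the selected system management bus as a soft IP, and implemented this soft IP as a standalone test chip to verify its function. The second part is the temperature sensor for deep sub-micron processes. A complete thermal management system can only operate effectively together with on-chip temperature sensors, and implementing the sensor circuit in processes below 0.6 µm is difficult. This project therefore designed the temperature sensor circuit for a 0.25 µm process as a hard IP. Together, the results of the first and second parts provide a complete thermal management system for today's SoC designs.

Section 3 of this report discusses the research methods and results, and Section 4 gives the conclusions and discussion. The appendix is the journal paper on the thermal management system published under this project. The implementation results are being taped out and compiled, and will be submitted to other international journals or conferences.

## 3. Research Methods and Results

As described in Section 2, the research methods and results fall into two parts: the thermal management system with its system management bus, and the temperature sensor. This section introduces each topic in turn and closes with the integrated circuit designs of both parts.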

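## (1) Thermal Management System

The thermal management system of this design is shown in Figure 1; its architecture is described in the published journal paper in the appendix. Much of the effort in this project went into improving the compatibility of the thermal management system through a standard interface, the System Management Bus (SMBus), which is widely used in system, power, and thermal management devices. SMBus consists of two signal lines, SMBCLK (unidirectional) and SMBDATA (bidirectional), as shown in Figure 2. It supports reading and writing data, and it reduces both the external pin count and the internal connections of the thermal management system. The SMBus master and slave devices are built according to the state diagrams in Figures 3 and 4. To keep the thermal management system itself from overheating and being damaged by a rapid temperature rise, we adopted a simple, forward-looking design. The power distribution must also be uniform so that no local hot spot occurs.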



Figure 1. Architecture of the SoC thermal management system





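Following the above, this standard interface is integrated into the thermal management system through a cell-based design flow using the TSMC 0.25 µm CMOS technology. Figure 5 shows the module partition of the system, and Table 1 lists the number of lines of code in each module. The design uses four multi-level controllers, one TMU, and one SMBus comprising one slave device and one master device to form the complete thermal management system, as shown in Figure 6. After full verification and simulation the design forms a soft IP that integrates easily with other systems and achieves the expected function.

Figure 5. Module partition

Table 1. Lines of code per module

| Module name   | Instances | Lines of code | Function                          |
|---------------|-----------|---------------|-----------------------------------|
| top.v         | 1         | 55            | System integration                |
| TM_SMBslave.v | 1         | 70            | SMBus slave and TMU integration   |
| SMBmaster.v   | 1         | 262           | SMBus master device               |
| SMBslave.v    | 1         | 208           | SMBus slave device                |
| tmu.v         | 1         | 470           | Thermal Management Unit (TMU)     |
| mlc.v         | 4         | 20            | Multi-level controller            |

Figure 6. System integration of the thermal management system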

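## (2) Temperature Sensor

This part comprises three components: a BiCMOS PTAT (proportional to absolute temperature) circuit, a MOS PTAT circuit, and an oversampling sigma-delta modulator. Each is described below.

## a. BiCMOS PTAT (Proportional to Absolute Temperature)

Figure 7. BiCMOS PTAT circuit

Figure 8. BiCMOS PTAT circuit with analog output interface

Figure 7 shows the PTAT architecture. M3 to M9 form an operational amplifier. Assuming this amplifier is ideal, VD2 = VD1, so the current through Q2 is proportional to absolute temperature. In the output stage shown in Figure 8, M47 and M2 form a current mirror that copies the PTAT current to the output so that it can be measured. In addition, to provide a reference current for the following oversampling sigma-delta modulator, we combine the negative temperature coefficient of VEB with the positive temperature coefficient of the PTAT current, and size M43 and M44 so that the output current has a zero temperature coefficient.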
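The relations behind this paragraph can be sketched with the textbook bipolar PTAT derivation. This is an illustration under stated assumptions, not the report's own analysis: the emitter-area ratio $N$, the resistors $R$ and $R_2$, and the mirror gains $a$ and $b$ are symbols introduced here, and resistor temperature coefficients are neglected.

$$
\Delta V_{EB} = V_{EB1} - V_{EB2} = \frac{kT}{q}\ln N,
\qquad
I_{PTAT} = \frac{\Delta V_{EB}}{R} = \frac{k\ln N}{qR}\,T
$$

$$
I_{REF} = a\,I_{PTAT} + b\,\frac{V_{EB}}{R_2},
\qquad
\frac{\partial I_{REF}}{\partial T} = 0
\;\Longrightarrow\;
a\,\frac{k\ln N}{qR} = -\,\frac{b}{R_2}\,\frac{\partial V_{EB}}{\partial T}
$$

Because $\partial V_{EB}/\partial T$ is negative, the two terms can cancel for a suitable ratio $a/b$, which is the role the text assigns to the sizing of M43 and M44.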



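Figure 9. PTAT current output

Figure 10. Temperature error measured from the PTAT current

Figure 11. Reference current versus temperature

Figure 9 shows that the PTAT current is highly linear with temperature. Figure 10 shows that the temperature error measured from the PTAT current stays within 1 °C. Figure 11 shows that the reference current varies only between 24.798 µA and 24.828 µA, which is also very small.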

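## **b. MOS PTAT**

Figure 12. MOS PTAT (I) circuit

Figure 13. MOS PTAT (II) circuit

Figure 12 shows the MOS PTAT architecture. M1 and M2 operate in the weak inversion region, where the I-V characteristic is exponential. M3 to M8 form an ORA (operational transresistance amplifier) that fixes the ratio of the currents through M1 and M2. As a result, the voltage across the resistor is proportional to absolute temperature. Figure 13 replaces the resistor of Figure 12 with M9 to M11, which greatly reduces chip area.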
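A short derivation shows why a fixed current ratio in weak inversion yields a PTAT voltage. It uses the standard weak-inversion model and is only a sketch: the slope factor $n$, the aspect ratios $S_1 = (W/L)_1$ and $S_2 = (W/L)_2$, and the current ratio $m = I_1/I_2$ are symbols assumed here, and body effect and drain-voltage dependence are ignored.

$$
I_D = I_{D0}\,\frac{W}{L}\,\exp\!\left(\frac{V_{GS}}{n V_T}\right),
\qquad V_T = \frac{kT}{q}
$$

$$
V_{PTAT} = V_{GS1} - V_{GS2}
= n V_T \ln\!\left(\frac{I_1}{I_2}\cdot\frac{S_2}{S_1}\right)
= n\,\frac{kT}{q}\,\ln\!\left(m\,\frac{S_2}{S_1}\right)
$$

With $m$ held constant by the amplifier and $S_2/S_1$ fixed by layout, the logarithm is a constant, so $V_{PTAT}$ scales linearly with $T$.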



Figure 14. $V_{PTAT}$ vs. $V_{DD}$ (resistor-based)

Figure 15. $V_{PTAT}$ vs. $V_{DD}$ (all-MOS)

Figure 16. $V_{PTAT}$ vs. temperature

Figures 14 and 15 show the simulated $V_{PTAT}$ versus $V_{DD}$ for the circuits of Figures 12 and 13, respectively. The supply voltage can be lowered to 1.2 V, and the PSRR (power supply rejection ratio) exceeds 50 dB. Figure 16 shows the simulated $V_{PTAT}$ versus temperature for both circuits. The resistor-based and all-MOS results are nearly identical, and both are proportional to temperature.

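## c. Oversampling Sigma-Delta Modulator

The oversampling sigma-delta architecture starts from a low-resolution ADC and, by means of a closed loop and a sampling frequency far above twice the input bandwidth, suppresses the in-band noise. This raises the signal-to-noise ratio (SNR) enough to meet high-resolution ADC specifications. The architecture is far less sensitive to process variation than other architectures, so it is widely used in low-frequency, high-resolution ADCs. A modulator used in a temperature sensor must keep working over a wide temperature range and must keep its own power dissipation as low as possible so that it does not become a heat source itself. In general, 8 to 10 bits at an operating speed below 1 MHz is sufficient, over a temperature range of roughly 0 °C to 150 °C. A first-order sigma-delta modulator is one feasible way to meet these requirements. Its architecture is shown in Figure 17, and the modified circuit is shown in Figure 18.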
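To make the closed-loop behaviour concrete, the following is a minimal behavioural sketch of an ideal first-order sigma-delta modulator in Python. It is not the circuit of Figure 17 or Figure 18: the normalized input range of [-1, 1], the function names, and the plain averaging used as a decimation filter are all assumptions made for illustration.

```python
def first_order_sigma_delta(x, n_samples):
    """Ideal first-order sigma-delta modulator for a constant input x in [-1, 1].

    Returns a list of output bits; bit 1 stands for feedback level +1 and
    bit 0 for feedback level -1 (assumed normalization).
    """
    if not -1.0 <= x <= 1.0:
        raise ValueError("input must lie within the feedback range [-1, 1]")
    integrator = 0.0
    feedback = 0.0  # no feedback before the first decision
    bits = []
    for _ in range(n_samples):
        # Integrate the difference between the input and the fed-back output.
        integrator += x - feedback
        # One-bit quantizer, the role played by the comparator.
        bit = 1 if integrator >= 0.0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits


def decode_average(bits):
    """Simplest possible decimation: average the bitstream back to [-1, 1]."""
    return 2.0 * sum(bits) / len(bits) - 1.0


if __name__ == "__main__":
    stream = first_order_sigma_delta(0.3, 4096)
    print(decode_average(stream))
```

Because the integrator state stays bounded, the average of the bitstream converges to the input as more samples are taken, which is why a slow quantity such as temperature can be resolved finely with a one-bit quantizer.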



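Figure 17. First-order sigma-delta modulator

The sub-circuits of the overall architecture fall into four parts: the operational amplifier, the discrete-time comparator, the offset cancellation circuit, and the sampling circuit. The key design factors are the operating frequency of the sampling circuit, the filtering capability of the integrator, and the resolution of the feedback circuit. Completing a modulator therefore requires deriving realizable specifications at the system level before designing the sub-circuits. Figure 19 shows the folded-cascode operational amplifier. Figure 20 shows the discrete-time comparator, which uses a latch formed by positive feedback to raise the operating speed and uses the clock to switch between operating modes.

Figure 19. Folded-cascode operational amplifier

Figure 20. Discrete-time comparator

The sigma-delta modulator converts a low-frequency analog input into a high-frequency digital signal. Half of the ratio between the two frequencies is the oversampling ratio (OSR). Figure 21 shows the simulation result for an OSR of 64, and Figure 22 shows the result for an OSR of 256.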
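As a rough cross-check of the two oversampling ratios against the 8 to 10 bit target, the textbook white-noise model of a first-order modulator can be evaluated. This is a sketch under assumptions, not a result from the report: it assumes an ideal one-bit quantizer, and the 1 MHz sampling frequency in the usage lines is a hypothetical value chosen only because the text asks for operation below 1 MHz.

```python
import math


def signal_bandwidth_hz(sampling_hz, osr):
    """Signal bandwidth implied by OSR = fs / (2 * fB)."""
    return sampling_hz / (2.0 * osr)


def first_order_sqnr_db(osr, quantizer_bits=1):
    """Ideal peak SQNR of a first-order modulator (white quantization noise model)."""
    return (6.02 * quantizer_bits + 1.76
            - 10.0 * math.log10(math.pi ** 2 / 3.0)
            + 30.0 * math.log10(osr))


def effective_bits(sqnr_db):
    """Effective number of bits corresponding to a given SQNR."""
    return (sqnr_db - 1.76) / 6.02


if __name__ == "__main__":
    for osr in (64, 256):
        sqnr = first_order_sqnr_db(osr)
        print(osr, signal_bandwidth_hz(1_000_000, osr), sqnr, effective_bits(sqnr))
```

Under this idealized model each doubling of the OSR adds about 9 dB, so moving from 64 to 256 buys roughly three extra bits. Real circuits fall short of these figures because of thermal noise, finite amplifier gain, and idle tones.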



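Figure 21. Simulation result for an oversampling ratio of 64

## (3) Complete Layout of the Thermal Management System and the Temperature Sensor

Figure 23. Layout of the integrated thermal management system

Figure 24. Layout of the temperature sensor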

Table 2. Specifications of the thermal management system

| Item                                       | Value                                          |
|--------------------------------------------|------------------------------------------------|
| Thermal management operation frequency     | 100 MHz                                        |
| SMB slave and master operation frequency   | 500 kHz                                        |
| Multi-level controller operation frequency | 10 kHz                                         |
| Technology                                 | TSMC 0.25 µm Mixed Signal (1P5M) CMOS          |
| Power consumption                          | 10 mW                                          |
| Transistor/Gate count                      | 152340.484375 / 17.28 = 8816                   |
| Chip area ( $\mu m^2$ )                    | Total: 1535 x 1535                             |
| Pins                                       | Total: 104 pins                                |
|                                            | DC power: 21 pins (core power)                 |
|                                            | AC power: 11 pins (pad power)                  |
|                                            | System signals: 72 pins                        |
|                                            | (1) TM and SMBslave: input 39 pins, output 7 pins |
|                                            | (2) SMBmaster: input 13 pins, output 13 pins   |
| Package type                               | CQFP128                                        |

Table 3. Specifications of the temperature sensor

| Item                    | Value             |
|-------------------------|-------------------|
| Operation frequency     | 50 MHz            |
| Temperature error       | 1 °C              |
| Power consumption       | 3.3 mW            |
| Chip area ( $\mu m^2$ ) | Total: 1000 x 900 |
| Pins                    | Total: 34 pins    |
|                         | DC power: 8 pins  |
|                         | AC power: 6 pins  |
|                         | Output: 20 pins   |
| Package type            | 40 S/B            |

## 4. Conclusion and Discussion

This project has successfully completed all planned work items. Of the research results, one paper has been published in an international journal and one has been submitted to an international conference, and project members have completed two master's theses. The remaining results are being compiled for submission to international conferences and journals. The published journal paper is attached as the appendix.

## 5. References

- 1. Herming Chiueh, Jeffrey Draper, and John Choma, Jr., "A Dynamic Thermal Management Circuit for System-on-Chip Designs," *Analog Integrated Circuits and Signal Processing*, Vol. 36, pp. 175-181, 2003.
- 2. 張佑誠, "Design and Implementation of Interface Circuits for Thermal Management Systems", Master Thesis, Department of Communication Engineering, National Chiao Tung University, Hsin-Chu, Taiwan, 2003.
- 3. 何錫錡, "A Fully Integrated Multi-Level Controller for System-on-Chip Thermal Management Designs", Master Thesis, Department of Communication Engineering, National Chiao Tung University, Hsin-Chu, Taiwan, 2003.

## A Dynamic Thermal Management Circuit for System-On-Chip Designs\*

## HERMING CHIUEH,<sup>1,†</sup> JEFFREY DRAPER<sup>2</sup> AND JOHN CHOMA, JR.<sup>2</sup>

<sup>1</sup>Department of Communication Engineering, National Chiao Tung University, HsinChu 30050, Taiwan

<sup>2</sup>Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089, USA

E-mail: chiueh@ieee.org; draper@isi.edu; johnc@usc.edu

Received April 27, 2002; Revised December 12, 2002; Accepted January 25, 2003

**Abstract.** A novel fully integrated dynamic thermal management circuit for system-on-chip design is proposed. Instead of worst-case thermal management used in conventional systems, this design yields continual monitoring of thermal activity and reacts to specified conditions. With the above system, we are able to incorporate on-chip power/speed modulation and integrated multi-stage fan controllers, which allows us to achieve nominal power dissipation and ensure operation within specification. Both architecture and circuitry are optimized for modern system-on-chip designs. This design yields intricate control and optimal management with little system overhead and minimum hardware requirements, as well as provides the flexibility to support different thermal management algorithms.

**Key Words:** thermal management, system-on-chip, VLSI system design

#### 1. Introduction

Increases in circuit density and clock speed in modern VLSI designs have brought thermal issues into the spotlight of high-speed integrated circuit design. Local overheating [1] in one spot of a high-density circuit, such as CPUs and high-speed mixed-signal circuits, can cause a whole system to crash due to resulting clock synchronization problems, parameter mismatches or other coefficient changes due to the uneven heat-up on a single chip [2].

Passive heat dissipation mechanisms, such as heat sinks and fans, are widely used in system design. Recently, advanced computer systems and circuit designs have incorporated active mechanisms to detect and properly handle an over-heating event [3]. Such a capability guarantees the system will operate within a certain package temperature specification to avoid failure. The ACPI (Advanced Configuration and Power Interface) standard is an example specification for active power and thermal management in personal computer systems [4,5]. However, the ACPI standard is quite limited, as it simply supports extra control to turn on or off a cooling mechanism and shift the alert level that is fed back to the system.

As die size and power density increase in this system-on-chip (SoC) era, the management of package temperature is no longer sufficient to solve the problem. Uneven heat-up and temperature offset on chip [1,6,7] have become a major factor limiting system performance. A good example is the recent Intel recall of Pentium III 1.13 GHz CPUs [8–10]. Recent research has focused on predicting on-chip temperature offset [1] and on electro-thermal simulations [7,11,12] that provide thermal distribution information to circuit simulation to achieve more accurate circuit behavior predictions. Such research is valuable for circuit design, but post-fabrication approaches to addressing on-chip temperature offsets are also needed as die size and power density increase. Without such an approach, some circuit behavior becomes unacceptable, which makes management and control of on-chip temperature offset as important as the reduction of package temperature.

In this research, we propose a dynamic thermal management circuit to provide a watchdog for system-on-chip designs. This circuit is optimized in architecture and circuit implementation to fit system-on-chip designs. The following items describe the technical justification of the thermal management design for SoC that we take into consideration. First, since an on-chip monitoring mechanism is included, complicated electrothermal or numerical thermal simulation [7,11,12] can be omitted. However, an analytical model [1,6,7] providing sufficient information like temperature range and quality guidelines for circuit designers is beneficial. Second, architecture and circuit implementation will be constrained to be compatible with the system's process (most likely to be a digital process), and minimum extra system resources should be used. Therefore, an interrupt-based system will be implemented, and reprogramming to provide flexibility and simplify the architecture will be supported. Finally, with respect to system integration, a complex cooling system that requires extra processing steps is not chosen, although this proposed system has the potential to cooperate with such novel micro-machining cooling methods [13]. Instead, a pure digital design for a fan controller is attractive if the circuit block is small enough. Target systems with power management can take advantage of such a cooling mechanism when combined with thermal management systems.

<sup>\*</sup>This research was supported by DARPA contract F30602-98-2-0180, USA and by National Science Council grant NSC-92-2215-E-009-042-, Taiwan.

<sup>†</sup>Author to whom correspondence should be addressed. Tel.: +886-35-712121 ext. 54597, +996-918422677, Fax: +886-35-5710116.

Given the above considerations, a circuit based on our previous research [14–16] with significant architecture enhancements is proposed. Those enhancements are described as follows. First, the number of temperature sensors has been increased to fit the need of more complex systems to monitor the temperature in several locations throughout the system. Second, the updated architecture provides simultaneous monitoring of multiple temperature sensors instead of the previous approach of single-sensor monitoring at a time. Third, circuits to monitor temperature offset between sensors and thresholds for interrupts that provide alerts other than package temperature have been added. Fourth, the threshold values have been expanded to have upper and lower limits for each sensor in order to achieve a more robust monitoring capability. Last, we have integrated a multi-channel, multi-stage fan controller, which we developed as an active cooling mechanism for maintaining a consistent package temperature.

These enhancements are aimed at solving thermal problems specific to SoC designs. The proposed thermal management subcomponents are encapsulated into a single IP block to foster use by the SoC market, in which the IP-based design approach has become very popular. The resulting discrete IP block facilitates verification of the architecture and thermal management algorithm through small low-cost test prototypes without compromising the applicability of the approach to SoC designs.

In Section 2, the function and architecture of the Thermal Management Circuit are described. In Section 3, the implementation plan of this system is addressed. A summary and conclusion follow in Section 4.

#### 2. Function and Architecture

#### 2.1. Architecture

The architecture of the thermal management circuitry is divided into two portions: the thermal management circuit blocks and the system integration blocks. The former represent the designed thermal management system, and the latter represent the interface to the target system. The designed thermal management system could be applied to different SoC designs. However, the system integration portion is modified to fit different target systems as well as prototyping implementations.

The block diagram of the dynamic thermal management circuit is shown in Fig. 1. The thermal management circuit blocks are the white boxes with shadows; the gray boxes represent the system integration blocks. The function of every block is described in Section 2.1.1 and Section 2.1.2.

### 2.1.1. Thermal Management Circuit Blocks

Fig. 1. Block diagram of the dynamic thermal management circuit.

- Temperature Acquisition Unit: This unit is simply an interface to acquire temperature from sensors. This circuit could be very different when applied to different temperature sensors. In our prototype system, a commercial temperature sensor with one-bit serial output will be used. The major function of this circuit is to convert and latch the temperature input to parallel digital values. In our prototype design, four sensors with 16-bit precision are supported. Sensor placement will be optimized by application of the developed analytical model [1,6] to compensate for the temperature offset between the heat source and sensor. Thus, the same analytical model will also predict the maximum reading error with respect to the highest junction temperature. These offsets will be processed in the Temperature Acquisition Unit in order to provide a complete thermal analysis of the target to the thermal management system.

- Programmable Unit: This unit contains 8 threshold registers to program the high and low threshold values for each temperature sensor. Two threshold registers for upper and lower bounds are provided for offset temperatures between different sensors. With these threshold values, the watchdog unit can generate interrupts for desired situations. Three fan-speed registers provide the setup for the integrated fan controllers. Interrupt mask and offset mask registers indicate which interrupts should be enabled and which set of temperature sensors should be included for offset temperature monitoring. Finally, decoding circuitry and necessary configuration registers provide the communication signals between the processor and other circuit blocks.
- Watchdog Unit: This unit contains two monitoring circuits: the threshold monitor for each temperature sensor, and the offset temperature monitor. Both circuits are designed to minimize circuit area while providing sufficient speed to compare the sensors provided in the system.
- Output and Interrupt Generator: This unit provides data outputs that are read by the system CPU, like temperature value, offset temperature value, and interrupt types.
- Active Cooling Unit-Fan Controller: There are two active cooling units: the integrated fan controller and the system speed controller provided by the system. The integrated fan controller circuit is based on our previous pure-digital fully integrated design [15].
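The interplay of the threshold registers, the mask registers, and the two watchdog monitors can be sketched in software as follows. This is a behavioural illustration only, not the paper's circuit: the function and field names are hypothetical, and treating the offset as a signed difference between two sensors that must stay within the lower and upper bounds is an assumption.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class SensorThreshold:
    """Low and high threshold for one sensor (two of the 8 threshold registers)."""
    low: int
    high: int


def threshold_alerts(readings, thresholds, interrupt_mask):
    """Return indices of sensors outside [low, high] whose mask bit is set."""
    alerts = []
    for index, (value, limit) in enumerate(zip(readings, thresholds)):
        enabled = (interrupt_mask >> index) & 1
        if enabled and not (limit.low <= value <= limit.high):
            alerts.append(index)
    return alerts


def offset_alerts(readings, offset_low, offset_high, offset_mask):
    """Return sensor pairs whose signed difference leaves [offset_low, offset_high].

    Only sensors whose bit is set in offset_mask take part in the comparison.
    """
    included = [i for i in range(len(readings)) if (offset_mask >> i) & 1]
    alerts = []
    for a, b in combinations(included, 2):
        difference = readings[a] - readings[b]
        if not (offset_low <= difference <= offset_high):
            alerts.append((a, b))
    return alerts


if __name__ == "__main__":
    temps = [50, 62, 48, 90]
    limits = [SensorThreshold(0, 85)] * 4
    print(threshold_alerts(temps, limits, 0b1111))
    print(offset_alerts(temps, -15, 15, 0b1111))
```

Masking a sensor out removes it from both the threshold check and every offset pair, which mirrors the purpose of the interrupt mask and offset mask registers described in the list above.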

#### 2.1.2. System Integration Blocks

- Temperature Sensors: Many kinds of temperature sensors can be used in this design. Our previous on-chip temperature sensor design is one option for system-on-chip design. For the prototype, we are using a commercial part with a system management bus interface [17].
- *CPU/System*: For pure system-on-chip design, the thermal management circuitry should be directly mapped into a CPU special-purpose register and interrupt space. In this prototype design, a memory-mapped approach will be implemented to emulate the proposed architecture. This approach also supports a flexible off-chip hardware and software platform for testing the circuit.
- Active Cooling Unit-System Speed Controller: For complete dynamic thermal management systems, the processor should be able to use the offset temperature data to tune the speed of different execution units to maintain the offset temperature within specification. Tradeoffs for slowing down some execution units are necessary in a critical temperature situation to prevent system failure. The mechanisms provided in the SoC implementation or processors should cooperate with this circuit to provide the function of managing the offset temperature.

### 2.2. Operating Modes

The operation of the thermal management system can be divided into three modes from the point of view of the processor. They are the programming, data acquisition, and interrupt modes. Each mode requires different timing and data order definitions, which will be implemented in the programmable unit of the system. The basic functionality of each mode is described below.

- Programming Mode: This mode provides the function to program the threshold registers for temperature sensors and offset temperatures, mask registers for interrupt and offset temperature monitoring, and fan stage assignments for the integrated fan controllers. To conserve address space, the multiple temperature sensor registers will be mapped to the same address, with the configuration register contents specifying which set is actually being accessed.
- Data Acquisition Mode: This mode provides the capability for the processor to read data and status from the thermal management circuit in a polling fashion. Information like current temperatures, offset temperatures, setups, and interrupt status can be acquired by the processor as often as it wishes to flexibly support different thermal management algorithms.
- Interrupt Mode: Interrupts are provided for designated alert conditions, and interrupt type information is also provided when the interrupt service routine reads the interrupt type register.
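The three modes can be illustrated with a toy register-file model in which a configuration register selects which sensor the shared addresses refer to. Everything specific in this sketch is hypothetical: the address values, the class and method names, and the choice of one interrupt-type bit per sensor are assumptions, since the paper does not give a register map.

```python
class ThermalRegisterFile:
    """Toy model of a banked register map; all addresses are hypothetical."""

    ADDR_CONFIG = 0x0    # bank select: which sensor the shared addresses refer to
    ADDR_HIGH = 0x1      # high threshold of the selected sensor
    ADDR_LOW = 0x2       # low threshold of the selected sensor
    ADDR_TEMP = 0x3      # latest reading of the selected sensor (read-only)
    ADDR_INT_TYPE = 0x4  # one bit per out-of-range sensor (read-only)

    def __init__(self, n_sensors=4):
        self.bank = 0
        self.high = [0xFFFF] * n_sensors
        self.low = [0] * n_sensors
        self.temp = [0] * n_sensors
        self.int_type = 0

    def write(self, addr, value):
        """Programming mode: set the bank select or a threshold."""
        if addr == self.ADDR_CONFIG:
            if not 0 <= value < len(self.temp):
                raise ValueError("no such sensor bank")
            self.bank = value
        elif addr == self.ADDR_HIGH:
            self.high[self.bank] = value
        elif addr == self.ADDR_LOW:
            self.low[self.bank] = value
        else:
            raise ValueError("address is read-only or unmapped")

    def read(self, addr):
        """Data acquisition mode: poll setup, temperature, or interrupt status."""
        if addr == self.ADDR_CONFIG:
            return self.bank
        if addr == self.ADDR_HIGH:
            return self.high[self.bank]
        if addr == self.ADDR_LOW:
            return self.low[self.bank]
        if addr == self.ADDR_TEMP:
            return self.temp[self.bank]
        if addr == self.ADDR_INT_TYPE:
            return self.int_type
        raise ValueError("unmapped address")

    def new_sample(self, sensor, value):
        """Interrupt mode: latch a sample; return True while any alert is pending."""
        self.temp[sensor] = value
        if self.low[sensor] <= value <= self.high[sensor]:
            self.int_type &= ~(1 << sensor)
        else:
            self.int_type |= 1 << sensor
        return self.int_type != 0
```

Sharing one address per register type across all sensors keeps the address footprint constant as sensors are added, at the cost of one extra configuration write whenever the processor switches sensors.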

#### 2.3. System Integration

The thermal management circuit architecture for an SoC design is proposed in the previous sections. However, to prove the validity of the complete architecture, some attention to detail must be given to the system integration blocks (gray boxes in Fig. 1). The technical decisions and justification for these blocks are given here, with a detailed implementation following in Section 3.

To reduce circuit complexity and die size in this area-constrained prototype chip, an off-the-shelf processor with conventional interface signals will be used for the prototype of the SoC design. In our previous design, a PowerPC interface was implemented [14], but in this prototype, a simpler memory-mapped interface will be implemented for more flexible hardware/software support. The basic function and architecture of the thermal management system are not affected, since only the system integration portions of the proposed design have been modified due to the prototyping limitations as discussed in Section 2.

Instead of using on-chip temperature sensors from previous research [19], we will use external commercial parts for monitoring temperatures. Although on-chip sensors provide direct temperature readout without constraining the data transfer protocol due to pin limitations, as the number of sensors increases, the die size limit makes using on-chip sensors impractical for this prototype. Furthermore, the interface to the external sensors requires very few pins, and the sensors are not the focus of this prototype.

## 3. Prototype Implementation

A prototype implementation of the proposed design is presented in this section. Due to the limitations of an area-constrained prototype TinyChip [18] and the cost of integrating the proposed IP into a complete SoC design, the prototype thermal management system is implemented separately from the processor (computer system). In Fig. 2, a detailed block diagram of a prototype design is presented. Block diagrams of the offset temperature monitor and threshold temperature monitor are shown in Figs. 3 and 4. Both designs achieve minimum area with sufficient speed to respond to system temperature changes.

As shown in Figs. 2–4, the proposed SoC IP is implemented in a discrete fashion while adhering to the IP-based design methodology. Using this approach, the proposed thermal management architecture is verified using the external temperature sensors and cooling mechanisms; such parts and processors are often treated as "hard" IPs in modern SoC design flows. Once the architecture and management algorithms are verified, this design can be easily integrated into IP-based platform design flows. The following remarks address the compatibility of the prototype implementation with the proposed architecture:

- The SoC computer system will be replaced with a hybrid design consisting of a commercial processor and a prototype TinyChip. Since the proposed design requires a special register and address mapping in the processor, a bus interface circuit between the Thermal Management Unit and processor is implemented to replace the special register and address mapping. The signal assignment between bus interface and Thermal Management Unit will accurately reflect the proposed design.
- Since the target temperature reading is on the processor, a matching hybrid temperature sensor part for the processor is used to replace the on-chip one. This situation introduces an extra System Management Bus Interface [17] between the Thermal Management Unit and temperature sensor used. This mechanism provides the ability to measure the target processor's temperature and does not impede the concept of the proposed design since the on-chip temperature sensor is implemented in our previous research [19], matching the qualification defined in Section 2.
- Even with the added blocks and replacement parts needed for an initial prototype implementation, the signal assignment and design specification is still valid for SoC design. The prototype implementation is simply used to verify the architecture as described in Section 2.
- In both designs of offset temperature monitor and threshold temperature monitor, the speed of the temperature sensors and speed of the programmable unit have been defined to use a single comparator circuit to monitor multiple sensors using serial I/O; thus an implementation of more than 4 channels in this design can be done with very little extra circuitry.

Fig. 2. Detailed block diagram of thermal management system.

Fig. 3. Offset temperature monitor.
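The idea of sharing one comparator among several serially read sensors can be sketched as a bit-serial magnitude comparison that is reused channel after channel. This is an illustration under assumptions, not the circuit of Figs. 3 and 4: unsigned 16-bit readings, MSB-first bit order, and all function names are choices made here.

```python
def to_msb_first_bits(value, width=16):
    """Serialize an unsigned integer into a list of bits, most significant first."""
    return [(value >> shift) & 1 for shift in range(width - 1, -1, -1)]


def serial_compare(a_bits, b_bits):
    """Compare two equal-length MSB-first streams one bit per clock.

    Returns 1 if a > b, -1 if a < b, and 0 if they are equal. The first bit
    position where the streams differ decides the result; later bits are ignored.
    """
    result = 0
    for a_bit, b_bit in zip(a_bits, b_bits):
        if result == 0 and a_bit != b_bit:
            result = 1 if a_bit > b_bit else -1
    return result


def scan_sensors(readings, thresholds, width=16):
    """Time-multiplex one comparator over all channels; True means above threshold."""
    flags = []
    for reading, threshold in zip(readings, thresholds):
        outcome = serial_compare(to_msb_first_bits(reading, width),
                                 to_msb_first_bits(threshold, width))
        flags.append(outcome > 0)
    return flags


if __name__ == "__main__":
    print(scan_sensors([100, 200, 50, 4000, 7], [150] * 5))
```

Adding a channel only lengthens the scan loop, which is consistent with the remark that more than 4 channels need very little extra circuitry as long as the scan stays fast relative to temperature changes.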

Fig. 4. Threshold temperature monitor.

With this sample prototype design, the proposed thermal management system for SoC design can be easily verified. Also, different approaches for a thermal management system can be easily implemented with the proposed architecture, since this system provides flexible ways for systems to read the temperature, set the threshold value for interrupt generation, and measure temperature values from different sensors. This design can be used to implement, but is not limited to, the ACPI protocol. For instance, the temperature threshold can be set to any number of values to represent any number of critical situations. Fuzzy logic control and other algorithms requiring more levels of alerts can be applied. Also, with the capability of actively acquiring temperature measurements at any time, the CPU can verify a desired temperature response when it executes a cooling action. With this feedback, actions like increasing or reducing fan speed and clock rates can be applied for more complex management algorithms.
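The feedback idea in the paragraph above can be sketched as a small software policy: pick a fan stage from a set of thresholds, then check the temperature response and escalate if the cooling action did not help. The policy, the threshold values, and the function names are hypothetical; the paper leaves the management algorithm to the system designer.

```python
def select_fan_stage(temperature, stage_thresholds):
    """Return the fan stage: the number of thresholds the temperature has reached.

    Stage 0 means the fan is off. stage_thresholds must be in ascending order.
    """
    return sum(1 for threshold in stage_thresholds if temperature >= threshold)


def adjust_after_feedback(stage, temp_before, temp_after, max_stage):
    """Escalate one stage if the last cooling action did not lower the temperature."""
    if temp_after >= temp_before and stage < max_stage:
        return stage + 1
    return stage


if __name__ == "__main__":
    thresholds = (50, 65, 80)  # hypothetical stage thresholds in degrees Celsius
    stage = select_fan_stage(70, thresholds)
    # The processor polls again after the cooling action and reacts to the result.
    stage = adjust_after_feedback(stage, temp_before=70, temp_after=72,
                                  max_stage=len(thresholds))
    print(stage)
```

The same structure extends to clock-rate control by replacing the fan stage with a speed setting, since both actions rely on re-reading the temperature after acting.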

#### 4. Conclusion

A novel fully integrated dynamic thermal management circuit for system-on-chip design has been described. The architecture and design details with their justification, as well as the final system integration for a complete thermal management system for SoC design, were presented. The innovative temperature offset monitoring provides a mechanism for system-on-chip designs to monitor the temperature offset across the system and enhance stability. With proper handling of this information, the system not only prevents failure but also enhances performance by controlling each subcomponent's operation speed with feedback from thermal information. With minimum overhead in chip area and system resources, this design provides intricate control and optimal thermal management on chip, upon which a complete dynamic thermal management system for modern computer designs can be implemented.

### References

- H. Chiueh, J. Draper, L. Luh, and J. Choma Jr., "A thermal evaluation of integrated circuits: On chip offset temperature measurement and modeling," in *Proc. 2nd Internationl Workshop* on *Design of Mixed-Mode Integrated Circuits and Applications*, 1998, pp. 109–113.
- V. Szekely, M. Rencz, and B. Courtois, "Thermal testing methods to increase system reliability," in *Proc. 13th IEEE SEMI-THERM Symposium*, 1997, pp. 210–217.
- J. Draper, J. Block, J. Koller, and C. Steele, "Thermal management in embedded systems using MEMS," in *Proc. Lecture Notes in Computer Science 1388 (IPPS/SPDP'98 Workshops Proceedings)*, 1998, pp. 900–901.
- Compaq, Intel, Microsoft, Phoenix, and Toshiba, "Advanced configuration and power interface specification," July 27, 2000.

- 5. J. Steele, "ACPI thermal sensing and control in the PC," in *Proc. Wescon 98*, 1998, pp. 169–182.
- H. Chiueh, J. Draper, L. Luh, and J. Choma Jr., "A novel model for on-chip heat dissipation," in *Proc. The 1998 IEEE Asia-Pacific Conference on Circuits and Systems*, 1998, pp. 779–782.
- C.-H. Tsai and S.-M. Kang, "Substrate thermal model reduction for efficient transient electrothermal simulation," in *Proc. 2000* Southwest Sympoium on Mixed-Signal Designs, 2000, pp. 185– 100
- I. Fried, "Glitch prompts Intel to recall 1.13-GHz Pentiums," http://news.cnet.com.
- I. Fried, "Hardware sites help Intel isolate chip problem," http://news.cnet.com.
- S. Musil, "The week in review: Intel hits a speed bump," http://news.cnet.com.
- J. W. Sofia, "Analysis of thermal transient data with synthesized dynamic models for semiconductor devices," in *Proc. 10th IEEE* SEMI-THERM, 1994, pp. 78–85.
- V. Szekely, A. Poppe, A. Pahi, A. Csendes, G. Hajas, and M. Rencz, "Electro-thermal and logic-thermal simulation of VLSI designs." *IEEE Transactions on VLSI Systems* 5, pp. 258–269, 1997
- Goodson, Santiago, T. W. Kenny, Carruthers, and Towe, "Electrokinetic micro coolers," presented at *International Interconnect Technology Conference*, San Francisco, CA, 2000.
- H. Chiueh, J. Draper, and J. Choma, Jr., "A programmable thermal management interface circuit for powerPC systems," in Proc. 6th International Workshop on Thermal Investigation of ICs and Systems, 2000.
- H. Chiueh, L. Luh, J. Draper, and J. Choma Jr., "A novel fully intergrated fan controller for advanced computer systems," in *Proc. Southwest Symposium on Mixed-Signal Design*, 2000, pp. 191–194.
- H. Chiueh, J. Draper, and J. Choma Jr., "Implementation of a temperature monitor interface circuit for powerPC systems," in Proc. The 43rd Midwest Symposium on Circuits and Systems, 2000
- SBS Implementers Forum, "System management bus (SMBus) specification," August 3, 2000.
- MOSIS. http://www.mosis.com.
- L. Luh, J. Choma Jr., J. Draper, and H. Chiueh, "A high-speed CMOS on-chip temperature sensor," in *Proc. European Solid-State Circuits Conference (ESSCIRC99)*, 1999, pp. 290–293.



**Herming Chiueh** received the B.S. degree from the Department of Electrophysics, National Chiao Tung University, Hsin-Chu, Taiwan, in 1992, and the M.S. and Ph.D. degrees from the Department of Electrical Engineering, University of Southern California, Los Angeles, U.S., in 1994 and 2002, respectively. From 1996 to 2002, he was with the Information Sciences Institute, University of Southern California, Marina del Rey, California, U.S. He is currently an Assistant Professor in the Department of Communication Engineering, School of Electrical Engineering and Computer Science, National Chiao Tung University, Hsin-Chu, Taiwan. His research interests include system-on-chip design methodology, thermal management for VLSI, and power-aware integrated circuits.



**Jeffrey Draper** holds a joint appointment as a Research Assistant Professor in the Department of Electrical Engineering at the University of Southern California and a Project Leader in the Computational Sciences Division at the USC Information Sciences Institute. Dr. Draper has led the VLSI effort on several large projects in the past 5 years and most recently directed the development of a 55-million transistor processing-in-memory (PIM) chip. Dr. Draper received his Ph.D. in Computer Engineering from the University of Texas at Austin. His research interests are PIM architectures, VLSI, thermal management, parallel computer architectures, and interconnection networks.



**John Choma** earned his B.S., M.S., and Ph.D. degrees in electrical engineering from the University of Pittsburgh in 1963, 1965, and 1969, respectively. He is Professor of Electrical Engineering at the University of Southern California, where he teaches undergraduate and graduate courses in electrical circuit theory and analog integrated electronics. Prof. Choma consults in the areas of broadband analog and high-speed digital integrated circuit analysis, design, and modeling.

Prior to joining the USC faculty in 1980, Prof. Choma was a senior staff design engineer in the TRW Microelectronics Center in Redondo Beach, California. His earlier positions include technical staff at Hewlett-Packard Company in Santa Clara, California, Senior Lecturer in the Graduate Division of the Department of Electrical Engineering of the California Institute of Technology, lectureships at the University of Santa Clara and the University of California at Los Angeles, and a faculty appointment at the University of Pennsylvania.

Prof. Choma, the author or co-author of some 135 journal and conference papers and the presenter of more than sixty invited short courses, seminars, and tutorials, is the 1994 recipient of the Prize Paper Award from the IEEE Microwave Theory and Techniques Society. He is the author of a Wiley Interscience text on electrical network theory and a forthcoming text on integrated circuit design for communication system applications. Prof. Choma has contributed several chapters to five edited electronic circuit texts, and he was an area editor of the IEEE/CRC Press Handbook of Circuits and Filters.

Prof. Choma has served the IEEE Circuits and Systems Society as a member of its Board of Governors, its Vice President for Administration, and its President. He has been an Associate Editor and Editor-in-Chief of the IEEE Transactions on Circuits and Systems, Part II. He is an Associate Editor of the Journal of Analog Integrated Circuits and Signal Processing and a former Regional Editor of the Journal of Circuits, Systems, and Computers.

A Fellow of the IEEE, Prof. Choma has been awarded the IEEE Millennium Medal and has received three awards from the IEEE Circuits and Systems Society: the Golden Jubilee Award, the 1999 Education Award, and the 2000 Meritorious Service Award. He is also the recipient of several local and national teaching awards. Prof. Choma is a Distinguished Lecturer in the IEEE Circuits and Systems Society.