Title: | Semantic-Based Load/Store Scheduling for X86 Superscalar Microprocessors |
Authors: | Kuen-Cheng Chiang; Jean Jyh-Jiun Shann; Institute of Computer Science and Engineering |
Keywords: | superscalar; microprocessor; semantic-based; scheduling; load/store |
Issue Date: | 1999 |
Abstract: | The x86-compatible processor is the most widely used general-purpose architecture, yet its performance is severely limited by the latency of load/store operations. Many techniques, such as address-prediction and dependency-prediction load/store scheduling, have therefore been proposed to overcome the constraints of memory accesses. However, these scheduling strategies remain limited by address calculation and time-consuming memory accesses. In this thesis, we propose a new mechanism, called semantic-based load/store scheduling, to alleviate these limitations. In the x86 architecture, most of the local variables and parameters of a function are stored in stack memory. We observe that stack-accessing instructions with the same displacement access the same memory address; therefore, we can track the dependencies and forwarding paths between stack-accessing operations according to the displacement values within the instructions. Our simulation results show that semantic-based load/store scheduling alone achieves a speedup of 1.47 over the strategy of load bypassing stores with forwarding. Combining this scheduling with selective address/dependency prediction raises the speedup to 1.70. |
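The displacement-matching idea above can be illustrated with a minimal sketch (not the thesis implementation): a store queue for stack accesses is keyed by displacement alone, so a later load with a matching displacement can receive forwarded data before any full address calculation. The class and method names below are hypothetical, chosen only for illustration.

```python
from collections import OrderedDict

class StackForwardingQueue:
    """Tracks pending stack stores by displacement (e.g. off EBP/ESP),
    so loads to the same displacement can forward data directly."""

    def __init__(self):
        # Youngest pending store value per stack displacement.
        self.pending = OrderedDict()

    def issue_store(self, displacement, value):
        # Record the stored value; kept until the store retires.
        self.pending[displacement] = value
        self.pending.move_to_end(displacement)

    def issue_load(self, displacement):
        # Same displacement implies same stack address, so forward
        # from the matching pending store; None means go to memory.
        return self.pending.get(displacement)

    def retire_store(self, displacement):
        # Store has committed to memory; drop the forwarding entry.
        self.pending.pop(displacement, None)

q = StackForwardingQueue()
q.issue_store(-4, 42)    # e.g. mov [ebp-4], 42
q.issue_store(-8, 7)     # e.g. mov [ebp-8], 7
print(q.issue_load(-4))  # matching displacement: forwards 42
print(q.issue_load(-12)) # no matching store: None, access memory
```

A real design would also handle non-stack accesses, stack-pointer adjustments at call boundaries, and misprediction recovery; the sketch only shows the displacement-based dependency matching itself.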
URI: | http://140.113.39.130/cdrfb3/record/nctu/#NT880392020 http://hdl.handle.net/11536/65416 |
Appears in Collections: | Thesis |