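Title: Research and Implementation of a DSP&CPU Compiler System (DSP&CPU编译系统的研究与实现)
Author: Ye Wei (叶崴)
Degree type: Doctoral
Defense date: 2005
Degree-granting institution: Institute of Acoustics, Chinese Academy of Sciences
Place of conferral: Institute of Acoustics, Chinese Academy of Sciences
Keywords: VLIW/SIMD architecture; instruction-level parallel compilation; porting; automatic SIMD code generation; modulo scheduling
Alternative title: Research and Implementation On DSP&CPU Compiler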
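Abstract (translated from Chinese): This thesis originates from the National 973 Program project "Basic Research on a DSP&CPU Chip with a Functionally Reconfigurable Architecture and Its Software System" (project No. G1999032900). The project covers the development of both a hardware system and a software system. The hardware system comprises subprojects such as overall DSP&CPU design and chip testing, and DSP&CPU architecture, functional organization, and hardware implementation. The software system comprises subprojects such as the compiler, the operating system, the assembler, and the debugging and simulation system. The DSP&CPU chip adopts a VLIW/SIMD architecture that integrates a DSP and a CPU, and it follows a hardware/software co-design approach, moving beyond the traditional design pattern of raising processor performance purely by adding hardware. This thesis focuses on the optimization and implementation of the compilation system for the DSP&CPU chip, studies in depth instruction-level parallel compilation techniques and compiler-driven automatic SIMD code generation, and applies part of the results to the ported GCC back end with fairly good results. The main contributions are:
1. An algorithm that handles the contiguous-access and alignment constraints of the media vector coprocessor's memory unit when the compilation system automatically generates SIMD code, using a data reorganization graph to implement and optimize the insertion of data reorganization instructions. With the DSP&CPU chip as the platform, the MPEG2 video coding algorithm is optimized: a new fast algorithm is applied in IDCT decoding, and the program is optimized with SIMD instructions.
2. An algorithm that uses modulo scheduling to implement software pipelining on the DSP&CPU chip. Given the architectural features (support for predication and speculation, homogeneous resources, a medium-sized register file, and no rotating register file), mutually exclusive predicated operations share the same functional unit to reduce resource pressure, code generation for single-exit loops is discussed, and modulo variable expansion is studied as a way to eliminate intra-iteration and inter-iteration false data dependences.
3. Optimization and partial redesign of the existing GCC back end, mainly in the machine description. A pipeline description language based on regular-expression syntax and finite automata is used to describe the pipeline structure of the DSP&CPU chip and to implement instruction-level parallel scheduling. New instruction templates are added to support optimizations such as peephole optimization, predicated execution, and branch delay scheduling.
4. A heuristic algorithm based on access graph partitioning that further improves the storage assignment of local variables. The thesis discusses in detail how rearranging the stack locations of local variables and using auto-increment/auto-decrement addressing modes reduces the number of address arithmetic instructions, and, through the concepts of access sequence and access graph, it reduces the problem to a minimum cover problem on the graph.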
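As a rough illustration of the software pipelining contribution, the sketch below computes the textbook lower bound on the initiation interval (II) used by modulo schedulers, plus the kernel unroll factor that modulo variable expansion needs when there is no rotating register file. It is not the thesis's algorithm. The function names, the `exclusive_pairs` parameter (a simplified way to model mutually exclusive predicated operations sharing one functional unit), and the sample numbers are all assumptions made for this example.

```python
from math import ceil


def res_mii(num_ops, num_units, exclusive_pairs=0):
    """Resource-constrained bound on II for homogeneous functional units.

    exclusive_pairs is a hypothetical simplification: each pair of mutually
    exclusive predicated operations is assumed to occupy a single unit slot.
    """
    return max(1, ceil((num_ops - exclusive_pairs) / num_units))


def rec_mii(recurrences):
    """Recurrence-constrained bound on II.

    Each recurrence is (total latency, total iteration distance) of one
    dependence cycle in the loop body.
    """
    return max((ceil(latency / distance) for latency, distance in recurrences),
               default=1)


def min_initiation_interval(num_ops, num_units, recurrences, exclusive_pairs=0):
    """Lower bound on II: the larger of the resource and recurrence bounds."""
    return max(res_mii(num_ops, num_units, exclusive_pairs), rec_mii(recurrences))


def mve_unroll_factor(lifetimes, ii):
    """Kernel copies needed so that no value is overwritten while still live.

    A value that lives for `lifetime` cycles overlaps ceil(lifetime / ii)
    iterations, so it needs that many distinct registers.
    """
    return max((ceil(lifetime / ii) for lifetime in lifetimes), default=1)
```

For a loop body with 10 operations on 4 identical units and one recurrence of latency 3 and distance 1, `min_initiation_interval(10, 4, [(3, 1)])` gives 3, and values with lifetimes of 5, 2, and 7 cycles then require `mve_unroll_factor([5, 2, 7], 3)`, that is, 3 kernel copies.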
Abstract (English): The work in this thesis is part of the National 973 Program project "The Basic Research on a DSP&CPU Chip with reconfigurable architecture to function and its software system" (Project No. G1999032900). The project includes the development of both a hardware system and a software system. The hardware system covers test research for the DSP&CPU chip and research on the architecture, functional organization, and implementation of the chip. The software system covers the compiler, the operating system, the assembler, the debugger, and the simulator. The DSP&CPU chip adopts a VLIW/SIMD architecture that effectively combines a CPU and a DSP, and it follows a hardware/software co-design approach, moving beyond the traditional design pattern of raising processor performance simply by adding more circuitry. This thesis focuses on the optimization and implementation of the DSP&CPU chip compilation system and discusses in depth the key techniques of instruction-level parallelizing compilation and automatic generation of SIMD code. Part of the research results were applied to the back end of the already ported GCC, with fairly good results. The main contributions of this thesis are:
1. A solution to the alignment constraint of the Media Vector Coprocessor's memory unit during automatic generation of SIMD code, using a data reorganization graph to implement and optimize the insertion of data reorganization instructions. The MPEG2 video encoding/decoding algorithm is optimized for the DSP&CPU chip, and SIMD instructions are used to optimize a new fast algorithm for the IDCT.
2. The use of modulo scheduling to implement software pipelining for the DSP&CPU chip. The thesis details the architectural features that affect software pipelining, such as predication and speculation, homogeneous resources, a moderately sized register file, and the lack of a rotating register file. Predication is used to let mutually exclusive operations share the same functional unit in order to reduce resource pressure. The code generation scheme for single-exit loops is discussed, and modulo variable expansion is applied to eliminate intra-iteration and inter-iteration false dependences.
3. Optimization and partial redesign of the existing GCC back end. Most of the work was done in the machine description. A pipeline description language based on regular-expression syntax and finite state machines is used to describe the pipeline structure of the DSP&CPU chip and to implement instruction-level parallel scheduling. New instruction templates were added to support peephole optimization, predicated execution, and branch delay scheduling.
4. A partition heuristic algorithm for the access graph that improves the storage assignment of local variables. By carefully arranging the placement of variables in the stack and accessing local variables with auto-increment/auto-decrement addressing modes wherever possible, the number of address arithmetic instructions decreases and program performance improves. With the concepts of Access Sequence and Access Graph, solving this problem is equivalent to finding the maximum cover of the Access Graph.
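To make the access sequence and access graph idea concrete, here is a minimal sketch of the classic offset assignment setting. It builds the access graph from an access sequence, counts how many consecutive accesses a given stack layout leaves uncovered by auto-increment/auto-decrement addressing, and derives a layout with a well-known greedy path-cover heuristic. This greedy heuristic is a stand-in, not the partition heuristic proposed in the thesis. All function names and the sample sequence are assumptions, and the cost model ignores the initial address load.

```python
from collections import Counter


def access_graph(seq):
    """Weight of each unordered pair = times the two variables are accessed back to back."""
    edges = Counter()
    for a, b in zip(seq, seq[1:]):
        if a != b:
            edges[frozenset((a, b))] += 1
    return edges


def layout_cost(seq, layout):
    """Consecutive accesses that need an explicit address arithmetic instruction.

    Stack neighbours (slot distance <= 1) are reachable by auto-increment/decrement.
    """
    slot = {v: i for i, v in enumerate(layout)}
    return sum(1 for a, b in zip(seq, seq[1:]) if abs(slot[a] - slot[b]) > 1)


def greedy_layout(seq):
    """Greedy path cover: take heavy edges first, keeping degree <= 2 and no cycles."""
    variables = sorted(set(seq))
    degree = {v: 0 for v in variables}
    parent = {v: v for v in variables}      # union-find, used to reject cycles
    neighbours = {v: [] for v in variables}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    ranked = sorted(access_graph(seq).items(),
                    key=lambda kv: (-kv[1], sorted(kv[0])))
    for edge, _weight in ranked:
        a, b = sorted(edge)
        if degree[a] < 2 and degree[b] < 2 and find(a) != find(b):
            parent[find(a)] = find(b)
            degree[a] += 1
            degree[b] += 1
            neighbours[a].append(b)
            neighbours[b].append(a)

    # Walk every path from one of its endpoints and concatenate the paths.
    layout, seen = [], set()
    for start in variables:
        if start in seen or degree[start] == 2:
            continue
        prev, cur = None, start
        while cur is not None:
            layout.append(cur)
            seen.add(cur)
            nxt = [n for n in neighbours[cur] if n != prev]
            prev, cur = cur, (nxt[0] if nxt else None)
    return layout
```

For the access sequence `a c a c b d b d`, the alphabetical layout `a b c d` leaves 6 consecutive accesses uncovered, while `greedy_layout` returns `a c b d`, which covers all of them.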
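Language: Chinese
Release date: 2011-05-07
Pages: 111
Content type: Thesis
Source URL: [http://159.226.59.140/handle/311008/900]
Collection: Institute of Acoustics_IOA Doctoral and Master's Theses_1981-2009 Doctoral and Master's Theses
Recommended citation
GB/T 7714
Ye Wei (叶崴). Research and Implementation of a DSP&CPU Compiler System (DSP&CPU编译系统的研究与实现) [D]. Institute of Acoustics, Chinese Academy of Sciences. Institute of Acoustics, Chinese Academy of Sciences. 2005.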