Title: | Acceleration and Implementation of JPEG2000 Encoder on TI DSP Platform |
Authors: | 刘建志 杭学鸣 电机学院IC设计产业专班 (IC Design Industry Program, College of Electrical Engineering) |
Keywords: | JPEG2000; TI DSP; DSP platform acceleration; EBCOT |
Issue Date: | 2006 |
Abstract: | As digital imaging applications have become widespread, a new still-image coding standard, JPEG2000, was proposed to improve compression efficiency and support more features. It provides good subjective quality at low bit rates (high compression ratios), and it offers fine-granularity scalability in both compression efficiency and transmission of the compressed bit stream. However, JPEG2000 has high computational complexity. In this thesis, we implement a JPEG2000 encoder on a TI DSP platform. We propose two speed-up methods and use the TI DSP optimization tools to accelerate the Tier-1 module, the most complex part of the JPEG2000 standard. We start from the OpenJPEG ver. 1.0 reference software, whose DWT module is already accelerated with a 1-D lifting scheme. We therefore focus on the Tier-1 module, which takes about 90% of the encoder's total computing time. We first study previously proposed methods and examine their effectiveness on our DSP platform. We then propose two improved methods: VGOSS (Variable Group Of Sample Skip) and a modified VGOSS. Both eliminate unnecessary checking cycles by recording the NBC (Need-to-Be-Coded) samples on a list. Furthermore, the sample index is reordered to facilitate fast execution. In the DSP implementation of the proposed methods, we use code acceleration techniques and DSP compiler-level optimization, and we tune the cache allocation to reduce memory access time. Experimental results for lossless coding show that the best configuration is up to 32 times faster than the original program without any optimization on the DSP platform. When the original program is compiled with the same DSP optimization settings and memory configuration, our fast algorithm still reduces the computation by 45%. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT009395506 http://hdl.handle.net/11536/80343 |
Appears in Collections: | Thesis |
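Illustrative note: the abstract says unnecessary checking cycles are eliminated by recording the NBC (Need-to-Be-Coded) samples on a list. The following minimal Python sketch shows only that general idea and is not the thesis's VGOSS implementation, whose details the abstract does not give. The function names, the flat flag array, and the "checks" counter are all hypothetical, chosen to contrast a full scan of a code block with a pass driven by a pre-recorded index list.

```python
# Hypothetical illustration: full scan versus list-driven coding pass.
# "needs_coding" is an assumed per-sample flag array for one code block;
# it does not reflect the actual Tier-1 data structures in OpenJPEG.

def full_scan_pass(needs_coding):
    """Baseline: test every sample's flag, code only the flagged ones."""
    coded, checks = [], 0
    for idx, flag in enumerate(needs_coding):
        checks += 1              # one test per sample, coded or not
        if flag:
            coded.append(idx)
    return coded, checks


def build_nbc_list(needs_coding):
    """Record the indices of need-to-be-coded samples once, in scan order."""
    return [idx for idx, flag in enumerate(needs_coding) if flag]


def list_driven_pass(nbc_list):
    """Visit only the recorded indices; skipped samples are never tested."""
    coded, checks = [], 0
    for idx in nbc_list:
        checks += 1              # one visit per recorded sample
        coded.append(idx)
    return coded, checks


if __name__ == "__main__":
    flags = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0]
    print(full_scan_pass(flags))
    print(list_driven_pass(build_nbc_list(flags)))
```

In this toy setting both passes code the same samples, but the list-driven pass touches only the recorded entries. Building the list has its own cost, so any benefit depends on reusing it across passes or building it as a by-product of earlier work, which is an assumption of this sketch rather than a claim from the abstract.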