Multimedia processor-based implementation of an error-diffusion halftoning algorithm exploiting subword parallelism

Jae-Woo Ahn; Wonyong Sung

Abstract

Multimedia processor-based implementations of digital image processing algorithms have become important: several multimedia processors are now available, and their flexibility lets them replace systems built on special-purpose hardware. Multimedia processors increase throughput by processing multiple pixels simultaneously with a subword-parallel arithmetic and logic unit architecture. The error-diffusion halftoning algorithm feeds quantized output signals back into the computation in order to faithfully convert a multi-level image to a binary image or to one with fewer quantization levels. Because each output depends on previously quantized pixels, it is difficult to achieve speedup with the multimedia extension. In this study, the error-diffusion halftoning algorithm is implemented for a multimedia processor using three methods: single-pixel, single-line, and multiple-line processing. The single-pixel approach is the closest to conventional implementations, but it uses the multimedia extension only in the filter kernel. The single-line approach computes multiple pixels of one scan-line simultaneously, but requires a complex algorithm transformation to remove the dependencies between pixels. The multiple-line method exploits parallelism by employing a skewed data structure and processing multiple pixels in different scan-lines. The Pentium MMX instruction set is used for quantitative performance evaluation, including run-time overheads and misaligned memory accesses. For the structurally sequential error-diffusion halftoning algorithm, a speedup of more than ten times is achieved compared to the software (integer C) implementation on a conventional processor.
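
The skewed, multiple-line idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' MMX code. It assumes the Floyd-Steinberg kernel (the abstract does not name the filter), 8-bit grayscale input stored as nested lists, and a fixed threshold of 128; the function names are hypothetical. With this kernel, pixel (r, c) depends only on pixels whose index t = c + 2*r is smaller, so all pixels that share one value of t lie in different scan-lines and are mutually independent.

```python
# Floyd-Steinberg weights in sixteenths: (row offset, column offset, weight).
# Assumption: the abstract does not say which error filter the paper uses.
WEIGHTS = ((0, 1, 7), (1, -1, 3), (1, 0, 5), (1, 1, 1))


def _diffuse_pixel(image, acc, out, r, c):
    """Quantize pixel (r, c) and push its error to not-yet-visited neighbours."""
    rows, cols = len(image), len(image[0])
    value = image[r][c] + acc[r][c] // 16   # input plus accumulated error
    out[r][c] = 255 if value >= 128 else 0  # 1-bit quantizer
    error = value - out[r][c]
    for dr, dc, w in WEIGHTS:
        rr, cc = r + dr, c + dc
        if rr < rows and 0 <= cc < cols:
            acc[rr][cc] += w * error        # accumulated in sixteenths


def halftone_raster(image):
    """Reference version: strictly sequential raster-scan order."""
    rows, cols = len(image), len(image[0])
    acc = [[0] * cols for _ in range(rows)]
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            _diffuse_pixel(image, acc, out, r, c)
    return out


def halftone_skewed(image):
    """Same output, visiting pixels by wavefront index t = c + 2*r.

    The inner loop touches one pixel per scan-line; those pixels do not
    depend on each other, so a subword-parallel unit could in principle
    handle them in a single step.
    """
    rows, cols = len(image), len(image[0])
    acc = [[0] * cols for _ in range(rows)]
    out = [[0] * cols for _ in range(rows)]
    for t in range(cols + 2 * (rows - 1)):
        for r in range(rows):
            c = t - 2 * r                   # each row is skewed by two pixels
            if 0 <= c < cols:
                _diffuse_pixel(image, acc, out, r, c)
    return out
```

Both functions produce identical halftones because every pixel receives all of its incoming error terms before it is quantized, whichever order is used. A real subword-parallel version would additionally have to pack the pixels of one wavefront into a register, which is presumably where the run-time overheads and misaligned memory accesses mentioned in the abstract come from.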

Bibliographic record

Source
IEEE Transactions on Circuits and Systems for Video Technology | 2001, Issue 2 | pp. 129-138 | 10 pages
Authors
Jae-Woo Ahn; Wonyong Sung
Original format: PDF
Text language: English
CLC classification: Radio electronics, telecommunications technology
Keywords
digital signal processing chips; image representation; multimedia systems; parallel architectures; Pentium MMX instruction set; binary image; digital image processing algorithms; error-diffusion halftoning algorithm; feedback; logic unit architecture; misaligned m;

Similar literature

1. Combined Application of Data Transfer and Storage Optimizing Transformations and Subword Parallelism Exploitation for Power Consumption and Execution Time Reduction in VLIW Multimedia Processors [J]. K. MASSELOS, F. CATTHOOR, C. E. GOUTIS, Journal of VLSI signal processing. 2004, Issue 1
2. High Performance Discrete Cosine Transform Operator Using Multimedia Oriented Subword Parallelism [J]. Shafqat Khan, Emmanuel Casseau, Daniel Menard, Advances in Computer Engineering. 2015, Issue 4
3. Architecture optimization for multimedia application exploiting data and thread-level parallelism [J]. Limousin C, Sebot J, Vartanian A, Journal of systems architecture. 2005, Issue 1
4. Implementation of a High Performance Subword Parallelism 64-Bit IMAC for Multimedia Service [C] . GUO Yuan, LI Shaokang, WANG Yiyu, International Conference on Computer Engineering and Technology;ICCET 2010 . 2010
5. Memory reference reduction and exploit parallelism for DSP and communication algorithms and systems implementations on digital signal processor [D] . Tang, Yiyan 2005
6. On the Accuracy and Parallelism of GPGPU-Powered Incremental Clustering Algorithms [O] . Chunlei Chen, Li He, Huixiang Zhang, 2017
7. A Fast, Cache-Aware Algorithm for the Calculation of Radiological Paths Exploiting Subword Parallelism [O] . Mark Christiaens, Bjorn De Sutter, Koen De Bosschere, 1998
