Segmentation of Mixed Chinese/English Document Including Scattered Italic Characters

Abstract

Mixed Chinese/English documents are difficult to segment when many italic characters are scattered through them. Most prior work focuses on English documents. However, a mixed document differs from an English document, and some of its special features should be considered. This paper presents a new way to solve the problem. First, an appropriate character area is chosen for italic detection. Next, a two-step strategy is adopted: italic determination is done first, and if the character pattern is identified as italic, its slant angle is then estimated. Finally, the italic character pattern is corrected by a shear transform. A method that uses a two-step weighted projection profile histogram for italic determination is introduced, along with a fast algorithm for estimating the slant angle. Three large sample collections, of characters, character pairs, and documents respectively, are used to evaluate the method, and encouraging results are achieved.
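The pipeline in the abstract (decide whether a pattern is italic, estimate its slant only if it is, then undo the slant with a shear) can be illustrated with a minimal Python sketch. The abstract does not give the weighted projection profile or the fast slant estimator, so this sketch swaps in a generic stand-in: it scores how sharply the vertical projection profile peaks and searches candidate angles by brute force. All function names, the 12-degree probe angle, and the 30-degree search range are assumptions made for illustration, not the authors' method.

```python
import math


def shear(img, angle_deg):
    """Shear a binary image (list of rows of 0/1) horizontally.

    A row k pixels above the bottom row moves right by round(k * tan(angle)),
    so a positive angle leans the glyph to the right, as in italic type.
    """
    h, w = len(img), len(img[0])
    t = math.tan(math.radians(angle_deg))
    shifts = [round((h - 1 - y) * t) for y in range(h)]
    lo, hi = min(shifts), max(shifts)
    out = [[0] * (w + hi - lo) for _ in range(h)]
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            if v:
                out[y][x + shifts[y] - lo] = 1
    return out


def vertical_projection(img):
    """Count the foreground pixels in each column."""
    return [sum(col) for col in zip(*img)]


def sharpness(img):
    """Sum of squared column counts: upright strokes pile up in few columns."""
    return sum(c * c for c in vertical_projection(img))


def estimate_slant(img, max_deg=30):
    """Brute-force search (assumed, not the paper's fast algorithm).

    Returns the integer angle whose inverse shear gives the sharpest profile;
    ties are broken in favour of the smallest absolute angle.
    """
    def key(a):
        return (sharpness(shear(img, -a)), -abs(a))
    return max(range(-max_deg, max_deg + 1), key=key)


def correct_if_italic(img, probe_deg=12):
    """Two-step flow: cheap italic test first, slant estimation only if needed."""
    # Step 1 (assumed test): does undoing a typical italic slant sharpen the profile?
    if sharpness(shear(img, -probe_deg)) <= sharpness(img):
        return img, 0
    # Step 2: estimate the slant angle and undo it with a shear transform.
    angle = estimate_slant(img)
    return shear(img, -angle), angle
```

Calling `correct_if_italic` on a character image returns the image unchanged with angle 0 when the first step judges it upright, and otherwise returns the de-slanted image together with the estimated angle in degrees. Running the cheap test first avoids the angle search for the majority of characters that are not italic, which is the motivation the abstract gives for separating determination from estimation.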

Bibliographic details

Source
International Conference on Computer Processing of Oriental Languages | 2006 | 9 pages
Authors
Yong Xia; Chun-Heng Wang; Ru-Wei Dai
Original format
PDF
Chinese Library Classification
TP394

Similar literature

1. Gabor-based kernel self-optimization Fisher discriminant for optical character segmentation from text-image-mixed document [J]. Li Jun-Bao, Li Meng, Pan Jeng-Shyang. Optik: Zeitschrift fur Licht- und Elektronenoptik = Journal for Light- and Electronoptic. 2015, Issue 21.
2. A method for improving the accuracy of automatic indexing of Chinese-English mixed documents [J]. Yan ZHAO, Hui SHI. 中国文献情报 (English edition). 2012, Issue 4.
3. Segmentation of Mixed Chinese/English Document Including Scattered Italic Characters [C]. Yong Xia, Chun-Heng Wang, Ru-Wei Dai. Computer Processing of Oriental Languages: Beyond the Orient: The Research Challenges Ahead; Lecture Notes in Artificial Intelligence; 4285. 2006.
4. The development of Chinese word reading: Relations of sub-character processing, phonological awareness, morphological awareness, and orthographic knowledge to Chinese-English biscriptal reading [D]. Tong, Xiuli. 2008.
5. Comparison of prototype and rote instruction of English names for Chinese visual characters [O]. D W Duan, A J Cuvo. 1996.
6. Character Segmentation and Recognition of Alphanumeric-mixed Documents Based on Pattern Recognition Information [O]. Yasuo Hongo. 2002.
