Conference: International Workshop on Document Analysis Systems

Script Identification in Printed Bilingual Documents

Abstract

Identification of the script in multilingual documents is essential for many language-dependent applications such as machine translation and optical character recognition. Techniques for script identification generally require large areas of text to operate on, so that sufficient information is available. This assumption does not hold in the Indian context, since words of two different scripts are interspersed in most documents. In this paper, techniques to identify the script of a single word are discussed. Two different approaches have been proposed and tested. The first method divides each word into three distinct spatial zones and uses the spatial spread of the word in the upper and lower zones, together with the character density, to identify the script. The second technique analyzes the directional energy distribution of a word using Gabor filters with suitable frequencies and orientations. Words in various font styles and sizes have been used to test the proposed algorithms, and the results obtained are quite encouraging.
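
The first approach can be illustrated with a minimal sketch. The abstract does not say how the three zones are delimited or how the features are combined, so everything below is an assumption made for illustration rather than the authors' method: the function name `zone_features`, the projection-profile threshold `frac`, and the three feature names are all hypothetical. The sketch takes a binary word image, treats the rows with the most ink as the middle zone, and reports how much ink falls above and below it along with the overall ink density.

```python
from typing import Dict, List

Image = List[List[int]]  # binary word image: 1 = ink, 0 = background


def zone_features(img: Image, frac: float = 0.5) -> Dict[str, float]:
    """Split a word image into upper/middle/lower zones and measure ink spread.

    Assumption (not from the paper): the middle zone spans the first to the
    last row whose ink count is at least `frac` times the peak row count.
    """
    profile = [sum(row) for row in img]  # horizontal projection profile
    total = sum(profile)
    if total == 0:
        raise ValueError("word image contains no ink pixels")

    peak = max(profile)
    dense_rows = [i for i, count in enumerate(profile) if count >= frac * peak]
    top, bottom = dense_rows[0], dense_rows[-1]  # inclusive middle-zone bounds

    upper_ink = sum(profile[:top])         # ink above the middle zone
    lower_ink = sum(profile[bottom + 1:])  # ink below the middle zone
    area = len(img) * len(img[0])

    return {
        "upper_ratio": upper_ink / total,
        "lower_ratio": lower_ink / total,
        "density": total / area,
    }


# Toy word image: two ascender rows, a three-row body, one descender row.
word = [
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 0, 1, 1, 0, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 0],
]
feats = zone_features(word)
```

A classifier for the two scripts would then be trained or thresholded on such features. The abstract gives no decision rule, so none is sketched here.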