Journal: Vivek

Skew Estimation by Improved Boundary Growing for Text Documents in South Indian Languages

Abstract

Estimating the inclination of text lines in skewed documents written in south Indian languages (Kannada, Telugu, Tamil and Malayalam) is not as straightforward as computing the skew of English text documents. This is because of additional modifier characters, which are attached as bottom fixes, top fixes, or extensions and remain as disconnected protrusions of a main character. Under such circumstances, direct application of the Boundary Growing (BG) method fails to perform accurately, so we propose a corrective step employing Nearest Neighbor Clustering (NNC). BG and NNC jointly derive the coordinates that are input into a moments computation to estimate the angle of inclination. The new model is tested on a variety of documents in Kannada, Telugu, Tamil and Malayalam, including documents with noisy text, documents mixed with pictures, and text at different resolutions. For contrast, English texts are also considered.
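The abstract does not spell out the moments computation, so the following is only a minimal sketch of one common way to turn a set of coordinates into an inclination angle: the orientation derived from second-order central moments. The function name `skew_angle_from_points` and the synthetic input points are assumptions made for illustration. The BG and NNC stages that would supply the real coordinates are not reproduced here.

```python
import math


def skew_angle_from_points(points):
    """Estimate inclination (degrees) of a point set from central moments.

    Illustrative sketch only: `points` stands in for the coordinates that
    the BG and NNC steps would produce; those steps are not implemented.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")

    # Centroid of the point set.
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n

    # Second-order central moments.
    mu20 = sum((x - mean_x) ** 2 for x, _ in points)
    mu02 = sum((y - mean_y) ** 2 for _, y in points)
    mu11 = sum((x - mean_x) * (y - mean_y) for x, y in points)

    # Orientation of the principal axis: theta = 0.5 * atan2(2*mu11, mu20 - mu02).
    theta = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
    return math.degrees(theta)


if __name__ == "__main__":
    # Synthetic points lying on a line inclined at 5 degrees.
    slope = math.tan(math.radians(5.0))
    demo_points = [(float(x), slope * x) for x in range(100)]
    print(round(skew_angle_from_points(demo_points), 3))
```

The sign convention depends on the coordinate system: in image coordinates, where y grows downward, the returned angle has the opposite sign to the visually perceived tilt.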