International Journal of Image, Graphics and Signal Processing

A New Algorithm for Skew Detection of Telugu Language Document based on Principle-axis Farthest Pairs Quadrilateral (PFPQ)


Abstract

Skew detection and correction is one of the major preprocessing steps in document analysis and understanding. In this paper we propose a new method called “Principle-axis Farthest Pairs Quadrilateral (PFPQ)”, intended mainly for detecting skew in Telugu-language documents and also in documents of other Indian languages. Telugu is one of the popular and classical languages of India and is spoken by more than 80 million people. Its script consists of simple and complex characters with extra marks attached, known as “maatras” and “vatthulu”, which makes skew detection for Telugu documents more complex than for other languages. PFPQ first performs preprocessing and divides the text into connected components, estimates the principle-axis farthest-pairs quadrilaterals, and then removes the small and large quadrilaterals of the connected components. Then, using painting and directional smearing algorithms, PFPQ estimates the skew angle and de-skews the document. We tested the proposed algorithm extensively on five different kinds of documents collected from various categories, i.e., newspapers, magazines, textbooks, handwritten documents, and social media, as well as on documents in other Indian languages. The images of these documents also contain complex content such as scientific formulas, statistical tables, trigonometric functions, and images, and encouraging results were obtained.
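
The abstract gives only the outline of the pipeline, so the following Python sketch is a simplified stand-in and not the PFPQ algorithm itself. It assumes a binarized image given as a list of rows of 0/1 values and a single line of text. It filters connected components by pixel count (in place of the quadrilateral-based filter) and takes the principal axis of the remaining component centroids as the skew estimate (in place of the painting and directional smearing steps). The function names and the `min_size` and `max_size` thresholds are hypothetical.

```python
import math
from collections import deque


def connected_components(grid):
    """Return the 8-connected foreground (value 1) components as lists of (row, col)."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def principal_axis_angle(points):
    """Orientation in degrees of the principal axis of a set of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    # Standard orientation formula for the major axis of the scatter matrix.
    return math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy))


def estimate_skew(grid, min_size=2, max_size=50):
    """Estimate skew in degrees from the centroids of size-filtered components.

    Assumption: min_size and max_size are illustrative pixel-count thresholds
    standing in for the paper's removal of small and large quadrilaterals.
    With image coordinates (y grows downward), a positive angle means the
    text line descends toward the right.
    """
    kept = [p for p in connected_components(grid)
            if min_size <= len(p) <= max_size]
    if len(kept) < 2:
        raise ValueError("need at least two components to estimate skew")
    centroids = [
        (sum(x for _, x in p) / len(p), sum(y for y, _ in p) / len(p))
        for p in kept
    ]
    return principal_axis_angle(centroids)
```

Calling `estimate_skew(grid)` on a binarized text line returns an angle in degrees; rotating the image by the negative of that angle would de-skew it. A page with several text lines would first need to be split into lines, because centroids pooled from different lines no longer lie along one axis.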
