A New Algorithm for Skew Detection of Telugu Language Document based on Principle-axis Farthest Pairs Quadrilateral (PFPQ)

MSLB. Subrahmanyam; V. Vijaya Kumar; B. Eswara Reddy

Abstract

Skew detection and correction is one of the major preprocessing steps in document analysis and understanding. In this paper we propose a new method called “Principle-axis Farthest Pairs Quadrilateral (PFPQ)”, intended mainly for detecting skew in Telugu language documents and also applicable to other Indian languages. Telugu is one of the popular classical languages of India and is spoken by more than 80 million people. Its script consists of simple and complex characters with additional attached marks known as “maatras” and “vatthulu”, which makes skew detection for Telugu documents more complex than for other languages. PFPQ first performs pre-processing and divides the text into connected components. It then estimates the principal-axis farthest-pair quadrilateral of each connected component and removes the quadrilaterals that are too small or too large. Finally, using painting and directional smearing algorithms, PFPQ estimates the skew angle and de-skews the document. We tested the proposed algorithm extensively on five different kinds of documents (newspapers, magazines, textbooks, handwritten documents, and social media), as well as on documents in other Indian languages. The document images also contain complex content such as scientific formulas, statistical tables, trigonometric functions, and images. The results obtained are encouraging.
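
The abstract does not give the details of the quadrilateral construction or of the painting and smearing steps, so they are not reproduced here. As a rough illustration of the surrounding pipeline only (connected components, size filtering, angle estimation), the following Python sketch swaps in a plain principal-axis fit over component centroids in place of PFPQ. It assumes a binary image holding a single text line, and all function names and thresholds (`min_px`, `max_px`) are hypothetical rather than taken from the paper.

```python
import math
from collections import deque


def connected_components(img):
    """Return a list of 8-connected foreground components as (y, x) pixel lists."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            pixels = []
            while queue:  # breadth-first flood fill from the seed pixel
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(pixels)
    return comps


def centroid(pixels):
    """Mean (y, x) position of a component."""
    n = len(pixels)
    return (sum(p[0] for p in pixels) / n, sum(p[1] for p in pixels) / n)


def principal_axis_angle(points):
    """Angle in degrees of the principal axis of (y, x) points.

    Measured from the x-axis in image coordinates (y grows downward),
    so a positive angle means the line descends to the right.
    """
    n = len(points)
    my = sum(p[0] for p in points) / n
    mx = sum(p[1] for p in points) / n
    sxx = sum((p[1] - mx) ** 2 for p in points)
    syy = sum((p[0] - my) ** 2 for p in points)
    sxy = sum((p[1] - mx) * (p[0] - my) for p in points)
    return math.degrees(0.5 * math.atan2(2.0 * sxy, sxx - syy))


def estimate_skew_single_line(img, min_px=2, max_px=500):
    """Stand-in skew estimate (not PFPQ): drop components whose pixel count
    is outside [min_px, max_px], then fit a principal axis to the centroids."""
    comps = [c for c in connected_components(img) if min_px <= len(c) <= max_px]
    if len(comps) < 2:
        raise ValueError("need at least two components to estimate an angle")
    return principal_axis_angle([centroid(c) for c in comps])
```

For example, five two-pixel blobs that drop one row for every four columns give an estimate of atan(1/4), about 14.04 degrees, and an isolated noise pixel is discarded by the `min_px=2` filter. A full page with several text lines would need a per-line grouping step first, which this sketch leaves out.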

Bibliographic details

Source: International Journal of Image, Graphics and Signal Processing | 2018, Issue 3 | 12 pages
Authors: MSLB. Subrahmanyam; V. Vijaya Kumar; B. Eswara Reddy
Original format: PDF
Chinese Library Classification: TP391.41
