Automatic Removal of Handwritten Annotations from Between-text-lines and Inside-text-line Regions of a Printed Text Document

Abstract

Recovering the original printed text from a document that carries handwritten annotations, and making it machine readable, remains a challenging problem in document image analysis, especially when the original document is unavailable. The overall aim of this research is therefore to detect and remove handwritten annotations that may appear in any part of a document without losing any of the original printed information. In this paper, we propose two novel methods for removing handwritten annotations located in between-text-line and inside-text-line regions. To remove between-text-line annotations, we propose a two-stage algorithm that detects the baselines of the printed text lines through connected-component analysis and then removes the annotations using a statistically computed distance between text-line regions. To remove inside-text-line annotations, we distinguish handwritten annotations from machine-printed text by extracting three features from the connected components, merged at word level, of every detected printed text line. The first feature is the density distribution, computed from the vertical projection profile; the second and third are the number of large vertical edges and the major vertical edge, both computed with Prewitt edge detection. The proposed method was evaluated on a dataset of 170 documents with complex handwritten annotations, achieving an overall accuracy of 93.49% in removing handwritten annotations and 96.22% in recovering the original printed text document.
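
The abstract names the three word-level features but does not define them, so the following is only a rough sketch of how such features might be computed for one binarized word image. The function names, the `strength` and `min_fraction` thresholds, and the reading of a "large vertical edge" as a long vertical run of strong Prewitt responses are all assumptions made for illustration, not the paper's formulation.

```python
# Illustrative sketch only: thresholds and feature definitions are assumptions,
# not the paper's exact formulation. Images are lists of rows of 0/1 values.

PREWITT_GX = ((-1, 0, 1),
              (-1, 0, 1),
              (-1, 0, 1))  # horizontal-gradient kernel; responds to vertical edges


def vertical_projection_profile(img):
    """Count ink pixels (value 1) in each column of a binary image."""
    height, width = len(img), len(img[0])
    return [sum(img[r][c] for r in range(height)) for c in range(width)]


def prewitt_vertical_response(img):
    """Absolute Prewitt horizontal gradient at interior pixels (borders stay 0)."""
    height, width = len(img), len(img[0])
    out = [[0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            g = sum(PREWITT_GX[i][j] * img[r - 1 + i][c - 1 + j]
                    for i in range(3) for j in range(3))
            out[r][c] = abs(g)
    return out


def vertical_edge_features(img, strength=3, min_fraction=0.5):
    """Return (number of large vertical edges, length of the longest edge).

    An "edge" here is a vertical run of pixels in one column whose gradient
    magnitude is >= strength; it counts as "large" when its length is at
    least min_fraction of the image height. Both thresholds are hypothetical.
    """
    resp = prewitt_vertical_response(img)
    height, width = len(img), len(img[0])
    large, longest = 0, 0
    for c in range(width):
        run = 0
        for r in range(height + 1):  # the extra step flushes the final run
            if r < height and resp[r][c] >= strength:
                run += 1
            else:
                if run >= min_fraction * height:
                    large += 1
                longest = max(longest, run)
                run = 0
    return large, longest
```

Under these assumptions, printed characters with straight stems would tend to yield a few long vertical edges and a peaked projection profile, while cursive handwriting would tend to yield shorter, more scattered responses; how the paper actually thresholds and combines the features is not stated in the abstract.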
