Conference: International Workshop on Document Analysis Systems

Word and Sentence Extraction Using Irregular Pyramid


Abstract

This paper presents a further enhancement of our previously proposed algorithm. Moving beyond the extraction of word groups, and building on the same irregular pyramid structure, the new algorithm groups the extracted words into sentences. Its distinguishing feature is the ability to process text that varies widely in size, font, orientation and layout within a single document image. No assumption is made about the document type. The algorithm applies four fundamental concepts on top of the irregular pyramid structure. The first is the inclusion of background information. The second is closeness: text elements within a group are spatially closer to each other than to other text areas. The third is a "majority win" strategy, which suits a highly variable environment better than a constant threshold value. The fourth is uniformity and continuity among words belonging to the same sentence.
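The closeness concept can be illustrated with a small sketch. This is not the irregular pyramid algorithm described in the abstract, whose details are not given here. It is a simplified stand-in that links each text component to its nearest neighbour, so that grouping depends on relative rather than absolute distance. The function name `group_by_nearest_neighbour` and the representation of components as `(x, y)` centre points are assumptions made for illustration only.

```python
from math import hypot


def group_by_nearest_neighbour(centres):
    """Group components by linking each one to its nearest neighbour.

    `centres` is a list of (x, y) component centres (a hypothetical input
    format). No distance threshold is used: a component joins whichever
    other component is closest, whatever the absolute distance.
    """
    n = len(centres)
    if n < 2:
        return [[i] for i in range(n)]

    parent = list(range(n))  # union-find forest over component indices

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (xi, yi) in enumerate(centres):
        # Index of the closest other component.
        j = min(
            (k for k in range(n) if k != i),
            key=lambda k: hypot(centres[k][0] - xi, centres[k][1] - yi),
        )
        parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Two small-font words on one line and one large-font word on another.
centres = [
    (0, 0), (4, 0), (9, 0),            # small word 1 (letter gaps 4 and 5)
    (20, 0), (24, 0), (29, 0),         # small word 2 (word gap 11)
    (100, 50), (120, 50), (145, 50),   # large word (letter gaps 20 and 25)
]
print(group_by_nearest_neighbour(centres))
```

In this toy input, a single fixed distance threshold would have to be below 11 to keep the two small words apart and at least 25 to hold the large word together, which is impossible. Linking by relative closeness yields three groups of three components each, which hints at why a constant threshold is a poor fit when text size varies within one image.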
