
Text Area Identification in Web Images


Abstract

With the explosive growth of the World Wide Web, millions of documents are published and accessed online. Statistics show that a significant part of Web text information is encoded in Web images. Because Web images have special characteristics that sometimes distinguish them from other types of images, commercial OCR products often fail to recognize the text they contain. This paper proposes a novel Web image processing algorithm that aims to locate text areas and prepare them for OCR so that recognition yields better results. Our methodology for text area identification has been fully integrated with an OCR engine and with an Information Extraction system. We present quantitative results for the performance of the OCR engine as well as qualitative results concerning its effects on the Information Extraction system. Experimental results obtained from a large corpus of Web images demonstrate the efficiency of our methodology.
