Text Area Identification in Web Images

Abstract

With the explosive growth of the World Wide Web, millions of documents are published and accessed online. Statistics show that a significant part of Web text information is encoded in Web images. Because Web images have special characteristics that sometimes distinguish them from other types of images, commercial OCR products often fail to recognize the text they contain. This paper proposes a novel Web image processing algorithm that aims to locate text areas and prepare them for the OCR procedure with better results. Our methodology for text area identification has been fully integrated with an OCR engine and with an Information Extraction system. We present quantitative results for the performance of the OCR engine as well as qualitative results concerning its effects on the Information Extraction system. Experimental results obtained from a large corpus of Web images demonstrate the efficiency of our methodology.

Bibliographic details

Source: Hellenic Conference on AI (Artificial Intelligence) (SETN 2004); 5-8 May 2004; Samos, GR | 2004 | pp. 82-92 | 11 pages
Conference location: Samos (GR)
Authors: Stavros J. Perantonis; Basilios Gatos; Vassilios Maragos; Vangelis Karkaletsis; George Petasis
Author affiliation: Computational Intelligence Laboratory, Institute of Informatics and Telecommunications, National Research Center "Demokritos", 153 10 Athens, Greece
Original format: PDF
Language: English
Chinese Library Classification: Artificial intelligence theory

