Combining Visual and Textual Features for Information Extraction from Online Flyers


Abstract

Information in visually rich formats such as PDF and HTML is often conveyed by a combination of textual and visual features. In particular, genres such as marketing flyers and infographics often augment textual information through color, size, positioning, and similar cues. As a result, traditional text-based approaches to information extraction (IE) can underperform. In this study, we present a supervised machine learning approach to IE from online commercial real estate flyers. We evaluated the performance of SVM classifiers on the task of identifying 12 types of named entities using a combination of textual and visual features. Results show that adding visual features such as color, size, and positioning significantly increased classifier performance.
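
The abstract does not list the exact features or tooling used, so the following is only a minimal sketch of the general idea: merging textual and visual cues for one piece of flyer text into a single feature dictionary that a classifier could consume. The `TextSpan` schema, the feature names, and the sample values are all hypothetical illustrations, not the authors' feature set.

```python
from dataclasses import dataclass
import re


@dataclass
class TextSpan:
    """One rendered piece of flyer text (hypothetical schema, not from the paper)."""
    text: str
    font_size: float   # in points
    color: tuple       # (r, g, b), each 0-255
    x: float           # left edge, in page units
    y: float           # top edge, in page units


def textual_features(span):
    """Features computed from the characters alone."""
    feats = {f"tok={t}": 1.0 for t in span.text.lower().split()}
    feats["has_digit"] = 1.0 if re.search(r"\d", span.text) else 0.0
    feats["all_caps"] = 1.0 if span.text.isupper() else 0.0
    return feats


def visual_features(span, page_width, page_height, body_font_size):
    """Features computed from how the text is rendered (assumed cues)."""
    return {
        "rel_font_size": span.font_size / body_font_size,
        "is_black": 1.0 if tuple(span.color) == (0, 0, 0) else 0.0,
        "rel_x": span.x / page_width,
        "rel_y": span.y / page_height,
    }


def combined_features(span, page_width, page_height, body_font_size):
    """Merge both feature groups into one dictionary."""
    feats = textual_features(span)
    feats.update(visual_features(span, page_width, page_height, body_font_size))
    return feats


# Illustrative input: a large red headline near the top of a flyer.
headline = TextSpan("FOR LEASE 12000 SF", 36.0, (200, 0, 0), 50.0, 40.0)
features = combined_features(
    headline, page_width=600.0, page_height=800.0, body_font_size=12.0
)
```

A dictionary like `features` could then be turned into a numeric vector (for example with scikit-learn's `DictVectorizer`) and passed to an SVM implementation such as `LinearSVC`; that pairing is an assumption here, since the abstract only states that SVM classifiers were used.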


Bibliographic details

Source
Conference on Empirical Methods in Natural Language Processing | 2014 | pp. 1924-1929 | 6 pages
Authors
Emilia Apostolova; Noriko Tomuro
Original format: PDF

