Venue: Conference on Empirical Methods in Natural Language Processing

Combining Visual and Textual Features for Information Extraction from Online Flyers


Abstract

Information in visually rich formats such as PDF and HTML is often conveyed by a combination of textual and visual features. In particular, genres such as marketing flyers and infographics often augment textual information with visual cues such as color, size, and positioning. As a result, traditional text-based approaches to information extraction (IE) may underperform on such documents. In this study, we present a supervised machine learning approach to IE from online commercial real estate flyers. We evaluated the performance of SVM classifiers on the task of identifying 12 types of named entities using a combination of textual and visual features. Results show that the addition of visual features such as color, size, and positioning significantly increased classifier performance.
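To make the idea of combining textual and visual features concrete, here is a minimal sketch of how one text span from a flyer might be turned into a single feature dictionary. The abstract does not list the paper's actual features, so the `TextSpan` schema, the feature names, and the `body_font_size` baseline are all hypothetical choices made for illustration, not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class TextSpan:
    """A piece of flyer text plus its rendering attributes (hypothetical schema)."""
    text: str
    font_size: float  # in points
    color: tuple      # (r, g, b), each component 0-255
    x: float          # left offset as a fraction of page width, 0.0-1.0
    y: float          # top offset as a fraction of page height, 0.0-1.0


def textual_features(span):
    """Features computed from the characters alone."""
    text = span.text.strip()
    return {
        "lower=" + text.lower(): 1.0,
        "has_digit": float(any(ch.isdigit() for ch in text)),
        "has_currency": float("$" in text),
        "is_upper": float(text.isupper()),
        "n_tokens": float(len(text.split())),
    }


def visual_features(span, body_font_size=10.0):
    """Features computed from rendering attributes.

    body_font_size is an assumed baseline used to express size relatively.
    """
    r, g, b = span.color
    return {
        "rel_font_size": span.font_size / body_font_size,
        "color_r": r / 255.0,
        "color_g": g / 255.0,
        "color_b": b / 255.0,
        "pos_x": span.x,
        "pos_y": span.y,
    }


def combined_features(span, use_visual=True):
    """Merge both feature groups into one sparse feature dict."""
    features = textual_features(span)
    if use_visual:
        features.update(visual_features(span))
    return features


# A large red price near the top of the page (made-up example values).
span = TextSpan("$1,250,000", font_size=24.0, color=(255, 0, 0), x=0.1, y=0.05)
text_only = combined_features(span, use_visual=False)
with_visual = combined_features(span)

# Show which features the visual group contributes.
print(sorted(set(with_visual) - set(text_only)))
```

Feature dictionaries like these could then be vectorized and passed to an SVM classifier. Toggling `use_visual` gives a simple way to compare a text-only baseline against the combined feature set, which mirrors the comparison the abstract describes.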
