Conference on Empirical Methods in Natural Language Processing

Combining Visual and Textual Features for Information Extraction from Online Flyers

Abstract

Information in visually rich formats such as PDF and HTML is often conveyed by a combination of textual and visual features. In particular, genres such as marketing flyers and infographics often augment textual information with visual cues such as color, size, and positioning. As a result, traditional text-based approaches to information extraction (IE) can underperform on such documents. In this study, we present a supervised machine learning approach to IE from online commercial real estate flyers. We evaluate the performance of SVM classifiers on the task of identifying 12 types of named entities using a combination of textual and visual features. Results show that the addition of visual features such as color, size, and positioning significantly increased classifier performance.
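
As a rough illustration of what "a combination of textual and visual features" could look like in practice, the sketch below builds one feature dictionary per text span of a flyer. Everything in it is an assumption made for illustration: the `TextSpan` fields, the individual features, and the `txt:`/`vis:` prefixes are hypothetical and are not taken from the paper, whose actual feature set is not given in this abstract.

```python
from dataclasses import dataclass


@dataclass
class TextSpan:
    """A piece of flyer text plus its rendering attributes (hypothetical schema)."""
    text: str
    font_size: float  # in points
    color: str        # hex RGB string such as "#ff0000"
    x: float          # left offset as a fraction of page width, 0..1
    y: float          # top offset as a fraction of page height, 0..1


def textual_features(span: TextSpan) -> dict:
    """Features computed from the characters alone."""
    return {
        "n_tokens": float(len(span.text.split())),
        "has_digit": float(any(ch.isdigit() for ch in span.text)),
        "is_upper": float(span.text.isupper()),
        "has_currency": float("$" in span.text),
    }


def visual_features(span: TextSpan, median_font_size: float) -> dict:
    """Features computed from how the text is rendered on the page."""
    # Split "#rrggbb" into three channels scaled to the range 0..1.
    red, green, blue = (int(span.color[i:i + 2], 16) / 255 for i in (1, 3, 5))
    return {
        # Size relative to the page's typical text, so headings stand out.
        "rel_font_size": span.font_size / median_font_size,
        "red": red,
        "green": green,
        "blue": blue,
        "x": span.x,
        "y": span.y,
    }


def combined_features(span: TextSpan, median_font_size: float) -> dict:
    """Merge both groups into one dictionary, prefixed by group name."""
    features = {f"txt:{k}": v for k, v in textual_features(span).items()}
    features.update(
        {f"vis:{k}": v for k, v in visual_features(span, median_font_size).items()}
    )
    return features


# Hypothetical span: a large red headline near the top of a flyer.
span = TextSpan("FOR LEASE 12,000 SF", font_size=24.0, color="#ff0000", x=0.1, y=0.05)
features = combined_features(span, median_font_size=12.0)
```

A dictionary like this could then be vectorized and passed to an SVM implementation for training. Keeping the two groups under separate prefixes makes it straightforward to train once with the textual features only and once with both groups, which is the kind of comparison the abstract reports.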