IEEE International Conference on Computer and Communications

Effectiveness of Visual Features on Diverse Reading Orders for Information Extraction

Abstract

Information extraction from unstructured documents, which are meant only for human readers, must be handled differently from extraction from structured documents. Unstructured documents contain visual cues that draw human attention and convey most of the information to readers. Several recent advances in information extraction from such documents rely on conventional natural language processing methods. However, little to no work has used the non-sequential relationships found only in unstructured documents for information extraction. In this study, we propose novel methods to capture the non-sequential relationships present in unstructured documents for Named Entity Recognition (NER) using a Conditional Random Field (CRF). We experiment with two datasets that have different types of logical reading order, and we compare three sets of features. The NER model that uses the proposed features achieves mean F1-scores of 68.15% on Retail Receipt documents and 85.54% on Air Ticket documents.
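
The abstract does not list the proposed features, so the following is only an illustrative sketch of what a "non-sequential" feature for a CRF tagger might look like. It assumes each token comes with an OCR-style bounding box. The `Token` class, the `nearest_below` helper, and the feature names are all hypothetical and are not taken from the paper. Each token gets one sequential feature (the previous token in reading order) and one layout feature (the closest token directly below it on the page). The result is a list of per-token feature dictionaries, which is the input shape accepted by CRF toolkits such as sklearn-crfsuite.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Token:
    """A word with a hypothetical OCR bounding box (origin at top-left)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def nearest_below(tokens: List[Token], i: int) -> Optional[int]:
    """Return the index of the closest token below token i that overlaps it
    horizontally, or None if there is no such token."""
    t = tokens[i]
    best, best_gap = None, float("inf")
    for j, u in enumerate(tokens):
        if j == i:
            continue
        overlaps = u.x < t.x + t.w and t.x < u.x + u.w
        gap = u.y - (t.y + t.h)  # vertical distance from t's bottom to u's top
        if overlaps and 0 <= gap < best_gap:
            best, best_gap = j, gap
    return best


def token_features(tokens: List[Token], i: int) -> Dict[str, object]:
    """Build one feature dictionary for token i (illustrative feature set)."""
    t = tokens[i]
    feats: Dict[str, object] = {
        "lower": t.text.lower(),
        "is_number": t.text.replace(".", "", 1).isdigit(),
    }
    # Sequential context: the previous token in reading order.
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].text.lower()
    else:
        feats["BOS"] = True
    # Non-sequential context: the token directly below on the page.
    j = nearest_below(tokens, i)
    if j is not None:
        feats["below_lower"] = tokens[j].text.lower()
    return feats


def sequence_features(tokens: List[Token]) -> List[Dict[str, object]]:
    """Feature dictionaries for a whole document, in reading order."""
    return [token_features(tokens, i) for i in range(len(tokens))]
```

On a receipt laid out in columns, a header such as "Total" and the amount printed beneath it are separated by other tokens in left-to-right reading order. A layout feature like `below_lower` links them directly, which is the kind of relationship the abstract describes as non-sequential.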