International Conference on Web Information Systems Engineering

Exploiting Structural Similarity for Automatic Information Extraction from Lists

Abstract

In this paper, we propose a novel technique that reduces the dependency on a knowledge base of ONDUX, the current state-of-the-art method for information extraction by text segmentation. Whereas the existing approach relies mainly on a high overlap between pre-existing data and the input lists to build an extraction model, our approach exploits the structural similarity of text segments across the sequences of a list to align them into groups, achieving effective extraction with low dependency on pre-existing data. First, we propose a structural similarity measure between text segments and combine it with content similarity to assess how likely it is that two text segments in a list should be aligned in the same group. We then devise a data shifting-alignment technique in which positional information and the similarity scores are used to cluster text segments into groups, before their labels are revised by an HMM-based graphical model. Experimental results on different datasets demonstrate that our method extracts information from lists with high performance and with less dependence on a knowledge base than the current state-of-the-art method.
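The abstract does not define the structural similarity measure or say how it is combined with content similarity, so the following is only an illustrative sketch of the general idea and not the paper's method. It assumes a segment's structure can be approximated by a character-class pattern, that content similarity can be approximated by Jaccard overlap of word tokens, and that the two are mixed with a weight `alpha`. The names `signature`, `structural_similarity`, `content_similarity`, `combined_similarity`, and `alpha` are all hypothetical.

```python
import re
from difflib import SequenceMatcher


def signature(segment: str) -> str:
    """Collapse a segment into a coarse character-class pattern (assumed representation)."""
    classes = []
    for ch in segment:
        if ch.isdigit():
            cls = "9"
        elif ch.isalpha():
            cls = "A" if ch.isupper() else "a"
        elif ch.isspace():
            cls = " "
        else:
            cls = ch  # keep punctuation as-is; it often marks field boundaries
        # Merge runs of the same class so that length differences matter less.
        if not classes or classes[-1] != cls:
            classes.append(cls)
    return "".join(classes)


def structural_similarity(a: str, b: str) -> float:
    """Similarity of the two character-class patterns, in [0, 1]."""
    return SequenceMatcher(None, signature(a), signature(b)).ratio()


def content_similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word tokens, in [0, 1]."""
    tokens_a = set(re.findall(r"\w+", a.lower()))
    tokens_b = set(re.findall(r"\w+", b.lower()))
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def combined_similarity(a: str, b: str, alpha: float = 0.5) -> float:
    """Weighted mix of structural and content similarity (alpha is an assumed weight)."""
    return alpha * structural_similarity(a, b) + (1 - alpha) * content_similarity(a, b)
```

Under these assumptions, two author-like segments such as `"Smith, J."` and `"Jones, K."` share no tokens but have the same pattern, so the structural term alone can suggest that they belong in the same group even when no knowledge base covers either value.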
