Exploiting Structural Similarity for Automatic Information Extraction from Lists

Abstract

In this paper, we propose a novel technique that reduces the dependency on a knowledge base of ONDUX, the current state-of-the-art method for information extraction by text segmentation. While the existing approach relies mainly on a high overlap between pre-existing data and the input lists to build an extraction model, our approach exploits the structural similarity of text segments across the sequences of a list to align them into groups, achieving effective extraction with low dependency on pre-existing data. First, we propose a structural similarity measure between text segments and combine it with content similarity to assess how likely it is that two text segments in a list should be aligned in the same group. We then devise a data shifting-alignment technique in which positional information and the similarity scores are used to cluster text segments into groups before their labels are revised by an HMM-based graphical model. Experimental results on different datasets demonstrate that our method extracts information from lists with high performance and with less dependence on a knowledge base than the current state-of-the-art method.
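The abstract does not define the structural or content similarity measures, so the following is only an illustrative sketch of how two such scores might be combined. The token-shape encoding, the use of Jaccard overlap for both measures, the weight `alpha`, and all function names are assumptions made for this example and are not taken from the paper.

```python
def shape(token: str) -> str:
    """Coarse token shape: digits -> '9', upper -> 'A', lower -> 'a'.
    Runs of the same symbol are collapsed; punctuation is kept as-is.
    (Hypothetical encoding, chosen only for illustration.)"""
    out = []
    for ch in token:
        if ch.isdigit():
            s = "9"
        elif ch.isalpha():
            s = "A" if ch.isupper() else "a"
        else:
            s = ch
        if not out or out[-1] != s:
            out.append(s)
    return "".join(out)


def jaccard(a, b) -> float:
    """Jaccard overlap of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def structural_similarity(seg1: str, seg2: str) -> float:
    """Overlap of token shapes, ignoring the actual words."""
    return jaccard([shape(t) for t in seg1.split()],
                   [shape(t) for t in seg2.split()])


def content_similarity(seg1: str, seg2: str) -> float:
    """Overlap of lower-cased tokens."""
    return jaccard(seg1.lower().split(), seg2.lower().split())


def combined_similarity(seg1: str, seg2: str, alpha: float = 0.5) -> float:
    """Weighted mix of the two scores; alpha is an assumed parameter."""
    return (alpha * structural_similarity(seg1, seg2)
            + (1 - alpha) * content_similarity(seg1, seg2))
```

Under these assumptions, two street-address segments such as "12 Main St" and "48 Oak Rd" share no words but have identical token shapes, so they still receive a non-zero combined score. This mirrors the idea that segments can be grouped without matching entries in a knowledge base.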

Bibliographic details

Source
International Conference on Web Information Systems Engineering | 2013 | 14 pages
Authors
Dat T. Huynh; Jiajie Xu; Shazia Sadiq; Xiaofang Zhou
Original format: PDF
Chinese Library Classification: TP393-53
