Journal: The Journal of Artificial Intelligence Research

Unsupervised Methods for Determining Object and Relation Synonyms on the Web

Abstract

The task of identifying synonymous relations and objects, or synonym resolution, is critical for high-quality information extraction. This paper investigates synonym resolution in the context of unsupervised information extraction, where neither hand-tagged training examples nor domain knowledge is available. The paper presents a scalable, fully-implemented system that runs in O(KN log N) time in the number of extractions, N, and the maximum number of synonyms per word, K. The system, called Resolver, introduces a probabilistic relational model for predicting whether two strings are co-referential based on the similarity of the assertions containing them. On a set of two million assertions extracted from the Web, Resolver resolves objects with 78% precision and 68% recall, and resolves relations with 90% precision and 35% recall. Several variations of Resolver's probabilistic model are explored, and experiments demonstrate that under appropriate conditions these variations can improve F1 by 5%. An extension to the basic Resolver system allows it to handle polysemous names with 97% precision and 95% recall on a data set from the TREC corpus.