Venue: ACM International Conference on Information and Knowledge Management

Filtering and Clustering Relations for Unsupervised Information Extraction in Open Domain

Abstract

Information Extraction has recently been extended to new areas by loosening the constraints on the strict definition of the extracted information, which allows the design of more open information extraction systems. In this new domain of unsupervised information extraction, we focus on the task of extracting and characterizing a priori unknown relations between a given set of entity types. One of the challenges of this task is dealing with the large number of candidate relations obtained when extracting them from a large corpus. In this paper, we propose an approach for filtering such candidate relations based on heuristics and machine learning models. More precisely, we show that the best model for this task is a Conditional Random Field model, according to evaluations performed on a manually annotated corpus of about one thousand relations. We also tackle the problem of identifying semantically similar relations by clustering large sets of them. This clustering is achieved by combining a classical clustering algorithm with a method for the efficient identification of highly similar relation pairs. Finally, we evaluate the impact of our relation filtering on this semantic clustering with both internal and external measures. Results show that the filtering procedure doubles the recall of the clustering while keeping the same precision.
