Conference: CIKM '10, ACM Conference on Information and Knowledge Management

Extracting Structured Information from Wikipedia Articles to Populate Infoboxes

Abstract

Roughly every third Wikipedia article contains an infobox - a table that displays important facts about the subject in attribute-value form. The schema of an infobox, i.e., the attributes that can be expressed for a concept, is defined by an infobox template. Often, authors do not specify all template attributes, resulting in incomplete infoboxes. With iPopulator, we introduce a system that automatically populates infoboxes of Wikipedia articles by extracting attribute values from the article's text. In contrast to prior work, iPopulator detects and exploits the structure of attribute values to independently extract value parts. We have tested iPopulator on the entire set of infobox templates and provide a detailed analysis of its effectiveness. For instance, we achieve an average extraction precision of 91% for 1,727 distinct infobox template attributes.
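
The abstract does not say how value structure is detected, so the following is only a rough illustration of the general idea, not iPopulator's actual algorithm. It assumes a value such as "5,538 (2009)" can be described by a coarse pattern of slots and literal punctuation, that the most frequent pattern among existing values of an attribute is taken as that attribute's structure, and that each slot is then filled by a separately extracted part. The token classes `NUM` and `WORD` and the function names `value_structure`, `dominant_structure`, and `fill` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical tokenizer: numbers (with , or .), alphabetic words, single punctuation marks.
TOKEN_RE = re.compile(r"\d[\d,.]*|[A-Za-z]+|[^\sA-Za-z\d]")


def value_structure(value):
    """Map an attribute value to a coarse structure tuple.

    Numbers become 'NUM', words become 'WORD', punctuation is kept literally.
    """
    parts = []
    for tok in TOKEN_RE.findall(value):
        if tok[0].isdigit():
            parts.append("NUM")
        elif tok.isalpha():
            parts.append("WORD")
        else:
            parts.append(tok)
    return tuple(parts)


def dominant_structure(values):
    """Return the most common structure among existing values of one attribute."""
    return Counter(value_structure(v) for v in values).most_common(1)[0][0]


def fill(structure, parts):
    """Place independently extracted parts into the NUM/WORD slots, in order."""
    it = iter(parts)
    return [next(it) if slot in ("NUM", "WORD") else slot for slot in structure]


# Illustrative, made-up values for a hypothetical "employees" attribute.
known_values = ["5,538 (2009)", "12,000 (2008)", "about 300"]
structure = dominant_structure(known_values)
filled = fill(structure, ["7,100", "2010"])
```

In this sketch the two numeric parts (a count and a year) could come from different sentences of the article, which is the sense in which value parts are extracted independently; how the parts are actually located in the text is left out here.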
