Information Extraction from Wikipedia: Moving Down the Long Tail

Abstract

Not only is Wikipedia a comprehensive source of quality information, it has several kinds of internal structure (e.g., relational summaries known as infoboxes), which enable self-supervised information extraction. While previous efforts at extraction from Wikipedia achieve high precision and recall on well-populated classes of articles, they fail in a larger number of cases, largely because incomplete articles and infrequent use of infoboxes lead to insufficient training data. This paper presents three novel techniques for increasing recall from Wikipedia's long tail of sparse classes: (1) shrinkage over an automatically-learned subsumption taxonomy, (2) a retraining technique for improving the training data, and (3) supplementing results by extracting from the broader Web. Our experiments compare design variations and show that, used in concert, these techniques increase recall by a factor of 1.76 to 8.71 while maintaining or increasing precision.
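The first technique, shrinkage over a subsumption taxonomy, can be illustrated with a minimal sketch. The abstract does not give the procedure, so everything here is an assumption made for illustration: the toy taxonomy `PARENT`, the helper names `ancestors` and `shrinkage_training_set`, the choice to borrow only from ancestor classes, and the geometric `decay` weighting. It shows only the general idea of supplementing a sparse class's training data with down-weighted examples from related classes, and is not the authors' implementation.

```python
# Hypothetical toy taxonomy: class -> parent class (None marks the root).
PARENT = {"performer": "person", "scientist": "person", "person": None}


def ancestors(cls, parent=PARENT):
    """Yield (ancestor, distance) pairs, walking up from cls to the root."""
    dist = 0
    node = parent.get(cls)
    while node is not None:
        dist += 1
        yield node, dist
        node = parent.get(node)


def shrinkage_training_set(target, examples, decay=0.5, parent=PARENT):
    """Pool the target class's examples with down-weighted ancestor examples.

    examples maps a class name to a list of labeled training sentences.
    Returns a list of (sentence, weight) pairs. The target's own examples
    get weight 1.0; an ancestor at distance d gets weight decay ** d
    (an assumed weighting scheme, chosen only for illustration).
    """
    pooled = [(s, 1.0) for s in examples.get(target, [])]
    for anc, dist in ancestors(target, parent):
        weight = decay ** dist
        pooled.extend((s, weight) for s in examples.get(anc, []))
    return pooled


if __name__ == "__main__":
    toy_examples = {
        "performer": ["She was born in Austin."],
        "person": ["He was born in 1950.", "Born in Paris, she studied law."],
    }
    for sentence, weight in shrinkage_training_set("performer", toy_examples):
        print(weight, sentence)
```

A weighted training set like this could then be passed to any learner that accepts per-example weights, so that a sparse class such as the hypothetical `performer` still benefits from the better-populated `person` class.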

Bibliographic details

Source: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2008), 2008, pp. 713-721 (9 pages)
Authors: Fei Wu; Raphael Hoffmann; Daniel S. Weld
Original format: PDF
Chinese Library Classification: Information and knowledge dissemination
Keywords: information extraction; wikipedia; semantic web
