Using Wikipedia to Bootstrap Open Information Extraction

Daniel S. Weld; Raphael Hoffmann; Fei Wu

Abstract

By converting unstructured, natural-language text to relational form, information extraction (IE) enables many powerful data management techniques. However, in order to scale IE to the Web, we must focus on open IE: a paradigm that tackles an unbounded number of relations, eschews domain-specific training data, and scales computationally [2, 11]. This paper describes Kylin, which uses self-supervised learning to train relationally targeted extractors from Wikipedia infoboxes. We explain how shrinkage and retraining allow Kylin to improve extractor robustness, and we demonstrate that these extractors can successfully mine tuples from a broader set of Web pages. Finally, we argue that the best way to utilize human effort is to invite humans to quickly validate the correctness of machine-generated extractions.
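The abstract says Kylin trains its extractors by self-supervised learning from Wikipedia infoboxes, so no hand-labeled training data is required. The sketch below illustrates only the labeling idea, under a loudly stated assumption: a sentence in an article is treated as a positive training example for an infobox attribute when it contains that attribute's value verbatim. The function names `split_sentences` and `label_sentences` and the sample data are hypothetical; this is not Kylin's code, and it omits the extractor training, shrinkage, and retraining the abstract mentions.

```python
import re
from typing import Dict, List, Tuple


def split_sentences(text: str) -> List[str]:
    # Naive sentence splitter (an assumption for illustration):
    # break after '.', '!' or '?' when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def label_sentences(article_text: str,
                    infobox: Dict[str, str]) -> List[Tuple[str, str, bool]]:
    """Return (attribute, sentence, is_positive) triples.

    Hypothetical heuristic: a sentence is a positive example for an
    attribute if it contains the attribute's infobox value
    (case-insensitive substring match); every other sentence is a
    negative example for that attribute.
    """
    sentences = split_sentences(article_text)
    examples: List[Tuple[str, str, bool]] = []
    for attribute, value in infobox.items():
        needle = value.lower()
        for sentence in sentences:
            examples.append((attribute, sentence, needle in sentence.lower()))
    return examples


if __name__ == "__main__":
    # Illustrative article text and infobox, not data from the paper.
    infobox = {"established": "1861", "city": "Seattle"}
    text = ("The University of Washington was founded in 1861. "
            "It is located in Seattle, Washington. "
            "Its colors are purple and gold.")
    for attribute, sentence, positive in label_sentences(text, infobox):
        if positive:
            print(attribute, "->", sentence)
```

Exact substring matching keeps the sketch short, but it misses values that the article paraphrases and can mislabel sentences that mention a value by coincidence.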

Bibliographic record

Source
SIGMOD Record | 2008, No. 4 | pp. 62-68 | 7 pages
Authors
Daniel S. Weld; Raphael Hoffmann; Fei Wu
Affiliations

Computer Science & Engineering, University of Washington, Seattle, WA 98195, USA (all three authors)

Record information
Original format: PDF
Language: English

Similar articles

1. WHAD: Wikipedia historical attributes data. Historical structured data extraction and vandalism detection from the Wikipedia edit history [J]. Enrique Alfonseca, Guillermo Garrido, Jean-Yves Delort. Language Resources and Evaluation. 2013, No. 4

2. Parallel sentence extraction to improve cross-language information retrieval from Wikipedia [J]. Juryong Cheon, Youngjoong Ko. Journal of Information Science. 2021, No. 2

3. Developing an automated mechanism to identify medical articles from Wikipedia for knowledge extraction [J]. Yu Lishan, Yu Sheng. International Journal of Medical Informatics. 2020, Sep.

4. Bootstrapping Multilingual Relation Discovery Using English Wikipedia and Wikimedia-Induced Entity Extraction [C]. Schone Patrick, Allison Tim, Giannella Chris. 2011 23rd IEEE International Conference on Tools with Artificial Intelligence. 2011

5. Entity Extraction and Disambiguation in Short Text Using Wikipedia and Semantic User Profiles [D]. Zendejas, Ignacio. 2014

6. Efficient chemical-disease identification and relationship extraction using Wikipedia to improve recall [O]. Daniel M. Lowe, Noel M. O’Boyle, Roger A. Sayle. 2016

7. Using Wikipedia to bootstrap open information extraction [O]. Daniel S. Weld, Raphael Hoffmann, Fei Wu. 2009

8. Expanding the Recall of Relation Extraction by Bootstrapping [R]. Tomita, J., Soderland, S., Etzioni, O. 2006
