Improving Knowledge Base Construction from Robust Infobox Extraction

Abstract

A capable, automatic Question Answering (QA) system can provide more complete and accurate answers using a comprehensive knowledge base (KB). One important approach to constructing a comprehensive knowledge base is to extract information from Wikipedia infobox tables to populate an existing KB. Despite previous successes in the Infobox Extraction (IBE) problem (e.g., DBpedia), three major challenges remain: 1) Deterministic extraction patterns used in DBpedia are vulnerable to template changes; 2) Over-trusting Wikipedia anchor links can lead to entity disambiguation errors; 3) Heuristic-based extraction of unlinkable entities yields low precision, hurting both accuracy and completeness of the final KB. This paper presents a robust approach that tackles all three challenges. We build probabilistic models to predict relations between entity mentions directly from the infobox tables in HTML. The entity mentions are linked to identifiers in an existing KB if possible. The unlinkable ones are also parsed and preserved in the final output. Training data for both the relation extraction and the entity linking models are automatically generated using distant supervision. We demonstrate the empirical effectiveness of the proposed method in both precision and recall compared to a strong IBE baseline, DBpedia, with an absolute improvement of 41.3% in average F1. We also show that our extraction makes the final KB significantly more complete, improving the completeness score of list-value relation types by 61.4%.
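
The abstract mentions generating training data by distant supervision over infobox tables in HTML. The sketch below illustrates that general idea only; it is not the authors' pipeline. The class `InfoboxRowParser`, the function `distant_labels`, the `UNLABELED` marker, the toy HTML, and the toy KB triples are all hypothetical names and data chosen for illustration, and the exact-string match against the KB is a deliberate simplification.

```python
from html.parser import HTMLParser


class InfoboxRowParser(HTMLParser):
    """Collect (label, [value mentions]) pairs from <tr><th>..</th><td>..</td></tr> rows.

    Hypothetical helper: assumes one <th> label and one <td> value cell per row.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None   # "th" or "td" while inside a cell, else None
        self._label = ""
        self._values = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._label, self._values = "", []
        elif tag in ("th", "td"):
            self._cell = tag

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._cell = None
        elif tag == "tr" and self._label and self._values:
            self.rows.append((self._label, self._values))

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._cell == "th":
            self._label += text
        elif self._cell == "td":
            # Each text node in the value cell is treated as one mention.
            self._values.append(text)


def distant_labels(subject, rows, kb_triples):
    """Label (row label, mention) pairs using facts already present in the KB.

    A mention receives relation r when the KB holds (subject, r, mention);
    everything else is marked "UNLABELED" (an assumption of this sketch).
    """
    known = {}
    for s, r, o in kb_triples:
        if s == subject:
            known.setdefault(o.lower(), set()).add(r)
    examples = []
    for label, mentions in rows:
        for mention in mentions:
            for rel in sorted(known.get(mention.lower(), {"UNLABELED"})):
                examples.append((label, mention, rel))
    return examples


# Toy data, invented for illustration.
html = (
    "<table class='infobox'>"
    "<tr><th>Founded</th><td>1998</td></tr>"
    "<tr><th>Founders</th><td><a href='/wiki/Alice_Smith'>Alice Smith</a><br>"
    "<a href='/wiki/Bob_Jones'>Bob Jones</a></td></tr>"
    "</table>"
)
kb = [("Acme Corp", "founded_by", "Alice Smith")]

parser = InfoboxRowParser()
parser.feed(html)
for example in distant_labels("Acme Corp", parser.rows, kb):
    print(example)
```

Pairs labeled this way could serve as noisy training examples for a relation classifier over infobox rows; mentions that match no KB fact are left unlabeled here rather than treated as negatives, since a missing KB fact does not imply the relation is absent.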

Bibliographic details

Source: Conference on the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019, pp. 138-148 (11 pages)
Authors: Boya Peng; Yejin Huh; Xiao Ling; Michele Banko
Original format: PDF
