Semantic Web (journal)

N-ary relation extraction for simultaneous T-Box and A-Box knowledge base augmentation



Abstract

The Web has evolved into a huge mine of knowledge carved in different forms, the predominant one still being the free-text document. This motivates the need for intelligent Web-reading agents: hypothetically, they would skim through corpora of disparate Web sources and generate meaningful structured assertions to fuel knowledge bases (KBs). Ultimately, comprehensive KBs, like WIKIDATA and DBPEDIA, play a fundamental role in coping with the issue of information overload. In light of this vision, this paper presents the FACT EXTRACTOR, a complete natural language processing (NLP) pipeline that reads an input textual corpus and produces machine-readable statements. Each statement is supplied with a confidence score and undergoes a disambiguation step via entity linking, thus allowing the assignment of KB-compliant URIs. The system implements four research contributions: it (1) performs n-ary relation extraction by applying the frame semantics linguistic theory, as opposed to binary techniques; (2) simultaneously populates both the T-Box and the A-Box of the target KB; (3) relies on a single NLP layer, namely part-of-speech tagging; and (4) enables a completely supervised yet reasonably priced machine learning environment through a crowdsourcing strategy. We assess our approach by setting the target KB to DBpedia and by considering a use case of 52,000 Italian Wikipedia soccer player articles. From these, we yield a dataset of more than 213,000 triples with an estimated F1 of 81.27%. We corroborate the evaluation via (i) a performance comparison with a baseline system and (ii) an analysis of the T-Box and A-Box augmentation capabilities. The outcomes are incorporated into the Italian DBpedia chapter, can be queried through its SPARQL endpoint, and can be downloaded as standalone data dumps. The codebase is released as free software and is publicly available in the DBpedia Association repository.
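To make the T-Box/A-Box distinction concrete, the following is a minimal sketch (not the authors' released code) of how one frame-based n-ary extraction could be serialized as RDF with rdflib. The FACT namespace, the "Debut" frame, its frame-element properties, the statement URI, and the confidence property are illustrative assumptions, not the paper's actual schema; entity URIs would normally come from the entity-linking step.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS, XSD

DBR = Namespace("http://it.dbpedia.org/resource/")               # Italian DBpedia resources
FACT = Namespace("http://example.org/fact-extractor/ontology/")  # hypothetical ontology namespace

g = Graph()
g.bind("dbr", DBR)
g.bind("fact", FACT)

# T-Box augmentation: the linguistic frame becomes a class and its
# frame elements become properties whose domain is that class.
g.add((FACT.Debut, RDF.type, RDFS.Class))
for frame_element in (FACT.player, FACT.team, FACT.competition, FACT.year):
    g.add((frame_element, RDF.type, RDF.Property))
    g.add((frame_element, RDFS.domain, FACT.Debut))

# A-Box augmentation: one n-ary statement, e.g. extracted from a sentence
# saying a player made his Serie A debut with a given club in 1993.
# The statement is reified as a frame-instance node; URIs are illustrative.
stmt = URIRef("http://example.org/fact-extractor/statement/1")
g.add((stmt, RDF.type, FACT.Debut))
g.add((stmt, FACT.player, DBR.Francesco_Totti))
g.add((stmt, FACT.team, DBR.Associazione_Sportiva_Roma))
g.add((stmt, FACT.competition, DBR.Serie_A))
g.add((stmt, FACT.year, Literal("1993", datatype=XSD.gYear)))
g.add((stmt, FACT.confidence, Literal(0.87, datatype=XSD.float)))  # per-statement confidence score

print(g.serialize(format="turtle"))
```

Reifying each extracted frame as its own node is what allows a single extraction to carry more than two arguments (player, team, competition, year) plus a confidence score, which a plain binary subject-predicate-object triple could not express directly.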
