Journal: Engineering Applications of Artificial Intelligence

Derivation of 'is a' taxonomy from Wikipedia Category Graph

Abstract

Knowledge acquisition remains one of the main obstacles to designing intelligent systems that achieve human-level performance on complex tasks. Recent developments in crowdsourcing technologies have opened promising opportunities to overcome this problem by exploiting large amounts of machine-readable knowledge to perform tasks that require human intelligence. Wikipedia exemplifies this research trend: it is the largest collaborative, multilingual linguistic knowledge resource, and it contains both unstructured and semi-structured information. In this paper, we propose an approach for deriving an "is a" taxonomy from the Wikipedia Category Graph (WCG), an open collaborative resource. After the WCG is built from a Wikipedia dump and filtered, the process mainly relies on the "BY" tag and on plural head words shared between category names. These methods yield a graph made up of a set of disconnected subgraphs. We therefore propose a process for linking them into an "is a" taxonomy with a single root, modeled as a directed acyclic graph (DAG). Specific DAG-handling algorithms are used, including one for splitting a DAG into sub-DAGs and another for merging two DAGs. The resulting taxonomy is assessed with semantic similarity measures, which quantify the likeness between two concepts or words. To this end, we use a set of well-known benchmarks to compare the results obtained with the generated taxonomy against those achieved with WordNet, a resource created and maintained by domain experts. The experimental results show good correlations between the computed values and human judgments. Compared to WordNet, the derived taxonomy also offers broader coverage.
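The two heuristics named in the abstract, the "BY" tag and shared plural heads, can be illustrated with a minimal Python sketch. It rests on assumptions that do not come from the paper: a category named "X by criterion" (for example "Musicians by nationality") is treated as a grouping of X, so its sub-categories are re-attached directly to X; and a category whose plural head noun matches an existing category name (for example "Rivers of France" and "Rivers") is linked to that category. Edges are `(child, parent)` pairs meaning "child is a parent". The helper names and the preposition list are hypothetical.

```python
import re

# Hypothetical stop-list: a preposition usually ends the head phrase of an
# English category name ("Rivers of France" -> head phrase "Rivers").
PREPOSITIONS = {"of", "in", "by", "from", "for", "about", "on", "with", "at"}


def split_by_tag(category):
    """Split 'X by criterion' into (X, criterion); None if there is no BY tag.

    Only a lower-case criterion is accepted, so 'Films directed by Steven
    Spielberg' is not mistaken for a grouping category.
    """
    m = re.fullmatch(r"(.+?) by ([a-z][a-z ]*)", category)
    return (m.group(1), m.group(2)) if m else None


def plural_head(category):
    """Return the lower-cased head noun if it looks plural, else None."""
    head_phrase = []
    for token in category.split():
        if token.lower() in PREPOSITIONS:
            break
        head_phrase.append(token)
    if not head_phrase:
        return None
    head = head_phrase[-1].lower()
    # Crude plural test: ends in "s" but not "ss" (rejects "class", "mass").
    return head if head.endswith("s") and not head.endswith("ss") else None


def link_by_plural_head(categories):
    """Link each category to the existing category named after its plural head."""
    known = {c.lower(): c for c in categories}
    edges = set()
    for category in categories:
        if split_by_tag(category):  # grouping categories are handled separately
            continue
        head = plural_head(category)
        parent = known.get(head) if head else None
        if parent and parent != category:
            edges.add((category, parent))  # (child, parent): child is a parent
    return edges


def lift_by_categories(wcg_edges, categories):
    """Re-attach children of 'X by criterion' categories directly to X."""
    known = {c.lower(): c for c in categories}
    edges = set()
    for child, parent in wcg_edges:
        split = split_by_tag(parent)
        target = known.get(split[0].lower()) if split else None
        if target and target != child:
            edges.add((child, target))
    return edges


cats = ["Musicians", "Musicians by nationality", "American musicians",
        "Rivers", "Rivers of France"]
print(sorted(link_by_plural_head(cats)))
print(lift_by_categories({("American musicians", "Musicians by nationality")}, cats))
```

The plural test is deliberately naive: a singular mass noun such as "Physics" would pass it, which is one reason real category-name parsing usually relies on a part-of-speech tagger instead.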
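The DAG operations mentioned in the abstract can be sketched as follows. This is an illustration under stated assumptions rather than the authors' algorithms: edges are `(child, parent)` pairs, merging adds the second DAG's edges one at a time and skips any edge that would close a cycle, disconnected subgraphs are joined by attaching every parentless node to a hypothetical root named `ROOT`, and a DAG is split into one sub-DAG per child of the root. Acyclicity is checked with Kahn's topological-sort algorithm.

```python
from collections import defaultdict, deque


def _reaches(parents, start, target):
    """True if target is start itself or one of start's ancestors."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for p in parents[node]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return False


def merge_dags(a, b):
    """Union of two edge sets, skipping edges of b that would create a cycle."""
    merged = set(a)
    parents = defaultdict(set)
    for child, parent in merged:
        parents[child].add(parent)
    for child, parent in sorted(b):
        # Adding child -> parent closes a cycle if parent already is-a child.
        if (child, parent) in merged or _reaches(parents, parent, child):
            continue
        merged.add((child, parent))
        parents[child].add(parent)
    return merged


def add_single_root(edges, root="ROOT"):
    """Attach every parentless node to one (hypothetical) root."""
    nodes = {n for edge in edges for n in edge}
    has_parent = {child for child, _ in edges}
    return set(edges) | {(n, root) for n in nodes - has_parent if n != root}


def is_acyclic(edges):
    """Kahn's algorithm: the graph is a DAG iff every node gets dequeued."""
    nodes = {n for edge in edges for n in edge}
    indegree = dict.fromkeys(nodes, 0)
    children = defaultdict(list)
    for child, parent in edges:
        children[parent].append(child)
        indegree[child] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for c in children[node]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    return visited == len(nodes)


def split_dag(edges, root):
    """One sub-DAG per child of the root: that child plus all its descendants."""
    children = defaultdict(set)
    for child, parent in edges:
        children[parent].add(child)
    parts = {}
    for top in children[root]:
        nodes, stack = {top}, [top]
        while stack:
            for c in children[stack.pop()]:
                if c not in nodes:
                    nodes.add(c)
                    stack.append(c)
        parts[top] = {(c, p) for c, p in edges if c in nodes and p in nodes}
    return parts


a = {("American musicians", "Musicians"), ("Musicians", "People")}
b = {("Rivers of France", "Rivers"), ("People", "Musicians")}  # 2nd edge closes a cycle
taxonomy = add_single_root(merge_dags(a, b))
print(is_acyclic(taxonomy), sorted(split_dag(taxonomy, "ROOT")))
```

Because a node may have several parents, the sub-DAGs returned by `split_dag` can share nodes; the splitting criterion actually used in the paper may differ.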
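The evaluation step can be illustrated with one common taxonomy-based measure, Wu and Palmer's similarity, 2·depth(LCS) / (depth(a) + depth(b)), where LCS is the deepest common ancestor, followed by Spearman's rank correlation between computed scores and human ratings. The abstract does not say which measures or correlation coefficients the authors used, so both are assumptions here. Depth is taken as the longest path from the root (root depth 1), which keeps scores within [0, 1] in a DAG with multiple inheritance; a concept absent from the taxonomy yields `None`, which is where coverage matters. The human ratings in the example are made up.

```python
from collections import defaultdict, deque


def longest_depths(edges, root):
    """Depth = longest path from the root + 1, computed in topological order."""
    children, indegree = defaultdict(list), defaultdict(int)
    nodes = {n for edge in edges for n in edge} | {root}
    for child, parent in edges:
        children[parent].append(child)
        indegree[child] += 1
    depth = dict.fromkeys(nodes, 1)
    queue = deque(n for n in nodes if indegree[n] == 0)
    while queue:
        node = queue.popleft()
        for c in children[node]:
            depth[c] = max(depth[c], depth[node] + 1)
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    return depth


def ancestors(parents, node):
    """The node itself plus everything reachable through is-a edges."""
    seen, stack = {node}, [node]
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def wu_palmer(edges, root, a, b):
    """Wu-Palmer similarity; None when a concept is missing (not covered)."""
    depth = longest_depths(edges, root)
    if a not in depth or b not in depth:
        return None
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
    common = ancestors(parents, a) & ancestors(parents, b)
    lcs_depth = max((depth[n] for n in common), default=0)
    return 2 * lcs_depth / (depth[a] + depth[b])


def ranks(values):
    """1-based ranks, ties receiving the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((u - mx) * (v - my) for u, v in zip(rx, ry))
    sx = sum((u - mx) ** 2 for u in rx) ** 0.5
    sy = sum((v - my) ** 2 for v in ry) ** 0.5
    return cov / (sx * sy)


taxonomy = {("People", "ROOT"), ("Places", "ROOT"), ("Musicians", "People"),
            ("Guitarists", "Musicians"), ("American musicians", "Musicians"),
            ("Rivers", "Places")}
pairs = [("American musicians", "Guitarists"), ("Musicians", "Rivers"),
         ("Guitarists", "Guitarists")]
computed = [wu_palmer(taxonomy, "ROOT", a, b) for a, b in pairs]
human = [3.1, 0.4, 4.0]  # hypothetical ratings, not from any benchmark
print(computed, spearman(computed, human))
```

Pairs containing an uncovered word (a `None` score) would have to be dropped or reported separately before computing the correlation.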