Journal: Knowledge and Information Systems

Taxonomic data integration from multilingual Wikipedia editions


Abstract

Information systems are increasingly making use of taxonomic knowledge about words and entities. A taxonomic knowledge base may reveal that the Lago di Garda is a lake and that lakes as well as ponds, reservoirs, and marshes are all bodies of water. As the number of available taxonomic knowledge sources grows, there is a need for techniques to integrate such data into combined, unified taxonomies. In particular, the Wikipedia encyclopedia has been used by a number of projects, but its multilingual nature has largely been neglected. This paper investigates how entities from all editions of Wikipedia as well as WordNet can be integrated into a single coherent taxonomic class hierarchy. We rely on linking heuristics to discover potential taxonomic relationships, graph partitioning to form consistent equivalence classes of entities, and a Markov chain-based ranking approach to construct the final taxonomy. This results in MENTA (Multilingual Entity Taxonomy), a resource that describes 5.4 million entities and is one of the largest multilingual lexical knowledge bases currently available.
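
To make the notion of taxonomic knowledge concrete, the following is a minimal sketch of the abstract's own example (Lago di Garda is a lake; lakes, ponds, reservoirs, and marshes are bodies of water) stored as is-a edges. The names `PARENTS`, `ancestors`, and `is_a` are hypothetical and chosen only for illustration; this is not MENTA's data model or the paper's algorithm.

```python
from collections import deque

# Toy is-a edges copied from the abstract's example (illustrative, not MENTA data).
PARENTS = {
    "Lago di Garda": ["lake"],
    "lake": ["body of water"],
    "pond": ["body of water"],
    "reservoir": ["body of water"],
    "marsh": ["body of water"],
}


def ancestors(entity, parents=PARENTS):
    """Return every class reachable from `entity` via is-a edges, nearest first."""
    seen = []
    queue = deque(parents.get(entity, []))
    while queue:
        cls = queue.popleft()
        if cls not in seen:
            seen.append(cls)
            queue.extend(parents.get(cls, []))
    return seen


def is_a(entity, cls, parents=PARENTS):
    """True if `cls` is a direct or inherited class of `entity`."""
    return cls in ancestors(entity, parents)


print(ancestors("Lago di Garda"))
print(is_a("Lago di Garda", "body of water"))
```

The breadth-first walk is what lets a single direct edge (Lago di Garda to lake) imply the more general class (body of water) without storing it explicitly.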
