Conference: International Conference on Web Information Systems Engineering

A Multilingual Approach to Discover Cross-Language Links in Wikipedia

Abstract

Wikipedia is a well-known public, collaborative encyclopaedia consisting of millions of articles. Initially available only in English, the website has grown to include versions in over 288 languages. These versions and their articles are interconnected via cross-language links, which not only facilitate navigation and the understanding of concepts in multiple languages, but have also been used in natural language processing applications, in developments in linked open data, and in the expansion of minor Wikipedia language versions. These applications are the motivation for an automatic, robust, and accurate technique to identify cross-language links. In this paper, we present a multilingual approach called EurekaCL to automatically identify missing cross-language links in Wikipedia. More precisely, given a Wikipedia article (the source), EurekaCL uses the multilingual and semantic features of BabelNet 2.0 to efficiently identify a set of candidate articles in a target language that are likely to cover the same topic as the source. The Wikipedia graph structure is then exploited both to prune and to rank the candidates. Our evaluation, carried out on 42,000 pairs of articles in eight language versions of Wikipedia, shows that our candidate selection and pruning procedures allow an effective selection of candidates, which significantly helps to determine the correct article in the target language version.
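The abstract describes a three-stage pipeline: candidate selection, pruning, and ranking. The following is a minimal Python sketch of that shape only, not EurekaCL itself. Everything in it is an assumption made for illustration: the toy dictionaries stand in for BabelNet and for the Wikipedia link graph, the function names are hypothetical, and the Jaccard overlap of linked neighbours is a swapped-in scoring rule, since the abstract does not say how the graph structure is actually used.

```python
from typing import Dict, List, Set, Tuple

# Toy data, invented for illustration (not taken from the paper).
# Stand-in for a multilingual lexical resource such as BabelNet:
# source-language title -> possible target-language titles.
LEXICON: Dict[str, Set[str]] = {
    "Paris": {"Paris", "Paris (mythologie)", "Paris (Texas)"},
}

# Cross-language links that already exist (source title -> target title).
KNOWN_LINKS: Dict[str, str] = {
    "France": "France",
    "Seine": "Seine",
    "Eiffel Tower": "Tour Eiffel",
    "Troy": "Troie",
}

# Outgoing article links inside each language version.
SOURCE_OUTLINKS: Dict[str, Set[str]] = {
    "Paris": {"France", "Seine", "Eiffel Tower"},
}
TARGET_OUTLINKS: Dict[str, Set[str]] = {
    "Paris": {"France", "Seine", "Tour Eiffel", "Louvre"},
    "Paris (mythologie)": {"Troie", "Hélène"},
    "Paris (Texas)": {"Texas", "Tour Eiffel"},
}


def select_candidates(source: str) -> Set[str]:
    """Stage 1 (assumed): look the source title up in the lexicon."""
    return set(LEXICON.get(source, set()))


def project_neighbours(source: str) -> Set[str]:
    """Map the source article's outlinks into the target language
    through cross-language links that are already known."""
    neighbours = SOURCE_OUTLINKS.get(source, set())
    return {KNOWN_LINKS[n] for n in neighbours if n in KNOWN_LINKS}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(source: str) -> List[Tuple[str, float]]:
    """Stages 2 and 3 (assumed): drop candidates that share no linked
    neighbour with the source, then sort the rest by overlap score."""
    projected = project_neighbours(source)
    scored = [
        (cand, jaccard(projected, TARGET_OUTLINKS.get(cand, set())))
        for cand in select_candidates(source)
    ]
    kept = [(cand, score) for cand, score in scored if score > 0.0]
    # Highest score first; ties broken by title for determinism.
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    for title, score in rank_candidates("Paris"):
        print(f"{title}\t{score:.2f}")
```

In this toy setup, the candidate that shares no linked neighbour with the source is pruned, and the remaining candidates are ordered by how much of their link neighbourhood matches the source article's. A real system would replace the dictionaries with queries against the lexical resource and the Wikipedia link tables.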
