Conference proceedings: Web technologies and applications

Extracting Difference Information from Multilingual Wikipedia


Abstract

Wikipedia articles on a given topic are written in many languages. When two articles cover the same topic in different languages, Wikipedia policy leads us to expect their contents to be identical. In practice the contents differ, especially for topics related to culture. In this paper, we propose a system that extracts the differences between the information presented in the Japanese Wikipedia and that presented in the Wikipedia editions of other countries. An important technical problem is how to select the Wikipedia articles to be compared. An article is written in different languages, each with its own linguistic structure. For example, cricket is an important part of English culture, yet the Japanese Wikipedia coverage of cricket is brief: it consists of a single page. The English coverage, in contrast, is substantial and spans multiple pages. We must therefore consider which articles can reasonably be compared. We extract the comparison target articles from Wikipedia based on a link graph and article structure. We implement the proposed method and confirm the accuracy of the difference extraction methods.
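The abstract does not describe how the link graph and article structure are combined, so the following is only an illustrative sketch under stated assumptions, not the authors' algorithm. It assumes a hypothetical rule: a linked article joins the comparison set when it links back to the root article and its title matches one of the root's section headings. The function name `comparison_targets` and all of the data are invented for illustration.

```python
from typing import Dict, List, Set


def comparison_targets(
    root: str,
    links: Dict[str, Set[str]],
    headings: Dict[str, List[str]],
) -> List[str]:
    """Return the root article plus related pages to compare as one unit.

    Hypothetical rule (an assumption, not taken from the paper): a linked
    article is included when it links back to the root (link graph) and its
    title matches a section heading of the root (article structure).
    """
    section_titles = {h.lower() for h in headings.get(root, [])}
    targets = [root]
    for linked in sorted(links.get(root, set())):
        links_back = root in links.get(linked, set())
        matches_section = linked.lower() in section_titles
        if links_back and matches_section:
            targets.append(linked)
    return targets


# Toy data, invented for illustration only.
en_links = {
    "Cricket": {"History of cricket", "Laws of Cricket", "England"},
    "History of cricket": {"Cricket"},
    "Laws of Cricket": {"Cricket"},
    "England": {"London"},
}
en_headings = {"Cricket": ["History of cricket", "Laws of cricket", "Culture"]}

ja_links = {"クリケット": {"イングランド"}, "イングランド": set()}
ja_headings = {"クリケット": ["歴史"]}

en_set = comparison_targets("Cricket", en_links, en_headings)
ja_set = comparison_targets("クリケット", ja_links, ja_headings)
print(en_set)
print(ja_set)
```

With this toy data the English side yields several pages while the Japanese side yields only one, which mirrors the cricket example: the comparison has to be made between sets of pages rather than between two single articles.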
