Published in: Expert Systems with Applications

Web-based Pattern Learning For Named Entity Translation In Korean-chinese Cross-language Information Retrieval




Named entity (NE) translation plays an important role in many applications, such as information retrieval and machine translation. In this paper, we focus on translating NEs from Korean to Chinese in order to improve Korean-Chinese cross-language information retrieval (KCIR). The ideographic nature of Chinese makes NE translation difficult because a single Korean syllable may map to several Chinese characters. We propose a hybrid NE translation system with two components. The first integrates two online databases to extend the coverage of our bilingual dictionaries: we use Wikipedia as a translation tool, relying on the inter-language links between the Korean edition and the Chinese or English editions, and we use Naver.com's people search engine to find the Chinese or English translation of a queried name. The second component learns Korean-Chinese (K-C), Korean-English (K-E), and English-Chinese (E-C) translation patterns from the web. These patterns are then used to extract K-C, K-E, and E-C pairs from Google snippets. We found that KCIR performance with this hybrid configuration was over five times better than that of a dictionary-based configuration using only Naver people search. Mean average precision was as high as 0.3385, and recall reached 0.7578. Our method can handle Chinese, Japanese, Korean, and non-CJK NE translation, and it substantially improves KCIR performance.
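The two-stage lookup described in the abstract (dictionary first, web patterns as a fallback) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the in-memory `dictionary` standing in for the Wikipedia and Naver lookups, and the two hand-written parenthesis patterns. The paper learns its patterns from the web; this sketch hard-codes two plausible ones and handles only the K-C direction.

```python
import re
from collections import Counter

# Character class for CJK unified ideographs (assumed target script).
HANZI = r"[\u4E00-\u9FFF]+"

# Hypothetical surface patterns; the paper learns such patterns from the web.
# {src} is the Korean query, {tgt} the candidate Chinese translation.
PATTERNS = [
    "{src}\\s*[(（]\\s*({tgt})\\s*[)）]",   # e.g. 김대중(金大中)
    "({tgt})\\s*[(（]\\s*{src}\\s*[)）]",   # e.g. 金大中（김대중）
]


def extract_candidates(query, snippets):
    """Count Chinese strings that co-occur with the query in a known pattern."""
    counts = Counter()
    for template in PATTERNS:
        regex = re.compile(template.format(src=re.escape(query), tgt=HANZI))
        for snippet in snippets:
            counts.update(regex.findall(snippet))
    return counts


def translate(query, dictionary, snippets):
    """Return a dictionary hit if available, else the most frequent pattern match."""
    if query in dictionary:          # stage 1: bilingual dictionary lookup
        return dictionary[query]
    counts = extract_candidates(query, snippets)   # stage 2: pattern extraction
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

Called as `translate("김대중", {}, snippets)` with a list of search-result snippets, the function falls through to pattern extraction and returns the Chinese string that most often appears in parentheses next to the query, or `None` if no pattern matches. Ranking candidates by raw frequency is a simplification chosen for brevity, not a claim about how the paper scores candidates.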




