Foreign-language conference proceedings: Research and advanced technology for digital libraries

Multilingual Information Retrieval Based on Document Alignment Techniques


Abstract

A multilingual information retrieval method is presented in which the user formulates the query in his/her preferred language to retrieve relevant information from a multilingual document collection. This multilingual retrieval method involves mono-language searches as well as merging their results. We adopt a corpus-based approach where documents of different languages are associated if they cover a similar story. The resulting comparable corpus enables two novel techniques we have developed. First, it enables Cross-Language Information Retrieval (CLIR) that does not suffer from the lack of vocabulary coverage we observed in approaches based on automatic Machine Translation (MT). Second, the aligned documents of this corpus facilitate merging the results of mono- and cross-language searches. Using the TREC CLIR data, excellent results are obtained. In addition, our evaluation of the document alignments gives us new insights into the usefulness of comparable corpora.
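
The abstract does not say how aligned documents are used to merge result lists, so the following is only an illustrative sketch of one plausible scheme, not the authors' method. It assumes that each search returns a score per document, that document IDs are unique across languages, and that an aligned pair appearing in both result lists can be used to fit a least-squares line mapping one score scale onto the other. The names `fit_line`, `merge_runs`, `run_a`, `run_b`, and `alignments` are all hypothetical.

```python
from typing import Dict, List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares fit y = slope * x + intercept.

    Falls back to the identity mapping when there are fewer than two
    points or when all x values are equal (assumption of this sketch).
    """
    n = len(xs)
    if n < 2:
        return 1.0, 0.0
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return 1.0, 0.0
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


def merge_runs(
    run_a: Dict[str, float],
    run_b: Dict[str, float],
    alignments: List[Tuple[str, str]],
) -> List[Tuple[str, float]]:
    """Merge two scored result lists onto the score scale of run_a.

    alignments holds (doc_in_a, doc_in_b) pairs judged to cover a
    similar story. Pairs retrieved by both searches serve as anchor
    points for mapping run_b scores onto the run_a scale.
    """
    xs, ys = [], []
    for doc_a, doc_b in alignments:
        if doc_a in run_a and doc_b in run_b:
            xs.append(run_b[doc_b])
            ys.append(run_a[doc_a])
    slope, intercept = fit_line(xs, ys)

    merged = dict(run_a)
    for doc, score in run_b.items():
        merged[doc] = slope * score + intercept
    # Highest mapped score first.
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)


# Toy data: the second search scores on a scale roughly ten times larger.
run_a = {"en1": 0.9, "en2": 0.5}
run_b = {"de1": 9.0, "de2": 5.0, "de3": 7.0}
alignments = [("en1", "de1"), ("en2", "de2")]
merged = merge_runs(run_a, run_b, alignments)
```

In this toy run the two aligned pairs anchor the mapping, so the unaligned document `de3` is placed between the two anchor levels rather than above everything because of its larger raw score.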
