
A Neural Approach to Cross-Lingual Information Retrieval



Abstract

With the rapid growth of worldwide information accessibility, cross-language information retrieval (CLIR) has become a prominent concern for search engines. Traditional CLIR technologies require special-purpose components, high-quality translation knowledge (e.g., machine-readable dictionaries or machine translation systems), and careful tuning to achieve high ranking performance. With a neural network architecture, however, it is possible to solve the CLIR problem without extra tuning or special-purpose components. This work proposes a bilingual training approach, a neural CLIR solution that automatically learns translation relationships from noisy translation knowledge. External sources of translation knowledge are used to generate bilingual training data, which is then fed into a kernel-based neural ranking model. During end-to-end training, the word embeddings are tuned to preserve translation relationships between bilingual word pairs and are tailored to the ranking task. Experiments show that, given the same external source of translation knowledge, the bilingual training approach outperforms traditional CLIR techniques, and that it yields ranking results as good as those of a monolingual information retrieval system.

We further investigate the source of the neural CLIR approach's effectiveness by analyzing the patterns of the trained word embeddings. We also explore possible ways to improve performance further, including cleaning the training data by removing ambiguous training queries, studying whether more training data helps by relating training set size to model performance, and investigating the effect of the text transform applied to English queries in the training data. Lastly, we design an experiment that analyzes the quality of test query translations in order to quantify model performance in a realistic test scenario where the model takes manually written English queries as input.
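The record gives no implementation details, but the ranking component it names, a kernel-based neural ranking model over a shared bilingual embedding space, can be sketched as follows. This is a minimal K-NRM-style sketch in PyTorch: query and document terms (possibly in different languages) are embedded in one table, their cosine-similarity matrix is pooled with RBF kernels, and a linear layer produces the ranking score. All class names, dimensions, and hyperparameters (kernel count, sigma, etc.) are illustrative assumptions, not values from the thesis.

import torch
import torch.nn as nn

class KernelRanker(nn.Module):
    def __init__(self, vocab_size, dim=128, n_kernels=11):
        super().__init__()
        # Shared bilingual embedding table, tuned end to end so that
        # translation pairs stay close and the geometry suits ranking.
        self.embed = nn.Embedding(vocab_size, dim, padding_idx=0)
        # RBF kernel means spread across the [-1, 1] cosine-similarity range.
        self.register_buffer("mus", torch.linspace(-1.0, 1.0, n_kernels).view(1, 1, 1, -1))
        self.sigma = 0.1
        self.out = nn.Linear(n_kernels, 1)

    def forward(self, query_ids, doc_ids):
        # (batch, len_q, dim) and (batch, len_d, dim), L2-normalized for cosine similarity.
        q = nn.functional.normalize(self.embed(query_ids), dim=-1)
        d = nn.functional.normalize(self.embed(doc_ids), dim=-1)
        # Translation matrix: similarity of every query term to every document term.
        sim = torch.bmm(q, d.transpose(1, 2)).unsqueeze(-1)            # (batch, len_q, len_d, 1)
        # Kernel pooling: soft-count document terms near each similarity level.
        k = torch.exp(-(sim - self.mus) ** 2 / (2 * self.sigma ** 2)).sum(dim=2)
        # Log-sum over query terms gives the soft-match feature vector.
        phi = torch.log(k.clamp(min=1e-10)).sum(dim=1)                 # (batch, n_kernels)
        return self.out(phi).squeeze(-1)                               # one score per query-document pair

In a bilingual training setup such a model would typically be optimized with a pairwise ranking loss over (query, relevant document, irrelevant document) triples derived from the external translation knowledge, so that the gradient both shapes the ranking layer and pulls bilingual word pairs together in embedding space.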

Record Details

  • Author

    Liu, Qing.

  • Author Affiliation

    Carnegie Mellon University.

  • Degree Grantor Carnegie Mellon University.
  • Subject Computer science.
  • Degree M.S.
  • Year 2018
  • Pages 98 p.
  • Total Pages 98
  • Format PDF
  • Language eng
  • CLC Classification
  • Keywords
