
A Neural Approach to Cross-Lingual Information Retrieval



Abstract

With the rapid growth of worldwide information accessibility, cross-language information retrieval (CLIR) has become a prominent concern for search engines. Traditional CLIR technologies require special-purpose components, high-quality translation knowledge (e.g., machine-readable dictionaries or machine translation systems), and careful tuning to achieve high ranking performance. With a neural network architecture, however, it is possible to solve the CLIR problem without extra tuning or special-purpose components. This work proposes a bilingual training approach, a neural CLIR solution that automatically learns translation relationships from noisy translation knowledge. External sources of translation knowledge are used to generate bilingual training data, which is then fed into a kernel-based neural ranking model. During end-to-end training, the word embeddings are tuned to preserve translation relationships between bilingual word pairs and are tailored to the ranking task. Experiments show that, given the same external source of translation knowledge, the bilingual training approach outperforms traditional CLIR techniques, and that it yields ranking results as good as those of a monolingual information retrieval system.

We further investigate the source of the neural CLIR approach's effectiveness by analyzing the patterns of the trained word embeddings. We also explore possible ways to improve performance further, including cleaning the training data by removing ambiguous training queries, studying whether more training data helps by relating training set size to model performance, and investigating the effect of the text transform applied to English queries in the training data. Lastly, we design an experiment that analyzes the quality of test query translations in order to quantify model performance in a realistic test scenario where the model takes manually written English queries as input.
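The record gives no implementation details, but the ranking component it names, a kernel-based neural ranking model over a shared bilingual embedding space, can be sketched as follows. This is a minimal K-NRM-style sketch in PyTorch: query and document terms (possibly in different languages) are embedded in one table, their cosine-similarity matrix is pooled with RBF kernels, and a linear layer produces the ranking score. All class names, dimensions, and hyperparameters (kernel count, sigma, etc.) are illustrative assumptions, not values from the thesis.

import torch
import torch.nn as nn

class KernelRanker(nn.Module):
    def __init__(self, vocab_size, dim=128, n_kernels=11):
        super().__init__()
        # Shared bilingual embedding table, tuned end to end so that
        # translation pairs stay close and the geometry suits ranking.
        self.embed = nn.Embedding(vocab_size, dim, padding_idx=0)
        # RBF kernel means spread across the [-1, 1] cosine-similarity range.
        self.register_buffer("mus", torch.linspace(-1.0, 1.0, n_kernels).view(1, 1, 1, -1))
        self.sigma = 0.1
        self.out = nn.Linear(n_kernels, 1)

    def forward(self, query_ids, doc_ids):
        # (batch, len_q, dim) and (batch, len_d, dim), L2-normalized for cosine similarity.
        q = nn.functional.normalize(self.embed(query_ids), dim=-1)
        d = nn.functional.normalize(self.embed(doc_ids), dim=-1)
        # Translation matrix: similarity of every query term to every document term.
        sim = torch.bmm(q, d.transpose(1, 2)).unsqueeze(-1)            # (batch, len_q, len_d, 1)
        # Kernel pooling: soft-count document terms near each similarity level.
        k = torch.exp(-(sim - self.mus) ** 2 / (2 * self.sigma ** 2)).sum(dim=2)
        # Log-sum over query terms gives the soft-match feature vector.
        phi = torch.log(k.clamp(min=1e-10)).sum(dim=1)                 # (batch, n_kernels)
        return self.out(phi).squeeze(-1)                               # one score per query-document pair

In a bilingual training setup such a model would typically be optimized with a pairwise ranking loss over (query, relevant document, irrelevant document) triples derived from the external translation knowledge, so that the gradient both shapes the ranking layer and pulls bilingual word pairs together in embedding space.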

Record Details

  • Author

    Liu, Qing.

  • Author Affiliation

    Carnegie Mellon University.

  • Degree Grantor Carnegie Mellon University.
  • Subject Computer science.
  • Degree M.S.
  • Year 2018
  • Pages 98 p.
  • Total Pages 98
  • Format PDF
  • Language eng
  • CLC Classification
  • Keywords
