
Monolingual and cross-lingual information retrieval models based on (bilingual) word embeddings


Abstract

We propose a new unified framework for monolingual (MoIR) and cross-lingual information retrieval (CLIR) which relies on the induction of dense real-valued word vectors, known as word embeddings (WE), from comparable data. To this end, we make several important contributions: (1) We present a novel word representation learning model called Bilingual Word Embeddings Skip-Gram (BWESG), which is the first model able to learn bilingual word embeddings solely on the basis of document-aligned comparable data; (2) We demonstrate a simple yet effective approach to building document embeddings from single word embeddings by utilizing models from compositional distributional semantics. BWESG induces a shared cross-lingual embedding vector space in which words, queries, and documents may all be represented as dense real-valued vectors; (3) We build novel ad-hoc MoIR and CLIR models which rely on the induced word and document embeddings and the shared cross-lingual embedding space; (4) Experiments for English and Dutch MoIR, as well as for English-to-Dutch and Dutch-to-English CLIR, using benchmark CLEF 2001-2003 collections and queries demonstrate the utility of our WE-based MoIR and CLIR models. The best results on the CLEF collections are obtained by combining the WE-based approach with a unigram language model. We also report significant improvements of our WE-based framework in ad-hoc IR tasks over the state-of-the-art framework for learning text representations from comparable data based on latent Dirichlet allocation (LDA).
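The retrieval pipeline the abstract describes — a shared cross-lingual embedding space plus compositional construction of query and document vectors — can be illustrated with a minimal sketch. The toy vectors and vocabulary below are hypothetical stand-ins for embeddings that BWESG would actually induce from document-aligned comparable data, and simple additive (mean) composition is used as one representative compositional model:

```python
import numpy as np

# Toy shared cross-lingual embedding space. In the paper's setting these
# vectors would be induced by BWESG from document-aligned comparable data;
# here they are hand-crafted for illustration only.
emb = {
    # English
    "dog":     np.array([0.9, 0.1, 0.0]),
    "cat":     np.array([0.8, 0.3, 0.0]),
    "stock":   np.array([0.0, 0.1, 0.9]),
    # Dutch
    "hond":    np.array([0.9, 0.2, 0.0]),   # "dog"
    "kat":     np.array([0.7, 0.4, 0.1]),   # "cat"
    "aandeel": np.array([0.1, 0.0, 0.9]),   # "stock/share"
}

def compose(tokens):
    """Additive composition: represent a query or document as the mean
    of its word vectors (a simple compositional distributional model)."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# CLIR in the shared space: an English query ranks Dutch documents
# directly, with no translation step.
docs = {"d1": ["hond", "kat"], "d2": ["aandeel"]}
query = ["dog", "cat"]
q = compose(query)
ranking = sorted(docs, key=lambda d: cosine(q, compose(docs[d])), reverse=True)
print(ranking)  # the animal document d1 outranks the finance document d2
```

Because both languages live in one vector space, the same code performs MoIR when query and documents share a language; the abstract's best-performing configuration additionally interpolates these embedding-based scores with a unigram language model, which this sketch omits.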
