Detecting Highly Confident Word Translations from Comparable Corpora without Any Prior Knowledge

Abstract

In this paper, we extend the work on using latent cross-language topic models for identifying word translations across comparable corpora. We present a novel precision-oriented algorithm that relies on per-topic word distributions obtained by the bilingual LDA (BiLDA) latent topic model. The algorithm aims at harvesting only the most probable word translations across languages in a greedy fashion, without any prior knowledge about the language pair, relying on a symmetrization process and the one-to-one constraint. We report our results for Italian-English and Dutch-English language pairs that outperform the current state-of-the-art results by a significant margin. In addition, we show how to use the algorithm for the construction of high-quality initial seed lexicons of translations.
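
The abstract names three ingredients: greedy harvesting, a symmetrization step, and a one-to-one constraint. The sketch below illustrates how they can fit together; it is not the authors' algorithm. It assumes each word is already represented by a vector of per-topic probabilities over the shared topics (as a trained BiLDA model could supply), and it swaps in plain cosine similarity with a fixed threshold. The function names, the input format, and the `threshold` parameter are all hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def harvest_translations(src_vecs, tgt_vecs, threshold=0.9):
    """Greedily harvest one-to-one translation pairs.

    src_vecs / tgt_vecs map each word to its per-topic probability
    vector (hypothetical input format, assumed for illustration).
    """
    src, tgt = dict(src_vecs), dict(tgt_vecs)
    lexicon = {}
    while src and tgt:
        # Best candidate in the other language for every remaining word.
        best_t = {s: max(tgt, key=lambda t: cosine(src[s], tgt[t])) for s in src}
        best_s = {t: max(src, key=lambda s: cosine(src[s], tgt[t])) for t in tgt}
        # Symmetrization: keep only mutual best matches above the threshold.
        found = [(s, t) for s, t in best_t.items()
                 if best_s[t] == s and cosine(src[s], tgt[t]) >= threshold]
        if not found:
            break
        for s, t in found:
            lexicon[s] = t
            # One-to-one constraint: matched words leave both search spaces.
            del src[s]
            del tgt[t]
    return lexicon


if __name__ == "__main__":
    # Toy two-topic vectors, invented purely for illustration.
    italian = {"casa": [0.9, 0.1], "acqua": [0.1, 0.9], "verde": [0.5, 0.5]}
    english = {"house": [0.85, 0.15], "water": [0.2, 0.8]}
    print(harvest_translations(italian, english))
```

Removing matched words after each round shrinks the search space, so a word whose best candidate was taken by a stronger competitor gets a chance to pair with its next-best candidate in a later round. Words that never reach a mutual, above-threshold match are simply left out, which trades coverage for precision.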

Bibliographic details

Source
13th Conference of the European Chapter of the Association for Computational Linguistics, 2012, pp. 449-459 (11 pages)
Conference location: Avignon (FR)

Authors
Ivan Vulic; Marie-Francine Moens

Author affiliation
Department of Computer Science, KU Leuven, Celestijnenlaan 200A, Leuven, Belgium

Original format: PDF
Language: English
Chinese Library Classification: Programming, software engineering
