A simple linear transformation of word embeddings has recently been shown to be highly effective for extracting a bilingual lexicon from comparable corpora. However, the bilingual word-embedding pairs used to train this transformation are simply assumed to satisfy a linear relationship, which cannot be guaranteed in practice. This paper proposes a simple solution based on canonical correlation analysis (CCA), which projects the training embeddings onto subspaces where they are maximally linearly correlated. After the original embeddings of the two languages are projected into these correlated subspaces, a transformation matrix is learned from the projected embeddings as before. Experimental results confirm that the proposed solution achieves a significant improvement of 62% in precision at Top-1 over the baseline approach on the English-to-Chinese bilingual lexicon extraction task.