Unsupervised Learning of Cross-Lingual Symbol Embeddings Without Parallel Data

Abstract

We present a new method for unsupervised learning of multilingual symbol (e.g. character) embeddings, without any parallel data or prior knowledge about correspondences between languages. It is able to exploit similarities across languages between the distributions over symbols' contexts of use within their language, even in the absence of any symbols in common to the two languages. In experiments with an artificially corrupted text corpus, we show that the method can retrieve character correspondences obscured by noise. We then present encouraging results of applying the method to real linguistic data, including for low-resourced languages. The learned representations open the possibility of fully unsupervised comparative studies of text or speech corpora in low-resourced languages with no prior knowledge regarding their symbol sets.
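The artificially corrupted corpus experiment can be pictured with a small sketch. This is not the authors' procedure: the abstract does not say how the corruption was applied, so the function name `make_pseudo_language`, the `noise_rate` parameter, and the choice to rename lowercase letters to uppercase symbols are all assumptions made for illustration. The sketch derives a second "language" from a text by renaming every letter (so the two share no letters) and then obscures some correspondences with random substitutions.

```python
import random
import string


def make_pseudo_language(text, noise_rate, seed=0):
    """Return (corrupted_text, mapping) for a lowercase input text.

    Hypothetical illustration only. Each lowercase letter is renamed to a
    distinct uppercase symbol, so the derived text shares no letters with
    the original. With probability `noise_rate`, a letter is instead
    replaced by a symbol drawn uniformly at random from the target set
    (which may occasionally be the correct one).
    """
    rng = random.Random(seed)
    mapping = {c: c.upper() for c in string.ascii_lowercase}
    targets = list(mapping.values())
    out = []
    for ch in text:
        if ch not in mapping:
            out.append(ch)  # leave spaces and punctuation untouched
        elif rng.random() < noise_rate:
            out.append(rng.choice(targets))  # noisy substitution
        else:
            out.append(mapping[ch])  # true correspondence
    return "".join(out), mapping


clean, mapping = make_pseudo_language("hello world", noise_rate=0.0)
noisy, _ = make_pseudo_language("hello world", noise_rate=0.3, seed=1)
print(clean)
print(noisy)
```

With `noise_rate=0.0` the output is a pure renaming of the input. Raising it makes the true character correspondences, held in `mapping`, progressively harder to recover from context statistics alone, which is the kind of setting the abstract describes.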

Bibliographic details

Source: Second annual meeting of the Society for Computation in Linguistics, 2019, pp. 19-28 (10 pages)
Conference location: New York (US)
Authors: Mark Granroth-Wilding; Hannu Toivonen
Affiliation: University of Helsinki (both authors)
Original format: PDF
Language of text: English
Date indexed: 2022-08-26 14:31:34
