Second annual meeting of the Society for Computation in Linguistics

Unsupervised Learning of Cross-Lingual Symbol Embeddings Without Parallel Data



Abstract

We present a new method for unsupervised learning of multilingual symbol (e.g. character) embeddings, without any parallel data or prior knowledge about correspondences between languages. It is able to exploit similarities across languages between the distributions over symbols' contexts of use within their language, even in the absence of any symbols in common to the two languages. In experiments with an artificially corrupted text corpus, we show that the method can retrieve character correspondences obscured by noise. We then present encouraging results of applying the method to real linguistic data, including for low-resourced languages. The learned representations open the possibility of fully unsupervised comparative studies of text or speech corpora in low-resourced languages with no prior knowledge regarding their symbol sets.
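The core idea, comparing symbols through the distributions of the contexts in which they are used rather than through the symbols themselves, can be illustrated with a short sketch. The snippet below is not the authors' method: it only builds per-language context-of-use distributions for each character and reduces them to low-dimensional embeddings with SVD. The toy corpora, window size, and embedding dimension are illustrative assumptions, and the unsupervised cross-lingual alignment step that the paper contributes is deliberately left out.

```python
# Minimal illustrative sketch (not the paper's method): for each language
# separately, build a matrix of context-of-use distributions over its symbols,
# then reduce it to low-dimensional symbol embeddings.
from collections import Counter, defaultdict

import numpy as np


def context_distributions(text, window=2):
    """Return the symbol inventory and a row-normalized context matrix.

    Each row is one symbol's distribution over (relative position, context
    symbol) pairs observed within +/- `window` characters in its own language.
    """
    counts = defaultdict(Counter)
    for i, sym in enumerate(text):
        for offset in range(-window, window + 1):
            j = i + offset
            if offset != 0 and 0 <= j < len(text):
                counts[sym][(offset, text[j])] += 1

    symbols = sorted(counts)
    features = sorted({f for c in counts.values() for f in c})
    feat_index = {f: k for k, f in enumerate(features)}

    mat = np.zeros((len(symbols), len(features)))
    for r, sym in enumerate(symbols):
        for f, n in counts[sym].items():
            mat[r, feat_index[f]] = n
    mat /= mat.sum(axis=1, keepdims=True)  # rows become probability distributions
    return symbols, mat


def embed(mat, dim=16):
    """Symbol embeddings via truncated SVD of the centered distribution matrix."""
    u, s, _ = np.linalg.svd(mat - mat.mean(axis=0), full_matrices=False)
    d = min(dim, u.shape[1])
    return u[:, :d] * s[:d]


# Toy monolingual corpora with no parallel data; the two embedding spaces are
# learned independently, and aligning them without supervision is precisely the
# problem the paper addresses.
syms_a, dist_a = context_distributions("an example english string " * 50)
syms_b, dist_b = context_distributions("ein deutsches beispiel " * 50)
emb_a, emb_b = embed(dist_a), embed(dist_b)
```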
