【Conference】Annual meeting of the Society for Computation in Linguistics

Unsupervised Learning of Cross-Lingual Symbol Embeddings Without Parallel Data

【Abstract】We present a new method for unsupervised learning of multilingual symbol (e.g. character) embeddings, without any parallel data or prior knowledge about correspondences between languages. It is able to exploit similarities across languages between the distributions over symbols' contexts of use within their language, even in the absence of any symbols in common to the two languages. In experiments with an artificially corrupted text corpus, we show that the method can retrieve character correspondences obscured by noise. We then present encouraging results of applying the method to real linguistic data, including for low-resourced languages. The learned representations open the possibility of fully unsupervised comparative studies of text or speech corpora in low-resourced languages with no prior knowledge regarding their symbol sets.

【Authors】Mark Granroth-Wilding; Hannu Toivonen

【Affiliations】University of Helsinki; University of Helsinki

【Year (Volume), Issue】2019

【Pages】19-28

【Total pages】10

【Language】English
