International Conference on Semantics, Knowledge and Grids

Learning Tibetan-Chinese Cross-Lingual Word Embeddings

Abstract

The idea of word embeddings is based on the distributional hypothesis of the linguist Harris (1954), which holds that words with the same semantics are distributed in similar contexts. Learning vector-space word embeddings is a technique of central importance in natural language processing. In recent years, cross-lingual word vectors have received increasing attention. Cross-lingual word vectors enable knowledge transfer between different languages; most importantly, this transfer can take place between resource-rich and low-resource languages. This paper uses Tibetan and Chinese Wikipedia corpora to train monolingual word vectors, mainly with the fastText training method, and the two monolingual embedding spaces are then aligned by canonical correlation analysis (CCA), yielding Tibetan-Chinese cross-lingual word vectors. In the experiments, we evaluate the resulting word representations on standard lexical semantic evaluation tasks, and the results show that this method gives a measurable improvement in the semantic representation of the word vectors.
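As a rough illustration of the pipeline described in the abstract, the sketch below assumes two pre-trained monolingual fastText vector files and a small Tibetan-Chinese seed dictionary, and uses scikit-learn's CCA to project both spaces into a shared correlated space. The file names, dictionary format, and number of CCA components are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch of CCA-based alignment of two monolingual embedding
# spaces (file names and dictionary format are assumed, not the paper's code).
import numpy as np
from gensim.models import KeyedVectors
from sklearn.cross_decomposition import CCA

# Pre-trained monolingual fastText vectors in word2vec text format (paths assumed).
bo = KeyedVectors.load_word2vec_format("bo_wiki.vec")  # Tibetan
zh = KeyedVectors.load_word2vec_format("zh_wiki.vec")  # Chinese

# Seed translation dictionary: one "tibetan_word<TAB>chinese_word" pair per line.
pairs = []
with open("bo_zh_seed_dict.txt", encoding="utf-8") as f:
    for line in f:
        src, tgt = line.rstrip("\n").split("\t")
        if src in bo.key_to_index and tgt in zh.key_to_index:
            pairs.append((src, tgt))

# Stack the vectors of the dictionary pairs; rows are aligned translations.
X = np.vstack([bo[s] for s, _ in pairs])  # Tibetan side
Y = np.vstack([zh[t] for _, t in pairs])  # Chinese side

# Fit CCA on the aligned pairs, then project both full vocabularies
# into the shared, maximally correlated space.
cca = CCA(n_components=100, max_iter=1000)
cca.fit(X, Y)
bo_proj, zh_proj = cca.transform(bo.vectors, zh.vectors)

# Cross-lingual retrieval: nearest Chinese words to one Tibetan query word
# by cosine similarity in the projected space.
def normalize(m):
    return m / np.linalg.norm(m, axis=1, keepdims=True)

bo_n, zh_n = normalize(bo_proj), normalize(zh_proj)
query = bo_n[bo.key_to_index[pairs[0][0]]]      # an example Tibetan word from the seed dictionary
top = np.argsort(-zh_n @ query)[:5]
print([zh.index_to_key[i] for i in top])
```

Projecting both vocabularies through the CCA rotations keeps translation pairs close in the shared space, which is what the lexical semantic evaluations in the paper measure; in practice the number of components and the size of the seed dictionary both affect retrieval quality.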