Conference: International Conference on Cloud Computing and Security

A Word Embeddings Training Method Based on Modified Skip-Gram and Align

Abstract

Low-resource languages lack sufficient annotated data, and existing bilingual word embedding learning methods struggle to capture the deep semantic correspondence between languages. To address both problems, this paper presents a text processing method based on transfer learning and a bilingual word embedding model, CWDR-BiGRU (Cross-context Window of Dynamic Ratio, bidirectional Gated Recurrent Unit), which combines an enhanced Skip-gram, called the cross-context window of dynamic ratio, with an encoder-decoder. The method processes low-resource language text using only a sentence-aligned bilingual corpus and annotated data in the high-resource language. Experimental results on semantic reasoning and word embedding visualization show that CWDR-BiGRU can effectively train bilingual word embeddings. On a Chinese-Tibetan cross-lingual document classification task, the transfer learning method based on CWDR-BiGRU is 13.5% more accurate than the conventional method, and 7.4%, 5.8%, 3.1% and 1.6% more accurate than the existing Bilingual Autoencoder, BilBOWA, BiCCV and BiSkip, respectively. This indicates that CWDR-BiGRU, which reduces the difficulty of acquiring corpora for bilingual word embeddings, can accurately capture deep alignment relationships and semantic properties.
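The transfer setting described in the abstract (labels in the high-resource language only, classification in the low-resource language) can be illustrated with a minimal sketch. This is not the CWDR-BiGRU model: the two-dimensional vectors, the token names (`lr_economy` and so on), the labels, and the nearest-centroid classifier are all assumptions invented for illustration. The only idea taken from the abstract is that both languages share one embedding space, so a classifier fitted on one language can be applied to the other.

```python
# Toy shared embedding space. Every vector here is a made-up placeholder,
# NOT an output of CWDR-BiGRU; the "lr_" tokens are hypothetical stand-ins
# for low-resource-language words aligned near their translations.
EMB = {
    "economy": (1.0, 0.0), "market": (0.9, 0.1),
    "football": (0.0, 1.0), "match": (0.1, 0.9),
    "lr_economy": (0.95, 0.05), "lr_market": (0.85, 0.15),
    "lr_football": (0.05, 0.95), "lr_match": (0.15, 0.85),
}


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(dim))


def doc_vector(tokens):
    """Represent a document as the mean of its known word embeddings."""
    vectors = [EMB[t] for t in tokens if t in EMB]
    if not vectors:
        raise ValueError("document has no tokens in the embedding table")
    return mean_vector(vectors)


def train_centroids(labeled_docs):
    """Fit one centroid per label from (tokens, label) pairs."""
    by_label = {}
    for tokens, label in labeled_docs:
        by_label.setdefault(label, []).append(doc_vector(tokens))
    return {label: mean_vector(vecs) for label, vecs in by_label.items()}


def classify(tokens, centroids):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    v = doc_vector(tokens)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: sq_dist(v, centroids[label]))


# Train on annotated high-resource documents only.
centroids = train_centroids([
    (["economy", "market"], "finance"),
    (["football", "match"], "sport"),
])

# Apply the same classifier to an unlabeled low-resource document.
print(classify(["lr_football", "lr_match"], centroids))
```

The nearest-centroid classifier is used here only because it keeps the sketch short. Any classifier that takes document vectors as input could be substituted, and the quality of the transfer depends entirely on how well the bilingual embeddings are aligned.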
