Conference: International Conference on Cloud Computing and Security

A Word Embeddings Training Method Based on Modified Skip-Gram and Align

Abstract

Low-resource languages lack sufficient annotated data, and existing bilingual word embedding learning methods struggle to mine the deep semantic correspondence between languages. To address both problems, this paper presents a text processing method based on transfer learning and a bilingual word embedding model, CWDR-BiGRU (Cross-context Window of Dynamic Ratio, bidirectional Gated Recurrent Unit). The model combines an enhanced Skip-gram, called the cross-context window of dynamic ratio, with an encoder-decoder. The method can process low-resource language text effectively using only a sentence-aligned bilingual corpus and annotated data in the high-resource language. Experimental results on semantic reasoning and word embedding visualization show that CWDR-BiGRU can effectively train bilingual word embeddings. On a Chinese-Tibetan cross-lingual document classification task, the accuracy of the transfer learning method based on CWDR-BiGRU is 13.5% higher than that of the conventional method, and 7.4%, 5.8%, 3.1% and 1.6% higher than that of the existing Bilingual Autoencoder, BilBOWA, BiCCV and BiSkip, respectively. This indicates that CWDR-BiGRU, which reduces the difficulty of acquiring corpora for bilingual word embeddings, can accurately capture the deep alignment relationships and semantic properties between the two languages.
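The abstract gives no implementation details, so the following is only a minimal sketch of the general transfer-learning idea it describes: a classifier is fitted on labelled documents in the high-resource language and then applied to documents in the low-resource language through a shared bilingual embedding space. It is not the CWDR-BiGRU model. The toy embedding table, the placeholder tokens (`zh_*`, `bo_*`), the labels, and the nearest-centroid classifier are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical shared bilingual embedding table (toy 2-d vectors).
# In practice these vectors would come from a trained bilingual
# embedding model; the tokens here are placeholders, not real words.
EMB = {
    "zh_ball": (0.90, 0.10), "zh_team": (0.80, 0.20),
    "zh_bank": (0.10, 0.90), "zh_loan": (0.20, 0.80),
    "bo_ball": (0.85, 0.15), "bo_team": (0.75, 0.25),
    "bo_bank": (0.15, 0.85), "bo_loan": (0.25, 0.75),
}


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(dim))


def doc_vector(tokens):
    """Represent a document as the average embedding of its known tokens."""
    vectors = [EMB[t] for t in tokens if t in EMB]
    if not vectors:
        raise ValueError("document has no tokens in the embedding table")
    return mean_vector(vectors)


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def train_centroids(labelled_docs):
    """Compute one centroid per label from (tokens, label) pairs."""
    by_label = {}
    for tokens, label in labelled_docs:
        by_label.setdefault(label, []).append(doc_vector(tokens))
    return {label: mean_vector(vecs) for label, vecs in by_label.items()}


def classify(tokens, centroids):
    """Return the label whose centroid is most similar to the document."""
    vec = doc_vector(tokens)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Train only on labelled high-resource documents (placeholder Chinese tokens).
train_docs = [
    (["zh_ball", "zh_team"], "sports"),
    (["zh_bank", "zh_loan"], "finance"),
]
centroids = train_centroids(train_docs)

# Classify unlabelled low-resource documents (placeholder Tibetan tokens).
pred_sports = classify(["bo_team", "bo_ball"], centroids)
pred_finance = classify(["bo_loan", "bo_bank"], centroids)
print(pred_sports, pred_finance)
```

The sketch works only because translation-equivalent tokens are assumed to lie close together in the shared space; how well a real model achieves that is exactly what the reported accuracy comparison measures.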
