IEEE International Conference on Automation Science and Engineering

An Adaptive Wordpiece Language Model for Learning Chinese Word Embeddings



Abstract

Word representations are crucial for many natural language processing tasks. Most existing approaches learn contextual information by assigning a distinct vector to each word and pay little attention to morphology, which makes it difficult for them to handle large vocabularies and rare words. In this paper we propose an Adaptive Wordpiece Language Model for learning Chinese word embeddings (AWLM), inspired by the earlier observation that subword units are important for improving the learning of Chinese word representations. Specifically, a novel approach called BPE+ is introduced to adaptively generate grams of variable length, which breaks the limitation of fixed-size stroke n-grams. Semantic information extraction is completed by three elaborated parts, i.e., extraction of morphological information, reinforcement of fine-grained information, and extraction of semantic information. Empirical results on word similarity, word analogy, text classification, and question answering verify that our method significantly outperforms several state-of-the-art methods.
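The abstract does not specify how BPE+ differs from standard byte-pair encoding beyond producing variable-length grams from strokes, so the following is only a minimal sketch of the underlying idea: greedy pair merging over stroke-ID sequences, which yields subword units of varying length instead of fixed-size stroke n-grams. The stroke-ID corpus, function names, and merge count below are hypothetical illustrations, not the paper's BPE+ implementation.

```python
# Minimal sketch (not the paper's BPE+): byte-pair-encoding-style merges over
# stroke-ID sequences, showing how variable-length subword units can emerge
# from a fixed stroke inventory. The toy corpus and merge count are assumptions.
from collections import Counter

def get_pair_counts(sequences):
    """Count adjacent symbol pairs across all stroke sequences."""
    counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
    return counts

def merge_pair(sequences, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = []
    for seq in sequences:
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(seq[i] + seq[i + 1])  # concatenate symbol strings
                i += 2
            else:
                out.append(seq[i])
                i += 1
        merged.append(out)
    return merged

def learn_merges(sequences, num_merges):
    """Greedily learn up to `num_merges` merges; return them with the segmented corpus."""
    merges = []
    for _ in range(num_merges):
        counts = get_pair_counts(sequences)
        if not counts:
            break
        best = counts.most_common(1)[0][0]
        merges.append(best)
        sequences = merge_pair(sequences, best)
    return merges, sequences

if __name__ == "__main__":
    # Toy stroke-ID sequences standing in for characters/words (hypothetical data).
    corpus = [list("12134"), list("121"), list("13434"), list("12121")]
    merges, segmented = learn_merges(corpus, num_merges=3)
    print("learned merges:", merges)
    print("segmented corpus:", segmented)
```

In this sketch the most frequent adjacent stroke pairs are merged first, so common stroke patterns become single units of growing length, which is the general mechanism a BPE-style model can use to move beyond fixed n-gram windows.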
