IEEE International Conference on Automation Science and Engineering

An Adaptive Wordpiece Language Model for Learning Chinese Word Embeddings



Abstract

Word representations are crucial for many natural language processing tasks. Most existing approaches learn contextual information by assigning a distinct vector to each word and pay little attention to morphology, which makes it difficult for them to handle large vocabularies and rare words. In this paper we propose an Adaptive Wordpiece Language Model for learning Chinese word embeddings (AWLM), inspired by the earlier observation that subword units are important for improving the learning of Chinese word representations. Specifically, a novel approach called BPE+ is introduced to adaptively generate grams of variable length, which breaks the limitation of fixed-size stroke n-grams. Semantic information extraction is completed by three elaborated parts, i.e., extraction of morphological information, reinforcement of fine-grained information, and extraction of semantic information. Empirical results on word similarity, word analogy, text classification, and question answering verify that our method significantly outperforms several state-of-the-art methods.
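The abstract does not specify how BPE+ differs from standard byte-pair encoding beyond producing variable-length grams from strokes, so the following is only a minimal sketch of the underlying idea: greedy pair merging over stroke-ID sequences, which yields subword units of varying length instead of fixed-size stroke n-grams. The stroke-ID corpus, function names, and merge count below are hypothetical illustrations, not the paper's BPE+ implementation.

```python
# Minimal sketch (not the paper's BPE+): byte-pair-encoding-style merges over
# stroke-ID sequences, showing how variable-length subword units can emerge
# from a fixed stroke inventory. The toy corpus and merge count are assumptions.
from collections import Counter

def get_pair_counts(sequences):
    """Count adjacent symbol pairs across all stroke sequences."""
    counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
    return counts

def merge_pair(sequences, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = []
    for seq in sequences:
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(seq[i] + seq[i + 1])  # concatenate symbol strings
                i += 2
            else:
                out.append(seq[i])
                i += 1
        merged.append(out)
    return merged

def learn_merges(sequences, num_merges):
    """Greedily learn up to `num_merges` merges; return them with the segmented corpus."""
    merges = []
    for _ in range(num_merges):
        counts = get_pair_counts(sequences)
        if not counts:
            break
        best = counts.most_common(1)[0][0]
        merges.append(best)
        sequences = merge_pair(sequences, best)
    return merges, sequences

if __name__ == "__main__":
    # Toy stroke-ID sequences standing in for characters/words (hypothetical data).
    corpus = [list("12134"), list("121"), list("13434"), list("12121")]
    merges, segmented = learn_merges(corpus, num_merges=3)
    print("learned merges:", merges)
    print("segmented corpus:", segmented)
```

In this sketch the most frequent adjacent stroke pairs are merged first, so common stroke patterns become single units of growing length, which is the general mechanism a BPE-style model can use to move beyond fixed n-gram windows.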
