This paper hypothesizes that better word embeddings can be learned by representing words and subwords with vectors of different lengths. To investigate the impact of subword vector length on word embeddings, this paper proposes a model based on the Subword Information Skip-gram model. Experiments on two tasks across two datasets show that the proposed model outperforms six baselines, which supports this hypothesis. We also observe that, within a certain range, increasing the dimensionality of subword vectors consistently improves the quality of word embeddings.
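As a rough illustration of the core idea, the sketch below composes a word embedding from a full-dimensional word vector and lower-dimensional character n-gram (subword) vectors mapped up through a projection matrix, in the spirit of Subword Information Skip-gram (fastText) composition. The dimensions, the projection matrix, and all function names here are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

# Hypothetical dimensions for illustration only (not the paper's settings).
WORD_DIM = 300      # dimensionality of word vectors
SUBWORD_DIM = 100   # smaller dimensionality of subword vectors

rng = np.random.default_rng(0)

def char_ngrams(word, n_min=3, n_max=6):
    """Extract character n-grams with boundary markers, as in fastText."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

# Toy parameter tables; in a real model these would be learned.
word_vecs = {}
subword_vecs = {}
# Assumed mechanism: a projection maps subword vectors into the word space.
projection = rng.standard_normal((SUBWORD_DIM, WORD_DIM)) * 0.01

def embed(word):
    """Word embedding = word vector + projected sum of its subword vectors."""
    wv = word_vecs.setdefault(word, rng.standard_normal(WORD_DIM) * 0.01)
    sub = np.zeros(SUBWORD_DIM)
    for g in char_ngrams(word):
        sub += subword_vecs.setdefault(
            g, rng.standard_normal(SUBWORD_DIM) * 0.01)
    return wv + sub @ projection

print(embed("where").shape)  # (300,)
```

Varying SUBWORD_DIM in such a setup is one plausible way to study how subword vector length affects the composed word embeddings, which is the question the experiments address.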