Compressing Word Embeddings

Abstract

Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using large-scale unlabelled text analysis. However, these representations typically consist of dense vectors that require a great deal of storage and cause the internal structure of the vector space to be opaque. A more 'idealized' representation of a vocabulary would be both compact and readily interpretable. With this goal, this paper first shows that Lloyd's algorithm can compress the standard dense vector representation by a factor of 10 without much loss in performance. Then, using that compressed size as a 'storage budget', we describe a new GPU-friendly factorization procedure to obtain a representation which gains interpretability as a side-effect of being sparse and non-negative in each encoding dimension. Word similarity and word-analogy tests are used to demonstrate the effectiveness of the compressed representations obtained.
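The abstract does not say how Lloyd's algorithm is applied to the embedding matrix. The sketch below assumes every embedding coordinate is scalar-quantized against one shared codebook of k levels. That is an assumption, not necessarily the paper's exact setup. The 1-D Lloyd iteration alternates two steps: it assigns each value to its nearest level, then moves each level to the mean of the values assigned to it. With k = 8 levels, each 32-bit float becomes a 3-bit code, so storage shrinks by roughly 32/3 ≈ 10.7x. That is the same order as the factor of 10 reported above. The names `lloyd_quantize`, `assign_codes`, and `compression_ratio` are hypothetical and used only for illustration.

```python
import math
from typing import List, Tuple


def assign_codes(values: List[float], levels: List[float]) -> List[int]:
    """Assignment step: index of the nearest quantization level for each value."""
    return [min(range(len(levels)), key=lambda j: abs(v - levels[j])) for v in values]


def lloyd_quantize(values: List[float], k: int, max_iters: int = 100) -> Tuple[List[float], List[int]]:
    """1-D Lloyd's algorithm (hypothetical helper).

    Assumption: all embedding coordinates are flattened into `values` and share
    one codebook of k levels. Returns (levels, codes), where values[i] is
    approximated by levels[codes[i]].
    """
    if not values or k < 1:
        raise ValueError("need non-empty values and k >= 1")
    lo, hi = min(values), max(values)
    # Deterministic start: k evenly spaced levels across the observed range.
    if k == 1:
        levels = [sum(values) / len(values)]
    else:
        levels = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    for _ in range(max_iters):
        codes = assign_codes(values, levels)
        sums, counts = [0.0] * k, [0] * k
        for v, c in zip(values, codes):
            sums[c] += v
            counts[c] += 1
        # Update step: each level moves to the mean of its values; empty levels stay put.
        new_levels = [sums[j] / counts[j] if counts[j] else levels[j] for j in range(k)]
        if new_levels == levels:  # converged: assignments can no longer change
            break
        levels = new_levels
    # Recompute codes so they match the final levels.
    return levels, assign_codes(values, levels)


def compression_ratio(k: int, float_bits: int = 32) -> float:
    """Storage ratio when each float becomes a ceil(log2 k)-bit code (codebook size ignored)."""
    bits = max(1, math.ceil(math.log2(k)))
    return float_bits / bits


if __name__ == "__main__":
    vals = [-1.0, -0.9, -0.1, 0.0, 0.1, 0.9, 1.0]
    levels, codes = lloyd_quantize(vals, k=3)
    # codes are [0, 0, 1, 1, 1, 2, 2]; levels are close to [-0.95, 0.0, 0.95]
    print(levels, codes)
    print([levels[c] for c in codes])  # reconstructed (lossy) values
    print(compression_ratio(8))        # about 10.67
```

The evenly spaced initialization keeps the sketch deterministic. A practical version would probably use NumPy on the full embedding matrix, choose the initial levels more carefully (for example from quantiles), and store the small codebook alongside the packed codes.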

Bibliographic details

Source: International Conference on Neural Information Processing, 2016, pp. 413-422 (10 pages)
Author: Martin Andrews
Original format: PDF

Similar documents

1. Learning multi-prototype word embedding from single-prototype word embedding with integrated knowledge [J]. Yang Xuefeng, Mao Kezhi. Expert Systems with Applications, 2016, issue Sepa.
2. word2set: WordNet-Based Word Representation Rivaling Neural Word Embedding for Lexical Similarity and Sentiment Analysis [J]. Jimenez Sergio, Gonzalez Fabio A., Gelbukh Alexander. IEEE Computational Intelligence Magazine, 2019, no. 2.
3. On the Downstream Performance of Compressed Word Embeddings [C]. Avner May, Jian Zhang, Tri Dao. Conference on Neural Information Processing Systems, 2020.
4. Improved GloVe Word Embedding Using Linear Weighting Scheme for Word Similarity Tasks [D]. Lu, Qinglan. 2021.
5. A Word on Words in Words: How Do Embedded Words Affect Reading? [O]. Joshua Snell, Jonathan Grainger, Mathieu Declerck. 2018.
6. Compressing Word Embeddings [O]. Andrews, Martin. 2016.
