
Tweaks and Tricks for Word Embedding Disruptions



Abstract

Word embeddings are established as highly effective models used in several NLP applications. While they differ in architecture and training process, they often exhibit similar properties and remain vector space models with continuously valued dimensions describing the observed data. The complexity resides in the strategies developed for learning the values within each dimension. In this paper, we introduce the concept of disruption, which we define as a side effect of the training process of embedding models. Disruptions are viewed as a set of embedding values that are more likely to be noise than effective descriptive features. We show that handling the disruption phenomenon is of great benefit to bottom-up sentence embedding representations. By contrasting several in-domain and pre-trained embedding models, we propose two simple but very effective tweaking techniques that yield strong empirical improvements on textual similarity tasks.
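To make the idea concrete, here is a minimal sketch of a bottom-up sentence embedding (averaging word vectors) combined with a disruption-damping tweak. The clipping rule, the threshold, and the function names are illustrative assumptions, not the paper's actual techniques:

```python
# Hypothetical sketch: bottom-up sentence embedding with a clipping
# "tweak" that damps disruption-like outlier values before averaging.
# The clipping heuristic and threshold are assumptions for illustration.

def clip_disruptions(vec, threshold=1.0):
    """Clamp embedding values whose magnitude exceeds `threshold`,
    treating extreme values as probable noise rather than features."""
    return [max(-threshold, min(threshold, v)) for v in vec]

def sentence_embedding(word_vectors, threshold=1.0):
    """Bottom-up sentence representation: average the word vectors
    after damping outlier dimensions in each one."""
    clipped = [clip_disruptions(v, threshold) for v in word_vectors]
    dim = len(clipped[0])
    return [sum(v[i] for v in clipped) / len(clipped) for i in range(dim)]

# Toy word vectors; the 5.0 entry plays the role of a disruption that
# would otherwise dominate the averaged sentence vector.
words = [[0.2, 0.1], [0.4, 5.0], [0.0, -0.1]]
print(sentence_embedding(words, threshold=1.0))
```

Without the clipping step, the single outlier value 5.0 would dominate the second dimension of the sentence vector; after damping, the average reflects the remaining values more faithfully.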


