2017 IEEE International Conference on Big Knowledge

Revisit Word Embeddings with Semantic Lexicons for Modeling Lexical Contrast


Abstract

It is widely accepted that traditional word embedding models, which rely on the distributional semantics hypothesis, are of limited use for modeling lexical contrast. The distributional hypothesis states that words occurring in similar contexts have similar representations in vector space. However, synonyms and antonyms often occur in similar contexts, which means they end up close to each other in vector space, making it difficult to distinguish antonyms from synonyms. To address this challenge, we propose an optimization model named the Lexicon-based Word Embedding Tuning (LWET) model. The goal of LWET is to incorporate reliable semantic lexicons to tune the distribution of pre-trained word embeddings in vector space, so as to improve their ability to distinguish antonyms from synonyms. To speed up the training of LWET, we propose two approximation algorithms: positive sampling and quasi-hierarchical softmax. Positive sampling is faster than quasi-hierarchical softmax, but at the cost of worse performance. In our experiments, LWET and other state-of-the-art models are evaluated on antonym recognition, antonym-synonym discrimination, and word similarity. The results of the first two tasks show that LWET significantly improves the ability of word embeddings to detect antonyms, achieving state-of-the-art performance. On word similarity, LWET performs slightly better than the state-of-the-art models, which indicates that when tuning word distributions in vector space, LWET preserves and even strengthens the semantic structure rather than destroying it. Overall, compared with related work, LWET achieves similar or better performance while speeding up the training process.
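The abstract does not spell out LWET's objective function, so the Python sketch below only illustrates the general idea under stated assumptions: starting from pre-trained vectors, synonym pairs drawn from a semantic lexicon are pulled together, antonym pairs are pushed apart, and a regularization term keeps each vector near its pre-trained position, mirroring the abstract's claim that the semantic structure is preserved rather than destroyed. The function `tune_embeddings` and all hyperparameters are hypothetical, and the paper's positive-sampling and quasi-hierarchical-softmax speedups are not reproduced here.

```python
import random

import numpy as np


def tune_embeddings(emb, synonyms, antonyms,
                    lr=0.05, reg=0.1, epochs=10, seed=0):
    """Hypothetical lexicon-based tuning sketch (not the paper's exact
    LWET objective): attract synonym pairs, repel antonym pairs, and
    regularize each vector toward its pre-trained position."""
    # Work on unit vectors so dot products are cosine similarities.
    orig = {w: v / np.linalg.norm(v) for w, v in emb.items()}
    vecs = {w: v.copy() for w, v in orig.items()}
    pairs = ([(a, b, +1.0) for a, b in synonyms] +
             [(a, b, -1.0) for a, b in antonyms])
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for a, b, sign in pairs:
            if a not in vecs or b not in vecs:
                continue  # skip lexicon words missing from the embeddings
            va, vb = vecs[a], vecs[b]
            # Move a toward b for synonyms (sign=+1) and away for antonyms
            # (sign=-1); the reg term pulls back toward the original vector.
            vecs[a] = va + lr * (sign * vb + reg * (orig[a] - va))
            vecs[b] = vb + lr * (sign * va + reg * (orig[b] - vb))
            vecs[a] /= np.linalg.norm(vecs[a])
            vecs[b] /= np.linalg.norm(vecs[b])
    return vecs


# Toy usage: after tuning, "hot" should be closer to "warm" than to "cold".
gen = np.random.default_rng(0)
emb = {w: gen.standard_normal(50) for w in ["hot", "warm", "cold", "chilly"]}
tuned = tune_embeddings(emb,
                        synonyms=[("hot", "warm"), ("cold", "chilly")],
                        antonyms=[("hot", "cold"), ("warm", "chilly")])
print(tuned["hot"] @ tuned["warm"], tuned["hot"] @ tuned["cold"])
```

Staying on the unit sphere keeps dot products interpretable as cosine similarity throughout tuning. This sketch makes a full pass over every lexicon pair each epoch; the paper's two approximation algorithms exist precisely to avoid that cost by approximating the update with sampled terms.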