International Conference on Computational Linguistics

A Probabilistic Model for Learning Multi-Prototype Word Embeddings

Abstract

Distributed word representations have been widely used and have proven useful in many natural language processing and text mining tasks. Most existing word embedding models generate only one embedding vector per word, which limits their effectiveness because a large number of words are polysemous (such as "bank" and "star"). To address this problem, a word needs multiple embedding vectors, each representing one of its meanings. Some recent studies have attempted to train multi-prototype word embeddings by clustering the context-window features of each word. However, because these methods have a large number of parameters to train, they scale poorly and are inefficient to train on big data. In this paper, we introduce a much more efficient method for learning multiple embedding vectors for polysemous words. Specifically, we first model word polysemy from a probabilistic perspective and integrate this model with the highly efficient continuous Skip-Gram model. Within this framework, we design an Expectation-Maximization algorithm to learn each word's multiple embedding vectors. With far fewer parameters to train, our model can achieve results comparable to or better than conventional methods on word-similarity tasks.
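The abstract describes the approach only at a high level: a probabilistic model over a word's senses, combined with Skip-Gram and trained by Expectation-Maximization. The Python sketch below illustrates that general idea under our own assumptions; it is not the authors' implementation. We assume each word w has a fixed number of prototype vectors with a prior P(h | w), every context word has a single shared output vector, and p(c | w, h) is a full softmax over a tiny vocabulary (chosen for readability; a model meant for large corpora would need a cheaper output layer such as hierarchical softmax or negative sampling). The E-step computes the posterior over prototypes from the context window; the M-step takes a posterior-weighted gradient step on the embeddings and re-estimates the prior from expected counts. The class `MultiPrototypeSkipGram`, its methods, and the toy corpus are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_softmax(scores):
    # Numerically stable log-softmax: s_i - logsumexp(s).
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def softmax(scores):
    return [math.exp(x) for x in log_softmax(scores)]


class MultiPrototypeSkipGram:
    """Hypothetical toy multi-prototype Skip-Gram trained with a soft EM loop."""

    def __init__(self, vocab_size, dim, n_protos, seed=0):
        rng = random.Random(seed)
        self.V, self.K, self.dim = vocab_size, n_protos, dim

        def rand_vec():
            return [rng.uniform(-0.5, 0.5) for _ in range(dim)]

        # One input vector per (word, prototype) pair.
        self.proto = [[rand_vec() for _ in range(n_protos)] for _ in range(vocab_size)]
        # One output (context) vector per word, shared across senses (assumption).
        self.ctx = [rand_vec() for _ in range(vocab_size)]
        # Prior P(h | w) over each word's prototypes, initially uniform.
        self.prior = [[1.0 / n_protos] * n_protos for _ in range(vocab_size)]

    def log_context_probs(self, w, h):
        # log p(c | w, h) for every c: full softmax over the vocabulary.
        return log_softmax([dot(self.proto[w][h], u) for u in self.ctx])

    def e_step(self, w, contexts):
        # Posterior P(h | w, contexts) is proportional to P(h | w) * prod_c p(c | w, h).
        scores = []
        for h in range(self.K):
            lp = self.log_context_probs(w, h)
            scores.append(math.log(self.prior[w][h]) + sum(lp[c] for c in contexts))
        return softmax(scores)

    def m_step(self, w, contexts, post, lr):
        # Gradient ascent on sum_h post[h] * sum_c log p(c | w, h).
        for h in range(self.K):
            for c in contexts:
                v = self.proto[w][h]
                p = softmax([dot(v, u) for u in self.ctx])
                expected = [sum(p[j] * self.ctx[j][d] for j in range(self.V))
                            for d in range(self.dim)]
                # d/dv log p(c | w, h) = u_c - E_p[u], using the pre-update output vectors.
                grad_v = [self.ctx[c][d] - expected[d] for d in range(self.dim)]
                # d/du_j log p(c | w, h) = (1[j == c] - p_j) * v
                for j in range(self.V):
                    g = post[h] * lr * ((1.0 if j == c else 0.0) - p[j])
                    self.ctx[j] = [self.ctx[j][d] + g * v[d] for d in range(self.dim)]
                self.proto[w][h] = [v[d] + post[h] * lr * grad_v[d] for d in range(self.dim)]

    def train(self, corpus, window=1, epochs=20, lr=0.1):
        for _ in range(epochs):
            # Expected prototype counts, lightly smoothed so no prior reaches zero.
            counts = [[1e-3] * self.K for _ in range(self.V)]
            for i, w in enumerate(corpus):
                lo, hi = max(0, i - window), min(len(corpus), i + window + 1)
                contexts = [corpus[j] for j in range(lo, hi) if j != i]
                post = self.e_step(w, contexts)
                self.m_step(w, contexts, post, lr)
                for h in range(self.K):
                    counts[w][h] += post[h]
            # M-step for the prior: normalise the expected counts per word.
            self.prior = [[x / sum(row) for x in row] for row in counts]

    def sense(self, w, contexts):
        # Most probable prototype of w given the observed context words.
        post = self.e_step(w, contexts)
        return max(range(self.K), key=post.__getitem__)


if __name__ == "__main__":
    # Hypothetical toy ids: 0=bank, 1=river, 2=water, 3=money, 4=loan
    corpus = [1, 0, 2, 3, 0, 4] * 10
    model = MultiPrototypeSkipGram(vocab_size=5, dim=4, n_protos=2)
    model.train(corpus, window=1, epochs=20, lr=0.1)
    print("P(h | bank) =", model.prior[0])
    print("sense of bank near river/water:", model.sense(0, [1, 2]))
    print("sense of bank near money/loan:", model.sense(0, [3, 4]))
```

On a toy corpus like this, the two prototypes of word 0 may come to prefer different context pairs, but nothing in the sketch guarantees that split. Using soft, posterior-weighted updates rather than a hard assignment to one prototype keeps every prototype receiving some gradient, which is the usual motivation for an EM-style formulation.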