Venue: International Conference on Computational Linguistics

A Probabilistic Model for Learning Multi-Prototype Word Embeddings

Abstract

Distributed word representations have been widely used and have proven useful in many natural language processing and text mining tasks. Most existing word embedding models generate only one embedding vector for each word, which limits their effectiveness because many words are polysemous (such as "bank" and "star"). To address this problem, it is necessary to build multiple embedding vectors that represent the different meanings of a word. Some recent studies have attempted to train multi-prototype word embeddings by clustering the context-window features of each word. However, because these methods have a large number of parameters to train, they scale poorly and are inefficient to train on big data. In this paper, we introduce a much more efficient method for learning multiple embedding vectors for polysemous words. In particular, we first propose to model word polysemy from a probabilistic perspective and integrate it with the highly efficient continuous Skip-Gram model. Under this framework, we design an Expectation-Maximization algorithm to learn the multiple embedding vectors of each word. With far fewer parameters to train, our model can achieve comparable or even better results on word-similarity tasks than conventional methods.
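The abstract describes treating a word's senses as a probabilistic mixture inside Skip-Gram and fitting that mixture with Expectation-Maximization. The following is a minimal toy sketch of that general idea, not the authors' implementation. The class name `MultiPrototypeSkipGram`, the negative-sampling approximation of p(c | w, k), the uniform negative sampler, the pseudo-count sense prior, and every hyperparameter are assumptions chosen for illustration; the paper's exact objective and normalisation may differ. The E-step computes how responsible each sense is for an observed (word, context) pair. The M-step then updates the sense prior from expected counts and takes one gradient step on the responsibility-weighted log-likelihood.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class MultiPrototypeSkipGram:
    """Toy multi-prototype Skip-Gram trained with an online EM-style loop.

    Hypothetical formulation (an assumption, not the paper's exact model):
    p(c | w) = sum_k prior[w][k] * p(c | w, k), where each sense k of w has its
    own input vector and p(c | w, k) is approximated by a negative-sampling score.
    Requires vocab_size >= 2 so that negatives can be drawn.
    """

    def __init__(self, vocab_size, dim=8, num_senses=2, negatives=3, lr=0.05, seed=0):
        self.rng = random.Random(seed)
        self.V, self.K, self.neg, self.lr = vocab_size, num_senses, negatives, lr
        # senses[w][k]: input vector for sense k of word w (small random init).
        self.senses = [[[self.rng.uniform(-0.5, 0.5) / dim for _ in range(dim)]
                        for _ in range(num_senses)] for _ in range(vocab_size)]
        # contexts[c]: output vector shared by all senses (zero init, as in word2vec).
        self.contexts = [[0.0] * dim for _ in range(vocab_size)]
        # Expected sense counts per word; the prior is their normalisation.
        self.counts = [[1.0] * num_senses for _ in range(vocab_size)]

    def prior(self, w):
        total = sum(self.counts[w])
        return [n / total for n in self.counts[w]]

    def sense_log_likelihood(self, v, c, negs):
        # log p(c | w, k) ~ log s(v . u_c) + sum_n log s(-v . u_n)
        ll = math.log(sigmoid(dot(v, self.contexts[c])))
        for n in negs:
            ll += math.log(sigmoid(-dot(v, self.contexts[n])))
        return ll

    def responsibilities(self, w, c, negs):
        # E-step: posterior over senses, r_k proportional to prior_k * p(c | w, k).
        logs = [math.log(p) + self.sense_log_likelihood(v, c, negs)
                for p, v in zip(self.prior(w), self.senses[w])]
        m = max(logs)
        weights = [math.exp(x - m) for x in logs]
        z = sum(weights)
        return [x / z for x in weights]

    def sample_negatives(self, c):
        # Uniform negatives that differ from the observed context (an assumption).
        negs = []
        while len(negs) < self.neg:
            n = self.rng.randrange(self.V)
            if n != c:
                negs.append(n)
        return negs

    def step(self, w, c):
        negs = self.sample_negatives(c)
        r = self.responsibilities(w, c, negs)
        # M-step for the prior: accumulate expected sense counts.
        for k in range(self.K):
            self.counts[w][k] += r[k]
        # M-step for vectors: one gradient-ascent step on sum_k r_k log p(c | w, k),
        # with all gradients computed from the pre-update parameters.
        dim = len(self.contexts[c])
        grad_ctx = {t: [0.0] * dim for t in [c] + negs}
        for k, v in enumerate(self.senses[w]):
            grad_v = [0.0] * dim
            for target, label in [(c, 1.0)] + [(n, 0.0) for n in negs]:
                u = self.contexts[target]
                g = r[k] * (label - sigmoid(dot(v, u)))
                for i in range(dim):
                    grad_v[i] += g * u[i]
                    grad_ctx[target][i] += g * v[i]
            for i in range(dim):
                v[i] += self.lr * grad_v[i]
        for target, grad in grad_ctx.items():
            u = self.contexts[target]
            for i in range(dim):
                u[i] += self.lr * grad[i]
        return r

    def train(self, pairs, epochs=5):
        for _ in range(epochs):
            for w, c in pairs:
                self.step(w, c)

    def context_probability(self, w, c):
        # Mixture score without negatives: sum_k prior_k * s(v_k . u_c).
        return sum(p * sigmoid(dot(v, self.contexts[c]))
                   for p, v in zip(self.prior(w), self.senses[w]))
```

For example, after `model = MultiPrototypeSkipGram(vocab_size=6, dim=4)` and `model.train([(0, 1), (0, 2), (3, 4)], epochs=10)`, `model.prior(0)` gives the learned sense distribution of word 0. Each sense adds only one extra input vector per word, with the context vectors shared across senses. This reflects, in toy form, why a mixture over senses can keep the parameter count small compared with methods that cluster context-window features.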