Conference on Empirical Methods in Natural Language Processing

Word Re-Embedding via Manifold Dimensionality Retention

Abstract

Word embeddings seek to recover a Euclidean metric space by mapping words into vectors, starting from word co-occurrences in a corpus. Word embeddings may underestimate the similarity between nearby words and overestimate it between distant words in the Euclidean metric space. In this paper, we re-embed pre-trained word embeddings with a stage of manifold learning that retains dimensionality. We show that this approach is theoretically founded in the metric recovery paradigm, and show empirically that it can improve on state-of-the-art embeddings in word similarity tasks by 0.5 to 5.0 percentage points, depending on the original space.
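The idea of re-embedding vectors with a manifold learning stage that keeps the output dimensionality equal to the input dimensionality can be illustrated with a small sketch. The abstract does not name the manifold learning algorithm, so the use of Locally Linear Embedding (LLE) here is an assumption made for illustration. The function name `lle_reembed`, its parameters, and the random toy vectors are likewise hypothetical and are not taken from the paper.

```python
import numpy as np

def lle_reembed(X, n_neighbors=10, reg=1e-3):
    """Re-embed the rows of X with LLE, keeping the same dimensionality.

    Illustrative sketch only: LLE is an assumed choice of manifold learner,
    and the dense n x n matrices limit this to small vocabularies.
    """
    n, d = X.shape
    # Squared Euclidean distances between all pairs of vectors.
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    nbrs = np.argsort(dist, axis=1)[:, :n_neighbors]

    # Reconstruction weights: express each point as a combination of its
    # neighbors, with the weights constrained to sum to one.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]
        G = Z @ Z.T
        trace = np.trace(G)
        G += np.eye(n_neighbors) * reg * (trace if trace > 0 else 1.0)
        w = np.linalg.solve(G, np.ones(n_neighbors))
        W[i, nbrs[i]] = w / w.sum()

    # New coordinates: eigenvectors of (I - W)^T (I - W) with the smallest
    # eigenvalues, skipping the first (near-constant) one. Taking d of them
    # retains the original dimensionality.
    I_minus_W = np.eye(n) - W
    M = I_minus_W.T @ I_minus_W
    _, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    return vecs[:, 1:d + 1]

# Toy stand-in for pre-trained word vectors: 60 "words" in 4 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
Y = lle_reembed(X, n_neighbors=8)
print(Y.shape)
```

In this sketch the re-embedded matrix `Y` has one row per input vector and the same number of columns as `X`, so it could be swapped in wherever the original vectors were used, for example when computing cosine similarities for a word similarity benchmark.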
