International Conference on Computational Linguistics

Word Embeddings, Analogies, and Machine Learning: Beyond King - Man + Woman = Queen

Abstract

Solving word analogies has become one of the most popular benchmarks for word embeddings, on the assumption that linear relations between word pairs (such as king:man :: queen:woman) are indicative of the quality of the embedding. We question this assumption by showing that information not detected by the linear offset may still be recoverable by a more sophisticated search method, and is thus actually encoded in the embedding. The general problem with the linear offset is its sensitivity to the idiosyncrasies of individual words. We show that simple averaging over multiple word pairs improves on the state of the art. A further improvement in accuracy (up to 30% for some embeddings and relations) is achieved by combining cosine similarity with an estimate of the extent to which a candidate answer belongs to the correct word class. In addition to this practical contribution, this work highlights the problem of the interaction between word embeddings and analogy retrieval algorithms, and its implications for the evaluation of word embeddings and for the use of analogies in extrinsic tasks.
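The abstract contrasts three ways of answering an analogy a:a* :: b:?. These are the single-pair linear offset (often called 3CosAdd), an offset averaged over many example pairs, and a score that multiplies cosine similarity by an estimate of class membership. The sketch below runs all three on a hand-made toy embedding using plain Python lists as vectors. The vocabulary, its five feature dimensions, and the function names (`three_cos_add`, `three_cos_avg`, `class_weighted`) are hypothetical illustrations. In particular, the membership estimate is a simple centroid-based sigmoid standing in for a trained classifier. It is not presented as the paper's exact method.

```python
import math

# Hypothetical toy embedding; dimensions are invented features:
# [person, female, royal, young, occupation]
EMB = {
    "man":      [1, 0, 0, 0.0, 0],
    "woman":    [1, 1, 0, 0.8, 0],  # idiosyncratic: spurious "young" component
    "boy":      [1, 0, 0, 1.0, 0],
    "girl":     [1, 1, 0, 1.0, 0],
    "actor":    [1, 0, 0, 0.0, 1],
    "actress":  [1, 1, 0, 0.0, 1],
    "king":     [1, 0, 1, 0.0, 0],
    "queen":    [1, 1, 1, 0.0, 0],
    "princess": [1, 1, 1, 1.0, 0],
}


def cos(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def mean(vectors):
    """Component-wise mean of a list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def best(emb, score, exclude):
    """Return the vocabulary word with the highest score, skipping excluded words."""
    return max((w for w in emb if w not in exclude), key=score)


def three_cos_add(emb, a, a_star, b):
    """Linear offset from one example pair: argmax_w cos(w, b - a + a*)."""
    target = [vb - va + vs for vb, va, vs in zip(emb[b], emb[a], emb[a_star])]
    return best(emb, lambda w: cos(emb[w], target), {a, a_star, b})


def three_cos_avg(emb, pairs, b):
    """Offset averaged over many pairs: argmax_w cos(w, b + mean(a*) - mean(a))."""
    src = mean([emb[a] for a, _ in pairs])
    tgt = mean([emb[s] for _, s in pairs])
    target = [vb + t - s for vb, s, t in zip(emb[b], src, tgt)]
    return best(emb, lambda w: cos(emb[w], target), {b})


def class_weighted(emb, pairs, b, sharpness=10.0):
    """argmax_w membership(w) * cos(w, b).

    membership is a stand-in for a trained classifier (an assumption): a sigmoid
    over how much closer w is to the target-word centroid than to the
    source-word centroid.
    """
    src = mean([emb[a] for a, _ in pairs])
    tgt = mean([emb[s] for _, s in pairs])

    def membership(w):
        margin = cos(emb[w], tgt) - cos(emb[w], src)
        return 1.0 / (1.0 + math.exp(-sharpness * margin))

    return best(emb, lambda w: membership(w) * cos(emb[w], emb[b]), {b})


PAIRS = [("man", "woman"), ("boy", "girl"), ("actor", "actress")]
print(three_cos_add(EMB, "man", "woman", "king"))  # → princess
print(three_cos_avg(EMB, PAIRS, "king"))           # → queen
print(class_weighted(EMB, PAIRS, "king"))          # → queen
```

In this toy, the single example pair (man, woman) carries a spurious "young" component in "woman", so the one-pair offset lands on "princess". Averaging over three pairs dilutes that idiosyncrasy and returns "queen". A toy this small says nothing about the accuracy figures reported in the abstract.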
