k-NN Embedding Stability for word2vec Hyper-Parametrisation in Scientific Text

Abstract

Word embeddings are attracting increasing attention from researchers working on semantic similarity and analogy tasks. However, finding optimal hyper-parameters remains an important challenge, because they strongly affect the analogies an embedding reveals, particularly for domain-specific corpora. Since analogies are widely used for hypothesis synthesis, optimising word embedding hyper-parameters is crucial for precise hypothesis synthesis. In this paper, we therefore propose a methodological approach for tuning word embedding hyper-parameters using the stability of the k-nearest neighbors of word vectors within scientific corpora, more specifically Computer Science corpora, with Machine Learning adopted as a case study. The approach is tested on a dataset created from NIPS (Conference on Neural Information Processing Systems) publications and evaluated against a curated ACM hierarchy and the Wikipedia outline of Machine Learning as the gold standard. Our quantitative and qualitative analyses indicate that the approach not only reliably captures interesting patterns such as "unsupervised-learning is to kmeans as supervised learning is to knn", but also captures the analogical hierarchy structure of Machine Learning, and consistently outperforms state-of-the-art embeddings on syntactic accuracy (68% versus 61%).
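
The abstract does not spell out how k-NN stability is computed, so the sketch below is one plausible reading, offered as an assumption rather than the authors' method. It trains (here: fakes) several runs per hyper-parameter setting, finds each probe word's k nearest neighbors by cosine similarity in every run, and scores the setting by the mean Jaccard overlap of those neighbor sets across all pairs of runs; the most stable setting is selected. The functions `knn`, `knn_stability` and `select_config`, the setting labels `window=5` / `window=10`, and the two-dimensional toy vectors are hypothetical illustrations, written in plain Python so they run without extra libraries.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn(embedding, word, k):
    """Set of the k words most cosine-similar to `word`, excluding the word itself."""
    query = embedding[word]
    scored = [(cosine(query, vec), other) for other, vec in embedding.items() if other != word]
    # Sort by descending similarity; break ties alphabetically so results are deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return {other for _, other in scored[:k]}


def knn_stability(runs, words, k):
    """Mean Jaccard overlap of each probe word's k-NN set over every pair of runs.

    Assumption: `runs` are embeddings trained with identical hyper-parameters but
    different random seeds; Jaccard is one illustrative choice of overlap measure.
    """
    scores = []
    for run_a, run_b in combinations(runs, 2):
        for w in words:
            a, b = knn(run_a, w, k), knn(run_b, w, k)
            scores.append(len(a & b) / len(a | b))
    return sum(scores) / len(scores)


def select_config(runs_by_config, words, k):
    """Hypothetical selection rule: pick the setting whose k-NN sets are most stable."""
    return max(runs_by_config, key=lambda cfg: knn_stability(runs_by_config[cfg], words, k))


# Toy usage: hand-made 2-d vectors stand in for trained word2vec runs.
stable = {"svm": [1.0, 0.0], "knn": [0.9, 0.1], "kmeans": [0.0, 1.0], "pca": [0.1, 0.9]}
unstable = {"svm": [0.0, 1.0], "knn": [0.9, 0.1], "kmeans": [1.0, 0.0], "pca": [0.1, 0.9]}
runs = {"window=5": [stable, stable], "window=10": [stable, unstable]}
print(select_config(runs, ["svm", "kmeans"], k=1))  # window=5
```

With real models, each run would be a mapping from vocabulary words to trained vectors; Jaccard overlap ignores neighbor order, so a rank-aware overlap measure would be an equally reasonable design choice.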
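
The analogy quoted in the abstract ("unsupervised-learning is to kmeans as supervised learning is to knn") is conventionally answered with the vector-offset method (3CosAdd) popularised by word2vec: return the word whose vector is most cosine-similar to kmeans - unsupervised-learning + supervised-learning, excluding the three query words. The abstract does not say which analogy method or test set produced the 68% syntactic accuracy figure, so this sketch assumes 3CosAdd and defines accuracy as the fraction of analogy questions answered correctly. The three-dimensional vectors and the `corpus` distractor are invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def analogy(embedding, a, b, c):
    """3CosAdd: answer "a is to b as c is to ?" with the word closest to b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(embedding[a], embedding[b], embedding[c])]
    # The query words themselves are excluded, as is standard for this method.
    candidates = [w for w in embedding if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(embedding[w], target))


def analogy_accuracy(embedding, questions):
    """Fraction of (a, b, c, expected) questions answered correctly (assumed definition)."""
    correct = sum(analogy(embedding, a, b, c) == d for a, b, c, d in questions)
    return correct / len(questions)


# Invented 3-d vectors; dimensions loosely mean (clustering, classification, paradigm).
toy = {
    "unsupervised-learning": [1.0, 0.0, 1.0],
    "supervised-learning": [0.0, 1.0, 1.0],
    "kmeans": [1.0, 0.0, 0.0],
    "knn": [0.0, 1.0, 0.0],
    "corpus": [0.3, 0.3, 0.9],
}
print(analogy(toy, "unsupervised-learning", "kmeans", "supervised-learning"))  # knn
```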

Bibliographic details

  • Source
    Discovery Science | 2018 | pp. 328-343 | 16 pages
  • Conference location: Limassol (CY)
  • Author affiliations

    School of Computing and Digital Technology, Birmingham City University, Millennium Point, Birmingham B4 7XG, United Kingdom (all four authors)

  • Original format: PDF
  • Language: English
  • Keywords

    Word embedding; Word2vec; Skip-gram; Hyper-parameters; k-NN stability; ACM hierarchy; Wikipedia outline; NIPS;
