IEEE International Conference on Data Engineering

NSCaching: Simple and Efficient Negative Sampling for Knowledge Graph Embedding



Abstract

Knowledge graph (KG) embedding is a fundamental problem in data mining research with many real-world applications. It aims to encode the entities and relations in the graph into a low-dimensional vector space, which can be used by subsequent algorithms. Negative sampling, which draws negative triplets from the non-observed ones in the training data, is an important step in KG embedding. Recently, generative adversarial networks (GANs) have been introduced into negative sampling. By sampling negative triplets with large scores, these methods avoid the problem of vanishing gradients and thus obtain better performance. However, using a GAN makes the original model more complex and harder to train, since reinforcement learning must be used. In this paper, motivated by the observation that negative triplets with large scores are important but rare, we propose to keep track of them directly with a cache. How to sample from the cache and how to update it are two important questions. We carefully design the solutions, which are not only efficient but also achieve a good balance between exploration and exploitation. In this way, our method acts as a "distilled" version of previous GAN-based methods, which does not waste training time on additional parameters to fit the full distribution of negative triplets. Extensive experiments show that our method gains significant improvement on various KG embedding models and outperforms state-of-the-art negative sampling methods based on GANs.
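
The cache idea in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the sampling or update rules, so everything below is an assumption made for illustration: a toy TransE-style score, uniform sampling from the cache, and a refresh step that merges the cache with random candidates and keeps entries by score-weighted sampling without replacement (a simple way to mix exploration with exploitation). The names `NegativeCache`, `transe_score`, `cache_size`, and `pool_size` are hypothetical and are not taken from the paper.

```python
import math
import random


def transe_score(h, r, t):
    """Toy TransE-style score (assumed): negative L1 distance, larger = more plausible."""
    return -sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))


class NegativeCache:
    """Hypothetical cache of hard negative tails, keyed by (head, relation)."""

    def __init__(self, num_entities, positives, cache_size=4, pool_size=8, seed=0):
        self.num_entities = num_entities
        self.positives = positives      # (head, relation) -> set of observed tails
        self.cache_size = cache_size
        self.pool_size = pool_size
        self.rng = random.Random(seed)
        self.cache = {}                 # (head, relation) -> list of negative tails

    def _random_negatives(self, key, k):
        # Draw up to k distinct entities that are not observed tails for this key.
        seen = self.positives.get(key, set())
        allowed = [e for e in range(self.num_entities) if e not in seen]
        return self.rng.sample(allowed, min(k, len(allowed)))

    def _entries(self, key):
        # Lazily fill a cache entry with random negatives on first use.
        if key not in self.cache:
            self.cache[key] = self._random_negatives(key, self.cache_size)
        return self.cache[key]

    def sample(self, head, relation):
        """Draw one cached negative tail uniformly at random (assumed rule)."""
        return self.rng.choice(self._entries((head, relation)))

    def update(self, head, relation, score_fn):
        """Refresh the cache: merge in random candidates, favor high scorers."""
        key = (head, relation)
        pool = set(self._entries(key)) | set(self._random_negatives(key, self.pool_size))
        candidates = sorted(pool)
        scores = [score_fn(head, relation, t) for t in candidates]
        top = max(scores)
        # Softmax-style weights; the small constant keeps every weight positive.
        weights = [math.exp(s - top) + 1e-12 for s in scores]
        kept = []
        while candidates and len(kept) < self.cache_size:
            i = self.rng.choices(range(len(candidates)), weights=weights)[0]
            kept.append(candidates.pop(i))
            weights.pop(i)
        self.cache[key] = kept


# Usage with random toy embeddings (20 entities, 2 relations, dimension 4).
rng = random.Random(1)
entity_vecs = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(20)]
relation_vecs = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]


def score(h, r, t):
    return transe_score(entity_vecs[h], relation_vecs[r], entity_vecs[t])


cache = NegativeCache(num_entities=20, positives={(0, 1): {3, 7}})
cache.update(0, 1, score)
negative_tail = cache.sample(0, 1)
```

In this sketch the refresh step samples by weight instead of keeping a strict top-k, so lower-scoring candidates can occasionally enter the cache. That is one plausible reading of the exploration and exploitation balance mentioned above, not a statement of the authors' actual design.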
