Simple and automated negative sampling for knowledge graph embedding

Zhang Yongqi; Yao Quanming; Chen Lei

Abstract

Negative sampling, which draws negative triplets from the non-observed ones in a knowledge graph (KG), is an essential step in KG embedding. Recently, generative adversarial networks (GANs) have been introduced into negative sampling. By sampling negative triplets with large gradients, these methods avoid the vanishing gradient problem and thus obtain better performance. However, they make the original model more complex and harder to train. In this paper, motivated by the observation that negative triplets with large gradients are important but rare, we propose NSCaching, which directly keeps track of them with a cache. In this way, our method acts as a "distilled" version of previous GAN-based methods: it does not waste training time on additional parameters to fit the full distribution of negative triplets. However, how to sample from the cache and how to update it are two critical questions. We propose to solve these issues with automated machine learning techniques. The automated version also covers GAN-based methods as special cases. A theoretical explanation of NSCaching is also provided, justifying its superiority over fixed sampling schemes. In addition, we extend NSCaching to the skip-gram model for graph embedding. Finally, extensive experiments show that our method yields significant improvements for various KG embedding models and the skip-gram model, and outperforms state-of-the-art negative sampling methods.
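The abstract leaves the cache's sampling and update rules open (the paper addresses them with automated machine learning), so the following Python sketch only illustrates the core idea of caching rare, hard negatives. It is a hypothetical illustration, not the authors' implementation. Assumptions not taken from the abstract: entities are integer ids; only tails are corrupted; sampling from the cache is uniform; the cache is refreshed greedily by keeping the top-scoring candidates among the old cache plus a batch of fresh random entities; and `score` is a user-supplied plausibility function where higher means more plausible, so high-scoring negatives are the ones likely to produce large gradients under a margin-based loss. The names `TailCache`, `cache_size`, `num_random`, and `known_tails` are hypothetical.

```python
import random
from typing import AbstractSet, Callable, Dict, List, Tuple

Triplet = Tuple[int, int, int]  # (head, relation, tail) as integer ids


class TailCache:
    """Hypothetical per-(head, relation) cache of hard negative tails.

    Assumptions: uniform sampling from the cache and a greedy top-k
    refresh. This is an illustrative sketch, not the paper's code.
    """

    def __init__(self, num_entities: int, cache_size: int, num_random: int,
                 score: Callable[[int, int, int], float], seed: int = 0):
        self.num_entities = num_entities
        self.cache_size = cache_size    # entries kept per (h, r) pair
        self.num_random = num_random    # fresh random candidates per update
        self.score = score              # plausibility f(h, r, t); higher = harder negative
        self.rng = random.Random(seed)
        self.cache: Dict[Tuple[int, int], List[int]] = {}

    def _get(self, h: int, r: int) -> List[int]:
        # Lazily initialise a cache entry with uniformly random tails.
        key = (h, r)
        if key not in self.cache:
            self.cache[key] = [self.rng.randrange(self.num_entities)
                               for _ in range(self.cache_size)]
        return self.cache[key]

    def sample(self, h: int, r: int) -> Triplet:
        # Corrupt the tail with an entity drawn uniformly from the cache.
        return (h, r, self.rng.choice(self._get(h, r)))

    def update(self, h: int, r: int,
               known_tails: AbstractSet[int] = frozenset()) -> None:
        # Merge the cache with fresh random entities, drop duplicates and
        # observed (positive) tails, then keep the cache_size candidates the
        # current model scores highest.
        old = self._get(h, r)
        fresh = [self.rng.randrange(self.num_entities)
                 for _ in range(self.num_random)]
        candidates = [e for e in dict.fromkeys(old + fresh)
                      if e not in known_tails]
        candidates.sort(key=lambda e: self.score(h, r, e), reverse=True)
        self.cache[(h, r)] = candidates[: self.cache_size]


if __name__ == "__main__":
    # Toy score: pretend larger entity ids are more plausible tails.
    tc = TailCache(num_entities=100, cache_size=5, num_random=50,
                   score=lambda h, r, t: float(t))
    for _ in range(20):
        tc.update(0, 0)
    print(tc.cache[(0, 0)], tc.sample(0, 0))
```

The greedy top-k refresh is only one possible choice; it keeps the cache small (memory grows with the number of (head, relation) pairs times `cache_size`) and spends no extra parameters on modelling the full negative distribution, which is the contrast with GAN-based samplers that the abstract draws. A stochastic refresh (for example, sampling candidates in proportion to their scores) would trade some hardness for diversity.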
