Conference: IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining

Learning the information diffusion probabilities by using variance regularized EM algorithm

Abstract

In this paper we address the problem of learning information diffusion probabilities when data on information diffusion is insufficient. By observing information diffusion behavior on the popular social network website Twitter, we find that evidence of information diffusion is extremely sparse: fewer than one percent of tweets are retweeted, and retweeting is considered the most important form of information diffusion evidence on Twitter. Previous approaches to predicting information diffusion probabilities fail in such scenarios because of overfitting. To overcome this problem, we first propose using the variance of the diffusion probabilities as a measure of model complexity for the independent cascade model. We then propose two regularization schemes to reduce model complexity. The first scheme regularizes the variance of the diffusion probabilities directly. The second scheme regularizes the mean absolute deviation of the logarithm of the diffusion probabilities. We derive an approximate solution for the first scheme and an analytical solution for the second. We conduct experiments by simulating information diffusion on six social network datasets. Experimental results show that the variance regularization scheme outperforms the baseline by a noticeable margin, and that the mean absolute deviation regularization scheme also performs better than the baseline.
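To make the quantities named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It simulates one run of the independent cascade model and computes the two complexity measures the abstract mentions. The function names, the edge-probability dictionary layout, and the choice to measure absolute deviation around the mean (rather than the median) are assumptions made for illustration; the abstract does not specify the exact objective or how the penalties enter the EM updates.

```python
import math
import random
from statistics import fmean


def simulate_cascade(graph, probs, seeds, rng):
    """Run one independent cascade: each newly activated node gets a
    single chance to activate each of its still-inactive neighbors."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                # Edge (u, v) fires with its own diffusion probability.
                if v not in active and rng.random() < probs[(u, v)]:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def variance_penalty(probs):
    """Population variance of the edge diffusion probabilities."""
    values = list(probs.values())
    mean = fmean(values)
    return fmean([(p - mean) ** 2 for p in values])


def log_mad_penalty(probs):
    """Mean absolute deviation (around the mean, an assumption) of the
    logarithm of the edge diffusion probabilities; requires all p > 0."""
    logs = [math.log(p) for p in probs.values()]
    mean = fmean(logs)
    return fmean([abs(x - mean) for x in logs])


if __name__ == "__main__":
    # Hypothetical toy graph: adjacency lists plus per-edge probabilities.
    graph = {"a": ["b", "c"], "b": ["c"], "c": []}
    probs = {("a", "b"): 0.1, ("a", "c"): 0.3, ("b", "c"): 0.2}
    rng = random.Random(0)
    print(sorted(simulate_cascade(graph, probs, ["a"], rng)))
    print(variance_penalty(probs), log_mad_penalty(probs))
```

Both penalties are zero when every edge shares the same probability and grow as the probabilities spread apart, which is the sense in which the abstract treats variance as a measure of model complexity. In a regularized estimator, one would typically subtract a weighted penalty from the log-likelihood; the weighting and the resulting update rules are not given in the abstract and are left out here.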