Conference: ACM SIGKDD international conference on knowledge discovery and data mining; KDD 10

On Community Outliers and their Efficient Detection in Information Networks*

Abstract

Linked or networked data are ubiquitous in many applications. Examples include web data or hypertext documents connected via hyperlinks, social networks or user profiles connected via friend links, co-authorship and citation information, blog data, movie reviews, and so on. In these datasets (called "information networks"), closely related objects that share the same properties or interests form a community. For example, a community in the blogosphere could be users mostly interested in cell phone reviews and news. Outlier detection in information networks can reveal important anomalous and interesting behaviors that are not obvious if community information is ignored. An example could be a low-income person being friends with many rich people, even though his income is not anomalously low when considered over the entire population. This paper first introduces the concept of community outliers (interesting points, or rising stars in a more positive sense), and then shows that well-known baseline approaches that do not consider links or community information cannot find these community outliers. We propose an efficient solution by modeling networked data as a mixture model composed of multiple normal communities and a set of randomly generated outliers. The probabilistic model characterizes both data and links simultaneously by defining their joint distribution based on hidden Markov random fields (HMRF). Maximizing the data likelihood and the posterior of the model gives the solution to the outlier inference problem. We apply the model to both synthetic data and DBLP data sets, and the results demonstrate the importance of this concept, as well as the effectiveness and efficiency of the proposed approach.
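
The low-income example in the abstract can be made concrete with a small sketch. The code below is not the HMRF mixture model the paper proposes. It is a much simpler neighbor-mean heuristic, and the node names, incomes, and edges are all invented for illustration. It only shows why a score that ignores links can miss an object that stands out once its linked neighbors are taken into account.

```python
from statistics import mean, pstdev

# Hypothetical toy network (all values invented): node -> income in thousands.
income = {
    "a": 200, "b": 210, "c": 190, "d": 205,  # high-income community
    "e": 40, "f": 45, "g": 38, "h": 42,      # low-income community
    "x": 41,                                 # low income, linked to the high-income group
}
edges = [
    ("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("a", "d"),
    ("e", "f"), ("f", "g"), ("g", "h"), ("e", "h"),
    ("x", "a"), ("x", "b"), ("x", "c"), ("x", "d"),
]

def build_adjacency(edge_list):
    """Return an undirected adjacency map: node -> set of neighbors."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def global_outlier_scores(values):
    """Link-agnostic baseline: absolute z-score over the whole population."""
    data = list(values.values())
    mu, sigma = mean(data), pstdev(data)
    return {node: abs(v - mu) / sigma for node, v in values.items()}

def neighborhood_outlier_scores(values, adj):
    """Link-aware heuristic: gap between a node's value and its neighbors' mean."""
    return {
        node: abs(values[node] - mean([values[m] for m in adj[node]]))
        for node in values
    }

adjacency = build_adjacency(edges)
global_scores = global_outlier_scores(income)
local_scores = neighborhood_outlier_scores(income, adjacency)

top_global = max(global_scores, key=global_scores.get)
top_local = max(local_scores, key=local_scores.get)
print("highest global score:", top_global)
print("highest neighborhood score:", top_local)
```

In this toy data, node `x` has an income typical of the low-income group, so the global z-score does not rank it highest, while the neighborhood score does because all of its links go to high-income nodes. The paper's approach differs in that it infers community assignments and outliers jointly from data and links rather than comparing against raw neighbor averages.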