On Community Outliers and their Efficient Detection in Information Networks*

Abstract

Linked or networked data are ubiquitous in many applications. Examples include web data or hypertext documents connected via hyperlinks, social networks or user profiles connected via friend links, co-authorship and citation information, blog data, movie reviews and so on. In these datasets (called "information networks"), closely related objects that share the same properties or interests form a community. For example, a community in the blogosphere could be users mostly interested in cell phone reviews and news. Outlier detection in information networks can reveal important anomalous and interesting behaviors that are not obvious if community information is ignored. For example, a low-income person who is friends with many rich people stands out within that community, even though his income is not anomalously low when considered over the entire population. This paper first introduces the concept of community outliers (or, in a more positive sense, interesting points or rising stars), and then shows that well-known baseline approaches that do not consider links or community information cannot find these community outliers. We propose an efficient solution by modeling networked data as a mixture model composed of multiple normal communities and a set of randomly generated outliers. The probabilistic model characterizes both data and links simultaneously by defining their joint distribution based on hidden Markov random fields (HMRF). Maximizing the data likelihood and the posterior of the model gives the solution to the outlier inference problem. We apply the model to both synthetic data and DBLP data sets, and the results demonstrate the importance of this concept as well as the effectiveness and efficiency of the proposed approach.
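
The abstract describes the model only at a high level. As a rough illustration of what a community outlier is, the Python sketch below labels each node of a toy network either as a member of one of K Gaussian communities or as an outlier (label -1) drawn from a uniform density. It is a simplified stand-in, not the authors' algorithm: the function name `detect_community_outliers`, the one-dimensional node attribute, the uniform outlier density, the Potts-style link term (a normal label earns a reward for each agreeing non-outlier neighbor and a penalty for each disagreeing one, while the outlier label ignores links), the weight `lam` and the iterated conditional modes (ICM) updates are all assumptions made for this example.

```python
import math
from collections import Counter


def gaussian_logpdf(x, mean, var):
    """Log-density of a 1-D normal distribution."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def detect_community_outliers(values, edges, means, lam=1.0, n_iter=10, min_var=1e-3):
    """Hypothetical sketch: label nodes 0..K-1 (normal communities) or -1 (outlier).

    values: one numeric attribute per node; edges: undirected (u, v) pairs;
    means: initial community means (their count sets K).
    """
    n, k = len(values), len(means)
    neighbors = [[] for _ in range(n)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    means, variances = list(means), [1.0] * k
    # Assumed outlier model: uniform density over the observed value range.
    outlier_loglik = -math.log(max(max(values) - min(values), 1e-9))
    # Start from the nearest community mean, ignoring links.
    labels = [min(range(k), key=lambda c: abs(x - means[c])) for x in values]
    for _ in range(n_iter):
        # ICM: update each label in turn given its neighbors' current labels.
        for i in range(n):
            counts = Counter(labels[j] for j in neighbors[i])
            m = sum(1 for j in neighbors[i] if labels[j] != -1)  # non-outlier neighbors
            scores = {
                c: gaussian_logpdf(values[i], means[c], variances[c]) + lam * (2 * counts[c] - m)
                for c in range(k)
            }
            scores[-1] = outlier_loglik  # the outlier label ignores links
            labels[i] = max(scores, key=scores.get)
        # Re-estimate each community's mean and variance from its current members.
        for c in range(k):
            members = [values[i] for i in range(n) if labels[i] == c]
            if members:
                means[c] = sum(members) / len(members)
                var = sum((x - means[c]) ** 2 for x in members) / len(members)
                variances[c] = max(var, min_var)
    return labels


if __name__ == "__main__":
    # Toy network: a low-valued clique, a high-valued clique, and node 10
    # (value 1.0) linked only to the high clique.
    values = [1.0, 1.2, 0.8, 1.1, 0.9, 10.0, 10.2, 9.8, 10.1, 9.9, 1.0]
    low, high = range(0, 5), range(5, 10)
    edges = [(i, j) for i in low for j in low if i < j]
    edges += [(i, j) for i in high for j in high if i < j]
    edges += [(10, j) for j in high]
    print(detect_community_outliers(values, edges, means=[1.0, 10.0]))
    # expected: [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, -1]
    print(detect_community_outliers(values, [], means=[1.0, 10.0])[10])
    # expected: 0 (without links, node 10 looks like an ordinary low-value node)
```

In this toy run, node 10 has an unremarkable value for the population as a whole, so with an empty edge list it simply joins the low-valued community. Once its links to the high-valued clique are taken into account, neither community label fits both its value and its neighborhood, and the uniform outlier label wins. This mirrors the low-income-person example in the abstract.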

Bibliographic record

Source: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '10) | 2011 | pp. 813-822 | 10 pages
Authors: Jing Gao; Feng Liang; Wei Fan; Chi Wang; Yizhou Sun; Jiawei Han
Original format: PDF
Chinese Library Classification: TP311.13
Keywords: outlier detection; community discovery; information networks

Similar documents

1. Community outlier detection in social networks based on graph matching [J]. Soufiana Mekouar, Nabila Zrira, El Houssine Bouyakhf. International Journal of Autonomous and Adaptive Communications Systems, 2018, no. 3.

2. Integrating community matching and outlier detection for mining evolutionary community outliers [J]. Christoph F. Strnadl. Computing Reviews, 2013, no. 8.

3. Integrating Community Matching and Outlier Detection for Mining Evolutionary Community Outliers [J]. Manish Gupta, Jing Gao, Yizhou Sun, et al. SIGKDD Explorations, 2012 (CD-ROM).

4. On Community Outliers and their Efficient Detection in Information Networks* [C]. Jing Gao, Feng Liang, Wei Fan, et al. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2010.

5. Toward accurate and efficient outlier detection in high dimensional and large data sets [D]. Nguyen, Minh Quoc. 2010.

6. An efficient semi-supervised community detection framework in social networks [O]. Zhen Li, Yong Gong, Zhisong Pan, et al.

7. On Community Outliers and their Efficient Detection in Information Networks* [O]. 2016.
