首页> 外文会议>European conference on machine learning and knowledge discovery in databases >As Strong as the Weakest Link: Mining Diverse Cliques in Weighted Graphs
【24h】

As Strong as the Weakest Link: Mining Diverse Cliques in Weighted Graphs

机译:与最弱的链接一样强大:在加权图中挖掘各种集团

获取原文

摘要

Mining for cliques in networks provides an essential tool for the discovery of strong associations among entities. Applications vary, from extracting core subgroups in team performance data arising in sports, entertainment, research and business; to the discovery of functional complexes in high-throughput gene interaction data. A challenge in all of these scenarios is the large size of real-world networks and the computational complexity associated with clique enumeration. Furthermore, when mining for multiple cliques within the same network, the results need to be diversified in order to extract meaningful information that is both comprehensive and representative of the whole dataset. We formalize the problem of weighted diverse clique mining (mDκC) in large networks, incorporating both individual clique strength (measured by its weakest link) and diversity of the cliques in the result set. We show that the problem is NP-hard due to the diversity requirement. However, our formulation is sub-modular, and hence can be approximated within a constant factor from the optimal. We propose algorithms for mDκC that exploit the edge weight distribution in the input network and produce performance gains of more than 3 orders of magnitude compared to an exhaustive solution. One of our algorithms, Diverse Cliques (DiCliQ,), guarantees a constant factor approximation while the other, Bottom Up Diverse Cliques (BUDiC,), scales to large and dense networks without compromising the solution quality. We evaluate both algorithms on 5 real-world networks of different genres and demonstrate their utility for discovery of gene complexes and effective collaboration subgroups in sports and entertainment.
机译:网络中的群体挖掘为发现实体之间的强大关联提供了必不可少的工具。应用程序各不相同,从提取体育,娱乐,研究和商业中产生的团队绩效数据中的核心子组;高通量基因相互作用数据中功能复合物的发现。在所有这些情况下,面临的挑战是现实网络的规模庞大以及与派生枚举相关的计算复杂性。此外,在同一网络中挖掘多个集团时,结果需要多样化,以提取有意义的信息,这些信息既全面又代表整个数据集。我们将大型网络中的加权多样性集团挖掘(mDκC)问题形式化,将单个集团实力(通过其最薄弱的环节来衡量)和集团多样性融合到结果集中。我们表明由于多样性要求,问题是NP难的。但是,我们的公式是次模块的,因此可以从最佳常数中近似得出。我们提出了针对mDκC的算法,该算法利用了输入网络中的边缘权重分布,与详尽的解决方案相比,可产生超过3个数量级的性能增益。我们的一种算法Diverse Cliques(DiCliQ)保证恒定的因子近似值,而另一种算法Bottom Up Diverse Cliques(BUDiC)可以扩展到大型且密集的网络,而不会影响解决方案的质量。我们评估了5种不同类型的真实世界网络上的两种算法,并证明了它们在体育和娱乐领域发现基因复合物和有效协作亚组的效用。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号