
On mining cross-graph quasi-cliques



Abstract

Joint mining of multiple data sets can often discover interesting, novel, and reliable patterns which cannot be obtained solely from any single source. For example, in cross-market customer segmentation, a group of customers who behave similarly in multiple markets should be considered as a more coherent and more reliable cluster than clusters found in a single market. As another example, in bioinformatics, by joint mining of gene expression data and protein interaction data, we can find clusters of genes which show coherent expression patterns and also produce interacting proteins. Such clusters may be potential pathways.

In this paper, we investigate a novel data mining problem, mining cross-graph quasi-cliques, which is generalized from several interesting applications such as cross-market customer segmentation and joint mining of gene expression data and protein interaction data. We build a general model for mining cross-graph quasi-cliques, show why the complete set of cross-graph quasi-cliques cannot be found by previous data mining methods, and study the complexity of the problem. While the problem is difficult, we develop an efficient algorithm, Crochet, which exploits several interesting and effective techniques and heuristics to efficaciously mine cross-graph quasi-cliques. A systematic performance study is reported on both synthetic and real data sets. We demonstrate some interesting and meaningful cross-graph quasi-cliques in bioinformatics. The experimental results also show that algorithm Crochet is efficient and scalable.
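The abstract does not spell out the model, so the following is only an illustrative sketch under a stated assumption: a vertex set S counts as a γ-quasi-clique in a graph when every vertex of S is adjacent to at least γ·(|S| − 1) other vertices of S, and as a cross-graph quasi-clique when that holds in every input graph with that graph's own threshold. The function names, the adjacency-dictionary representation, and the toy graphs are hypothetical; this checks one candidate set and is not the Crochet algorithm.

```python
from typing import Dict, Hashable, Iterable, List, Set

Graph = Dict[Hashable, Set[Hashable]]  # vertex -> set of neighbours (undirected)


def build_graph(edges: Iterable[tuple]) -> Graph:
    """Build an undirected adjacency dictionary from an edge list."""
    adj: Graph = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def is_quasi_clique(adj: Graph, vertices: Iterable[Hashable], gamma: float) -> bool:
    """Assumed definition: each vertex in S has at least gamma * (|S| - 1)
    neighbours inside S."""
    s = set(vertices)
    need = gamma * (len(s) - 1)
    for v in s:
        inside_degree = len(adj.get(v, set()) & (s - {v}))
        # Small tolerance guards against floating-point rounding in `need`.
        if inside_degree + 1e-9 < need:
            return False
    return True


def is_cross_graph_quasi_clique(
    graphs: List[Graph], vertices: Iterable[Hashable], gammas: List[float]
) -> bool:
    """S must be a quasi-clique in every graph, each with its own threshold."""
    s = set(vertices)
    return all(is_quasi_clique(g, s, gamma) for g, gamma in zip(graphs, gammas))


# Toy data: the same four vertices observed in two graphs
# (for instance, two markets or two kinds of biological evidence).
g1 = build_graph([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("a", "d")])
g2 = build_graph([("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")])
candidate = {"a", "b", "c", "d"}

loose = is_cross_graph_quasi_clique([g1, g2], candidate, [0.5, 0.5])
strict = is_cross_graph_quasi_clique([g1, g2], candidate, [0.5, 0.8])
```

With thresholds of 0.5 in both graphs the candidate set qualifies; raising the second threshold to 0.8 rules it out, because every vertex in the second graph has only two of the three possible neighbours. Verifying one set is cheap; the hard part the abstract refers to is enumerating all such sets.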
