Published in: Database systems for advanced applications (conference proceedings)

Efficient Streaming Detection of Hidden Clusters in Big Data Using Subspace Stream Clustering


Abstract

Recently, many data mining techniques have been revisited to cope with the new challenges of big data. Nearly all of these algorithms focus on efficiency so that they can keep up with the increasing size of the data. However, as the dimensionality of the data increases, not only the efficiency but also the effectiveness of traditional mining algorithms is compromised. For instance, clusters hidden in some subspaces are hard to detect with traditional clustering algorithms as the dimensionality of the data increases. In this paper, we address both the huge size and the high dimensionality of big data with a novel three-phase model for subspace stream clustering algorithms. In its first phase, our model overcomes the huge size of big data by continuously applying a streaming concept over the data objects and summarizing them into micro-clusters. Then, after each batch of data or upon a user request, the second phase is applied to the micro-cluster summaries to reconstruct the current distribution of the data. In the third phase, a subspace clustering algorithm is applied to overcome the high dimensionality of the data and to find clusters hidden within some subspaces. We perform an extensive evaluation of different scenarios that follow our model over a big data set.
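The three phases described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say which summary structure, reconstruction method, or subspace clustering algorithm the authors use, so everything below is an assumption made for illustration: the micro-clusters are simple additive summaries (count, linear sum, squared sum), the reconstruction draws Gaussian samples from each summary, and the third phase is a toy grid-density search over axis-parallel subspaces standing in for a real subspace clustering algorithm. All names (`MicroCluster`, `summarize`, `reconstruct`, `dense_subspaces`) and parameters (`radius`, `bins`, `min_frac`) are hypothetical, and the data is assumed to be scaled to [0, 1).

```python
import itertools
import math
import random


class MicroCluster:
    """Additive summary of absorbed points: count, linear sum, squared sum."""

    def __init__(self, point):
        self.n = 1
        self.ls = list(point)
        self.ss = [x * x for x in point]

    def center(self):
        return [s / self.n for s in self.ls]

    def std(self):
        # Per-dimension standard deviation, clamped at zero against rounding.
        return [math.sqrt(max(q / self.n - (s / self.n) ** 2, 0.0))
                for s, q in zip(self.ls, self.ss)]

    def absorb(self, point):
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
            self.ss[i] += x * x


def summarize(stream, radius):
    """Phase 1 (assumed form): one pass over the stream. Each point joins the
    nearest micro-cluster if it lies within `radius`, else starts a new one."""
    mcs = []
    for p in stream:
        best = min(mcs, key=lambda m: math.dist(m.center(), p), default=None)
        if best is not None and math.dist(best.center(), p) <= radius:
            best.absorb(p)
        else:
            mcs.append(MicroCluster(p))
    return mcs


def reconstruct(mcs, rng):
    """Phase 2 (assumed form): draw m.n Gaussian points per micro-cluster
    from its per-dimension mean and standard deviation."""
    points = []
    for m in mcs:
        c, s = m.center(), m.std()
        for _ in range(m.n):
            points.append([rng.gauss(ci, si) for ci, si in zip(c, s)])
    return points


def dense_subspaces(points, max_dims=2, bins=4, min_frac=0.4):
    """Phase 3 stand-in: return every axis-parallel subspace (up to max_dims
    dimensions) in which one grid cell holds at least min_frac of the points."""
    found = []
    for k in range(1, max_dims + 1):
        for sub in itertools.combinations(range(len(points[0])), k):
            counts = {}
            for p in points:
                # Values outside [0, 1) are clamped into the border cells.
                cell = tuple(min(max(int(p[d] * bins), 0), bins - 1)
                             for d in sub)
                counts[cell] = counts.get(cell, 0) + 1
            if max(counts.values()) >= min_frac * len(points):
                found.append(sub)
    return found


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stream: a cluster in dimensions 0 and 1, uniform in dimension 2.
    stream = [[rng.gauss(0.375, 0.02), rng.gauss(0.375, 0.02), rng.random()]
              for _ in range(300)]
    stream += [[rng.random() for _ in range(3)] for _ in range(200)]
    rng.shuffle(stream)
    summaries = summarize(stream, radius=0.15)
    print(len(summaries), "micro-clusters")
    print(dense_subspaces(reconstruct(summaries, rng)))
```

In this sketch only the first phase touches the raw stream. The second and third phases work on the summaries alone, which is what would let them run after each batch or on demand without revisiting the original data.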
