Journal: Future Generation Computer Systems

Clustering big IoT data by metaheuristic optimized mini-batch and parallel partition-based DGC in Hadoop

Abstract

Clustering algorithms are an important branch of the data mining family and have been applied widely in IoT applications such as finding similar sensing patterns, detecting outliers, and segmenting large behavioral groups in real time. Traditional full-batch k-means clustering of IoT big data is confronted by large-scale storage and high computational complexity problems. To overcome the latency inherited from full-batch k-means, two big data processing methods are often used. The first is to feed small batches as input data to multiple computers to reduce the computational effort. However, because the sensed data may be heterogeneously fused from different sources in an IoT network, the size of each mini-batch may vary in each iteration of the clustering process. When these input data are clustered, their centers can shift drastically, which affects the final clustering results. The second method is parallel computing, which decreases the runtime while the overall computational effort remains the same. Furthermore, some centroid-based clustering algorithms, such as k-means, converge easily to local optima. In light of this, this paper proposes a new partitioned clustering method, optimized by a metaheuristic, for IoT big data environments. The method has three main activities. First, a sample of the dataset is partitioned into mini-batches. Second, the centroids of the mini-batches are adjusted. Third, the mini-batches are collated to form clusters so that the quality of the clusters is maximized. How the positions of the centroids are optimally tuned within the mini-batches is governed by a metaheuristic called Dynamic Group Optimization. The data are processed in parallel in Hadoop. Extensive experiments are conducted to investigate the performance. The results show that the proposed method is a promising tool for clustering fused IoT data efficiently.
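
The three activities named in the abstract (partition a sample into mini-batches, adjust the centroids of each mini-batch, collate the results into clusters) can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: the abstract does not specify Dynamic Group Optimization, so the hypothetical `perturb_search` below is a plain random-perturbation search used only as a stand-in, and Hadoop is not involved. All function names and parameters are assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def lloyd(points, centroids, iters=10):
    """Plain k-means (Lloyd) updates starting from the given centroids."""
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            groups[idx].append(p)
        # An empty group keeps its previous centroid.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids

def perturb_search(points, centroids, rng, steps=50, scale=0.5):
    """Stand-in for the metaheuristic (NOT Dynamic Group Optimization):
    keep a random nudge of the centroids only if it lowers the batch SSE."""
    best, best_cost = centroids, sse(points, centroids)
    for _ in range(steps):
        cand = [tuple(x + rng.gauss(0, scale) for x in c) for c in best]
        cost = sse(points, cand)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best

def cluster_in_batches(data, k, batch_size, seed=0):
    """Partition, adjust per-batch centroids, then collate into k centroids."""
    rng = random.Random(seed)
    sample = list(data)
    rng.shuffle(sample)
    # Step 1: partition the sample into mini-batches.
    batches = [sample[i:i + batch_size] for i in range(0, len(sample), batch_size)]
    local = []
    for batch in batches:
        if len(batch) < k:
            continue  # too few points to seed k centroids
        # Step 2: fit and then adjust the centroids of this mini-batch.
        cents = lloyd(batch, rng.sample(batch, k))
        local.extend(perturb_search(batch, cents, rng))
    # Step 3: collate by clustering the pooled local centroids.
    return lloyd(local, rng.sample(local, k))
```

Calling `cluster_in_batches(points, k=2, batch_size=20)` on a list of 2-D tuples returns two centroid tuples. Each mini-batch is handled independently inside the loop, which is the property that would let the per-batch work be distributed across workers; the loop here runs sequentially for simplicity.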
