Source: Sensors (Basel, Switzerland), via U.S. National Institutes of Health literature

PARSUC: A Parallel Subsampling-Based Method for Clustering Remote Sensing Big Data

Abstract

Remote sensing big data (RSBD) is generally characterized by huge volume, diversity, and high dimensionality. Mining hidden information from RSBD for different applications imposes significant computational challenges. Clustering is an important data mining technique widely used in processing and analyzing remote sensing imagery. However, conventional clustering algorithms are designed for relatively small datasets. When applied to RSBD problems, they are generally too slow or inefficient for practical use. In this paper, we propose a parallel subsampling-based clustering (PARSUC) method for improving the performance of RSBD clustering in terms of both efficiency and accuracy. PARSUC leverages a novel subsampling-based data partitioning (SubDP) method to realize three-step parallel clustering. This addresses a notable performance bottleneck of existing parallel clustering algorithms, namely that they must perform numerous repeated calculations to reach a reasonable result. Furthermore, we propose a centroid filtering algorithm (CFA) to eliminate subsampling errors and to guarantee the accuracy of the clustering results. PARSUC was implemented on a Hadoop platform using the MapReduce parallel model. Experiments conducted on massive remote sensing images of different sizes showed that PARSUC (1) provided much better accuracy than conventional remote sensing clustering algorithms in handling larger image data; (2) scaled notably well as computing nodes were added; and (3) took much less time than the existing parallel clustering algorithm in handling RSBD.
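The abstract names the building blocks (subsampling-based partitioning, three-step parallel clustering, centroid filtering) but does not specify them. The following is a minimal sketch of how such a pipeline could look, not the authors' implementation. It assumes random subsampling for the partitions, plain k-means as the base clusterer, a median-distance rule as a stand-in for CFA, and a thread pool in place of Hadoop MapReduce. All function names and parameters (`parsuc_sketch`, `filter_centroids`, `sample_frac`, `factor`) are hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def assign(points, centroids):
    """Return the index of the nearest centroid for every point."""
    d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centroids = [points[0]]
    for _ in range(1, k):
        # Distance from each point to its closest already-chosen centroid.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[d.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(iters):
        labels = assign(points, centroids)
        for j in range(k):
            members = points[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids


def filter_centroids(local, merged, factor=3.0):
    """Assumed filter: drop local centroids far from their group's median."""
    labels = assign(local, merged)
    final = merged.copy()
    for j in range(len(merged)):
        group = local[labels == j]
        if len(group) == 0:
            continue  # keep the merged centroid when no local centroid maps to it
        dist = np.linalg.norm(group - np.median(group, axis=0), axis=1)
        keep = dist <= factor * np.median(dist) + 1e-12
        final[j] = group[keep].mean(axis=0)
    return final


def parsuc_sketch(pixels, k, n_parts=4, sample_frac=0.1, seed=0):
    """Three steps: cluster subsamples in parallel, merge and filter, label all pixels."""
    rng = np.random.default_rng(seed)
    size = max(k, int(sample_frac * len(pixels)))
    # Step 1: each partition is an independent random subsample (assumption).
    subsamples = [
        pixels[rng.choice(len(pixels), size=size, replace=False)]
        for _ in range(n_parts)
    ]
    with ThreadPoolExecutor() as pool:
        local = np.vstack(list(pool.map(lambda s: kmeans(s, k), subsamples)))
    # Step 2: merge the n_parts * k local centroids into k, then filter outliers.
    merged = kmeans(local, k)
    centroids = filter_centroids(local, merged)
    # Step 3: label every pixel with its nearest final centroid.
    return centroids, assign(pixels, centroids)
```

Calling `parsuc_sketch(pixels, k=3)` on an `(n, bands)` float array returns the `k` final centroids and one label per pixel. Clustering small subsamples instead of the full image is what keeps each parallel task cheap in this sketch; the filtering step then limits the effect of any subsample that produced an unrepresentative centroid.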
