DECODE: a new method for discovering clusters of different densities in spatial data


Abstract

When a spatial point set contains noise and clusters of different densities, the major obstacle to classifying the data is determining the classification thresholds, which may form a series of bins for allocating each point to a cluster. Much of the previous work has adopted a model-based approach, but it is either incapable of estimating the thresholds automatically or limited to only two point processes, i.e. noise and clusters with the same density. In this paper, we present a new density-based clustering method (DECODE), in which a spatial data set is presumed to consist of different point processes, and clusters with different densities belong to different point processes. DECODE is based on a reversible jump Markov chain Monte Carlo (MCMC) strategy and is divided into three steps. The first step maps each point in the data to its mth nearest distance, defined as the distance between the point and its mth nearest neighbor. In the second step, the classification thresholds are determined via a reversible jump MCMC strategy. In the third step, clusters are formed by spatially connecting the points whose mth nearest distances fall into a particular bin defined by the thresholds. Four experiments, covering two simulated data sets and two seismic data sets, are used to evaluate the algorithm. Results on the simulated data show that our approach is capable of discovering the clusters automatically. Results on the seismic data suggest that the clustered earthquakes identified by DECODE either imply the epicenters of forthcoming strong earthquakes or indicate the areas with the most intensive seismicity; this is consistent with the tectonic states and the estimated stress distribution in the associated areas.
The comparison between DECODE and other state-of-the-art methods, such as DBSCAN, OPTICS and Wavelet Cluster, illustrates the contribution of our approach: although DECODE can be computationally expensive, it is capable of identifying the number of point processes and simultaneously estimating the classification thresholds with little prior knowledge.
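
The first and third steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the reversible jump MCMC of the second step is not reproduced, and the thresholds are instead passed in as given values. The function names, the brute-force neighbor search, the rule of linking two points in the same bin when they lie within that bin's upper threshold, and the treatment of points above the largest threshold as noise are all assumptions made for illustration.

```python
import math
from bisect import bisect_left


def mth_nearest_distances(points, m):
    """Step 1: distance from each point to its m-th nearest neighbor (brute force)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[m - 1])
    return result


def assign_bins(distances, thresholds):
    """Bin index per point; bin b holds distances <= thresholds[b].

    The thresholds are assumed to be given and sorted ascending. In the paper
    they are estimated by reversible jump MCMC, which is not shown here.
    """
    return [bisect_left(thresholds, d) for d in distances]


def connect_within_bin(points, bins, thresholds):
    """Step 3 under an assumed linking rule.

    Two points in the same bin are linked when they are no farther apart than
    that bin's upper threshold. Points above the largest threshold are treated
    as noise and keep the label -1.
    """
    labels = [-1] * len(points)
    next_label = 0
    for start in range(len(points)):
        b = bins[start]
        if labels[start] != -1 or b >= len(thresholds):
            continue  # already labeled, or in the sparsest bin (noise)
        radius = thresholds[b]
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first growth of one connected cluster
            i = stack.pop()
            for j in range(len(points)):
                if (labels[j] == -1 and bins[j] == b
                        and math.dist(points[i], points[j]) <= radius):
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Toy data: a dense square, a sparser square, and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 13), (13, 10), (13, 13),
          (50, 50)]
thresholds = [1.5, 4.0]  # hypothetical values, not estimated from the data
distances = mth_nearest_distances(points, m=2)
bins = assign_bins(distances, thresholds)
labels = connect_within_bin(points, bins, thresholds)
print(labels)
```

With these hand-picked thresholds the two squares fall into separate bins and form separate clusters, while the isolated point stays unlabeled. The brute-force search costs O(n²) distance evaluations per pass, so a spatial index would be preferable for data sets of realistic size.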
