Information Technology Journal

Efficient Clustering for High Dimensional Data: Subspace Based Clustering and Density Based Clustering

Abstract

Finding clusters in a high-dimensional data space is challenging because such a space has hundreds of attributes and hundreds of data tuples, so the average density of data points is very low. The distance functions used by many conventional algorithms fail in this scenario. Clustering relies on computing distances between objects, so the complexity of the similarity model strongly affects the efficiency of a clustering algorithm. For density-based clustering in particular, range queries must be supported efficiently to reduce runtime. Density-based clustering is also subject to the density divergence problem, which affects clustering accuracy. Even if no clusters exist in the original high-dimensional data space, clusters may still exist in some of its subspaces. Subspace clustering algorithms identify such clusters by localizing the search for relevant dimensions, which allows them to find clusters that exist in multiple, possibly overlapping subspaces. For clustering based on relative region densities within subspaces, density-based subspace clustering algorithms are applied; these treat clusters as regions whose density is high relative to the region densities in the subspace. This study reviews various subspace-based and density-based clustering algorithms, along with their efficiency on different data sets.
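
The two ideas in the abstract, range queries as the core operation of density-based clustering and clusters that appear only in a subspace, can be illustrated with a minimal sketch. It is not taken from the paper: the function names `range_query` and `core_points`, the parameters `eps` and `min_pts`, and the toy data are all assumptions chosen for illustration, and the range query is a brute-force scan rather than any indexed method a reviewed algorithm might use.

```python
import math

def range_query(points, q, eps, dims):
    """Return indices of all points within eps of points[q],
    using Euclidean distance restricted to the dimensions in dims."""
    center = points[q]
    neighbors = []
    for i, p in enumerate(points):
        dist = math.sqrt(sum((p[d] - center[d]) ** 2 for d in dims))
        if dist <= eps:
            neighbors.append(i)
    return neighbors

def core_points(points, eps, min_pts, dims):
    """Return indices of points whose eps-neighborhood (the point itself
    included) contains at least min_pts points in the chosen subspace."""
    return [q for q in range(len(points))
            if len(range_query(points, q, eps, dims)) >= min_pts]

# Hypothetical toy data: four points agree closely on dimension 0
# but are scattered widely on dimensions 1 and 2.
points = [
    (1.00,   0.0, 90.0),
    (1.10,  50.0, 10.0),
    (0.90, 100.0, 40.0),
    (1.05,  25.0, 70.0),
    (5.00,  75.0, 20.0),
    (9.00,  10.0, 55.0),
]

# Full space: every neighborhood holds only the point itself.
print(core_points(points, eps=0.5, min_pts=3, dims=(0, 1, 2)))  # []

# Subspace {0}: the first four points form a dense region.
print(core_points(points, eps=0.5, min_pts=3, dims=(0,)))       # [0, 1, 2, 3]
```

Each brute-force range query scans all n points, so running one query per point costs on the order of n² distance computations. That cost is one reason efficient range-query support matters for the runtime of density-based clustering.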