Journal: Fundamenta Informaticae

Scalable Clustering for Mining Local-Correlated Clusters in High Dimensions and Large Datasets


Abstract

Clustering is useful for mining the underlying structure of a dataset to support decision making, since target or high-risk groups can be identified. For high-dimensional datasets, however, the results of traditional clustering methods can be meaningless, because clusters may be apparent only with respect to a small subset of the features. Taking customer datasets as an example, some customers may be correlated through their salary and education, while others may be correlated through their job and house location. If one uses all the features of a customer for clustering, these local-correlated clusters may not be revealed. In addition, processing high-dimensional and large datasets is a challenging problem in decision making. Searching all combinations of every feature with every record to extract local-correlated clusters is infeasible, because the search space grows exponentially with data dimensionality and cardinality. In this paper, we propose a scalable 2-Leveled Approximated Hyper-image-based Clustering framework, referred to as 2L-HIC-A, for mining local-correlated clusters, in which the clustering process at each level requires only one scan of the original dataset. Moreover, the data-processing time of 2L-HIC-A can be independent of the input data size. In 2L-HIC-A, various well-developed image processing techniques can be exploited for mining clusters. Instead of proposing a new clustering algorithm, our framework can accommodate other clustering methods for mining local-correlated clusters, and it sheds new light on existing clustering techniques.
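The abstract does not give the details of 2L-HIC-A, so the following is only an illustrative sketch of the general idea it describes, not the authors' algorithm. It assumes that a pair of features can be rasterized into a fixed-size density grid (an "image") in one scan of the records, and that dense regions can then be found with ordinary connected-component labeling, whose cost depends on the grid size rather than on the number of records. All names (`build_density_image`, `count_dense_regions`, `bins`, `min_count`) and the synthetic data are hypothetical.

```python
import random
from collections import deque


def build_density_image(records, feature_pair, bins=10, lo=0.0, hi=1.0):
    """Single scan over the records: count points per cell of a bins x bins grid."""
    i, j = feature_pair
    width = (hi - lo) / bins
    image = [[0] * bins for _ in range(bins)]
    for rec in records:
        # Clamp so that values on or outside the boundary fall into an edge cell.
        r = max(0, min(int((rec[i] - lo) / width), bins - 1))
        c = max(0, min(int((rec[j] - lo) / width), bins - 1))
        image[r][c] += 1
    return image


def count_dense_regions(image, min_count):
    """Count 4-connected regions of cells holding at least min_count points."""
    bins = len(image)
    seen = [[False] * bins for _ in range(bins)]
    regions = 0
    for r in range(bins):
        for c in range(bins):
            if image[r][c] < min_count or seen[r][c]:
                continue
            regions += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill over dense neighbours
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < bins and 0 <= nx < bins
                            and image[ny][nx] >= min_count and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return regions


# Synthetic data (assumption): group A is tight only in features 0 and 1,
# group B is tight only in features 2 and 3; other features are uniform noise.
rng = random.Random(0)
group_a = [(rng.uniform(0.21, 0.29), rng.uniform(0.21, 0.29), rng.random(), rng.random())
           for _ in range(200)]
group_b = [(rng.random(), rng.random(), rng.uniform(0.71, 0.79), rng.uniform(0.71, 0.79))
           for _ in range(200)]
records = group_a + group_b

for pair in [(0, 1), (2, 3), (0, 2)]:
    image = build_density_image(records, pair)
    print(pair, count_dense_regions(image, min_count=100))
```

In this toy setting each group shows up as a dense region only in the feature pair it is correlated on, which mirrors the salary/education versus job/location example above. Once the grid is built, the labeling step touches only `bins * bins` cells, which is one plausible reading of how processing time could be made independent of the input size.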