Density-Based Cluster-Structure Mining Algorithm for Massive Data Streams (面向海量数据流的基于密度的簇结构挖掘算法)

Yu Yanwei (于彦伟); Wang Huan (王欢); Wang Qin (王沁); Zhao Jindong (赵金东)

Abstract

This paper proposes MCluStream (mining density-based clustering structure over data streams), an algorithm for mining density-based cluster structure. It addresses two problems in density-based clustering of evolving data streams: the difficulty of selecting input parameters and the identification of overlapping clusters. First, a tree-topology index, the CR-Tree, is designed to map each pair of directly core-reachable data points to a parent-child relationship in the tree. By recording these dependencies among points, the CR-Tree captures the density-based cluster structure under a series of subEps settings. Second, MCluStream updates the CR-Tree in a sliding-window manner, maintaining the cluster structure of the current window online and thereby supporting fast evolving cluster analysis over massive data streams. Third, a fast method extracts the cluster structure from the CR-Tree, so that users can select a reasonable clustering result from the visualized cluster structure. Finally, experiments on massive real and synthetic data sets show that, compared with state-of-the-art methods, MCluStream produces effective mining results with higher clustering efficiency and lower space overhead. MCluStream is suited to self-adaptive density-based clustering analysis in high-volume data stream applications.
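The idea of turning directly reachable pairs of core points into parent-child links can be illustrated with a minimal Python sketch. It is not the paper's CR-Tree or the MCluStream update procedure. It assumes the standard DBSCAN notions of a radius `eps` and a threshold `min_pts`, and it assumes that "directly core-reachable" means two core points lying within `eps` of each other. The function names (`build_forest`, `clusters_from_forest`, `cluster_stream`) are hypothetical. The sliding window is handled by naively rebuilding the forest on every arrival, whereas the paper maintains the structure incrementally, and the subEps series is not modelled.

```python
from collections import deque
from math import dist


def build_forest(points, eps, min_pts):
    """Return {core point index: parent index or None} for one window.

    Assumption: a point is a core point if at least min_pts points
    (itself included) lie within eps of it, as in standard DBSCAN.
    """
    n = len(points)
    neighbours = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbours]

    parent = {}
    for start in range(n):
        if not is_core[start] or start in parent:
            continue
        parent[start] = None  # a new root starts a new cluster
        queue = deque([start])
        while queue:  # breadth-first expansion over core points
            p = queue.popleft()
            for q in neighbours[p]:
                if is_core[q] and q not in parent:
                    parent[q] = p  # q was reached directly from core point p
                    queue.append(q)
    return parent


def clusters_from_forest(parent):
    """Group core points by the root of the tree they belong to."""
    def root(i):
        while parent[i] is not None:
            i = parent[i]
        return i

    groups = {}
    for i in parent:
        groups.setdefault(root(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())


def cluster_stream(stream, window_size, eps, min_pts):
    """Yield the clusters of each sliding window (naive full rebuild)."""
    window = deque(maxlen=window_size)  # oldest point drops out automatically
    for point in stream:
        window.append(point)
        yield clusters_from_forest(build_forest(list(window), eps, min_pts))


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(clusters_from_forest(build_forest(pts, eps=1.5, min_pts=3)))
```

Indices in the result refer to positions within the current window. Each tree in the forest corresponds to one group of mutually density-connected core points, so reading clusters off the tree roots avoids a second pass over pairwise distances.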

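Bibliographic record

Source
Journal of Software (《软件学报》) | 2015, No. 5 | pp. 1113-1128 | 16 pages
Authors
Yu Yanwei (于彦伟); Wang Huan (王欢); Wang Qin (王沁); Zhao Jindong (赵金东)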
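Author affiliations

School of Computer and Control Engineering, Yantai University, Yantai, Shandong 264005, China;

Department of Computer Science, University of California, San Diego, USA;

School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China;

School of Computer and Control Engineering, Yantai University, Yantai, Shandong 264005, China;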

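Original format: PDF
Language of text: Chinese
Chinese Library Classification: Programming; software engineering
Keywords
cluster analysis; density-based clustering; cluster structure; data stream; sliding window
Date indexed: 2022-08-18 05:34:13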

