Evolving clustering algorithm based on mixture of typicalities for stream data mining

Jose Maia; Carlos Alberto Severiano Junior; Frederico Gadelha Guimaraes; Cristiano Leite de Castro; Andre Paim Lemos; Juan Camilo Fonseca Galindo; Miri Weiss Cohen

Abstract

Many applications now produce streaming data, which motivates techniques for extracting knowledge from such sources, and data stream clustering algorithms have attracted increasing interest. However, applying these algorithms in real systems remains a challenge: data streams often come from non-stationary environments, which complicates choosing a proper set of model parameters for fitting the data and finding the correct number of clusters. This work proposes an evolving clustering algorithm based on a mixture of typicalities. It is based on the TEDA framework and divides the clustering problem into two subproblems: micro-clusters and macro-clusters. Experimental results on benchmark data sets showed that the proposed methodology can provide good results for clustering data and estimating its density, even in the presence of events that affect data distribution parameters, such as concept drifts. In addition, the model parameters were robust in comparison with state-of-the-art algorithms.
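
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the recursive eccentricity and typicality computation commonly described in the TEDA (typicality and eccentricity data analytics) literature, on which the paper says it builds. It is not the authors' micro-cluster or macro-cluster procedure. The class name `RunningEccentricity`, the threshold parameter `m`, and the convention of returning 1/k when the variance is zero are assumptions made for illustration.

```python
class RunningEccentricity:
    """Streaming mean, variance, and eccentricity for one data cloud.

    Illustrative sketch only; names and edge-case handling are assumptions.
    """

    def __init__(self):
        self.k = 0          # number of samples seen so far
        self.mean = None    # running mean vector
        self.var = 0.0      # running scalar variance

    def update(self, x):
        """Absorb sample x (a sequence of floats) and return its eccentricity."""
        self.k += 1
        k = self.k
        if k == 1:
            self.mean = [float(v) for v in x]
            return 1.0  # assumed convention: a single sample has eccentricity 1/k = 1
        # Recursive mean update: mu_k = ((k-1)/k) * mu_{k-1} + x_k / k
        self.mean = [(k - 1) / k * m + v / k for m, v in zip(self.mean, x)]
        # Squared distance from the new sample to the updated mean
        d2 = sum((v - m) ** 2 for v, m in zip(x, self.mean))
        # Recursive variance update: var_k = ((k-1)/k) * var_{k-1} + d2 / (k-1)
        self.var = (k - 1) / k * self.var + d2 / (k - 1)
        if self.var == 0.0:
            return 1.0 / k  # assumed convention when all samples coincide
        # Eccentricity: 1/k + d2 / (k * var_k)
        return 1.0 / k + d2 / (k * self.var)

    def is_outlier(self, eccentricity, m=3.0):
        """Chebyshev-style test on the normalized eccentricity (eccentricity / 2)."""
        return eccentricity / 2.0 > (m ** 2 + 1.0) / (2.0 * self.k)


def typicality(eccentricity):
    """Typicality is the complement of eccentricity."""
    return 1.0 - eccentricity


cloud = RunningEccentricity()
scores = [cloud.update(sample) for sample in ([0.0], [2.0], [1.0])]
```

In this sketch a sample close to the running mean receives low eccentricity and therefore high typicality; how such scores are combined into a mixture over micro-clusters is specific to the paper and is not reproduced here.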

Bibliographic record

Source: Future Generation Computer Systems, 2020, Issue 5, pp. 672-684 (13 pages)
Author affiliations:

Department of Electrical Engineering, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil

Department of Electrical Engineering, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil; Machine Intelligence and Data Science (MINDS) Laboratory, Federal University of Minas Gerais, Belo Horizonte, Brazil; Instituto Federal de Minas Gerais, Campus Sahara, Brazil

Department of Electrical Engineering, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil; Machine Intelligence and Data Science (MINDS) Laboratory, Federal University of Minas Gerais, Belo Horizonte, Brazil

Department of Software Engineering, Braude College of Engineering, Karmiel, Israel

Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Language: English
Keywords:
Clustering; Data stream; Concept drift; Stream data mining; Evolving fuzzy systems;
