Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence

Scalable model-based clustering for large databases based on data summarization


Abstract

The scalability problem in data mining involves the development of methods for handling large databases with limited computational resources such as memory and computation time. In this paper, two scalable clustering algorithms, bEMADS and gEMADS, are presented based on the Gaussian mixture model. Both summarize data into subclusters and then generate Gaussian mixtures from their data summaries. Their core algorithm, EMADS, is defined on data summaries and approximates the aggregate behavior of each subcluster of data under the Gaussian mixture model. EMADS is provably convergent. Experimental results substantiate that both algorithms can run several orders of magnitude faster than expectation-maximization with little loss of accuracy.
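The two-stage idea in the abstract (summarize the data into subclusters, then fit a Gaussian mixture to the summaries) can be illustrated with a minimal sketch. This is not the paper's EMADS, bEMADS, or gEMADS: the fixed-width grid summarization, the function names, and the parameters (`cell_width`, `reg`, `n_iter`) are all assumptions made for illustration. Each subcluster is reduced to a count, a mean, and a covariance. The E-step evaluates each component density only at the subcluster mean, which is a simplification, while the M-step adds the within-subcluster covariance back into the scatter so that spread lost in summarization is not discarded.

```python
import numpy as np

def summarize_on_grid(X, cell_width):
    """Assumed summarization: one (count, mean, covariance) triple per non-empty grid cell."""
    keys = np.floor(X / cell_width).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.ravel()  # shape of `inverse` differs across NumPy versions
    n_cells, d = inverse.max() + 1, X.shape[1]
    counts = np.bincount(inverse, minlength=n_cells).astype(float)
    means = np.zeros((n_cells, d))
    np.add.at(means, inverse, X)
    means /= counts[:, None]
    centered = X - means[inverse]
    covs = np.zeros((n_cells, d, d))
    np.add.at(covs, inverse, centered[:, :, None] * centered[:, None, :])
    covs /= counts[:, None, None]
    return counts, means, covs

def log_gaussian(points, mean, cov):
    """Log density of N(mean, cov) at each row of `points`."""
    d = points.shape[1]
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, (points - mean).T)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + np.sum(z * z, axis=0))

def farthest_point_init(means, counts, k):
    """Deterministic seeding: start at the heaviest summary, then take the farthest ones."""
    chosen = [int(np.argmax(counts))]
    dist = np.linalg.norm(means - means[chosen[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(means - means[nxt], axis=1))
    return means[chosen].copy()

def fit_gmm_on_summaries(counts, means, covs, k, n_iter=50, reg=1e-6):
    """Simplified EM that only touches the summaries, never the raw points."""
    m, d = means.shape
    centers = farthest_point_init(means, counts, k)
    dists = np.linalg.norm(means[:, None, :] - centers[None, :, :], axis=2)
    r = np.eye(k)[np.argmin(dists, axis=1)]  # hard initial responsibilities, shape (m, k)
    for _ in range(n_iter):
        # M-step: every summary contributes with weight count * responsibility.
        nk = (counts[:, None] * r).sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        mu = (counts[:, None] * r).T @ means / nk[:, None]
        sigma = np.empty((k, d, d))
        for j in range(k):
            diff = means - mu[j]
            # Within-subcluster covariance plus between-subcluster scatter.
            scatter = covs + diff[:, :, None] * diff[:, None, :]
            w = (counts * r[:, j])[:, None, None]
            sigma[j] = (w * scatter).sum(axis=0) / nk[j] + reg * np.eye(d)
        # E-step: responsibilities evaluated at the subcluster means only.
        log_r = np.stack(
            [np.log(weights[j]) + log_gaussian(means, mu[j], sigma[j]) for j in range(k)],
            axis=1,
        )
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
    return weights, mu, sigma
```

A typical call would be `counts, means, covs = summarize_on_grid(X, cell_width=0.5)` followed by `fit_gmm_on_summaries(counts, means, covs, k=2)`. The cost of each EM iteration then scales with the number of non-empty cells rather than the number of data points, which is where the kind of speedup the abstract describes would come from in a sketch like this.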
