Conference: International Conference on Future Data and Security Engineering

A Comparative Study of the Use of Coresets for Clustering Large Datasets

Abstract

A coreset is a compact subset of a dataset such that models trained on the coreset perform comparably to models trained on the full dataset. By using coresets, we can scale a large dataset down to a much smaller one and thereby reduce the computational cost of a machine learning problem. In recent years, data scientists have investigated various methods for constructing coresets. Two state-of-the-art algorithms proposed in 2018 are ProTraS by Ros & Guillaume and the Lightweight Coreset by Bachem et al. In this paper, we briefly introduce these two algorithms and compare them to identify the benefits and drawbacks of each.
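To make the coreset idea concrete, here is a minimal Python sketch of a lightweight-coreset construction for k-means, assuming the commonly cited formulation attributed to Bachem et al.: each point x is sampled (with replacement) with probability q(x) = 1/(2|X|) + d(x, μ)² / (2 Σ d(x', μ)²), where μ is the mean of the dataset, and each sampled point receives weight 1/(m·q(x)). The helper names (`lightweight_coreset`, `kmeans_cost`), the synthetic two-blob data, and the fixed centers are illustrative assumptions, not the authors' code. The demo compares the k-means cost of the same centers on the full data and on the weighted coreset.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def lightweight_coreset(points: List[Point], m: int,
                        rng: random.Random) -> List[Tuple[Point, float]]:
    """Hypothetical helper: sample m weighted points (assumed formulation).

    Half of the sampling mass is uniform, half is proportional to the
    squared distance from the data mean; weights 1/(m*q) keep the
    weighted cost an unbiased estimate of the full-data cost.
    """
    n, dim = len(points), len(points[0])
    mu = tuple(sum(p[j] for p in points) / n for j in range(dim))
    d2 = [sq_dist(p, mu) for p in points]
    total = sum(d2)
    if total == 0:  # all points identical: fall back to uniform sampling
        q = [1.0 / n] * n
    else:
        q = [0.5 / n + 0.5 * d / total for d in d2]
    idx = rng.choices(range(n), weights=q, k=m)
    return [(points[i], 1.0 / (m * q[i])) for i in idx]

def kmeans_cost(weighted_points, centers) -> float:
    """Weighted k-means cost: sum of w * (squared distance to nearest center)."""
    return sum(w * min(sq_dist(p, c) for c in centers)
               for p, w in weighted_points)

# Illustrative synthetic data: two Gaussian blobs, 2500 points each.
rng = random.Random(0)
data: List[Point] = [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
                     for cx, cy in [(0.0, 0.0), (10.0, 10.0)]
                     for _ in range(2500)]
centers = [(0.0, 0.0), (10.0, 10.0)]

coreset = lightweight_coreset(data, m=500, rng=rng)
full = kmeans_cost([(p, 1.0) for p in data], centers)
approx = kmeans_cost(coreset, centers)
print(f"full-data cost: {full:.1f}, coreset cost: {approx:.1f}")
```

The coreset holds 500 weighted points instead of 5000, so any cost evaluation (and hence any k-means iteration run on it) touches a tenth of the data, while the weighted cost stays close to the full-data cost in expectation. The uniform half of the mixture bounds every weight by 2|X|/m, which is one reason this construction is considered cheap and stable.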