Conference: IEEE International Conference on Cybernetics

Combining nearest neighbour classifiers based on small subsamples for big data analytics

Abstract

Contemporary machine learning systems must be able to deal with ever-growing volumes of data. However, most canonical classifiers are not well suited to big data analytics. This is especially evident for distance-based classifiers, whose classification time becomes prohibitive. Recently, many methods for adapting the nearest neighbour classifier to big data have been proposed. We investigate a simple yet efficient technique based on random under-sampling of the dataset. As we deal with stationary data, one may assume that a subset of objects will sufficiently capture the properties of a given dataset. We propose to build distance-based classifiers on very small subsamples and then combine them into an ensemble. With this approach, one does not need to aggregate the datasets, only the local decisions of the classifiers. On the basis of experimental results, we show that such an approach can return results comparable to those of a nearest neighbour classifier applied to the entire dataset, but with a significantly reduced classification time. We also investigate the number of subsamples (ensemble members) required to capture the properties of each dataset. Finally, we propose to apply our subsampling-based ensemble in a distributed environment, which allows for a further reduction of the computational complexity of the nearest neighbour rule for big data.
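The abstract does not specify the combination rule, the subsample size, or the number of neighbours used by each member. The following is a minimal Python sketch under assumed choices. Each member is a brute-force 1-NN classifier trained on its own small random subsample, drawn without replacement. The members' local decisions are combined by simple majority voting. The function names (`knn_predict`, `build_subsample_ensemble`, `ensemble_predict`) and parameters (`n_members`, `subsample_size`, `k`) are hypothetical illustrations, not the authors' code.

```python
import random
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_predict(train, query, k=1):
    """Brute-force k-NN; `train` is a list of (features, label) pairs."""
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def build_subsample_ensemble(data, n_members, subsample_size, seed=0):
    """Draw `n_members` independent random subsamples (assumption: each
    drawn without replacement, capped at the dataset size)."""
    rng = random.Random(seed)
    size = min(subsample_size, len(data))
    return [rng.sample(data, size) for _ in range(n_members)]


def ensemble_predict(members, query, k=1):
    """Each member makes a local decision; only those labels are
    aggregated, here by majority voting (an assumed combination rule)."""
    local_decisions = [knn_predict(member, query, k) for member in members]
    return Counter(local_decisions).most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical toy data: two well-separated 2-D clusters, 10,000 points.
    data = [((rng.gauss(0, 1), rng.gauss(0, 1)), "a") for _ in range(5000)]
    data += [((rng.gauss(6, 1), rng.gauss(6, 1)), "b") for _ in range(5000)]
    # 15 members x 50 points = 750 stored points instead of 10,000.
    members = build_subsample_ensemble(data, n_members=15, subsample_size=50)
    print(ensemble_predict(members, (0.2, -0.1)))
    print(ensemble_predict(members, (5.8, 6.3)))
```

Each member returns only a class label, so members could in principle be evaluated on separate nodes with only their votes sent back. This matches the abstract's point that only local decisions, not datasets, need to be aggregated. With a brute-force search, one query against the full dataset costs on the order of N·d distance operations, versus m·s·d for m members of size s. The saving therefore holds whenever m·s is much smaller than N.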
