Journal: International Journal of Applied Engineering Research

Cluster Sampling to Improve Classifier Accuracy for Categorical Data

Abstract

Clustering is one of the essential techniques for grouping similar data. Improving model accuracy remains a challenge for every variety of data. Training and testing a classifier on the entire dataset is not feasible for large-scale data, so sampling is necessary for any modeling task and is an important aspect of data mining. Models are typically trained and tested on different samples drawn by traditional techniques, as in the random forest ensemble method. Samples drawn by these usual methods cannot cover every variety of data present in the original. In this paper, we propose cluster sampling, which improves classifier accuracy more than the other sampling methods we compare against. Cluster sampling is a two-step approach: first, it clusters the entire dataset; second, it selects samples from each cluster. The resulting samples contain every variety of data in equal proportion. Cluster sampling leverages a tree-based ensemble to handle categorical, numerical, and mixed-type data. Classifiers trained on cluster-sampled data showed higher accuracy than those trained on samples from other sampling techniques.
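The two-step approach described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the tree-based ensemble used for clustering, so this sketch substitutes a tiny k-modes routine with Hamming distance as a stand-in for categorical data. The function names (`k_modes`, `cluster_sample`), the `fraction` parameter, and the toy records are all assumptions made for illustration, not the authors' implementation.

```python
import random
from collections import Counter, defaultdict


def hamming(a, b):
    """Count the positions at which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(records, k, n_iter=10, seed=0):
    """Step 1 (assumed stand-in): cluster categorical records with k-modes."""
    rng = random.Random(seed)
    modes = rng.sample(records, k)  # initial modes picked from the data
    labels = [0] * len(records)
    for _ in range(n_iter):
        # Assign every record to its nearest mode by Hamming distance.
        labels = [
            min(range(k), key=lambda c: hamming(r, modes[c])) for r in records
        ]
        # Recompute each mode as the most common value in every column.
        for c in range(k):
            members = [r for r, lab in zip(records, labels) if lab == c]
            if members:
                modes[c] = tuple(
                    Counter(col).most_common(1)[0][0] for col in zip(*members)
                )
    return labels


def cluster_sample(labels, fraction, seed=0):
    """Step 2: draw the same fraction from every cluster (at least one each)."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for idx, label in enumerate(labels):
        by_cluster[label].append(idx)
    chosen = []
    for label in sorted(by_cluster):
        members = by_cluster[label]
        n = max(1, round(fraction * len(members)))
        chosen.extend(rng.sample(members, n))
    return sorted(chosen)


# Hypothetical toy dataset: (color, size, shape) records.
records = [
    ("red", "small", "round"), ("red", "small", "oval"),
    ("red", "medium", "round"), ("blue", "large", "square"),
    ("blue", "large", "round"), ("blue", "medium", "square"),
    ("green", "small", "square"), ("green", "small", "round"),
]
labels = k_modes(records, k=3)
sample = cluster_sample(labels, fraction=0.5)
print([records[i] for i in sample])
```

Because every non-empty cluster contributes at least one record, the sample covers each group found in step 1, which is the property the abstract attributes to cluster sampling. A classifier would then be trained on the sampled records in place of a plain random sample.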
