Cluster Sampling to Improve Classifier Accuracy for Categorical Data

Lakshmi Sreenivasa Reddy D


Abstract

Clustering is an essential technique for grouping similar data. Improving model accuracy remains a challenge across all varieties of data. Training and testing a classifier on the entire dataset is not feasible for large-scale data, so sampling is necessary for any modeling and is an important aspect of data mining. Models are typically trained and tested on different samples drawn by traditional techniques, as in the random forest ensemble method. In this paper, we propose cluster sampling, which is superior to other sampling methods in improving classifier accuracy. Samples drawn by the usual methods cannot cover every variety of data in the original dataset. Cluster sampling is a two-step approach: first it clusters the entire dataset, and second it selects samples from each cluster. These samples contain every variety of data in equal proportion. Cluster sampling leverages a tree-based ensemble to handle categorical, numerical, and mixed-type data. Classifiers trained on cluster-sampled data showed higher accuracy than those trained on samples drawn by other techniques.
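The two-step approach described in the abstract can be sketched as follows. This is only an illustrative sketch, not the paper's implementation: the abstract says the method leverages a tree-based ensemble, but gives no further detail, so the code substitutes a small k-modes-style loop with Hamming distance as the clusterer. The function names (`k_modes`, `cluster_sample`), the `fraction` parameter, and the reading of "equal proportion" as "the same fraction from every cluster" are all assumptions made for this example.

```python
import random
from collections import Counter, defaultdict


def hamming(a, b):
    """Count the attribute positions where two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(records, k, n_iter=10, seed=0):
    """Step 1 (assumed clusterer): a tiny k-modes loop over categorical tuples."""
    rng = random.Random(seed)
    modes = rng.sample(records, k)  # initial modes: k randomly chosen records
    labels = [0] * len(records)
    for _ in range(n_iter):
        # Assign each record to its nearest mode under Hamming distance.
        labels = [min(range(k), key=lambda c: hamming(r, modes[c]))
                  for r in records]
        # Recompute each mode as the most common value per attribute.
        for c in range(k):
            members = [r for r, lab in zip(records, labels) if lab == c]
            if members:  # keep the old mode if a cluster ends up empty
                modes[c] = tuple(Counter(col).most_common(1)[0][0]
                                 for col in zip(*members))
    return labels


def cluster_sample(labels, fraction, seed=0):
    """Step 2: draw the same fraction of record indices from every cluster."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_cluster[lab].append(idx)
    chosen = []
    for lab in sorted(by_cluster):
        members = by_cluster[lab]
        n = max(1, round(fraction * len(members)))  # at least one per cluster
        chosen.extend(rng.sample(members, n))
    return sorted(chosen)


# Hypothetical toy data: (colour, size) records.
data = [("red", "small"), ("red", "small"), ("red", "medium"),
        ("blue", "large"), ("blue", "large"), ("blue", "medium")] * 5
labels = k_modes(data, k=2)
sample_idx = cluster_sample(labels, fraction=0.2)
training_sample = [data[i] for i in sample_idx]
```

Because every non-empty cluster contributes at least one record, the resulting sample represents each group found in step 1, which is the coverage property the abstract attributes to cluster sampling. A classifier would then be trained on `training_sample` rather than on a plain random draw.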


Bibliographic details

Source
International Journal of Applied Engineering Research | 2019, Issue 13 | 8 pages
Author
Lakshmi Sreenivasa Reddy D
Affiliation

Department of Information Technology, Chaitanya Bharathi Institute of Technology, Gandipet

Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Basic engineering sciences
Keywords
Clustering; Categorical data; Numerical data; Random forest; Classifier; Sampling;


