
Large-Scale Instance Selection using Center of Principal Components



Abstract

One-hot encoding is one of the most popular data-preprocessing methods for converting categorical data into numerical data before training a classification model. Its drawbacks are high memory consumption and large storage requirements when the training set has many categorical columns. To overcome these drawbacks, we propose the PC-CS method, which reduces the training-set size by selecting only the center of the principal components as the representative instance of each disjoint partition. The proposed method was compared with two instance selection methods and with the full training model as a baseline. We used five datasets from the UCI dataset repository. Classification performance was evaluated with four classifier algorithms: decision tree, naive Bayes, support vector machine, and logistic regression. The average classification performance of the proposed PC-CS method was approximately 2% higher than those of the other methods tested, while its reduction rate was about 5% higher.
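The core idea described in the abstract, selecting, for each disjoint partition of the training set, a single representative instance near the center in principal-component space, can be sketched as follows. This is only an illustration of that idea, not the paper's implementation: the partitioning rule (here, by class label) and the selection rule (the instance nearest the partition mean in PC space) are assumptions, and `pc_center_select` is a hypothetical helper name.

```python
import numpy as np

def pc_center_select(X, y, n_components=2):
    """Pick one representative instance per class partition: the instance
    closest to the partition center in principal-component space.

    Sketch only: the paper's actual PC-CS partitioning and selection
    rules may differ from the assumptions made here.
    """
    reps = []
    for label in np.unique(y):
        part = X[y == label]              # one disjoint partition (assumed: by class)
        centered = part - part.mean(axis=0)
        # Principal axes of the partition via SVD of the centered data
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        proj = centered @ vt[:n_components].T
        # The partition mean projects to the origin in PC space, so the
        # "center" representative is the instance with the smallest norm there
        idx = np.argmin(np.linalg.norm(proj, axis=1))
        reps.append(part[idx])
    return np.array(reps)
```

Under these assumptions, a training set of n instances shrinks to one instance per partition, which is where the reduction rate reported in the abstract would come from.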


