Instance Ranking Using Data Complexity Measures for Training Set Selection

Abstract

A classifier's performance depends on the training set it is given, so training set selection holds an important place in the classification task: it can both improve the classifier's performance and reduce the time taken for training. Selection can be done in various ways, including algorithmic methods, data-handling techniques, cost-sensitive methods, and ensembles. In this work, one of the data complexity measures, the maximum Fisher's discriminant ratio (F1), is used to identify good training instances. For a given feature, this measure discriminates between two classes by comparing the class means and variances, and so it quantifies the overlap between the classes. In the first phase, F1 is calculated for the whole data set. Next, F1 is recomputed using a leave-one-out method to rank each instance. Finally, the instances that lower the F1 value are all removed from the data set as a batch. A small F1 value represents strong overlap between the classes, so removing the instances that cause more overlap reduces the overlap further. The efficacy of the proposed reduction algorithm (DRF1) is demonstrated empirically using 4 classifiers (Random Forest, Decision Tree-C5.0, SVM and kNN) and 6 data sets (Pima, Musk, Sonar, Winequality (R and W) and Wisconsin). The results confirm that selecting the training set with this data complexity measure, as DRF1 does, leads to a promising improvement in kappa statistics and classification accuracy. A reduction of approximately 18-50% is achieved, and training time is also greatly reduced.
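
The three phases described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a two-class problem with at least two instances per class, uses the common per-feature definition (difference of class means squared, divided by the sum of class variances, maximized over features), uses population variance, and reads "instances that lower the F1 value" as instances whose removal raises F1 above the full-data value. The function names `fisher_f1` and `drf1_select` are hypothetical.

```python
import numpy as np


def fisher_f1(X, y):
    """Maximum Fisher's discriminant ratio over all features (two classes assumed)."""
    classes = np.unique(y)
    a, b = X[y == classes[0]], X[y == classes[1]]
    num = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    den = a.var(axis=0) + b.var(axis=0)  # population variance (an assumption)
    # Features with zero total variance get ratio 0 instead of dividing by zero.
    ratios = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return ratios.max()


def drf1_select(X, y):
    """Rank instances by leave-one-out F1 and flag those to remove as a batch.

    Returns (keep, ranking, base):
      keep    - boolean mask, False for instances whose presence lowers F1
      ranking - instance indices sorted from most to least harmful
      base    - F1 of the whole data set
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    base = fisher_f1(X, y)  # phase 1: F1 of the whole data set

    n = len(y)
    loo = np.empty(n)
    mask = np.ones(n, dtype=bool)
    for j in range(n):  # phase 2: F1 with instance j left out
        mask[j] = False
        loo[j] = fisher_f1(X[mask], y[mask])
        mask[j] = True

    gain = loo - base  # positive gain: leaving j out raises F1
    ranking = np.argsort(-gain)
    keep = gain <= 0  # phase 3: drop every instance with positive gain
    return keep, ranking, base
```

A reduced training set would then be `X[keep], y[keep]`, which can be passed to any classifier. The leave-one-out loop recomputes F1 from scratch for each instance, which is simple but costs one full pass over the data per instance.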

Bibliographic details

Source: International Conference on Pattern Recognition and Machine Intelligence, 2019, p. 637, 10 pages
Authors: Junaid Alam; T. Sobha Rani
Original format: PDF
Chinese Library Classification: TP391.4-53
Keywords: Maximum Fisher's discriminant ratio; Classification; Batch removal; Kappa statistics; Instance ranking
