

Hidden dependencies between class imbalance and difficulty of learning for bioinformatics datasets



Abstract

Many bioinformatics datasets share certain problems: they have class imbalance (one class with many more instances than the remaining class(es)), or are difficult to learn from (build accurate models with). Much research has investigated these two problems, or even considered both at once. However, hidden dependencies can exist between these two problems: in a given collection of datasets, the highly imbalanced datasets may be particularly difficult or easy to learn from, and so conclusions based on the level of class imbalance may actually reflect the difficulty of learning. We present a case study with twenty-six bioinformatics datasets which exhibits this dependency, and highlights how it can result in misleading conclusions regarding the absolute and relative performance of learners and feature rankers across balance levels.
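One way to check a collection of datasets for the kind of hidden dependency the abstract describes is to rank-correlate each dataset's balance level with a measure of how hard it is to learn from. The sketch below is illustrative only and is not the authors' procedure: the minority-class percentages and baseline AUC values are invented, and the names `minority_pct`, `baseline_auc`, `average_ranks`, and `spearman` are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-dataset values, invented for illustration (not from the paper).
minority_pct = [5.0, 8.0, 12.0, 20.0, 35.0, 48.0]      # lower = more imbalanced
baseline_auc = [0.95, 0.90, 0.85, 0.70, 0.65, 0.60]    # higher = easier to learn

rho = spearman(minority_pct, baseline_auc)
print(f"Spearman correlation between balance and ease of learning: {rho:.2f}")
```

A correlation far from zero in either direction would suggest that, in that particular collection, grouping datasets by balance level also groups them by difficulty, so differences observed across balance levels cannot be attributed to imbalance alone.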


Bibliographic details

Source
IEEE International Conference on Information Reuse and Integration, 2013, pp. 232-238 (7 pages)
Authors
Wald Randall; Khoshgoftaar Taghi M.; Fazelpour Alireza; Dittman David J.
Original format: PDF
Keywords
Class Imbalance; Classification; Cross-Validation; DNA Microarray; Difficulty-of-Learning


