Foreign-language conference: Artificial intelligence in medicine

The Role of Biomedical Dataset in Classification

Abstract

In this paper, we investigate the role of a biomedical dataset in the classification accuracy of an algorithm. We quantify the complexity of a biomedical dataset using five complexity measures: correlation-based feature selection subset merit, noise, imbalance ratio, missing values and information gain. The effect of these complexity measures on classification accuracy is evaluated using five diverse machine learning algorithms: J48 (decision tree), SMO (support vector machines), Naive Bayes (probabilistic), IBk (instance-based learner) and JRIP (rule-based induction). The results of our experiments show that noise and correlation-based feature selection subset merit, rather than the particular choice of algorithm, play a major role in determining classification accuracy. Finally, we provide researchers with a meta-model and an empirical equation to estimate the classification potential of a dataset on the basis of its complexity. This will help researchers to efficiently pre-process datasets for automatic knowledge extraction.
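
As a rough illustration of three of the complexity measures named above (imbalance ratio, missing values and information gain), the following Python sketch computes them for a small categorical dataset. The abstract does not give the paper's exact definitions, so the formulas here are common textbook versions and should be read as assumptions: imbalance ratio as majority-class count over minority-class count, missing values as the fraction of empty cells, and information gain as the reduction in label entropy after splitting on one feature. The function names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def imbalance_ratio(labels):
    """Assumed definition: majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def missing_fraction(rows):
    """Assumed definition: share of cells that are None across all rows."""
    cells = [value for row in rows for value in row]
    return sum(value is None for value in cells) / len(cells)


def information_gain(feature, labels):
    """Label entropy minus the weighted entropy within each feature value."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    labels = ["sick", "sick", "sick", "healthy"]
    rows = [["high", 1], ["high", None], ["low", 3], ["low", 4]]
    feature = [row[0] for row in rows]
    print("imbalance ratio:", imbalance_ratio(labels))
    print("missing fraction:", missing_fraction(rows))
    print("information gain:", information_gain(feature, labels))
```

Measures like these could serve as inputs to a meta-model of the kind the abstract mentions, though the actual meta-model and empirical equation are not reproduced on this page.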