Conference: IEEE International Symposium on Biomedical Imaging

PREDICTING CLASSIFIER PERFORMANCE WITH A SMALL TRAINING SET: APPLICATIONS TO COMPUTER-AIDED DIAGNOSIS AND PROGNOSIS


Abstract

Selection of an appropriate classifier for computer-aided diagnosis (CAD) applications has typically been an ad hoc process. It is difficult to know a priori which classifier will yield high accuracy for a specific application, especially when well-annotated data for classifier training are scarce. In this study, we use an inverse power-law model of statistical learning to predict classifier performance when only a limited amount of annotated training data is available. The objectives of this study are to (a) predict classifier error for different CAD problems when larger data cohorts become available, and (b) compare classifier performance and trends, both at the sample/patient level and at the pixel level, as additional data are accrued (such as in a clinical trial). We use the power-law model to evaluate and compare three classifiers (Support Vector Machine (SVM), C4.5 decision tree, and k-nearest neighbor) on four distinct CAD problems. The first two datasets involve sample/patient-level classification: distinguishing (1) high-grade from low-grade breast cancers and (2) high from low levels of lymphocytic infiltration in breast cancer specimens. The other two datasets are pixel-level classification problems: discriminating cancerous from non-cancerous regions on prostate (3) MRI and (4) histopathology. Our empirical results suggest that, given sufficient training data, SVMs tend to be the best classifiers. This was true for datasets (1), (2), and (3), while the C4.5 decision tree was the best classifier for dataset (4). Our results also suggest that classifier comparisons made on small data cohorts should not be assumed to hold when large amounts of data become available.
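The extrapolation idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact functional form, so this example assumes the commonly used learning-curve model err(n) ≈ a·n^(−α) + b, where n is the training set size, b is the asymptotic error, and a and α control the decay. The fitting routine (a grid search over b combined with a log-log least-squares fit), the function names, and the synthetic error values are all illustrative assumptions, not the authors' procedure or data.

```python
import math

def fit_inverse_power_law(sizes, errors, b_steps=200):
    """Fit err(n) ~ a * n**(-alpha) + b (assumed model form).

    For each candidate asymptote b in [0, min(errors)), fit
    log(err - b) = log(a) - alpha * log(n) by ordinary least squares,
    and keep the candidate with the smallest squared error in the
    original (non-log) space. Requires all errors to be positive.
    """
    xs = [math.log(n) for n in sizes]
    mean_x = sum(xs) / len(xs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b_max = min(errors)
    best = None
    for i in range(b_steps):
        b = b_max * i / b_steps  # strictly below min(errors), so err - b > 0
        ys = [math.log(e - b) for e in errors]
        mean_y = sum(ys) / len(ys)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        slope = sxy / sxx
        a = math.exp(mean_y - slope * mean_x)
        alpha = -slope
        sse = sum((a * n ** (-alpha) + b - e) ** 2
                  for n, e in zip(sizes, errors))
        if best is None or sse < best[0]:
            best = (sse, a, alpha, b)
    _, a, alpha, b = best
    return a, alpha, b

def predict_error(n, a, alpha, b):
    """Extrapolate the fitted curve to a training set of size n."""
    return a * n ** (-alpha) + b

# Hypothetical error rates measured on small training subsets.
# These values are synthetic (generated from a=0.9, alpha=0.5, b=0.1),
# not results from the paper.
sizes = [10, 20, 40, 80, 160]
errors = [0.9 * n ** -0.5 + 0.1 for n in sizes]

a, alpha, b = fit_inverse_power_law(sizes, errors)
predicted_at_1000 = predict_error(1000, a, alpha, b)
print(f"alpha={alpha:.3f}, b={b:.3f}, "
      f"predicted error at n=1000: {predicted_at_1000:.3f}")
```

Fitting one such curve per classifier and comparing the extrapolated errors at a large n is one way to check whether a ranking observed on a small cohort is likely to persist, which is the caution raised in the final sentence of the abstract.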
