Source: Image Perception, Observer Performance, and Technology Assessment; Progress in Biomedical Optics and Imaging; vol. 7, no. 32

The Effect of Data Set Size on Computer-Aided Diagnosis of Breast Cancer: Comparing Decision Fusion to a Linear Discriminant


Abstract

Data sets with relatively few observations (cases) are common in medical research, especially when the data are expensive or difficult to collect. Such small samples usually do not provide enough information for computer models to learn data patterns well enough to predict and generalize reliably. We used decision fusion as a model that may maintain good classification performance when data are limited. In this study, we investigated the effect of sample size on the generalization ability of both linear discriminant analysis (LDA) and decision fusion. Subsets of large data sets were drawn by bootstrap sampling, which allowed us to estimate the mean and standard deviation of classification performance as a function of data set size. We applied the models to two breast cancer data sets and compared them using receiver operating characteristic (ROC) analysis, reporting the area under the ROC curve (AUC) and the partial AUC (pAUC). For the more challenging calcification data set, decision fusion reached its maximum classification performance of AUC = 0.80±0.04 at 50 samples and pAUC = 0.34±0.05 at 100 samples. LDA reached lower performance and required many more cases, with a maximum of AUC = 0.68±0.04 and pAUC = 0.12±0.05 at 450 samples. For the mass data set, the two classifiers performed more similarly, with AUC = 0.92±0.02 and pAUC = 0.48±0.02 at 50 samples for decision fusion and AUC = 0.92±0.03 and pAUC = 0.55±0.04 at 500 samples for LDA.
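
The bootstrap procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how each bootstrap subset was evaluated, so the sketch assumes that cases left out of a draw (out-of-bag cases) serve as the test set. It also uses synthetic Gaussian features in place of the breast cancer data and covers only a two-class Fisher LDA, not decision fusion or pAUC. The function names (`auc`, `fit_lda`, `learning_curve`, `make_synthetic`) and all parameter values are hypothetical.

```python
import numpy as np


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]  # every positive/negative score pair
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))


def fit_lda(X, y, ridge=1e-6):
    """Two-class Fisher discriminant direction w = Sw^{-1} (mu1 - mu0)."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance; the small ridge term (an assumption)
    # keeps the matrix invertible when the training subset is tiny.
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    Sw = Sw / max(len(X) - 2, 1) + ridge * np.eye(X.shape[1])
    return np.linalg.solve(Sw, mu1 - mu0)


def learning_curve(X, y, sizes, n_boot=50, seed=0):
    """Mean and standard deviation of out-of-bag AUC for each training size."""
    rng = np.random.default_rng(seed)
    n = len(y)
    results = {}
    for size in sizes:
        aucs = []
        for _ in range(n_boot):
            train = rng.choice(n, size=size, replace=True)  # bootstrap draw
            test = np.setdiff1d(np.arange(n), train)        # out-of-bag cases
            # Skip degenerate draws that lack one of the two classes.
            if len(np.unique(y[train])) < 2 or len(np.unique(y[test])) < 2:
                continue
            w = fit_lda(X[train], y[train])
            aucs.append(auc(X[test] @ w, y[test]))
        results[size] = (float(np.mean(aucs)), float(np.std(aucs)))
    return results


def make_synthetic(n=600, d=5, shift=0.5, seed=1):
    """Synthetic two-class Gaussian data (a stand-in, not the study's data)."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, d)) + shift * y[:, None]
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    for size, (mean, std) in learning_curve(X, y, sizes=[20, 50, 200]).items():
        print(f"n = {size:3d}: AUC = {mean:.2f} ± {std:.2f}")
```

Plotting the mean with error bars of one standard deviation against the training size gives a curve of the kind summarized in the abstract. Running the same loop for a second classifier would allow the comparison of how many cases each model needs to reach its plateau.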
