Source: Conference on Medical Imaging: Image Perception, Observer Performance, and Technology Assessment

Supplementing training with data from a shifted distribution for machine learning classifiers: Adding more cases may not always help

Abstract

In this study, we show that when a training data set is supplemented by drawing samples from a distribution that is different from that of the target population, the differences in the distributions of the original and supplemental training populations should be considered to maximize the performance of the classifier in the target population. Depending on these distributions, drawing a large number of cases from the supplemental distribution may result in lower performance compared to limiting the number of added cases. This is relevant for medical images when synthetic data is used for training a machine learning algorithm, which may result in a mixed distribution for the training set. We simulated a two-class classification problem and determined the performance of a linear classifier and a neural network classifier on test cases when trained with cases from only the target distribution, and when cases from a shifted, supplemental distribution are added to a limited number of cases from the target distribution. We show that adding data from a supplemental distribution for machine learning classifier training may improve the performance on the target test distribution. However, given the same number of training cases from a mixed distribution, the performance may not reach the performance of only training on data from the target distribution. In addition, the increase in performance will peak or plateau, depending on the shift in the distribution and the number of cases from the supplemental distribution.
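
The kind of simulation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' setup: the abstract gives no simulation parameters, so the Gaussian class model, the dimensionality, the class separation `sep`, the way the supplemental distribution is shifted (its class-mean difference is rotated by `angle_deg`), the choice of Fisher's linear discriminant as the linear classifier, and the use of AUC as the performance measure are all hypothetical. The function names `sample`, `fit_lda`, `auc`, and `auc_vs_supplement` are introduced here for illustration only.

```python
import numpy as np


def sample(rng, n_per_class, delta):
    """Draw n_per_class cases per class from unit-covariance Gaussians
    whose class means are -delta/2 (class 0) and +delta/2 (class 1)."""
    dim = delta.size
    x0 = rng.normal(size=(n_per_class, dim)) - delta / 2
    x1 = rng.normal(size=(n_per_class, dim)) + delta / 2
    X = np.vstack([x0, x1])
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    return X, y


def fit_lda(X, y, ridge=1e-6):
    """Fisher linear discriminant: w = Sw^{-1} (m1 - m0).
    A small ridge term keeps the pooled covariance invertible."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    Sw = Sw / (len(X) - 2) + ridge * np.eye(X.shape[1])
    return np.linalg.solve(Sw, m1 - m0)


def auc(scores, y):
    """Area under the ROC curve via the Mann-Whitney statistic."""
    pos, neg = scores[y == 1], scores[y == 0]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))


def auc_vs_supplement(n_target=10, n_supp_list=(0, 20, 100, 1000), dim=10,
                      sep=1.5, angle_deg=40.0, n_test=1000, n_repeats=20,
                      seed=0):
    """Mean test AUC on the target distribution when n_target target cases
    per class are supplemented with n_supp shifted cases per class."""
    rng = np.random.default_rng(seed)
    theta = np.deg2rad(angle_deg)
    # Assumed target distribution: classes separated along feature 0.
    delta_target = np.zeros(dim)
    delta_target[0] = sep
    # Assumed shift: same separation, rotated toward feature 1.
    delta_supp = np.zeros(dim)
    delta_supp[0] = sep * np.cos(theta)
    delta_supp[1] = sep * np.sin(theta)

    results = {}
    for n_supp in n_supp_list:
        aucs = []
        for _ in range(n_repeats):
            Xt, yt = sample(rng, n_target, delta_target)
            Xs, ys = sample(rng, n_supp, delta_supp)
            w = fit_lda(np.vstack([Xt, Xs]), np.concatenate([yt, ys]))
            # Performance is always measured on the target distribution.
            Xtest, ytest = sample(rng, n_test, delta_target)
            aucs.append(auc(Xtest @ w, ytest))
        results[n_supp] = float(np.mean(aucs))
    return results


if __name__ == "__main__":
    for n_supp, value in auc_vs_supplement().items():
        print(f"supplemental cases per class: {n_supp:5d}  mean AUC: {value:.3f}")
```

Running the script prints the mean target-distribution AUC for each supplemental sample size. Varying `angle_deg` and `n_target` gives a rough way to explore how the size of the shift and the number of available target cases interact; any trend seen in this toy model should not be read as a reproduction of the results reported in the paper.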
