Supplementing training with data from a shifted distribution for machine learning classifiers: Adding more cases may not always help

Abstract

In this study, we show that when a training data set is supplemented with samples drawn from a distribution that differs from that of the target population, the differences between the distributions of the original and supplemental training populations should be considered in order to maximize the performance of the classifier in the target population. Depending on these distributions, drawing a large number of cases from the supplemental distribution may result in lower performance than limiting the number of added cases. This is relevant for medical imaging when synthetic data are used to train a machine learning algorithm, which may result in a mixed distribution for the training set. We simulated a two-class classification problem and determined the performance of a linear classifier and a neural network classifier on test cases when trained with cases from only the target distribution, and when cases from a shifted, supplemental distribution were added to a limited number of cases from the target distribution. We show that adding data from a supplemental distribution for machine learning classifier training may improve the performance on the target test distribution. However, for the same total number of training cases, training on a mixed distribution may not reach the performance of training only on data from the target distribution. In addition, the increase in performance will peak or plateau, depending on the shift in the distribution and the number of cases from the supplemental distribution.
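
The kind of simulation the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' setup: the abstract does not state the distributions, dimensionality, classifier details, or performance metric, so the Gaussian classes, the constants `DIM` and `SEPARATION`, the form of the shift, the Fisher linear discriminant, and the use of AUC are all assumptions made here for illustration. The neural network classifier is not covered.

```python
import numpy as np

DIM = 10          # number of features (assumed, not from the paper)
SEPARATION = 1.5  # distance between class means in the target distribution (assumed)


def draw_cases(n_per_class, shift, rng):
    """Return (X, y) with n_per_class cases per class.

    Class 0 ~ N(0, I). Class 1 ~ N(mean1, I), where mean1 separates the
    classes along feature 0 and `shift` displaces it along feature 1.
    shift=0.0 gives the target distribution (hypothetical parameterization).
    """
    mean1 = np.zeros(DIM)
    mean1[0] = SEPARATION
    mean1[1] = shift
    x0 = rng.standard_normal((n_per_class, DIM))
    x1 = rng.standard_normal((n_per_class, DIM)) + mean1
    X = np.vstack([x0, x1])
    y = np.repeat([0, 1], n_per_class)
    return X, y


def fit_linear(X, y, ridge=1e-3):
    """Fisher linear discriminant: w = S_w^{-1} (m1 - m0), with a small ridge term."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    centered = np.vstack([X0 - m0, X1 - m1])
    s_w = centered.T @ centered / max(len(X) - 2, 1)
    s_w += ridge * np.eye(X.shape[1])  # keeps S_w invertible for tiny samples
    return np.linalg.solve(s_w, m1 - m0)


def auc(scores, y):
    """Area under the ROC curve via the Mann-Whitney U statistic (assumes no ties)."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n1 = int(y.sum())
    n0 = len(y) - n1
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n0 * n1)


def run(n_target, n_supplemental, shift, seed=0, n_test=5000):
    """Train on target cases plus optional shifted cases; test on the target distribution."""
    rng = np.random.default_rng(seed)
    X, y = draw_cases(n_target, 0.0, rng)
    if n_supplemental > 0:
        Xs, ys = draw_cases(n_supplemental, shift, rng)
        X, y = np.vstack([X, Xs]), np.concatenate([y, ys])
    X_test, y_test = draw_cases(n_test, 0.0, rng)
    w = fit_linear(X, y)
    return auc(X_test @ w, y_test)


if __name__ == "__main__":
    # Sweep the number of supplemental cases per class for a fixed, small target set.
    for n_sup in (0, 10, 50, 200, 1000):
        print(n_sup, round(run(n_target=10, n_supplemental=n_sup, shift=1.0), 3))
```

Sweeping `n_supplemental` and `shift` over several seeds, and comparing against `run` with the same total number of cases drawn only from the target distribution, is one way to explore the peak-or-plateau behavior the abstract reports.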

Bibliographic details

Source
Conference on Medical Imaging: Image Perception, Observer Performance, and Technology Assessment | 2020 | 113160S.1-113160S.6 | 6 pages
Authors
Kenny H. Cha; Alexej Gossmann; Nicholas Petrick; Berkman Sahiner
Original format
PDF
Keywords
Distribution shift; simulation study; classifier; neural network; machine learning
