The Effect of Training and Testing Process on Machine Learning in Biomedical Datasets

Ucar Muhammed Kursad; Nour Majid; Sindi Hatem; Polat Kemal

Abstract

The training and testing process is central to classifying biomedical datasets with machine learning, and the researcher should choose the method used at each step with care. However, very few studies address these method choices, and those in the literature are generally theoretical. In addition, there is no practical model for how to select samples during training and testing. Resources are therefore needed that discuss the training and testing process in detail and offer new recommendations. This article provides such an analysis and is organized as follows. The third section describes how to prepare the datasets; four balanced datasets were used in the application. The fourth section describes the training/test ratio and how to select samples at the training and testing stage. Sampling theorems, a topic from statistics, describe how to select samples, and this article proposes applying sampling methods to the training and testing process in machine learning. The fourth section also gives the theoretical formulation of four different sampling theorems, whose performance is reported in the results section. The fifth section describes the methods for selecting features prior to training and testing. In the study, three different classifiers are used to check performance. The results section describes how the results should be analyzed, and the article also proposes performance evaluation methods for assessing them. According to the results, the dataset, the feature selection algorithm, the classifier, and the training/test ratio are all criteria that directly affect performance. Beyond these, the method used to select samples at the training and testing stages is vital for the system to work correctly. To design a stable system, it is recommended that samples be selected with the stratified systematic sampling theorem.
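
The recommendation above can be illustrated with a minimal sketch of one common reading of stratified systematic sampling: group the samples by class, then send every k-th sample within each class to the test set. This is not the authors' implementation. The function name `stratified_systematic_split`, the `test_ratio` and `start` parameters, and the choice of k as the rounded inverse of the test ratio are all assumptions made for illustration.

```python
from collections import defaultdict


def stratified_systematic_split(labels, test_ratio=0.2, start=0):
    """Split sample indices into train/test sets (illustrative sketch).

    Within each class (stratum), every k-th sample goes to the test set,
    where k = round(1 / test_ratio). Names and defaults are hypothetical.
    """
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be strictly between 0 and 1")
    step = round(1 / test_ratio)  # sampling interval k

    # Stratify: collect the indices of each class in their original order.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        # Systematic selection: fixed interval, fixed starting offset.
        for pos, idx in enumerate(indices):
            if pos % step == start % step:
                test_idx.append(idx)
            else:
                train_idx.append(idx)
    return sorted(train_idx), sorted(test_idx)


# Toy example: a balanced two-class label vector with 10 samples per class.
labels = [0] * 10 + [1] * 10
train_idx, test_idx = stratified_systematic_split(labels, test_ratio=0.2)
print(test_idx)
```

Because the selection is deterministic and applied per class, the class proportions of the test set follow those of the full dataset, and repeated runs give the same split. Varying `start` yields different, non-overlapping test sets.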

