Indonesian Journal of Computing and Cybernetics Systems

Dataset Splitting Techniques Comparison For Face Classification on CCTV Images




The performance of classification models in machine learning is influenced by many factors, one of which is the dataset splitting method. To avoid overfitting, it is important to apply a suitable dataset splitting strategy. This study presents a comparison of four dataset splitting techniques: Random Sub-sampling Validation (RSV), k-Fold Cross Validation (k-FCV), Bootstrap Validation (BV) and Moralis Lima Martin Validation (MLMV). The comparison is carried out for face classification on CCTV images using a Convolutional Neural Network (CNN) and a Support Vector Machine (SVM), on two image datasets. The techniques are evaluated by model accuracy on the training, validation and test sets, and by the bias and variance of the model. The experiments show that k-FCV has more stable performance, providing high accuracy on the training set as well as good generalization on the validation and test sets. In contrast, data splitting with MLMV performs worse than the other three techniques: it yields lower accuracy, shows higher bias and variance, and produces overfitted models, especially when it is applied to the validation set.
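As a rough illustration of how three of the four splitting techniques differ, the sketch below generates train/validation index sets for RSV, k-FCV and BV using only the Python standard library. It is not the authors' implementation: the function names, the 20% hold-out fraction, the number of repeats and the seeds are all assumptions made for the example, and MLMV is left out because the abstract does not describe its procedure.

```python
import random

def random_subsampling(n, test_frac=0.2, repeats=5, seed=0):
    """RSV: repeat an independent random hold-out split `repeats` times."""
    rng = random.Random(seed)
    n_test = max(1, int(round(n * test_frac)))
    splits = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        # The first n_test shuffled indices are held out, the rest are for training.
        splits.append((sorted(idx[n_test:]), sorted(idx[:n_test])))
    return splits

def k_fold(n, k=5, seed=0):
    """k-FCV: partition the data into k folds; each fold is held out exactly once."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized, disjoint folds
    splits = []
    for i in range(k):
        val = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        splits.append((train, val))
    return splits

def bootstrap(n, repeats=5, seed=0):
    """BV: draw n training indices with replacement; unsampled (out-of-bag) items validate."""
    rng = random.Random(seed)
    splits = []
    for _ in range(repeats):
        train = [rng.randrange(n) for _ in range(n)]
        chosen = set(train)
        oob = [i for i in range(n) if i not in chosen]
        splits.append((train, oob))
    return splits

if __name__ == "__main__":
    for name, splits in [("RSV", random_subsampling(10)),
                         ("k-FCV", k_fold(10)),
                         ("BV", bootstrap(10))]:
        train, val = splits[0]
        print(name, "train:", train, "validation:", val)
```

In k-FCV every sample is used for validation exactly once, whereas RSV and BV validation sets may overlap or miss samples across repeats. This structural difference may be one reason for the more stable behaviour the study reports for k-FCV, though the abstract itself does not give an explanation.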


