Statistical enhancement of support vector machines.

Abstract

Support Vector Machines (SVM) and Random Forests (RF) have consistently outperformed other machine learning algorithms on a variety of problems. SVM can be used for classification and regression on many types of data (e.g., nonlinear or high-dimensional data), but it cannot handle missing or mixed data. This research implements a permutation-based variable importance measure and a missing value imputation method for SVM, both founded on similar techniques developed for RF.

The results of the SVM variable importance measure are compared with RF results on simulated data sets with known variable importance. The variability of the importance outcomes is examined when different tuning parameter values (for SVM and RF) and kernels (for SVM) are used on benchmark data sets. Two of the benchmark data sets are also used to evaluate the missing value imputation method.

The variable importance measure developed in this study gives results comparable to RF on the simulated data sets. However, for the tuning parameter values investigated, its results on the same benchmark data sets are more variable and less consistent than those of RF. SVM often had a smaller test error than RF, indicating that SVM fit the benchmark data better. Unlike the RF results, the SVM variable importance results can be highly sensitive to the choice of tuning parameters. Successive grid searches are needed to tune these parameters and achieve more consistent SVM variable importance results.

This research compares a median-based missing value imputation method with a mean-based approach. The quality of the methods was evaluated by comparing the test set error (or test mean-square error) achieved after applying each method to two benchmark data sets. There is an improvement on a regression data set, but no significant difference in results for a classification example. Further investigation is needed to evaluate this imputation technique.

A variable importance measure for SVM provides insight into which explanatory variables are important in determining the response. SVM is known to perform better than other machine learning algorithms on some data sets. By developing such a measure, this research extends the capabilities of an important algorithm used for data mining.
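A minimal sketch of a permutation-based variable importance measure in the spirit of the RF one: each explanatory variable is shuffled in turn on held-out data, and the resulting increase in test error is recorded. The function is model-agnostic, so a fitted SVM's prediction function (for example, the `predict` method of a trained scikit-learn `SVC`) can be passed in. The names `error_rate` and `permutation_importance`, the use of a held-out test set instead of out-of-bag data, and the averaging over `n_repeats` shuffles are illustrative assumptions, not the thesis's exact procedure.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence


def error_rate(y_true: Sequence, y_pred: Sequence) -> float:
    """Misclassification rate; swap in mean-square error for regression."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[List[List[float]]], Sequence],
    X: Sequence[Sequence[float]],
    y: Sequence,
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean increase in test error when each column of X is shuffled in turn.

    `predict` is the prediction function of an already-fitted model
    (hypothetically, a trained SVM).
    """
    rng = random.Random(seed)
    rows = [list(r) for r in X]  # plain lists, so slicing and concatenation are safe
    baseline = error_rate(y, predict(rows))
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # breaks the association between variable j and y
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(error_rate(y, predict(permuted)) - baseline)
        importances.append(mean(increases))
    return importances


if __name__ == "__main__":
    # Toy data: only the first variable determines the class; the second is noise.
    X = [[float(i - 10), float(i % 3)] for i in range(20)]
    y = [1 if x0 > 0 else 0 for x0, _ in X]

    def toy_predict(rows):
        return [1 if r[0] > 0 else 0 for r in rows]

    # First score is positive; second is 0.0 because toy_predict ignores it.
    print(permutation_importance(toy_predict, X, y))
```

Because an SVM's fit depends on its kernel and tuning parameters, scores computed this way can shift when those parameters change, which is the sensitivity the abstract reports.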
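The abstract notes that successive grid searches are needed before SVM importance results become consistent. One common way to do this, sketched here under assumed defaults, is a coarse-to-fine search over the RBF-kernel parameters `C` and `gamma` on a log2 scale: each round evaluates a small grid, then zooms in around the best point found so far. `cv_error` is a hypothetical user-supplied function (for example, the cross-validated error of an SVM trained with the given `C` and `gamma`); the default ranges, grid size, and number of rounds are illustrative choices.

```python
import itertools
from typing import Callable, Tuple


def successive_grid_search(
    cv_error: Callable[[float, float], float],
    log2_c_range: Tuple[float, float] = (-5.0, 15.0),
    log2_gamma_range: Tuple[float, float] = (-15.0, 3.0),
    n_points: int = 5,   # grid points per axis per round (must be >= 2)
    n_rounds: int = 4,
) -> Tuple[float, float]:
    """Coarse-to-fine search for the (C, gamma) pair with the lowest cv_error."""
    (c_lo, c_hi), (g_lo, g_hi) = log2_c_range, log2_gamma_range
    best = None  # (error, log2_c, log2_gamma)
    for _ in range(n_rounds):
        c_step = (c_hi - c_lo) / (n_points - 1)
        g_step = (g_hi - g_lo) / (n_points - 1)
        for i, k in itertools.product(range(n_points), repeat=2):
            lc, lg = c_lo + i * c_step, g_lo + k * g_step
            err = cv_error(2.0 ** lc, 2.0 ** lg)
            if best is None or err < best[0]:
                best = (err, lc, lg)
        _, lc, lg = best
        # Next round: a finer grid spanning one old step on each side of the best point.
        c_lo, c_hi = lc - c_step, lc + c_step
        g_lo, g_hi = lg - g_step, lg + g_step
    return 2.0 ** best[1], 2.0 ** best[2]
```

Searching on the log2 scale lets early rounds cover several orders of magnitude cheaply, while later rounds refine the region that matters.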

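The simplest form of the median-based versus mean-based comparison fills each missing entry with the median or the mean of the observed values in its column. This is a sketch under that assumption; the thesis's imputation method builds on RF techniques and may be more elaborate. Missing values are represented as `None`, and the `center` parameter is a hypothetical hook for switching between `statistics.median` and `statistics.mean`.

```python
from statistics import mean, median
from typing import Callable, List, Optional, Sequence


def impute_columns(
    X: List[List[Optional[float]]],
    center: Callable[[Sequence[float]], float] = median,
) -> List[List[float]]:
    """Replace each missing value (None) with a summary of its column's observed values."""
    fills = []
    for j in range(len(X[0])):
        observed = [row[j] for row in X if row[j] is not None]
        fills.append(center(observed))
    return [[fills[j] if v is None else v for j, v in enumerate(row)] for row in X]


if __name__ == "__main__":
    X = [[1.0, 10.0], [2.0, None], [None, 30.0], [100.0, 40.0]]
    print(impute_columns(X))        # median-based fill
    print(impute_columns(X, mean))  # mean-based fill
```

In a comparison like the one described, each imputed training set would be used to fit the SVM and the resulting test error (or test mean-square error) compared. The median is less affected by outliers: in the first column above, with observed values 1, 2 and 100, the median fill is 2 while the mean fill is about 34.3.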
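Bibliographic record

  • Author

    Taylor, Aimee E.

  • Author affiliation

    Oregon State University

  • Degree-granting institution: Oregon State University
  • Subjects: Statistics; Computer Science; Operations Research
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 156
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Statistics; Operations Research; Automation Technology and Computer Technology
  • Date added: 2022-08-17 11:37:38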
