International Journal of Approximate Reasoning

Machine learning-based receiver operating characteristic (ROC) curves for crisp and fuzzy classification of DNA microarrays in cancer research

Abstract

Receiver operating characteristic (ROC) curves were generated to obtain classification area under the curve (AUC) as a function of feature standardization, fuzzification, and sample size from nine large sets of cancer-related DNA microarrays. Classifiers used included k-nearest neighbor (kNN), naive Bayes classifier (NBC), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), learning vector quantization (LVQ1), logistic regression (LOG), polytomous logistic regression (PLOG), artificial neural networks (ANN), particle swarm optimization (PSO), constricted particle swarm optimization (CPSO), kernel regression (RBF), radial basis function networks (RBFN), gradient descent support vector machines (SVMGD), and least squares support vector machines (SVMLS). For each data set, AUC was determined for a number of combinations of sample size and total sum[-log(p)] of feature t-tests, with and without feature standardization and with (fuzzy) and without (crisp) fuzzification of features. In total, 2,123,530 classification runs were made. At the greatest level of sample size, ANN resulted in a fitted AUC of 90%, while PSO resulted in the lowest fitted AUC of 72.1%. AUC values derived from 4NN were the most dependent on sample size, while those from PSO were the least. ANN depended the most on total statistical significance of features used based on sum[-log(p)], whereas PSO was the least dependent. Standardization of features changed AUC by +8.1% for PSO and -0.2% for QDA, while fuzzification increased AUC by 9.4% for PSO and reduced AUC by 3.8% for QDA. AUC determination in planned microarray experiments without standardization and fuzzification of features will benefit the most if CPSO is used for lower levels of feature significance (i.e., sum[-log(p)] ~ 50) and ANN is used for greater levels of significance (i.e., sum[-log(p)] ~ 500).
When only standardization of features is performed, studies are likely to benefit most by using CPSO for low levels of feature statistical significance and LVQ1 for greater levels of significance. Studies involving only fuzzification of features should employ LVQ1 because of the substantial gain in AUC observed and low expense of LVQ1. Lastly, PSO resulted in significantly greater levels of AUC (89.5% average) when feature standardization and fuzzification were performed. In consideration of the data sets used and factors influencing AUC which were investigated, if low-expense computation is desired then LVQ1 is recommended. However, if computational expense is of less concern, then PSO or CPSO is recommended.
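
As an illustration of two quantities the abstract relies on, the following is a minimal Python sketch, not the authors' implementation. It assumes binary labels (1 = positive, 0 = negative), computes AUC through the rank-based (Mann-Whitney) formulation rather than by tracing the ROC curve, and assumes that standardization means z-scoring each feature with the population standard deviation. The function names `roc_auc` and `standardize` are hypothetical.

```python
from statistics import mean, pstdev


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample receives the higher score; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def standardize(column):
    """Z-score one feature: subtract the mean, divide by the standard deviation.
    Assumption: population standard deviation; the paper may use another form."""
    mu = mean(column)
    sd = pstdev(column)
    if sd == 0:
        return [0.0 for _ in column]  # constant feature carries no information
    return [(x - mu) / sd for x in column]


# Toy data (invented for illustration only).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)      # 3 of 4 positive/negative pairs are ordered correctly
z = standardize([1.0, 2.0, 3.0])   # zero mean, unit variance
```

The pairwise form costs O(n_pos * n_neg) comparisons, which is acceptable for a sketch; a study of the scale described above would more plausibly use a sort-based computation.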
