Source: Molecular Cellular Proteomics : MCP (NIH literature)
A Critical Assessment of Feature Selection Methods for Biomarker Discovery in Clinical Proteomics

Abstract

In this paper, we compare the performance of six feature selection methods for LC-MS-based proteomics and metabolomics biomarker discovery: the t test, the Mann–Whitney–Wilcoxon test (MWW test), nearest shrunken centroid (NSC), linear support vector machine–recursive feature elimination (SVM-RFE), principal component discriminant analysis (PCDA), and partial least squares discriminant analysis (PLSDA). The comparison uses human urine and porcine cerebrospinal fluid samples that were spiked with a range of peptides at different concentration levels. The ideal feature selection method should select the complete list of discriminating features that are related to the spiked peptides without selecting unrelated features. Whereas many studies have to rely on classification error to judge the reliability of the selected biomarker candidates, we assessed the accuracy of selection directly against the list of spiked peptides. The feature selection methods were applied to data sets with different sample sizes and different extents of sample class separation, the latter determined by the concentration level of the spiked compounds. For each feature selection method and data set, the performance in selecting a set of features related to the spiked compounds was assessed using the harmonic mean of the recall and the precision (f-score) and the geometric mean of the recall and the true negative rate (g-score). We conclude that the univariate t test and the MWW test with multiple testing corrections are not applicable to data sets with small sample sizes (n = 6), but their performance improves markedly with increasing sample size, up to a point (n > 12) at which they outperform the other methods. PCDA and PLSDA select small feature sets with high precision but miss many true positive features related to the spiked peptides. NSC strikes a reasonable compromise between recall and precision for all data sets, independent of spiking level and number of samples. Linear SVM-RFE performs poorly at selecting features related to the spiked compounds, even though its classification error is relatively low.
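
The two scores named in the abstract can be illustrated with a minimal sketch. It assumes features are identified by hashable labels and that a feature counts as a true positive when it is both selected and related to a spiked peptide. The function name `selection_scores` and the toy feature sets are hypothetical and are not taken from the paper; the abstract does not specify how features were matched to spiked peptides.

```python
from math import sqrt


def selection_scores(selected, spiked_related, all_features):
    """Score a feature selection result against a known ground truth.

    selected:        features chosen by the selection method
    spiked_related:  features known to be related to spiked compounds
    all_features:    every feature in the data set
    (All three names are illustrative assumptions.)
    """
    selected = set(selected)
    spiked_related = set(spiked_related)
    all_features = set(all_features)

    tp = len(selected & spiked_related)                   # correctly selected
    fp = len(selected - spiked_related)                   # selected but unrelated
    fn = len(spiked_related - selected)                   # related but missed
    tn = len(all_features - selected - spiked_related)    # correctly left out

    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    true_negative_rate = tn / (tn + fp) if (tn + fp) else 0.0

    # f-score: harmonic mean of recall and precision
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0

    # g-score: geometric mean of recall and true negative rate
    g_score = sqrt(recall * true_negative_rate)

    return {
        "recall": recall,
        "precision": precision,
        "true_negative_rate": true_negative_rate,
        "f_score": f_score,
        "g_score": g_score,
    }


# Toy example: 10 features, 4 related to spiked peptides, 3 selected.
scores = selection_scores(
    selected={0, 1, 4},
    spiked_related={0, 1, 2, 3},
    all_features=range(10),
)
print(scores)
```

In this toy case the method recovers two of the four spiked-related features and selects one unrelated feature, so recall is 0.5 and precision is 2/3. When unrelated features far outnumber related ones, the true negative rate stays close to 1 even with many false positives, so the f-score is likely the more sensitive of the two measures to false selections.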