Journal: Cancer Informatics

Assessing the Statistical Significance of the Achieved Classification Error of Classifiers Constructed using Serum Peptide Profiles and a Prescription for Random Sampling Repeated Studies for Massive High-Throughput Genomic and Proteomic Studies

Abstract

Peptide profiles generated using SELDI/MALDI time-of-flight mass spectrometry provide a promising source of patient-specific information with high potential impact on the early detection and classification of cancer and other diseases. The new profiling technology comes, however, with numerous challenges and concerns. Particularly important are concerns about the reproducibility of classification results and their statistical significance. In this work we describe a computational validation framework, called PACE (Permutation-Achieved Classification Error), that lets us assess, for a given classification model, the significance of the Achieved Classification Error (ACE) on the profile data. The framework computes the classifier's performance statistic on the true data samples and checks whether it is consistent with the behavior of the classifier on the same data with randomly reassigned class labels. A statistically significant ACE increases our belief that a discriminative signal was found in the data. The advantage of PACE analysis is that it can be easily combined with any classification model and is relatively easy to interpret. PACE analysis does not, however, protect researchers against confounding in the experimental design or against other sources of systematic or random error. We use PACE analysis to assess the significance of classification results we have achieved on a number of published data sets. The results show that many of these data sets indeed possess a signal that leads to a statistically significant ACE.
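The permutation idea described in the abstract can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration and is not taken from the paper: the nearest-centroid classifier, the leave-one-out error estimate, the function names `nearest_centroid_error` and `pace_p_value`, the add-one correction in the p-value, and the synthetic two-feature data. The sketch only shows the general shape of the comparison: compute the error on the true labels, recompute it many times on randomly reassigned labels, and ask how often the permuted error is at least as low.

```python
import random


def nearest_centroid_error(X, y):
    """Leave-one-out error of a nearest-centroid classifier (a stand-in model)."""
    n = len(X)
    errors = 0
    for i in range(n):
        # Accumulate per-class feature sums and counts from all samples except i.
        sums, counts = {}, {}
        for j in range(n):
            if j == i:
                continue
            label = y[j]
            if label not in sums:
                sums[label] = [0.0] * len(X[j])
                counts[label] = 0
            sums[label] = [s + v for s, v in zip(sums[label], X[j])]
            counts[label] += 1

        # Squared Euclidean distance from the held-out sample to a class centroid.
        def dist(label):
            return sum((v - s / counts[label]) ** 2
                       for v, s in zip(X[i], sums[label]))

        predicted = min(sums, key=dist)
        errors += predicted != y[i]
    return errors / n


def pace_p_value(X, y, error_fn, n_permutations=200, seed=0):
    """Compare the achieved error with errors under randomly reassigned labels."""
    rng = random.Random(seed)
    ace = error_fn(X, y)  # achieved classification error on the true labels
    null_errors = []
    for _ in range(n_permutations):
        shuffled = list(y)
        rng.shuffle(shuffled)  # random reassignment of class labels
        null_errors.append(error_fn(X, shuffled))
    # One-sided p-value with an add-one correction (an assumed convention).
    as_good = sum(e <= ace for e in null_errors)
    p_value = (1 + as_good) / (1 + n_permutations)
    return ace, null_errors, p_value


# Synthetic stand-in for profile data: two well-separated classes, two features.
data_rng = random.Random(1)
X = ([[data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(20)]
     + [[data_rng.gauss(3.0, 1.0), data_rng.gauss(3.0, 1.0)] for _ in range(20)])
y = [0] * 20 + [1] * 20

ace, null_errors, p_value = pace_p_value(X, y, nearest_centroid_error)
print(f"ACE = {ace:.3f}, permutation p-value = {p_value:.4f}")
```

Because `error_fn` is passed in as an argument, any other classifier and error estimate could be substituted, which mirrors the abstract's point that the analysis can be combined with any classification model. As the abstract cautions, a small p-value from such a test says nothing about confounding in the experimental design.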
