BMC Cancer

Robust assignment of cancer subtypes from expression data using a uni-variate gene expression average as classifier

Abstract

Background: Genome-wide gene expression data are a rich source for identifying gene signatures suitable for clinical purposes, and a number of statistical algorithms have been described for both identifying and evaluating such signatures. Some of these algorithms are fairly complex and hence prone to over-fitting, whereas others are simpler and more straightforward. Here we present a new type of simple algorithm, based on ROC analysis and the use of metagenes, that we believe will be a good complement to existing algorithms.

Results: The proposed approach rests on the use of metagenes instead of collections of individual genes, and on feature selection using AUC values obtained by ROC analysis. Each gene in a data set is assigned an AUC value relative to the tumor class under investigation, and the genes are ranked according to these values. Metagenes are then formed by calculating the mean expression level for an increasing number of ranked genes, and the metagene expression value that optimally discriminates tumor classes in the training set is used to classify new samples. The performance of the metagene is then evaluated using leave-one-out cross-validation (LOOCV) and balanced accuracies.

Conclusions: We show that the simple uni-variate gene expression average algorithm performs as well as several alternative algorithms, such as discriminant analysis, and more complex approaches such as SVM and neural networks. The R package rocc is freely available at http://cran.r-project.org/web/packages/rocc/index.html.
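The procedure summarized in the abstract (rank genes by AUC, average the top-ranked genes into a metagene, choose a cut-off on the training set, evaluate by LOOCV and balanced accuracy) can be sketched in a few lines of Python. This is only an illustrative sketch and not the rocc package: all function names and the toy data are hypothetical. It also rests on two assumptions the abstract does not state: genes with AUC below 0.5 are sign-flipped before averaging, and the metagene size is chosen by training-set balanced accuracy.

```python
def auc(values, labels):
    """AUC of one gene: fraction of (class 1, class 0) pairs in which the
    class-1 sample has the higher value; ties count as one half."""
    pos = [v for v, l in zip(values, labels) if l == 1]
    neg = [v for v, l in zip(values, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def balanced_accuracy(truth, pred):
    """Mean of sensitivity and specificity (both classes must be present)."""
    pos = [p for t, p in zip(truth, pred) if t == 1]
    neg = [p for t, p in zip(truth, pred) if t == 0]
    return (sum(p == 1 for p in pos) / len(pos) +
            sum(p == 0 for p in neg) / len(neg)) / 2

def metagene(row, genes, signs):
    """Signed mean expression of the selected genes for one sample."""
    return sum(signs[g] * row[g] for g in genes) / len(genes)

def train_metagene(X, y, sizes):
    """X: list of samples (lists of gene values); y: 0/1 class labels."""
    n_genes = len(X[0])
    aucs = [auc([row[g] for row in X], y) for g in range(n_genes)]
    # Assumption: flip genes with AUC < 0.5 so all genes point toward class 1.
    signs = [1 if a >= 0.5 else -1 for a in aucs]
    order = sorted(range(n_genes),
                   key=lambda g: max(aucs[g], 1 - aucs[g]), reverse=True)
    best = None
    for k in sizes:  # increasing number of top-ranked genes
        genes = order[:k]
        scores = [metagene(row, genes, signs) for row in X]
        s = sorted(set(scores))
        # Candidate cut-offs: midpoints between neighbouring training scores.
        cuts = [(lo + hi) / 2 for lo, hi in zip(s, s[1:])] or [s[0]]
        for thr in cuts:
            acc = balanced_accuracy(y, [int(v > thr) for v in scores])
            if best is None or acc > best[0]:
                best = (acc, genes, thr)
    return best[1], signs, best[2]

def loocv_balanced_accuracy(X, y, sizes):
    """Leave one sample out, retrain from scratch, classify the held-out sample."""
    preds = []
    for i in range(len(X)):
        genes, signs, thr = train_metagene(X[:i] + X[i + 1:],
                                           y[:i] + y[i + 1:], sizes)
        preds.append(int(metagene(X[i], genes, signs) > thr))
    return balanced_accuracy(y, preds)

# Hypothetical toy data: gene 0 is up and gene 1 is down in class 1;
# genes 2 and 3 carry little class information.
X = [[1.0, 5.0, 3.0, 2.0], [1.2, 4.8, 2.0, 3.0],
     [0.9, 5.1, 3.5, 2.5], [1.1, 4.9, 2.5, 3.5],
     [3.0, 2.0, 3.1, 2.1], [3.2, 1.8, 2.1, 3.1],
     [2.9, 2.1, 3.4, 2.6], [3.1, 1.9, 2.6, 3.4]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

genes, signs, thr = train_metagene(X, y, sizes=[1, 2, 4])
print("selected genes:", genes, "cut-off:", thr)
print("LOOCV balanced accuracy:", loocv_balanced_accuracy(X, y, sizes=[1, 2, 4]))
```

Note that the gene ranking and cut-off are recomputed inside every LOOCV fold, so the held-out sample never influences feature selection. Selecting features on the full data set before cross-validation would bias the accuracy estimate upward.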
