Journal: Systems Biomedicine

Relapsing-remitting multiple sclerosis classification using elastic net logistic regression on gene expression data

Abstract

As part of the first Industrial Methodology for Process Verification in Research Challenge, the MS Diagnostic sub-challenge aimed to identify a robust diagnostic signature for relapsing-remitting multiple sclerosis (RRMS) from gene expression data. To this end, we built a classifier that assigns samples to one of two phenotype groups, RRMS or control, using the transcriptome of peripheral blood mononuclear cells. The classifier is a logistic regression model with elastic net regularization, as implemented in the glmnet package in R. We selected the values of the regularization hyperparameters based on three criteria: cross-validation performance on the provided training data, the number of non-zero parameters in the model, and the distribution of the classifier's output values when applied to the input vectors of the test data. We evaluated classifier performance under two feature extraction strategies: using genes only, or adding features constructed from gene pathway data. The two strategies differed little in performance, both in 10-fold cross-validation on the training data and in prediction on the test data. Our final submission to the sub-challenge used only genes as features and identified a diagnostic signature of 58 genes; it ranked second out of 39 submissions.
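
To make the modelling idea concrete, the following is a minimal sketch of elastic net logistic regression with cross-validated comparison of penalty strengths. It is not the authors' code and it does not use glmnet: glmnet fits by coordinate descent in R, whereas this sketch swaps in plain proximal gradient descent in pure Python. The penalty follows glmnet's parametrization, lambda * ((1 - alpha) / 2 * ||beta||_2^2 + alpha * ||beta||_1). All function names, the step size, the iteration count, the lambda grid, and the synthetic data (two informative features out of eight) are assumptions chosen for illustration, not values from the paper.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t * |v|; this step sets coefficients exactly to zero."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def predict_proba(model, row):
    b0, beta = model
    return sigmoid(b0 + sum(b * x for b, x in zip(beta, row)))


def fit(X, y, lam, alpha, step=0.25, n_iter=400):
    """Proximal gradient descent on the elastic net logistic objective.

    step and n_iter are illustrative assumptions, tuned only for the toy data.
    """
    n, p = len(X), len(X[0])
    model = (0.0, [0.0] * p)
    for _ in range(n_iter):
        b0, beta = model
        resid = [predict_proba(model, row) - yi for row, yi in zip(X, y)]
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        new_b0 = b0 - step * sum(resid) / n  # the intercept is not penalized
        # Gradient step on loss + ridge term, then soft-threshold for the L1 term.
        new_beta = [
            soft_threshold(b - step * (g + lam * (1 - alpha) * b), step * lam * alpha)
            for b, g in zip(beta, grad)
        ]
        model = (new_b0, new_beta)
    return model


def accuracy(model, X, y):
    hits = sum((predict_proba(model, row) >= 0.5) == (yi == 1) for row, yi in zip(X, y))
    return hits / len(y)


def cv_accuracy(X, y, lam, alpha, k=10):
    """Mean held-out accuracy over k interleaved folds."""
    scores = []
    for f in range(k):
        train = [i for i in range(len(y)) if i % k != f]
        test = [i for i in range(len(y)) if i % k == f]
        model = fit([X[i] for i in train], [y[i] for i in train], lam, alpha)
        scores.append(accuracy(model, [X[i] for i in test], [y[i] for i in test]))
    return sum(scores) / k


def make_toy_data(n=60, p=8, seed=0):
    """Synthetic stand-in for expression data: only features 0 and 1 carry signal.

    Samples are ordered by class, so the interleaved folds above stay balanced.
    """
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i >= n // 2 else 0
        shift = 1.5 if label else -1.5
        X.append([rng.gauss(shift if j < 2 else 0.0, 1.0) for j in range(p)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    for lam in (0.3, 0.1, 0.03, 0.01):
        _, beta = fit(X, y, lam, alpha=0.5)
        nonzero = sum(b != 0.0 for b in beta)
        print(f"lambda={lam}: cv_acc={cv_accuracy(X, y, lam, 0.5):.2f}, nonzero={nonzero}")
```

Running the script prints, for each candidate lambda, the 10-fold cross-validated accuracy next to the number of non-zero coefficients, which mirrors two of the three selection criteria named in the abstract. The third criterion, inspecting the distribution of classifier outputs on the test inputs, could be approximated by collecting `predict_proba` values for unlabeled rows.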