IEEE Congress on Evolutionary Computation

Predicting recurrence in clear cell Renal Cell Carcinoma: Analysis of TCGA data using outlier analysis and generalized matrix LVQ

Abstract

Using mRNA-Seq and clinical data for 469 clear cell Renal Cell Carcinoma (ccRCC) samples from The Cancer Genome Atlas (TCGA), we develop a protocol to identify patients likely to have early recurrence of their disease. We first split the data into two sets, with 380 samples in the training set and 89 samples in the test set. Using the training set, we identify genes whose outlier status (high or low mRNA expression) is predictive of recurrence, based on the Kaplan-Meier (KM) recurrence-free survival log-rank p-value. We find a significant overlap among genes identified as predictive biomarkers in Reads Per Kilobase per Million mapped reads (RPKM) normalized data and Raw Reads mRNA-Seq data. Using 80 consensus genes predictive in both RPKM and Raw Reads data, we define an outlier-based risk score R to stratify patients into two groups, a high-risk (early recurrence) group (R 2). The KM recurrence curve using this stratification shows excellent separation in training and test sets. Restricting the analysis to patients who had recurrence within two years (109 cases) and those who had no recurrence in five years (107 cases), we find that the risk predictor achieves approximately 80 percent sensitivity and specificity. The 80 genes identified by the outlier analysis were used to develop a more intuitive classifier based on Generalized Matrix Learning Vector Quantization (GMLVQ). This method stratifies samples into risk classes by defining prototypes in feature space together with an appropriate distance metric. GMLVQ identified a subset of 12 genes that predict recurrence with high accuracy, which suggests that an assay with a small number of genes might be able to predict recurrence in ccRCC.
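
The prototype-plus-metric idea behind GMLVQ can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard GMLVQ distance d(x, w) = (x - w)^T Ω^T Ω (x - w) and treats the matrix Ω and the prototypes as already learned. The values of `omega`, `prototypes`, and `sample` are hypothetical toy numbers, and the function names are illustrative only. The diagonal of Λ = Ω^T Ω is commonly read as a per-feature relevance in GMLVQ; the abstract does not state how the 12-gene subset was actually chosen.

```python
def gmlvq_distance(x, w, omega):
    """Adaptive squared distance d(x, w) = (x - w)^T Omega^T Omega (x - w)."""
    diff = [xi - wi for xi, wi in zip(x, w)]
    # Project the difference vector with Omega, then take the squared norm.
    projected = [sum(o * d for o, d in zip(row, diff)) for row in omega]
    return sum(p * p for p in projected)


def classify(x, prototypes, omega):
    """Return the label of the prototype closest to x under the given metric."""
    return min(prototypes, key=lambda p: gmlvq_distance(x, p[1], omega))[0]


def relevances(omega):
    """Diagonal of Lambda = Omega^T Omega, one value per feature."""
    n = len(omega[0])
    return [sum(row[j] ** 2 for row in omega) for j in range(n)]


# Hypothetical toy values: 3 features (genes), one prototype per risk class.
# In real GMLVQ both omega and the prototypes are fitted on training data.
omega = [[1.0, 0.0, 0.0],
         [0.0, 0.5, 0.0],
         [0.0, 0.0, 0.0]]  # the third feature has zero weight in this metric
prototypes = [("high_risk", [2.0, 1.0, 0.0]),
              ("low_risk", [0.0, 0.0, 0.0])]

sample = [1.8, 0.4, 5.0]
label = classify(sample, prototypes, omega)
feature_relevance = relevances(omega)
```

In this toy setup the third feature does not influence the distance at all, which shows how a learned metric can effectively reduce a classifier to a smaller set of features.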
