International Conference on Artificial Intelligence in Medicine

Analysis of Viability of TCGA and GTEx Gene Expression for Gleason Grade Identification



Abstract

Gleason grade is a critical indicator for determining patient treatment for prostate cancer. In this paper, we analyze the viability of RNA sequencing gene expression data for Gleason grade identification. We combine datasets from the TCGA (sampled from cancer patients) and GTEx (sampled from healthy patients) databases. Using mutual information techniques, we reduce the dimensionality from 19046 genes to only the 20 most predictive genes. Then, we apply an unsupervised approach to analyze the separability of the grades of cancer. We use the t-SNE algorithm to map features into two dimensions and apply a Gaussian Mixture Model (GMM) for clustering. The result shows a clear visual separability between cancer and healthy samples. However, the grades of cancer themselves are not visually separable. Also, we apply the Mann-Whitney U test to compare the statistical similarity of the different Gleason grades and find that most grades are similar to each other. We further apply a random forest model to estimate the Gleason grade. The results show that the model accurately predicts whether a sample comes from healthy or cancer tissue. However, the model is weak in classifying the Gleason grade. The best performing model has a weighted macro-averaged F1 score of 0.66, improving on a baseline score of 0.22 obtained by random guessing. Our results indicate that the difference in gene expression among Gleason grades is relatively small compared to the difference between healthy and cancer samples. Thus, gene expression alone cannot be used for Gleason grade identification.
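The steps named in the abstract (mutual-information gene selection, t-SNE plus GMM clustering, the Mann-Whitney U test, and a random forest scored with F1) can be illustrated with a minimal sketch. It assumes scikit-learn and SciPy, and it runs on synthetic data rather than TCGA or GTEx. The function names, the toy data generator, the two-component GMM, the train/test split, and all hyperparameters are assumptions made for illustration, not the authors' settings.

```python
from functools import partial

import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.manifold import TSNE
from sklearn.metrics import f1_score
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split


def make_synthetic_expression(n_per_class=60, n_genes=200, seed=0):
    """Toy data (assumption): label 0 = healthy, labels 1-3 = cancer grades."""
    rng = np.random.default_rng(seed)
    X_parts, y_parts = [], []
    for label in range(4):
        block = rng.normal(size=(n_per_class, n_genes))
        if label > 0:
            block[:, :10] += 3.0             # large healthy-vs-cancer shift
            block[:, 10:15] += 0.3 * label   # small shift between grades
        X_parts.append(block)
        y_parts.append(np.full(n_per_class, label))
    return np.vstack(X_parts), np.concatenate(y_parts)


def run_pipeline(X, y, k=20, seed=0):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)

    # 1. Keep the k genes sharing the most mutual information with the label.
    score = partial(mutual_info_classif, random_state=seed)
    selector = SelectKBest(score, k=k).fit(X_tr, y_tr)
    X_sel = selector.transform(X)

    # 2. Unsupervised view: t-SNE down to 2-D, then GMM clustering.
    embedding = TSNE(n_components=2, random_state=seed).fit_transform(X_sel)
    clusters = GaussianMixture(
        n_components=2, random_state=seed).fit_predict(embedding)

    # 3. Mann-Whitney U test per selected gene, grade 1 versus grade 2.
    p_values = np.array([
        mannwhitneyu(X_sel[y == 1, j], X_sel[y == 2, j],
                     alternative="two-sided").pvalue
        for j in range(X_sel.shape[1])
    ])

    # 4. Random forest on the training split, scored on the held-out split.
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    clf.fit(selector.transform(X_tr), y_tr)
    pred = clf.predict(selector.transform(X_te))
    return {
        "embedding": embedding,
        "clusters": clusters,
        "p_values": p_values,
        # F1 over all four labels (healthy plus three toy grades).
        "grade_f1": f1_score(y_te, pred, average="weighted"),
        # F1 for the simpler healthy-versus-cancer question.
        "binary_f1": f1_score((y_te > 0).astype(int), (pred > 0).astype(int)),
    }
```

Calling run_pipeline(*make_synthetic_expression()) returns a dictionary holding the 2-D embedding, the cluster assignments, the per-gene p-values, and the two F1 scores. The toy generator deliberately makes the healthy-versus-cancer shift much larger than the shift between grades, mirroring the qualitative finding reported in the abstract; the numbers it produces say nothing about the real datasets.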
