Conference: IEEE International Conference on Bioinformatics and Biomedicine

Deep Learning-based Identification of Cancer or Normal Tissue using Gene Expression Data

Abstract

Background: Deep learning has shown outstanding performance on recognition and classification problems. As increasing amounts of cancer and normal gene expression data become publicly available, deep learning may become an integral component of efficiently finding specific patterns within massive datasets. We therefore aim to assess the extent to which a machine can learn to recognize cancer. We integrated cancer and normal tissue data from the Gene Expression Omnibus (GEO), The Cancer Genome Atlas (TCGA), Therapeutically Applicable Research To Generate Effective Treatments (TARGET), and Genotype-Tissue Expression (GTEx) databases, comprising 13,406 cancer and 12,842 normal gene expression samples from 24 different tissues. We first trained a deep neural network (DNN) to discriminate between cancer and normal samples using various gene selection strategies, as well as therapeutic target genes from commercial cancer panels and genes in NCI-curated cancer pathways. We also propose a systematic analysis method for interpreting the trained deep neural network, and applied it to find the genes that contribute most to classifying an individual sample as cancer.
Results: The best trained DNN classified cancer and normal samples with an accuracy of 0.997 on the training set of 13,123 samples (cancer: 6,703; normal: 6,420). On the independent test set of 13,125 samples (cancer: 6,703; normal: 6,422), the DNN model achieved an accuracy of 0.979. On the same training and test data, our DNN outperformed conventional prediction methods, of which the support vector machine approach performed best. For interpretation, we propose a method that extracts each gene's contribution to an individual sample's cancer probability from the trained DNN. This method distinguished samples that depend on one or a few genes, suggesting that these samples may be "oncogene addicted".
Conclusion: A deep learning approach in conjunction with our interpretation method is not only a useful tool to identify cancer from gene expression data but can also contribute toward understanding the complex nature of cancer based on large public data.
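
The abstract does not say how a gene's contribution is extracted from the trained network, so the following is only an illustrative sketch, not the authors' method. It assumes NumPy, uses synthetic data in place of real expression profiles, trains a small one-hidden-layer network, and estimates per-gene contributions by occlusion: each gene is reset to a baseline value and the resulting change in predicted cancer probability is recorded. All function names and parameters (`make_synthetic`, `train_mlp`, `gene_contributions`, the baseline choice) are hypothetical.

```python
import numpy as np


def make_synthetic(n_samples=400, n_genes=20, seed=0):
    """Synthetic stand-in for expression data; gene 0 drives the label."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_genes))
    y = (X[:, 0] + 0.1 * rng.normal(size=n_samples) > 0).astype(float)
    return X, y


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_mlp(X, y, hidden=16, lr=0.5, epochs=1000, seed=0):
    """Full-batch gradient descent on mean cross-entropy (one tanh layer)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.1, size=(d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.1, size=hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                # hidden activations
        p = sigmoid(H @ W2 + b2)                # predicted cancer probability
        g = (p - y) / n                         # dLoss/dlogit per sample
        gH = np.outer(g, W2) * (1.0 - H ** 2)   # backprop through tanh
        W2 -= lr * (H.T @ g)
        b2 -= lr * g.sum()
        W1 -= lr * (X.T @ gH)
        b1 -= lr * gH.sum(axis=0)
    return W1, b1, W2, b2


def predict_proba(params, X):
    W1, b1, W2, b2 = params
    return sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2)


def gene_contributions(params, x, baseline):
    """Occlusion: drop in probability when one gene is set to its baseline."""
    p_full = predict_proba(params, x[None, :])[0]
    contrib = np.empty(x.size)
    for j in range(x.size):
        x_occ = x.copy()
        x_occ[j] = baseline[j]                  # assumption: baseline = cohort mean
        contrib[j] = p_full - predict_proba(params, x_occ[None, :])[0]
    return contrib


if __name__ == "__main__":
    X, y = make_synthetic()
    params = train_mlp(X, y)
    sample = X[int(np.argmax(X[:, 0]))]
    contrib = gene_contributions(params, sample, X.mean(axis=0))
    print("top contributing gene index:", int(np.argmax(np.abs(contrib))))
```

In this toy setting, a sample whose predicted probability collapses when a single gene is occluded is the analogue of the "one or a few genes" dependence described above; real expression data would need normalization and a held-out test split, which this sketch omits.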
