Frontiers in Genetics

Identification of Triple-Negative Breast Cancer Genes and a Novel High-Risk Breast Cancer Prediction Model Development Based on PPI Data and Support Vector Machines


Abstract

Triple-negative breast cancer (TNBC) is a subtype of breast cancer that is difficult to treat. Identifying breast cancer-related genes is crucial because they could provide new biomarkers for diagnosis and potential therapeutic targets. To develop our new high-risk breast cancer prediction model, we used seven raw gene expression datasets from the NCBI Gene Expression Omnibus (GEO) database (GSE31519, GSE9574, GSE20194, GSE20271, GSE32646, GSE45255, and GSE15852). We selected significant genes using the maximum relevance minimum redundancy (mRMR) method. We then mapped the transcripts of these genes onto the protein-protein interaction (PPI) network from the Search Tool for the Retrieval of Interacting Genes (STRING) database and traced the shortest path between each pair of proteins. Genes with higher betweenness values were selected from the shortest-path proteins. To ensure validity and precision, we performed a permutation test: we randomly selected 248 proteins from the PPI network for shortest-path tracing and repeated the procedure 100 times. We removed genes that appeared more frequently in the randomized results. As a result, 54 genes were selected as potential TNBC-related genes. Using 14 of these 54 genes as input features for a support vector machine (SVM), we trained a novel model to predict high-risk breast cancer. Prediction accuracy reached 95.394% for normal versus TNBC tissues and 86.598% for Stage II versus Stage III TNBC, indicating that these genes play important roles in distinguishing breast cancers and that the method could be promising in practical use. Some of the 54 genes identified from the PPI network have already been reported in the literature as associated with breast cancer. Several others have not yet been reported but are functionally similar to known cancer genes. These may be novel breast cancer-related genes and need further experimental validation. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed to appraise the 54 genes. The results indicated that the cellular response to organic cyclic compounds influences breast cancer and that most of the genes may be related to viral carcinogenesis.
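The network step described in the abstract (shortest paths between seed proteins, betweenness counting, then a permutation filter) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes an unweighted, undirected graph and uses plain breadth-first search, whereas the abstract does not say whether STRING edge weights were used. The function names (`build_graph`, `shortest_path`, `path_betweenness`, `permutation_filter`), the toy edge list, and the `max_fraction` cutoff are all hypothetical; the abstract gives only the 100 repetitions and the 248 random proteins, not the exact removal rule.

```python
import random
from collections import Counter, defaultdict, deque
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return dict(graph)


def shortest_path(graph, source, target):
    """Return one shortest path as a list of nodes (BFS), or None if unreachable."""
    if source == target:
        return [source]
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr in parent:
                continue
            parent[nbr] = node
            if nbr == target:
                # Walk back from the target to the source.
                path = [nbr]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(nbr)
    return None


def path_betweenness(graph, seeds):
    """Count how often each non-seed node lies inside a seed-to-seed shortest path."""
    seed_set = set(seeds)
    counts = Counter()
    for a, b in combinations(sorted(seed_set), 2):
        path = shortest_path(graph, a, b)
        if path:
            counts.update(n for n in path[1:-1] if n not in seed_set)
    return counts


def permutation_filter(graph, seeds, n_perm=100, max_fraction=0.05, rng=None):
    """Keep nodes whose observed count is rarely matched by random seed sets.

    ASSUMPTION: the removal rule (drop a node when random seed sets reach its
    observed count in at least `max_fraction` of permutations) is illustrative.
    """
    rng = rng or random.Random(0)
    seed_set = set(seeds)
    observed = path_betweenness(graph, seed_set)
    nodes = sorted(graph)
    exceed = Counter()
    for _ in range(n_perm):
        random_seeds = rng.sample(nodes, len(seed_set))
        random_counts = path_betweenness(graph, random_seeds)
        for gene, obs in observed.items():
            if random_counts[gene] >= obs:
                exceed[gene] += 1
    return {g: c for g, c in observed.items() if exceed[g] / n_perm < max_fraction}


# Toy network: hub "H" links the three hypothetical seed proteins A, B, and C.
toy = build_graph([("A", "H"), ("B", "H"), ("C", "H"), ("H", "D")])
toy_counts = path_betweenness(toy, ["A", "B", "C"])  # H lies on all 3 seed pairs
```

On a real network the seed list would be the mRMR-selected genes mapped to STRING proteins, and the surviving nodes would be the candidates passed on to the classifier. The toy graph is far too small for the permutation filter to be informative; it is there only to show the call shape.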
