Conference: International conference on computer and knowledge engineering

Identification of gene signatures for classifying of breast cancer subtypes using protein interaction database and support vector machines



Abstract

Many studies have used microarray gene expression data to classify breast cancer subtypes. However, in many cases the classification accuracy was not acceptable, even when the algorithms were applied to only a single dataset. The main challenges are choosing an appropriate algorithm at each step of the procedure, applying useful bioinformatics databases, accounting for interactions among genes, and properly combining the analytical steps. This study proposes a three-step solution. In the first step, a filter feature selection method produces a small set of informative genes. In the second step, the primary selected genes are mapped onto the protein-protein interaction network, and the gene set is extended according to the links among the corresponding proteins; thus, a portion of the genes pruned in the first step is added back to the primary set. In the final step, support vector machine-based recursive feature elimination (SVMRFE) identifies the final set of informative genes. We then compared the proposed algorithm with decision tree methods on the same datasets. The proposed procedure was evaluated on two publicly available DNA microarray datasets comprising 456 breast cancer samples. Using the JMI method in the first step, the proposed algorithm reached 100% accuracy for predicting Luminal B. In conclusion, the proposed method showed an appealing improvement in classification accuracy for a multiclass prediction problem: it predicts subtypes with greater than 91.2% overall accuracy, whereas the decision tree method reaches 78.6%.
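The second step, extending the filtered gene set through the protein-protein interaction network, can be illustrated with a minimal sketch. The abstract does not say how neighbours are chosen, so this example assumes the simplest rule: add every direct (first-neighbour) interaction partner that was measured on the array. The function names `build_adjacency` and `expand_with_ppi`, the gene labels `G1` to `G9`, and the edge list are all hypothetical placeholders, not data or code from the paper. The filter step and the SVMRFE step are not shown.

```python
def build_adjacency(edges):
    """Turn undirected interaction pairs into a gene -> neighbours map."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def expand_with_ppi(primary_genes, all_genes, edges):
    """Return the primary genes plus their direct PPI neighbours.

    Assumption (not stated in the abstract): only first neighbours are
    added, and only if they appear in `all_genes`, i.e. they were measured
    on the array and could have been pruned by the filter step.
    """
    adjacency = build_adjacency(edges)
    measured = set(all_genes)
    expanded = set(primary_genes)
    for gene in primary_genes:
        # Genes without any recorded interaction contribute no neighbours.
        expanded |= adjacency.get(gene, set()) & measured
    return sorted(expanded)


# Toy data: placeholder gene labels and made-up interactions.
all_genes = ["G1", "G2", "G3", "G4", "G5", "G6"]
primary = ["G1", "G4"]  # survivors of the (not shown) filter step
edges = [("G1", "G2"), ("G2", "G3"), ("G4", "G5"), ("G4", "G9"), ("G5", "G6")]

expanded = expand_with_ppi(primary, all_genes, edges)
print(expanded)
```

In this toy network the expansion brings back `G2` and `G5`, which interact directly with a primary gene. `G3` and `G6` are two hops away and stay out, and `G9` is skipped because it is not among the measured genes. The expanded list would then be passed to the final elimination step.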
