International Conference on Computer and Knowledge Engineering

Identification of gene signatures for classifying of breast cancer subtypes using protein interaction database and support vector machines


Abstract

Many studies have used microarray gene expression data to classify breast cancer subtypes. However, the classification accuracy was not acceptable in many cases, even when the algorithms were applied to only a single set of data. In this regard, the main challenges are using an appropriate algorithm in every step of the whole procedure, applying useful bioinformatics databases, considering the interactions among genes, and properly combining the analytical steps. In this study, a solution was proposed that follows a three-step process. In the first step, a filter feature selection method was used to produce a small set of informative genes. In the second step, the initially selected genes were mapped onto the protein-protein interaction network to extend the gene set according to the links among the corresponding proteins. Thus, a portion of the genes pruned in the first step is added back to the initial set of selected genes. In the final step, the final set of informative genes was identified using the support vector machine-based recursive feature elimination (SVMRFE) method. We then compared the proposed algorithm with decision tree methods on the same datasets. The proposed procedure was evaluated on two publicly available DNA microarray datasets that include 456 breast cancer samples. The proposed algorithm reached 100% accuracy in predicting Luminal B when the JMI method was used in the first step. In conclusion, the proposed method showed an appealing improvement in classification accuracy for a multiclass prediction problem. With the proposed algorithm, subtypes can be predicted with greater than 91.2% overall accuracy, whereas the decision tree method predicts subtypes with 78.6% accuracy.
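
The second step of the abstract (extending the filtered gene set through the protein-protein interaction network) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `build_adjacency` and `extend_with_ppi`, the restriction to direct (one-hop) neighbours, and the toy edge list are all assumptions made for illustration. The abstract does not say which interaction database was used or how far the extension reaches.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Turn an undirected list of interacting gene pairs into an adjacency map."""
    adjacency: Dict[str, Set[str]] = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def extend_with_ppi(
    seed_genes: Iterable[str],
    adjacency: Dict[str, Set[str]],
    measured_genes: Set[str],
) -> List[str]:
    """Add back direct interaction partners of the filter-selected genes.

    Assumption: only one-hop neighbours that are present on the
    microarray (measured_genes) are re-added.
    """
    seeds = list(dict.fromkeys(seed_genes))  # drop duplicates, keep order
    extended = list(seeds)
    seen = set(seeds)
    for gene in seeds:
        # sorted() keeps the output deterministic
        for neighbour in sorted(adjacency.get(gene, ())):
            if neighbour in measured_genes and neighbour not in seen:
                seen.add(neighbour)
                extended.append(neighbour)
    return extended


# Hypothetical toy data, for illustration only.
toy_edges = [
    ("ESR1", "GATA3"),
    ("ESR1", "FOXA1"),
    ("ERBB2", "GRB7"),
    ("TP53", "MDM2"),
]
measured = {"ESR1", "ERBB2", "GATA3", "GRB7", "TP53", "MDM2"}
filter_selected = ["ESR1", "ERBB2"]  # stand-in for the output of step one

extended = extend_with_ppi(filter_selected, build_adjacency(toy_edges), measured)
print(extended)  # ['ESR1', 'ERBB2', 'GATA3', 'GRB7']
```

In this sketch the extended list would then be passed to the third step. One possible way to realise that step, again as an assumption rather than the authors' setup, is scikit-learn's `RFE` wrapped around a linear-kernel `SVC`.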
