Journal: BMC Bioinformatics

Improving accuracy for cancer classification with a new algorithm for genes selection


Abstract

Background: Although the classification of cancer tissue samples based on gene expression data has advanced considerably in recent years, improving accuracy remains a major challenge. One difficulty is establishing an effective method for selecting a parsimonious set of relevant genes. So far, most gene selection methods in the literature focus on screening individual genes or pairs of genes without considering possible interactions among genes. Here we introduce a new computational method named the Binary Matrix Shuffling Filter (BMSF). It not only overcomes the difficulties associated with the search schemes of traditional wrapper methods and the overfitting problem in a high-dimensional search space, but also takes potential gene interactions into account during gene selection. This method, coupled with a Support Vector Machine (SVM) for implementation, often selects a very small number of genes, which makes the model easy to interpret.

Results: We applied our method to 9 two-class gene expression datasets involving human cancers. During the gene selection process, the set of genes to be kept in the model was recursively refined and repeatedly updated according to the effect of a given gene on the contributions of other genes, with reference to their usefulness in cancer classification. The small number of informative genes selected from each dataset led to significantly improved leave-one-out cross-validation (LOOCV) classification accuracy across all 9 datasets for multiple classifiers. The selected genes also generalize broadly, since multiple commonly used classifiers achieved LOOCV accuracy either equivalent to or much higher than that reported in the literature.

Conclusions: A gene's contribution to binary cancer classification is better evaluated after adjusting for the joint effect of a large number of other genes. A computationally efficient search scheme was provided to perform an effective search in the extensive feature space that includes possible interactions of many genes. The performance of the algorithm on 9 datasets suggests that the accuracy of cancer classification can be improved by a large margin when the joint effects of many genes are considered.
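
The abstract reports leave-one-out cross-validation (LOOCV) accuracy on a small set of selected genes. As a rough illustration of how such a score can be computed for a candidate gene subset, here is a minimal Python sketch. It is not the authors' BMSF implementation: the nearest-centroid classifier is a simple stand-in for the SVM named in the abstract, and the function names (`nearest_centroid_predict`, `loocv_accuracy`) and the toy data are assumptions made purely for illustration.

```python
from typing import Callable, List, Sequence


def nearest_centroid_predict(train_x: List[List[float]],
                             train_y: List[int],
                             query: Sequence[float]) -> int:
    """Stand-in classifier (NOT the SVM used in the paper): predict the
    class whose mean expression profile is closest to the query sample."""
    best_label, best_dist = -1, float("inf")
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((q - c) ** 2 for q, c in zip(query, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(x: List[List[float]],
                   y: List[int],
                   genes: List[int],
                   predict: Callable = nearest_centroid_predict) -> float:
    """Leave-one-out accuracy using only the selected gene columns."""
    reduced = [[row[g] for g in genes] for row in x]
    correct = 0
    for i in range(len(reduced)):
        # Hold out sample i and train on all remaining samples.
        train_x = reduced[:i] + reduced[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_x, train_y, reduced[i]) == y[i]:
            correct += 1
    return correct / len(reduced)


# Hypothetical toy data: 6 samples x 3 genes. Gene 0 separates the two
# classes; genes 1 and 2 are noise.
X = [
    [1.0, 5.0, 2.0],
    [1.2, 1.0, 9.0],
    [0.8, 9.0, 4.0],
    [3.0, 2.0, 8.0],
    [3.2, 8.0, 1.0],
    [2.8, 4.0, 5.0],
]
Y = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    print("gene 0 only:", loocv_accuracy(X, Y, [0]))
    print("all genes:  ", loocv_accuracy(X, Y, [0, 1, 2]))
```

Comparing the score for a small subset against the score for all genes gives a feel for why a parsimonious gene set can help: noisy columns can pull held-out samples toward the wrong class. How BMSF itself searches for and refines the subset is not detailed in the abstract and is not reproduced here.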
