Journal: BMC Bioinformatics

Discovery of dominant and dormant genes from expression data using a novel generalization of SNR for multi-class problems


Abstract

Background: The signal-to-noise ratio (SNR) is often used to identify biomarkers in two-class problems, but no formal and useful generalization of SNR is available for multiclass problems. We propose generalizations of SNR for multiclass cancer discrimination by introducing two indices, the Gene Dominant Index and the Gene Dormant Index (GDIs). These indices lead to the concepts of dominant and dormant genes, and we use them to develop methodologies for discovering dominant and dormant biomarkers of biological significance. The dominancy and dormancy of the identified biomarkers, and their discriminating power, are also demonstrated pictorially using scatterplots of individual genes and 2-D Sammon's projections of the selected gene sets. Using information from the literature, we show that the GDI-based method can identify dominant and dormant genes that play significant roles in cancer biology. These biomarkers are also used to design diagnostic prediction systems.

Results and discussion: To evaluate the effectiveness of the GDIs, we used four multiclass cancer data sets (Small Round Blue Cell Tumors, Leukemia, Central Nervous System Tumors, and Lung Cancer). For each data set we demonstrate that the new indices can find biologically meaningful genes that can act as biomarkers. We then use six machine learning tools: the Nearest Neighbor Classifier (NNC), the Nearest Mean Classifier (NMC), and Support Vector Machine (SVM) classifiers with a linear kernel and with a Gaussian kernel, each SVM being used with both the one-vs-all (OVA) and one-vs-one (OVO) strategies. We found the GDIs to be very effective in identifying biomarkers with strong class-specific signatures. With all six tools and for all data sets we achieved prediction accuracies better than or comparable to results reported in the literature under the same computational protocols, usually with fewer marker genes. Dominant genes are usually easy to find, whereas good dormant genes may not always be available because dormant genes must satisfy stronger constraints; when they are available, they can be used to authenticate a diagnosis.

Conclusion: Because GDI-based schemes can find a small set of dominant/dormant biomarkers that is adequate for designing diagnostic prediction systems, they open up the possibility of using real-time qPCR assays or antibody-based methods such as ELISA for easy, low-cost diagnosis of diseases. The dominant and dormant genes found by the GDIs can be used in different ways to design more reliable diagnostic prediction systems.
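
For readers unfamiliar with the two-class SNR that the abstract takes as its starting point, the following is a minimal sketch of the form commonly used in gene-expression studies, (mean_a - mean_b) / (sd_a + sd_b). The abstract does not define the GDIs, so the `one_vs_rest_snr` helper is only a naive illustration of scoring one class against all others; it is an assumption made here and is not the index proposed in the paper. The function names and the expression values are hypothetical.

```python
from statistics import mean, stdev


def snr(class_a, class_b):
    """Two-class signal-to-noise ratio for a single gene.

    Computes (mean_a - mean_b) / (sd_a + sd_b) using sample standard
    deviations. A large positive value means the gene is expressed
    more highly in class A than in class B.
    """
    denom = stdev(class_a) + stdev(class_b)
    if denom == 0:
        raise ValueError("both classes have zero variance")
    return (mean(class_a) - mean(class_b)) / denom


def one_vs_rest_snr(expr_by_class):
    """Score each class against the pooled samples of all other classes.

    Illustration only: a naive multiclass use of the two-class SNR,
    NOT the Gene Dominant/Dormant Index from the paper.
    """
    scores = {}
    for label, values in expr_by_class.items():
        rest = [
            v
            for other, vals in expr_by_class.items()
            if other != label
            for v in vals
        ]
        scores[label] = snr(values, rest)
    return scores


# Hypothetical expression values of one gene in three tumor classes.
gene = {
    "A": [9.0, 10.0, 11.0],
    "B": [1.0, 2.0, 3.0],
    "C": [2.0, 3.0, 4.0],
}
scores = one_vs_rest_snr(gene)
best_class = max(scores, key=scores.get)
print(best_class, scores)
```

In this toy input the gene is high in class A and low elsewhere, so class A receives the only positive score. Pooling the remaining classes is the simplest possible choice and can hide differences among them, which is one reason a dedicated multiclass index may be preferable.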
