Venue: IEEE International Conference on Bioinformatics and Biomedicine

Predictive Analytics on Genomic Data with High-Performance Computing

Abstract

Recent technological advancements and scientific discoveries have revolutionized genomics. Next-generation sequencing (NGS) technologies have greatly reduced sequencing time and given rise to the production and collection of high volumes of genomic data. Predicting protein-coding genes from these copious datasets is important for protein synthesis and for understanding the regulatory function of the non-coding region. Methods have been developed to find protein-coding genes in the genomes of organisms. Nevertheless, the recent data explosion in genomics accentuates the need for more efficient gene prediction algorithms. In this paper, we explore predictive analytics on genomic data. In particular, we present a scalable naïve Bayes-based algorithm, deployed on an Apache Spark cluster, for efficient prediction of genes in the genomes of eukaryotic organisms. Evaluation results on human genome chromosome data from GRCh37 and GRCh38 show the effectiveness of our algorithm for predictive analytics on genomic data with high-performance computing: it achieves high sensitivity, specificity and accuracy.
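
The abstract does not give the algorithm's details, so the following is only a minimal single-machine sketch of the general idea: a multinomial naïve Bayes classifier over k-mer counts, plus the three reported metrics (sensitivity, specificity, accuracy). The k-mer features, the value K = 3, the labels "coding" and "noncoding", the toy sequences, and all function names are assumptions made for illustration and are not taken from the paper. It uses plain Python rather than Apache Spark; in a distributed setting, the per-sequence counting would plausibly become a map step and the per-label summation an aggregation step.

```python
import math
from collections import Counter
from itertools import product

K = 3  # k-mer length; an assumption for this sketch, not from the paper
VOCAB = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_counts(seq, k=K):
    """Count overlapping k-mers that consist only of A, C, G, T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def train(examples, alpha=1.0):
    """Fit multinomial naive Bayes with Laplace smoothing.

    examples: list of (sequence, label) pairs.
    Returns {label: (log_prior, {kmer: log_likelihood})}.
    """
    label_counts = Counter()
    kmer_totals = {}
    for seq, label in examples:
        label_counts[label] += 1
        kmer_totals.setdefault(label, Counter()).update(kmer_counts(seq))
    n = sum(label_counts.values())
    model = {}
    for label, totals in kmer_totals.items():
        denom = sum(totals.values()) + alpha * len(VOCAB)
        log_like = {km: math.log((totals[km] + alpha) / denom) for km in VOCAB}
        model[label] = (math.log(label_counts[label] / n), log_like)
    return model


def predict(model, seq):
    """Return the label with the highest posterior log-score."""
    counts = kmer_counts(seq)

    def score(label):
        log_prior, log_like = model[label]
        return log_prior + sum(c * log_like[km] for km, c in counts.items())

    return max(sorted(model), key=score)


def evaluate(y_true, y_pred, positive="coding"):
    """Compute sensitivity, specificity and accuracy for a binary task."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / len(y_true) if y_true else 0.0,
    }


# Toy, made-up training data for illustration only.
TOY_EXAMPLES = [
    ("ATGGCCGCCGCC", "coding"),
    ("ATGGCGGCCGCG", "coding"),
    ("ATATATATATAT", "noncoding"),
    ("TATATATTTATA", "noncoding"),
]
```

With the toy data, `predict(train(TOY_EXAMPLES), "GCCGCCGCC")` selects the label whose training sequences share the most k-mers with the query. Real gene prediction would need far richer features and training data than this sketch uses.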
