A Fast and Scalable Workflow for SNPs Detection in Genome Sequences Using Hadoop Map-Reduce

Abstract

Next generation sequencing (NGS) technologies produce a huge amount of biological data, which raises issues such as long processing times and large memory requirements. This research focuses on the detection of single nucleotide polymorphisms (SNPs) in genome sequences. Current SNP detection algorithms face several issues, e.g., computational overhead, accuracy, and memory requirements. In this research, we propose a fast and scalable workflow that integrates the Bowtie aligner with the Hadoop-based Heap SNP caller to improve SNP detection in genome sequences. The proposed workflow is validated on benchmark datasets obtained from publicly available web portals, e.g., NCBI and DDBJ DRA. Extensive experiments have been performed, and the results are compared with the Bowtie and BWA aligners in the alignment phase and with GATK, FaSD, SparkGA, Halvade, and Heap in the SNP calling phase. Analysis of the experimental results shows that the proposed workflow outperforms existing frameworks, e.g., GATK, FaSD, Heap integrated with the BWA and Bowtie aligners, SparkGA, and Halvade. On average, the proposed framework achieved a 22.46% higher F-score and a consistent accuracy of 99.80%; its mean accuracy is also 0.21% higher than that of the compared frameworks. In addition, SNP mining has been performed to identify specific regions in genome sequences. All the frameworks use the default memory management configuration, and the observations show that all workflows have approximately the same memory requirements. Future work will display the mined SNPs graphically for user-friendly interaction and will analyze and optimize the memory requirements.
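The abstract reports performance in terms of F-score and accuracy. As a minimal sketch (not the authors' evaluation code), assuming SNP calls and a benchmark truth set are each represented as hypothetical (chromosome, position, alternate base) tuples, the standard F-score, i.e. the harmonic mean of precision and recall, can be computed as follows. The names `evaluate_calls`, `CallMetrics`, and `Snp` are illustrative only.

```python
from typing import Iterable, NamedTuple, Set, Tuple

# Hypothetical representation of a called SNP (an assumption, not the paper's format):
# (chromosome, 1-based position, alternate base).
Snp = Tuple[str, int, str]


class CallMetrics(NamedTuple):
    precision: float
    recall: float
    f_score: float


def evaluate_calls(called: Iterable[Snp], truth: Iterable[Snp]) -> CallMetrics:
    """Compare SNP calls against a truth set using exact tuple matching."""
    called_set: Set[Snp] = set(called)
    truth_set: Set[Snp] = set(truth)

    tp = len(called_set & truth_set)  # called and present in the truth set
    fp = len(called_set - truth_set)  # called but absent from the truth set
    fn = len(truth_set - called_set)  # present in the truth set but missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-score (F1): harmonic mean of precision and recall.
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return CallMetrics(precision, recall, f_score)


if __name__ == "__main__":
    truth = [("chr1", 100, "A"), ("chr1", 250, "T"), ("chr2", 42, "G"), ("chr2", 99, "C")]
    called = [("chr1", 100, "A"), ("chr1", 250, "T"), ("chr2", 42, "C")]
    m = evaluate_calls(called, truth)
    print(f"precision={m.precision:.3f} recall={m.recall:.3f} F={m.f_score:.3f}")
```

With the toy data above, two calls match the truth set, one is a false positive, and two true SNPs are missed, so the script prints `precision=0.667 recall=0.500 F=0.571`. Exact tuple matching is a simplifying assumption; it ignores genotype and variant-representation differences.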

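Bibliographic Information

Journal: Genes
Authors: Muhammad Tahir; Muhammad Sardaraz
Year: 2020
Volume (issue): 11 (2)
Total pages: 23
Original format: PDF
Chinese Library Classification: Biochemical genetics; Biochemical pharmacology
Keywords: DNA; NGS; SNP; Hadoop; Map-Reduce; accuracy; execution time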

Similar Literature

1. Interspecies hybridization on DNA resequencing microarrays: efficiency of sequence recovery and accuracy of SNP detection in human, ape, and codfish mitochondrial DNA genomes sequenced on a human-specific MitoChip [J]. Sarah MC Flynn, Steven M Carr. BMC Genomics. 2007, issue 1.
2. GeneEvolve: a fast and memory efficient forward-time simulator of realistic whole-genome sequence and SNP data [J]. Rasool Tahmasbi, Matthew C. Keller. Bioinformatics. 2017, issue 2.
3. Fast and cost-effective single nucleotide polymorphism (SNP) detection in the absence of a reference genome using semideep next-generation Random Amplicon Sequencing (RAMseq) [J]. Bayerl Helmut, Kraus Robert H. S., Nowak Carsten, et al. Molecular Ecology Resources. 2018, issue 1.
4. ParSECH: Parallel Sequencing Error Correction with Hadoop for Large-Scale Genome Sequences [C]. Arghya Kusum Das, Shayan Shams, Sayan Goswami, et al. International Conference on Bioinformatics and Computational Biology. 2017.
5. Accelerating Hadoop Map-Reduce for small/intermediate data sizes using the Comet coordination framework [D]. Chaudhari, Shivangi. 2009.
6. A Deep-Sequencing Workflow for the Fast and Efficient Generation of High-Quality African Swine Fever Virus Whole-Genome Sequences [O]. Jan H. Forth, Leonie F. Forth, Jacqueline King, et al. 2019.
7. A Fast and Scalable Workflow for SNPs Detection in Genome Sequences Using Hadoop Map-Reduce [O]. Muhammad Tahir, Muhammad Sardaraz. 2020.
