Genome Sequence Analysis in Distributed Computing using Spark

Sagar Ap.; Pooja Mehta; Anuradha J.; B.K. Tripathy


Abstract

The integration of computer science with bioscience has led to the new field of computational biology, which has created an opportunity to speed up the analysis of biological data. DNA sequence analysis, especially finding the base pairs, helps in identifying the order of nucleotides present in all living beings; it also helps in forensics, for DNA profiling and parentage testing. Sequence analysis has been a challenging task in computational biology because of the large volumes of data and the computational resources it requires. Using a distributed file system with distributed computation of tasks can be one solution to this problem. In this paper, the authors use Spark, a query engine for large-scale data processing, to analyze DNA sequences and extract the base pairs, and they also try to improve base pair extraction with improved algorithms.
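The abstract does not describe the authors' algorithm, so the following is only a minimal sketch of the general idea of distributed counting: split a sequence into chunks, count nucleotides in each chunk independently (a map step), then merge the partial counts (a reduce step). It uses plain Python rather than Spark, and the function names, the chunk size, and the decision to report counts per complementary pair are all assumptions made for illustration.

```python
from collections import Counter
from functools import reduce

# Watson-Crick complements; any other symbol (for example N) is ignored.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def split_into_chunks(sequence, chunk_size):
    """Cut the sequence into fixed-size pieces, standing in for data partitions."""
    return [sequence[i:i + chunk_size] for i in range(0, len(sequence), chunk_size)]


def count_bases(chunk):
    """Map step: count the valid nucleotides in one chunk."""
    return Counter(base for base in chunk.upper() if base in COMPLEMENT)


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def base_pair_counts(sequence, chunk_size=4):
    """Count each base and label it with its complementary partner."""
    partials = map(count_bases, split_into_chunks(sequence, chunk_size))
    totals = reduce(merge_counts, partials, Counter())
    return {f"{base}-{COMPLEMENT[base]}": totals[base] for base in "ACGT"}


if __name__ == "__main__":
    print(base_pair_counts("ACGTTGCAAN", chunk_size=3))
```

Because each chunk is counted independently and the merge is associative, the same two steps could in principle be run across the partitions of a distributed dataset, which is the kind of workload the abstract attributes to Spark.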

Bibliographic record

Source: International Journal of Knowledge Discovery in Bioinformatics, 2015, Issue 2, pp. 30-42 (13 pages)
Authors: Sagar Ap.; Pooja Mehta; Anuradha J.; B.K. Tripathy
Author affiliation: VIT University, Vellore, India (all four authors)
Original format: PDF
Language: English
Keywords: Classification; Clustering; Computational Biology; Distributed Computing; Genome; In Memory Computing; Sequence Analysis; Spark
Date indexed: 2022-08-18 00:36:54

Similar literature

1. DeepVariant-on-Spark: Small-Scale Genome Analysis Using a Cloud-Based Computing Framework [J]. Po-Jung Huang, Jui-Huan Chang, Hou-Hsien Lin. Computational and mathematical methods in medicine. 2020, Issue 1
2. Encouraging open science, replicability of analysis and collaborative cloud computing for whole genome sequence analysis of complex traits [J]. Majarian Timothy, Manning Alisa K. Genetic epidemiology. 2018, Issue 7
3. AN EFFICIENT DISTRIBUTED BIOINFORMATICS COMPUTING SYSTEM FOR DNA SEQUENCE ANALYSIS ON ENCODING SYSTEM [J]. Mohammad Ibrahim Khan, Chotan Sheel. American Journal of Bioinformatics. 2013, Issue 2
4. A Cloud-Assisted Application over Apache Spark for Investigating Epigenetic Markers on DNA Genome Sequences [C]. Ning Yu, Bing Li, Yi Pan. 2016 IEEE International Conferences on Big Data and Cloud Computing, Social Computing and Networking, Sustainable Computing and Communication. 2016
5. Biological sequence analysis using Hadoop/MapReduce as a distributed computing model. [D]. Paudel, Roshan. 2012
6. DeepVariant-on-Spark: Small-Scale Genome Analysis Using a Cloud-Based Computing Framework [O]. Po-Jung Huang, Jui-Huan Chang, Hou-Hsien Lin. 2020
7. The Bioinformatics Bookshelf: Teach Yourself Computational Biology? Bioinformatics: The Machine Learning Approach By Pierre Baldi and Soren Brunak Cambridge, MA: MIT Press (1998). 351 pp. $40.00; Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins Edited by Andreas D. Baxevanis and B. F. Francis Ouellette New York: Wiley-Interscience (1998). 370 pp. $59.95; Guide to Human Genome Computing, Second Edition Edited by Martin J. Bishop San Diego, CA: Academic Press (1998). 306 pp. $69.95; Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids By Richard Durbin, Sean Eddy, Anders Krogh, and Graeme Mitchison Cambridge: Cambridge University Press (1998). 356 pp. $34.95; Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology By Dan Gusfield Cambridge: Cambridge University Press (1997). 534 pp. $59.95; Introduction to Computational Molecular Biology By Joao Setubal and Joao Meidanis Boston: PWS Publishing (1997). 296 pp. $61.95 [O]. Pickeral Oxana K, Boguski Mark S. 1999