Biological sequence analysis using Hadoop/MapReduce as a distributed computing model.

Abstract

Most biological (DNA, RNA, or protein) sequence analysis algorithms are complex and require extensive execution time and memory. Serial biological sequence processing algorithms do not use the computing power of present-day computers efficiently. Researchers have developed and tested many programming models for parallelizing and optimizing algorithms to reduce execution time and memory use.

MapReduce is a programming model based on functional programming in which users implement two functions, map and reduce. In general, map applies a function to each input record, and reduce aggregates the results of those applications. The MapReduce programming model is patented by Google. This research used the Hadoop implementation of MapReduce. Hadoop and the Hadoop Distributed File System are open-source implementations of MapReduce and the Google File System. The Hadoop framework automatically transforms map and reduce applications into map and reduce tasks.

All known biological sequences and their functional annotations are stored in biological databases. A newly determined biological sequence should be compared with every known corresponding biological sequence to detect potential structural or evolutionary relationships. From a computational point of view, a major challenge is to align the query sequence against a very large collection of biological sequences and to sort them by the score of their alignment with the query. The solution has to be fast and scalable.

The main goals of this thesis research are:
• To build a fully distributed Ubuntu Hadoop cluster of four nodes.
• To configure and test the Hadoop cluster on the LittleFe cluster computer.
• To measure the efficiency of the programs in terms of time and memory used.

The main achievements of this thesis research are:
• Conversion of the LittleFe cluster computer from the BCCD operating system to the Ubuntu operating system.
• Modification of two Hadoop examples, RandomTextWriter.java and SecondarySort.java, into the Hadoop programs MRGenerateDNA.java, which generates a large file of random DNA sequences, and MRSortDNA.java, which sorts DNA sequences, respectively.
• Demonstration that Hadoop is an efficient basis for developing new parallel algorithms for biological sequence processing under the MapReduce programming model.
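
The map and reduce roles described in the abstract can be illustrated with a minimal, single-process sketch in plain Python. This is not the thesis's Java code and does not use the Hadoop API; the names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical, and the grouping step only imitates the shuffle that a framework such as Hadoop performs between the two phases. The sketch counts duplicate DNA sequences and returns them in sorted key order.

```python
from collections import defaultdict
from itertools import chain


def map_fn(record):
    """Map step: emit (key, value) pairs for one input record (a DNA string)."""
    seq = record.strip().upper()
    if seq:  # skip empty lines
        yield seq, 1


def reduce_fn(key, values):
    """Reduce step: aggregate every value that was emitted for one key."""
    return key, sum(values)


def run_mapreduce(records, mapper, reducer):
    """Run mapper over all records, group by key, then reduce in key order."""
    # Shuffle (imitated): collect the mapped values under their keys.
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapper(r) for r in records):
        groups[key].append(value)
    # Reducing in sorted key order is what makes the final output sorted.
    return [reducer(key, groups[key]) for key in sorted(groups)]


if __name__ == "__main__":
    reads = ["GATTACA", "ACGT", "gattaca", "TTGA"]
    for seq, count in run_mapreduce(reads, map_fn, reduce_fn):
        print(seq, count)
```

In a real cluster the records would be split across nodes and the map and reduce calls would run as separate tasks; here everything runs in one process so that the data flow is easy to follow.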

Bibliographic record

Author: Paudel, Roshan
Author affiliation: Morgan State University
Degree-granting institution: Morgan State University
Subject: Biology, Bioinformatics; Computer Science
Degree: M.S.
Year: 2012
Pages: 87 p.
Total pages: 87
Original format: PDF
Language: English

Similar literature

1. G-Hadoop: MapReduce across distributed data centers for data-intensive computing [J]. Lizhe Wang, Jie Tao, Rajiv Ranjan. Future Generation Computer Systems, 2013, no. 3.
2. Dynamic contrast-enhanced computed tomography in metastatic nasopharyngeal carcinoma: reproducibility analysis and observer variability of the distributed parameter model [J]. Ng QS, Thng CH, Lim WT. Investigative Radiology, 2012, no. 1.
3. Genome Sequence Analysis in Distributed Computing using Spark [J]. Sagar Ap., Pooja Mehta, Anuradha J. International Journal of Knowledge Discovery in Bioinformatics, 2015, no. 2.
4. Performance evaluation and tuning for MapReduce computing in Hadoop distributed file system [C]. Kim Jongyeop, Kumar T K Ashwin, George K.M. IEEE International Conference on Industrial Informatics, 2015.
5. Distributed search of biological databases using Hadoop/MapReduce [D]. Fashola, Babatunde Olaide. 2015.
6. Experimental Analysis in Hadoop MapReduce: A Closer Look at Fault Detection and Recovery Techniques [O]. Muntadher Saadoon, Siti Hafizah Ab Hamid, Hazrina Sofian. 2021.
7. Analysis and Research of Distributed network Crawler based on Cloud Computing Hadoop Platform [O]. Hongsheng Xu, Ganglong Fan, Ke Li. 2018.
