CloudRS: An error correction algorithm of high-throughput sequencing data based on scalable framework


Abstract

Next-generation sequencing (NGS) technologies produce huge amounts of data. These data are unavoidably accompanied by sequencing errors, which constitute one of the major problems for downstream analyses. As the literature illustrates, error correction is a critical step for the success of NGS applications such as de novo genome assembly and DNA resequencing. However, it demands considerable computing time and memory. Designing an algorithm that improves data quality by efficiently using on-demand computing resources in the cloud is a challenge for biologists and computer scientists. In this study, we present an error-correction algorithm, called CloudRS, for correcting errors in NGS data. CloudRS aims to emulate the error-correction approach of ALLPATHS-LG on the Hadoop/MapReduce framework. It corrects sequencing errors conservatively to avoid introducing false corrections, e.g., when dealing with reads from repetitive regions. We also describe several probabilistic measures introduced into CloudRS to make the algorithm more efficient without sacrificing its effectiveness. Running times on up to 80 instances, each with 8 computing units, show satisfactory speedup. Comparisons with other error-correction programs show that CloudRS achieves a lower false positive rate on most evaluation benchmarks and higher sensitivity on the S. cerevisiae genome. We demonstrate that CloudRS significantly improves the quality of the resulting contigs on NGS de novo assembly benchmarks.
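The abstract gives no implementation detail, so the following is only a toy sketch of the general idea it names: group reads by shared k-mers in a map/shuffle/reduce pattern, then correct a base only when the evidence is clear. It runs in plain Python rather than on Hadoop, and every name and parameter here (`correct_reads`, `min_depth`, `min_ratio`, the tiny `k`) is a hypothetical choice for illustration, not taken from CloudRS or ALLPATHS-LG. The probabilistic measures mentioned in the abstract are not modeled.

```python
from collections import Counter, defaultdict

def map_phase(reads, k):
    """Map: emit (k-mer, (read_id, offset)) for every k-mer in every read."""
    for read_id, seq in enumerate(reads):
        for offset in range(len(seq) - k + 1):
            yield seq[offset:offset + k], (read_id, offset)

def shuffle(pairs):
    """Shuffle: group emitted values by key, as a MapReduce framework would."""
    stacks = defaultdict(list)
    for kmer, value in pairs:
        stacks[kmer].append(value)
    return stacks

def reduce_phase(stacks, reads, k, min_depth, min_ratio):
    """Reduce: align each stack on its shared k-mer and vote column by column."""
    votes = defaultdict(Counter)  # (read_id, position) -> suggested bases
    for members in stacks.values():
        if len(members) < min_depth:
            continue  # too few reads to trust this stack
        columns = defaultdict(list)  # column relative to k-mer start
        for read_id, offset in members:
            for pos, base in enumerate(reads[read_id]):
                columns[pos - offset].append((read_id, pos, base))
        for rel, entries in columns.items():
            if 0 <= rel < k or len(entries) < min_depth:
                continue  # bases inside the shared k-mer agree by construction
            counts = Counter(base for _, _, base in entries)
            consensus, top = counts.most_common(1)[0]
            if top / len(entries) < min_ratio:
                continue  # ambiguous column (e.g. a repeat): stay conservative
            for read_id, pos, base in entries:
                if base != consensus:
                    votes[(read_id, pos)][consensus] += 1
    return votes

def apply_corrections(reads, votes):
    """Apply a correction only if all stacks suggested the same base."""
    corrected = [list(seq) for seq in reads]
    for (read_id, pos), counter in votes.items():
        if len(counter) == 1:
            corrected[read_id][pos] = next(iter(counter))
    return ["".join(seq) for seq in corrected]

def correct_reads(reads, k=4, min_depth=3, min_ratio=0.8):
    stacks = shuffle(map_phase(reads, k))
    votes = reduce_phase(stacks, reads, k, min_depth, min_ratio)
    return apply_corrections(reads, votes)
```

With five copies of a read and one copy carrying a single substitution, the odd base is outvoted and fixed. With an even three-to-three split, no column reaches `min_ratio` and the reads are left untouched, which is the conservative behavior the abstract describes for repetitive regions.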


Bibliographic details

Source: 2013 IEEE International Conference on Big Data, 2013, pp. 717-722 (6 pages)
Conference location: Santa Clara, CA (US)
Authors: Chen Chien-Chih; Chang Yu-Jung; Chung Wei-Chun; Lee Der-Tsai
Affiliation: Institute of Information Science Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan, ROC
Original format: PDF
Language: English
Keywords: error correction; genome assembly; MapReduce; next-generation sequencing


