A crowdsourcing method for correcting sequencing errors for the third-generation sequencing data

Abstract

Third-generation sequencing data offer a great advantage in read length, which greatly benefits genomic analyses. However, third-generation data follow error models different from those of second-generation data. Correcting sequencing errors is therefore recommended, as it can significantly reduce false positives in downstream analyses. Existing error correction approaches often lose accuracy when the hybrid reads are diverse or the coverage varies. In this paper, we propose a novel method based on a crowdsourcing strategy, implemented as CLTC. CLTC is also a hybrid correction algorithm and consists of four steps. First, the second-generation reads are collected and mapped to the third-generation reads. Then, a base difficulty level is defined to describe the diversity at a base among the group of second-generation reads covering it. Next, a capability is evaluated for each second-generation read, taking into account the base difficulty levels across the read, the consistency among overlapping reads, and the mapping quality between the second- and third-generation reads; a heuristic algorithm is designed to calculate the capabilities. Finally, an expectation-maximization algorithm computes the corrected result for each base pair. We test CLTC on different datasets and compare it with existing approaches. The results demonstrate that CLTC achieves higher accuracy and runs faster than the existing ones.
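The crowdsourcing idea in the abstract can be illustrated with a small sketch. This is not the CLTC implementation: it is a simplified, Dawid-Skene-style expectation-maximization loop in which each short read acts as a "worker" voting on the base at each position. The function names `base_difficulty` and `em_consensus`, the use of normalized Shannon entropy as the difficulty measure, the single per-read accuracy value, and the toy `votes` data are all assumptions made for illustration. Mapping quality and the heuristic capability calculation mentioned in the abstract are left out.

```python
import math
from collections import Counter

BASES = "ACGT"  # assumption: every vote is one of these four bases


def base_difficulty(column):
    """Normalized Shannon entropy of the bases voted at one position.

    0.0 means all reads agree; 1.0 means the votes are split evenly
    over the four bases. Entropy is an illustrative choice here.
    """
    counts = Counter(column)
    total = sum(counts.values())
    if total == 0 or len(counts) == 1:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(len(BASES))


def em_consensus(votes, n_iter=20):
    """Estimate a consensus base per position and an accuracy per read.

    votes maps position -> list of (read_id, base) pairs.
    Returns (consensus, accuracy) as two dicts.
    """
    read_ids = {r for col in votes.values() for r, _ in col}
    accuracy = {r: 0.8 for r in read_ids}  # arbitrary starting guess
    posterior = {}
    for _ in range(n_iter):
        # E-step: posterior over the true base at each position,
        # given the current per-read accuracies.
        for pos, col in votes.items():
            logp = {b: 0.0 for b in BASES}
            for r, called in col:
                a = accuracy[r]
                for b in BASES:
                    logp[b] += math.log(a if called == b else (1 - a) / 3)
            top = max(logp.values())
            weights = {b: math.exp(v - top) for b, v in logp.items()}
            z = sum(weights.values())
            posterior[pos] = {b: weights[b] / z for b in BASES}
        # M-step: a read's accuracy is its expected agreement with the
        # inferred truth, smoothed so it stays strictly inside (0, 1).
        agree, seen = Counter(), Counter()
        for pos, col in votes.items():
            for r, called in col:
                agree[r] += posterior[pos][called]
                seen[r] += 1
        for r in read_ids:
            accuracy[r] = (agree[r] + 1) / (seen[r] + 2)
    consensus = {pos: max(p, key=p.get) for pos, p in posterior.items()}
    return consensus, accuracy


# Toy input: r3 disagrees with the other reads at positions 0 and 2,
# and position 3 is covered only by r1 and r3.
votes = {
    0: [("r1", "A"), ("r2", "A"), ("r3", "C")],
    1: [("r1", "C"), ("r2", "C"), ("r3", "C")],
    2: [("r1", "G"), ("r2", "G"), ("r3", "T")],
    3: [("r1", "T"), ("r3", "A")],
}
consensus, accuracy = em_consensus(votes)
```

In this toy input, position 3 is a tie under simple majority voting. The EM loop resolves it by down-weighting `r3`, which disagreed with the other reads elsewhere. That is the intuition behind weighting votes by each read's estimated reliability.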


Bibliographic record

Source
IEEE International Conference on Bioinformatics and Biomedicine | 2017 | p. 769 | 8 pages
Authors
Yu Geng; Zhongmeng Zhao; Zhaofang Du; Yixuan Wang; Tian Zheng; Siyu He; Xuanping Zhang; Jiayin Wang
Original format: PDF
Chinese Library Classification: Q81-53
Keywords
Sequential analysis; Genomics; Crowdsourcing; Algorithm design and analysis; Bioinformatics; Error correction; Mathematical model


Similar literature


1. HECTOR: a parallel multistage homopolymer spectrum based error corrector for 454 sequencing data [J]. Adrianto Wirawan, Robert S Harris, Yongchao Liu, BMC Bioinformatics. 2014, No. 1
2. Denoising DNA deep sequencing data-high-throughput sequencing errors and their correction [J]. Laehnemann David, Borkhardt Arndt, McHardy Alice Carolyn, Briefings in Bioinformatics. 2016, No. 1
3. Empirical assessment of sequencing errors for high throughput pyrosequencing data [J]. Paulo GS da Fonseca, Jorge AP Paiva, Luiz GP Almeida, BMC Research Notes. 2013, No. S1
4. A crowdsourcing method for correcting sequencing errors for the third-generation sequencing data [C]. Yu Geng, Zhongmeng Zhao, Zhaofang Du, IEEE International Conference on Bioinformatics and Biomedicine. 2017
5. Discovering Rare Hematopoietic Clones Harboring Leukemia-Associated Mutations Using Error-Corrected Sequencing [D]. Young, Andrew Lee. 2018
6. HECTOR: a parallel multistage homopolymer spectrum based error corrector for 454 sequencing data [O]. Adrianto Wirawan, Robert S Harris, Yongchao Liu, 2014
7. Iterative algorithm for correcting sequencing errors in DNA coding regions [R]. Xu, Y., Mural, R. J., Uberbacher, E. C. 1995
