IEEE International Conference on Bioinformatics and Biomedicine

A crowdsourcing method for correcting sequencing errors for the third-generation sequencing data

Abstract

Third-generation sequencing data offer a major advantage in read length, which greatly benefits genomic analyses. However, third-generation data exhibit error profiles different from those of second-generation data, and correcting these sequencing errors is recommended because it can significantly reduce false positives in downstream analyses. Existing error-correction approaches often lose accuracy when the hybrid reads are diverse or when coverage varies. In this paper, we propose a novel method based on a crowdsourcing strategy, implemented as CLTC. CLTC is also a hybrid correction algorithm (it uses second-generation reads to correct third-generation reads) and consists of four steps. First, second-generation reads are collected and mapped to the third-generation reads. Second, a base difficulty level is defined to describe the diversity at a base among the group of second-generation reads covering it. Third, a capability is evaluated for each second-generation read, taking into account the base difficulty levels across the read, the consistency among overlapping reads, and the mapping quality between the second- and third-generation reads; a heuristic algorithm is designed to calculate these capabilities. Finally, an expectation-maximization (EM) algorithm computes the corrected result for each base pair. We test CLTC on different datasets and compare it with existing approaches. The results demonstrate that CLTC achieves higher accuracy and runs faster than the existing approaches.
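The abstract names CLTC's ingredients (per-base difficulty, per-read capability, EM) but not their formulas, so the following is only a minimal sketch of the general crowdsourcing idea under stated assumptions, not CLTC's actual method. Each second-generation read is treated as a "worker" that votes on the bases of one third-generation read, and step 1 (mapping) is assumed to be already done. Everything else is a hypothetical choice made for illustration: the helper names `base_difficulty`, `mapq_to_capability` and `correct_bases`; difficulty as normalized Shannon entropy; initial capability derived from the Phred-scaled mapping quality; the `1 - 0.5 * difficulty` weight; and a simple one-coin EM model whose M-step stands in for CLTC's unspecified heuristic. The sketch handles substitutions only and ignores insertions and deletions, which are common in long-read data.

```python
import math
from collections import Counter

BASES = "ACGT"


def base_difficulty(labels):
    """Hypothetical difficulty: normalized Shannon entropy (0 = unanimous, 1 = uniform over ACGT)."""
    counts = Counter(labels)
    n = len(labels)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(len(BASES))


def mapq_to_capability(mapq):
    """Hypothetical initial capability from Phred-scaled mapping quality, clipped to [0.01, 0.99]."""
    return min(0.99, max(0.01, 1.0 - 10 ** (-mapq / 10)))


def correct_bases(observations, mapq, n_iter=20):
    """
    observations: {position: [(read_id, base), ...]}: short-read calls aligned to one long read.
    mapq: {read_id: mapping quality}; every read in observations must appear here.
    Returns ({position: corrected base}, {read_id: estimated capability}).
    """
    capability = {r: mapq_to_capability(q) for r, q in mapq.items()}
    difficulty = {pos: base_difficulty([b for _, b in obs]) for pos, obs in observations.items()}
    posterior = {}
    for _ in range(n_iter):
        # E-step: posterior over the true base at each position, given current capabilities.
        # One-coin model: a read reports the true base with probability q, else one of 3 others.
        for pos, obs in observations.items():
            log_p = {}
            for b in BASES:
                log_p[b] = sum(
                    math.log(capability[r]) if label == b else math.log((1 - capability[r]) / 3)
                    for r, label in obs
                )
            m = max(log_p.values())  # subtract the max for numerical stability
            z = sum(math.exp(v - m) for v in log_p.values())
            posterior[pos] = {b: math.exp(v - m) / z for b, v in log_p.items()}
        # M-step: capability = difficulty-weighted expected agreement with the inferred truth,
        # so that ambiguous (high-difficulty) positions count less toward a read's score.
        num, den = Counter(), Counter()
        for pos, obs in observations.items():
            w = 1.0 - 0.5 * difficulty[pos]
            for r, label in obs:
                num[r] += w * posterior[pos][label]
                den[r] += w
        for r in den:
            capability[r] = min(0.99, max(0.01, num[r] / den[r]))
    corrected = {pos: max(p, key=p.get) for pos, p in posterior.items()}
    return corrected, capability


# Toy example: read r3 has low mapping quality and disagrees at positions 0 and 2.
obs = {
    0: [("r1", "A"), ("r2", "A"), ("r3", "C")],
    1: [("r1", "G"), ("r2", "G"), ("r3", "G")],
    2: [("r1", "T"), ("r2", "T"), ("r3", "A")],
}
corrected, cap = correct_bases(obs, {"r1": 60, "r2": 40, "r3": 10})
print("".join(corrected[p] for p in sorted(corrected)))  # AGT
```

In this toy run the disagreeing read r3 ends up with a lower capability than r1 and r2, so its votes are down-weighted. This is the crowdsourcing intuition the abstract describes: reads are trusted according to an estimated capability rather than counted equally in a majority vote.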