Removing duplicate reads using graphics processing units

Abstract

Background: During library construction, the polymerase chain reaction (PCR) is used to enrich the DNA before sequencing. This process typically generates duplicate read sequences. Removing these artifacts is mandatory, because they can distort the interpretation of data in several analyses. Ideally, duplicate reads have identical nucleotide sequences. However, because of sequencing errors, duplicates may also be nearly identical, and removing nearly identical duplicates can require considerable computational effort. To address this challenge, we recently proposed a GPU method for removing identical and nearly identical duplicates generated on an Illumina platform. The method is based on prefix-suffix comparison: read sequences with identical prefixes are considered potential duplicates, and their suffixes are then compared to identify and remove those that are actually duplicated. Although the method removes duplicates efficiently, it has some limitations that need to be overcome. In particular, it cannot detect potential duplicates when prefixes are longer than 27 bases, and it does not support paired-end read libraries. Moreover, large clusters of potential duplicates are split into smaller ones to guarantee a reasonable computing time, and this heuristic may affect the accuracy of the analysis.

Results: In this work we propose GPU-DupRemoval, a new implementation of our method that can (i) cluster reads without constraints on the maximum prefix length, (ii) support both single-end and paired-end read libraries, and (iii) analyze large clusters of potential duplicates.

Conclusions: Thanks to the massive parallelization obtained by exploiting graphics cards, GPU-DupRemoval removes duplicate reads faster than other cutting-edge solutions, while outperforming most of them in terms of the number of duplicate reads removed.
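The prefix-suffix strategy summarized in the abstract can be illustrated with a small, sequential CPU sketch in Python. This is not GPU-DupRemoval's code. Several details are hypothetical choices made only for illustration: the names `remove_duplicates`, `split_record` and `suffixes_match`, the Hamming-distance threshold `max_mismatches`, keeping the first read of each near-identical group as its representative, and handling paired-end reads by using the pair of mate prefixes as the cluster key. Because the cluster key is an ordinary tuple of strings, the sketch places no upper bound on prefix length and never splits large clusters, in the spirit of points (i) and (iii). The parallel GPU execution that gives the published tool its speed is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def split_record(read: str, mate: Optional[str], prefix_len: int) -> Tuple[Tuple[str, ...], Tuple[str, ...]]:
    """Split a single-end read, or a read pair, into (prefix key, suffixes to compare).

    Hypothetical paired-end scheme: the key holds the prefix of each mate and the
    suffixes hold the remainder of each mate.
    """
    parts = (read,) if mate is None else (read, mate)
    prefix = tuple(p[:prefix_len] for p in parts)
    suffix = tuple(p[prefix_len:] for p in parts)
    return prefix, suffix


def suffixes_match(a: Tuple[str, ...], b: Tuple[str, ...], max_mismatches: int) -> bool:
    """True if corresponding suffixes have equal lengths and differ in at most
    max_mismatches positions in total (assumed near-duplicate criterion)."""
    if len(a) != len(b) or any(len(x) != len(y) for x, y in zip(a, b)):
        return False  # this sketch treats suffixes of different length as distinct
    mismatches = 0
    for x, y in zip(a, b):
        for c1, c2 in zip(x, y):
            if c1 != c2:
                mismatches += 1
                if mismatches > max_mismatches:
                    return False
    return True


def remove_duplicates(reads: Sequence[str], prefix_len: int, max_mismatches: int,
                      mates: Optional[Sequence[str]] = None) -> List[int]:
    """Return the sorted indices of reads (or read pairs) kept as non-duplicates."""
    if mates is not None and len(mates) != len(reads):
        raise ValueError("reads and mates must have the same length")

    # Step 1: cluster records with identical prefixes (potential duplicates).
    clusters: Dict[Tuple[str, ...], List[int]] = defaultdict(list)
    suffixes: List[Tuple[str, ...]] = []
    for i, read in enumerate(reads):
        mate = None if mates is None else mates[i]
        prefix, suffix = split_record(read, mate, prefix_len)
        clusters[prefix].append(i)
        suffixes.append(suffix)

    # Step 2: inside each cluster, compare suffixes. A record whose suffix is close
    # to an already kept representative is a duplicate; otherwise it becomes a new
    # representative (greedy, order-dependent choice: an assumption).
    kept: List[int] = []
    for members in clusters.values():
        representatives: List[int] = []
        for i in members:
            if not any(suffixes_match(suffixes[i], suffixes[r], max_mismatches)
                       for r in representatives):
                representatives.append(i)
        kept.extend(representatives)
    return sorted(kept)


if __name__ == "__main__":
    reads = ["ACGTACGTAA", "ACGTACGTAT", "ACGTTTTTTT", "GGGGACGTAA", "ACGTACGTAA"]
    # Read 1 differs from read 0 by one suffix base and read 4 is identical to read 0,
    # so both are dropped; reads 2 and 3 are kept.
    print(remove_duplicates(reads, prefix_len=4, max_mismatches=1))  # [0, 2, 3]
```

The suffix comparisons within each cluster are independent of other clusters, which makes clusters a natural unit of parallel work on a GPU. The sketch runs them one after another, so its cost grows quadratically with the size of a cluster in the worst case.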
