Finding optimal threshold for correction error reads in DNA assembling

Abstract

Background: DNA assembling is the problem of determining the nucleotide sequence of a genome from its substrings, called reads. In experiments, the reads may contain errors, which affect the performance of DNA assembly algorithms. Existing algorithms, e.g. ECINDEL and SRCorr, correct erroneous reads by considering the number of times each length-k substring of the reads appears in the input. They treat the length-k substrings that appear at least M times as correct and correct the erroneous reads based on these substrings. However, since the threshold M is chosen without any solid theoretical analysis, these algorithms cannot guarantee their error-correction performance.

Results: In this paper, we propose a method to calculate the probabilities of false positives and false negatives when determining whether a length-k substring is correct using threshold M. Based on this calculation, we find the optimal threshold M that minimizes the total errors (false positives plus false negatives). Experimental results on both real and simulated data showed that our calculation is correct and that we can reduce the total number of erroneous substrings by 77.6% and 65.1% compared with ECINDEL and SRCorr, respectively.

Conclusion: We introduced a method to calculate the probabilities of false positives and false negatives for length-k substrings under different thresholds. Based on this calculation, we found the optimal threshold that minimizes the total error (false positives plus false negatives).
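The thresholding idea described in the abstract can be illustrated with a minimal Python sketch. It counts every length-k substring in a set of reads, treats those seen at least M times as correct, and then sweeps M to find the value with the fewest false positives plus false negatives. This is not the paper's method: the paper derives the error probabilities analytically, whereas this sketch assumes the true genome is known and simply counts errors by brute force. The function names (`count_kmers`, `solid_kmers`, `best_threshold`) and the toy genome and reads are hypothetical, chosen only for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def solid_kmers(counts, m):
    """Return the k-mers that occur at least m times (treated as correct)."""
    return {kmer for kmer, c in counts.items() if c >= m}


def best_threshold(counts, true_kmers, max_m):
    """Try M = 1..max_m and return (M, fp, fn) with the smallest fp + fn.

    fp: k-mers accepted by the threshold that are not in the genome.
    fn: genome k-mers rejected by the threshold (including unseen ones).
    Assumes the true k-mer set is known, which only holds in simulation.
    """
    best = None
    for m in range(1, max_m + 1):
        accepted = solid_kmers(counts, m)
        fp = len(accepted - true_kmers)
        fn = len(true_kmers - accepted)
        if best is None or fp + fn < best[1] + best[2]:
            best = (m, fp, fn)
    return best


# Toy data (hypothetical): a short genome, four error-free reads,
# and one read carrying a single substitution error (ACGAACGG).
genome = "ACGTACGGTCA"
k = 4
true_kmers = {genome[i:i + k] for i in range(len(genome) - k + 1)}
reads = ["ACGTACGG", "GTACGGTC", "TACGGTCA", "ACGTACGG", "ACGAACGG"]

counts = count_kmers(reads, k)
m, fp, fn = best_threshold(counts, true_kmers, max_m=5)
print(f"best M = {m}, false positives = {fp}, false negatives = {fn}")
```

On this toy input the sweep selects M = 2: a threshold of 1 accepts the four erroneous substrings from the bad read, while a threshold of 3 or more rejects most of the genuine ones. Real data sets would need far deeper coverage, and without a known reference the error counts cannot be measured directly, which is the gap a theoretical calculation of the two error probabilities is meant to fill.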

Bibliographic record

Source
Asia-Pacific Bioinformatics Conference | 2009 | 8 pages
Authors
Francis YL Chin; Henry CM Leung; Wei-Lin Li; Siu-Ming Yiu
Original format: PDF
Chinese Library Classification: Q811.4-532

Similar literature

1. Finding optimal threshold for correction error reads in DNA assembling [J]. Francis YL Chin, Henry CM Leung, Wei-Lin Li. BMC Bioinformatics, 2009, Suppl. 1
2. Asymmetrical barcode adapter-assisted recovery of duplicate reads and error correction strategy to detect rare mutations in circulating tumor DNA [J]. Jinwoo Ahn, Byungjin Hwang, Ha Young Kim. Scientific Reports, 2017, no. 1
3. A hybrid and scalable error correction algorithm for indel and substitution errors of long reads [J]. Arghya Kusum Das, Sayan Goswami, Kisung Lee. BMC Genomics, 2019, no. S11
4. Finding optimal threshold for correction error reads in DNA assembling [C]. Francis YL Chin, Henry CM Leung, Wei-Lin Li. Asia-Pacific Bioinformatics Conference, 2009
5. Probabilistic insertion, deletion and substitution error correction using Markov inference in next generation sequencing reads [D]. Noroozi, Vahid. 2016
6. Finding optimal threshold for correction error reads in DNA assembling [O]. Francis YL Chin, Henry CM Leung, Wei-Lin Li. 2009
7. Finding optimal threshold for correction error reads in DNA assembling [O]. 2009
