Asia-Pacific Bioinformatics Conference

Finding optimal threshold for correction error reads in DNA assembling

Abstract

Background: DNA assembly is the problem of determining the nucleotide sequence of a genome from its substrings, called reads. In sequencing experiments, reads may contain errors, which degrade the performance of DNA assembly algorithms. Existing algorithms, e.g. ECINDEL and SRCorr, correct erroneous reads by counting how many times each length-k substring of the reads appears in the input. They treat length-k substrings that appear at least M times as correct and correct the erroneous reads based on these substrings. However, since the threshold M is chosen without any solid theoretical analysis, these algorithms cannot guarantee their error-correction performance.

Results: In this paper, we propose a method to calculate the probabilities of false positives and false negatives when a threshold M is used to decide whether a length-k substring is correct. Based on this calculation, we find the optimal threshold M, which minimizes the total error (false positives plus false negatives). Experimental results on both real and simulated data showed that our calculation is correct and that, compared with ECINDEL and SRCorr, we can reduce the total number of error substrings by 77.6% and 65.1%, respectively.

Conclusion: We introduced a method to calculate the probabilities of false positives and false negatives for length-k substrings under different thresholds. Based on this calculation, we found the optimal threshold, which minimizes the total error (false positives plus false negatives).
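The counting-and-threshold idea described in the Background can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: reads are plain uppercase strings over ACGT, and reverse complements are ignored. The names `count_kmers`, `split_by_threshold` and `weak_positions` are hypothetical. This is not the ECINDEL or SRCorr implementation, and the actual correction step (editing bases so that every k-mer of a read becomes solid) is left out.

```python
from collections import Counter
from typing import Iterable


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count how many times each length-k substring occurs across all reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def split_by_threshold(counts: Counter, m: int) -> tuple[set[str], set[str]]:
    """Call k-mers seen at least m times 'solid' (assumed correct); the rest 'weak'."""
    solid = {kmer for kmer, c in counts.items() if c >= m}
    weak = set(counts) - solid
    return solid, weak


def weak_positions(read: str, solid: set[str], k: int) -> list[int]:
    """Start offsets of the read's k-mers that are not solid, i.e. windows likely to hold an error."""
    return [i for i in range(len(read) - k + 1) if read[i:i + k] not in solid]


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGT", "ACGTTCGT"]  # toy reads; the third has A->T at offset 4
    counts = count_kmers(reads, k=4)
    solid, weak = split_by_threshold(counts, m=2)
    print(sorted(weak))                            # ['CGTT', 'GTTC', 'TCGT', 'TTCG']
    print(weak_positions("ACGTTCGT", solid, k=4))  # [1, 2, 3, 4]
```

A single substitution spoils every k-mer that overlaps it (here the four windows starting at offsets 1 to 4). This is why the choice of M matters. If M is set too low, such rare variants are accepted as correct. If it is set too high, genuine k-mers from low-coverage regions are flagged as errors.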
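The trade-off behind choosing M can be illustrated with a deliberately simplified model. This model is an assumption made here, not the derivation used in the paper. It assumes the following:

- N reads of length L start uniformly at random on a genome of length G.
- Sequencing errors are independent substitutions with per-base rate e.
- k-mer counts are Poisson-distributed.

Under these assumptions, a genuine k-mer is expected to occur λ_c = N(L−k+1)/G · (1−e)^k times. A particular single-substitution variant of it is expected to occur λ_e = N(L−k+1)/G · (1−e)^(k−1) · e/3 times. Raising M therefore trades false positives for false negatives. The sketch below scans M and keeps the value with the smallest expected total. All function names and parameter values are hypothetical.

```python
import math


def poisson_cdf(m: int, lam: float) -> float:
    """P(X <= m) for X ~ Poisson(lam); fine for moderate lam (exp(-lam) underflows past ~745)."""
    if m < 0:
        return 0.0
    term = math.exp(-lam)
    total = term
    for i in range(1, m + 1):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        total += term
    return min(total, 1.0)


def expected_errors(m, genome_len, n_reads, read_len, k, err_rate):
    """Expected (false positives, false negatives) when k-mers seen >= m times are called correct.

    Assumed model, not the paper's: uniform read starts, independent substitution
    errors at rate err_rate, Poisson k-mer counts, no repeats or reverse complements.
    """
    # Expected number of reads covering one k-mer start position.
    per_pos = n_reads * (read_len - k + 1) / genome_len
    lam_correct = per_pos * (1 - err_rate) ** k
    # One specific variant: k-1 correct bases plus one specific wrong base (1 of 3).
    lam_error = per_pos * (1 - err_rate) ** (k - 1) * err_rate / 3
    # False negative: a genuine k-mer observed fewer than m times.
    fn = genome_len * poisson_cdf(m - 1, lam_correct)
    # False positive: one of the 3k variants per position observed at least m times.
    fp = 3 * k * genome_len * (1.0 - poisson_cdf(m - 1, lam_error))
    return fp, fn


def optimal_threshold(genome_len, n_reads, read_len, k, err_rate, max_m=100):
    """Scan M = 1..max_m and return the one minimizing expected FP + FN."""
    best_m, best_total = 1, math.inf
    for m in range(1, max_m + 1):
        fp, fn = expected_errors(m, genome_len, n_reads, read_len, k, err_rate)
        if fp + fn < best_total:
            best_m, best_total = m, fp + fn
    return best_m


if __name__ == "__main__":
    # Hypothetical run: 1 Mbp genome, 200,000 reads of 100 bp (20x), k = 20, 1% errors.
    print(optimal_threshold(1_000_000, 200_000, 100, 20, 0.01))  # → 4
```

With these made-up parameters, the scan settles on M = 4. At M = 3, the expected false positives (about 860) still outweigh the false negatives. At M = 5, roughly 3,000 genuine k-mers fall below the threshold. Real data, with non-uniform coverage, repeats and indels, would need a richer model than this sketch.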