Repeat-aware modeling and correction of short read errors

Abstract

Background: High-throughput short read sequencing is revolutionizing genomics and systems biology research by enabling cost-effective deep coverage sequencing of genomes and transcriptomes. Error detection and correction are crucial to many short read sequencing applications including de novo genome sequencing, genome resequencing, and digital gene expression analysis. Short read error detection is typically carried out by counting the observed frequencies of k-mers in reads and validating those with frequencies exceeding a threshold. In case of genomes with high repeat content, an erroneous k-mer may be frequently observed if it has few nucleotide differences with valid k-mers with multiple occurrences in the genome. Error detection and correction were mostly applied to genomes with low repeat content and this remains a challenging problem for genomes with high repeat content.
Results: We develop a statistical model and a computational method for error detection and correction in the presence of genomic repeats. We propose a method to infer genomic frequencies of k-mers from their observed frequencies by analyzing the misread relationships among observed k-mers. We also propose a method to estimate the threshold useful for validating k-mers whose estimated genomic frequency exceeds the threshold. We demonstrate that superior error detection is achieved using these methods. Furthermore, we break away from the common assumption of uniformly distributed errors within a read, and provide a framework to model position-dependent error occurrence frequencies common to many short read platforms. Lastly, we achieve better error correction in genomes with high repeat content.
Availability: The software is implemented in C++ and is freely available under GNU GPL3 license and Boost Software V1.0 license at "http://aluru-sun.ece.iastate.edu/doku.php?id=redeem".
Conclusions: We introduce a statistical framework to model sequencing errors in next-generation reads, which led to promising results in detecting and correcting errors for genomes with high repeat content.
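
The conventional threshold approach described in the Background can be sketched roughly as follows. This is a minimal Python illustration, not the authors' C++ software: the function names (count_kmers, validate_kmers, hamming_neighbors), the toy reads, and the threshold values are all assumptions made for the example.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def validate_kmers(counts, threshold):
    """Treat k-mers observed more than `threshold` times as valid."""
    return {kmer for kmer, n in counts.items() if n > threshold}


def hamming_neighbors(kmer, counts, max_dist=1):
    """Return observed k-mers within `max_dist` substitutions of `kmer`."""
    return [
        other for other in counts
        if other != kmer
        and sum(a != b for a, b in zip(kmer, other)) <= max_dist
    ]


# Hypothetical toy data: 10 reads from a repeated region, plus 3 reads
# assumed to carry the same substitution error (A -> T at position 5).
reads = ["ACGTAC"] * 10 + ["ACGTTC"] * 3
counts = count_kmers(reads, k=4)

# A low threshold accepts the error-derived k-mers "CGTT" and "GTTC".
print(sorted(validate_kmers(counts, threshold=2)))
# A higher threshold rejects them, but would also reject valid
# low-copy k-mers in a real data set.
print(sorted(validate_kmers(counts, threshold=5)))
# "CGTT" differs by one nucleotide from the frequent k-mer "CGTA".
print(hamming_neighbors("CGTT", counts))
```

In this toy case the erroneous k-mers pass a fixed threshold simply because they are one substitution away from a high-copy k-mer, which is the failure mode the abstract attributes to repeat-rich genomes. The sketch stops at plain counting; it does not implement the paper's inference of genomic frequencies from misread relationships or its position-dependent error model.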

Bibliographic details

Source
Asia-Pacific Bioinformatics Conference | 2012 | 10 pages
Authors
Xiao Yang; Srinivas Aluru; Karin S Dorman
Original format: PDF
Chinese Library Classification: Q811.4-532
