Separating Significant Matches from Spurious Matches in DNA Sequences

HUGO DEVILLERS; SOPHIE SCHBATH

Abstract

Word matches are widely used to compare genomic sequences. Complete genome alignment methods often use matches as anchors for building their alignments, and various alignment-free approaches that characterize similarities between large sequences are based on word matches. Among the matches retrieved from the comparison of two genomic sequences, some may be spurious matches (SMs), that is, matches obtained by chance rather than through homology. The number of SMs depends on the minimal match length (ℓ) that must be set in the algorithm used to retrieve the matches. If ℓ is too small, many matches are recovered but most of them are SMs. Conversely, if ℓ is too large, fewer matches are retrieved and many shorter significant matches are certainly missed. To date, the choice of ℓ mostly relies on empirical threshold values rather than on robust statistical methods. To overcome this problem, we propose a statistical approach based on a mixture model of geometric distributions to characterize the distribution of the length of matches obtained from the comparison of two genomic sequences.
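The abstract does not spell out how the mixture is fitted or how ℓ is then chosen. As an illustration only, the sketch below fits a mixture of geometric distributions to a list of match lengths by expectation-maximization (EM) and reads off a candidate minimal match length. Everything here is an assumption, not the authors' procedure: `fit_geometric_mixture`, `length_threshold` and `sample_geometric` are hypothetical names; each component is a geometric law shifted to start at the shortest observed length (retrieved lengths are left-truncated, and the memoryless property keeps a truncated geometric law geometric); the component with the shortest mean is assumed to model spurious matches; and the cutoff rule (posterior probability of that component below `alpha`) is a simple stand-in.

```python
import math
import random
from collections import Counter


def sample_geometric(q, rng):
    """Draw X >= 0 with P(X = x) = (1 - q) * q**x by inverting the CDF."""
    u = 1.0 - rng.random()  # u lies in (0, 1], so log(u) is defined
    return math.floor(math.log(u) / math.log(q))


def _clamp(q):
    # Keep q strictly inside (0, 1) so that log(q) and log(1 - q) exist.
    return min(max(q, 1e-9), 1.0 - 1e-9)


def _log_terms(x, weights, qs):
    """Per-component log(w_j * P_j(x)) and their log-sum-exp."""
    logs = [math.log(w) + math.log(1.0 - q) + x * math.log(q)
            for w, q in zip(weights, qs)]
    top = max(logs)
    return logs, top + math.log(sum(math.exp(v - top) for v in logs))


def fit_geometric_mixture(lengths, n_components=2, n_iter=2000, tol=1e-10):
    """Fit a mixture of shifted geometric laws to match lengths by EM.

    Component j: P(L = k) = (1 - q_j) * q_j**(k - k0) for k >= k0, with
    k0 = min(lengths). Returns (k0, weights, qs), components sorted by
    increasing q so that component 0 has the shortest mean length.
    """
    k0 = min(lengths)
    counts = Counter(k - k0 for k in lengths)  # shifted length -> frequency
    n = len(lengths)
    mean_x = max(sum(x * c for x, c in counts.items()) / n, 1e-3)
    # Spread the initial component means around the empirical mean;
    # a geometric law on {0, 1, ...} with mean m has q = m / (1 + m).
    means = [mean_x * (2 * j + 1) / n_components for j in range(n_components)]
    qs = [_clamp(m / (1.0 + m)) for m in means]
    weights = [1.0 / n_components] * n_components
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: responsibilities, computed once per distinct length.
        resp = {}
        ll = 0.0
        for x, c in counts.items():
            logs, total = _log_terms(x, weights, qs)
            ll += c * total
            resp[x] = [math.exp(v - total) for v in logs]
        # M-step: weighted maximum-likelihood update for each component.
        for j in range(n_components):
            r_sum = max(sum(c * resp[x][j] for x, c in counts.items()), 1e-12)
            m_j = sum(c * resp[x][j] * x for x, c in counts.items()) / r_sum
            weights[j] = r_sum / n
            qs[j] = _clamp(m_j / (1.0 + m_j))
        if ll - prev_ll < tol * abs(ll):
            break  # log-likelihood has stopped improving
        prev_ll = ll
    order = sorted(range(n_components), key=lambda j: qs[j])
    return k0, [weights[j] for j in order], [qs[j] for j in order]


def length_threshold(k0, weights, qs, alpha=0.5, max_len=100_000):
    """Hypothetical cutoff: smallest length at which the posterior
    probability of component 0 (assumed spurious) falls below alpha."""
    for k in range(k0, k0 + max_len):
        logs, total = _log_terms(k - k0, weights, qs)
        if math.exp(logs[0] - total) < alpha:
            return k
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic lengths: many short "chance" matches plus a longer tail.
    data = [20 + sample_geometric(0.5, rng) for _ in range(3600)]
    data += [20 + sample_geometric(0.95, rng) for _ in range(400)]
    k0, w, q = fit_geometric_mixture(data)
    print("k0 =", k0, "weights =", [round(v, 3) for v in w],
          "q =", [round(v, 3) for v in q])
    print("candidate minimal match length:", length_threshold(k0, w, q))
```

Grouping identical lengths with `Counter` makes each EM iteration scale with the number of distinct lengths rather than the number of matches, which matters when a genome comparison yields millions of matches. The synthetic data in the usage block are made up for demonstration and are not results from the paper.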


Bibliographic details

Source
Journal of computational biology: A journal of computational molecular cell biology | 2012, Issue 1 | 12 pages
Authors
HUGO DEVILLERS; SOPHIE SCHBATH
Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Biomathematical methods
Keywords
comparative genomics; match length; maximal exact matches; mixture model


Similar literature

1. Separating Significant Matches from Spurious Matches in DNA Sequences [J]. Hugo Devillers, Sophie Schbath. Journal of computational biology: A journal of computational molecular cell biology, 2012, Issue 1.
2. DNA polymorphism detector: an automated tool that searches for allelic matches in public databases for discrepancies found in clone or cDNA sequences [J]. Chang CY, LaBaer J. Bioinformatics, 2005, Issue 9.
3. Optimistically building a consensus sequence using F-inexact matches (DNA) [C]. Cull, P., Holloway, . 1991.
4. Utilizing a Ranking Machine Learning Algorithm on Retention Time Results to Reduce False Positive Matches in Untargeted Metabolomics [D]. Ruby, Allan K. 2021.
5. Separating Significant Matches from Spurious Matches in DNA Sequences [O]. Hugo Devillers, Sophie Schbath.
6. Separating Significant Matches from Spurious Matches in DNA Sequences [O]. Hugo Devillers, Sophie Schbath. 2012.
