Counting Patterns in Degenerated Sequences

Abstract

Biological sequences such as DNA or proteins are always obtained through a sequencing process that may introduce some uncertainty. As a result, such sequences are usually written in a degenerated alphabet in which some symbols may correspond to several possible letters (e.g., the IUPAC DNA alphabet). When counting patterns in such degenerated sequences, a question naturally arises: how should degenerated positions be handled? Since most positions (usually 99%) are not degenerated, it is considered harmless to discard the degenerated positions in order to get an observation, but the exact consequences of this practice are unclear. In this paper, we introduce a rigorous method to take into account the uncertainty of sequencing for biological sequences (DNA, proteins). We first introduce a Forward-Backward approach to compute the marginal distribution of the constrained sequence, and use it both to perform an Expectation-Maximization estimation of the parameters and to derive a heterogeneous Markov distribution for the constrained sequence. This distribution is then used along with known DFA-based pattern approaches to obtain the exact distribution of the pattern count under the constraints. As an illustration, we consider an EST dataset from the EMBL database. Although only 1% of the positions in this dataset are degenerated, we show that ignoring these positions might lead to erroneous observations, which further demonstrates the interest of our approach.
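
The Forward-Backward step mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes an order-1 homogeneous Markov chain over the letters A, C, G, T with known parameters (`init`, `trans`), and the names `constraint_masks` and `posterior_marginals` are hypothetical. Each IUPAC symbol restricts the letters allowed at its position, and the function returns, for every position, the distribution of the true letter given all constraints.

```python
import numpy as np

ALPHABET = "ACGT"
# IUPAC DNA codes: each symbol maps to the set of letters it allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

def constraint_masks(seq):
    """Return an (n, 4) 0/1 array: mask[i, a] = 1 if letter a is allowed at position i."""
    return np.array([[1.0 if a in IUPAC[s] else 0.0 for a in ALPHABET] for s in seq])

def posterior_marginals(seq, init, trans):
    """Forward-backward under an order-1 Markov chain restricted to the allowed letters.

    init  : length-4 initial distribution (assumed known here).
    trans : 4x4 row-stochastic transition matrix (assumed known here).
    Returns an (n, 4) array whose row i is P(X_i = a | all constraints).
    """
    mask = constraint_masks(seq)
    n = len(seq)
    fwd = np.zeros((n, 4))
    bwd = np.ones((n, 4))
    # Forward pass, rescaled at each step to avoid numerical underflow.
    fwd[0] = init * mask[0]
    fwd[0] /= fwd[0].sum()
    for i in range(1, n):
        fwd[i] = (fwd[i - 1] @ trans) * mask[i]
        fwd[i] /= fwd[i].sum()
    # Backward pass, also rescaled.
    for i in range(n - 2, -1, -1):
        bwd[i] = trans @ (mask[i + 1] * bwd[i + 1])
        bwd[i] /= bwd[i].sum()
    post = fwd * bwd
    return post / post.sum(axis=1, keepdims=True)

if __name__ == "__main__":
    uniform_init = np.full(4, 0.25)
    uniform_trans = np.full((4, 4), 0.25)
    # "R" allows A or G; the other positions are fully determined.
    print(posterior_marginals("ARG", uniform_init, uniform_trans))
```

In the setting the abstract describes, the parameters would themselves be estimated (by Expectation-Maximization) rather than supplied, and the resulting position-dependent distribution would feed a DFA-based pattern-counting computation; neither step is shown here.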


Bibliographic record

Source: Pattern Recognition in Bioinformatics | 2009 | pp. 222-232 | 11 pages
Conference location: Sheffield (GB)
Author: Gregory Nuel
Author affiliation: MAP5, CNRS 8145, University Paris Descartes, 45 rue des Saints-Pères, F-75006 Paris, France
Original format: PDF
Language of the text: English
Chinese Library Classification: Bioengineering (biotechnology)
Keywords: forward-backward algorithm; expectation-maximization algorithm; Markov chain embedding; deterministic finite state automaton


