On the Representability of Complete Genomes by Multiple Competing Finite-Context (Markov) Models

Abstract

A finite-context (Markov) model of order k yields the probability distribution of the next symbol in a sequence of symbols, given the recent past up to depth k. Markov modeling has long been applied to DNA sequences, for example to find gene-coding regions. With the first studies came the discovery that DNA sequences are non-stationary: distinct regions require distinct model orders. Since then, Markov and hidden Markov models have been used extensively to describe the gene structure of prokaryotes and eukaryotes. However, to our knowledge, a comprehensive study of the potential of Markov models to describe complete genomes is still lacking. We address this gap in this paper. Our approach relies on (i) multiple competing Markov models of different orders, (ii) careful programming techniques that allow orders as large as sixteen, (iii) adequate handling of inverted repeats, and (iv) probability estimates suited to the wide range of context depths used. To measure how well a model fits the data at a particular position in the sequence, we use the negative logarithm of the probability estimate at that position. This measure yields information profiles of the sequence, which are of independent interest. Its average over the entire sequence, which amounts to the average number of bits per base needed to describe the sequence, is used as a global performance measure. Our main conclusion is that, from the probabilistic or information-theoretic point of view and according to this performance measure, multiple competing Markov models explain entire genomes almost as well as, or even better than, state-of-the-art DNA compression methods such as XM, which rely on very different statistical models. This is surprising, because Markov models are local (short-range), in contrast with the statistical models underlying other methods, which exploit the extensive data repetitions in DNA sequences and therefore have a non-local character.
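
The modeling steps summarized in the abstract (order-k context counts, inverted-repeat updates, per-position information profiles, and a bits-per-base average over competing orders) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `information_profile` and `competing_bits_per_base` are hypothetical, the additive estimator with parameter `alpha` stands in for the paper's probability estimates, and choosing the best order per block (charging log2 of the number of models as side information) is only one plausible reading of "competing" models. The sketch expects an uppercase A/C/G/T string and keeps counts in a plain dictionary, so it does not reproduce the memory-saving techniques needed for orders as large as sixteen.

```python
import math
import random
from collections import defaultdict

ALPHABET = "ACGT"
INDEX = {s: j for j, s in enumerate(ALPHABET)}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def information_profile(seq, k, alpha=1.0, inverted_repeats=True):
    """Return -log2 P(x_i | x_{i-k}..x_{i-1}) for every position i of seq.

    Sketch assumptions (not taken from the paper): an additive estimator
    (n_s + alpha) / (n + 4 * alpha), and a flat 2 bits for the first k
    symbols, which have no full-length context yet.
    """
    counts = defaultdict(lambda: [0, 0, 0, 0])  # context -> counts of A, C, G, T
    profile = []
    for i, sym in enumerate(seq):
        if i < k:
            profile.append(2.0)  # log2(4): uniform guess without a full context
            continue
        c = counts[seq[i - k:i]]
        p = (c[INDEX[sym]] + alpha) / (sum(c) + 4 * alpha)
        profile.append(-math.log2(p))
        # Update only after coding, so each estimate depends on the past alone.
        c[INDEX[sym]] += 1
        if inverted_repeats:
            # On the reverse-complement strand, the k-mer revcomp(x_{i-k+1}..x_i)
            # is followed by the complement of x_{i-k}; count that event too.
            ir_ctx = "".join(COMPLEMENT[b] for b in reversed(seq[i - k + 1:i + 1]))
            counts[ir_ctx][INDEX[COMPLEMENT[seq[i - k]]]] += 1
    return profile


def competing_bits_per_base(seq, orders, block_size=1000, alpha=1.0):
    """Average bits per base when each block is described by its best order.

    Assumption: all models learn from the whole sequence; per block we keep
    the smallest code length and charge log2(len(orders)) bits to signal
    which model was chosen.
    """
    profiles = [information_profile(seq, k, alpha) for k in orders]
    side_info = math.log2(len(orders))
    total = 0.0
    for start in range(0, len(seq), block_size):
        total += side_info + min(sum(p[start:start + block_size]) for p in profiles)
    return total / len(seq)


if __name__ == "__main__":
    random.seed(0)
    unit = "".join(random.choice(ALPHABET) for _ in range(200))
    toy = unit * 20  # highly repetitive toy sequence, not real genome data
    for k in (2, 8, 12):
        prof = information_profile(toy, k)
        print(f"order {k:2d}: {sum(prof) / len(prof):.3f} bits/base")
    print(f"competing: {competing_bits_per_base(toy, [2, 8, 12], 500):.3f} bits/base")
```

On such a toy repeat, high orders become cheap once the repeated unit has been seen, while a low order such as 2 stays close to 2 bits per base; real genomes mix both regimes, which is the intuition behind letting several orders compete.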

机译：阶的有限上下文（Markov）模型会给出给定最近的过去深度，从而产生一系列符号中下一个符号的概率分布。马尔可夫建模长期以来一直应用于DNA序列，例如查找基因编码区。最初的研究带来了DNA序列不稳定的发现：不同的区域需要不同的模型顺序。从那时起，马尔可夫模型和隐马尔可夫模型被广泛用于描述原核生物和真核生物的基因结构。然而，据我们所知，仍然缺乏关于马尔可夫模型描述完整基因组潜力的全面研究。我们在本文中解决了这一空白。我们的方法依赖于（i）不同阶的多个竞争性Markov模型（ii）允许高达16阶的阶的谨慎编程技术（iii）足够的反向重复处理（iv）适用于所使用的广泛上下文深度的概率估计。为了衡量模型在序列中特定位置的数据拟合程度，我们使用该位置概率估计值的负对数。该度量产生序列的信息分布图，这些信息分布图是独立感兴趣的。整个序列的平均值（相当于描述序列所需的每个碱基的平均位数）用作全局性能度量。我们的主要结论是，从概率论或信息论的观点出发，根据这种性能指标，多个竞争性马尔可夫模型可以解释整个基因组，其结果几乎比最先进的DNA压缩方法（例如XM）更好甚至更好。，它们依赖于非常不同的统计模型。这是令人惊讶的，因为马尔可夫模型是局部的（短程），这与其他方法所基于的统计模型形成了鲜明的对比，在其他方法中，DNA序列中的大量数据重复被研究出来，因此具有非局部性。

Bibliographic record

Journal: PLoS Clinical Trials
Authors: Armando J. Pinho; Paulo J. S. G. Ferreira; António J. R. Neves; Carlos A. C. Bastos
Year (volume), issue: 2008 (6), 6
Year: 2008
Pages: e21588
Total pages: 7
Original format: PDF

Similar documents

1. Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure [J]. Gough J, Karplus K, Hughey R. Journal of Molecular Biology. 2001, no. 4.

2. A diarrheic chicken simultaneously co-infected with multiple picornaviruses: Complete genome analysis of avian picornaviruses representing up to six genera [J]. Boros Akos, Pankovics Peter, Adonyi Adam. Virology. 2016.

3. Large-scale multiple testing in genome-wide association studies via region-specific hidden Markov models [J]. Jian Xiao, Wensheng Zhu, Jianhua Guo. BMC Bioinformatics. 2013, no. 1.

4. DNA synthetic sequences generation using multiple competing Markov models [C]. Pratas Diogo, Bastos Carlos A. C., Pinho Armando J. 2011 IEEE Statistical Signal Processing Workshop. 2011.

5. Multi-genome annotation of genome fragments using hidden Markov model profiles [D]. Menor, Mark. 2007.

6. Large-scale multiple testing in genome-wide association studies via region-specific hidden Markov models [O]. Jian Xiao, Wensheng Zhu, Jianhua Guo. 2013.

7. On the Representability of Complete Genomes by Multiple Competing Finite-Context (Markov) Models [O]. Pinho, Armando J.; Ferreira, Paulo J. S. G.; Neves, António J. R. 2011.
