Complexity measures for the evolutionary categorization of organisms

Abstract

Complexity measures are used to compare the genomic characteristics of five organisms belonging to distinct classes spanning the evolutionary tree: higher eukaryotes, amoebae, unicellular eukaryotes and bacteria. The comparisons are undertaken using the full four-letter alphabet and the coarse-grained two-letter alphabets AG-CT and AT-CG. We show that the conditional probability matrix for the four-letter and AT-CG alphabets is markedly asymmetric in eukaryotes, while it is nearly symmetric in bacterial genomes. Spatial asymmetry is revealed in the four-letter alphabet, signifying that the probability fluxes are nonvanishing and thus that the reading sense of a sequence is irreversible for all organisms. Calculations of the block entropy and excess entropy demonstrate that the human genome better accommodates all possible block configurations, especially for long blocks. With respect to point-to-point details and to the spatial arrangement of blocks, the exit distance distributions from a particular letter demonstrate long-distance characteristics in the eukaryotic sequences for all three alphabets, while the bacterial (prokaryotic) genomes deviate, indicating short-range characteristics. Overall, the conditional probability, the fluxes, the block entropy content and the exit distance distributions can be used as markers that discriminate between eukaryotic and prokaryotic DNA and, in many cases, allow details related to finer classes to be discerned. In all cases the reduction from four letters to two masks some important statistical and spatial properties, with the AT-CG alphabet having a higher ability of discrimination than the AG-CT one. In particular, the AT-CG alphabet reduction accentuates the CpG-related properties (conditional probabilities W_32, long-ranged exit distance distribution for A and T nucleotides), but masks sequence asymmetry and irreversibility in all examined organisms.
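The abstract names its quantities without defining them, so the following Python sketch is an illustration under stated assumptions, not the authors' method. It assumes that the conditional probability matrix is estimated from adjacent-letter counts, that the probability flux between two letters is the difference between the frequencies of the pair read in the two directions, and that the block entropy is the Shannon entropy (in bits) of overlapping length-n blocks. The function names, the group labels W/S and R/Y, and the toy sequence are all hypothetical.

```python
from collections import Counter
from math import log2

# Assumed coarse-graining maps for the two-letter alphabets named in the abstract.
AT_CG = {"A": "W", "T": "W", "C": "S", "G": "S"}
AG_CT = {"A": "R", "G": "R", "C": "Y", "T": "Y"}


def coarse_grain(seq, groups):
    """Replace every nucleotide by the label of the group it belongs to."""
    return "".join(groups[ch] for ch in seq)


def conditional_matrix(seq, alphabet):
    """W[a][b]: estimated probability that letter b immediately follows letter a."""
    pair_counts = Counter(zip(seq, seq[1:]))
    W = {}
    for a in alphabet:
        total = sum(pair_counts[(a, b)] for b in alphabet)
        W[a] = {b: (pair_counts[(a, b)] / total if total else 0.0)
                for b in alphabet}
    return W


def probability_fluxes(seq, alphabet):
    """J[(a, b)] = P(ab) - P(ba); nonzero values signal a preferred reading sense."""
    pair_counts = Counter(zip(seq, seq[1:]))
    n_pairs = max(len(seq) - 1, 1)  # guard against sequences shorter than 2
    return {(a, b): (pair_counts[(a, b)] - pair_counts[(b, a)]) / n_pairs
            for a in alphabet for b in alphabet}


def block_entropy(seq, n):
    """Shannon entropy (bits) of the overlapping length-n blocks of seq."""
    blocks = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(blocks.values())
    return -sum(c / total * log2(c / total) for c in blocks.values())


# Toy input for illustration only; this is not real genomic data.
toy = "ACGTACGTACGT"
W4 = conditional_matrix(toy, "ACGT")
J4 = probability_fluxes(toy, "ACGT")
J2 = probability_fluxes(coarse_grain(toy, AT_CG), "WS")
H1 = block_entropy(toy, 1)
```

Under these definitions, the pair-level flux of any two-letter sequence is bounded by 1/(N-1), because the counts of the pairs ab and ba can differ by at most one. This may be one reason the reduced alphabets mask the irreversibility that the four-letter alphabet reveals, although the paper's own definitions could differ.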

Bibliographic record

  • Source
    Computational Biology and Chemistry | 2014, issue ptaa | pp. 5-14 | 10 pages
  • Author affiliations

    Institute of Nanoscience and Nanotechnology, National Center for Scientific Research 'Demokritos', 15310 Athens, Greece;

    Institut Royal Météorologique de Belgique, 3 Avenue Circulaire, 1180 Bruxelles, Belgium;

    Interdisciplinary Center for Nonlinear Phenomena and Complex Systems, Université Libre de Bruxelles, Campus Plaine, C.P. 231, 1050 Bruxelles, Belgium;

  • Original format: PDF
  • Language: English
  • Keywords

    Genomic sequences; Irreversibility; Probability fluxes; Block entropy
