Molecular Biology and Evolution

A Comprehensive Vertebrate Phylogeny Using Vector Representations of Protein Sequences from Whole Genomes



Abstract

We recently developed a method for producing comprehensive gene and species phylogenies from unaligned whole genome data, using singular value decomposition (SVD) to analyze character string frequencies. This work provides an integrated gene and species phylogeny for 64 vertebrate mitochondrial genomes comprising 832 proteins in total. In addition, to provide a theoretical basis for the method, we present a graphical interpretation of both the original frequency matrix and the SVD-derived matrix. These large matrices describe high-dimensional Euclidean spaces within which biomolecular sequences can be uniquely represented as vectors. In particular, the SVD-derived vector space describes each protein relative to a restricted set of newly defined, independent axes, each of which represents a novel form of conserved motif, termed a correlated peptide motif. A quantitative comparison of the relative orientations of protein vectors in this space provides accurate and straightforward estimates of sequence similarity. The protein vectors from individual species can also be summed, allowing species trees to be produced.
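The pipeline the abstract describes (peptide frequency matrix, SVD, comparison of vector orientations, summation per species) can be illustrated with a minimal sketch. This is not the authors' implementation: the peptide length `k`, the retained `rank`, the function names, the species labels, and the toy sequences are all assumptions made up for illustration, and cosine similarity is used here as one plausible measure of "relative orientation".

```python
from itertools import product

import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def frequency_matrix(seqs, k=2):
    """Rows are k-peptides, columns are proteins, entries are overlapping counts."""
    index = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}
    A = np.zeros((len(index), len(seqs)))
    for j, seq in enumerate(seqs):
        for start in range(len(seq) - k + 1):
            row = index.get(seq[start:start + k])
            if row is not None:  # skip peptides containing nonstandard residues
                A[row, j] += 1
    return A


def protein_vectors(A, rank):
    """Truncated SVD: one row per protein, expressed on `rank` singular axes."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Columns of U are the new axes (weighted peptide combinations);
    # scaling V by the singular values gives each protein's coordinates.
    return Vt[:rank].T * s[:rank]


def cosine(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Invented toy data (hypothetical species labels and sequences, not real genomes).
toy = [
    ("sp1", "MTNIRKSHPLMKIINNAFIDLPAPSNISSWWNFGSLLG"),
    ("sp1", "MAHAAQVGLQDATSPIMEELITFHDHALMIIFLICFLV"),
    ("sp2", "MTNIRKTHPLMKIVNNAFIDLPTPSNISAWWNFGSLLG"),
    ("sp2", "MAHPAQLGFQDAASPIMEELLHFHDHTLMIVFLISSLV"),
]
labels = [sp for sp, _ in toy]
seqs = [seq for _, seq in toy]

A = frequency_matrix(seqs, k=2)      # 400 dipeptides x 4 proteins
vecs = protein_vectors(A, rank=3)    # 4 proteins x 3 retained axes

# Gene-level comparison: orientation of two protein vectors.
print("protein 0 vs protein 2:", cosine(vecs[0], vecs[2]))

# Species-level comparison: sum each species' protein vectors, then compare.
species_vecs = {
    sp: vecs[[i for i, lab in enumerate(labels) if lab == sp]].sum(axis=0)
    for sp in sorted(set(labels))
}
print("sp1 vs sp2:", cosine(species_vecs["sp1"], species_vecs["sp2"]))
```

A pairwise matrix of such similarities (or a distance derived from it) could then be passed to any standard tree-building routine. The choice of `k` and `rank` strongly affects the result, so the values above should be read as placeholders.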
