Journal: Molecular Phylogenetics and Evolution

Clustering DNA sequences by feature vectors

Abstract

We represent every DNA sequence as a point in twelve-dimensional space in such a way that homologous DNA sequences cluster together. This creates a new genomic space for the global comparison of DNA sequences, covering millions of genes simultaneously. More specifically, a twelve-dimensional vector is assigned to any DNA sequence based on the contents of the four nucleotides, their distances from the origin, and their distribution along the sequence. The applicability of this analysis to the global comparison of gene structures was tested on the myoglobin, beta-globin, histone-4, lysozyme, and rhodopsin families. Members of the same family exhibit smaller vector distances than members of different families. The vector distance also distinguishes random sequences generated from the same base composition. Sequence comparisons were consistent with the BLAST method. Once a new gene is discovered, we can compute its location in our genomic space. It is natural to predict that the properties of this new gene are similar to those of known genes located nearby. Biologists can then run experiments to test these predictions. (c) 2006 Elsevier Inc. All rights reserved.
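
The abstract names the three ingredients of the vector (nucleotide contents, distances from the origin, distribution along the sequence) but not the formulas. The Python sketch below is therefore only one plausible reading, not the authors' definition. It assumes that, for each of A, C, G, and T, the three components are the count, the mean 1-based position measured from the start of the sequence, and the population variance of those positions. The names `feature_vector` and `vector_distance` are hypothetical, and the Euclidean metric is also an assumption.

```python
from math import dist

BASES = "ACGT"  # fixed nucleotide order used to lay out the 12 components


def feature_vector(seq: str) -> list[float]:
    """Map a DNA sequence to 12 numbers: (count, mean position, variance) per base.

    Assumed reading of the abstract, not the paper's exact formulas:
      count    -> "contents of the four nucleotides"
      mean pos -> "distances from the origin" (1-based, from the sequence start)
      variance -> "distribution along the sequence"
    """
    seq = seq.upper()
    vec: list[float] = []
    for base in BASES:
        positions = [i for i, ch in enumerate(seq, start=1) if ch == base]
        n = len(positions)
        if n == 0:
            # A base that never occurs contributes zeros (an assumption).
            vec.extend([0.0, 0.0, 0.0])
            continue
        mean_pos = sum(positions) / n
        variance = sum((p - mean_pos) ** 2 for p in positions) / n
        vec.extend([float(n), mean_pos, variance])
    return vec


def vector_distance(u: list[float], v: list[float]) -> float:
    """Euclidean distance between two feature vectors (assumed metric)."""
    return dist(u, v)


if __name__ == "__main__":
    a = feature_vector("ACGT")
    b = feature_vector("TGCA")  # same base composition, different order
    print(a)
    print(vector_distance(a, b))
```

In this sketch, `ACGT` and `TGCA` have identical nucleotide counts but different mean positions, so their distance is nonzero. This mirrors the abstract's remark that the vector distance separates sequences sharing the same base composition.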
