Proceedings of the National Academy of Sciences of the United States of America

Alignment-free genome comparison with feature frequency profiles (FFP) and optimal resolutions

Abstract

For comparison of whole-genome (genic + nongenic) sequences, multiple sequence alignment of a few selected genes is not appropriate. One approach is to use an alignment-free method in which feature (or l-mer) frequency profiles (FFP) of whole genomes are used for comparison, a variation of a text or book comparison method that uses word frequency profiles. In this approach it is critical to identify the optimal resolution range of l-mers for the given set of genomes being compared. The optimized FFP method is applicable for comparing whole genomes or large genomic regions even when there are no common genes with high homology. We outline the method in 3 stages: (i) We first show how the optimal resolution range can be determined with English books that have been transformed into long character strings by removing all punctuation and spaces. (ii) Next, we test the robustness of the optimized FFP method at the nucleotide level, using a mutation model with a wide range of base substitutions and rearrangements. (iii) Finally, to illustrate the utility of the method, phylogenies are reconstructed from concatenated mammalian intronic genomes; the FFP-derived intronic genome topologies for each l within the optimal range are all very similar. The topology agrees with the established mammalian phylogeny, revealing that intron regions contain a level of phylogenetic signal similar to that of coding regions.
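
The core idea in the abstract, building an l-mer frequency profile for each sequence and comparing the profiles, can be illustrated with a minimal Python sketch. The function names (`strip_text`, `ffp`, `js_divergence`) are hypothetical and are not taken from the authors' software. The abstract does not name a distance measure, so the Jensen-Shannon divergence used here is an assumption chosen for illustration. The sketch also does not try to find the optimal range of l, which the abstract identifies as the critical step.

```python
from collections import Counter
from math import log2


def strip_text(text):
    """Uppercase the text and keep letters only, mimicking the removal
    of punctuation and spaces described for the English-book test."""
    return "".join(ch for ch in text.upper() if ch.isalpha())


def ffp(seq, l):
    """Return the relative frequency of every overlapping l-mer in seq."""
    if l < 1 or len(seq) < l:
        raise ValueError("sequence must contain at least one l-mer")
    counts = Counter(seq[i:i + l] for i in range(len(seq) - l + 1))
    total = sum(counts.values())
    return {lmer: n / total for lmer, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two sparse profiles.

    Assumption: the abstract does not specify the distance measure;
    this one is used here only as an example of comparing profiles.
    """
    div = 0.0
    for key in set(p) | set(q):
        pi, qi = p.get(key, 0.0), q.get(key, 0.0)
        mi = 0.5 * (pi + qi)  # mixture of the two profiles
        if pi > 0:
            div += 0.5 * pi * log2(pi / mi)
        if qi > 0:
            div += 0.5 * qi * log2(qi / mi)
    return div


if __name__ == "__main__":
    # Toy inputs, invented for this example only.
    a = strip_text("The quick brown fox jumps over the lazy dog.")
    b = strip_text("A quick brown dog jumps over the lazy fox!")
    for l in (1, 2, 3):
        print(l, js_divergence(ffp(a, l), ffp(b, l)))
```

With base-2 logarithms the divergence lies between 0 (identical profiles) and 1 (profiles with no l-mer in common), so values for different l can be compared on the same scale. Real genomes would call for a more memory-efficient l-mer counter than a plain dictionary.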