Journal: Scientific Reports

Viral Phylogenomics Using an Alignment-Free Method: A Three-Step Approach to Determine Optimal Length of k-mer


Abstract

2 million viral genome sequences and a reference set of ~4000 viral genome sequences that cover a wide range of known viral families. Whole-genome sequences can be used to improve viral classification and provide insight into the viral “tree of life”. However, due to the lack of evolutionary conservation amongst diverse viruses, it is not feasible to build a viral tree of life using traditional phylogenetic methods based on conserved proteins. In this study, we used an alignment-free method that uses k-mers as genomic features for a large-scale comparison of complete viral genomes available in RefSeq. To determine the optimal feature length, k (an essential step in constructing a meaningful dendrogram), we designed a comprehensive strategy that combines three approaches: (1) cumulative relative entropy, (2) average number of common features among genomes, and (3) the Shannon diversity index. This strategy was used to determine k for all 3,905 complete viral genomes in RefSeq. The resulting dendrogram shows consistency with the viral taxonomy of the ICTV and the Baltimore classification of viruses.
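As a rough illustration of two of the three criteria named in the abstract, the sketch below computes the Shannon diversity index of k-mer frequencies and the average number of k-mers shared between genome pairs over a range of k. The abstract does not give the exact definitions used in the study, so the formulas here are standard textbook versions chosen as assumptions. The function names and toy sequences are hypothetical, and cumulative relative entropy is omitted because its precise form is not stated in the abstract.

```python
from collections import Counter
from itertools import combinations
from math import log


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers in a sequence (no reverse-complement handling)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shannon_index(counts: Counter) -> float:
    """Shannon diversity H = -sum(p * ln p) over k-mer relative frequencies."""
    total = sum(counts.values())
    return -sum((c / total) * log(c / total) for c in counts.values())


def mean_shared_kmers(genomes: dict[str, str], k: int) -> float:
    """Average number of distinct k-mers shared by each pair of genomes.

    Assumes at least two genomes are supplied.
    """
    sets = [set(kmer_counts(seq, k)) for seq in genomes.values()]
    pairs = list(combinations(sets, 2))
    return sum(len(a & b) for a, b in pairs) / len(pairs)


# Toy sequences invented for illustration only; real genomes are far longer.
toy = {"g1": "ACGTACGTAC", "g2": "ACGTTTGTAC", "g3": "GGGTACGTAA"}
for k in range(2, 6):
    diversity = [shannon_index(kmer_counts(s, k)) for s in toy.values()]
    print(k, round(sum(diversity) / len(diversity), 3),
          round(mean_shared_kmers(toy, k), 3))
```

Scanning these quantities across k is one plausible way to see where features become too short to be informative or too long to be shared between genomes; how the study combines them to pick a final k is not described in the abstract.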
