KITSUNE: A Tool for Identifying Empirically Optimal K-mer Length for Alignment-Free Phylogenomic Analysis

Abstract

Genomic DNA is the best “unique identifier” for organisms. Alignment-free phylogenomic analysis, a simple, fast, and efficient method for comparing genome sequences, relies on the distribution of short DNA sequences of a particular length, referred to as k-mers. The k-mer approach has been explored as a basis for sequence analysis applications, including assembly, phylogenetic tree inference, and classification. Although this approach is not novel, the choice of k-mer length is often rather arbitrary, even though it is a very important parameter for achieving the resolution of genome/sequence distances needed to infer biologically meaningful phylogenetic relationships. Thus, there is a need for a systematic approach to identify the appropriate k-mer length from whole-genome sequences. We present K-mer–length Iterative Selection for UNbiased Ecophylogenomics (KITSUNE), a tool for assessing the empirically optimal k-mer length for any given set of genomes of interest for phylogenomic analysis via a three-step approach based on (1) cumulative relative entropy (CRE), (2) average number of common features (ACF), and (3) observed common features (OCF). Using KITSUNE, we demonstrated the feasibility and reliability of these measurements to obtain empirically optimal k-mer lengths of 11, 17, and ∼34 from large genome datasets of viruses, bacteria, and fungi, respectively. Moreover, we demonstrated the use of KITSUNE for accurate species identification of two de novo assembled bacterial genomes derived from error-prone long-read sequences, and of a published yeast genome. In addition, KITSUNE was used to identify the shortest species-specific k-mer that accurately identifies viruses. KITSUNE is freely available at https://github.com/natapol/kitsune.
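
To make the k-mer idea concrete, the following is a minimal Python sketch of counting overlapping k-mers and seeing how the number of k-mers shared by two sequences changes with k. It is an illustration only, not KITSUNE's implementation, and it does not reproduce the CRE, ACF, or OCF measures named in the abstract. The function names `kmer_counts` and `shared_kmers` and the two toy sequences are hypothetical and chosen for this example.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq.

    Windows containing characters other than A, C, G, T are skipped
    (an assumption made for this sketch, e.g. to ignore N bases).
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def shared_kmers(seq_a: str, seq_b: str, k: int) -> int:
    """Return the number of distinct k-mers present in both sequences."""
    return len(kmer_counts(seq_a, k).keys() & kmer_counts(seq_b, k).keys())


if __name__ == "__main__":
    # Made-up toy sequences; real analyses use whole genomes.
    genome_a = "ATGCGATACGCTTGAGGCTAACGT"
    genome_b = "ATGCGATTCGCTTGAGCCTAACGT"
    for k in range(2, 9):
        print(k, shared_kmers(genome_a, genome_b, k))
```

Very short k-mers tend to be shared by almost any pair of sequences, while very long ones are rarely shared at all, which is why the choice of k affects the resolution of the resulting distances.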