Journal: Bioinformatics

SPANNER: taxonomic assignment of sequences using pyramid matching of similarity profiles


Abstract

Background: Homology-based taxonomic assignment is impeded by differences between the unassigned read and the reference database, which force a rank-specific classification to the closest (and possibly incorrect) reference lineage. Such an assignment may be correct only to a general rank (e.g. order) and incorrect below that rank (e.g. family and genus). Algorithms such as lowest common ancestor (LCA) avoid this by varying the predicted taxonomic rank based on matches to a set of taxonomic references. LCA and related approaches can be conservative, especially if the best matches are taxonomically widespread because of events such as lateral gene transfer (LGT).

Results: Our extension to LCA, called SPANNER (similarity profile annotater), uses the set of best homology matches (the LCA Profile) for a given sequence and compares this profile with a set of profiles inferred from taxonomic reference organisms. SPANNER provides an assignment that is less sensitive to LGT and other confounding phenomena. In a series of trials on real and artificial datasets, SPANNER outperformed LCA-style algorithms in terms of taxonomic precision and outperformed best BLAST at certain levels of taxonomic novelty in the dataset. We identify examples where LCA made an overly conservative prediction but SPANNER produced a more precise and correct prediction.

Conclusions: By using profiles of homology matches to represent patterns of genomic similarity that arise because of vertical and lateral inheritance, SPANNER offers an effective compromise between taxonomic assignment based on best BLAST scores and the conservative approach of LCA and similar methods.
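The conservative behaviour of LCA described in the abstract can be illustrated with a minimal sketch. It is not SPANNER and not the authors' implementation: the function name `lowest_common_ancestor`, the representation of a lineage as a root-to-leaf list of taxon names, and the three example hits are all assumptions made for illustration. The sketch only shows how a single taxonomically distant best match (as might arise from LGT) can push an LCA-style assignment up to a very general rank.

```python
from typing import List, Sequence


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> List[str]:
    """Return the longest root-to-leaf prefix shared by all lineages.

    Each lineage is assumed to be ordered from the most general rank
    to the most specific one (hypothetical input format).
    """
    if not lineages:
        return []
    shared: List[str] = []
    # zip stops at the shortest lineage, so ragged inputs are handled.
    for taxa_at_rank in zip(*lineages):
        if len(set(taxa_at_rank)) != 1:
            break  # the lineages diverge at this rank
        shared.append(taxa_at_rank[0])
    return shared


# Hypothetical best-match lineages for one query sequence.
escherichia = ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
               "Enterobacterales", "Enterobacteriaceae", "Escherichia"]
salmonella = ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
              "Enterobacterales", "Enterobacteriaceae", "Salmonella"]
# A distant hit, standing in for a laterally transferred gene.
streptococcus = ["Bacteria", "Firmicutes", "Bacilli",
                 "Lactobacillales", "Streptococcaceae", "Streptococcus"]

# Two closely related hits agree down to the family level.
close_only = lowest_common_ancestor([escherichia, salmonella])
# Adding the distant hit collapses the assignment to the domain level.
with_distant = lowest_common_ancestor([escherichia, salmonella, streptococcus])

print(close_only[-1])
print(with_distant[-1])
```

With only the two related hits, the shared lineage ends at the family; adding the distant hit leaves only the domain in common. According to the abstract, SPANNER instead compares the whole profile of matches against reference profiles, which this sketch does not attempt to reproduce.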