Conference: Annual ACM symposium on applied computing

Virus DNA-fragment Classification using Taxonomic Hidden Markov Model Profiles

Abstract

In most viral metagenomic studies, genetic material from a diversity of organisms is sampled from the environment and sequenced using Sanger or 454 sequencing. This process typically yields DNA-fragments that must be assembled into contigs and annotated before any inferences or conclusions can be drawn from the data at hand. However, one problem persists: the relatively short length of the sequenced DNA-fragments, together with the high level of diversity in a viral community, leaves a large number of DNA-fragments unassembled and unannotated. This limits our ability to understand the viral community under study. We present preliminary results of a new annotation method that targets the virus sequences most likely to be left unannotated by conventional methods. The resulting system, called Anacle, provides a taxonomic annotation for virus sequences excluded by a BLAST pre-screening. Anacle uses an automated method that relies on (a) Markov clustering (MCL) of all protein sequences belonging to the same taxon and (b) building, for each taxon, a skeleton that captures its genetic variation using Hidden Markov Model (HMM) profiles. Taxonomic annotation consists of comparing each unannotated DNA-fragment against all the skeletons and labeling it with the taxon whose skeleton gives the best similarity score. We evaluated Anacle on a simulated metagenomic dataset with DNA-fragments of 100 and 700 base pairs. The results show that Anacle can taxonomically annotate virus DNA-fragments with high precision and specificity, indicating that the proposed method can provide valuable taxonomic information about DNA-fragments that other methods would leave unannotated. We also report Anacle's performance on a small Sargasso Sea dataset.
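The annotation pipeline summarized above (cluster each taxon's proteins with MCL, build one profile per cluster, then assign each fragment to the best-scoring taxon) can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not Anacle's implementation: the protein similarity matrix is assumed to be precomputed, each profile HMM is replaced by an ungapped position-specific scoring matrix (PSSM, roughly a profile HMM restricted to match states), fragments are assumed to be already translated into amino acids, a taxon's skeleton is assumed to be a list of per-cluster profiles, and the names `mcl`, `build_profile`, `score`, `classify`, `TaxonA` and `TaxonB` are hypothetical.

```python
# Hypothetical, simplified sketch of an Anacle-style pipeline (not the authors' code).
# Assumptions: protein similarities are precomputed; each cluster profile is an
# ungapped PSSM standing in for a profile HMM; fragments are already translated
# to amino acids; a taxon's "skeleton" is a list of per-cluster profiles.
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def _matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def _normalise_columns(m):
    # Rescale every column to sum to 1 (column-stochastic matrix).
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def mcl(sim, expansion=2, inflation=2.0, max_iter=100, tol=1e-9):
    """Markov clustering of a symmetric similarity matrix; returns sorted clusters."""
    n = len(sim)
    # Add self-loops, then normalise columns into transition probabilities.
    m = _normalise_columns([[sim[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                            for i in range(n)])
    for _ in range(max_iter):
        expanded = m
        for _ in range(expansion - 1):  # expansion: raise M to the e-th power
            expanded = _matmul(expanded, m)
        # Inflation: element-wise power, then renormalise columns.
        inflated = _normalise_columns([[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Attractor rows (non-zero diagonal) collect the members of each cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6)
                for i in range(n) if m[i][i] > 1e-6}
    return sorted(sorted(c) for c in clusters)


def build_profile(aligned, pseudocount=1.0):
    """Log-odds PSSM from equal-length aligned proteins, uniform background."""
    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for pos in range(len(aligned[0])):
        counts = Counter(seq[pos] for seq in aligned)
        profile.append({aa: math.log((counts[aa] + pseudocount) / total / background)
                        for aa in AMINO_ACIDS})
    return profile


def score(fragment, profile):
    """Best ungapped placement of a fragment along a profile (-inf if it is longer)."""
    best = -math.inf
    for offset in range(len(profile) - len(fragment) + 1):
        s = sum(profile[offset + i].get(aa, 0.0) for i, aa in enumerate(fragment))
        best = max(best, s)
    return best


def classify(fragment, skeletons):
    """Label a fragment with the taxon whose skeleton gives the best score."""
    best_taxon, best_score = None, -math.inf
    for taxon, profiles in skeletons.items():
        for profile in profiles:
            s = score(fragment, profile)
            if s > best_score:
                best_taxon, best_score = taxon, s
    return best_taxon, best_score


# Toy usage: hypothetical taxa, each with one cluster of aligned proteins.
skeletons = {"TaxonA": [build_profile(["MKVL", "MKIL"])],
             "TaxonB": [build_profile(["GWPE", "GWPD"])]}
print(mcl([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]))  # [[0, 1], [2, 3]]
print(classify("MKV", skeletons)[0])  # TaxonA
```

Replacing the PSSMs with full profile HMMs would add insert and delete states so that fragments can align with gaps; the final step of labeling each fragment with its best-scoring taxon stays the same.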