2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops

A meta-genome sequencing and assembly preprocessing algorithm inspired by restriction site base composition


Abstract

Motivation: In meta-genome sequencing and assembly projects, contigs from different organisms are mixed together in a single pool, which makes assembling the individual organisms a complex and challenging problem. It is therefore desirable to sort the contigs by origin into separate bins before further work. We propose a framework that uses the base compositions of bacterial restriction sites to generate sets of motifs that differentiate organismal groups, including the contigs derived from those groups. We introduce spectrum sets and show how to select them strategically for binning contigs from different organisms. We suggest that this framework can save time during a meta-genome sequencing and assembly project.

Results: Our method is able to differentiate organisms and to determine which organism a contig was derived from. In particular, we show that two genera are fundamentally different by analyzing their motif proportions. Using one of the four total spectrum sets, which encompass all known restriction sites, we show that different sets differ in their ability to distinguish sequences. In addition, we show that selecting a spectrum set that is relevant to one organism but not the other greatly improves differentiation performance, even when the contigs are short (1000 bp).

Conclusions: Using ten trials of newly selected contigs to confirm our premise, our study provides a proof of concept for a novel and computationally effective method for a preprocessing step in meta-genome sequencing and assembly tasks.
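The abstract does not say how spectrum sets are built or how motif proportions are compared, so the following Python sketch only illustrates the general idea of binning contigs by restriction-site motif proportions. It is not the authors' algorithm. The motif set (the recognition sequences of EcoRI, BamHI and HindIII), the L1 distance, the nearest-profile rule and all function names are assumptions made for illustration.

```python
# Hypothetical motif set for illustration only: recognition sequences of
# EcoRI, BamHI and HindIII. The paper's actual spectrum sets are not given
# in the abstract. These three sites are palindromic, so each reads the
# same on both DNA strands.
MOTIFS = ("GAATTC", "GGATCC", "AAGCTT")


def count_motif(seq, motif):
    """Count occurrences of motif in seq, including overlapping ones."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def motif_proportions(seq, motifs=MOTIFS):
    """Return each motif's share of all motif hits found in seq."""
    seq = seq.upper()
    counts = {m: count_motif(seq, m) for m in motifs}
    total = sum(counts.values())
    if total == 0:
        # No motif occurs at all: return an all-zero profile.
        return {m: 0.0 for m in motifs}
    return {m: counts[m] / total for m in motifs}


def profile_distance(p, q):
    """L1 distance between two proportion profiles (an assumed metric)."""
    return sum(abs(p[m] - q[m]) for m in p)


def assign_bin(contig, references, motifs=MOTIFS):
    """Assign a contig to the reference whose profile is closest."""
    profile = motif_proportions(contig, motifs)
    return min(
        references,
        key=lambda name: profile_distance(profile, references[name]),
    )


# Toy reference sequences, invented for this example.
references = {
    "A": motif_proportions("GAATTC" * 3 + "GGATCC"),
    "B": motif_proportions("AAGCTT" * 3 + "GGATCC"),
}
assigned = assign_bin("ttGAATTCacgtGAATTC", references)
```

In this toy setup the contig contains only EcoRI sites, so its profile lies nearer to reference "A" than to "B". Real contigs of about 1000 bp may contain few or no hits for a given motif, which is presumably why the choice of spectrum set matters as much as the abstract reports.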
