
metaMatch: un algorithme pour l'assignation taxonomique en métagénomique

English translation: metaMatch: an algorithm for taxonomic assignment in metagenomics

Abstract

Community ecology faces a new challenge now that next-generation sequencing approaches can yield data from hundreds of microbial community samples. Combined with accurate and reliable taxonomic assessment, these data will contribute to a better understanding of community assemblies formed under various environmental and historical conditions. Algorithms that classify sequences by comparison to a reference library are the most widely used tools for assessing the community composition of environmental samples. However, because such comparisons are computationally intensive, almost all of these algorithms (the most standard being BLAST and its derivatives) use heuristics designed to speed up the database exploration phase, at the cost of being less strict about the quality of the match between a query and a reference. The comparison problem is naturally distributable, since all (query, reference) comparisons are independent. Here, we present a tool enabling comparisons between queries (say, one million reads) and reference sequences (say, several thousand), and its implementation on two infrastructures: a cluster at MCIA (Mésocentre de Calcul Intensif en Aquitaine) and the EGI production grid. We show how tracking the large number of jobs generated was nearly impossible with gLite, and how this problem could be solved using Dirac. We compare run time and result quality between a run on Avakas and a run on the EGI grid. As future work, we will develop a user-friendly interface enabling this tool to be used routinely on the grid as a diagnostic by users not acquainted with the computing subtleties of the grid.
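
The independence of the (query, reference) comparisons can be illustrated with a minimal Python sketch. It is not the metaMatch implementation: the scoring function (exact Levenshtein distance), the batch size, and all names (`edit_distance`, `run_job`, `assign`) are assumptions chosen for illustration. Each batch of queries is processed without reference to any other batch, which is what makes it possible to submit every batch as a separate job on a cluster or grid.

```python
from itertools import islice


def edit_distance(a: str, b: str) -> int:
    """Exact Levenshtein distance by dynamic programming (no heuristic)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def chunks(items, size):
    """Yield successive batches of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def run_job(query_batch, references):
    """One independent job: compare each query in the batch to every reference.

    Returns {query_id: (best_reference_id, distance)}.
    """
    results = {}
    for qid, qseq in query_batch:
        scored = [(edit_distance(qseq, rseq), rid)
                  for rid, rseq in references.items()]
        dist, rid = min(scored)
        results[qid] = (rid, dist)
    return results


def assign(queries, references, batch_size=2):
    """Split the queries into batches, run each batch, and merge the results.

    The loop below runs serially; since no batch depends on another,
    each call to run_job could instead be submitted as its own job.
    """
    merged = {}
    for batch in chunks(queries.items(), batch_size):
        merged.update(run_job(batch, references))
    return merged


# Hypothetical toy data: two reference sequences and three reads.
references = {"taxA": "ACGTACGT", "taxB": "TTTTGGGG"}
queries = {"r1": "ACGTACGA", "r2": "TTTTGGGC", "r3": "ACGTACGT"}
assignments = assign(queries, references)
```

With one million reads and several thousand references, the number of batches (and therefore jobs) grows quickly, which is consistent with the job-tracking difficulty the abstract reports.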
