The Amordad database engine for metagenomics

Ehsan Behnam; Andrew D. Smith


Abstract

Motivation: Several technical challenges in metagenomic data analysis, including assembling metagenomic sequence data or identifying operational taxonomic units, are both significant and well known. These forms of analysis are increasingly cited as conceptually flawed, given the extreme variation within traditionally defined species and rampant horizontal gene transfer. Furthermore, computational requirements of such analysis have hindered content-based organization of metagenomic data at large scale.

Results: In this article, we introduce the Amordad database engine for alignment-free, content-based indexing of metagenomic datasets. Amordad places the metagenome comparison problem in a geometric context, and uses an indexing strategy that combines random hashing with a regular nearest neighbor graph. This framework allows refinement of the database over time by continual application of random hash functions, with the effect of each hash function encoded in the nearest neighbor graph. This eliminates the need to explicitly maintain the hash functions in order for query efficiency to benefit from the accumulated randomness. Results on real and simulated data show that Amordad can support logarithmic query time for identifying similar metagenomes even as the database size reaches into the millions.
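The indexing idea in the Results paragraph can be illustrated with a toy sketch. This is not Amordad's implementation: the abstract does not say which features or hash family are used, so the sketch assumes that each metagenome is summarized as a normalized k-mer frequency vector and that the random hashes are random hyperplane hashes (a standard locality-sensitive scheme for angular distance). All function names, parameters, and sample names below are hypothetical. What it tries to show is the one property the abstract states: each hash function is applied once, its effect is folded into a bounded-degree neighbor graph, and the hash function itself is then thrown away.

```python
import itertools
import math
import random

K = 2  # k-mer length; kept tiny for illustration (an assumption)
KMERS = ["".join(p) for p in itertools.product("ACGT", repeat=K)]


def kmer_vector(reads, k=K):
    """Normalized k-mer frequency vector for a collection of reads."""
    counts = dict.fromkeys(KMERS, 0)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in counts:  # skip k-mers containing ambiguous bases
                counts[kmer] += 1
    total = sum(counts.values()) or 1
    return [counts[km] / total for km in KMERS]


def angle(u, v):
    """Angular distance between two vectors, in radians."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        return math.pi / 2  # treat an empty profile as orthogonal to everything
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def random_hyperplane_hash(dim, bits, rng):
    """Return a hash that records on which side of `bits` random hyperplanes a vector lies."""
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]

    def h(vec):
        return tuple(sum(p * x for p, x in zip(plane, vec)) >= 0 for plane in planes)

    return h


def build_neighbor_graph(vectors, rounds=20, bits=4, degree=2, seed=0):
    """Refine a bounded-degree neighbor graph with successive random hashes.

    Each hash function is discarded after its round; only the graph persists.
    """
    rng = random.Random(seed)
    dim = len(next(iter(vectors.values())))
    graph = {name: [] for name in vectors}
    for _ in range(rounds):
        h = random_hyperplane_hash(dim, bits, rng)
        buckets = {}
        for name, vec in vectors.items():
            buckets.setdefault(h(vec), []).append(name)
        for members in buckets.values():
            for a in members:
                # Candidates: current neighbors plus everything sharing a's bucket.
                candidates = set(graph[a]) | (set(members) - {a})
                ranked = sorted(candidates, key=lambda b: (angle(vectors[a], vectors[b]), b))
                graph[a] = ranked[:degree]
    return graph


# Made-up toy data: soil_2 deliberately repeats the reads of soil_1.
samples = {
    "soil_1": ["ACGTACGTAC", "GGGTTTACGA"],
    "soil_2": ["ACGTACGTAC", "GGGTTTACGA"],
    "marine_1": ["AAAAAAAATT", "TTTTAAAAAA"],
}
vectors = {name: kmer_vector(reads) for name, reads in samples.items()}
graph = build_neighbor_graph(vectors)
print(graph["soil_1"])
```

The sketch covers only graph refinement. It does not attempt the query procedure or the logarithmic query time reported in the abstract, and it keeps every neighbor list capped at `degree` entries as a simple stand-in for the regular graph the abstract mentions.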


Bibliographic details

Source: Bioinformatics, 2014, Issue 20, 7 pages
Authors: Ehsan Behnam; Andrew D. Smith
Original format: PDF
Language: English
Chinese Library Classification: Bioengineering (biotechnology)

