International Journal of Molecular Sciences

Exploring Neighborhoods in the Metagenome Universe

Abstract

The variety of metagenomes in current databases provides a rapidly growing source of information for comparative studies. However, the quantity and quality of supplementary metadata are still lagging behind. It is therefore important to be able to identify related metagenomes by means of the available sequence data alone. We have studied efficient sequence-based methods for large-scale identification of similar metagenomes within a database retrieval context. In a broad comparison of different profiling methods we found that vector-based distance measures are well suited to the detection of metagenomic neighbors. Our evaluation on more than 1700 publicly available metagenomes indicates that, for a query metagenome from a particular habitat, on average nine out of ten nearest neighbors represent the same habitat category, independent of the profiling method or distance measure used. While a neighborhood accuracy of 100% can be achieved for well-defined labels, in general the neighbor detection is severely affected by a natural overlap of manually annotated categories. In addition, we present results of a novel visualization method that is able to reflect the similarity of metagenomes in a 2D scatter plot. The visualization method shows a similarly high accuracy in the reduced space as in the high-dimensional profile space. Our study suggests that, for inspection of metagenome neighborhoods, the profiling methods and distance measures can be chosen to provide a convenient interpretation of results in terms of the underlying features. Furthermore, supplementary metadata of metagenome samples will in the future need to comply with readily available ontologies for fine-grained and standardized annotation. To make profile-based k-nearest-neighbor search and the 2D visualization of the metagenome universe available to the research community, we included the proposed methods in our CoMet-Universe server for comparative metagenome analysis.
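
The neighborhood evaluation described in the abstract can be illustrated with a minimal sketch. It assumes each metagenome is already summarized as a numeric profile vector and carries one habitat label; the cosine distance, the function names, and the toy profiles below are illustrative assumptions, not the profiling methods, distance measures, or data used in the study or on the CoMet-Universe server.

```python
import math

def cosine_distance(u, v):
    """Cosine distance between two equal-length profile vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat an all-zero profile as maximally distant
    return 1.0 - dot / (norm_u * norm_v)

def nearest_neighbors(query_id, profiles, k):
    """Return the ids of the k profiles closest to the query (query excluded)."""
    query = profiles[query_id]
    ranked = sorted(
        (cosine_distance(query, vec), name)
        for name, vec in profiles.items()
        if name != query_id
    )
    return [name for _, name in ranked[:k]]

def neighborhood_accuracy(profiles, labels, k):
    """Average fraction of the k nearest neighbors sharing the query's label."""
    total = 0.0
    for name in profiles:
        neighbors = nearest_neighbors(name, profiles, k)
        same = sum(1 for n in neighbors if labels[n] == labels[name])
        total += same / len(neighbors)
    return total / len(profiles)

# Hypothetical toy profiles (invented relative feature abundances).
profiles = {
    "soil_1": [0.70, 0.20, 0.10],
    "soil_2": [0.65, 0.25, 0.10],
    "soil_3": [0.60, 0.30, 0.10],
    "marine_1": [0.10, 0.20, 0.70],
    "marine_2": [0.15, 0.15, 0.70],
    "marine_3": [0.10, 0.30, 0.60],
}
labels = {name: name.split("_")[0] for name in profiles}

if __name__ == "__main__":
    print(nearest_neighbors("soil_1", profiles, k=2))
    print(neighborhood_accuracy(profiles, labels, k=2))
```

In this sketch the accuracy is averaged over leave-one-out queries, which is one plausible reading of "nine out of ten nearest neighbors represent the same habitat category"; the study's exact evaluation protocol is not given in the abstract.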