Conference: Pattern recognition in bioinformatics

pattern recognition; finite inductive sequences; syntactic pattern recognition; genome recognition

Abstract

The rapidly emerging field of metagenomics seeks to examine the genomic content of communities of organisms to understand their roles and interactions in an ecosystem. In this paper we focus on clustering methods and their application to the taxonomic analysis of metagenomic data. Clustering analysis for metagenomics amounts to grouping similar partial sequences, such as raw sequence reads, into clusters in order to discover information about the internal structure of the dataset under consideration, or the relative abundance of protein families. Several methods for clustering analysis of metagenomic datasets have been proposed. Here we focus on evidence-based clustering methods that employ knowledge extracted from proteins identified by a BLASTx search (proxygenes). We consider two clustering algorithms introduced in previous work and a new one. We discuss the advantages and drawbacks of the algorithms and use them to perform taxonomic analysis of metagenomic data. To this end, we use three real-life benchmark datasets from previous work on metagenomic data analysis. Comparison of the results indicates satisfactory coherence among the taxonomies output by the three algorithms, with respect to phylogenetic content at the class level and taxonomic distribution at the phylum level. Overall, the experimental comparative analysis substantiates the effectiveness of evidence-based clustering methods for taxonomic analysis of metagenomic data.
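The general idea of evidence-based clustering, grouping reads by the protein a BLASTx search matched them to, can be illustrated with a minimal sketch. This is not any of the three algorithms compared in the paper, which the abstract does not specify. The function name `cluster_by_best_hit`, the `(read_id, protein_id, bit_score)` tuple layout, and the sample hits are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def cluster_by_best_hit(hits):
    """Group reads by their best-scoring protein hit (illustrative only).

    hits: iterable of (read_id, protein_id, bit_score) tuples, a
    hypothetical simplification of parsed BLASTx results.
    Returns a dict mapping protein_id -> list of read_ids.
    """
    # Keep only the highest-scoring hit for each read.
    best = {}
    for read_id, protein_id, score in hits:
        if read_id not in best or score > best[read_id][1]:
            best[read_id] = (protein_id, score)

    # Reads that share the same best-hit protein form one cluster.
    clusters = defaultdict(list)
    for read_id, (protein_id, _score) in best.items():
        clusters[protein_id].append(read_id)
    return dict(clusters)


# Made-up example data, not taken from the paper's datasets.
hits = [
    ("read1", "protA", 120.0),
    ("read1", "protB", 95.5),
    ("read2", "protA", 88.0),
    ("read3", "protB", 101.0),
]
clusters = cluster_by_best_hit(hits)
```

In this sketch each cluster is represented by a single protein, so taxonomic labels attached to that protein could in principle be propagated to every read in the cluster. How the paper's algorithms actually select and use proxygenes is not stated in the abstract.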
