Conference: Evolutionary computation, machine learning and data mining in bioinformatics

Clustering Metagenome Short Reads Using Weighted Proteins

Abstract

This paper proposes a new knowledge-based method for clustering metagenome short reads. The method incorporates biological knowledge into the clustering process by means of a list of proteins associated with each read. These proteins are chosen from a reference proteome database according to their similarity to the given read, as evaluated by BLAST. We introduce a scoring function for weighting the resulting proteins and use the weighted proteins to cluster the reads. The resulting clustering algorithm selects the number of clusters automatically and generates possibly overlapping clusters of reads. Experiments on real-life benchmark datasets show the effectiveness of the method for reducing the size of a metagenome dataset while maintaining a high accuracy of organism content.
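
The abstract does not give the scoring function or the clustering procedure, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes BLAST hits are available in the default tabular format (`-outfmt 6`, bit score in the twelfth column), uses a hypothetical weight (the sum of bit scores of all reads hitting a protein), and builds clusters with a simple greedy pass that keeps a protein only if it accounts for at least one read not yet covered. The function names `parse_blast_tabular`, `weight_proteins`, and `cluster_reads` are illustrative.

```python
from collections import defaultdict


def parse_blast_tabular(lines):
    """Yield (read_id, protein_id, bit_score) from BLAST -outfmt 6 lines.

    Assumes the default 12-column layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        yield fields[0], fields[1], float(fields[11])


def weight_proteins(hits):
    """Hypothetical weighting (an assumption, not the paper's scoring function):
    a protein's weight is the sum of the bit scores of all reads that hit it."""
    weights = defaultdict(float)
    members = defaultdict(set)
    for read_id, protein_id, bit_score in hits:
        weights[protein_id] += bit_score
        members[protein_id].add(read_id)
    return weights, members


def cluster_reads(hits):
    """Greedy sketch: visit proteins from highest to lowest weight and keep a
    protein as a cluster only if it contains at least one uncovered read.

    The number of clusters is not fixed in advance, and a read may belong to
    several clusters because each cluster keeps every read hitting its protein.
    """
    weights, members = weight_proteins(hits)
    covered = set()
    clusters = {}
    # Ties are broken by protein id so the result is deterministic.
    for protein_id in sorted(weights, key=lambda p: (-weights[p], p)):
        if members[protein_id] - covered:
            clusters[protein_id] = set(members[protein_id])
            covered |= members[protein_id]
    return clusters
```

Calling `cluster_reads(parse_blast_tabular(open_file))` on a tabular BLAST result returns a dictionary that maps each retained protein to its set of reads. Keeping one representative per cluster is one way such a structure could be used to shrink a dataset, though the abstract does not say how the reduction is performed.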
