With the rapid development of next-generation sequencing (NGS) technologies, metagenomics has become a new research hotspot. Research in metagenomics, however, faces the problem of binning: the identification and taxonomic characterization of NGS short reads, that is, the effective separation of metagenomic sequences from multiple species without the use of a reference. To address this problem, this paper first analyzes the characteristics of next-generation sequencing technology and the statistical characteristics of metagenomic sequences. It then proposes a species clustering algorithm for metagenomic sequences that combines similarity information with structural information and introduces affinity propagation clustering to group the sequences by species. Experimental results show that the method achieves high clustering accuracy and runs quickly. We have also developed MetaBinning, a software tool for metagenomic sequence binning based on this algorithm.
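The general idea of reference-free binning with affinity propagation can be illustrated with a minimal sketch. It is not the MetaBinning implementation: the abstract does not specify how similarity or structural information is computed, so the sketch assumes normalized k-mer frequencies as the sequence feature, negative squared Euclidean distance as the similarity, and the median similarity as the exemplar preference. All function names (`kmer_profile`, `similarity_matrix`, `affinity_propagation`, `cluster_reads`) are hypothetical.

```python
from itertools import product
from statistics import median


def kmer_profile(seq, k=2):
    """Normalized k-mer frequency vector (an assumed feature, not from the paper)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip words containing N or other symbols
            counts[word] += 1
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in kmers]


def similarity_matrix(profiles):
    """Negative squared Euclidean distance; diagonal holds the preference."""
    n = len(profiles)  # requires n >= 2
    S = [[-sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
          for j in range(n)] for i in range(n)]
    # Assumption: use the median off-diagonal similarity as the preference.
    pref = median(S[i][j] for i in range(n) for j in range(n) if i != j)
    for i in range(n):
        S[i][i] = pref
    return S


def affinity_propagation(S, damping=0.5, iterations=200):
    """Damped message passing; returns the exemplar index chosen for each point."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k) for i' not in {i,k})
        # a(k,k) = sum of positive r(i',k) for i' != k
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:  # fallback: keep the single strongest candidate
        exemplars = [max(range(n), key=lambda k: R[k][k] + A[k][k])]
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]


def cluster_reads(reads, k=2):
    """Bin reads by composition: profile -> similarity -> affinity propagation."""
    return affinity_propagation(similarity_matrix([kmer_profile(r, k) for r in reads]))


if __name__ == "__main__":
    # Toy reads for illustration only; real reads are far longer and noisier.
    reads = ["ATATATATATTA", "TATATAATATAT", "ATTATATATATA",
             "GCGCGGCGCGCC", "CGCGCGGCGCGC", "GCCGCGCGCGGC"]
    print(cluster_reads(reads))
```

Affinity propagation does not need the number of clusters in advance, which suits metagenomic samples where the number of species is unknown. The preference value and damping factor used here are illustrative defaults and would need tuning on real data.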