This paper compares several published methods for clustering chemical structures, using both graph-based and fingerprint-based similarity measures. The clusterings produced by the methods were compared with one another to determine the degree of cluster overlap. Each method was also evaluated on how well it grouped structures into clusters that share a non-trivial common substructure. The methods that employ adjustable parameters were tested to determine how stable each parameter is across datasets of varying size and composition. Our experiments suggest that both graph- and fingerprint-based similarity measures can be used effectively to generate chemical clusterings. They also indicate that the CAST and Yin–Chen methods, recently proposed for clustering gene expression patterns, may prove effective for clustering 2D chemical structures.
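The abstract names neither the fingerprint similarity coefficient nor the details of CAST, so the following is only an illustrative sketch under stated assumptions. It assumes fingerprints stored as Python integers used as bit sets, compares them with the Tanimoto coefficient (a common choice for 2D fingerprints, assumed here rather than taken from the paper), and groups them with a minimal version of CAST (Cluster Affinity Search Technique) in the add/remove form originally proposed by Ben-Dor and colleagues for gene expression data. The function names `tanimoto` and `cast`, the threshold `t`, the iteration cap `max_iter`, and the toy fingerprints are all hypothetical.

```python
from typing import List, Sequence, Set


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two fingerprints stored as integer bit sets."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # assumption: two empty fingerprints count as identical
    return bin(a & b).count("1") / union


def cast(sim: Sequence[Sequence[float]], t: float, max_iter: int = 1000) -> List[List[int]]:
    """Minimal CAST sketch: sim[i][j] in [0, 1] with sim[i][i] == 1, threshold t in (0, 1]."""
    pool: Set[int] = set(range(len(sim)))  # elements not yet in a closed cluster
    clusters: List[List[int]] = []
    while pool:
        members: Set[int] = set()  # the open cluster being grown
        # affinity[x] = sum of sim[x][m] over current members m
        affinity = {x: 0.0 for x in pool}
        for _ in range(max_iter):  # hypothetical cap guarding against add/remove oscillation
            changed = False
            # ADD: take the highest-affinity outsider while a(u) >= t * |members|
            while True:
                outside = pool - members
                if not outside:
                    break
                u = max(outside, key=lambda x: (affinity[x], -x))
                if affinity[u] < t * len(members):
                    break
                members.add(u)
                for x in pool:
                    affinity[x] += sim[x][u]
                changed = True
            # REMOVE: drop the lowest-affinity member while a(v) < t * |members|
            while members:
                v = min(members, key=lambda x: (affinity[x], x))
                if affinity[v] >= t * len(members):
                    break
                members.remove(v)
                for x in pool:
                    affinity[x] -= sim[x][v]
                changed = True
            if not changed:
                break
        if not members:  # safety net so every pass closes a non-empty cluster
            members = {min(pool)}
        clusters.append(sorted(members))
        pool -= members
    return clusters


if __name__ == "__main__":
    fps = [0b1111, 0b1110, 0b0111, 0b11110000, 0b11100000]  # hypothetical toy fingerprints
    sim = [[tanimoto(a, b) for b in fps] for a in fps]
    print(cast(sim, 0.5))  # [[0, 1, 2], [3, 4]]
```

Raising `t` yields tighter, smaller clusters; on the same toy fingerprints, `t = 0.9` leaves every structure as a singleton. This sensitivity to an adjustable threshold illustrates why the stability of such parameters across datasets is worth testing.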