Journal: Pattern Recognition: The Journal of the Pattern Recognition Society

Efficient bottom-up hybrid hierarchical clustering techniques for protein sequence classification

Abstract

Hybrid hierarchical clustering techniques, which combine the characteristics of different partitional clustering techniques or of partitional and hierarchical clustering techniques, are of interest. This paper proposes efficient bottom-up hybrid hierarchical clustering (BHHC) techniques for prototype selection in protein sequence classification. In the first stage, an incremental partitional clustering technique such as the leader algorithm (the ordered leader no update (OLNU) method), which requires only one database (db) scan, is used to find a set of subcluster representatives. In the second stage, either a hierarchical agglomerative clustering (HAC) scheme or a partitional clustering algorithm, K-medians, is applied to these subcluster representatives to obtain the required number of clusters. The hybrid scheme is therefore scalable and suitable for clustering large data sets. It also yields a hierarchical structure of clusters and subclusters, whose representatives are used for pattern classification. Even if a larger number of prototypes is generated, classification time does not increase much, because only a part of the hierarchical structure is searched. The experimental results of the proposed algorithms (classification accuracy (CA) using the prototypes obtained, and computation time) are compared with those of the hierarchical agglomerative schemes, K-medians, and nearest neighbour classifier (NNC) methods. The proposed methods are found to be computationally efficient with reasonably good CA. (c) 2006 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
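
The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the distance measure, the linkage rule, the cluster representative, or the details of the OLNU ordering, so the sketch assumes plain edit distance, single-link merging, a medoid as the cluster representative, and a basic leader algorithm in which leaders are never updated. All function names, the threshold, and the toy sequences are hypothetical.

```python
from typing import Callable, Dict, List

Dist = Callable[[str, str], float]

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (assumed stand-in for a real sequence score)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def find_leaders(seqs: List[str], threshold: float, dist: Dist) -> List[str]:
    """Stage 1: one scan; a sequence farther than `threshold` from every
    existing leader becomes a new leader. Leaders are never updated."""
    leaders: List[str] = []
    for s in seqs:
        if not any(dist(s, l) <= threshold for l in leaders):
            leaders.append(s)
    return leaders

def merge_leaders(leaders: List[str], k: int, dist: Dist) -> List[List[str]]:
    """Stage 2: agglomerate the leaders (single link, an assumption)
    until k clusters remain."""
    clusters = [[l] for l in leaders]
    while len(clusters) > k:
        pairs = [(a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        a, b = min(pairs, key=lambda p: min(
            dist(x, y) for x in clusters[p[0]] for y in clusters[p[1]]))
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
    return clusters

def medoid(cluster: List[str], dist: Dist) -> str:
    """Cluster representative: member with the smallest total distance."""
    return min(cluster, key=lambda c: sum(dist(c, o) for o in cluster))

def classify(query: str, clusters: List[List[str]],
             labels: Dict[str, str], dist: Dist) -> str:
    """Search only part of the hierarchy: pick the nearest cluster by its
    medoid, then the nearest leader inside that cluster."""
    best = min(clusters, key=lambda c: dist(query, medoid(c, dist)))
    nearest = min(best, key=lambda l: dist(query, l))
    return labels[nearest]

# Toy labelled sequences (hypothetical data, for illustration only).
labels = {"AAAAAA": "A-rich", "AAAAAT": "A-rich", "AAAATT": "A-rich",
          "GGGGGG": "G-rich", "GGGGGC": "G-rich", "GGGGCC": "G-rich"}
seqs = ["AAAAAA", "AAAAAT", "GGGGGG", "GGGGGC", "AAAATT", "GGGGCC"]

leaders = find_leaders(seqs, threshold=1, dist=edit_distance)
clusters = merge_leaders(leaders, k=2, dist=edit_distance)
prediction = classify("AAAATA", clusters, labels, edit_distance)
```

With these toy inputs, the single scan keeps four of the six sequences as leaders, the second stage groups them into two clusters, and the query is compared against two cluster medoids and then only the leaders of the chosen cluster rather than against every prototype. Replacing `merge_leaders` with a K-medians step over the leaders would give the other second-stage variant mentioned in the abstract.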