Conference: Symposium on Biomathematics

The Implementation of Hybrid Clustering Using Fuzzy C-Means and Divisive Algorithm for Analyzing DNA Human Papillomavirus Cause of Cervical Cancer


Abstract

Clustering aims to group patterns into sets called clusters. In this clustering method, we use n-mer frequencies to calculate the distance matrix, which is considered more accurate than using DNA alignment. The clustering results can be used to discover biologically important sub-sections and groups of genes. Many clustering methods have been developed; hard clustering methods are considered less accurate than fuzzy clustering methods, especially when the data contain outliers. Among fuzzy clustering methods, fuzzy c-means is one of the best known for its accuracy and simplicity. Fuzzy c-means clustering uses a membership function, which expresses the degree to which a data point belongs to each cluster, and works by minimizing an objective function. The parameter of the membership function acts as a weighting factor, also called the fuzzifier. In this study we implement hybrid clustering using fuzzy c-means and a divisive algorithm, which could improve the accuracy of cluster membership compared with a traditional partitional approach alone. Fuzzy c-means is used in the first step to find the partition. A divisive algorithm then runs in the second step to find sub-clusters and the dendrogram of the phylogenetic tree. The best number of clusters is determined by the minimum value of the Davies-Bouldin Index (DBI) of the cluster results. The results show that the method introduced in this paper is better than other partitioning methods. We found 3 clusters with a DBI value of 1.126628 at the first step of clustering. Moreover, the DBI values after the second step of clustering are always smaller than those obtained using the first step only. This indicates that the hybrid approach in this study produces better cluster results in terms of DBI values.
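The first stage described in the abstract (n-mer frequency vectors, fuzzy c-means, and cluster selection by DBI) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the word length k = 2, the fuzzifier m = 2, the Euclidean distance, the deterministic choice of initial centres, and the toy sequences are all assumptions made for illustration, and the divisive second stage is not shown.

```python
import itertools
import math


def kmer_frequencies(seq, k=2, alphabet="ACGT"):
    """Relative frequency of every length-k word (n-mer) in seq."""
    words = ["".join(p) for p in itertools.product(alphabet, repeat=k)]
    counts = dict.fromkeys(words, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing ambiguous bases such as N
            counts[word] += 1
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in words]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6):
    """Alternate membership and centre updates to reduce
    J = sum_i sum_j u_ij**m * ||x_i - v_j||**2 (m is the fuzzifier)."""
    n, dim = len(points), len(points[0])
    # Assumption: deterministic start, centres spread over the input order.
    centres = [list(points[(j * n) // c]) for j in range(c)]
    u = [[0.0] * c for _ in range(n)]
    for _ in range(iters):
        # Membership update: u_ij is proportional to d_ij**(-2 / (m - 1)).
        for i, x in enumerate(points):
            d = [euclidean(x, v) for v in centres]
            if min(d) == 0.0:
                # Point sits exactly on one or more centres: share membership.
                hits = [j for j in range(c) if d[j] == 0.0]
                u[i] = [1.0 / len(hits) if j in hits else 0.0 for j in range(c)]
            else:
                inv = [dj ** (-2.0 / (m - 1.0)) for dj in d]
                s = sum(inv)
                u[i] = [w / s for w in inv]
        # Centre update: mean of the points weighted by u_ij**m.
        new_centres = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            tw = sum(w)
            new_centres.append(
                [sum(w[i] * points[i][t] for i in range(n)) / tw for t in range(dim)]
            )
        shift = max(euclidean(a, b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift < tol:
            break
    return centres, u


def davies_bouldin(points, labels, c):
    """Davies-Bouldin Index for a hard labelling (lower is better).
    Needs at least two non-empty clusters with distinct centroids."""
    groups = [[p for p, l in zip(points, labels) if l == j] for j in range(c)]
    groups = [g for g in groups if g]
    cents = [[sum(col) / len(g) for col in zip(*g)] for g in groups]
    scatter = [sum(euclidean(p, v) for p in g) / len(g) for g, v in zip(groups, cents)]
    k = len(groups)
    worst = [
        max((scatter[a] + scatter[b]) / euclidean(cents[a], cents[b])
            for b in range(k) if b != a)
        for a in range(k)
    ]
    return sum(worst) / k


# Toy sequences invented for illustration; these are not HPV data.
seqs = ["ACACACACAC", "ACACACACAA", "CACACACACA",
        "GTGTGTGTGT", "GTGTGTGTGG", "TGTGTGTGTG"]
X = [kmer_frequencies(s, k=2) for s in seqs]
centres, u = fuzzy_c_means(X, c=2)
labels = [row.index(max(row)) for row in u]  # harden memberships for DBI
dbi = davies_bouldin(X, labels, c=2)
print(labels, round(dbi, 4))
```

To mirror the model selection step in the abstract, one would repeat the last four lines for several candidate values of c and keep the value with the smallest DBI.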