Conference: IEEE International Conference on Software Engineering and Service Science

Applied biological data mining based on improved K-means clustering algorithm and KNN classifier in protein sub-cellular localization


Abstract

With the completion of human genome sequencing, large volumes of data, especially amino acid sequences, are flooding into biological databases. How to analyze these data quickly, and how to predict protein structure and function correctly, have become hot topics in recent years. In this paper, we study the K-means clustering algorithm and the KNN classifier on complex amino acid sequence data and apply them to the prediction of protein sub-cellular localization. Biological data frequently exhibit fuzzy class boundaries and class imbalance, so applying the traditional KNN and K-means algorithms directly lowers prediction accuracy. First, to address the imbalance, we propose a within-class approach that selects training samples around the test sample from each class, and we introduce a membership measure to decide which class the test sample belongs to. Second, we introduce rough sets and membership to handle the fuzzy boundaries. In particular, we apply the correlation coefficient within the rough sets to better reflect the relationships among data objects. Experimental results on protein sub-cellular localization prediction show that the proposed methods perform better than the traditional methods.
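
The within-class selection and membership idea mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: the function name `within_class_knn`, the use of Euclidean distance, and the definition of membership as each class's share of the averaged inverse-distance weight. It is one plausible reading, not the authors' method.

```python
import math
from collections import defaultdict

def within_class_knn(train, labels, query, k=3, eps=1e-9):
    """Classify `query` from the k nearest training points of EACH class.

    Assumption (not from the paper): a class's membership is its share
    of the averaged inverse-distance weight of its nearest neighbours.
    """
    # Group distances to the query by class label.
    by_class = defaultdict(list)
    for point, label in zip(train, labels):
        by_class[label].append(math.dist(point, query))

    weights = {}
    for label, dists in by_class.items():
        nearest = sorted(dists)[:k]  # at most k neighbours from this class
        # Average the weights so a class with fewer than k samples
        # is not penalised just for being small.
        weights[label] = sum(1.0 / (d + eps) for d in nearest) / len(nearest)

    total = sum(weights.values())
    membership = {label: w / total for label, w in weights.items()}
    predicted = max(membership, key=membership.get)
    return predicted, membership

# Toy imbalanced data: seven points of class "A", two of class "B".
train = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 2), (2, 1), (1, 2), (5, 5), (5, 4)]
labels = ["A"] * 7 + ["B"] * 2

predicted, membership = within_class_knn(train, labels, (4, 4), k=3)
print(predicted, membership)
```

Because every class contributes its own nearest neighbours, a small class cannot be outvoted simply because it has fewer training samples, which is the imbalance problem the abstract raises for plain KNN.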
