Applied biological data mining based on improved K-means clustering algorithm and KNN classifier in protein sub-cellular localization

Abstract

With the completion of human genome sequencing, large volumes of data, especially amino acid sequences, have flooded into biological databases. How to analyze these data quickly, and even predict protein structure and function correctly, has become a hot topic in recent years. In this paper, we study the K-means clustering algorithm and the KNN classifier on complex amino acid sequence data and apply them to the prediction of protein sub-cellular localization. Biological data frequently exhibit fuzzy class boundaries and class imbalance, so accuracy drops when traditional KNN and K-means clustering are applied directly. First, to address the imbalance, we propose a within-class approach that ensures training samples are selected from every class around the test sample, and we introduce a membership measure to decide which class the test sample belongs to. Then, we introduce rough sets and membership to handle the fuzzy boundaries. In particular, we apply the correlation coefficient within the rough sets to better reflect the relationships among data objects. Experimental results on protein sub-cellular localization prediction show that the proposed methods outperform the traditional ones.
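
The within-class idea from the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's membership formula, so the one used here (mean inverse distance to the k nearest samples of each class, normalised to sum to 1) is an assumption made purely for illustration, as are the function name `within_class_knn` and the parameters `k` and `eps`. It is not the authors' implementation.

```python
import math
from collections import defaultdict


def within_class_knn(train_X, train_y, x, k=3, eps=1e-9):
    """Classify x from its k nearest neighbours taken inside each class.

    Assumed membership (not from the paper): mean inverse distance to
    the k nearest training samples of a class, normalised to sum to 1.
    """
    # Collect the distance from x to every training sample, per class.
    dists_by_class = defaultdict(list)
    for sample, label in zip(train_X, train_y):
        dists_by_class[label].append(math.dist(sample, x))

    # Score each class using only its own k nearest samples, so a large
    # class cannot crowd a small one out of the neighbourhood.
    scores = {}
    for label, dists in dists_by_class.items():
        nearest = sorted(dists)[:k]
        scores[label] = sum(1.0 / (d + eps) for d in nearest) / len(nearest)

    # Normalise the scores into memberships and pick the largest one.
    total = sum(scores.values())
    membership = {label: s / total for label, s in scores.items()}
    predicted = max(membership, key=membership.get)
    return predicted, membership


# Toy imbalanced data: six samples of class "A", two of class "B".
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3),
           (0.2, 0.4), (0.4, 0.2), (1.0, 1.0), (1.1, 0.9)]
train_y = ["A"] * 6 + ["B"] * 2
label, membership = within_class_knn(train_X, train_y, (0.9, 0.9), k=2)
print(label, membership)
```

Because every class contributes exactly k neighbours, the minority class "B" is compared on equal terms with the majority class "A". A plain KNN vote with a large k would instead be dominated by whichever class has more samples near the query.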

Bibliographic record

  • Source
    2016 | pp. 249-252 | 4 pages
  • Authors

    Zhenfeng Lei; Shunfang Wang;

  • Original format: PDF
  • Keywords

    Biology;
