Conference: International Conference on Advanced Computer Theory and Engineering
A hybrid algorithm of minimum spanning tree and nearest neighbor for classifying human cancers

Abstract

Classification and prediction of different cancers based on gene expression profiles are important for cancer diagnosis, treatment, and drug discovery. The k-nearest-neighbor algorithm (k-NN) is a simple and efficient machine learning method for cancer classification, and its parameter k is crucial. In this paper, we integrate the minimum spanning tree (MST) with k-NN for cancer classification. The MST is used to select both the parameter k and the nearest neighbors for k-NN. First, we build an MST, based on the Euclidean distance between every pair of samples, over gene expression data that includes exactly one sample of unknown class. Second, we find the samples connected to the unknown-class sample in the MST and apply the majority vote principle. Third, if the vote is tied, we expand the set of connected samples with the nearest neighbors of the unknown-class sample until either all samples have been included or the class of the unknown sample is determined. We refer to this hybrid algorithm as MSTNN. MSTNN is compared with k-NN and three other existing classification algorithms on three binary-class gene expression datasets (CNS, Colon, and Lung) and three multi-class gene expression datasets (Leukemia1, Leukemia2, and Leukemia3), all involving human cancers. MSTNN outperforms k-NN by 5.65% in average leave-one-out cross-validation (LOOCV) accuracy and by 13.80% in average classification accuracy on the testing datasets, and achieves the best performance among all five algorithms. The results demonstrate that the proposed MSTNN algorithm is feasible for classifying human cancers.
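
The three steps described in the abstract can be sketched roughly as follows. This is a minimal illustration written from the abstract alone, not the authors' implementation: the function names (euclidean, mst_edges, mstnn_predict), the use of Prim's algorithm to build the MST, the choice to add one nearest neighbor at a time when expanding, and the decision to return None when a tie survives after all samples have been included are all assumptions.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph (assumed MST builder).

    Returns the tree as a list of (i, j) index pairs.
    """
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known edge into the tree
    best_from = [-1] * n         # tree endpoint of that edge
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = euclidean(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def mstnn_predict(train_x, train_y, query):
    """Classify one unknown sample by voting among its MST neighbours."""
    if not train_x:
        raise ValueError("at least one labelled sample is required")

    # Step 1: MST over the labelled samples plus the single unknown sample.
    points = list(train_x) + [query]
    q = len(points) - 1

    # Step 2: labelled samples joined to the unknown sample by a tree edge.
    connected = {j if i == q else i
                 for i, j in mst_edges(points) if q in (i, j)}

    # Remaining labelled samples, nearest to the unknown sample first.
    remaining = sorted((i for i in range(q) if i not in connected),
                       key=lambda i: euclidean(train_x[i], query))

    voters = list(connected)
    while True:
        counts = Counter(train_y[i] for i in voters).most_common()
        if len(counts) == 1 or counts[0][1] > counts[1][1]:
            return counts[0][0]          # clear majority
        if not remaining:
            return None                  # assumption: tie left unresolved
        # Step 3: tied vote, so add the next nearest neighbour and revote.
        voters.append(remaining.pop(0))
```

For example, with labelled one-dimensional samples (0,), (2,), (3.5,) carrying labels "A", "B", "B" and an unknown sample (1,), the unknown sample's tree neighbours are (0,) and (2,), which tie; adding the next nearest sample (3.5,) resolves the vote in favour of "B". Note that no k is supplied by the caller: the number of voters is set by the degree of the unknown sample in the tree, plus any samples added to break ties.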
