Journal: BMC Bioinformatics

Top scoring pairs for feature selection in machine learning and applications to cancer outcome prediction



Abstract

Background
The widely used k top scoring pair (k-TSP) algorithm is a simple yet powerful parameter-free classifier. It owes its success on many cancer microarray datasets to an effective feature selection algorithm based on the relative expression ordering of gene pairs. However, its general robustness does not extend to some difficult datasets, such as those involving cancer outcome prediction, which may be due to the relatively simple voting scheme used by the classifier. We believe that performance can be enhanced by separating out its effective feature selection component and combining it with a powerful classifier such as the support vector machine (SVM). More generally, the top scoring pairs generated by the k-TSP ranking algorithm can be used as a dimensionally reduced subspace for other machine learning classifiers.

Results
We developed an approach that integrates the k-TSP ranking algorithm (TSP) with other machine learning methods, combining the computationally efficient, multivariate feature ranking of k-TSP with multivariate classifiers such as SVM. We evaluated this hybrid scheme (k-TSP+SVM) on a range of simulated datasets with known data structures. Compared with other feature selection methods, such as a univariate method similar to Fisher's discriminant criterion (Fisher) or recursive feature elimination embedded in SVM (RFE), TSP becomes increasingly more effective as the informative genes become progressively more correlated. This is evident both in classification performance and in the ability to recover the true informative genes. We also applied the hybrid scheme to four cancer prognosis datasets, in which k-TSP+SVM outperforms the k-TSP classifier in all datasets and achieves performance comparable or superior to that of SVM alone. In concurrence with what is observed in simulation, TSP appears to be a better feature selector than Fisher and RFE in some of the cancer datasets.

Conclusions
The k-TSP ranking algorithm can be used as a computationally efficient, multivariate filter method for feature selection in machine learning. SVM in combination with the k-TSP ranking algorithm outperforms k-TSP and SVM alone in simulated datasets and in some cancer prognosis datasets. Simulation studies suggest that, as a feature selector, it is better tuned to certain data characteristics, namely correlations among informative genes, which makes it potentially interesting as an alternative feature ranking method in pathway analysis.
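The idea of ranking gene pairs by relative expression ordering, voting over the top pairs, and reusing the selected genes as a reduced feature space can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`tsp_scores`, `top_k_pairs`, `ktsp_predict`, `reduce_features`), the toy data, and the simplifications (binary labels 0/1, no secondary tie-breaking score, k fixed by hand rather than chosen by cross-validation) are all assumptions made for illustration.

```python
from itertools import combinations

def tsp_scores(X, y):
    """Score each gene pair (i, j) by |P(x_i < x_j | y=0) - P(x_i < x_j | y=1)|."""
    n_genes = len(X[0])
    by_class = {c: [row for row, label in zip(X, y) if label == c] for c in (0, 1)}
    scores = {}
    for i, j in combinations(range(n_genes), 2):
        p = [sum(row[i] < row[j] for row in by_class[c]) / len(by_class[c])
             for c in (0, 1)]
        # Also record which class the ordering x_i < x_j votes for.
        scores[(i, j)] = (abs(p[0] - p[1]), p[0] > p[1])
    return scores

def top_k_pairs(X, y, k):
    """Greedily keep the k highest-scoring pairs that share no gene."""
    ranked = sorted(tsp_scores(X, y).items(), key=lambda kv: -kv[1][0])
    used, chosen = set(), []
    for (i, j), (_, zero_if_less) in ranked:
        if i in used or j in used:
            continue
        chosen.append((i, j, zero_if_less))
        used.update((i, j))
        if len(chosen) == k:
            break
    return chosen

def ktsp_predict(pairs, row):
    """Unweighted majority vote of the selected pairs (the simple k-TSP rule)."""
    votes_for_1 = sum(0 if (row[i] < row[j]) == zero_if_less else 1
                      for i, j, zero_if_less in pairs)
    return 1 if 2 * votes_for_1 > len(pairs) else 0

def reduce_features(pairs, X):
    """Restrict X to the genes in the selected pairs, for a downstream classifier."""
    genes = sorted({g for i, j, _ in pairs for g in (i, j)})
    return genes, [[row[g] for g in genes] for row in X]

# Toy data (made up): genes 0 and 1 flip their ordering between classes.
X = [[1.0, 2.0, 5.0, 4.0], [0.5, 3.0, 1.0, 2.0], [1.2, 2.5, 3.0, 3.5],
     [3.0, 1.0, 5.0, 4.5], [2.5, 0.7, 1.0, 2.0], [4.0, 2.0, 3.0, 3.1]]
y = [0, 0, 0, 1, 1, 1]

pairs = top_k_pairs(X, y, k=1)
genes, X_reduced = reduce_features(pairs, X)
```

In a hybrid scheme of the kind described above, `X_reduced` would be passed to a multivariate classifier (for example scikit-learn's `SVC`) instead of relying on the majority vote in `ktsp_predict`. Whether the downstream classifier should see raw expression values or pairwise order indicators is a design choice this sketch does not settle.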
