Conference: IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology

A Multivariate Feature Selection Framework for High Dimensional Biomedical Data Classification

Abstract

High-dimensional biomedical data are becoming common in predictive models developed for disease diagnosis and prognosis. Extracting knowledge from such data, which combine a large number of features with a small sample size, presents intrinsic challenges for classification models. Genetic Algorithms can search high-dimensional spaces efficiently, and multivariate classification methods can evaluate combinations of features when constructing optimized predictive models. This paper proposes a framework for building prediction models for high-dimensional biomedical data. The framework comprises three main phases: a feature filtering phase, which filters out noisy features; a feature selection phase, which uses multivariate machine learning techniques and a Genetic Algorithm to evaluate the filtered features and select the most informative feature subsets for maximum classification performance; and a predictive modeling phase, in which machine learning algorithms are trained on the selected features to construct a reliable prediction model. Experiments were conducted on four high-dimensional biomedical datasets, including protein and gene-expression data. The results revealed optimistic performance for the multivariate selection approaches that use classification measurements based on implicit assumptions.
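The three phases described in the abstract can be illustrated with a minimal sketch. The abstract does not name the filter criterion, the classifier, or the Genetic Algorithm operators, so everything below is an assumption chosen for brevity: a univariate class-separation score for filtering, a nearest-centroid classifier scored by leave-one-out accuracy as the multivariate fitness, one-point crossover with single bit-flip mutation, and synthetic data. All function names (`make_data`, `filter_features`, `ga_select`, `run_pipeline`, and so on) are hypothetical and are not taken from the paper.

```python
import random
from statistics import pstdev

def avg(values):
    values = list(values)
    return sum(values) / len(values)

def make_data(n=40, d=30, informative=(0, 1, 2), seed=0):
    """Synthetic two-class data; only the `informative` columns carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(d)]
        for j in informative:
            row[j] += 2.0 * label  # shift class 1 on the informative features
        X.append(row)
        y.append(label)
    return X, y

def filter_features(X, y, keep):
    """Phase 1 (assumed criterion): keep the `keep` best-separated features."""
    def score(j):
        a = [r[j] for r, c in zip(X, y) if c == 0]
        b = [r[j] for r, c in zip(X, y) if c == 1]
        return abs(avg(a) - avg(b)) / (pstdev(a) + pstdev(b) + 1e-9)
    return sorted(range(len(X[0])), key=score, reverse=True)[:keep]

def fit_centroids(X, y, cols):
    """Nearest-centroid model: one mean vector per class over `cols`."""
    return {c: [avg(r[j] for r, lab in zip(X, y) if lab == c) for j in cols]
            for c in set(y)}

def predict(model, row, cols):
    return min(model, key=lambda c: sum((row[j] - m) ** 2
                                        for j, m in zip(cols, model[c])))

def accuracy(model, X, y, cols):
    return avg(predict(model, r, cols) == lab for r, lab in zip(X, y))

def loo_accuracy(X, y, cols):
    """Multivariate fitness: leave-one-out accuracy on the feature subset."""
    if not cols:
        return 0.0
    hits = 0
    for i in range(len(X)):
        model = fit_centroids(X[:i] + X[i + 1:], y[:i] + y[i + 1:], cols)
        hits += predict(model, X[i], cols) == y[i]
    return hits / len(X)

def ga_select(X, y, candidates, pop=12, gens=8, seed=0):
    """Phase 2: evolve bit masks over the filtered features."""
    rng = random.Random(seed)
    def decode(mask):
        return [c for c, bit in zip(candidates, mask) if bit]
    def fitness(mask):
        return loo_accuracy(X, y, decode(mask))
    population = [[rng.random() < 0.5 for _ in candidates] for _ in range(pop)]
    for _ in range(gens):
        parents = sorted(population, key=fitness, reverse=True)[:pop // 2]
        children = []
        while len(parents) + len(children) < pop:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(candidates))  # one-point crossover
            child = a[:cut] + b[cut:]
            k = rng.randrange(len(child))
            child[k] = not child[k]                  # single bit-flip mutation
            children.append(child)
        population = parents + children              # parents survive unchanged
    return decode(max(population, key=fitness))

def run_pipeline(seed=0):
    X, y = make_data(seed=seed)
    candidates = filter_features(X, y, keep=8)         # phase 1: filtering
    selected = ga_select(X, y, candidates, seed=seed)  # phase 2: GA selection
    model = fit_centroids(X, y, selected)              # phase 3: final model
    X_new, y_new = make_data(seed=seed + 1)            # fresh held-out sample
    return candidates, selected, accuracy(model, X_new, y_new, selected)
```

Calling `run_pipeline()` returns the filtered candidates, the selected subset, and the accuracy on a freshly generated sample. In this sketch the filter and the fitness both see all training samples, so the final score is measured on separate data rather than on the data used for selection.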
