Journal of Biomedical Informatics

GMDH-based feature ranking and selection for improved classification of medical data.

Abstract

Medical applications are often characterized by a large number of disease markers and a relatively small number of data records. We demonstrate that complete feature ranking followed by selection can lead to appreciable reductions in data dimensionality, with significant improvements in the implementation and performance of classifiers for medical diagnosis. We describe a novel approach for ranking all features according to their predictive quality using properties unique to learning algorithms based on the group method of data handling (GMDH). An abductive network training algorithm is repeatedly used to select groups of optimum predictors from the feature set at gradually increasing levels of model complexity specified by the user. Groups selected earlier are better predictors. The process is then repeated to rank features within individual groups. The resulting full feature ranking can be used to determine the optimum feature subset by starting at the top of the list and progressively including more features until the classification error rate on an out-of-sample evaluation set starts to increase due to overfitting. The approach is demonstrated on two medical diagnosis datasets (breast cancer and heart disease), and comparisons are made with other feature ranking and selection methods. Receiver operating characteristic (ROC) analysis is used to compare classifier performance. At default model complexity, dimensionality reductions of 22% and 54% could be achieved for the breast cancer and heart disease data, respectively, leading to improvements in the overall classification performance. For both datasets, considerable dimensionality reduction introduced no significant reduction in the area under the ROC curve. GMDH-based feature selection results have also proved effective with neural network classifiers.
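The subset-selection step described in the abstract (walk down the ranked list, add one feature at a time, stop when the out-of-sample error starts to rise) can be sketched as below. This is only an illustrative sketch, not the authors' implementation: the GMDH ranking itself is not reproduced, the function name `select_top_ranked_subset` and the `validation_error` callback are hypothetical, and the error values in the usage example are made-up numbers rather than results from the paper.

```python
from typing import Callable, List, Sequence, Tuple


def select_top_ranked_subset(
    ranked_features: Sequence[str],
    validation_error: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Grow a feature subset from the top of a ranking until the error rises.

    ranked_features: feature names, best predictor first (assumed to come
        from some ranking procedure, e.g. a GMDH-based one).
    validation_error: hypothetical callback that trains a classifier on the
        given features and returns its error rate on an out-of-sample set.
    """
    if not ranked_features:
        raise ValueError("ranked_features must not be empty")

    best_subset: List[str] = []
    best_error = float("inf")

    for k in range(1, len(ranked_features) + 1):
        candidate = list(ranked_features[:k])  # top-k features of the ranking
        error = validation_error(candidate)
        if error > best_error:
            # Error started to increase: treat this as the onset of
            # overfitting and stop adding features.
            break
        if error < best_error:
            # Strict improvement: remember this subset. On a tie the
            # smaller subset found earlier is kept.
            best_subset, best_error = candidate, error

    return best_subset, best_error


# Usage with a toy error curve (illustrative numbers only, keyed by subset size).
ranking = ["f3", "f1", "f4", "f2", "f5"]
toy_errors = {1: 0.30, 2: 0.22, 3: 0.18, 4: 0.21, 5: 0.25}
subset, err = select_top_ranked_subset(
    ranking, lambda feats: toy_errors[len(feats)]
)
print(subset, err)  # ['f3', 'f1', 'f4'] 0.18
```

Stopping at the first increase mirrors the wording of the abstract. On a noisy evaluation set one might instead scan the whole list and keep the global minimum, which is a different design choice from the one stated there.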