Published in: Information Sciences: An International Journal

Robust twin boosting for feature selection from high-dimensional omics data with label noise


Abstract

Omics data, such as microarray transcriptomic and mass spectrometry proteomic data, are typically characterized by high dimensionality and relatively small sample sizes. To discover biomarkers for diagnosis and prognosis from omics data, feature selection has become an indispensable step for finding a parsimonious set of informative features. However, many previous studies report considerable label noise in omics data, which can lead to unreliable inference and to the selection of uninformative features. Yet, to the best of our knowledge, very few feature selection methods have been proposed to address this problem. This paper proposes a novel ensemble feature selection algorithm, robust twin boosting feature selection (RTBFS), which is robust to label noise in omics data. The algorithm has been validated on an omics feature selection test bed and on seven real-world heterogeneous omics datasets, some of which are known to contain label noise. Compared with several state-of-the-art ensemble feature selection methods, RTBFS can select more informative features despite label noise and obtains better classification results. RTBFS is a general feature selection method and can be applied to other data with label noise. A MATLAB implementation of RTBFS and sample datasets are available at: http://www.cs.bham.ac.uk/~szh/TReBFSMatlab.zip. (C) 2014 Elsevier Inc. All rights reserved.
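The abstract does not describe how RTBFS achieves its robustness, so the following is only a hedged illustration of the general "twin boosting" idea its name refers to: run boosting once, then run it again while favouring the features that mattered in the first round. It is a minimal NumPy sketch under stated assumptions: linear componentwise base learners, squared-error loss on ±1 labels, and a hypothetical second-round reweighting by squared first-round coefficients. The names `componentwise_l2_boost` and `twin_boost_select` are illustrative and do not come from the RTBFS MATLAB package. The sketch has no label-noise-robustness mechanism, so it is not RTBFS itself.

```python
import numpy as np


def componentwise_l2_boost(X, y, n_iter=200, nu=0.1, weights=None):
    """Componentwise linear L2 boosting (illustrative sketch).

    At each step every single feature is fitted to the current residuals by
    least squares, and only the feature with the largest (optionally weighted)
    reduction in squared error is updated. Features with weight 0 are never
    selected.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)                              # centre each feature
    norms = np.maximum((Xc ** 2).sum(axis=0), 1e-12)     # guard constant columns
    resid = y - y.mean()                                 # residuals after intercept
    beta = np.zeros(p)
    w = np.ones(p) if weights is None else weights
    for _ in range(n_iter):
        proj = Xc.T @ resid                  # inner product of each feature with residuals
        gain = proj ** 2 / norms             # SSE reduction of each univariate fit
        j = int(np.argmax(gain * w))         # best feature under the (hypothetical) weights
        step = nu * proj[j] / norms[j]       # shrunken least-squares coefficient
        beta[j] += step
        resid = resid - step * Xc[:, j]
    return beta


def twin_boost_select(X, y, n_iter=200, nu=0.1, tol=1e-8):
    """Simplified two-round selector: the second boosting run reweights each
    feature's score by its normalised squared first-round coefficient."""
    beta_init = componentwise_l2_boost(X, y, n_iter, nu)
    weights = beta_init ** 2 / max((beta_init ** 2).max(), tol)
    beta_twin = componentwise_l2_boost(X, y, n_iter, nu, weights)
    selected = np.flatnonzero(np.abs(beta_twin) > tol)
    return selected, beta_init, beta_twin


# Synthetic demo: only features 0 and 1 determine the class label.
rng = np.random.default_rng(0)
n, p = 100, 50
X = rng.standard_normal((n, p))
y = np.where(X[:, 0] - X[:, 1] > 0, 1.0, -1.0)
flip = rng.random(n) < 0.1                  # flip roughly 10% of labels to simulate label noise
y_noisy = np.where(flip, -y, y)

selected, b0, b1 = twin_boost_select(X, y_noisy)
print("selected features:", selected.tolist())
```

The design choice to illustrate here is that the second round can only pick features that received a nonzero coefficient in the first round, which tends to shrink the selected set toward the strongest features. How RTBFS additionally guards against mislabelled samples is not stated in the abstract.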
