Journal: 《生物物理学报》 (Acta Biophysica Sinica)

Feature Selection for Protein Mass Spectrometry Data Based on a Sparse Representation Algorithm

Abstract

Feature selection for high-dimensional, small-sample data is widely used in the analysis of protein mass spectrometry data. To address feature selection for protein mass spectra, this paper proposes a sparse representation based feature selection algorithm (SRFS), built on the sparse representation framework. SRFS treats a feature as important or informative if the feature subsets containing it perform well in a sparse representation classifier (SRC). The method uses the SRC classification result as a measure of the relative importance of the features in a given feature subspace, then aggregates the results over a large number of randomly sampled subspaces to rank every feature in the feature space, and from this ranking extracts several spectral peaks closely associated with cancer. Experiments on the public ovarian cancer dataset OC-WCX2a and on the breast cancer dataset BC-WCX2a, supplied by Zhejiang Cancer Hospital, show that SRFS can be applied effectively to the SELDI-TOF protein mass spectrometry data analyzed in this paper and can select highly predictive, representative feature sets.
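
The abstract describes the procedure only in outline, so the following Python sketch is an illustration under stated assumptions, not the authors' implementation. It assumes that each random feature subset is scored by the leave-one-out accuracy of an SRC, that a feature's score is the mean accuracy of the subsets containing it, and that the sparse coding step is solved with a simple ISTA loop. The function names (`ista_lasso`, `src_predict`, `srfs_rank`) and all parameter values are hypothetical.

```python
import numpy as np


def ista_lasso(A, y, lam=0.01, n_iter=100):
    """Approximately solve min_x 0.5*||A x - y||^2 + lam*||x||_1 with ISTA."""
    step = 1.0 / (np.linalg.norm(A, 2) ** 2 + 1e-12)  # 1 / Lipschitz constant
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - y))                        # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft threshold
    return x


def src_predict(X_train, y_train, x_test, lam=0.01):
    """SRC: code the test sample over the training samples, then pick the
    class whose training samples reconstruct it with the smallest residual."""
    A = X_train.T                                      # columns = training samples
    A = A / (np.linalg.norm(A, axis=0) + 1e-12)        # unit-norm dictionary atoms
    t = x_test / (np.linalg.norm(x_test) + 1e-12)
    coef = ista_lasso(A, t, lam)
    classes = np.unique(y_train)
    residuals = [
        np.linalg.norm(t - A[:, y_train == c] @ coef[y_train == c])
        for c in classes
    ]
    return classes[int(np.argmin(residuals))]


def srfs_rank(X, y, n_subsets=200, subset_size=5, lam=0.01, seed=0):
    """Rank features by the mean SRC accuracy of the random subsets containing
    them (assumed scoring rule; the abstract does not specify the statistic)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    score_sum = np.zeros(d)
    counts = np.zeros(d)
    for _ in range(n_subsets):
        feats = rng.choice(d, size=subset_size, replace=False)
        Xs = X[:, feats]
        correct = 0
        for i in range(n):                             # leave-one-out evaluation
            keep = np.arange(n) != i
            correct += int(src_predict(Xs[keep], y[keep], Xs[i], lam) == y[i])
        score_sum[feats] += correct / n
        counts[feats] += 1
    scores = score_sum / np.maximum(counts, 1)         # unsampled features score 0
    return np.argsort(-scores), scores
```

Calling `srfs_rank(X, y)` with `X` as a samples-by-peaks intensity matrix and `y` as the class labels returns the feature indices ordered from highest to lowest score, together with the scores themselves. The top-ranked indices would correspond to candidate peaks; the subset size and the number of subsets need to be tuned to the data.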
