Conference: Industrial and Systems Engineering Annual Conference and Expo

MR²: A Two-stage Feature Selection Algorithm in High-throughput Methylation Data for Max-relevance and Min-redundancy



Abstract

Recent advances reveal that DNA methylation plays an important role in regulating genome function, and that anomalous methylation levels are associated with various cancer types. Feature selection algorithms for high-throughput DNA methylation analysis help identify idiosyncratic methylation profiles associated with cancer types and subtypes. In high-dimensional, highly correlated DNA methylation data, these algorithms aim to select an efficient and comprehensive feature set that better captures the characteristics of phenotypes. In this work, we introduce MR², a two-stage feature selection algorithm based on maximum-relevance and minimum-redundancy criteria. In the first stage, features are filtered to retain those that satisfy the relevance conditions; in the second stage, the final subset of loci is selected for minimal redundancy using a k-medoids clustering algorithm that embeds a succinct uncertainty measure score. The proposed algorithm is benchmarked against principal component analysis and four other commonly used filtering methods on lung and breast cancer datasets obtained from the Gene Expression Omnibus, with classification error of support vector machine classifiers as the performance measure. Our MR² algorithm outperforms these filtering-based algorithms while also providing more interpretable results.
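The abstract does not specify MR²'s relevance conditions or its uncertainty measure score, so the following is only a minimal sketch of the general two-stage idea under stated assumptions, not the authors' method. Stage 1 keeps loci whose absolute Pearson correlation with a binary phenotype label reaches a hypothetical `threshold`; stage 2 clusters the surviving loci with k-medoids on the correlation distance 1 - |r| and keeps one medoid per cluster as its representative. The function names (`relevance_filter`, `k_medoids`, `mr2_sketch`), the correlation-based scores, the greedy BUILD-style initialisation, and the toy beta values are all assumptions made for illustration; the uncertainty measure score is omitted entirely.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is (near) constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx <= 1e-12 or syy <= 1e-12:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def relevance_filter(features: List[List[float]], labels: Sequence[int],
                     threshold: float) -> List[int]:
    """Stage 1 (max-relevance, assumed criterion): indices of loci whose
    |correlation| with the 0/1 label is at least `threshold`."""
    return [j for j, locus in enumerate(features)
            if abs(pearson(locus, labels)) >= threshold]


def k_medoids(dist: List[List[float]], k: int, max_iter: int = 100) -> List[int]:
    """Stage 2 (min-redundancy): greedy BUILD-style initialisation followed by
    alternating (Voronoi-iteration) updates on a symmetric distance matrix."""
    n = len(dist)
    k = min(k, n)
    # First medoid: the point with the smallest total distance to all others.
    medoids = [min(range(n), key=lambda c: sum(dist[c]))]
    nearest = list(dist[medoids[0]])  # distance of each point to its closest medoid
    while len(medoids) < k:
        # Add the candidate that most reduces the total distance to the nearest medoid.
        cand = max((c for c in range(n) if c not in medoids),
                   key=lambda c: sum(max(nearest[i] - dist[c][i], 0.0) for i in range(n)))
        medoids.append(cand)
        nearest = [min(nearest[i], dist[cand][i]) for i in range(n)]
    for _ in range(max_iter):
        clusters: Dict[int, List[int]] = {m: [] for m in medoids}
        for i in range(n):
            # Medoids stay in their own cluster, so no cluster is ever empty.
            target = i if i in clusters else min(medoids, key=lambda m: dist[i][m])
            clusters[target].append(i)
        # In each cluster, the new medoid minimises the summed distance to its members.
        new_medoids = [min(members, key=lambda c: sum(dist[c][j] for j in members))
                       for members in clusters.values()]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return sorted(medoids)


def mr2_sketch(features: List[List[float]], labels: Sequence[int],
               threshold: float, k: int) -> List[int]:
    """Hypothetical two-stage pipeline: relevance filter, then one medoid locus per cluster."""
    relevant = relevance_filter(features, labels, threshold)
    if len(relevant) <= k:
        return relevant
    # Correlation distance: strongly (anti-)correlated loci are close, i.e. redundant.
    dist = [[0.0 if a == b else 1.0 - abs(pearson(features[a], features[b]))
             for b in relevant] for a in relevant]
    return [relevant[m] for m in k_medoids(dist, k)]


labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = [  # rows = loci (made-up beta values), columns = samples
    [0.50, 0.30, 0.50, 0.30, 0.70, 0.50, 0.70, 0.50],
    [0.55, 0.35, 0.55, 0.35, 0.75, 0.55, 0.75, 0.55],  # locus 0 shifted: redundant
    [0.50, 0.50, 0.30, 0.30, 0.70, 0.70, 0.50, 0.50],
    [0.55, 0.55, 0.35, 0.35, 0.75, 0.75, 0.55, 0.55],  # locus 2 shifted: redundant
    [0.50, 0.60, 0.50, 0.60, 0.50, 0.60, 0.50, 0.60],  # unrelated to the label
]
print(mr2_sketch(features, labels, threshold=0.5, k=2))
```

In this toy example the unrelated locus is dropped in stage 1, and stage 2 keeps one locus from each perfectly correlated pair. Because a medoid is always an actual member of its cluster, every selected feature is a real locus rather than a blended component, which is one plausible reason a medoid-based second stage could be more interpretable than principal component analysis.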

