Talanta: The International Journal of Pure and Applied Analytical Chemistry

Unimodal transform of variables selected by interval segmentation purity for classification tree modeling of high-dimensional microarray data



Abstract

As a greedy search algorithm, classification and regression tree (CART) is prone to overfitting when modeling microarray gene expression data. A straightforward solution is to filter out irrelevant genes by identifying the significant ones. Because some significant genes have multimodal expression patterns, with systematic differences among samples within a class, and are difficult to identify with existing methods, a strategy is proposed: unimodal transform of variables selected by interval segmentation purity (UTISP) for CART modeling. First, significant genes exhibiting varied expression patterns are identified by a variable selection method based on interval segmentation purity. Then, a unimodal transform is applied to extract unimodal feature variables for CART modeling. Because significant genes with complex expression patterns can be identified and unimodal features extracted in advance, the strategy can potentially improve the performance of CART by counteracting overfitting or underfitting when modeling microarray data. The strategy is demonstrated on two microarray data sets. The results show that UTISP-based CART outperforms k-nearest neighbors and CARTs coupled with other gene identification strategies, indicating that UTISP-based CART holds great promise for microarray data analysis.
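
The abstract does not define interval segmentation purity, so the following Python sketch is only one possible reading of the idea, not the authors' method. It assumes a hypothetical score: samples are sorted by a gene's expression value, cut into equal-count intervals, and scored by the fraction of samples that belong to the majority class of their own interval. The function name `interval_purity`, the parameter `n_intervals`, and the toy data are all illustrative assumptions. The point it tries to show is why an interval-based score could pick up a gene whose class occupies both tails of the expression range, a pattern that a single threshold would miss.

```python
from collections import Counter


def interval_purity(values, labels, n_intervals=4):
    """Hypothetical purity score (an assumption, not the paper's definition).

    Sort samples by expression value, cut them into n_intervals equal-count
    segments, and return the fraction of samples that belong to the majority
    class of their own segment.
    """
    if len(values) != len(labels) or not values:
        raise ValueError("values and labels must be non-empty and equal length")
    order = sorted(range(len(values)), key=lambda i: values[i])
    sorted_labels = [labels[i] for i in order]
    n = len(sorted_labels)
    k = min(n_intervals, n)  # never request more segments than samples
    majority_total = 0
    for j in range(k):
        segment = sorted_labels[j * n // k:(j + 1) * n // k]
        # Count of the most frequent class inside this segment.
        majority_total += Counter(segment).most_common(1)[0][1]
    return majority_total / n


# Toy gene where class "A" sits in both tails and class "B" in the middle.
bimodal_values = [0.1, 0.2, 9.8, 9.9, 4.9, 5.0, 5.1, 5.2]
bimodal_labels = ["A", "A", "A", "A", "B", "B", "B", "B"]

# Toy gene whose classes alternate along the expression axis.
noise_values = [1, 2, 3, 4, 5, 6, 7, 8]
noise_labels = ["A", "B", "A", "B", "A", "B", "A", "B"]

bimodal_score = interval_purity(bimodal_values, bimodal_labels)
noise_score = interval_purity(noise_values, noise_labels)
print(bimodal_score, noise_score)
```

Under this assumed score, genes would be ranked by purity and the top-ranked ones passed on to the unimodal transform and CART steps; the number of intervals is a tuning choice that the abstract does not specify.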
