International Conference on Machine Learning, Optimization, and Data Science

Variable Selection and Outlier Detection in Regularized Survival Models: Application to Melanoma Gene Expression Data

Abstract

The importance of gene expression data analysis for oncological diagnosis and treatment has become widely accepted in recent years. One of the main associated challenges is the development of mathematical and statistical methods for data analysis to improve prognosis and guide treatment decisions. One difficulty that researchers face when dealing with gene expression datasets is their high dimensionality. In this context, the goal of this work is to reduce the dimensionality of gene expression data using regularization techniques such as Lasso and Elastic net, complemented with DegreeCox, a recently proposed network-based regularization method for survival analysis. In addition, the identification of long- or short-term survivors (outliers) may lead to the detection of new prognostic factors, and the Rank Product test is used to identify those observations. An example based on The Cancer Genome Atlas (TCGA) Melanoma dataset is presented, where the covariates are patients' gene expression. The application of data reduction techniques to the Melanoma dataset enabled the selection of relevant genes over a range of parameters evaluated, with 5 in common between elastic net regularization and DegreeCox for one of the two models further evaluated. Moreover, a long-term survivor was detected as an outlier by the Rank Product test, being systematically highly ranked for the martingale residuals of the models evaluated.
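
The idea of combining per-model ranks of martingale residuals into a single outlier score can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' implementation: the function name `rank_product`, the toy residual values, the convention that rank 1 goes to the most negative residual (fewer observed events than the model expects, as for a long-term survivor), and the use of the geometric mean of ranks as the score. The significance testing step of the Rank Product test is not shown.

```python
import numpy as np

def rank_product(residuals):
    """Geometric mean of per-model ranks for each observation.

    residuals: array of shape (n_observations, n_models), e.g. martingale
    residuals from several fitted survival models (hypothetical input).
    Rank 1 is assigned to the most negative residual in each model
    (an assumed convention); ties are broken by position.
    """
    residuals = np.asarray(residuals, dtype=float)
    # argsort applied twice yields 0-based ranks within each column
    ranks = residuals.argsort(axis=0).argsort(axis=0) + 1
    # geometric mean of the ranks across models, computed in log space
    return np.exp(np.log(ranks).mean(axis=1))

# Toy residuals for 4 observations under 3 models (made-up numbers)
toy = np.array([
    [ 0.4,  0.3,  0.5],
    [-2.1, -1.8, -2.5],
    [ 0.1, -0.2,  0.0],
    [-0.5, -0.4, -0.6],
])

rp = rank_product(toy)
# The observation with the smallest rank product is the one that is
# consistently ranked as most extreme across all models
most_outlying = int(np.argmin(rp))
```

In this toy input the second observation has the most negative residual under every model, so it receives the smallest rank product. An observation that is extreme under only one model would score noticeably higher, which is the motivation for combining ranks across models.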