Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Structured Penalized Logistic Regression for Gene Selection in Gene Expression Data Analysis

Abstract

In gene expression data analysis, cancer classification and gene selection are closely related problems: successfully selecting informative genes significantly improves classification performance. Various methods have been proposed to identify informative genes from a large number of candidates. However, gene expression data may contain important correlation structures, and some genes can be divided into groups based on their biological pathways. Many existing methods do not take the exact correlation structure within the data into consideration. From both the knowledge discovery and biological perspectives, an ideal gene selection method should therefore take this structural information into account. Moreover, better generalization performance can be obtained by discovering the correlation structure within the data. To discover structural information in the data and improve learning performance, we propose a structured penalized logistic regression model that simultaneously performs feature selection and model learning for gene expression data analysis. An efficient coordinate descent algorithm has been developed to optimize the model. Numerical simulation studies demonstrate that our method is able to select highly correlated features. In addition, results on real gene expression datasets show that the proposed method performs competitively with previous approaches.
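The abstract does not state the form of the structured penalty, so the following is only a minimal sketch of the general idea: a penalized logistic regression fitted by coordinate descent. As a stand-in, it assumes an elastic-net penalty, which tends to keep groups of correlated features together; this is not the authors' penalty. The function names (`fit_enet_logistic_cd`, `soft_threshold`), the parameters `lam` and `alpha`, and the synthetic data are all hypothetical. Each coordinate update minimizes a quadratic upper bound on the logistic loss, using the fact that the loss curvature is at most 1/4.

```python
import numpy as np

def sigmoid(t):
    # Clip to avoid overflow in exp for extreme linear predictors.
    return 1.0 / (1.0 + np.exp(-np.clip(t, -30.0, 30.0)))

def soft_threshold(z, t):
    # Proximal operator of t * |.|, applied to a scalar.
    return np.sign(z) * max(abs(z) - t, 0.0)

def fit_enet_logistic_cd(X, y, lam=0.1, alpha=0.5, n_sweeps=200, tol=1e-6):
    """Coordinate descent for elastic-net penalized logistic regression.

    Assumed objective (a stand-in, not the paper's structured penalty):
      mean logistic loss + lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||_2^2)
    """
    n, p = X.shape
    beta = np.zeros(p)
    intercept = 0.0
    eta = np.zeros(n)                        # current linear predictor
    curv = (X ** 2).sum(axis=0) / (4.0 * n)  # per-coordinate curvature bound
    for _ in range(n_sweeps):
        # Unpenalized intercept step, using the curvature bound 1/4.
        step = -4.0 * np.mean(sigmoid(eta) - y)
        intercept += step
        eta += step
        max_change = abs(step)
        for j in range(p):
            if curv[j] == 0.0:
                continue                     # constant-zero column, nothing to fit
            grad = X[:, j] @ (sigmoid(eta) - y) / n
            new = soft_threshold(curv[j] * beta[j] - grad, lam * alpha)
            new /= curv[j] + lam * (1.0 - alpha)
            delta = new - beta[j]
            if delta != 0.0:
                eta += delta * X[:, j]       # keep the predictor in sync
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return intercept, beta

# Synthetic illustration: features 0-2 are noisy copies of one latent
# signal (a correlated "group"); the remaining 17 are pure noise.
rng = np.random.default_rng(0)
z = rng.standard_normal(200)
X = rng.standard_normal((200, 20))
X[:, :3] = z[:, None] + 0.1 * rng.standard_normal((200, 3))
y = (z > 0).astype(float)

intercept, beta = fit_enet_logistic_cd(X, y)
print(np.round(beta, 3))
```

In this toy setting, the ridge part of the penalty spreads weight across the three correlated features rather than picking one arbitrarily, which is the kind of grouping behavior the abstract attributes to structure-aware penalties.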
