International Conference on Intelligent Systems for Molecular Biology
Analysis of Gene Expression Microarrays for Phenotype Classification

Abstract

Several microarray technologies that monitor the expression level of a large number of genes have recently emerged. Given DNA-microarray data for a set of cells characterized by a given phenotype and for a set of control cells, an important problem is to identify "patterns" of gene expression that can be used to predict cell phenotype. The potential number of such patterns is exponential in the number of genes. In this paper, we propose a solution to this problem based on a supervised learning algorithm, which differs substantially from previous schemes. It couples a complex, non-linear similarity metric, which maximizes the probability of discovering discriminative gene expression patterns, with a pattern discovery algorithm called SPLASH. The latter efficiently and deterministically discovers all statistically significant gene expression patterns in the phenotype set. Statistical significance is evaluated based on the probability that a pattern occurs by chance in the control set. Finally, a greedy set covering algorithm is used to select an optimal subset of statistically significant patterns, which form the basis for a standard likelihood ratio classification scheme. We analyze data from 60 human cancer cell lines using this method, and compare our results with those of other supervised learning schemes. Different phenotypes are studied. These include cancer morphologies (such as melanoma), molecular targets (such as mutations in the p53 gene), and therapeutic targets related to the sensitivity to anticancer compounds. We also analyze a synthetic data set that shows that this technique is especially well suited for the analysis of sub-phenotype mixtures. For complex phenotypes, such as p53, our method produces an encouragingly low rate of false positives and false negatives and appears to outperform the other schemes. Similarly low rates are reported when predicting the efficacy of experimental anticancer compounds. This is among the first reported studies in which drug efficacy has been successfully predicted from large-scale expression data analysis.
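
The greedy set covering step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is the textbook greedy heuristic, which repeatedly picks the pattern that matches the most phenotype samples not yet covered, and it yields a small cover rather than a provably minimal one. The function name `greedy_set_cover`, the pattern identifiers, and the toy data are all hypothetical, and the patterns are assumed to have already passed a significance filter.

```python
from typing import Dict, Hashable, List, Set


def greedy_set_cover(
    patterns: Dict[Hashable, Set[Hashable]],
    samples: Set[Hashable],
) -> List[Hashable]:
    """Select patterns one at a time, each covering the most uncovered samples.

    `patterns` maps a (hypothetical) pattern id to the set of phenotype
    samples that the pattern matches. Returns the chosen pattern ids in
    the order they were selected.
    """
    uncovered = set(samples)
    chosen: List[Hashable] = []
    while uncovered and patterns:
        # Pattern matching the largest number of still-uncovered samples.
        best = max(patterns, key=lambda p: len(patterns[p] & uncovered))
        gain = patterns[best] & uncovered
        if not gain:
            break  # the remaining samples are matched by no pattern
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical toy input, for illustration only.
toy_patterns = {
    "p1": {"s1", "s2", "s3"},
    "p2": {"s3", "s4"},
    "p3": {"s4", "s5"},
    "p4": {"s1"},
}
toy_samples = {"s1", "s2", "s3", "s4", "s5"}

selected = greedy_set_cover(toy_patterns, toy_samples)
print(selected)
```

In a pipeline like the one the abstract describes, the selected patterns would then feed the likelihood ratio classifier; that step is not shown here because the abstract does not specify how the likelihoods are estimated.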
