Binary prediction tree modeling with many predictors and its uses in clinical and genomic applications

Abstract

The statistical analysis described and claimed is a predictive statistical tree model that overcomes several problems observed in prior statistical models and regression analyses while ensuring greater accuracy and predictive capability. Although the claimed use of the predictive statistical tree model described herein is directed to the prediction of a disease in individuals, the claimed model can be used for a variety of applications, including the prediction of disease states, susceptibility to disease states, or any other biological state of interest, as well as other applicable non-biological states of interest. The model first screens genes to reduce noise, applies k-means correlation-based clustering targeting a large number of clusters, and then uses singular value decomposition (SVD) to extract the single dominant factor (principal component) from each cluster. This generates a statistically significant number of cluster-derived singular factors, which we refer to as metagenes, that characterize multiple patterns of expression of the genes across samples. The strategy aims to extract multiple such patterns while reducing dimension and smoothing out gene-specific noise through aggregation within clusters. Formal predictive analysis then uses these metagenes in a Bayesian classification tree analysis. This generates multiple recursive partitions of the sample into subgroups (the "leaves" of the classification tree) and associates Bayesian predictive probabilities of outcomes with each subgroup. Overall predictions for an individual sample are then generated by averaging predictions, with appropriate weights, across many such tree models. The model includes the use of iterative out-of-sample cross-validation predictions: each sample is left out of the data set one at a time, the model is refitted from the remaining samples, and the refitted model is used to predict the hold-out case. This rigorously tests the predictive value of a model and mirrors the real-world prognostic context, where prediction of new cases as they arise is the major goal.
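The metagene construction described in the abstract (screen, cluster by correlation, take the dominant SVD factor per cluster) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not say how genes are screened, so a simple variance filter is assumed here; the function names, the plain Lloyd-style k-means loop, the parameters `k` and `n_keep`, and the synthetic data are all hypothetical. Rows are centered and scaled to unit norm so that Euclidean k-means groups genes by correlation. The Bayesian tree analysis and the leave-one-out refitting are not covered.

```python
import numpy as np

def screen_genes(X, n_keep):
    """Keep the n_keep genes (rows) with the largest variance across samples.
    Assumption: the abstract does not specify the screening rule."""
    order = np.argsort(X.var(axis=1))[::-1]
    return X[np.sort(order[:n_keep])]

def standardize_rows(X):
    """Center each gene and scale it to unit norm, so that squared Euclidean
    distance between two rows equals 2 - 2 * (their correlation)."""
    Z = X - X.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave constant genes as all-zero rows
    return Z / norms

def kmeans_labels(Z, k, n_iter=50, seed=0):
    """Plain k-means on the standardized rows; returns a cluster label per gene."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = Z[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels

def metagenes(X, k, n_keep):
    """Return one dominant singular factor per non-empty cluster.
    Result has shape (n_nonempty_clusters, n_samples)."""
    Z = standardize_rows(screen_genes(X, n_keep))
    labels = kmeans_labels(Z, k)
    factors = []
    for j in range(k):
        block = Z[labels == j]
        if block.shape[0] == 0:
            continue  # k-means can leave a cluster empty
        # Rows of vt are right singular vectors; vt[0] is the dominant
        # expression pattern of this cluster across samples.
        _, _, vt = np.linalg.svd(block, full_matrices=False)
        factors.append(vt[0])
    return np.array(factors)

# Synthetic example: 40 genes driven by 2 latent patterns over 12 samples.
rng = np.random.default_rng(1)
n_samples = 12
patterns = rng.normal(size=(2, n_samples))
X = np.vstack([patterns[i] + 0.3 * rng.normal(size=(20, n_samples))
               for i in range(2)])
F = metagenes(X, k=4, n_keep=30)
print(F.shape)
```

Each row of `F` is a unit-length vector with one entry per sample; in the workflow the abstract describes, such rows would serve as the predictors passed to the tree analysis. The sign of a singular vector is arbitrary, so a metagene may need to be flipped before it is interpreted.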

Bibliographic details

  • Publication number: US2005170528A1

    Patent type:

  • Publication date: 2005-08-04

    Original format: PDF

  • Applicant/assignee: MIKE WEST; JOSEPH R. NEVINS

    Application number: US20030692002

  • Inventors: MIKE WEST; JOSEPH R. NEVINS

    Filing date: 2003-10-24

  • Classification: G06F19/00; G01N33/48; G01N33/50; G01N33/543

  • Country: US

  • Date indexed: 2022-08-21 22:22:16
