Source: BMC Bioinformatics (U.S. National Institutes of Health literature collection)

An EM algorithm to improve the estimation of the probability of clonal relatedness of pairs of tumors in cancer patients



Abstract

Many studies published over the past 20 years have examined pairs of tumors at the molecular level from a set of patients to determine whether, for some patients, the tumors are clonal, i.e., one tumor is a metastasis of the other. In this article we focus on the setting where the data comprise somatic mutations from a panel of genes. Various statistical methods have been proposed in the literature. One approach has been to characterize the evidence for clonality using an index of clonal relatedness (see [ ] and [ ]). However, in constructing the index these authors focused solely on mutations that are shared between the two tumors, ignoring the information from mutations that occur in one tumor but not the other, which is evidence against clonal relatedness. Other authors have used the proportion of observed mutations that are shared as the index [ , ], while Bao et al. [ ] formalized this idea by assuming that the matched mutations follow a binomial distribution. All of these approaches analyze each case independently. To our knowledge, the approach we discuss in this article, which improves upon Mauguen et al. [ ], is the only available method that models the data from all cases collectively to obtain parametric estimates of the proportion of cases in the population that are clonal. Our method also relies heavily on the recognition that the probabilities of occurrence of the observed mutations are crucially informative, especially for shared mutations. Motivated by a study of contralateral breast cancer that will be described in more detail in the next section, we developed a random-effects model to simultaneously analyze each case for clonal relatedness and to obtain an estimate of how frequently this occurs [ ]. The corresponding function mutation.rem has been added to the R package originally described in Ostrovnaya et al. [ ].
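As a rough illustration of the simplest index mentioned above (the proportion of observed mutations that are shared), the following Python sketch may help. It is not taken from any of the cited papers: the function name and the mutation labels are hypothetical, and each tumor is assumed to be represented as a plain set of mutation identifiers.

```python
import math


def shared_mutation_index(tumor_a, tumor_b):
    """Proportion of observed mutations that appear in both tumors.

    Hypothetical helper for illustration only. Each argument is an
    iterable of mutation labels observed in one tumor of the pair.
    """
    a, b = set(tumor_a), set(tumor_b)
    observed = a | b          # every mutation seen in either tumor
    if not observed:
        return math.nan       # no mutations observed: index is undefined
    return len(a & b) / len(observed)


# Made-up labels: one shared mutation out of three observed overall.
first_tumor = {"mutA", "mutB"}
second_tumor = {"mutB", "mutC"}
index = shared_mutation_index(first_tumor, second_tumor)
```

Note that this index treats every mutation alike, so a match on a rare mutation counts the same as a match on a common one. That is the limitation the text points to when it says the probabilities of occurrence of the observed mutations are crucially informative.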
Overall, the properties of this model were shown to be quite good, in the sense that parameter estimation generally has low bias except in small samples, i.e., when only a few cases from the population are available [ ]. Recently, in applying the model anecdotally, we noticed that in such small datasets examples can arise where the maximum likelihood estimator of the proportion of clonal cases is zero, even when mutational matches have been observed in some cases. This tends to occur when the absolute number of cases with matches is small, because the overall number of cases is small, because the proportion of cases that are clonal is small, or because in clonal cases the proportion of mutations that are matches is small. This is problematic because it forces the estimated probabilities of clonal relatedness to be exactly zero for all individual cases, an estimate that seems unreasonable, especially if matches on rare mutations have been observed. We thus became interested in alternative estimation methods. In this article we compare estimates obtained by the EM algorithm with those from our first approach, which uses a one-step estimate of the conditional likelihood.
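To make the role of the estimated proportion concrete, here is a minimal Python sketch of an EM iteration for a deliberately simplified two-component mixture. It is not the random-effects model of the paper: it assumes that each case already comes with a fixed, positive likelihood under a "clonal" and an "independent" hypothesis, and the function names, input values, and starting point are all hypothetical.

```python
def posterior_clonal(pi, lik_clonal, lik_indep):
    """Per-case posterior probability of clonality, by Bayes' rule.

    pi is the assumed proportion of clonal cases; the two lists hold the
    (assumed known, positive) likelihood of each case under each hypothesis.
    """
    return [
        pi * lc / (pi * lc + (1.0 - pi) * li)
        for lc, li in zip(lik_clonal, lik_indep)
    ]


def em_clonal_proportion(lik_clonal, lik_indep, pi_init=0.5,
                         tol=1e-10, max_iter=10000):
    """EM estimate of the mixing proportion in the simplified mixture."""
    pi = pi_init
    for _ in range(max_iter):
        # E-step: posterior probability that each case is clonal.
        weights = posterior_clonal(pi, lik_clonal, lik_indep)
        # M-step: the updated proportion is the average posterior weight.
        new_pi = sum(weights) / len(weights)
        if abs(new_pi - pi) < tol:
            return new_pi
        pi = new_pi
    return pi


# Made-up likelihoods for four cases (illustrative values only).
lik_clonal = [0.9, 0.8, 0.1, 0.05]
lik_indep = [0.1, 0.2, 0.9, 0.95]
pi_hat = em_clonal_proportion(lik_clonal, lik_indep)
case_probabilities = posterior_clonal(pi_hat, lik_clonal, lik_indep)
```

Under these assumptions, `posterior_clonal(0.0, ...)` returns zero for every case no matter how strongly an individual case favors clonality. This mirrors the difficulty described above when the estimated proportion of clonal cases is exactly zero.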
