PLoS Computational Biology

Maximum-Likelihood Model Averaging To Profile Clustering of Site Types across Discrete Linear Sequences


Abstract

A major analytical challenge in computational biology is the detection and description of clusters of specified site types, such as polymorphic or substituted sites within DNA or protein sequences. Progress has been stymied by a lack of suitable methods to detect clusters and to estimate the extent of clustering in discrete linear sequences, particularly when there is no a priori specification of cluster size or cluster count. Here we derive and demonstrate a maximum likelihood method of hierarchical clustering. Our method incorporates a tripartite divide-and-conquer strategy that models sequence heterogeneity, delineates clusters, and yields a profile of the level of clustering associated with each site. The clustering model may be evaluated via model selection using the Akaike Information Criterion, the corrected Akaike Information Criterion, and the Bayesian Information Criterion. Furthermore, model averaging using weighted model likelihoods may be applied to incorporate model uncertainty into the profile of heterogeneity across sites. We evaluated our method by examining its performance on a number of simulated datasets as well as on empirical polymorphism data from diverse natural alleles of the Drosophila alcohol dehydrogenase gene. Our method yielded greater power for the detection of clustered sites across a breadth of parameter ranges, and achieved better accuracy and precision of estimation of clusters, than did the existing empirical cumulative distribution function statistics.
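The model selection and model averaging steps mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard textbook forms of AIC, AICc, and BIC, and it assumes that "weighted model likelihoods" behave like conventional Akaike-style weights. The function names, the candidate log-likelihoods, and the per-site profiles are all hypothetical values chosen for illustration.

```python
import math


def aic(log_lik, k):
    """Akaike Information Criterion for a model with k free parameters."""
    return 2 * k - 2 * log_lik


def aicc(log_lik, k, n):
    """Small-sample corrected AIC; requires n - k - 1 > 0."""
    return aic(log_lik, k) + (2 * k * (k + 1)) / (n - k - 1)


def bic(log_lik, k, n):
    """Bayesian Information Criterion for n observations (sites)."""
    return k * math.log(n) - 2 * log_lik


def criterion_weights(scores):
    """Convert criterion scores (lower is better) into normalized weights."""
    best = min(scores)
    relative = [math.exp(-0.5 * (s - best)) for s in scores]
    total = sum(relative)
    return [r / total for r in relative]


def averaged_profile(profiles, weights):
    """Weighted average of per-site profiles, one profile per candidate model."""
    n_sites = len(profiles[0])
    return [
        sum(w * p[i] for w, p in zip(weights, profiles))
        for i in range(n_sites)
    ]


# Hypothetical candidates: (log-likelihood, number of parameters, per-site rate).
n_sites = 6
candidates = [
    (-4.2, 1, [0.33] * 6),                       # no clustering: one rate
    (-2.9, 3, [0.10, 0.10, 0.80, 0.80, 0.10, 0.10]),  # one cluster in the middle
]

scores = [aic(ll, k) for ll, k, _ in candidates]
weights = criterion_weights(scores)
profile = averaged_profile([p for _, _, p in candidates], weights)
print(weights, profile)
```

Swapping `aic` for `aicc` or `bic` in the `scores` line changes how strongly extra cluster parameters are penalized, which in turn shifts the weights and the averaged profile.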
