首页> 外文会议>Annual international conference on research in computational molecular biology >Fast Phylogenetic Biodiversity Computations Under a Non-uniform Random Distribution
【24h】

Fast Phylogenetic Biodiversity Computations Under a Non-uniform Random Distribution

机译:非均匀随机分布下的快速系统发生生物多样性计算

获取原文

摘要

Computing the phylogenetic diversity of a set of species is an important part of many ecological case studies. More specifically, let T be a phylogenetic tree, and let R be a subset of its leaves representing the species under study. Specialists in ecology want to evaluate a function f(T, R) (a phylogenetic measure) that quantifies the evolutionary distance between the elements in R. But, in most applications, it is also important to examine how f(T,R) behaves when R is selected at random. The standard way to do this is to compute the mean and the variance of / among all subsets of leaves in T that consist of exactly |R| = r elements. For certain measures, there exist algorithms that can compute these statistics, under the condition that all subsets of r leaves are equiprobable. Yet, so far there are no algorithms that can do this exactly when the leaves in T are weighted with unequal probabilities. As a consequence, for this general setting, specialists try to compute the statistics of phylogenetic measures using methods which axe both inexact and very slow. We present for the first time exact and efficient algorithms for computing the mean and the variance of phylogenetic measures when leaf subsets of fixed size are selected from T under a non-uniform random distribution. In particular, let T be a tree that has n nodes and depth d, and let r be a non-negative integer. We show how to compute in O( (d + log n)n log n) time and O(n) space the mean and the variance for any measure that belongs to a well-defined class. We show that two of the most popular phylogenetic measures belong to this class: the Phylogenetic Diversity (PD) and the Mean Pairwise Distance (MPD). The random distribution that we consider is the Poisson binomial distribution restricted to subsets of fixed size r. More than that, we provide a stronger resu specifically for the PD and the MPD we describe algorithms that compute in a batched manner the mean and variance on T for all possible leaf-subset sizes in O((d + log n)n log n) time and O(n) space. For the PD and MPD, we implemented our algorithms that perform batched computations of the mean and variance. We also developed alternative implementations that compute in O((d + log n)n~2) time the same output. For both types of implementations, we conducted experiments and measured their performance in practice. Despite the difference in the theoretical performance, we show that the algorithms that run in O((d+log n)n~2) time are more efficient in practice, and numerically more stable. We also compared the performance of these algorithms with standard inexact methods that can be used in case studies. We show that our algorithms are outstandingly faster, making it possible to process much larger datasets than before. Our implementations will become publicly available through the R package PhyloMeasures.
机译:计算一组物种的系统发育多样性是许多生态案例研究的重要组成部分。更具体地说,令T为系统发育树,令R为代表研究物种的其叶的子集。生态学专家希望评估函数f(T,R)(系统发育度量),以量化R中元素之间的进化距离。但是,在大多数应用中,检查f(T,R)的行为也很重要当随机选择R时。执行此操作的标准方法是计算T中所有叶子的子集的均值和/的方差,这些子集正好由| R |组成= r个元素。对于某些度量,存在可以在r个叶子的所有子集均等的条件下计算这些统计信息的算法。但是,到目前为止,当T中的叶子以不等概率加权时,还没有算法可以准确地做到这一点。结果,对于这种一般情况,专家们尝试使用既不精确又非常缓慢的方法来计算系统发育测量的统计数据。我们首次提出了精确而有效的算法,用于在非均匀随机分布下从T中选择固定大小的叶子集时计算系统发育测度的均值和方差。特别地,令T为具有n个节点且深度为d的树,令r为非负整数。我们展示了如何在O((d + log n)n log n)时间和O(n)空间中计算属于明确定义的类的任何度量的均值和方差。我们显示,最流行的系统发育度量中的两个属于此类:系统发育多样性(PD)和平均成对距离(MPD)。我们考虑的随机分布是泊松二项分布,它局限于固定大小r的子集。不仅如此,我们提供了更强大的结果;特别是对于PD和MPD,我们描述了一种算法,该算法以分批方式计算O((d + log n)n log n)时间和O(n)空间中所有可能的叶子子集大小的T的均值和方差。对于PD和MPD,我们实现了对均值和方差进行批量计算的算法。我们还开发了在O((d + log n)n〜2)时间内计算相同输出的替代实现。对于这两种实现方式,我们进行了实验并测量了它们在实践中的性能。尽管理论上的性能有所不同,但我们证明,在O((d + log n)n〜2)时间内运行的算法在实践中更有效,并且在数值上更稳定。我们还将这些算法的性能与可用于案例研究的标准不精确方法进行了比较。我们证明了我们的算法速度更快,可以处理比以前更大的数据集。我们的实现将通过R软件包PhyloMeasures公开提供。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号