Maximum entropy density estimation and modeling geographic distributions of species.

Abstract

The maximum entropy (maxent) approach, which is formally equivalent to maximum likelihood, is a widely used density-estimation method. When input datasets are small, maxent is likely to overfit. Overfitting can be eliminated by various smoothing techniques, such as regularization and constraint relaxation, but the theory explaining their properties is often missing or has to be derived separately for each case. In this dissertation, we propose a unified treatment of a large, general class of smoothing techniques. We provide fully general guarantees on their statistical performance and propose optimization algorithms with complete convergence proofs. As special cases, we easily derive performance guarantees for many known regularization types, including L1 and L2-squared regularization. Furthermore, our general approach enables us to derive entirely new regularization functions with superior statistical guarantees. The new regularization functions use information about the structure of the feature space, incorporate information about sample selection bias, and combine information across several related density-estimation tasks. We propose algorithms that solve a large, general subclass of generalized maxent problems, including all those discussed in the dissertation, and prove their convergence. Our convergence proofs generalize techniques based on information geometry and Bregman divergences, as well as techniques based more directly on compactness.

As an application of maxent, we discuss an important problem in ecology and conservation: modeling the geographic distributions of species. Here, small sample sizes hinder accurate modeling of rare and endangered species. Generalized maxent offers several advantages over previous techniques. In particular, it addresses the problem in a statistically sound manner and allows principled extensions to situations where data collection is biased or where data on many related species are available.
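
The claims above, that maxent is formally equivalent to maximum likelihood and that regularization can be viewed as constraint relaxation, can be stated concretely on a finite domain. The display below is a standard formulation written for illustration, not a quotation from the dissertation, and its notation is chosen here: X is a finite set of points (for species modeling, for example, grid cells), Δ_X is the set of probability distributions on X, q_0 is a default distribution on X (uniform in plain maxent), f = (f_1, ..., f_n) is a vector of features, x_1, ..., x_m are the sample points, π̃[f_j] = (1/m) Σ_i f_j(x_i) is the empirical average of f_j, p[f_j] is its expectation under p, D(p‖q_0) is relative entropy, and β_j ≥ 0 is the slack allowed for feature j.

```latex
\begin{align*}
q_\lambda(x) &= \frac{q_0(x)\, e^{\lambda\cdot f(x)}}{Z_\lambda},
  \qquad Z_\lambda = \sum_{x\in X} q_0(x)\, e^{\lambda\cdot f(x)} \\
\frac{1}{m}\sum_{i=1}^{m}\log q_\lambda(x_i)
  &= \lambda\cdot\tilde{\pi}[f] - \log Z_\lambda + \frac{1}{m}\sum_{i=1}^{m}\log q_0(x_i) \\
\text{(P)}\quad & \min_{p\in\Delta_X}\; D(p\,\|\,q_0)
  \quad\text{s.t.}\quad \bigl|p[f_j]-\tilde{\pi}[f_j]\bigr| \le \beta_j,\quad j=1,\dots,n \\
\text{(D)}\quad & \max_{\lambda\in\mathbb{R}^n}\; \lambda\cdot\tilde{\pi}[f] - \log Z_\lambda - \sum_{j=1}^{n}\beta_j\,\lvert\lambda_j\rvert
\end{align*}
```

(P) and (D) are convex duals with equal optimal values, and when (D) attains its maximum at λ*, the solution of (P) is q_{λ*}. The last term of the log-likelihood line does not depend on λ, so with every β_j = 0, (P) is plain maxent (exact moment matching) and (D) is unregularized maximum likelihood over the Gibbs family q_λ; this is the sense in which the two are equivalent. Replacing the penalty Σ_j β_j|λ_j| in (D) by (α/2)‖λ‖₂² gives L2-squared regularization; in (P) this corresponds to dropping the box constraints and adding ‖p[f] - π̃[f]‖₂²/(2α) to the objective.
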
The utility of our unified approach is demonstrated in comprehensive experiments on large real-world datasets. We find that generalized maxent is among the best-performing species-distribution modeling techniques. Our experiments also show that each contribution of this dissertation (regularization strategies, bias-removal approaches, and multiple-estimation techniques) significantly improves the predictive performance of maxent.
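
To make the regularized estimator concrete, here is a minimal pure-Python sketch of L1-regularized maxent over a small finite domain, such as a handful of grid cells. It minimizes the negated penalized log-likelihood, log Z(λ) - λ·(empirical feature means) + Σ_j β_j|λ_j|, by proximal gradient descent (ISTA), a generic solver chosen here for brevity; it is not the dissertation's algorithm. The helper names `gibbs` and `fit_l1_maxent`, the toy data, and the assumption that every feature lies in [0, 1] are all hypothetical. The optional `default` argument replaces the uniform default distribution; weighting it by sampling effort is one way one might account for biased data collection, but treating it that way is an assumption here, not the dissertation's stated method.

```python
import math


def gibbs(lam, feats, default):
    """Gibbs distribution q_lam(x) proportional to default(x) * exp(lam . f(x)).

    `default` must be strictly positive on every point (it enters through a log).
    """
    scores = [math.log(d) + sum(l * v for l, v in zip(lam, fx))
              for d, fx in zip(default, feats)]
    top = max(scores)  # shift by the max score to avoid overflow in exp
    weights = [math.exp(s - top) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]


def fit_l1_maxent(feats, samples, beta, default=None, iters=5000):
    """Hypothetical helper: L1-regularized maxent on a finite domain.

    Minimizes log Z_lam - lam . emp + sum_j beta[j] * |lam[j]| with proximal
    gradient descent (ISTA, a swapped-in generic solver). Assumes every
    feature value lies in [0, 1]. Returns (lam, q), where q is the fitted
    distribution over the points.
    """
    n_points, n_feats = len(feats), len(feats[0])
    if default is None:
        default = [1.0 / n_points] * n_points  # uniform default: plain maxent
    # Empirical feature averages over the sample points.
    emp = [sum(feats[i][j] for i in samples) / len(samples)
           for j in range(n_feats)]
    # The Hessian of log Z is Cov_q(f); its largest eigenvalue is at most
    # sum_j Var(f_j) <= n_feats / 4 for [0, 1]-valued features, so 4 / n_feats
    # is a safe step size (1 / Lipschitz constant of the gradient).
    step = 4.0 / n_feats
    lam = [0.0] * n_feats
    for _ in range(iters):
        q = gibbs(lam, feats, default)
        model = [sum(q[i] * feats[i][j] for i in range(n_points))
                 for j in range(n_feats)]
        for j in range(n_feats):
            z = lam[j] - step * (model[j] - emp[j])     # gradient step
            shrunk = max(abs(z) - step * beta[j], 0.0)  # soft-thresholding
            lam[j] = math.copysign(shrunk, z)
    return lam, gibbs(lam, feats, default)


if __name__ == "__main__":
    # Toy domain: six grid cells, each described by two features in [0, 1].
    cells = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
    occurrences = [3, 3, 2, 4]  # indices of cells with recorded presences
    lam, q = fit_l1_maxent(cells, occurrences, beta=[0.05, 0.05])
    print("lambda:", lam)
    print("q:", q)
```

At a fixed point of the update, each fitted feature mean lies within β_j of the empirical mean, which is the constraint-relaxation view of the same estimator; with all β_j = 0 the sketch reduces to unregularized maximum likelihood (exact moment matching), and a large enough β_j drives λ_j to exactly zero.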