Modeling Linguistic Variables With Regression Models: Addressing Non-Gaussian Distributions, Non-independent Observations, and Non-linear Predictors With Random Effects and Generalized Additive Models for Location, Scale, and Shape

Abstract

As statistical approaches are increasingly used in linguistics, attention must be paid to the choice of methods and algorithms. This is especially true because these methods require assumptions to be satisfied in order to provide valid results, and because scientific articles still often fail to report whether such assumptions are met. Progress is, however, being made in various directions, one of them being the introduction of techniques able to model data that cannot be properly analyzed with simpler linear regression models. We report recent advances in statistical modeling in linguistics. We first describe linear mixed-effects regression models (LMM), which address grouping of observations, and generalized linear mixed-effects models (GLMM), which offer a family of distributions for the dependent variable. Generalized additive models (GAM) are then introduced, which allow modeling non-linear parametric or non-parametric relationships between the dependent variable and the predictors. We then highlight the possibilities offered by generalized additive models for location, scale, and shape (GAMLSS). We explain how they make it possible to go beyond common distributions, such as Gaussian or Poisson, and offer the appropriate inferential framework to account for ‘difficult’ variables such as count data with strong overdispersion. We also demonstrate how they offer interesting perspectives on data when not only the mean of the dependent variable is modeled, but also its variance, skewness, and kurtosis. As an illustration, the case of phonemic inventory size is analyzed throughout the article. For over 1,500 languages, we consider as predictors the number of speakers, the distance from Africa, an estimation of the intensity of language contact, and linguistic relationships. We discuss the use of random effects to account for genealogical relationships, the choice of appropriate distributions to model count data, and non-linear relationships.
Relying on GAMLSS, we assess a range of candidate distributions, including the Sichel, Delaporte, Box-Cox Green and Cole, and Box-Cox t distributions. We find that the Box-Cox t distribution, with appropriate modeling of its parameters, best fits the conditional distribution of phonemic inventory size. We finally discuss the specificities of phoneme counts, weak effects, and how GAMLSS should be considered for other linguistic variables.
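
To illustrate why count data with strong overdispersion call for distributions beyond the Poisson, the following is a minimal sketch that relies only on the Python standard library. It is not the article's analysis: the counts are synthetic (a gamma-Poisson mixture with made-up parameters), the negative binomial stands in for the richer families named above, and all function names are hypothetical. Parameters are estimated by the method of moments rather than by maximum likelihood, so the AIC values are only approximate.

```python
import math
import random


def poisson_loglik(counts, mu):
    """Log-likelihood of i.i.d. Poisson(mu) counts."""
    return sum(k * math.log(mu) - mu - math.lgamma(k + 1) for k in counts)


def negbin_loglik(counts, mu, size):
    """Log-likelihood of negative binomial counts with mean mu and
    variance mu + mu**2 / size (smaller size means more overdispersion)."""
    log_p = math.log(size / (size + mu))
    log_q = math.log(mu / (size + mu))
    return sum(
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * log_p + k * log_q
        for k in counts
    )


def aic(loglik, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * loglik


def simulate_counts(n, mu, size, seed=0):
    """Synthetic overdispersed counts: Poisson draws whose rate is gamma-distributed."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        lam = rng.gammavariate(size, mu / size)  # shape=size, scale=mu/size, mean=mu
        # Knuth's multiplication method for a single Poisson(lam) draw
        threshold, k, p = math.exp(-lam), 0, rng.random()
        while p > threshold:
            k += 1
            p *= rng.random()
        counts.append(k)
    return counts


def moments(counts):
    """Sample mean and unbiased sample variance."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((k - mean) ** 2 for k in counts) / (n - 1)
    return mean, var


# Assumed, purely illustrative parameters (not taken from the article).
counts = simulate_counts(n=500, mu=30.0, size=4.0)
mean, var = moments(counts)
size_hat = mean ** 2 / (var - mean)  # moment estimate; only valid when var > mean

aic_poisson = aic(poisson_loglik(counts, mean), n_params=1)
aic_negbin = aic(negbin_loglik(counts, mean, size_hat), n_params=2)

print(f"dispersion index (variance / mean): {var / mean:.2f}")
print(f"AIC Poisson: {aic_poisson:.1f}   AIC negative binomial: {aic_negbin:.1f}")
```

A dispersion index well above 1 signals overdispersion, and the lower AIC of the two-parameter model reflects the better fit obtained once the variance is no longer tied to the mean. GAMLSS takes this idea further by letting each distribution parameter depend on predictors.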
