Home > U.S. National Institutes of Health literature > PLoS Clinical Trials > On the Accuracy of Language Trees

On the Accuracy of Language Trees



Abstract

Historical linguistics aims at inferring the most likely language phylogenetic tree starting from information concerning the evolutionary relatedness of languages. The available information typically consists of lists of homologous (lexical, phonological, syntactic) features or characters for many different languages: a set of parallel corpora whose compilation represents a paramount achievement in linguistics.

From this perspective the reconstruction of language trees is an example of an inverse problem: starting from present information, which is incomplete and often noisy, one aims at inferring the most likely past evolutionary history. A fundamental issue in inverse problems is the evaluation of the inference made. A standard way of dealing with this question is to generate data with artificial models in order to have full access to the evolutionary process one is going to infer. This procedure presents an intrinsic limitation: when dealing with real data sets, one typically does not know which model of evolution is the most suitable for them. A possible way out is to compare algorithmic inference with expert classifications. This is the point of view we take here by conducting a thorough survey of the accuracy of reconstruction methods as compared with the Ethnologue expert classifications. We focus in particular on state-of-the-art distance-based methods for phylogeny reconstruction using worldwide linguistic databases.

In order to assess the accuracy of the inferred trees we introduce and characterize two generalizations of standard definitions of distances between trees. Based on these scores we quantify the relative performance of the distance-based algorithms considered. Further, we quantify how the completeness and the coverage of the available databases affect the accuracy of the reconstruction. Finally, we draw some conclusions about where the accuracy of the reconstructions in historical linguistics stands and about the leading directions to improve it.
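As a rough illustration of what a "distance between trees" can look like, the sketch below computes a Robinson-Foulds-style distance between two rooted trees by counting the clades (leaf sets under internal nodes) that appear in one tree but not the other. This is an assumption made for illustration only: the abstract does not name the standard definitions it generalizes, and the generalized scores it introduces are not reproduced here. The nested-tuple tree encoding, the function names, and the toy language groupings are all hypothetical.

```python
def clusters(tree):
    """Collect the leaf set under every internal node of a rooted tree.

    A tree is a nested tuple whose leaves are language names (strings).
    Internal nodes may have any number of children.
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        # Union of the leaf sets of all children of this internal node.
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def robinson_foulds(tree_a, tree_b):
    """Count clades present in exactly one of the two trees."""
    return len(clusters(tree_a) ^ clusters(tree_b))


# Hypothetical toy data: a reference classification and an inferred tree.
reference = (("Italian", "Spanish"), ("German", "English"))
inferred = (("Italian", "German"), ("Spanish", "English"))

print(robinson_foulds(reference, reference))  # identical trees
print(robinson_foulds(reference, inferred))   # two clades differ on each side
```

A distance of zero means the two trees contain exactly the same clades; larger values mean more groupings are present in only one of them. The count is not normalized, so values are comparable only between trees over the same set of languages.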
