International Conference on Computational Linguistics

Towards identifying the optimal datasize for lexically-based Bayesian inference of linguistic phylogenies

Abstract

Bayesian linguistic phylogenies are standardly based on cognate matrices for words referring to a fixed set of meanings, typically around 100–200. To date, there has been no empirical investigation into which data size is optimal. Here we determine, across a set of language families, the number of meanings that yields the best performance in Bayesian phylogenetic inference. We rank meanings by stability and infer phylogenetic trees using first the most stable meaning, then the two most stable meanings, and so on; at each step of data-size increase, we compute the quartet distance between the resulting tree and the tree proposed by experts on the language family. When no gold-standard tree is available, we propose instead to compute the quartet distance between the tree based on the n most stable meanings and the tree based on the n + 1 most stable meanings, increasing n from 1 to N − 1, where N is the total number of meanings. The assumption is that the value of n at which this quartet distance begins to stabilize is also the value at which the quality of the tree ceases to improve. We show that this assumption is borne out. The results of the two methods vary across families, and the optimal number of meanings appears to correlate with the number of languages under consideration.
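The two evaluation procedures described in the abstract can be sketched in Python. This is a minimal illustration under several assumptions that the abstract does not state. Trees are given as nested tuples of language names and are treated as unrooted and possibly multifurcating. The quartet distance is taken to be the fraction of four-language subsets whose induced topologies differ, with an unresolved (star) quartet counted as its own topology. The "optimal" n against a gold tree is taken to be the one with minimal distance. "Begins to stabilize" is operationalized by a hypothetical rule: the first n after which `window` consecutive distances stay at or below `eps`. The Bayesian inference step itself is not shown; `trees[k]` stands for whatever tree (for example, a consensus tree) was inferred from the k + 1 most stable meanings. All function names here are illustrative.

```python
from collections import deque
from itertools import combinations


def _leaf_distances(tree):
    """Edge-count distances between all leaves of a nested-tuple tree.

    Leaves are strings (language names); each tuple is an internal node whose
    elements are its children. The root is ignored (the tree is treated as
    unrooted) and multifurcations (tuples with > 2 children) are allowed.
    """
    adj = {}
    next_id = [0]

    def walk(node):
        if isinstance(node, str):
            adj.setdefault(node, [])
            return node
        next_id[0] += 1
        nid = next_id[0]  # int ids never collide with string leaf names
        adj[nid] = []
        for child in node:
            cid = walk(child)
            adj[nid].append(cid)
            adj[cid].append(nid)
        return nid

    walk(tree)
    leaves = sorted(n for n in adj if isinstance(n, str))
    dist = {}
    for src in leaves:  # breadth-first search from every leaf
        seen = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        dist[src] = seen
    return leaves, dist


def _quartet_split(dist, a, b, c, d):
    """Topology induced on {a, b, c, d} via the four-point condition: the
    pairing with the strictly smallest distance sum is the split; if all
    three sums tie, the quartet is unresolved and None is returned."""
    pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    sums = [dist[p][q] + dist[r][s] for (p, q), (r, s) in pairings]
    best = min(sums)
    if sums.count(best) > 1:
        return None
    (p, q), (r, s) = pairings[sums.index(best)]
    return frozenset([frozenset([p, q]), frozenset([r, s])])


def quartet_distance(t1, t2):
    """Fraction of 4-leaf subsets whose induced topologies differ
    (brute force, O(n^4) in the number of leaves)."""
    leaves1, d1 = _leaf_distances(t1)
    leaves2, d2 = _leaf_distances(t2)
    if leaves1 != leaves2:
        raise ValueError("trees must have the same leaf set")
    quartets = list(combinations(leaves1, 4))
    if not quartets:
        return 0.0
    differ = sum(_quartet_split(d1, *q) != _quartet_split(d2, *q) for q in quartets)
    return differ / len(quartets)


def best_against_gold(trees, gold):
    """trees[k] is the tree from the k + 1 most stable meanings.
    Returns (n with minimal distance to the gold tree, all distances)."""
    dists = [quartet_distance(t, gold) for t in trees]
    return dists.index(min(dists)) + 1, dists


def stabilization_point(trees, eps=0.05, window=3):
    """Gold-free variant; steps[k] = QD(T_{k+1}, T_{k+2}).
    HYPOTHETICAL rule (eps and window are arbitrary): return the smallest n
    such that `window` consecutive steps starting at T_n are all <= eps."""
    steps = [quartet_distance(trees[k], trees[k + 1]) for k in range(len(trees) - 1)]
    for k in range(len(steps) - window + 1):
        if all(s <= eps for s in steps[k:k + window]):
            return k + 1, steps
    return None, steps


# Toy stand-ins for inferred trees (not data from the paper).
gold = ((("A", "B"), "C"), ("D", "E"))
alt = ((("A", "C"), "B"), ("D", "E"))
star = ("A", "B", "C", "D", "E")
trees = [star, alt, gold, gold, gold, gold]
print(quartet_distance(gold, alt))        # 0.4
print(best_against_gold(trees, gold)[0])  # 3
print(stabilization_point(trees)[0])      # 3
```

In this toy run, both criteria pick n = 3, which is the kind of agreement the abstract reports checking on real families. The brute-force enumeration of all four-leaf subsets is simple but scales as O(n^4), so it suits families of modest size; larger families would call for a faster quartet-distance algorithm.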