Source: U.S. National Institutes of Health literature, PLoS Clinical Trials
Statistical Analysis of the Indus Script Using n-Grams

Abstract

The Indus script is one of the major undeciphered scripts of the ancient world. The small size of the corpus, the absence of bilingual texts, and the lack of definite knowledge of the underlying language have frustrated efforts at decipherment since the discovery of the remains of the Indus civilization. Building on previous statistical approaches, we apply the tools of statistical language processing, specifically n-gram Markov chains, to analyze the syntax of the Indus script. We find that unigrams follow a Zipf-Mandelbrot distribution. Text beginner and ender distributions are unequal, providing internal evidence for syntax. We see clear evidence of strong bigram correlations and extract significant pairs and triplets using a log-likelihood measure of association. Highly frequent pairs and triplets are not always highly significant. The model performance is evaluated using information-theoretic measures and cross-validation. The model can restore doubtfully read texts with an accuracy of about 75%. We find that a quadrigram Markov chain saturates information-theoretic measures against a held-out corpus. Our work forms the basis for the development of a stochastic grammar which may be used to explore the syntax of the Indus script in greater detail.
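
To make the restoration idea concrete, the following is a minimal sketch of how a bigram Markov chain could fill in a doubtful sign from its two neighbours. It is an illustration under stated assumptions, not the authors' implementation: the sign labels, the toy corpus, the add-one smoothing, and the function names `train_bigram`, `cond_prob`, and `restore` are all hypothetical and do not come from the paper.

```python
from collections import Counter

START, END = "<s>", "</s>"  # hypothetical text-boundary markers


def train_bigram(texts):
    """Count unigrams and bigrams over sign sequences padded with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for text in texts:
        padded = [START, *text, END]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def cond_prob(unigrams, bigrams, prev, cur):
    """Add-one smoothed estimate of P(cur | prev); the smoothing choice is an assumption."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)


def restore(text, position, unigrams, bigrams):
    """Return the sign that best fits the gap at `position`, given its left and right neighbours."""
    padded = [START, *text, END]
    prev, nxt = padded[position], padded[position + 2]
    candidates = sorted(s for s in unigrams if s not in (START, END))
    # Score each candidate x by P(x | prev) * P(next | x) under the bigram chain.
    return max(
        candidates,
        key=lambda x: cond_prob(unigrams, bigrams, prev, x)
        * cond_prob(unigrams, bigrams, x, nxt),
    )


# Toy corpus with made-up sign labels (not real Indus sign numbers).
corpus = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["E", "B", "C"],
    ["A", "B", "C"],
]
unigrams, bigrams = train_bigram(corpus)
guess = restore(["A", "?", "C"], 1, unigrams, bigrams)
print(guess)
```

In this toy corpus the sign `B` almost always sits between `A` and `C`, so it scores highest for the gap. A higher-order chain, such as the quadrigram model mentioned in the abstract, would condition on more neighbours in the same way.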