BAStat: New Statistical Resources at the Bavarian Archive for Speech Signals

Abstract

A new type of language resource, 'BAStat', has been released by the Bavarian Archive for Speech Signals (BAS). In contrast to primary resources such as speech and text corpora, BAStat comprises statistical estimates derived from a number of primary resources: first- and second-order occurrence probabilities of phones, syllables and words, duration statistics, probabilities of pronunciation variants of words, and probabilities of context information. Unlike other statistical speech resources, BAStat is based solely on recordings of conversational German and therefore models spoken language. It consists of 7-bit ASCII tables and matrices to maximize interoperability between platforms and can be downloaded from the BAS website. This paper gives a detailed description of the empirical basis and the contained data types, offers some interesting interpretations, and briefly compares BAStat to the text-based statistical resource CELEX.
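To make the idea of first- and second-order occurrence probabilities concrete, here is a minimal Python sketch. It is not the procedure used for BAStat, which the abstract does not specify. It assumes plain relative-frequency estimates, reads "second order" as the conditional probability of a unit given its predecessor, and uses a made-up toy phone sequence. The function names `occurrence_probabilities` and `to_ascii_table` are hypothetical.

```python
from collections import Counter


def occurrence_probabilities(units):
    """Relative-frequency estimates for a sequence of units.

    Returns (first_order, second_order), where first_order[u] estimates P(u)
    and second_order[(a, b)] estimates P(b | a).
    Assumption: simple counting without smoothing.
    """
    total = len(units)
    unigram_counts = Counter(units)
    first_order = {u: c / total for u, c in unigram_counts.items()}

    # Count adjacent pairs, and how often each unit occurs as a predecessor.
    bigram_counts = Counter(zip(units, units[1:]))
    predecessor_counts = Counter(units[:-1])
    second_order = {
        (a, b): c / predecessor_counts[a] for (a, b), c in bigram_counts.items()
    }
    return first_order, second_order


def to_ascii_table(first_order):
    """Render the estimates as a tab-separated table, one unit per line."""
    text = "\n".join(f"{u}\t{p:.4f}" for u, p in sorted(first_order.items()))
    text.encode("ascii")  # raises UnicodeEncodeError if not 7-bit ASCII
    return text


# Toy phone sequence, invented for illustration only.
phones = ["h", "a", "l", "o", "h", "a"]
first, second = occurrence_probabilities(phones)
print(to_ascii_table(first))
```

Restricting the output to 7-bit ASCII, as the abstract describes for BAStat, means phone symbols need an ASCII-safe notation; the `encode("ascii")` call is just one way to check that.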

Bibliographic details

  • Author: Schiel, Florian
  • Year: 2010
  • Original format: PDF
  • Language: English
