
Chinese TIMIT: A TIMIT-like corpus of standard Chinese



Abstract

This paper describes an effort to build a TIMIT-like corpus of Standard Chinese as part of our "Global TIMIT" project. The paper details the three steps involved: sentence selection; speaker recruitment and recording; and phonetic segmentation. The corpus consists of 6000 sentences read by 50 speakers (25 female and 25 male). Phonetic segmentation obtained from forced alignment is provided; on 50 randomly selected sentences, 93.2% of its phone boundaries fall within 20 ms of the manually segmented boundaries. Statistics on the number of tokens and the mean duration of phones and tones in the corpus are also reported. Males have shorter phones and tones than females but more, and longer, utterance-internal silences, indicating that the males in this dataset speak faster but pause more often and for longer.
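The agreement figure quoted above is a tolerance-based boundary score. As a rough illustration only, the sketch below computes the fraction of automatically placed boundaries that lie within a given tolerance of their manual counterparts. The function name `boundary_agreement`, the one-to-one pairing of boundaries, and the sample times are all assumptions made for this example; the abstract does not describe the exact scoring procedure used for the corpus.

```python
def boundary_agreement(auto_ms, manual_ms, tolerance_ms=20):
    """Return the fraction of paired boundaries that differ by at most tolerance_ms.

    Both lists hold boundary times in milliseconds for the same phone
    sequence, so the i-th automatic boundary is compared with the i-th
    manual boundary (an assumption of this sketch).
    """
    if len(auto_ms) != len(manual_ms):
        raise ValueError("boundary lists must pair one-to-one")
    if not manual_ms:
        raise ValueError("no boundaries to compare")
    # Count the pairs whose absolute time difference is inside the tolerance.
    hits = sum(1 for a, m in zip(auto_ms, manual_ms) if abs(a - m) <= tolerance_ms)
    return hits / len(manual_ms)


# Made-up boundary times (ms) for a four-boundary utterance.
auto = [105, 250, 410, 580]
manual = [100, 240, 445, 575]
print(boundary_agreement(auto, manual))
```

Here three of the four invented boundaries fall within 20 ms of the manual ones. Integer milliseconds are used so that a difference of exactly 20 ms is not misclassified by floating-point rounding.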
