
Spontaneous Speech Collection for the CSR Corpus.


Abstract

As part of a pilot data collection for DARPA's Continuous Speech Recognition (CSR) speech corpus, SRI International experimented with the collection of spontaneous speech material. The bulk of the CSR pilot data consisted of read versions of news articles from the Wall Street Journal (WSJ), and the spontaneous sentences were to be similar material, but spontaneously dictated. In the first pilot portion of the data collection, twelve subjects, including nine journalists, were located and instructed in how to dictate using the data collection hardware and software at SRI. These talkers produced 1280 spontaneous sentences. In general, compared to read material, the spontaneous material took about two to three times more subject time to produce and about four times more experimenter time to produce, package, and ship. The paper provides details on the materials, subjects, and procedures used in the study, and it describes the results in terms of speaker reaction and data production. The methods described are sufficient to collect fluent spontaneous recordings at a predictable rate. The spontaneous material differs in several characteristics from WSJ material: paragraphs and sentences tend to be longer, more word types are used, and, by most measures, the material is more variable.
