
Crowd-sourcing prosodic annotation


Abstract

Much of what is known about prosody is based on native speaker intuitions of idealized speech, or on prosodic annotations from trained annotators whose auditory impressions are augmented by visual evidence from speech waveforms, spectrograms and pitch tracks. Expanding the prosodic data currently available to cover more languages, and to cover a broader range of unscripted speech styles, is prohibitive due to the time, money and human expertise needed for prosodic annotation. We describe an alternative approach to prosodic data collection, with coarse-grained annotations from a cohort of untrained annotators performing rapid prosody transcription (RPT) using LMEDS, an open-source software tool we developed to enable large-scale, crowd-sourced data collection with RPT. Results from three RPT experiments are reported. The reliability of RPT is analysed by comparing kappa statistics for lab-based and crowd-sourced annotations for American English, comparing annotators from the same (US) versus different (Indian) dialect groups, and comparing each RPT annotator with a ToBI annotation. Results show better reliability for same-dialect annotators (US), and the best overall reliability from crowd-sourced US annotators, though lab-based annotations are the most similar to ToBI annotations. A generalized additive mixed model is used to test differences among annotator groups in the factors that predict prosodic annotation. Results show that a common set of acoustic and contextual factors predict prosodic labels for all annotator groups, with only small differences among the RPT groups, but with larger effects on prosodic marking for ToBI annotators. The findings suggest methods for optimizing the efficiency of RPT annotations. Overall, crowd-sourced prosodic annotation is shown to be efficient, and to rely on established cues to prosody, supporting its use for prosody research across languages, dialects, speaker populations, and speech genres.
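The abstract's reliability analysis rests on chance-corrected agreement (kappa) statistics between annotators. As a minimal illustration only (not the authors' pipeline, which compares multiple annotator cohorts and ToBI labels), Cohen's kappa for two hypothetical annotators' binary prosodic-boundary marks can be sketched as:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal label distribution.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical binary marks (1 = prosodic boundary after the word) from two raters.
a = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
b = [1, 0, 1, 1, 0, 0, 0, 0, 1, 0]
print(round(cohens_kappa(a, b), 3))  # → 0.583
```

The chance correction matters here because boundary marks are sparse: two raters who rarely mark boundaries agree often by chance alone, so raw percent agreement would overstate reliability.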

Record details

  • Source
    Computer Speech and Language | 2017, Issue 9 | pp. 300-325 | 26 pages
  • Author affiliations

    University of Illinois at Urbana-Champaign, Department of Linguistics 4080 Foreign Language Building, 707 S Mathews Avenue, MC-168, Urbana, Illinois 61801, USA,Northwestern University, Department of Linguistics, 2016 Sheridan Road Evanston, Illinois 60208, USA;

    Laboratoire Parole et Langage, UMR 7309 CNRS, Aix-Marseille Universite, 5 avenue Pasteur BP 80975, Aix-en-Provence 13604, France;

    University of Illinois at Urbana-Champaign, Department of Linguistics 4080 Foreign Language Building, 707 S Mathews Avenue, MC-168, Urbana, Illinois 61801, USA;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language: English (eng)
  • CLC classification:
  • Keywords

    Prosody; Annotation; Crowd-sourcing; Generalized mixed effects model; Inter-rater reliability; Speech transcription;

