
Crowd-sourcing prosodic annotation


Abstract

Much of what is known about prosody is based on native speaker intuitions of idealized speech, or on prosodic annotations from trained annotators whose auditory impressions are augmented by visual evidence from speech waveforms, spectrograms and pitch tracks. Expanding the prosodic data currently available to cover more languages, and to cover a broader range of unscripted speech styles, is prohibitive due to the time, money and human expertise needed for prosodic annotation. We describe an alternative approach to prosodic data collection, with coarse-grained annotations from a cohort of untrained annotators performing rapid prosody transcription (RPT) using LMEDS, an open-source software tool we developed to enable large-scale, crowd-sourced data collection with RPT. Results from three RPT experiments are reported. The reliability of RPT is analysed by comparing kappa statistics for lab-based and crowd-sourced annotations for American English, comparing annotators from the same (US) versus different (Indian) dialect groups, and comparing each RPT annotator with a ToBI annotation. Results show better reliability for same-dialect annotators (US), and the best overall reliability from crowd-sourced US annotators, though lab-based annotations are the most similar to ToBI annotations. A generalized additive mixed model is used to test differences among annotator groups in the factors that predict prosodic annotation. Results show that a common set of acoustic and contextual factors predict prosodic labels for all annotator groups, with only small differences among the RPT groups, but with larger effects on prosodic marking for ToBI annotators. The findings suggest methods for optimizing the efficiency of RPT annotations. Overall, crowd-sourced prosodic annotation is shown to be efficient, and to rely on established cues to prosody, supporting its use for prosody research across languages, dialects, speaker populations, and speech genres.
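The abstract's reliability analysis rests on chance-corrected agreement (kappa) statistics between annotators. As a minimal illustration only (not the authors' pipeline, which compares multiple annotator cohorts and ToBI labels), Cohen's kappa for two hypothetical annotators' binary prosodic-boundary marks can be sketched as:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal label distribution.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical binary marks (1 = prosodic boundary after the word) from two raters.
a = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
b = [1, 0, 1, 1, 0, 0, 0, 0, 1, 0]
print(round(cohens_kappa(a, b), 3))  # → 0.583
```

The chance correction matters here because boundary marks are sparse: two raters who rarely mark boundaries agree often by chance alone, so raw percent agreement would overstate reliability.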

Record details

  • Source
    Computer Speech and Language | 2017, Issue 9 | pp. 300-325 | 26 pages
  • Author affiliations

    University of Illinois at Urbana-Champaign, Department of Linguistics 4080 Foreign Language Building, 707 S Mathews Avenue, MC-168, Urbana, Illinois 61801, USA,Northwestern University, Department of Linguistics, 2016 Sheridan Road Evanston, Illinois 60208, USA;

    Laboratoire Parole et Langage, UMR 7309 CNRS, Aix-Marseille Universite, 5 avenue Pasteur BP 80975, Aix-en-Provence 13604, France;

    University of Illinois at Urbana-Champaign, Department of Linguistics 4080 Foreign Language Building, 707 S Mathews Avenue, MC-168, Urbana, Illinois 61801, USA;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language: English (eng)
  • CLC classification:
  • Keywords

    Prosody; Annotation; Crowd-sourcing; Generalized mixed effects model; Inter-rater reliability; Speech transcription;

