Intoxicated speech detection: A fusion framework with speaker-normalized hierarchical functionals and GMM supervectors

Abstract

Segmental and suprasegmental speech signal modulations offer information about paralinguistic content such as affect, age and gender, pathology, and speaker state. Speaker state encompasses medium-term, temporary physiological phenomena influenced by internal or external bio-chemical actions (e.g., sleepiness, alcohol intoxication). Perceptual and computational research indicates that detecting speaker state from speech is a challenging task. In this paper, we present a system constructed with multiple representations of prosodic and spectral features that provided the best result at the Intoxication Subchallenge of Interspeech 2011 on the Alcohol Language Corpus. We discuss the details of each classifier and show that fusion improves performance. We additionally address the question of how best to construct a speaker state detection system in terms of robust and practical marginalization of associated variability such as through modeling speakers, utterance type, gender, and utterance length. As is the case in human perception, speaker normalization provides significant improvements to our system. We show that a held-out set of baseline (sober) data can be used to achieve comparable gains to other speaker normalization techniques. Our fused frame-level statistic-functional systems, fused GMM systems, and final combined system achieve unweighted average recalls (UARs) of 69.7%, 65.1%, and 68.8%, respectively, on the test set. More consistent numbers compared to development set results occur with matched-prompt training, where the UARs are 70.4%, 66.2%, and 71.4%, respectively. The combined system improves over the Challenge baseline by 5.5% absolute (8.4% relative), also improving upon our previously best result.
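
The abstract reports results as unweighted average recall (UAR) and credits speaker normalization against held-out sober data for much of the gain. The following is a minimal sketch of both ideas, not the authors' implementation: UAR is computed as the mean of per-class recalls, and the normalization shown is a plain per-speaker z-score, which is an assumption made for illustration. The function names `unweighted_average_recall` and `normalize_by_sober_baseline` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so each class counts equally
    no matter how many examples it has."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, guess in zip(y_true, y_pred):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def normalize_by_sober_baseline(features, sober_baseline):
    """Z-normalize one speaker's feature values with the mean and
    standard deviation of that speaker's held-out sober recordings.
    Assumption: a simple z-score stands in for the paper's method."""
    mu = mean(sober_baseline)
    sigma = pstdev(sober_baseline) or 1.0  # avoid division by zero
    return [(x - mu) / sigma for x in features]


if __name__ == "__main__":
    y_true = ["sober"] * 4 + ["intoxicated"] * 2
    y_pred = ["sober", "sober", "sober", "intoxicated",
              "intoxicated", "sober"]
    print(unweighted_average_recall(y_true, y_pred))
    print(normalize_by_sober_baseline([2.0, 4.0], [1.0, 3.0]))
```

In the toy labels above, recall is 3/4 for the sober class and 1/2 for the intoxicated class, so the UAR is 0.625 while plain accuracy is 4/6. The gap illustrates why UAR is preferred when classes are imbalanced.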

Bibliographic details

  • Source
    Computer Speech and Language | 2014, Issue 2 | pp. 375-391 | 17 pages
  • Author affiliations

    Signal Analysis & Interpretation Laboratory (SAIL), University of Southern California, 3710 McClintock Ave., Los Angeles, CA 90089, USA;

    Signal Analysis & Interpretation Laboratory (SAIL), University of Southern California, 3710 McClintock Ave., Los Angeles, CA 90089, USA;

    Signal Analysis & Interpretation Laboratory (SAIL), University of Southern California, 3710 McClintock Ave., Los Angeles, CA 90089, USA;

    Signal Analysis & Interpretation Laboratory (SAIL), University of Southern California, 3710 McClintock Ave., Los Angeles, CA 90089, USA; Department of Linguistics, University of Southern California (USC), 3620 McClintock Ave., Los Angeles, CA 90089, USA;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    GMM supervectors; Speaker normalization; Hierarchical features; Intoxication detection; Speaker state; Cognitive and motor load;
