Conference: IEEE Workshop on Spoken Language Technology

Exploiting loudness dynamics in stochastic models of turn-taking

Abstract

Stochastic turn-taking models have traditionally been implemented as N-grams, which condition predictions on recent binary-valued speech/non-speech contours. The current work re-implements this function using feed-forward neural networks, capable of accepting binary- as well as continuous-valued features; performance is shown to asymptotically approach that of the N-gram baseline as model complexity increases. The conditioning context is then extended to leverage loudness contours. Experiments indicate that the additional sensitivity to loudness considerably decreases average cross entropy rates on unseen data, by 0.03 bits per framing interval of 100 ms. This reduction is shown to make loudness-sensitive conversants capable of better predictions, with attention memory requirements at least 5 times smaller and responsiveness latency at least 10 times shorter than the loudness-insensitive baseline.
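As a rough illustration of the N-gram baseline and of the cross entropy metric quoted above, the following minimal sketch fits an N-gram model to a single binary speech/non-speech contour and scores it in bits per frame. It is not the authors' implementation: the single-speaker simplification, the add-one smoothing, the toy data, and the names `train_ngram`, `prob`, and `cross_entropy_rate` are all assumptions made for this example.

```python
import math
from collections import defaultdict


def train_ngram(frames, order):
    """Count next-frame outcomes for each context of (order - 1) frames.

    `frames` is a list of 0/1 values (non-speech/speech), one per framing
    interval (100 ms in the abstract). Hypothetical helper for illustration.
    """
    counts = defaultdict(lambda: [0, 0])
    ctx_len = order - 1
    for t in range(ctx_len, len(frames)):
        context = tuple(frames[t - ctx_len:t])
        counts[context][frames[t]] += 1
    return counts


def prob(counts, context, symbol, alpha=1.0):
    """Smoothed P(symbol | context); additive smoothing is an assumption here."""
    c = counts.get(context, (0, 0))
    return (c[symbol] + alpha) / (c[0] + c[1] + 2 * alpha)


def cross_entropy_rate(counts, frames, order, alpha=1.0):
    """Average negative log2 probability per predicted frame, in bits."""
    ctx_len = order - 1
    total_bits = 0.0
    n_predicted = 0
    for t in range(ctx_len, len(frames)):
        context = tuple(frames[t - ctx_len:t])
        total_bits -= math.log2(prob(counts, context, frames[t], alpha))
        n_predicted += 1
    return total_bits / n_predicted


# Toy contour: alternating stretches of silence and speech (made-up data).
toy_frames = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0] * 50
trigram = train_ngram(toy_frames, order=3)
rate = cross_entropy_rate(trigram, toy_frames, order=3)
print(f"cross entropy rate: {rate:.3f} bits per frame")
```

An untrained model assigns probability 0.5 to every frame and therefore scores exactly 1 bit per frame, which gives a reference point for reading reductions such as the 0.03 bits per 100 ms interval reported in the abstract. Extending the context with continuous loudness values, as the abstract describes, needs a model that accepts real-valued inputs, which count tables like this one do not.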