Journal: Multimedia Systems

Discovering salient prosodic cues and their interactions for automatic story segmentation in Mandarin broadcast news



Abstract

This paper investigates speech prosody for automatic story segmentation in Mandarin broadcast news. Prosodic cues that work well in English story segmentation deserve re-investigation, since the lexical tones of Mandarin may complicate the expression of pitch declination and reset. Our data-oriented study shows that speaker-normalized pitch features cannot clearly discriminate story boundaries from utterance boundaries, because these features vary widely across the different Mandarin syllable tones. We thus propose speaker- and tone-normalized pitch features, which provide clear separation between utterance and story boundaries. Our study also shows that speaker-normalized pause duration is quite effective at separating story boundaries from utterance boundaries, while speaker-normalized speech energy and syllable duration are not effective. Experiments using decision trees for story boundary detection reinforce the difference between English and Chinese, i.e., speaker- and tone-normalized pitch features should be preferred in Mandarin story segmentation. We show that combining different prosodic cues can achieve a very high F-measure of 93.04%, owing to the complementarity between pause, pitch and energy. Analysis of the decision tree uncovered five major heuristics that show how speakers jointly use pause duration and pitch to separate speech into stories.
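The abstract does not give the normalization formula, so the following is only a minimal sketch of one plausible reading of "speaker- and tone-normalized pitch": each syllable's pitch is z-scored against the other syllables that share its speaker and lexical tone. The z-score choice, the function name `normalize_pitch`, and the field names `speaker`, `tone` and `f0` are all assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_pitch(syllables):
    """Z-score each syllable's pitch within its (speaker, tone) group.

    `syllables` is a list of dicts with hypothetical keys:
    "speaker" (id), "tone" (Mandarin tone label), "f0" (pitch in Hz).
    Returns one normalized value per syllable, in input order.
    """
    # Collect the pitch values observed for each (speaker, tone) pair.
    groups = defaultdict(list)
    for s in syllables:
        groups[(s["speaker"], s["tone"])].append(s["f0"])

    # Per-group mean and population standard deviation.
    stats = {key: (mean(vals), pstdev(vals)) for key, vals in groups.items()}

    normalized = []
    for s in syllables:
        mu, sigma = stats[(s["speaker"], s["tone"])]
        # A group with no spread carries no relative pitch information.
        normalized.append((s["f0"] - mu) / sigma if sigma > 0 else 0.0)
    return normalized


example = [
    {"speaker": "A", "tone": 1, "f0": 200.0},
    {"speaker": "A", "tone": 1, "f0": 220.0},
    {"speaker": "A", "tone": 4, "f0": 150.0},
    {"speaker": "A", "tone": 4, "f0": 170.0},
]
print(normalize_pitch(example))
```

Under this assumed scheme, a high-register tone and a low-register tone from the same speaker are mapped onto a common scale, so a pitch reset at a boundary is measured relative to what is typical for that tone rather than relative to the speaker's overall range.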

