Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Preserving Word-Level Emphasis in Speech-to-Speech Translation



Abstract

Speech-to-speech translation (S2ST) is a technology that translates speech across languages, which can remove barriers in cross-lingual communication. Conventional S2ST systems translate the linguistic meaning of speech but ignore paralinguistic information, such as emotion or emphasis. In this paper, we propose a method to translate paralinguistic information, focusing specifically on emphasis. The method consists of a series of components that can accurately translate emphasis using all acoustic features of speech. First, linear-regression hidden semi-Markov models (LR-HSMMs) estimate a real-valued emphasis level for every word in an utterance, yielding a sequence of emphasis values for the utterance. Next, the emphasis translation module translates the estimated emphasis sequence into a target-language emphasis sequence using a conditional random field model whose features include emphasis levels, words, and part-of-speech tags. Finally, the speech synthesis module synthesizes emphasized speech with LR-HSMMs, taking into account the translated emphasis sequence and the transcription. The results indicate that our translation model can translate emphasis information, correctly emphasizing words in the target language with a 91.6% F-measure in objective evaluation. A listening test with human subjects further showed that listeners could identify the emphasized words with an 87.8% F-measure, and that the naturalness of the audio was preserved.
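The F-measure figures quoted in the abstract can be illustrated with a minimal sketch. The abstract does not describe the exact evaluation protocol, so everything here is an assumption made for illustration: the function names `binarize` and `emphasis_f_measure`, the 0.5 threshold for turning real-valued emphasis levels into emphasized/not-emphasized labels, and the use of the balanced F1 score over per-word labels.

```python
from typing import List, Sequence


def binarize(levels: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Map real-valued emphasis levels to emphasized / not-emphasized labels.

    The threshold is a hypothetical choice, not a value from the paper.
    """
    return [level >= threshold for level in levels]


def emphasis_f_measure(reference: Sequence[bool], predicted: Sequence[bool]) -> float:
    """Balanced F-measure (F1) for per-word emphasis labels."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    # Count words that are emphasized in both, only in the prediction,
    # and only in the reference.
    true_pos = sum(1 for r, p in zip(reference, predicted) if r and p)
    false_pos = sum(1 for r, p in zip(reference, predicted) if not r and p)
    false_neg = sum(1 for r, p in zip(reference, predicted) if r and not p)

    if true_pos == 0:
        return 0.0  # no correctly emphasized word, so F1 is defined as 0 here

    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Toy data (invented for illustration): four target-language words.
reference_labels = [False, True, False, True]
predicted_levels = [0.1, 0.9, 0.7, 0.2]
score = emphasis_f_measure(reference_labels, binarize(predicted_levels))
print(f"F-measure: {score:.3f}")
```

In this toy example one emphasized word is found, one is missed, and one word is emphasized by mistake, so precision and recall are both 0.5. How the paper aligns words and thresholds emphasis levels before scoring may differ from this sketch.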
