International Conference on Intelligent Computer Communication and Processing

Improving sentence-level alignment of speech with imperfect transcripts using utterance concatenation and VAD

Abstract

Preparing data for speech processing applications generally requires expert knowledge and a large amount of time. Automating this process as much as possible can therefore have a significant impact on expanding the number of languages for which spoken interaction with machines is available. In this paper we build upon a previously developed tool, ALISA, which aligns speech with imperfect transcripts in any alphabetic language using only 10 minutes of manually labelled data. Although its word-level error rate is around 0.6%, we noticed that sentence-level accuracy is drastically affected by a large number of sentence-initial word deletions. To overcome this problem, we propose two methods: one based on utterance concatenation, and one based on voice activity detection (VAD). The results show that these simple methods can achieve around 10% relative improvement over the baseline results.
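The abstract does not say how VAD is applied inside ALISA, so the following is only a generic illustration of what a voice activity detector does: it marks the stretches of a signal whose short-term energy rises above a threshold. The function names, the frame length of 160 samples, and the threshold of 0.01 are all hypothetical choices for this sketch and are not taken from the paper.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_speech(samples, frame_len=160, threshold=0.01):
    """Return (start, end) sample indices of runs of high-energy frames.

    Assumption: a plain energy threshold stands in for a real VAD here.
    """
    segments = []
    start = None
    energies = frame_energies(samples, frame_len)
    for idx, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = idx * frame_len          # a speech run begins
        elif energy <= threshold and start is not None:
            segments.append((start, idx * frame_len))  # the run ends
            start = None
    if start is not None:                    # signal ended during a run
        segments.append((start, len(energies) * frame_len))
    return segments


# Synthetic 16 kHz signal: 0.1 s silence, 0.2 s of a 440 Hz tone, 0.1 s silence.
rate = 16000
silence = [0.0] * 1600
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(3200)]
signal = silence + tone + silence
segments = detect_speech(signal)
```

On this synthetic signal the detector returns a single segment covering the tone. A boundary of this kind could in principle help locate where an utterance really starts, which is the region where the abstract reports sentence-initial word deletions.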