Improving sentence-level alignment of speech with imperfect transcripts using utterance concatenation and VAD



Abstract

Preparing data for speech processing applications generally requires expert knowledge and a large amount of time. Automating as much of this process as possible can therefore significantly expand the number of languages for which spoken interaction with machines is available. In this paper we build upon ALISA, a previously developed tool that aligns speech with imperfect transcripts using only 10 minutes of manually labelled data, in any alphabetic language. Although its word-level error rate is around 0.6%, we noticed that sentence-level accuracy is drastically affected by a large number of sentence-initial word deletions. To overcome this problem, we propose two methods: one based on utterance concatenation and one based on voice activity detection (VAD). The results show that these simple methods can achieve around 10% relative improvement over the baseline results.


Bibliographic details

Source
International Conference on Intelligent Computer Communication and Processing, 2016, pp. 171-174 (4 pages)
Authors
Alexandru Moldovan; Adriana Stan; Mircea Giurgiu
Original format
PDF
Keywords
Speech; Hidden Markov models; Acoustics; Data models; Speech processing; Error analysis; Decoding


Similar documents

1. A Dynamic Alignment Algorithm for Imperfect Speech and Transcript [J]. Ye Tao, Xueqing Li, Bian Wu. Computer Science and Information Systems, 2010, No. 1

2. Integrating imperfect transcripts into speech recognition systems for building high-quality corpora [J]. Benjamin Lecouteux, Georges Linares, Stanislas Oger. Computer Speech and Language, 2012, No. 2

3. Improved tone concatenation rules in a formant-based Chinese text-to-speech system [J]. Lee L., Tseng C. IEEE Transactions on Speech and Audio Processing, 1993, No. 3

4. Improving sentence-level alignment of speech with imperfect transcripts using utterance concatenation and VAD [C]. Alexandru Moldovan, Adriana Stan, Mircea Giurgiu. International Conference on Intelligent Computer Communication and Processing, 2016

5. Novel Frameworks for Attribute-Based Speech Emotion Recognition using Time-continuous Traces and Sentence-Level Annotations [D]. Parthasarathy, Srinivas. 2019

6. Forming Big Datasets through Latent Class Concatenation of Imperfectly Matched Databases Features [O]. Christopher W. Bartlett, Brett G. Klamer, Steven Buyske. 2019

7. A PRAGMATIC STUDY ON THE ILLOCUTIONARY FORCE OF EXPRESSIVE UTTERANCES IN “THE KING’S SPEECH” MOVIE TRANSCRIPT [O]. Bayu Retnowati Putri Nurhandayani, Suparno Suparno, A. Handoko Pudjobroto. 2015
