VoCo: Text-based Insertion and Replacement in Audio Narration

ZEYU JIN; GAUTHAM J. MYSORE; STEPHEN DIVERDI; JINGWAN LU; ADAM FINKELSTEIN

Abstract

Editing audio narration using conventional software typically involves many painstaking low-level manipulations. Some state-of-the-art systems allow the editor to work in a text transcript of the narration, and perform select, cut, copy and paste operations directly in the transcript; these operations are then automatically applied to the waveform in a straightforward manner. However, an obvious gap in the text-based interface is the ability to type new words not appearing in the transcript, for example inserting a new word for emphasis or replacing a misspoken word. While high-quality voice synthesizers exist today, the challenge is to synthesize the new word in a voice that matches the rest of the narration. This paper presents a system that can synthesize a new word or short phrase such that it blends seamlessly in the context of the existing narration. Our approach is to use a text-to-speech synthesizer to say the word in a generic voice, and then use voice conversion to convert it into a voice that matches the narration. Offering a range of degrees of control to the editor, our interface supports fully automatic synthesis, selection among a candidate set of alternative pronunciations, fine control over edit placements and pitch profiles, and even guidance by the editor's own voice. The paper presents studies showing that the output of our method is preferred over baseline methods and often indistinguishable from the original voice.
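
The transcript-to-waveform mapping described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a word-level alignment is already available, and the names `Word` and `replace_word` are hypothetical. The replacement audio (`new_samples`) stands in for the output of the synthesis and voice-conversion stage, which is not modeled here, and no smoothing is applied at the splice points.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Word:
    """One transcript word aligned to a sample range (hypothetical structure)."""
    text: str
    start: int  # index of the first sample of the word (inclusive)
    end: int    # index one past the last sample of the word (exclusive)


def replace_word(
    samples: Sequence[float],
    words: Sequence[Word],
    index: int,
    new_text: str,
    new_samples: Sequence[float],
) -> Tuple[List[float], List[Word]]:
    """Swap the audio of words[index] for new_samples and re-align the transcript.

    Assumption: new_samples was produced elsewhere (for example by a
    synthesizer followed by voice conversion); this function only splices.
    """
    old = words[index]
    # Splice: audio before the word + new audio + audio after the word.
    edited = list(samples[:old.start]) + list(new_samples) + list(samples[old.end:])
    # Later words move by the difference in length between new and old audio.
    shift = len(new_samples) - (old.end - old.start)
    new_word = Word(new_text, old.start, old.start + len(new_samples))
    shifted = [Word(w.text, w.start + shift, w.end + shift) for w in words[index + 1:]]
    return edited, list(words[:index]) + [new_word] + shifted
```

Deleting a word is the special case of an empty `new_samples`; inserting a word could be handled the same way with a zero-length range at the insertion point. A practical editor would also need to blend the boundaries so the splice is not audible.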

Bibliographic record

Source
ACM Transactions on Graphics | 2017, Issue 4 | pp. 96.1-96.13 | 13 pages
Authors
ZEYU JIN; GAUTHAM J. MYSORE; STEPHEN DIVERDI; JINGWAN LU; ADAM FINKELSTEIN
Author affiliations

Princeton University;

Adobe Research;

Adobe Research;

Adobe Research;

Princeton University;

Indexed in: Science Citation Index (SCI), USA
Original format: PDF
Language: English
Keywords
audio; human-computer interaction
