Cross-modal retrieval of scripted speech audio



Abstract

This paper describes an approach to the problem of searching speech-based digital audio using cross-modal information retrieval. Audio containing speech (speech-based audio) is difficult to search. Open vocabulary speech recognition is advancing rapidly, but cannot yield high accuracy in either search or transcription modalities. However, text can be searched quickly and efficiently with high accuracy. Script-light digital audio is audio that has an available transcription. This is a surprisingly large class of content including legal testimony, broadcasting, dramatic productions and political meetings and speeches. An automatic mechanism for deriving the synchronization between the transcription and the audio allows for very accurate retrieval of segments of that audio. The mechanism described in this paper is based on building a transcription graph from the text and computing biphone probabilities for the audio. A modified beam search algorithm is presented to compute the alignment.
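The abstract does not give the details of the transcription graph, the biphone models, or the modifications to beam search, so the following is only a minimal sketch of the general idea, under loudly stated assumptions: the transcript is reduced to a linear chain of units rather than a graph, the unit labels and per-frame probabilities are invented for illustration, and the function name `beam_align` is hypothetical. It is a plain beam search, not the authors' algorithm.

```python
import math

def beam_align(units, frame_logprobs, beam_width=3):
    """Align a linear chain of transcript units to audio frames.

    units: unit labels in transcript order (hypothetical stand-ins for biphones).
    frame_logprobs: one dict per frame, mapping unit label -> log-probability.
    Returns, for each frame, the index into `units` assigned to it,
    or None if no surviving hypothesis reaches the last unit.
    """
    floor = math.log(1e-6)  # assumed penalty for labels missing from a frame
    # A hypothesis is (score, path); path[t] is the unit index at frame t.
    beam = [(frame_logprobs[0].get(units[0], floor), [0])]
    for frame in frame_logprobs[1:]:
        candidates = {}
        for score, path in beam:
            pos = path[-1]
            for nxt in (pos, pos + 1):  # stay on the unit or advance by one
                if nxt >= len(units):
                    continue
                new_score = score + frame.get(units[nxt], floor)
                # Keep only the best hypothesis ending at each position.
                if nxt not in candidates or new_score > candidates[nxt][0]:
                    candidates[nxt] = (new_score, path + [nxt])
        # Prune to the highest-scoring hypotheses.
        ranked = sorted(candidates.values(), key=lambda h: h[0], reverse=True)
        beam = ranked[:beam_width]
    finished = [h for h in beam if h[1][-1] == len(units) - 1]
    return max(finished, key=lambda h: h[0])[1] if finished else None

# Illustrative data only: three made-up units and five frames.
units = ["he", "el", "lo"]
probs = [
    {"he": 0.90, "el": 0.05, "lo": 0.05},
    {"he": 0.80, "el": 0.10, "lo": 0.10},
    {"he": 0.10, "el": 0.80, "lo": 0.10},
    {"he": 0.05, "el": 0.05, "lo": 0.90},
    {"he": 0.05, "el": 0.05, "lo": 0.90},
]
frames = [{u: math.log(p) for u, p in f.items()} for f in probs]
alignment = beam_align(units, frames)
```

Once each frame is tied to a position in the transcript, a text search hit can be mapped back to a time range in the audio, which is the retrieval step the abstract refers to.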


Bibliographic details

Source: Conference on multimedia computing and networking, 1998, pp. 226-235 (10 pages)
Authors: Charles B. Owen (Dartmouth College, Hanover, NH, USA); Fillia Makedon (Dartmouth College, Hanover, NH, USA)
Original format: PDF
Chinese Library Classification: Automation technology, computer technology

