Post-processing of the recognized speech for web presentation of large audio archive


Abstract

This paper deals with the post-processing phase of automatic transcription of spoken documents stored in the large Czech Radio audio archive, which contains hundreds of thousands of recordings. The ultimate goal of the project is to transcribe these recordings and to give the public access to their content. In this paper we focus on methods and algorithms for unsupervised post-processing of automatically recognized recordings. The post-processing is adapted to the needs of the web presentation of the archive and has so far been used to process about 60,000 audio documents. We present the overall structure of the system as well as its core modules: the speech recognition engine, the speaker diarization module, and the final text processing. Special attention is paid to punctuation: the punctuation accuracy is evaluated and compared with human use. In the final part of the paper we propose further improvements and ideas for future research.


Bibliographic details

Source
Telecommunications and Signal Processing (TSP), 2012 35th International Conference on | 2012 | pp. 441-445 | 5 pages
Conference location: Prague (CZ)
Authors
Bohac Marek; Blavka Karel; Kucharova Michaela; Skodova Svatava
Affiliation

Institute of Information Technology and Electronics, Faculty of Mechatronics, Technical University of Liberec, 461 17, Czech Republic

Original format: PDF
Language: English
Chinese Library Classification: Signal processing

