Toward Human-Friendly ASR Systems: Recovering Capitalization and Punctuation for Vietnamese Text

Thi Thu HIEN NGUYEN; Thai BINH NGUYEN; Ngoc PHUONG PHAM; Quoc TRUONG DO; Tu LUC LE; Chi MAI LUONG

Abstract

Speech recognition is the task of converting spoken words and sentences in audio into text. With recent advances in deep learning, speech recognition has achieved highly satisfactory results that approach human performance. However, recognition output still has limitations, such as missing punctuation and capitalization and non-standardized numerical data. Vietnamese also contains local words, homonyms, and similar phenomena. These make the recognition output difficult for users to read and understand, and they hinder downstream Natural Language Processing (NLP) tasks. In this paper, we propose combining a Transformer decoder with a conditional random field (CRF) to restore punctuation and capitalization in Vietnamese automatic speech recognition (ASR) output. By chunking input sentences and merging the output sequences, the method can handle longer strings with greater accuracy. Experiments on a Vietnamese post-speech-recognition dataset show that the proposed method delivers the best results.
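The abstract says that long ASR outputs are handled by chunking the input and merging the output sequences, but it does not give the merge rule. The following is a minimal sketch under stated assumptions, not the authors' implementation. The following are all hypothetical:

- the per-token `(case, punct)` label scheme;
- the overlapping-window merge, which keeps each token's label from the window where the token sits farthest from an edge;
- the names `chunk`, `merge`, `apply_labels`, `restore`, and `toy_tagger`.

`toy_tagger` only stands in for the paper's Transformer decoder + CRF model.

```python
from typing import Callable, List, Tuple

# Hypothetical label scheme (an assumption, not the paper's tag set):
# (case, punct) where case is "U" (capitalize first letter) or "L" (keep as is),
# and punct is the mark appended after the token ("" for none).
Label = Tuple[str, str]
Tagger = Callable[[List[str]], List[Label]]


def chunk(tokens: List[str], size: int, overlap: int) -> List[Tuple[int, List[str]]]:
    """Split tokens into windows of at most `size`; consecutive windows share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("require 0 <= overlap < size")
    step = size - overlap
    windows = []
    start = 0
    while True:
        windows.append((start, tokens[start:start + size]))
        if start + size >= len(tokens):
            return windows
        start += step


def merge(n: int, chunk_preds: List[Tuple[int, List[Label]]]) -> List[Label]:
    """For each position, keep the label from the window where the token is farthest from an edge."""
    best: List[Tuple[int, Label]] = [(-1, ("L", ""))] * n
    for start, preds in chunk_preds:
        for i, label in enumerate(preds):
            margin = min(i, len(preds) - 1 - i)  # context on the shorter side of the token
            if margin > best[start + i][0]:      # ties keep the earlier window's label
                best[start + i] = (margin, label)
    return [label for _, label in best]


def apply_labels(tokens: List[str], labels: List[Label]) -> str:
    """Rebuild readable text from lowercase tokens and (case, punct) labels."""
    words = []
    for tok, (case, punct) in zip(tokens, labels):
        word = tok[:1].upper() + tok[1:] if case == "U" else tok
        words.append(word + punct)
    return " ".join(words)


def restore(tokens: List[str], tagger: Tagger, size: int = 6, overlap: int = 2) -> str:
    """Chunk the input, tag each window independently, merge the labels, then render text."""
    preds = [(start, tagger(window)) for start, window in chunk(tokens, size, overlap)]
    return apply_labels(tokens, merge(len(tokens), preds))


# Toy stand-in for the Transformer + CRF tagger (purely illustrative).
# It capitalizes a few proper-noun syllables. Like a model with no context
# beyond its window, it also treats every window start and end as a sentence boundary.
PROPER = {"hà", "nội", "việt", "nam"}


def toy_tagger(window: List[str]) -> List[Label]:
    return [
        ("U" if i == 0 or tok in PROPER else "L", "." if i == len(window) - 1 else "")
        for i, tok in enumerate(window)
    ]


if __name__ == "__main__":
    asr_output = "tôi sống ở hà nội và tôi yêu việt nam".split()
    print(restore(asr_output, toy_tagger))                     # Tôi sống ở Hà Nội và tôi yêu Việt Nam.
    print(restore(asr_output, toy_tagger, size=5, overlap=0))  # Tôi sống ở Hà Nội. Và tôi yêu Việt Nam.
```

With `overlap=2`, the toy tagger places a spurious period after "và" at the first window's edge. The merge overrides it with the label from the second window, where that token has context on both sides. With `overlap=0`, there is no second opinion, so the window boundary leaks into the text as "Hà Nội. Và". Larger overlaps cost extra tagging passes but reduce such boundary errors.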

Bibliographic Record

Source
IEICE Transactions on Information and Systems | 2021, No. 8 | 9 pages
Original format: PDF
Keywords
capitalization; punctuation; automatic speech recognition

Similar Literature

1. Punctuation and Capitalization in Text Messages Sent from Traditional Mobile Phones Versus Smartphones: Implications for Higher Education [J]. Genevieve Johnson. International Journal on E-Learning, 2016, No. 3.
2. Bilingual Experiments on Automatic Recovery of Capitalization and Punctuation of Automatic Speech Transcripts [J]. Batista F., Moniz H., Trancoso I. IEEE Transactions on Audio, Speech, and Language Processing, 2012, No. 2.
3. A low latency sequential model and its user-focused evaluation for automatic punctuation of ASR closed captions [J]. Mate Akos Tuendik, Balazs Tarjan, Gyoergy Szaszak. Computer Speech and Language, 2020, Sep.
4. Restoring Punctuation and Capitalization Using Transformer Models [C]. Andris Varavs, Askars Salimbajevs. International Conference on Statistical Language and Speech Processing, 2018.
5. Unregulated Space: Text-Messaging Habits as a Predictor of Punctuation Errors in the Academic Writing of College Students [D]. Achuff, Robert Ryan. 2017.
6. Implementation and Validation of the Roche Light Cycler 480 96-Well Plate Platform as a Real-Time PCR Assay for the Quantitative Detection of Cytomegalovirus (CMV) in Clinical Specimens Using the Luminex MultiCode ASRs System [O]. Shengwen Calvin Li, Kara J. Sparks, Leonard S. Sender. 2020.
7. Minimum essentials in spelling, punctuation, capitalization and English grammar: being a study to determine what spelling, punctuation, capitalization and grammar should be studied in the junior high school and to determine what items of grammar should be stressed [O]. John Beard.
8. Grammar, punctuation, and capitalization: A handbook for technical writers and editors [R]. McCaskill, Mary K. 1990.
