Dialect-Aware Modeling for End-to-End Japanese Dialect Speech Recognition


Abstract

In this paper, we present a novel model for building an end-to-end Japanese-dialect automatic speech recognition (ASR) system. It is known that ASR systems modeled on standard Japanese are not suitable for recognizing Japanese dialects, whose accents and vocabulary differ from those of standard Japanese. Therefore, we aim to produce dialect-specific end-to-end ASR systems for Japanese. Since it is difficult to collect a massive amount of speech-to-text paired data for each Japanese dialect, we utilize both dialect data and standard Japanese data to construct the dialect-specific end-to-end ASR systems. One primitive approach is multi-condition modeling, which simply merges the dialect data with the standard-language data. However, this simple multi-condition modeling fails to capture dialect-specific characteristics adequately because of the mismatch between the dialects and the standard language. Thus, to produce reliable dialect-specific end-to-end ASR systems, we propose dialect-aware modeling, which utilizes dialect labels as auxiliary features. The main strength of the proposed method is that it effectively utilizes both dialect and standard-language data while adequately capturing dialect-specific characteristics. In our experiments using a home-made database of Japanese dialects, the proposed dialect-aware modeling outperformed the simple multi-condition modeling and achieved an error reduction of 19.2%.
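
The abstract does not say how the dialect labels enter the model, so the following is only a minimal sketch of one common way to use a label as an auxiliary feature: a one-hot dialect vector is appended to every acoustic frame, and the standard and dialect corpora are then merged into one training set. The label inventory `DIALECTS` and the helpers `one_hot`, `add_dialect_feature`, and `build_training_set` are hypothetical names introduced for illustration, not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical label inventory; the abstract does not list the dialects used.
DIALECTS = ["standard", "dialect_a", "dialect_b"]

Frames = Sequence[Sequence[float]]
Utterance = Tuple[Frames, str]


def one_hot(label: str, labels: Sequence[str] = DIALECTS) -> List[float]:
    """Return a one-hot vector for a dialect label."""
    if label not in labels:
        raise ValueError(f"unknown dialect label: {label!r}")
    return [1.0 if name == label else 0.0 for name in labels]


def add_dialect_feature(frames: Frames, label: str) -> List[List[float]]:
    """Append the utterance-level dialect vector to every acoustic frame."""
    tag = one_hot(label)
    return [list(frame) + tag for frame in frames]


def build_training_set(
    corpora: Dict[str, List[Utterance]]
) -> List[Tuple[List[List[float]], str]]:
    """Merge all corpora while keeping each utterance's dialect label as input."""
    merged = []
    for label, utterances in corpora.items():
        for frames, transcript in utterances:
            merged.append((add_dialect_feature(frames, label), transcript))
    return merged


# Toy data: two-dimensional "acoustic" frames paired with transcripts.
corpora = {
    "standard": [([[0.1, 0.2], [0.3, 0.4]], "text a")],
    "dialect_a": [([[0.5, 0.6]], "text b")],
}
data = build_training_set(corpora)
```

Without the appended vector, this reduces to the plain multi-condition setup that simply pools the data. With it, a single model sees all the data but can condition its predictions on the dialect of each utterance.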


Bibliographic details

Source
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2020 | pp. 297-301 | 5 pages
Authors
Ryo Imaizumi; Ryo Masumura; Sayaka Shiota; Hitoshi Kiya;
Original format: PDF
Keywords
Standards; Databases; Decoding; Mathematical model; Data models; Acoustics; Vocabulary;


