Multilingual Speech Corpus in Low-Resource Eastern and Northeastern Indian Languages for Speaker and Language Identification

Basu Joyanta; Khan Soma; Roy Rajib; Basu Tapan Kumar; Majumder Swanirbhar

Abstract

Research and development of speech technology applications for low-resource languages (LRLs) is challenging because suitable speech corpora are not available. For most Indian languages in particular, the data found in digital sources is sparse in both amount and type, and prior work is too limited to support large-scale development. This paper describes the creation of an LRL corpus comprising sixteen rarely studied Eastern and Northeastern (E&NE) Indian languages and presents statistics on the variability of the data. Several experiments are then carried out on the collected corpus to build baseline speaker identification (SID) and language identification (LID) systems for acceptance evaluation. To investigate the presence of speaker- and language-specific information, three kinds of spectral features are used: Mel frequency cepstral coefficients (MFCCs), shifted delta cepstral (SDC) features, and relative spectral transform-perceptual linear prediction (RASTA-PLP) features. Models based on vector quantization (VQ), Gaussian mixture models (GMMs), support vector machines (SVMs), and multilayer perceptrons (MLPs) are developed to represent the speaker- and language-specific information captured by these features. In addition, SID and LID models based on i-vectors, time delay neural networks (TDNNs), and recurrent neural networks with long short-term memory (LSTM-RNN) are evaluated to reflect more recent approaches. The developed systems are analyzed on the LRL corpus in terms of SID and LID accuracy. The best SID and LID accuracies, 94.49% and 95.69% respectively, are obtained by the baseline systems that use LSTM-RNN with the MFCC + SDC feature.
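The abstract names the MFCC + SDC feature but does not say how it is built. The sketch below shows one common way to compute shifted delta cepstra and stack them onto the static coefficients. It is an illustration only, not the authors' implementation: the function names, the default d-P-k values (1-3-7, a configuration often quoted in the LID literature), and the clamping of out-of-range frame indices are all assumptions, and the input is expected to be cepstral frames produced by some separate MFCC front end.

```python
from typing import List, Sequence

Frame = Sequence[float]


def shifted_delta_cepstra(
    cepstra: Sequence[Frame], d: int = 1, p: int = 3, k: int = 7
) -> List[List[float]]:
    """Return one SDC vector (k stacked delta blocks) per input frame.

    Block i at frame t is c[t + i*p + d] - c[t + i*p - d]. Indices outside
    the utterance are clamped to the first or last frame (an assumption;
    other edge policies such as dropping frames are equally plausible).
    """
    n_frames = len(cepstra)
    if n_frames == 0:
        return []

    def frame_at(index: int) -> Frame:
        # Clamp the index into [0, n_frames - 1].
        return cepstra[min(max(index, 0), n_frames - 1)]

    sdc: List[List[float]] = []
    for t in range(n_frames):
        stacked: List[float] = []
        for i in range(k):
            ahead = frame_at(t + i * p + d)
            behind = frame_at(t + i * p - d)
            stacked.extend(a - b for a, b in zip(ahead, behind))
        sdc.append(stacked)
    return sdc


def append_sdc(
    cepstra: Sequence[Frame], d: int = 1, p: int = 3, k: int = 7
) -> List[List[float]]:
    """Concatenate each static frame with its SDC vector (MFCC + SDC style)."""
    sdc = shifted_delta_cepstra(cepstra, d, p, k)
    return [list(c) + s for c, s in zip(cepstra, sdc)]
```

With N static coefficients per frame, each output frame of `append_sdc` has N + N*k values, so 7 coefficients with k = 7 would give 56-dimensional vectors. The feature dimensions actually used in the paper are not stated in the abstract.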

Bibliographic record

Source
Circuits, Systems and Signal Processing | 2021, Issue 10 | pp. 4986-5013 | 28 pages
Authors
Basu Joyanta; Khan Soma; Roy Rajib; Basu Tapan Kumar; Majumder Swanirbhar
Author affiliations

CDAC Sect 5 Kolkata India;

CDAC Sect 5 Kolkata India;

CDAC Sect 5 Kolkata India;

Indian Inst Technol Dept Elect Engn Kharagpur W Bengal India;

Tripura Univ Dept Informat Technol Suryamaninagar Tripura India;

Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language of text: English
Keywords
Low-resource language (LRL); Speaker identification (SID); Language identification (LID); Mel frequency cepstral coefficients (MFCCs); i-Vectors; Deep neural networks (DNN);

Similar literature

1. A Multilingual to Polyglot Speech Synthesizer for Indian Languages Using a Voice-Converted Polyglot Speech Corpus [J]. Vijayalakshmi P., Ramani B., Jeeva M. P. Actlin. Circuits, Systems, and Signal Processing, 2018, Issue 5.
2. An Experimental Comparison of Modeling Techniques and Combination of Speaker-Specific Information from Different Languages for Multilingual Speaker Identification [J]. H. S. Jayanna, B. G. Nagaraja. Journal of Intelligent Systems, 2016, Issue 4.
3. A Review on Speech Corpus Development for Automatic Speech Recognition in Indian Languages [J]. Cini Kurian. International Journal of Advanced Networking and Applications, 2015, Issue 7018.
4. Performance Evaluation of Language Identification on Emotional Speech Corpus of Three Indian Languages [C]. Joyanta Basu, Swanirbhar Majumder. Doctoral Symposium on Intelligence Enabled Research, 2021.
5. Automatic Speech Recognition for Low-Resource and Morphologically Complex Languages [D]. Morris, Ethan. 2021.
6. Tutorial: Speech Assessment for Multilingual Children Who Do Not Speak the Same Language(s) as the Speech-Language Pathologist [O]. Sharynne McLeod, Sarah Verdon, Elise Baker.
7. Hierarchical Transfer Learning for Multilingual, Multi-Speaker, and Style Transfer DNN-Based TTS on Low-Resource Languages [O]. Kurniawati Azizah, Mirna Adriani, Wisnu Jatmiko. 2020.
