FuzzyGCP: A deep learning architecture for automatic spoken language identification from speech signals

Garain Avishek; Singh Pawan Kumar; Sarkar Ram

Abstract

In the modern era, language has no geographic boundary. Therefore, to develop automated systems such as audio-based search engines, telemedicine, and emergency services by phone, the first and foremost requirement is to identify the language being spoken. The fundamental difficulty of automatic speech recognition is that speech signals vary significantly with the speaker, speech and language variation, age- and sex-related voice modulation, content, acoustic conditions, and so on. In this paper, we propose a deep learning based ensemble architecture, called FuzzyGCP, for spoken language identification from speech signals. The architecture combines the classification principles of a Deep Dumb Multi Layer Perceptron (DDMLP), a Deep Convolutional Neural Network (DCNN), and a Semi-supervised Generative Adversarial Network (SSGAN) to maximize precision, and then applies ensemble learning using the Choquet integral to predict the final output, i.e., the language class. We evaluate our model on four standard benchmark datasets, comprising two Indic language datasets and two foreign language datasets. Irrespective of the language, the F1-score of the proposed language identification model is as high as 98% on the MaSS dataset, and its worst performance is 67% on the VoxForge dataset, which is still much better than the maximum of 44% achieved by state-of-the-art models on multi-class classification. A link to the source code of our model is provided in the original article.
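The abstract names the Choquet integral as the fusion step but does not give its details. As a rough illustration only, the sketch below fuses per-class confidence scores from three classifiers with a discrete Choquet integral. The fuzzy measure values, the function names `choquet_integral` and `fuse_predictions`, and the example scores are all assumptions made for this illustration; they are not taken from the paper, and the authors' actual fusion procedure may differ.

```python
# Hypothetical classifier names, mirroring the three models named in the abstract.
CLASSIFIERS = ("ddmlp", "dcnn", "ssgan")

# Hypothetical fuzzy measure: the "worth" of every coalition of classifiers.
# It must be monotone, with g(empty set) = 0 and g(all classifiers) = 1.
# These numbers are illustrative assumptions, not values from the paper.
FUZZY_MEASURE = {
    frozenset(): 0.0,
    frozenset({"ddmlp"}): 0.3,
    frozenset({"dcnn"}): 0.4,
    frozenset({"ssgan"}): 0.4,
    frozenset({"ddmlp", "dcnn"}): 0.6,
    frozenset({"ddmlp", "ssgan"}): 0.6,
    frozenset({"dcnn", "ssgan"}): 0.8,
    frozenset(CLASSIFIERS): 1.0,
}


def choquet_integral(scores, measure):
    """Discrete Choquet integral of `scores` (name -> value in [0, 1])."""
    # Sort classifier names by score, highest first.
    ordered = sorted(scores, key=scores.get, reverse=True)
    total = 0.0
    for i, name in enumerate(ordered):
        # Score of the next-ranked classifier, or 0 after the last one.
        next_score = scores[ordered[i + 1]] if i + 1 < len(ordered) else 0.0
        # Coalition of the i+1 highest-scoring classifiers.
        coalition = frozenset(ordered[: i + 1])
        total += (scores[name] - next_score) * measure[coalition]
    return total


def fuse_predictions(probabilities, measure=FUZZY_MEASURE):
    """Fuse per-class scores (name -> list of scores) and pick the best class."""
    n_classes = len(next(iter(probabilities.values())))
    fused = [
        choquet_integral({k: v[c] for k, v in probabilities.items()}, measure)
        for c in range(n_classes)
    ]
    best_class = max(range(n_classes), key=fused.__getitem__)
    return best_class, fused


if __name__ == "__main__":
    # Made-up scores for a two-language problem.
    example = {"ddmlp": [0.6, 0.4], "dcnn": [0.2, 0.8], "ssgan": [0.3, 0.7]}
    label, fused = fuse_predictions(example)
    print(label, fused)
```

Unlike a weighted average, the measure assigns a value to every subset of classifiers, so agreement between particular pairs (here, the assumed higher value for DCNN together with SSGAN) can count for more or less than the sum of the individual weights.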

Bibliographic details

Source
Expert Systems with Applications, 2021, Issue 4, pp. 114416.1-114416.14 (14 pages)
Authors
Garain Avishek; Singh Pawan Kumar; Sarkar Ram
Author affiliations

Jadavpur Univ Dept Comp Sci & Engn Kolkata 700032 W Bengal India;

Jadavpur Univ Dept Informat Technol Kolkata 700106 W Bengal India;

Jadavpur Univ Dept Comp Sci & Engn Kolkata 700032 W Bengal India;

Original format: PDF
Language: English
Keywords
Spoken language identification; Speech signal; Deep learning; GAN; DNN; MLP; Ensemble learning; Choquet integral; Spectrogram;
