Published in: Expert Systems with Applications

FuzzyGCP: A deep learning architecture for automatic spoken language identification from speech signals

Abstract

In this modern era, language has no geographic boundary. Therefore, the first and foremost requirement for developing automated systems such as audio-based search engines, telemedicine, and emergency services via phone is to identify the spoken language. The fundamental difficulty of automatic speech recognition is that speech signals vary significantly across speakers, speaking styles, languages, age- and sex-related voice modulation, content, acoustic conditions, and so on. In this paper, we propose a deep-learning-based ensemble architecture, called FuzzyGCP, for spoken language identification from speech signals. The architecture combines the classification principles of a Deep Dumb Multi Layer Perceptron (DDMLP), a Deep Convolutional Neural Network (DCNN), and a Semi-supervised Generative Adversarial Network (SSGAN) to maximize precision. It then applies ensemble learning based on the Choquet integral to predict the final output, i.e., the language class. We evaluated our model on four standard benchmark datasets: two Indic language datasets and two foreign language datasets. Irrespective of the language, the F1-score of the proposed language identification model reaches 98% on the MaSS dataset. Its worst performance, 67% on the VoxForge dataset, is still much better than the maximum of 44% achieved by state-of-the-art models on multi-class classification. The source code of our model is publicly available.
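The abstract says only that the outputs of the three base classifiers are fused with the Choquet integral. The following is a minimal sketch of such a fusion step, not the authors' implementation. It rests on these assumptions:

- Each classifier emits one probability per language class.
- The fuzzy measure over classifier subsets is set by hand. Its values are illustrative only.
- The dictionary keys `ddmlp`, `dcnn`, `ssgan` and the functions `choquet_integral` and `fuse` are hypothetical names.

For each class, the per-classifier scores are sorted in descending order, x_(1) >= ... >= x_(n), with x_(n+1) = 0. The fused score is the sum over i of (x_(i) - x_(i+1)) * g(A_i), where A_i is the set of the i highest-scoring classifiers.

```python
# Hypothetical fuzzy measure g over the three base classifiers.
# The values are illustrative assumptions, not taken from the paper.
# A valid fuzzy measure needs g(empty) = 0, g(all) = 1, and monotonicity.
FUZZY_MEASURE = {
    frozenset(): 0.0,
    frozenset({"ddmlp"}): 0.3,
    frozenset({"dcnn"}): 0.4,
    frozenset({"ssgan"}): 0.35,
    frozenset({"ddmlp", "dcnn"}): 0.7,
    frozenset({"ddmlp", "ssgan"}): 0.6,
    frozenset({"dcnn", "ssgan"}): 0.75,
    frozenset({"ddmlp", "dcnn", "ssgan"}): 1.0,
}


def choquet_integral(scores, measure):
    """Discrete Choquet integral of per-classifier scores w.r.t. a fuzzy measure."""
    # Sort classifiers by score, highest first (ties keep insertion order).
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    total = 0.0
    coalition = set()
    for i, (name, value) in enumerate(ordered):
        coalition.add(name)  # A_i: the i highest-scoring classifiers
        next_value = ordered[i + 1][1] if i + 1 < len(ordered) else 0.0
        total += (value - next_value) * measure[frozenset(coalition)]
    return total


def fuse(probabilities, measure):
    """Fuse class-probability vectors; return (predicted class, fused scores)."""
    names = list(probabilities)
    n_classes = len(probabilities[names[0]])
    fused = [
        choquet_integral({name: probabilities[name][c] for name in names}, measure)
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=lambda c: fused[c])
    return predicted, fused


# Usage: hypothetical outputs of the three classifiers for three language classes.
probabilities = {
    "ddmlp": [0.6, 0.3, 0.1],
    "dcnn": [0.2, 0.7, 0.1],
    "ssgan": [0.5, 0.4, 0.1],
}
predicted_class, fused_scores = fuse(probabilities, FUZZY_MEASURE)
```

With these made-up numbers, the fused scores are roughly 0.41, 0.495 and 0.1, so class 1 wins. Unlike a weighted average, the measure assigns weights to coalitions of classifiers. It can therefore reward or penalize agreement between particular models. How the paper actually determines its fuzzy measure is not stated in the abstract.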