Venue: European conference on ambient intelligence

Spoken Language Identification Using ConvNets

Abstract

Language Identification (LI) is an important first step in several speech processing systems. With a growing number of voice-based assistants, speech LI has emerged as a widely researched field. To identify languages, we can adopt either an implicit approach, where only the speech for a language is available, or an explicit one, where the speech is available with its corresponding transcript. This paper focuses on the implicit approach due to the absence of transcribed data. It benchmarks existing models and proposes a new attention-based model for language identification that uses log-Mel spectrogram images as input. We also evaluate the effectiveness of raw waveforms as input features to neural network models for LI tasks. Models were trained and evaluated on data from the VoxForge dataset: we classified six languages (English, French, German, Spanish, Russian and Italian) with an accuracy of 95.4% and four languages (English, French, German, Spanish) with an accuracy of 96.3%. This approach can be scaled further to incorporate more languages.
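The abstract names log-Mel spectrograms as the model input but does not give the extraction settings. The following is a minimal NumPy sketch of how such a feature map is typically computed, not the authors' pipeline. The sampling rate, FFT size, hop length, number of mel bands, and all function names are illustrative assumptions.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style mel formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose edges are evenly spaced on the mel scale (0 to sr/2).
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, center, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=40):
    # Hypothetical defaults: 32 ms Hann windows with a 10 ms hop at 16 kHz.
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[t * hop : t * hop + n_fft] * window
                       for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2       # (n_frames, n_fft // 2 + 1)
    mel_power = power @ mel_filterbank(sr, n_fft, n_mels).T  # (n_frames, n_mels)
    return 10.0 * np.log10(mel_power + 1e-10).T              # dB, (n_mels, n_frames)

# Usage: a one-second synthetic 440 Hz tone stands in for a speech clip.
sr = 16000
tone = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
feats = log_mel_spectrogram(tone, sr=sr)
print(feats.shape)  # (40, 97)
```

The resulting two-dimensional array can be treated as a single-channel image and passed to a convolutional network. A real system would load VoxForge recordings in place of the synthetic tone.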