Conference: International Multidisciplinary Information Technology and Engineering Conference
A Multilingual ASR of Sepedi-English Code-Switched Speech for Automatic Language Identification

Abstract

This paper presents an integration of multilingual speech recognition into language identification (LID) for code-switched speech, using phonotactic features as language information. A multilingual speech recognition system converts spoken utterances into phone sequences. Hidden Markov models (HMMs) are used to build multilingual acoustic models that can handle multiple languages within an utterance. We propose two phoneme clustering methods to determine the phoneme similarities among the target languages. A supervised machine learning technique is employed to learn the language transitions in the phonotactic information, given the phoneme sequences. The classification decision is made by a support vector machine (SVM), which classifies language identity from likelihood scores based on the phoneme occurrence segments. Experiments were performed using a mixed-language speech corpus for Sepedi and English. We evaluate the ASR-LID system by measuring the phone error rate (PER) of the recognition portion and the classification accuracy of the LID portion separately. We obtained a lower PER with the system that employed the data-driven phoneme clustering method, modelled with 32 Gaussian mixtures per state. The proposed multilingual ASR-LID framework achieved acceptable recognition and classification accuracy on code-switched and monolingual speech, respectively.
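The abstract reports phone error rate (PER) but does not say how it was computed. A minimal sketch, assuming the conventional definition (substitutions, deletions, and insertions from a minimum edit-distance alignment, divided by the number of reference phones), might look like the following. The function names and the example phone symbols are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences, unit cost per edit."""
    # prev[j] holds the distance between the first i-1 reference phones
    # and the first j hypothesis phones.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference phone
                curr[j - 1] + 1,     # insertion of a hypothesis phone
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def phone_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """PER = (substitutions + deletions + insertions) / reference length."""
    if not ref:
        raise ValueError("reference phone sequence must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical phone strings, for illustration only.
    reference = ["d", "u", "m", "e", "l", "a"]
    hypothesis = ["d", "u", "n", "e", "l"]
    print(f"PER = {phone_error_rate(reference, hypothesis):.3f}")
```

Because insertions are counted against the reference length, a PER computed this way can exceed 1.0 when the recogniser emits many spurious phones.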