Foreign-language journal: Computer Science & Information Technology

Phone Clustering Methods for Multilingual Language Identification

Abstract

This paper proposes phoneme clustering methods for multilingual language identification (LID) on a mixed-language corpus. A one-pass multilingual automatic speech recognition (ASR) system converts spoken utterances into occurrences of phone sequences. Hidden Markov models were employed to train multilingual acoustic models that handle multiple languages within an utterance. Two phoneme clustering methods were explored to derive the most appropriate phoneme similarities between the target languages. Finally, a supervised machine learning technique was employed to learn the language transitions in the phonotactic information, and support vector machine (SVM) models were used to classify phoneme occurrences. System performance was evaluated on a mixed-language speech corpus for two South African languages (Sepedi and English), using the phone error rate (PER) and LID classification accuracy as separate measures. We show that the multilingual ASR output, which is fed directly to the LID system, has a direct impact on LID accuracy. The proposed system achieved acceptable phone recognition and classification accuracy on both mixed-language speech and monolingual speech (i.e. either Sepedi or English). Both the data-driven and the knowledge-driven phoneme clustering methods improve ASR and LID for code-switched speech. The data-driven method obtained a PER of 5.1% and an LID classification accuracy of 94.5% when the acoustic models were trained with 64 Gaussian mixtures per state.
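The abstract reports PER without defining it. As a minimal sketch, assuming the conventional definition (substitutions, deletions and insertions from a Levenshtein alignment, divided by the number of reference phones), it could be computed as follows. The function name and the phone labels are hypothetical illustrations, not taken from the paper, and the authors' scoring tool may differ in detail.

```python
from typing import Sequence


def phone_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Return (substitutions + deletions + insertions) / len(reference).

    Assumes the conventional edit-distance definition of PER; this is an
    illustrative sketch, not the scoring procedure used in the paper.
    """
    if not reference:
        raise ValueError("reference must contain at least one phone")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis phones.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_phone in enumerate(reference, start=1):
        curr = [i]  # aligning i reference phones to nothing costs i deletions
        for j, hyp_phone in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (0 if ref_phone == hyp_phone else 1)
            deletion = prev[j] + 1       # reference phone missing in hypothesis
            insertion = curr[j - 1] + 1  # extra phone in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(reference)


# Hypothetical phone labels, for illustration only.
ref = ["d", "u", "m", "e", "l", "a"]
hyp = ["d", "u", "n", "e", "l", "a"]  # one substituted phone
per = phone_error_rate(ref, hyp)
```

Because insertions are counted against the reference length, a PER computed this way can exceed 100% when the recognizer emits many spurious phones.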
