Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
An Improved Framework for Recognizing Highly Imbalanced Bilingual Code-Switched Lectures with Cross-Language Acoustic Modeling and Frame-Level Language Identification

Abstract

This paper considers the recognition of a widely observed type of bilingual code-switched speech: the speaker speaks primarily in the host language (usually the native language), but inserts a few words or phrases in the guest language (usually a second language) into many host-language utterances. In this case, not only are the languages switched back and forth within an utterance, which makes language identification difficult, but much less data is available for the guest language, which results in poor recognition accuracy for the guest-language portions. Unit merging approaches at three levels of acoustic modeling (triphone models, HMM states, and Gaussians) have previously been proposed for cross-lingual data sharing in such highly imbalanced bilingual code-switched speech. In this paper, we present an improved overall framework built on top of these unit merging approaches. It includes unit recovery, which reconstructs the identity of the units of the two languages after they have been merged; unit occupancy ranking, which offers much more flexible data sharing between units, both across languages and within a language, based on the accumulated occupancy of the HMM states; and estimation of frame-level language posteriors from blurred posteriorgram features (BPFs) for use in decoding. We also present a complete set of experimental results comparing all the approaches involved in a real-world application scenario under unified conditions, and show that the proposed approaches achieve very good improvements.
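The idea of a frame-level language posterior can be illustrated with a small sketch. This is not the paper's BPF method, which the abstract does not define. It assumes a phone posteriorgram is already available, that each phone index is labeled as belonging to the host or the guest language, and that a simple moving average stands in for the "blurring" step. The function name `guest_language_posteriors`, the parameter `window`, and the toy data are all hypothetical.

```python
from typing import List, Sequence


def guest_language_posteriors(
    posteriorgram: Sequence[Sequence[float]],
    guest_ids: Sequence[int],
    window: int = 1,
) -> List[float]:
    """Estimate a per-frame guest-language posterior.

    posteriorgram: one row per frame, each row holding phone posteriors.
    guest_ids: indices of phones assumed to belong to the guest language.
    window: half-width of the moving average used here as a stand-in
            for temporal blurring (an assumption, not the paper's BPF).
    """
    # Raw per-frame estimate: total posterior mass on guest-language phones.
    raw = [sum(frame[i] for i in guest_ids) for frame in posteriorgram]

    # Smooth over neighbouring frames, truncating the window at the edges.
    smoothed = []
    for t in range(len(raw)):
        lo = max(0, t - window)
        hi = min(len(raw), t + window + 1)
        smoothed.append(sum(raw[lo:hi]) / (hi - lo))
    return smoothed


# Toy example: two phones, index 0 is a host phone, index 1 a guest phone.
frames = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
p_guest = guest_language_posteriors(frames, guest_ids=[1], window=1)
print(p_guest)
```

In a decoder, per-frame values such as these could be used to weight the acoustic scores of host-language and guest-language units; how the paper actually combines them is not stated in the abstract.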