Annual Conference of the International Speech Communication Association (INTERSPEECH 2010)

SEAME: a Mandarin-English Code-switching Speech Corpus in South-East Asia

Abstract

In Singapore and Malaysia, people often mix Mandarin and English within a single sentence. We call such sentences intra-sentential code-switching sentences. In this paper, we report on the development of SEAME, a spontaneous Mandarin-English code-switching speech corpus. The corpus is developed as part of a multilingual speech recognition project and will be used to examine how Mandarin-English code-switching occurs in spoken language in South-East Asia. Additionally, it can provide insights into the development of large vocabulary continuous speech recognition (LVCSR) for code-switching speech. The collected corpus consists of intra-sentential code-switching utterances recorded in both interview and conversational settings. This paper describes the corpus design and the analysis of the collected corpus.
