Journal: Computer Speech and Language

Human and computer recognition of regional accents and ethnic groups from British English speech


Abstract

The paralinguistic information in a speech signal includes clues to the geographical and social background of the speaker. This paper is concerned with automatic extraction of this information from a short segment of speech. A state-of-the-art language identification (LID) system is applied to the problems of regional accent recognition for British English, and ethnic group recognition within a particular accent. We compare the results with human performance and, for accent recognition, with the 'text-dependent' ACCDIST accent recognition measure. For the 14 regional accents of British English in the ABI-1 corpus (good-quality read speech), our LID system achieves a recognition accuracy of 89.6%, compared with 95.18% for our best ACCDIST-based system and 58.24% for human listeners. The "Voices across Birmingham" corpus contains significant amounts of telephone conversational speech for the two largest ethnic groups in the city of Birmingham (UK), namely the 'Asian' and 'White' communities. Our LID system distinguishes between these two groups with an accuracy of 96.51%, compared with 90.24% for human listeners. Although direct comparison is difficult, it seems that our LID system performs much better on the standard 12-class NIST 2003 Language Recognition Evaluation task or the two-class ethnic group recognition task than on the 14-class regional accent recognition task. We conclude that automatic accent recognition is a challenging task for speech technology, and speculate that the use of natural conversational speech may be advantageous for these types of paralinguistic task.
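One reason direct comparison across these tasks is difficult is that chance performance differs: with balanced classes, random guessing scores about 7.1% on a 14-class task but 50% on a two-class task. As a rough illustration only, the sketch below rescales each accuracy quoted above as (accuracy - 1/k) / (1 - 1/k) for k classes, so that chance maps to 0 and perfect recognition to 1. Both the balanced-class assumption and this normalisation are illustrative assumptions, not measures reported in the paper, and the function names are hypothetical.

```python
# Chance-normalised accuracy: how far above random guessing a classifier is,
# as a fraction of the headroom between chance and perfect accuracy.
# ASSUMPTION: classes are balanced, so chance accuracy is 1 / n_classes.
# The paper's test sets may not be exactly balanced.


def chance_level(n_classes: int) -> float:
    """Accuracy of uniform random guessing over n_classes balanced classes."""
    return 1.0 / n_classes


def normalised_accuracy(accuracy: float, n_classes: int) -> float:
    """Rescale accuracy so chance maps to 0.0 and perfect accuracy to 1.0."""
    chance = chance_level(n_classes)
    return (accuracy - chance) / (1.0 - chance)


# Accuracies quoted in the abstract, as fractions, with their class counts.
results = {
    "LID, 14 accents (ABI-1)": (0.896, 14),
    "ACCDIST, 14 accents (ABI-1)": (0.9518, 14),
    "Human, 14 accents (ABI-1)": (0.5824, 14),
    "LID, 2 groups (Birmingham)": (0.9651, 2),
    "Human, 2 groups (Birmingham)": (0.9024, 2),
}

if __name__ == "__main__":
    for name, (acc, k) in results.items():
        print(f"{name:30s} raw={acc:.4f} chance={chance_level(k):.4f} "
              f"normalised={normalised_accuracy(acc, k):.3f}")
```

Under these assumptions the normalised scores are about 0.888 (LID), 0.948 (ACCDIST) and 0.550 (human listeners) for the 14 accents, against about 0.930 (LID) and 0.805 (human listeners) for the two ethnic groups. So the LID system's advantage on the two-class task shrinks after correcting for chance, but does not disappear.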