Conference: International Speech Communication Association

Investigating Morphological Decomposition for Transcription of Arabic Broadcast News and Broadcast Conversation Data



Abstract

One of the challenges of Arabic speech recognition is dealing with the language's huge lexical variety. Morphological decomposition has been proposed to address this problem: by increasing lexical coverage, it reduces the errors caused by words that are unknown to the system. In our previous attempts to develop an Arabic speech-to-text (STT) transcription system with morphological decomposition, we observed an increase in word error rate of about 2% absolute relative to a comparable word-based system. Based on an error analysis and a comparison of our approach with those of other sites, we made two modifications. First, the most frequent words are not decomposed. Second, the prefix 'Al' is not decomposed for words starting with a solar consonant: because the prefix assimilates with the following consonant, its deletion was one of the most frequent errors. With these changes, the word-based and morphologically decomposed language models achieved comparable recognition performance, and since the two systems make different errors, combining them gave a performance gain.
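The abstract does not specify the decomposition algorithm, so the following is only a minimal, hypothetical sketch of how the two modifications could act on a rule-based splitter. It assumes words are written in Buckwalter transliteration, handles only the definite article 'Al' (a real system would use a morphological analyzer and many more affixes), and marks a detached prefix as the token `Al+`. The names `decompose_word`, `build_frequent_set`, `oov_rate`, the `min_stem_len` threshold and the `top_n` cutoff are all illustrative assumptions; the paper does not state how many frequent words were excluded.

```python
from collections import Counter

# The 14 Arabic solar (sun) consonants in Buckwalter transliteration:
# ta, tha, dal, dhal, ra, zay, sin, shin, sad, dad, taa, zaa, lam, nun.
SOLAR_CONSONANTS = set("tvd*rzs$SDTZln")


def decompose_word(word, frequent_words, min_stem_len=2):
    """Split the article 'Al' off a word unless one of the exceptions applies.

    Returns a list of tokens; a detached prefix is written as 'Al+'.
    (Assumed toy rule: only 'Al' is ever split off.)
    """
    # Modification 1: the most frequent words are kept whole.
    if word in frequent_words:
        return [word]
    if word.startswith("Al") and len(word) - 2 >= min_stem_len:
        stem = word[2:]
        # Modification 2: before a solar consonant the /l/ assimilates,
        # so the prefix is easily deleted by the recognizer; keep it attached.
        if stem[0] in SOLAR_CONSONANTS:
            return [word]
        return ["Al+", stem]
    return [word]


def decompose_text(words, frequent_words):
    """Apply decompose_word to every word and flatten the result."""
    tokens = []
    for w in words:
        tokens.extend(decompose_word(w, frequent_words))
    return tokens


def build_frequent_set(corpus_words, top_n):
    """Hypothetical helper: the top_n most frequent words of a training corpus."""
    return {w for w, _ in Counter(corpus_words).most_common(top_n)}


def oov_rate(tokens, vocab):
    """Fraction of tokens that are not in the recognition vocabulary."""
    if not tokens:
        return 0.0
    return sum(t not in vocab for t in tokens) / len(tokens)


if __name__ == "__main__":
    frequent = {"Alywm"}  # hypothetical frequent-word list ("the day")
    print(decompose_text(["Alqmr", "Al$ms", "Alywm", "ktAb"], frequent))
    # ['Al+', 'qmr', 'Al$ms', 'Alywm', 'ktAb']

    # Coverage: "Alqmr" ("the moon") is unseen as a whole word, but its
    # pieces are covered once the training text is decomposed.
    train = ["qmr", "AlktAb"]
    print(oov_rate(["Alqmr"], set(train)))  # 1.0
    morph_vocab = set(decompose_text(train, set()))
    print(oov_rate(decompose_text(["Alqmr"], set()), morph_vocab))  # 0.0
```

The classic pair illustrates the second rule: "Alqmr" (q is a lunar consonant) is split into `Al+ qmr`, while "Al$ms" ("the sun", $ is a solar consonant) stays whole because the article's /l/ is not pronounced there. The coverage demo shows, on toy data, why decomposition lowers the out-of-vocabulary rate.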
