Morpheme-based Lexical Modeling for Korean Broadcast News Transcription

Abstract

This paper describes our large-vocabulary continuous speech recognition (LVCSR) system for Korean broadcast news transcription. The main focus is finding the most suitable morpheme-based lexical model for Korean broadcast news recognition, one that copes with the inflectional flexibility of Korean. Because there are trade-offs between lexicon size and lexical coverage, and between lexical unit length and word error rate (WER), we analyzed the training corpus to obtain a compact 24k-morpheme lexicon with 98.8% coverage. The lexicon is then optimized by combining morphemes, using training-corpus statistics, under either a monosyllable constraint or a maximum-length constraint. In experiments, our system reduced the share of monosyllabic morphemes, which are the most error-prone, from 52% to 29% of the lexicon and obtained a WER of 13.24% for anchors and 24.97% for reporters.
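
The abstract does not say which corpus statistics drive the morpheme combination, so the following is only a minimal sketch under stated assumptions: adjacent-pair frequency is used as the statistic, the syllable count of a morpheme is taken to be the length of its precomposed Hangul string, and all function names, parameters, and the toy corpus are hypothetical. It illustrates the three quantities the abstract mentions: lexical coverage of a top-k lexicon, merging under a monosyllable or maximum-length constraint, and the share of monosyllabic entries in the lexicon.

```python
from collections import Counter


def top_k_lexicon(tokens, k):
    """Return the k most frequent morphemes as a set (hypothetical helper)."""
    return {m for m, _ in Counter(tokens).most_common(k)}


def coverage(tokens, lexicon):
    """Fraction of running tokens that appear in the lexicon."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in lexicon) / len(tokens)


def monosyllable_ratio(lexicon):
    """Share of lexicon entries that are one syllable long.

    Assumption: each precomposed Hangul character is one syllable,
    so len(morpheme) is used as the syllable count.
    """
    if not lexicon:
        return 0.0
    return sum(1 for m in lexicon if len(m) == 1) / len(lexicon)


def merge_pairs(sentences, min_count=2, max_len=None, monosyllable_only=False):
    """Merge frequent adjacent morpheme pairs in a single left-to-right pass.

    Assumption: raw pair frequency is the selection statistic; the paper
    may use a different one.
    - monosyllable_only: merge only if at least one side is monosyllabic.
    - max_len: merge only if the combined unit has at most max_len syllables.
    """
    pair_counts = Counter()
    for sent in sentences:
        pair_counts.update(zip(sent, sent[1:]))

    def allowed(a, b):
        if monosyllable_only and len(a) != 1 and len(b) != 1:
            return False
        if max_len is not None and len(a) + len(b) > max_len:
            return False
        return True

    selected = {
        pair for pair, count in pair_counts.items()
        if count >= min_count and allowed(*pair)
    }

    merged = []
    for sent in sentences:
        out, i = [], 0
        while i < len(sent):
            if i + 1 < len(sent) and (sent[i], sent[i + 1]) in selected:
                out.append(sent[i] + sent[i + 1])  # combine the two units
                i += 2
            else:
                out.append(sent[i])
                i += 1
        merged.append(out)
    return merged


# Toy morpheme-segmented corpus (invented for illustration only).
sentences = [
    ["학교", "에", "가", "다"],
    ["학교", "에", "오", "다"],
    ["집", "에", "가", "다"],
]
tokens = [m for sent in sentences for m in sent]
merged = merge_pairs(sentences, min_count=2, monosyllable_only=True)
merged_tokens = [m for sent in merged for m in sent]

print("coverage of top-4 lexicon:", coverage(tokens, top_k_lexicon(tokens, 4)))
print("monosyllable share before:", monosyllable_ratio(set(tokens)))
print("monosyllable share after: ", monosyllable_ratio(set(merged_tokens)))
```

On a real corpus the merge would typically be repeated, and the lexicon size, coverage, and WER re-measured after each pass, since longer units lower the monosyllable share but enlarge the lexicon.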