Conference: International Congress on Image and Signal Processing

Error feedback based lexical entity extraction for Chinese language modeling

Abstract

Unlike Western languages, Chinese has no standard definition of a word, so the choice of lexicon plays an important role in Chinese language modeling. This paper proposes a novel method for constructing the lexicon automatically. Rather than relying on statistical measures of text features, the method is driven directly by error feedback from the target task, which in this paper is phoneme-to-grapheme conversion. The process consists of two iterative phases: Phase One selects individual words from a large manually compiled lexicon, and Phase Two extracts compound words on the basis of the Phase One result. In phoneme-to-grapheme conversion experiments, the method achieves absolute reductions in character error rate of 1.09% for Phase One and 0.38% for Phase Two, compared with baseline lexicons of the same size generated by the conventional word-frequency method.
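
The reported gains are absolute reductions in character error rate (CER). As a rough illustration of how such a figure could be computed, the sketch below assumes CER is the character-level edit distance between reference and converted text, divided by the number of reference characters. The abstract does not give the paper's exact scoring procedure, so the function names and the toy sentences are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(refs, hyps) -> float:
    """Total character edits divided by total reference characters (assumed definition)."""
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return total_errors / total_chars


# Toy data, invented for illustration only (not taken from the paper).
refs = ["今天天气很好", "我们去公园"]
baseline_hyps = ["今天天气很号", "我门去公圆"]  # output with a hypothetical baseline lexicon
feedback_hyps = ["今天天气很好", "我门去公园"]  # output with a hypothetical refined lexicon

baseline_cer = character_error_rate(refs, baseline_hyps)
feedback_cer = character_error_rate(refs, feedback_hyps)

# An absolute reduction is a plain difference of the two rates,
# not a percentage of the baseline rate.
absolute_reduction = baseline_cer - feedback_cer
print(f"baseline CER = {baseline_cer:.4f}, refined CER = {feedback_cer:.4f}, "
      f"absolute reduction = {absolute_reduction:.4f}")
```

Under this reading, the 1.09% and 0.38% figures are differences between two error rates measured on the same test set with lexicons of equal size, which is why the equal-size baseline matters for the comparison.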
