Source: Proceedings of the International Colloquium on Information Fusion 2007
Research on Chinese Character Confusion Network Algorithm for LVCSR

Abstract

In large vocabulary continuous speech recognition (LVCSR), the output of a recognizer that uses the standard maximum a posteriori (MAP) decoding strategy minimizes the sentence error rate, so there is a mismatch between MAP recognition results and the commonly used performance metric, word error rate (WER). Minimum Bayes risk (MBR) decoding can instead be used to obtain recognition results with minimum WER. One way to perform MBR decoding is to transform the word lattice into a confusion network and read off the hypothesis with minimum WER. Based on the characteristics of Mandarin, we propose a Chinese character confusion network generation algorithm that builds on previous work. First, a Chinese word lattice is produced by a standard Mandarin large vocabulary continuous speech recognizer. Next, the word lattice is analyzed and processed according to features of the Chinese language to build a Chinese character lattice. Finally, a Chinese character confusion network is produced by aligning the arcs of the character lattice. Experimental results on Mandarin large vocabulary continuous speech recognition show that the proposed algorithm yields a lower WER than both MAP recognition and two previous confusion network generation algorithms.
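
The three steps in the abstract (word lattice, character lattice, character confusion network) can be illustrated with a small sketch. This is not the paper's algorithm, whose details the abstract does not give. It assumes that a word arc can be split into character arcs by dividing its time span evenly, and that character arcs can be grouped into slots by a simple midpoint test. The function names and the toy arcs and posteriors are all hypothetical.

```python
from collections import defaultdict

# Toy word-lattice arcs: (word, start_time, end_time, arc_posterior).
# All values are invented for illustration; none come from the paper.
WORD_ARCS = [
    ("中国", 0.0, 1.0, 0.4),
    ("仁", 1.0, 1.5, 0.4),
    ("中", 0.0, 0.5, 0.3),
    ("钟", 0.0, 0.5, 0.3),
    ("国", 0.5, 1.0, 0.6),
    ("人", 1.0, 1.5, 0.6),
]


def split_word_arcs(word_arcs):
    """Split each word arc into character arcs.

    Assumption: the word's time span is divided evenly among its
    characters, and each character inherits the word's posterior.
    """
    char_arcs = []
    for word, start, end, post in word_arcs:
        step = (end - start) / len(word)
        for i, ch in enumerate(word):
            char_arcs.append((ch, start + i * step, start + (i + 1) * step, post))
    return char_arcs


def build_confusion_network(char_arcs):
    """Group character arcs into time-ordered slots.

    Assumption: an arc joins the first slot whose time span contains the
    arc's midpoint; otherwise it opens a new slot. Posteriors of identical
    characters within a slot are summed.
    """
    slots = []
    for ch, start, end, post in sorted(char_arcs, key=lambda a: (a[1], a[2])):
        mid = (start + end) / 2
        for slot in slots:
            if slot["start"] <= mid < slot["end"]:
                slot["scores"][ch] += post
                break
        else:
            # No existing slot covers this arc, so open a new one.
            slot = {"start": start, "end": end, "scores": defaultdict(float)}
            slot["scores"][ch] += post
            slots.append(slot)
    return slots


def consensus(slots):
    """Pick the highest-posterior character in every slot."""
    return "".join(max(s["scores"], key=s["scores"].get) for s in slots)


slots = build_confusion_network(split_word_arcs(WORD_ARCS))
print(consensus(slots))  # 中国人
```

In this toy lattice the single most probable path is 中国 followed by 仁 (posterior 0.4), yet the per-slot vote selects 人 in the last slot (0.6 against 0.4). Splitting 中国 into characters also lets it pool its posterior with the separate 中 and 国 arcs, which is the kind of benefit a character-level network is meant to provide.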