Conference: IEEE International Conference on Computer-Aided Industrial Design Conceptual Design

Developing a method to build Japanese speech recognition system based on 3-gram language model expansion with Google database


Abstract

We have developed a method for building a Japanese automatic speech recognition (ASR) system based on expanding a 3-gram language model with the Google database. Our aim is to improve the recognition accuracy of ASR systems that use a 3-gram language model, even when that model is trained on short text segments. We investigate a practical approach to expanding language models with 3-gram information from external web documents. In addition, we filter 3-gram entries on the basis of term frequency-inverse document frequency (TF-IDF) scores and the output of the Yahoo! web API, so that redundant or irrelevant 3-gram entries are not added. In the experiments, we achieved an improvement of 0.71% in the word error rate and showed that recognition accuracy can be improved by combining the proposed method with the traditional back-off smoothing technique, without incurring any cost for collecting additional training text.
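The TF-IDF filtering step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring formula, the threshold, or the rule for deciding which 3-grams to keep, so everything below is an assumption made for illustration: the function names (`tfidf_scores`, `filter_trigrams`), the "keep a 3-gram if at least one of its words scores above a threshold" rule, and the romanized toy documents are all hypothetical, and the Yahoo! web API check is not modeled.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return {term: highest TF-IDF value the term reaches in any document}.

    docs is a list of token lists. TF is the relative frequency within a
    document and IDF is log(N / df); this is one common variant, assumed
    here because the abstract does not specify one.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document

    scores = {}
    for doc in docs:
        term_freq = Counter(doc)
        for term, count in term_freq.items():
            score = (count / len(doc)) * math.log(n_docs / doc_freq[term])
            scores[term] = max(scores.get(term, 0.0), score)
    return scores


def filter_trigrams(candidates, scores, threshold):
    """Keep candidate 3-grams that contain at least one term whose
    TF-IDF score reaches the threshold (hypothetical selection rule)."""
    return [
        gram
        for gram in candidates
        if any(scores.get(word, 0.0) >= threshold for word in gram)
    ]


# Toy in-domain documents, already tokenized (romanized for readability).
docs = [
    ["onsei", "ninshiki", "no", "seido"],
    ["gengo", "moderu", "no", "kakuchou"],
    ["tenki", "yohou", "no", "kekka"],
]
scores = tfidf_scores(docs)

# Candidate 3-grams that might come from an external web n-gram resource.
candidates = [("onsei", "ninshiki", "no"), ("no", "no", "no")]
kept = filter_trigrams(candidates, scores, threshold=0.1)
print(kept)
```

In this toy setup the particle "no" occurs in every document, so its IDF is zero and a 3-gram made up only of that word is dropped, while a 3-gram containing a topic-specific word is kept. A real system would then merge the surviving entries into the language model and rely on back-off smoothing for unseen sequences.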
