Conference proceedings: Advances in natural language processing

Large-Scale Language Modeling with Random Forests for Mandarin Chinese Speech-to-Text



Abstract

In this work, the random forest language modeling approach is applied with the aim of improving the performance of LIMSI's highly competitive Mandarin Chinese speech-to-text system. The experimental setup is that of the GALE Phase 4 evaluation, which is characterized by a large amount of available language model training data (over 3.2 billion segmented words). A conventional unpruned 4-gram language model with a vocabulary of 56K words serves as a baseline that is challenging to improve upon. Nevertheless, moderate perplexity and character error rate (CER) improvements over this model were obtained with a random forest language model. Different random forest training strategies were explored so as to attain the maximal gain in performance, and a Forest of Random Forests language modeling scheme is introduced.
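The random forest language models described in the abstract are ensembles of randomized decision trees over n-gram histories whose member probabilities are averaged. As a rough illustration of that ensemble-averaging idea only, here is a minimal Python sketch in which each "tree" is replaced by a smoothed bigram model trained on a bootstrap resample of the corpus; all function names, the add-alpha smoothing, and the bootstrap scheme are illustrative assumptions, not details from the paper.

```python
import math
import random
from collections import defaultdict

def train_bigram_lm(sentences, vocab, alpha=0.1):
    """Add-alpha smoothed bigram model; returns a function P(w | prev)."""
    unigram = defaultdict(int)   # counts of prev-word positions
    bigram = defaultdict(int)    # counts of (prev, w) pairs
    for sent in sentences:
        tokens = ["<s>"] + sent
        for prev, w in zip(tokens, tokens[1:]):
            unigram[prev] += 1
            bigram[(prev, w)] += 1
    V = len(vocab)
    def prob(prev, w):
        return (bigram[(prev, w)] + alpha) / (unigram[prev] + alpha * V)
    return prob

def train_forest_lm(sentences, vocab, n_members=10, seed=0):
    """Ensemble LM: each member is trained on a bootstrap resample of the
    corpus (a stand-in for the randomized trees of a real forest LM), and
    their probabilities are averaged."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        sample = [rng.choice(sentences) for _ in sentences]
        members.append(train_bigram_lm(sample, vocab))
    def prob(prev, w):
        return sum(m(prev, w) for m in members) / len(members)
    return prob

def perplexity(prob, sentences):
    """Per-token perplexity of a corpus under the given model."""
    logp, n = 0.0, 0
    for sent in sentences:
        tokens = ["<s>"] + sent
        for prev, w in zip(tokens, tokens[1:]):
            logp += math.log(prob(prev, w))
            n += 1
    return math.exp(-logp / n)
```

Because every member distribution normalizes over the vocabulary, the averaged ensemble does too, so it can be evaluated with standard perplexity exactly as the abstract's comparison against the 4-gram baseline suggests.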


