Text conditioning and statistical language modeling for Romanian language

Abstract

In this paper we present a synthesis of the theoretical fundamentals and some practical aspects of statistical (n-gram) language modeling, which is a main part of a large-vocabulary statistical speech recognition system. We present unigram, bigram and trigram language models, as well as the Katz back-off smoothing algorithm based on the Good-Turing estimator. We also describe the perplexity measure used to evaluate a language model. The practical experiments were carried out on a Romanian constitution corpus. We also present the text normalization steps applied before language model generation. The results are ARPA-MIT format language models for the Romanian language. The models were tested and compared using the perplexity measure. Finally, Romanian and English language modeling are compared and conclusions are drawn.
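
The pipeline outlined in the abstract (normalize the text, count n-grams, smooth the estimates, evaluate by perplexity) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it stops at bigrams and swaps in add-one (Laplace) smoothing for the Katz back-off with Good-Turing estimation named in the abstract. All function names and the toy sentences are hypothetical.

```python
import math
import re
from collections import Counter


def normalize(text):
    """Lowercase the text and keep only runs of letters as tokens."""
    # [^\W\d_] matches word characters that are not digits or underscores,
    # so letters with Romanian diacritics are kept.
    return re.findall(r"[^\W\d_]+", text.lower())


def train_bigram(sentences):
    """Count unigrams and bigrams over sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + normalize(sentence) + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """Add-one smoothed P(word | prev); a stand-in for Katz back-off."""
    # <s> is never predicted, so it is excluded from the vocabulary size.
    vocab_size = len(unigrams) - 1
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def perplexity(sentences, unigrams, bigrams):
    """2 ** (negative average log2 probability per predicted token).

    Assumes every test word occurs in the training vocabulary; this sketch
    has no out-of-vocabulary handling.
    """
    log_sum, count = 0.0, 0
    for sentence in sentences:
        tokens = ["<s>"] + normalize(sentence) + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            log_sum += math.log2(bigram_prob(prev, word, unigrams, bigrams))
            count += 1
    return 2 ** (-log_sum / count)


if __name__ == "__main__":
    # Toy training sentences, chosen only for illustration.
    train = ["România este stat suveran.", "România este republică."]
    unigrams, bigrams = train_bigram(train)
    print(perplexity(["România este republică."], unigrams, bigrams))
```

Lower perplexity means the model assigns higher probability to the test text, which is why the measure suits comparing models trained on the same corpus.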

Bibliographic record

Source
Speech Technology and Human-Computer Dialogue, 2009. SpeD '09 | 2009 | pp. 1-5 | 5 pages
Conference location: Constanta (RO)
Authors
Domokos J.; Toderean G.; Buza O.;
Author affiliation

Commun. Dept., Tech. Univ. of Cluj-Napoca, Cluj-Napoca, Romania;

Original format: PDF
Keywords
hidden Markov models; natural language processing; smoothing methods; speech recognition; speech synthesis; statistical analysis; text analysis; vocabulary; ARPA-MIT format language model; Good-Turing estimator; Katz back-off smoothing algorithm; Romanian constitution corpus; Romanian language; hidden Markov model; perplexity measure; statistical language modeling; text conditioning tool; vocabulary statistical speech recognition system; ARPA-MIT language model format; Romanian statistical language modeling; n-gram

Similar literature

1. Lexicalized and Statistical Parsing of Natural Language Text in Tamil using Hybrid Language Models [J]. M. Selvam, A. M. Natarajan, R. Thangarajan. WSEAS Transactions on Computers, 2008, no. 8.
2. An empirical study of statistical language models: n-gram language models vs. neural network language models [J]. Freha Mezzoudj, Abdelkader Benyettou. International Journal of Innovative Computing and Applications, 2018, no. 4.
3. Language Model Adaptation Using Machine-Translated Text for Resource-Deficient Languages [J]. Arnar Thor Jensson, Koji Iwano, Sadaoki Furui. EURASIP Journal on Audio, Speech, and Music Processing, 2009, no. 1.
4. Text Conditioning and Statistical Language Modeling for Romanian Language [C]. Jozsef Domokos, Gavril Toderean, Ovidiu Buza. Conference on Speech Technology and Human-Computer Dialogue, 2009.
5. Language-independent text learning with statistical n-gram language models [D]. Fuchun Peng. 2003.
6. Simulating Language-specific and Language-general Effects in a Statistical Learning Model of Chinese Reading [O]. Jianfeng Yang, Bruce D. McCandliss, Hua Shu.
7. Arabic text recognition of printed manuscripts. Efficient recognition of off-line printed Arabic text using Hidden Markov Models, Bigram Statistical Language Model, and post-processing [O]. Husni Abdulghani Al-Muhtaseb. 2010.
