Evaluation of the Stochastic Morphosyntactic Language Model on a One Million Word Hungarian Dictation Task

Abstract

In this article we evaluate our stochastic morphosyntactic language model (SMLM) on a Hungarian newspaper dictation task that requires modeling over 1 million different word forms. The proposed method uses morphemes as the basic recognition units and combines a morpheme N-gram model with a morphosyntactic language model. The architecture of the recognition system is based on the weighted finite-state transducer (WFST) paradigm. Thanks to the flexible transducer-based architecture, the morphosyntactic component is integrated seamlessly with the basic modules, with no need to modify the decoder itself. We compare the phoneme, morpheme, and word error rates as well as the sizes of the recognition networks in two configurations: one uses only the N-gram model, while the other uses the combined model. The proposed stochastic morphosyntactic language model reduces the morpheme error rate by 1.7% to 7.2% relative to the baseline trigram system. The morpheme error rate of the best configuration is 18% and the best word error rate is 22.3%.
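The abstract reports error rates at several granularities and a relative reduction against a baseline. As a minimal sketch of how such figures are commonly computed (the standard edit-distance definition is assumed here; the paper's own scoring tool is not described on this page), the following Python code computes a token error rate and a relative reduction. The function names, the Hungarian example tokens, and the numeric values in the usage section are illustrative assumptions, not data from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance divided by the number of reference tokens."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error reduction of `improved` with respect to `baseline`."""
    return (baseline - improved) / baseline


# Illustrative tokens only: "házakban" split into hypothetical morphemes.
ref_morphemes = ["ház", "ak", "ban"]
hyp_morphemes = ["ház", "ak", "ba"]
morpheme_er = error_rate(ref_morphemes, hyp_morphemes)

# The same single mistake counted at the word level.
word_er = error_rate(["házakban"], ["házakba"])

# Hypothetical error rates, not taken from the paper.
reduction = relative_reduction(0.20, 0.19)
```

In this toy case one wrong morpheme out of three makes the whole word wrong, which is one reason the word error rate of a morpheme-based recognizer tends to be higher than its morpheme error rate.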

Bibliographic details

Source
European Conference on Speech Communication and Technology - EUROSPEECH 2003 (INTERSPEECH 2003), vol. 3; September 1-4, 2003; Geneva (CH) | 2003 | pp. 2297-2300 | 4 pages
Conference location: Geneva (CH)
Authors
Mate Szarvas; Sadaoki Furui;
Author affiliation

Department of Computer Science Tokyo Institute of Technology 2-12-1, Ookayama, Meguro-ku, Tokyo, 152-8552 Japan;

Conference organizer
Original format: PDF
Text language: eng
Chinese Library Classification: automatic information theory;
Keywords

Similar literature

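1. The effect of language proficiency on focus on form in a grammar dictation task [J] . Bao Tianmei, Zhou Binglan, Gu Weijun. Chinese Journal of Applied Linguistics (English edition) . 2013, No. 4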
2. Robust Language Modeling for a Small Corpus of Target Tasks Using Class-Combined Word Statistics and Selective Use of a General Corpus [J] . Yosuke Wada, Norihiko Kobayashi, Tetsunori Kobayashi. Systems and Computers in Japan . 2003, No. 12
3. Evaluation of language and communication skills in adult key word signing users with intellectual disability: Advantages of a narrative task [J] . Meuris K., Maes B., Zink I. Research in Developmental Disabilities . 2014, No. 10
4. Effective Word Prediction in Urdu Language Using Stochastic Model [J] . M. Farhan Siddiqui, M. Hassan. Sukkur IBA Journal of Computing and Mathematical Sciences . 2018, No. 2
5. Evaluation of the Stochastic Morphosyntactic Language Model on a One Million Word Hungarian Dictation Task [C] . Mate Szarvas, Sadaoki Furui. International Speech Communication Association (ISCA), European Conference on Speech Communication and Technology - EUROSPEECH . 2003
6. The strategies that English as a Second Language students use when spelling words in dictation format and when composing stories. [D] . Jones, Anne Marie. 1991
7. Neural Correlates of Task-Irrelevant First and Second Language Emotion Words – Evidence from the Emotional Face–Word Stroop Task [O] . Lin Fan, Qiang Xu, Xiaoxi Wang
8. Hierarchical Latent Words Language Models for Robust Modeling to Out-Of Domain Tasks [O] . Ryo Masumura, Taichi Asami, Takanobu Oba, 2015
