Language Model Based on Word Clustering


Abstract

Category-based statistical language models are an important approach to the sparse-data problem. However, such models face two bottlenecks: (1) word clustering, since it is hard to find a clustering method that performs well without a large amount of computation; and (2) class-based methods always lose some predictive ability in adapting to text from different domains. This paper attempts to address both problems. It defines word similarity using mutual information and, building on that definition, defines the similarity of word sets. Experiments show that the similarity-based word clustering algorithm outperforms the conventional greedy clustering method in both speed and performance. The paper also presents a new method for constructing the vari-gram model.
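To make the sparse-data motivation concrete, here is a minimal sketch of a generic class-based bigram estimate, P(w2 | w1) ≈ P(c(w2) | c(w1)) · P(w2 | c(w2)). It is not the paper's algorithm: the toy corpus, the hand-assigned classes in `word_class`, and the function names are all assumptions made for illustration, and the abstract does not give the similarity definition or the clustering procedure itself.

```python
from collections import Counter

# Toy corpus and hand-assigned word classes (both hypothetical, for illustration).
corpus = "the cat sleeps the dog sleeps the cat eats".split()
word_class = {"the": "DET", "cat": "NOUN", "dog": "NOUN",
              "sleeps": "VERB", "eats": "VERB"}

# Counts of words, classes, and adjacent pairs of each.
classes = [word_class[w] for w in corpus]
word_count = Counter(corpus)
class_count = Counter(classes)
word_bigram = Counter(zip(corpus, corpus[1:]))
class_bigram = Counter(zip(classes, classes[1:]))


def word_bigram_prob(prev, word):
    """Maximum-likelihood word bigram estimate of P(word | prev)."""
    return word_bigram[(prev, word)] / word_count[prev]


def class_bigram_prob(prev, word):
    """Class-based estimate: P(c(word) | c(prev)) * P(word | c(word))."""
    c_prev, c_word = word_class[prev], word_class[word]
    p_class = class_bigram[(c_prev, c_word)] / class_count[c_prev]
    p_word = word_count[word] / class_count[c_word]
    return p_class * p_word


# "dog eats" never occurs in the corpus, so the word bigram estimate is zero,
# while the class-based estimate borrows evidence from other NOUN-VERB pairs.
print(word_bigram_prob("dog", "eats"))
print(class_bigram_prob("dog", "eats"))
```

In this toy setting the unseen pair "dog eats" receives a nonzero probability under the class-based estimate, which is the benefit the abstract attributes to category-based models. The cost is the one the abstract names: every word in a class shares the same class-level context statistics, so some word-specific predictive power is lost.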


Bibliographic details

Source: Pacific Asia Conference on Language, Information and Computation, November 1-3, 2006, Wuhan (CN) | 2006 | pp. 394-397 | 4 pages
Conference location: Wuhan (CN)
Author: Lichi Yuan
Author affiliation: School of Information Technology, Jiangxi University of Finance and Economics, Nanchang 330013, China
Original format: PDF
Language: English
Chinese Library Classification: Computer networks
Keywords: word clustering; statistical language model; vari-gram language model


Similar literature

1. A Language Model Based on Semantically Clustered Words in a Chinese Character Recognition System [J]. Lee HJ., Tung CH. Pattern Recognition: The Journal of the Pattern Recognition Society. 1997, No. 8
2. Comparison of Performance of Enhanced Morpheme-based Language Model with Different Word-based Language Models for Improving the Performance of Tamil Speech Recognition System [J]. S. Saraswathi, T.V. Geetha. ACM Transactions on Asian Language Information Processing. 2007, No. 3
3. RNN language model with word clustering and class-based output layer [J]. Yongzhe Shi, Wei-Qiang Zhang, Jia Liu. EURASIP Journal on Audio, Speech, and Music Processing. 2013, No. 1
4. Combining word- and class-based language models: A comparative study in several languages using automatic and manual word-clustering techniques [C]. G. Maltese, P. Bravetti, H. Crepy. European Conference on Speech Communication and Technology. 2001
5. Connecting Documents, Words, and Languages Using Topic Models [D]. Yang, Weiwei. 2019
6. Word-level language modeling for P300 spellers based on discriminative graphical models [O]. Jaime F Delgado Saa, Adriana de Pesters, Dennis McFarland
7. Morph-Based Speech Recognition and Modeling of Out-of-Vocabulary Words Across Languages [O]. Mathias Creutz, Teemu Hirsimäki, Mikko Kurimo. 2010
