
Language Model Based on Word Clustering



Abstract

Category-based statistical language models are an important way to address the data sparsity problem. These models face two bottlenecks: (1) word clustering, where it is hard to find a clustering method that performs well without a large amount of computation; and (2) prediction, where a class-based method always loses some predictive ability when adapting to text from different domains. This paper attempts to address both problems. It defines word similarity using mutual information and, building on that, defines the similarity between word sets. Experiments show that the similarity-based word clustering algorithm outperforms the conventional greedy clustering method in both speed and performance. The paper also presents a new method for building the vari-gram model.
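The abstract does not spell out the model itself, so the following is only a minimal sketch of why word classes help with sparse data. It assumes the standard class-based bigram factorization P(w_i | w_{i-1}) = P(c_i | c_{i-1}) · P(w_i | c_i) with maximum-likelihood estimates. The function name `train_class_bigram`, the toy corpus, and the hand-written `word2class` mapping are illustrative assumptions; they stand in for the output of a clustering step and are not the paper's similarity-based algorithm.

```python
from collections import Counter

def train_class_bigram(tokens, word2class):
    """Return a function estimating P(word | prev_word) through word classes.

    Assumed factorization: P(w | w_prev) = P(c | c_prev) * P(w | c),
    with plain maximum-likelihood counts and no smoothing.
    """
    classes = [word2class[w] for w in tokens]
    class_unigram = Counter(classes)                    # count(c)
    class_context = Counter(classes[:-1])               # count(c) as a left context
    class_bigram = Counter(zip(classes, classes[1:]))   # count(c_prev, c)
    word_count = Counter(tokens)                        # count(w)

    def prob(prev_word, word):
        c_prev, c = word2class[prev_word], word2class[word]
        if class_context[c_prev] == 0 or class_unigram[c] == 0:
            return 0.0
        p_class = class_bigram[(c_prev, c)] / class_context[c_prev]
        p_word = word_count[word] / class_unigram[c]
        return p_class * p_word

    return prob

# Toy data: the class mapping is hand-written here, standing in for a clustering result.
tokens = "the cat sat the dog ran".split()
word2class = {"the": "DET", "cat": "NOUN", "dog": "NOUN",
              "sat": "VERB", "ran": "VERB"}

prob = train_class_bigram(tokens, word2class)
p_seen = prob("cat", "sat")    # word bigram that occurs in the corpus
p_unseen = prob("cat", "ran")  # word bigram that never occurs in the corpus
```

In this toy corpus the pair "cat ran" never appears, yet it receives the same nonzero estimate as "cat sat" because both share the NOUN to VERB class transition. That sharing is what eases sparsity, and it is also why the choice of clustering method matters so much for model quality.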
