Conference proceedings: Information Retrieval Technology; Lecture Notes in Computer Science; 4182
Statistical Behavior Analysis of Smoothing Methods for Language Models of Mandarin Data Sets


Abstract

In this paper, we discuss the statistical behavior and entropies of three smoothing methods: two well-known methods and one proposed method, applied to three language models on Mandarin data sets. Because of data sparseness, smoothing methods are employed to estimate the probability of each event (both seen and unseen) in a language model. We propose a set of properties for analyzing the statistical behavior of the three smoothing methods. Our proposed smoothing method complies with all of the properties. We implement three language models on Mandarin data sets and then discuss their entropies. In general, the entropies of the proposed smoothing method for the three models are lower than those of the other two methods.
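
The abstract does not name the two well-known smoothing methods or define the proposed one, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes additive (add-delta) smoothing on a character-level unigram model, and the toy training and test strings, the function names, and the `delta` parameter are all hypothetical. It shows two points the abstract relies on: smoothing gives every event, seen or unseen, a nonzero probability while the distribution still sums to one, and the resulting model can be scored by its entropy on held-out text.

```python
import math
from collections import Counter


def additive_unigram_model(tokens, vocab, delta=1.0):
    """Estimate P(w) for every w in vocab with additive (add-delta) smoothing.

    Assumes every training token is a member of vocab, so the
    probabilities sum to one.
    """
    counts = Counter(tokens)
    total = len(tokens) + delta * len(vocab)
    return {w: (counts[w] + delta) / total for w in vocab}


def cross_entropy(model, tokens):
    """Average negative log2 probability per token, in bits."""
    return -sum(math.log2(model[w]) for w in tokens) / len(tokens)


# Hypothetical toy data: Mandarin text split into single characters.
train = list("我们讨论平滑方法我们讨论熵")
test = list("我们讨论模型")  # "模" and "型" never occur in train

vocab = set(train) | set(test)
model = additive_unigram_model(train, vocab, delta=1.0)

total_prob = sum(model.values())          # should be 1 up to rounding
entropy_bits = cross_entropy(model, test)  # finite because no P(w) is zero

print(f"sum of probabilities: {total_prob:.6f}")
print(f"P(seen '我') = {model['我']:.4f}, P(unseen '模') = {model['模']:.4f}")
print(f"cross-entropy on test text: {entropy_bits:.4f} bits per character")
```

Without smoothing, the unseen characters would receive probability zero and the entropy on the test text would be undefined. A lower entropy on held-out data indicates a better fit, which is the sense in which the abstract compares the three methods.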
