Home > Foreign Conference Proceedings > 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING-ACL 2006), vol. 2

Chatting is a popular communication medium on the Internet, via ICQ, chat rooms, etc. Chat language differs from natural language due to its anomalous and dynamic nature, which renders conventional NLP tools inapplicable. The dynamic problem is enormo


Abstract

This paper presents a discriminative pruning method for the n-gram language model used in Chinese word segmentation. To reduce the size of the language model in a Chinese word segmentation system, the importance of each bigram is computed using a discriminative pruning criterion related to the performance loss caused by pruning that bigram. We then propose a step-by-step growing algorithm to build a language model of the desired size. Experimental results show that the discriminative pruning method yields a much smaller model than one pruned with the state-of-the-art method: at the same Chinese word segmentation F-measure, the number of bigrams in the model can be reduced by up to 90%. The correlation between language model perplexity and word segmentation performance is also discussed.
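The step-by-step growing idea in the abstract can be sketched as follows: rank all candidate bigrams by an importance score, then grow the model from the unigram baseline by admitting bigrams in decreasing order of importance until the desired size is reached. The abstract does not give the paper's discriminative criterion (which is tied to segmentation performance loss), so this sketch substitutes a simple perplexity-style proxy: the weighted log-likelihood gain of a bigram over its unigram backoff. All names (`grow_model`, `build_counts`) and the toy corpus are hypothetical illustrations, not the authors' implementation.

```python
from collections import Counter
import math

def build_counts(corpus):
    """Collect unigram and bigram counts from whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    return unigrams, bigrams

def grow_model(corpus, budget):
    """Step-by-step growing: keep only the `budget` most important bigrams.

    Importance here is a stand-in proxy (count-weighted divergence between the
    bigram MLE and the unigram backoff), NOT the paper's discriminative
    criterion, which measures segmentation performance loss instead.
    """
    uni, bi = build_counts(corpus)
    total = sum(uni.values())

    def importance(item):
        (w1, w2), c = item
        p_bi = c / uni[w1]        # bigram MLE  p(w2 | w1)
        p_uni = uni[w2] / total   # unigram backoff  p(w2)
        return c * abs(math.log(p_bi) - math.log(p_uni))

    kept = sorted(bi.items(), key=importance, reverse=True)[:budget]
    return dict(kept)

corpus = ["the cat sat", "the cat ran", "a dog sat"]
model = grow_model(corpus, budget=4)
print(len(model))  # 4: only the four highest-scoring bigrams survive
```

Growing a model up to a budget, rather than pruning one down, makes the final size an exact input parameter, which matches the abstract's goal of building "the language model of desired size".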