International Conference on Audio, Language and Image Processing
Adaptive compression-based models of Chinese text

Abstract

Large-alphabet languages such as Chinese pose different problems for language modelling than small-alphabet languages such as English. In this paper, we describe adaptive models of Chinese text based on the Prediction by Partial Matching (PPM) text compression scheme, which learns the language as the text is processed sequentially. We describe several character-based, word-based and part-of-speech (POS) based variants of PPM that achieve significant improvements in compression rate over existing models. Interestingly, the results for Chinese text contrast with those achieved for English text: character-based models outperform the word-based and POS-based models rather than the other way round. We then explore how well these models perform at the task of Chinese word segmentation.
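
To make the idea of an adaptive, character-based compression model concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a simplified PPM-style scheme: contexts of up to `max_order` preceding characters, a method-C-style escape estimate, no symbol exclusions, and no arithmetic coder (only the ideal code length in bits is computed). The function names `ppm_bits` and `bits_per_char`, the default `max_order=2`, and the fallback `alphabet_size=65536` are illustrative assumptions and do not come from the abstract.

```python
import math
from collections import defaultdict


def ppm_bits(text, max_order=2, alphabet_size=65536):
    """Ideal code length in bits of `text` under a simplified adaptive PPM-style model.

    Assumptions (not taken from the paper): method-C-style escape estimate,
    no exclusions, and a uniform fallback over `alphabet_size` symbols.
    """
    # counts[context][symbol] = how often `symbol` has followed `context` so far
    counts = defaultdict(lambda: defaultdict(int))
    total_bits = 0.0

    for i, sym in enumerate(text):
        longest = min(max_order, i)
        coded = False
        # Try the longest available context first, escaping to shorter ones.
        for order in range(longest, -1, -1):
            seen = counts.get(text[i - order:i])
            if not seen:
                continue  # context never seen: the escape is certain and costs 0 bits
            n = sum(seen.values())  # total symbol occurrences in this context
            d = len(seen)           # distinct symbols seen in this context
            if sym in seen:
                total_bits -= math.log2(seen[sym] / (n + d))
                coded = True
                break
            total_bits -= math.log2(d / (n + d))  # cost of the escape symbol
        if not coded:
            # Novel character: fall back to a uniform code over the assumed alphabet.
            total_bits += math.log2(alphabet_size)

        # Adapt: update the counts for every context order after coding the symbol.
        for order in range(longest + 1):
            counts[text[i - order:i]][sym] += 1

    return total_bits


def bits_per_char(text, **kwargs):
    """Compression rate in bits per character (lower is better)."""
    return ppm_bits(text, **kwargs) / len(text) if text else 0.0
```

Because the counts are updated only after each character is coded, a decoder could rebuild the same model from the text it has already decoded, which is what makes the model adaptive. Word-based or POS-based variants would, under the same assumptions, replace the character sequence with a sequence of word or tag symbols.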
