Conference: International Joint Conference on Natural Language Processing

Comparing Entropies within the Chinese Language


Abstract

Using a large synchronous Chinese corpus, we show how word and character entropies vary across time and space for different Chinese speech communities. We find that word entropy values are affected by the quality of the segmentation process. We also note that word entropies can be affected by proper nouns, which are the most volatile segment of the stable lexicon of the language. Our word and character entropy results provide an interesting comparison with earlier results. We also provide the first-ever average joint character entropies (a.k.a. entropy rates) of Chinese up to order 20, which indicate that the limits of the conditional character entropies of Chinese for the different speech communities should be about 1 (or less). This raises the question of whether early convergence of character entropies would also entail word entropy convergence.
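The quantities named in the abstract can be illustrated with a minimal sketch. It assumes the standard textbook definitions: the order-n block entropy H_n is the Shannon entropy of the empirical character n-gram distribution, the average joint entropy is H_n / n, and the conditional entropy is H_n - H_(n-1). The function names are hypothetical, and this is not the authors' estimation procedure, which the abstract does not describe.

```python
import math
from collections import Counter


def block_entropy(text: str, n: int) -> float:
    """Shannon entropy in bits of the empirical character n-gram distribution."""
    if n <= 0 or len(text) < n:
        return 0.0
    # Slide a window of width n over the text to collect overlapping n-grams.
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    counts = Counter(grams)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def average_joint_entropy(text: str, n: int) -> float:
    """Per-character entropy H_n / n (assumed reading of 'average joint entropy')."""
    return block_entropy(text, n) / n


def conditional_entropy(text: str, n: int) -> float:
    """H_n - H_(n-1): entropy of a character given the previous n - 1 characters."""
    return block_entropy(text, n) - block_entropy(text, n - 1)
```

Calling `average_joint_entropy(corpus_text, n)` for n from 1 to 20 on a corpus string would trace the kind of curve the abstract refers to. At high orders, such plug-in estimates are strongly biased unless the corpus is very large, so results on small samples should not be read as entropy limits.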
