Automatic Documentation and Mathematical Linguistics

Automatic Text Classification Method Based on Zipf's Law


Abstract

This paper describes a method for automatic text classification that analyses how the word distribution of a text deviates from Zipf's law, combined with a zonal approach to data processing. The deviation of a word is defined as the difference between its actual numerical score and the score predicted for it by Zipf's law. The method divides the input and reference texts into zones J_0, J_1 and J_2 and builds a numerical series from the words contained in the J_0 zone; each term of this series is the difference between a word's real score and its score according to Zipf's law. The proposed method can significantly reduce text dimensionality and thus increase the speed of automatic text classification.
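The abstract does not give the paper's exact definitions of the score, the zone boundaries, or the comparison rule, so the Python sketch below only illustrates the general idea under loudly stated assumptions. It assumes that a word's score is its relative frequency, that Zipf's law takes its classic form with exponent 1 (expected score at rank r is p1 / r, where p1 is the score of the top-ranked word), that J_0 is simply the `j0_size` most frequent words, and that texts are compared by Euclidean distance between their J_0 deviation series. The names `zipf_deviation_series`, `series_distance`, `classify` and `j0_size` are hypothetical, not taken from the paper.

```python
from collections import Counter
import math
import re


def zipf_deviation_series(text, j0_size=10):
    """Map each word in the (assumed) J_0 zone to its deviation from Zipf's law.

    Assumptions, not from the paper:
      * a word's score is its relative frequency (count / total words);
      * classic Zipf's law, exponent 1: expected score at rank r is p1 / r;
      * J_0 is just the j0_size most frequent words (hypothetical parameter).
    """
    words = re.findall(r"\w+", text.lower())
    if not words:
        return {}
    total = len(words)
    ranked = Counter(words).most_common()  # equal counts keep first-seen order
    p1 = ranked[0][1] / total  # score of the rank-1 word
    series = {}
    for rank, (word, count) in enumerate(ranked[:j0_size], start=1):
        actual = count / total
        expected = p1 / rank
        series[word] = actual - expected  # deviation = actual score - Zipf score
    return series


def series_distance(a, b):
    """Euclidean distance over the union of words; a missing word counts as 0.

    This comparison rule is an assumption made for illustration only.
    """
    keys = a.keys() | b.keys()
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def classify(text, references, j0_size=10):
    """Return the label of the reference text with the closest deviation series."""
    target = zipf_deviation_series(text, j0_size)
    return min(
        references,
        key=lambda label: series_distance(
            target, zipf_deviation_series(references[label], j0_size)
        ),
    )
```

For example, with `refs = {"A": "a a a a b b c d", "B": "p p q q r"}`, calling `classify("p q p q r", refs)` returns `"B"`, because the input has the same word counts as reference B and therefore the same deviation series. Only the J_0 words enter the comparison, which mirrors the abstract's point that restricting attention to one zone keeps the representation small.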
