Conference: IEEE International Conference on Systems, Man and Cybernetics

Chinese Web page classification based on statistics word segmentation



Abstract

Word segmentation is an important step in Chinese natural language processing. This paper explores the classification of Chinese web pages using statistical word segmentation. We first automatically construct a list of binary Chinese words from the Chinese web pages in the training set. The text of the test web pages is then segmented with this word list, and the pages are classified from the segmentation results. Experiments show that statistical word segmentation can effectively improve classification precision. Based on the experimental results, we analyze the influence of statistical word segmentation on Chinese web page classification. Single Chinese characters and words play different roles in web page classification, and we also analyze the reason for this difference.
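
The word-list and segmentation steps described above can be illustrated with a minimal sketch. The abstract does not state which statistic is used to select words or how matching is performed, so everything below is an assumption made for illustration: "binary words" are read as two-character words, a pair is kept when it occurs at least `min_count` times in the training text, and segmentation is greedy left-to-right matching that falls back to single characters. The names `build_word_list`, `segment`, and `min_count` are hypothetical and do not come from the paper.

```python
from collections import Counter


def is_cjk(ch):
    """True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def build_word_list(training_texts, min_count=2):
    """Collect adjacent CJK character pairs seen at least min_count times.

    Assumption: a raw frequency threshold stands in for the unspecified
    statistical criterion mentioned in the abstract.
    """
    counts = Counter()
    for text in training_texts:
        for a, b in zip(text, text[1:]):
            if is_cjk(a) and is_cjk(b):
                counts[a + b] += 1
    return {pair for pair, n in counts.items() if n >= min_count}


def segment(text, word_list):
    """Greedy left-to-right matching against the word list.

    Emits a two-character word when the pair is listed; otherwise emits
    the single CJK character. Non-CJK characters are skipped.
    """
    tokens = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in word_list:
            tokens.append(pair)
            i += 2
        else:
            if is_cjk(text[i]):
                tokens.append(text[i])
            i += 1
    return tokens


# Toy stand-ins for text extracted from training web pages (made-up data).
training_texts = ["体育新闻报道", "体育比赛新闻", "财经新闻"]
word_list = build_word_list(training_texts, min_count=2)

print(segment("体育新闻", word_list))
print(segment("财经新闻", word_list))
```

In this sketch the resulting tokens mix two-character words and single characters, which is the kind of input on which the abstract's comparison of the two feature types could be run. Any standard text classifier could consume the tokens as features.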
