Journal: 《计算机工程与应用》 (Computer Engineering and Applications)

Blog Post Classification Fusing Structural Features and Content Analysis

     

Abstract

Blog posts are hard to classify with general-purpose text classification methods. In content, a post often covers several topics, has no clear category membership, and largely expresses the author's subjective opinions; in structure, it carries tags that differ from ordinary body text. To address this, a blog post classification method that fuses structural features and content analysis is proposed. On the content side, two different feature selection methods are iterated to make the feature set more representative, and posts are then classified separately by body text and by title. On the structural side, posts are classified by their tags, a feature specific to blog posts. The results from the three aspects (body, title, and tags) are then fused. Experimental results show that the improved method performs markedly better on blog post classification than common text classification methods.
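The abstract does not say how the three aspects are fused, so the following is only a minimal sketch of one common choice: late fusion by a weighted sum of per-class scores. The function names, the example scores, and the weights are all hypothetical, and the three classifiers themselves are assumed to exist already.

```python
from typing import Dict

Scores = Dict[str, float]  # class label -> non-negative score


def normalize(scores: Scores) -> Scores:
    """Scale scores so they sum to 1 (uniform if every score is zero)."""
    total = sum(scores.values())
    if total == 0:
        return {label: 1.0 / len(scores) for label in scores}
    return {label: value / total for label, value in scores.items()}


def fuse(views: Dict[str, Scores], weights: Dict[str, float]) -> Scores:
    """Weighted sum of the normalized scores from each view.

    Assumption: a weighted sum is used here purely for illustration;
    the paper's actual fusion rule is not given in the abstract.
    """
    labels = set()
    for scores in views.values():
        labels.update(scores)
    fused = {label: 0.0 for label in labels}
    for name, scores in views.items():
        normalized = normalize(scores)
        for label in labels:
            fused[label] += weights[name] * normalized.get(label, 0.0)
    return fused


def predict(views: Dict[str, Scores], weights: Dict[str, float]) -> str:
    """Return the label with the highest fused score (ties: alphabetical)."""
    fused = fuse(views, weights)
    return max(sorted(fused), key=fused.get)


# Hypothetical per-view scores for one post; not data from the paper.
EXAMPLE_VIEWS = {
    "body": {"tech": 0.4, "travel": 0.6},
    "title": {"tech": 0.7, "travel": 0.3},
    "tags": {"tech": 0.9, "travel": 0.1},
}
# Hypothetical weights; in practice they would be tuned on held-out data.
EXAMPLE_WEIGHTS = {"body": 0.5, "title": 0.2, "tags": 0.3}

if __name__ == "__main__":
    print(predict(EXAMPLE_VIEWS, EXAMPLE_WEIGHTS))
```

In this made-up example the body classifier alone would favor "travel", while the title and tag scores shift the fused decision to "tech". That is the kind of correction a multi-aspect fusion is meant to provide when the body of a post mixes several topics.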
