Published in: Twenty-First International Workshop on Database and Expert Systems Applications
A Comparison of Stylometric and Lexical Features for Web Genre Classification and Emotion Classification in Blogs

Abstract

In the blogosphere, the amount of digital content is expanding, which poses new challenges for search engines. Because information needs are changing, automatic methods are needed to help blog search users filter information by different facets. In our work, we aim to support blog search with genre and facet information. Since we focus on the news genre, our approach is to classify blogs into news versus rest. We also assess the emotionality facet in news-related blogs to enable users to identify people's feelings towards specific events. Our approach is to evaluate the performance of text classifiers with lexical and stylometric features to determine the best-performing combination for our tasks. Our experiments on a subset of the TREC Blogs08 dataset reveal that classifiers trained on lexical features perform consistently better than classifiers trained on the best stylometric features.
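
To make the distinction between the two feature families concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not list the features actually used, so the function names, the four stylometric measures, and the sample post are all illustrative assumptions. Lexical features are taken here to be plain word counts, while stylometric features summarize how the text is written rather than which words it contains.

```python
import re
from collections import Counter


def lexical_features(text):
    """Bag-of-words counts over lowercased word tokens (assumed lexical features)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def stylometric_features(text):
    """A few simple style measures (hypothetical choices, not taken from the paper)."""
    words = re.findall(r"[A-Za-z']+", text)
    # Treat runs of ., ! or ? as sentence boundaries; drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)  # avoid division by zero on empty input
    return {
        "avg_word_len": sum(len(w) for w in words) / n_words,
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "type_token_ratio": len({w.lower() for w in words}) / n_words,
        "exclamations_per_word": text.count("!") / n_words,
    }


# Illustrative blog snippet (invented for this example).
post = "The council voted today. Residents are furious! Really furious!"
lex = lexical_features(post)
sty = stylometric_features(post)
```

Either feature dictionary could then be vectorized and passed to a text classifier for the news-versus-rest or emotionality task. Which classifier and which feature sets the authors combined is not stated in the abstract.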
