
Blog, Forum or Newspaper? Web Genre Detection Using SVMs


Abstract

In recent years, blogs have become a very popular way to publish information, express opinions and hold discussions. Researchers and industry are therefore interested in analyzing the blogosphere. Because blog usage is increasingly diverse, an initial categorization into web genres is a necessary first step before any analysis. In this research, we focus on the distinction between traditional blogs, news portals, forums and miscellaneous websites. In particular, the new distinction between news portals and blogs lets analyses adapt to the network-specific characteristics of traditional media, which involve high journalistic effort, and of more personal weblogs and their authors. We present a set of 80 features and experiment extensively with possible feature combinations and SVM parameters to identify the best configuration for categorization into the four web genres. Our experiments show a maximal overall accuracy of 83.5%. This accuracy was reached using a combination of trained n-grams, structural properties (e.g. Twitter links) and quantitative properties such as the text's length and the number of dates.
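To make the structural and quantitative feature types named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It computes three of the mentioned properties (text length, number of dates, number of Twitter links) from raw HTML using only the Python standard library. The function name `extract_features`, the feature keys, and the date pattern are all assumptions made for illustration; the abstract does not specify how any of the 80 features were computed, and the trained n-gram features and the SVM itself are left out.

```python
import re
from html.parser import HTMLParser

# Assumed date pattern (not from the paper): ISO dates such as 2012-03-14
# and numeric dates such as 14.03.2012 or 3/14/2012.
DATE_RE = re.compile(
    r"\b(?:\d{4}-\d{2}-\d{2}|\d{1,2}[./]\d{1,2}[./]\d{2,4})\b"
)


class _PageParser(HTMLParser):
    """Collects visible text and link targets from an HTML page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.hrefs = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_parts.append(data)


def extract_features(html):
    """Return a small dict of hypothetical genre features for one page."""
    parser = _PageParser()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so the length reflects visible text only.
    text = " ".join(" ".join(parser.text_parts).split())
    return {
        "text_length": len(text),
        "num_dates": len(DATE_RE.findall(text)),
        "num_twitter_links": sum(
            "twitter.com" in href.lower() for href in parser.hrefs
        ),
    }
```

A feature dictionary like this could be turned into a numeric vector and, together with n-gram features, passed to an SVM implementation of the reader's choice. Which features and SVM parameters performed best is reported in the paper itself, not here.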
