Sentiment crawling: Extremist content collection through a sentiment analysis guided web-crawler

Abstract

As the volume of data generated on the internet grows exponentially, developing guided data collection methods becomes increasingly essential to the research process. This paper proposes an approach to building a self-guiding web-crawler that collects data specifically from extremist websites. The crawler's guidance component uses sentiment-based classification rules, which allow the crawler to make decisions based on the content of each webpage it downloads. First, content from 2,500 webpages was collected for each of four sentiment-based classes: pro-extremist websites, anti-extremist websites, neutral news sites discussing extremism, and sites with no discussion of extremism. Part-of-speech tagging was then used to find the most frequent keywords in these pages. Sentiment software was used in conjunction with classification software to generate a decision tree that could effectively discern which class a particular page falls into. The resulting tree showed an 80% success rate at differentiating between the four classes and a 92% success rate at classifying specifically extremist pages. The decision tree was then applied to a randomly selected sample of pages for each class. This secondary test produced results similar to those of the primary test, which holds promise for future studies using this framework.
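The guidance loop described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the abstract does not give the sentiment software, the keyword list, the tree's thresholds, or the crawl strategy, so `LEXICON`, `KEYWORDS`, the cut-offs in `classify`, and the in-memory `WEB` are hypothetical stand-ins, and the hand-written rules only imitate the shape of a learned decision tree.

```python
from collections import deque

# Hypothetical toy lexicon and keywords; the paper's actual sentiment
# software, keyword list, and thresholds are not given in the abstract.
LEXICON = {"glorious": 1, "heroic": 1, "support": 1,
           "condemn": -1, "horrific": -1, "oppose": -1}
KEYWORDS = {"extremism", "militants", "attack"}

def features(text, window=3):
    """Return (keyword count, mean sentiment of words near keywords)."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    hits = [i for i, w in enumerate(words) if w in KEYWORDS]
    scores = [LEXICON.get(w, 0)
              for i in hits
              for w in words[max(0, i - window): i + window + 1]]
    mean = sum(scores) / len(scores) if scores else 0.0
    return len(hits), mean

def classify(text):
    """Hand-written stand-in for a learned decision tree (four classes)."""
    count, mean = features(text)
    if count == 0:
        return "no-discussion"
    if mean > 0.1:
        return "pro-extremist"
    if mean < -0.1:
        return "anti-extremist"
    return "neutral-news"

def crawl(web, seed, target="pro-extremist"):
    """Breadth-first crawl that only follows links from on-target pages."""
    seen, queue, collected = {seed}, deque([seed]), []
    while queue:
        url = queue.popleft()
        text, links = web[url]
        if classify(text) != target:
            continue  # off-target page: do not expand its links
        collected.append(url)
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return collected

# In-memory stand-in for the web: url -> (page text, outgoing links).
WEB = {
    "a": ("We support the heroic militants and their glorious attack.", ["b", "c"]),
    "b": ("Leaders condemn the horrific attack and oppose extremism.", ["d"]),
    "c": ("The glorious militants deserve support.", ["d"]),
    "d": ("A recipe for apple pie.", []),
}

print(crawl(WEB, "a"))
```

In this sketch the classifier acts as a gate on the frontier: a page's links are queued only when the page itself is classified as on-target, which is one plausible reading of "self-guiding". A real system would replace `classify` with a tree trained on labelled pages and `WEB` with an HTTP fetcher.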

Bibliographic details

  • Source
    《》 | 2015 | pp. 1024-1027 | 4 pages
  • Conference location: Paris (FR)
  • Authors

    Joseph Mei; Richard Frank;

  • Author affiliation

    International CyberCrime Research Center, Simon Fraser University, School of Criminology, Burnaby, Canada;

  • Conference organizer
  • Original format: PDF
  • Text language
  • Chinese Library Classification
  • Keywords

    Classification; Extremism; Sentiment Analysis; Web-crawling;

  • Date indexed: 2022-08-26 14:39:45
