Conference paper, 2016 IEEE International Conference on Cybercrime and Computer Forensic

Positing the problem: enhancing classification of extremist web content through textual analysis

Abstract

Webpages with terrorist and extremist content are key factors in the recruitment and radicalization of disaffected young adults, who may then engage in terrorist activities at home or fight alongside terrorist groups abroad. This paper reports on advances in techniques for classifying data collected by the Terrorism and Extremism Network Extractor (TENE) webcrawler, a custom-written program that browses the World Wide Web, collecting vast amounts of data, retrieving the pages it visits, analyzing them, and recursively following the links out of those pages. The textual content of these pages is analyzed with the Posit textual analysis toolset, which generates a detailed frequency analysis of the syntax, including multi-word units and their associated part-of-speech components, to support enhanced classification. These results are then fed into a knowledge extraction process that uses algorithms such as those in the WEKA system. Indications are that enriching the data through Posit analysis yields a closer match between automatic and manual classification than was previously attained. Furthermore, deploying these technologies promises to give public safety officials techniques that can help them detect terrorist webpages, gauge the intensity of their content, discriminate between webpages that do or do not require a concerted response, and take appropriate action where warranted.
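The frequency-analysis step feeding a WEKA classifier can be illustrated with a minimal Python sketch. It is not Posit and does not reproduce Posit's output: it assumes that contiguous word n-grams are an acceptable stand-in for multi-word units, omits part-of-speech tagging entirely, and writes the counts in WEKA's ARFF format so that a classifier could be trained on them. The function names, the relation name `tene_pages`, and the `extremist`/`benign` labels are hypothetical.

```python
import re
from collections import Counter

# Hypothetical tokenizer: lowercase alphabetic runs only.
# Posit's own tokenization rules are not reproduced here.
TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(text):
    """Split text into lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


def frequency_profile(text, max_n=2):
    """Count single words and contiguous word sequences of up to max_n words.

    Contiguous n-grams are an assumed stand-in for Posit's multi-word units;
    part-of-speech components are omitted from this sketch.
    """
    tokens = tokenize(text)
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def to_arff(relation, docs, labels, max_n=2):
    """Render documents as a WEKA ARFF string, one numeric attribute per feature."""
    profiles = [frequency_profile(d, max_n) for d in docs]
    # Shared vocabulary across all documents, sorted for a stable column order.
    vocabulary = sorted(set().union(*profiles))
    classes = sorted(set(labels))
    lines = [f"@RELATION {relation}", ""]
    for feature in vocabulary:
        # Quote names because multi-word features contain spaces.
        lines.append(f"@ATTRIBUTE '{feature}' NUMERIC")
    # Nominal class attribute listing every label seen.
    lines.append(f"@ATTRIBUTE class {{{','.join(classes)}}}")
    lines.append("")
    lines.append("@DATA")
    for profile, label in zip(profiles, labels):
        values = [str(profile[f]) for f in vocabulary]  # Counter returns 0 if absent
        lines.append(",".join(values + [label]))
    return "\n".join(lines)


if __name__ == "__main__":
    pages = ["join the fight join the cause", "weather report for the weekend"]
    print(to_arff("tene_pages", pages, ["extremist", "benign"]))
```

Raw counts are used here for simplicity; because crawled pages vary widely in length, normalizing each profile to relative frequencies before training would likely be a better choice.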
