Conference: IEEE International Conference on Big Data

The impact of preprocessing in natural language for open source intelligence and criminal investigation

Abstract

Underground forums serve as gathering places for like-minded cyber criminals and are a continued threat to law and order. Law enforcement agencies can use Open-Source Intelligence (OSINT) to gather valuable information and proactively counter existing and new threats, for example by shifting the focus of criminal investigations onto cyber criminals with a large impact in underground forums and related criminal business models. This paper presents our study of text preprocessing requirements and document construction for the topic model algorithm Latent Dirichlet Allocation (LDA). We identify a set of preprocessing requirements based on a literature review and demonstrate them on a real-world forum similar to those used by cyber criminals. Our results show that topic modelling needs to follow a very strict procedure to provide significant results that can be useful in OSINT. Additionally, tuning the hyper-parameters and the number of topics for LDA produces more reliable results. We demonstrate that iterative preprocessing continuously improves the model, which yields more coherent and focused topics.
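The abstract does not list the individual preprocessing requirements, so the following is only a minimal sketch of the kind of pipeline it refers to, under stated assumptions: posts are grouped into one document per thread, and the cleaning steps (lowercasing, URL removal, stop-word and short-token filtering, document-frequency pruning) are common choices rather than the authors' procedure. The function names, the stop-word list, the thresholds and the sample posts are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real study would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is",
              "for", "on", "it", "this", "that", "with"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(post, min_len=3):
    """Lowercase a post, drop URLs, keep alphabetic tokens that are long enough
    and not stop words."""
    text = URL_RE.sub(" ", post.lower())
    return [t for t in TOKEN_RE.findall(text)
            if len(t) >= min_len and t not in STOP_WORDS]


def build_documents(posts_by_thread):
    """Document construction (assumed): merge all posts of a thread into one token list."""
    return {thread: [tok for post in posts for tok in tokenize(post)]
            for thread, posts in posts_by_thread.items()}


def prune_vocabulary(docs, min_df=2, max_df_ratio=0.9):
    """Drop tokens that occur in too few or too many documents."""
    df = Counter(tok for tokens in docs.values() for tok in set(tokens))
    limit = max_df_ratio * len(docs)
    keep = {tok for tok, n in df.items() if min_df <= n <= limit}
    return {thread: [t for t in tokens if t in keep]
            for thread, tokens in docs.items()}


def to_bag_of_words(docs):
    """Turn token lists into term counts, the usual input form for LDA."""
    return {thread: Counter(tokens) for thread, tokens in docs.items()}


# Invented sample data, for illustration only.
threads = {
    "t1": ["Selling fresh accounts, see http://example.com/shop",
           "Are the accounts verified?"],
    "t2": ["Need a crypter for my loader",
           "This crypter works with the loader"],
    "t3": ["Verified accounts wanted",
           "Will pay for a working crypter"],
}

docs = prune_vocabulary(build_documents(threads), min_df=2, max_df_ratio=0.9)
bow = to_bag_of_words(docs)
for thread, counts in bow.items():
    print(thread, dict(counts))
```

The resulting term counts could then be passed to any LDA implementation. Iterating on the stop-word list and the frequency thresholds after inspecting the topics is one way to read the "iterative preprocessing" the abstract describes.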
