Conference of the European Chapter of the Association for Computational Linguistics

Classifying Illegal Activities on Tor Network Based on Web Textual Contents



Abstract

The freedom of the Deep Web offers a safe place where people can express themselves anonymously, but where they can also conduct illegal activities. In this paper, we present and make publicly available a new dataset of active Darknet domains, which we call "Darknet Usage Text Addresses" (DUTA). We built DUTA by sampling the Tor network over two months and manually labeling each address into 26 classes. Using DUTA, we compared two well-known text representation techniques crossed with three supervised classifiers to categorize Tor hidden services. We also fixed the pipeline elements and identified the aspects that have a critical influence on the classification results. We found that combining a TF-IDF word representation with a Logistic Regression classifier achieves 96.6% accuracy under 10-fold cross-validation and a macro F1 score of 93.7% when classifying a subset of illegal activities from DUTA. The good performance of the classifier might support potential tools to help the authorities detect these activities.
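As a rough illustration of the evaluation protocol described in the abstract (TF-IDF word features, a Logistic Regression classifier, k-fold cross-validation, accuracy and macro F1), the following pure-Python sketch may help. It is not the authors' implementation: the toy token lists, the three class names, the hyperparameters (`epochs`, `lr`), the fold assignment, and all function names are assumptions made for illustration, and it runs 3 folds on 9 invented documents rather than 10 folds on DUTA.

```python
import math
from collections import Counter, defaultdict

def fit_idf(docs):
    """Smoothed inverse document frequency learned from tokenized training docs."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}

def tfidf(doc, idf):
    """L2-normalized TF-IDF vector as a sparse dict; unseen tokens are dropped."""
    tf = Counter(tok for tok in doc if tok in idf)
    vec = {tok: cnt * idf[tok] for tok, cnt in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {tok: v / norm for tok, v in vec.items()}

def predict_proba(x, W, b):
    """Softmax over per-class linear scores."""
    scores = {c: b[c] + sum(W[c].get(t, 0.0) * v for t, v in x.items()) for c in W}
    m = max(scores.values())
    exps = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}

def train_softmax(X, y, classes, epochs=200, lr=0.5):
    """Multinomial logistic regression fitted by per-example gradient descent."""
    W = {c: defaultdict(float) for c in classes}
    b = {c: 0.0 for c in classes}
    for _ in range(epochs):
        for x, label in zip(X, y):
            probs = predict_proba(x, W, b)
            for c in classes:
                g = probs[c] - (1.0 if c == label else 0.0)
                b[c] -= lr * g
                for tok, v in x.items():
                    W[c][tok] -= lr * g * v
    return W, b

def predict(x, W, b):
    probs = predict_proba(x, W, b)
    return max(probs, key=probs.get)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def cross_validate(docs, labels, k):
    """k-fold CV; predictions are pooled over folds before scoring."""
    classes = sorted(set(labels))
    y_true, y_pred = [], []
    for fold in range(k):
        train = [i for i in range(len(docs)) if i % k != fold]
        test = [i for i in range(len(docs)) if i % k == fold]
        idf = fit_idf([docs[i] for i in train])  # fit on training folds only
        X = [tfidf(docs[i], idf) for i in train]
        W, b = train_softmax(X, [labels[i] for i in train], classes)
        for i in test:
            y_true.append(labels[i])
            y_pred.append(predict(tfidf(docs[i], idf), W, b))
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return acc, macro_f1(y_true, y_pred)

# Invented toy corpus (NOT taken from DUTA); class names are hypothetical.
docs = [
    "buy fake passport id card".split(), "fake id passport sale".split(),
    "passport card fake cheap".split(),
    "secure hosting server uptime".split(), "hosting server plan secure".split(),
    "server uptime hosting plan".split(),
    "forum thread post reply".split(), "post reply forum user".split(),
    "thread user forum post".split(),
]
labels = ["counterfeit"] * 3 + ["hosting"] * 3 + ["forum"] * 3

accuracy, f1 = cross_validate(docs, labels, k=3)
print(f"accuracy={accuracy:.3f} macro_f1={f1:.3f}")
```

Macro F1 gives every class equal weight regardless of its size, so it can fall below accuracy when smaller classes are classified less well. That may be one reason the two reported figures differ.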
