Classifying Illegal Activities on Tor Network Based on Web Textual Contents

Abstract

The freedom of the Deep Web offers a safe place where people can express themselves anonymously, but where they can also conduct illegal activities. In this paper, we present and make publicly available a new dataset of active Darknet domains, which we call "Darknet Usage Text Addresses" (DUTA). We built DUTA by sampling the Tor network over two months and manually labeling each address into 26 classes. Using DUTA, we compared two well-known text representation techniques, each combined with three different supervised classifiers, to categorize Tor hidden services. We also fixed the pipeline elements and identified the aspects that have a critical influence on the classification results. We found that combining a TF-IDF word representation with a Logistic Regression classifier achieves 96.6% accuracy under 10-fold cross-validation and a macro F1 score of 93.7% when classifying a subset of illegal activities from DUTA. The good performance of the classifier might support potential tools to help the authorities detect these activities.
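
The abstract names a TF-IDF word representation and reports both accuracy and macro F1. The following is a minimal, standard-library sketch of those two ideas, not the authors' implementation: the function names, the toy documents, and the toy labels are hypothetical, and the smoothed IDF formula is one common variant chosen here as an assumption.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Weight = raw term count * smoothed IDF, where
    IDF = ln((1 + N) / (1 + df)) + 1 (an assumed, commonly used variant).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy corpus: "buy" occurs in two documents, so it is
# down-weighted relative to terms that occur in only one.
toy_docs = ["buy cheap cards", "buy fake passports", "forum about privacy"]
toy_vectors = tfidf(toy_docs)

# Hypothetical imbalanced labels: a classifier that always predicts the
# majority class still scores high accuracy but a low macro F1.
toy_true = ["a"] * 8 + ["b"] * 2
toy_pred = ["a"] * 10
toy_accuracy = accuracy(toy_true, toy_pred)
toy_macro_f1 = macro_f1(toy_true, toy_pred)
```

On the imbalanced toy labels, accuracy stays high while macro F1 drops sharply, which is one reason a study with many unevenly sized classes might report both figures.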

Bibliographic record

Source
Conference of the European Chapter of the Association for Computational Linguistics | 2017 | xxxviii, 642 p. | 9 pages
Authors
Mhd Wesam Al Nabki; Eduardo Fidalgo; Enrique Alegre; Ivan de Paz
Original format
PDF
CLC classification
Programming, software engineering

