Malicious and Benign Webpages Dataset

Abstract

Web security is a challenging task amid ever-rising threats on the Internet. With billions of websites active on the Internet, and hackers devising new techniques to trap web users, machine learning offers promising techniques for detecting malicious websites. The dataset described in this manuscript is intended for such machine-learning-based analysis of malicious and benign webpages. The data was collected from the Internet using a specialized focused web crawler named MalCrawler [1]. The dataset comprises various extracted attributes as well as raw webpage content, including JavaScript code. It supports both supervised and unsupervised learning. For supervised learning, class labels for malicious and benign webpages were added to the dataset using the Google Safe Browsing API. The most relevant attributes within the scope have already been extracted and included in the dataset. However, the raw web content, including JavaScript code, supports further attribute extraction if desired. This raw content and code can also be used as unstructured input for text-based analytics. The dataset consists of data from approximately 1.5 million webpages, which makes it suitable for deep learning algorithms. The article also provides code snippets used for data extraction and analysis.
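The abstract notes that the raw web content and JavaScript code support further attribute extraction. As a rough illustration only, the sketch below derives a few simple attributes from a raw HTML string using Python's standard library. It is not the author's extraction code, and the attribute names (`content_len`, `script_count`, `js_len`, `eval_calls`) are hypothetical and are not claimed to match the dataset's actual columns.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collects the text of <script> elements found in raw HTML."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []  # one string per <script> element

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so accumulate it.
        if self._in_script:
            self.scripts[-1] += data


def extract_attributes(raw_html):
    """Return a dict of illustrative (hypothetical) attributes for one page."""
    parser = ScriptCollector()
    parser.feed(raw_html)
    parser.close()
    return {
        "content_len": len(raw_html),                        # size of the raw page
        "script_count": len(parser.scripts),                 # number of <script> tags
        "js_len": sum(len(s) for s in parser.scripts),       # total inline JS length
        "eval_calls": sum(s.count("eval(") for s in parser.scripts),
    }


if __name__ == "__main__":
    sample = (
        "<html><body><script>eval('x');</script>"
        "<p>hi</p><script src='a.js'></script></body></html>"
    )
    print(extract_attributes(sample))
```

Attributes like these could be computed per page and joined with the class labels for supervised learning, or used without labels for unsupervised analysis.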

Bibliographic record

Journal: Data in Brief
Author: A.K. Singh
Year: 2020
Total pages: 11
Keywords: Web security; malicious webpages; machine learning; deep learning; malicious JavaScript
