Prioritized active learning for malicious URL detection using weighted text-based features

Abstract

Data analytics is increasingly used in cyber-security problems and has proved useful where data volume and heterogeneity make manual assessment by security experts cumbersome. In practical cyber-security scenarios involving data-driven analytics, obtaining annotated data (i.e., data with ground-truth labels) is a challenging and well-known limiting factor for many supervised security analytics tasks. Significant portions of large datasets typically remain unlabelled, because annotation is largely manual and requires a huge amount of expert intervention. In this paper, we propose an effective active learning approach that addresses this limitation in a practical cyber-security problem, phishing categorization, using a human-machine collaborative approach to design a semi-supervised solution. An initial classifier is learnt on a small amount of annotated data and is then updated iteratively by shortlisting, from the large pool of unlabelled data, only those samples that are most likely to improve classifier performance quickly. Prioritized active learning shows significant promise for faster convergence of classification performance in a batch learning framework, and thus requires even less human annotation effort. A useful feature weight update technique combined with active learning shows promising classification performance for categorizing phishing/malicious URLs without requiring a large amount of annotated training samples. In experiments with several collections of PhishMonger's Targeted Brand dataset, the proposed method shows significant improvement over the baseline, by as much as 12%.
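The iterative loop described in the abstract can be illustrated with a minimal pool-based active learning sketch. It is not the authors' method: it uses plain uncertainty sampling (query the URLs whose predicted probability is nearest 0.5) as a stand-in for the paper's prioritization criterion, and it omits the feature weight update, since the abstract specifies neither. The names TinyLogReg, most_uncertain, active_learning and oracle, as well as the toy URLs and labels, are hypothetical.

```python
import math
import re

def tokenize(url):
    """Split a URL into lowercase text tokens on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]

class TinyLogReg:
    """Bag-of-tokens logistic regression trained with plain SGD (illustrative only)."""
    def __init__(self):
        self.w, self.b = {}, 0.0

    def prob(self, url):
        z = self.b + sum(self.w.get(t, 0.0) for t in tokenize(url))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, urls, labels, epochs=200, lr=0.1):
        self.w, self.b = {}, 0.0  # retrain from scratch on every call
        for _ in range(epochs):
            for url, y in zip(urls, labels):
                g = self.prob(url) - y  # gradient of log-loss w.r.t. z
                self.b -= lr * g
                for t in tokenize(url):
                    self.w[t] = self.w.get(t, 0.0) - lr * g
        return self

def most_uncertain(model, pool, batch_size):
    """Return the pool URLs whose predicted probability is closest to 0.5."""
    return sorted(pool, key=lambda u: abs(model.prob(u) - 0.5))[:batch_size]

def active_learning(seed, pool, oracle, rounds=2, batch_size=2):
    """Train on the seed set, then repeatedly query labels for a small batch."""
    labelled = dict(seed)
    pool = list(pool)  # work on a copy so the caller's pool is untouched
    model = TinyLogReg().fit(list(labelled), list(labelled.values()))
    for _ in range(rounds):
        if not pool:
            break
        for url in most_uncertain(model, pool, batch_size):
            labelled[url] = oracle(url)  # the human annotator in practice
            pool.remove(url)
        model.fit(list(labelled), list(labelled.values()))
    return model, labelled

# Hypothetical toy data: 1 = phishing, 0 = benign.
SEED = {
    "http://paypal-login.example.net/verify/account": 1,
    "http://secure-update.example.org/signin/confirm": 1,
    "https://www.example.com/docs/index.html": 0,
    "https://blog.example.com/2017/news": 0,
}
TRUTH = {
    "http://paypal-secure.example.net/login/update": 1,
    "http://account-verify.example.org/confirm": 1,
    "http://signin.example.net/banking/verify": 1,
    "https://docs.example.com/guide/index.html": 0,
    "https://www.example.org/news/2017": 0,
    "https://shop.example.com/cart": 0,
}
POOL = list(TRUTH)

def oracle(url):
    """Stand-in for the security expert: looks the label up in TRUTH."""
    return TRUTH[url]

model, labelled = active_learning(SEED, POOL, oracle)
print(len(labelled), "labelled URLs after two query rounds")
```

Only the queried URLs cost annotation effort; the rest of the pool stays unlabelled. Swapping most_uncertain for a different scoring function is the natural place to experiment with other prioritization criteria.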

Bibliographic details

Source
IEEE International Conference on Intelligence and Security Informatics | 2017 | pp. 107-112 | 6 pages
Authors
Sreyasee Das Bhattacharjee; Ashit Talukder; Ehab Al-Shaer; Pratik Doshi
Original format
PDF
Keywords
Uniform resource locators; Computer security; Training; Mutual information; Man-machine systems; Collaboration

Similar literature

1. Detection of Malicious Social Bots Using Learning Automata With URL Features in Twitter Network [J]. Rout Rashmi Ranjan, Lingam Greeshma, Somayajulu D. V. L. N. IEEE Transactions on Computational Social Systems. 2020, No. 4
2. Malicious URL detection with feature extraction based on machine learning [J]. Baojiang Cui, Shanshan He, Xi Yao. International Journal of High Performance Computing and Networking. 2018, No. 2
3. Cost-Sensitive Online Active Learning with Application to Malicious URL Detection [J]. Peilin Zhao, Steven C. H. Hoi. SIGKDD Explorations. 2013, No. CDaROM
4. Prioritized active learning for malicious URL detection using weighted text-based features [C]. Sreyasee Das Bhattacharjee, Ashit Talukder, Ehab Al-Shaer. IEEE International Conference on Intelligence and Security Informatics. 2017
5. Learning to detect malicious URLs [D]. Ma, Justin Tung. 2010
6. Malicious URL Detection Based on Associative Classification [O]. Sandra Kumi, ChaeHo Lim, Sang-Gon Lee. 2021
7. Cost-Sensitive Online Active Learning with Application to Malicious URL Detection [O]. Peilin Zhao, Steven C. H. Hoi. 2013
8. Neural Detection of Malicious Network Activities Using a New Direct Parsing and Feature Extraction Technique [R]. Low, C. H. 2015
