Journal: Expert Systems with Applications

Novel set of general descriptive features for enhanced detection of malicious emails using machine learning methods

Abstract

In recent years, cyber-attacks against businesses and organizations have increased. Such attacks usually result in significant damage to the organization, such as the loss and/or leakage of sensitive and confidential information. Because email communication is an integral part of daily business operations, attackers frequently leverage email as an attack vector in order to initially penetrate the targeted organization. Email allows the attacker to deliver dangerous content to the victim, such as malicious attachments or links to malicious websites. Existing email analysis solutions analyze only specific parts of the email using rule-based methods, while other important parts remain unanalyzed. Existing anti-virus engines primarily use signature-based detection methods, and are therefore insufficient for detecting new, unknown malicious emails. Machine learning methods have been shown to be effective at detecting maliciousness in various domains, and particularly in email. Previous work that used machine learning methods suggested feature sets that offer only a limited perspective on the whole email message. In this paper, we propose a novel set of general descriptive features extracted from all email components (header, body, and attachments) for enhanced detection of malicious emails using machine learning methods. The proposed features are extracted solely from the email itself; they are therefore independent, since the extraction process does not require an Internet connection or the use of external services or other tools, thereby meeting the needs of real-time detection systems. We conducted an extensive evaluation of our novel features against sets of features suggested by previous academic work, using a collection of 33,142 emails of which 38.73% are malicious and 61.27% are benign. The results show that malicious emails can be detected effectively when using our novel features with machine learning algorithms.
Moreover, our novel features enhance the detection of malicious emails when used in conjunction with features suggested by related work. The Random Forest classifier achieved the highest detection rates, with an AUC of 0.929, true positive rate (TPR) of 0.947, and false positive rate (FPR) of 0.03. We also present the IDR (integrated detection rate), a new measure which helps calibrate the threshold of a machine learning classifier in order to achieve the optimal TP and FP rates, which are the most important measures for a real-time and practical cyber-security application. (C) 2018 Elsevier Ltd. All rights reserved.
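
The relationship between a classifier's decision threshold and the TPR and FPR mentioned in the abstract can be illustrated with a small sketch. This is not the paper's IDR measure, whose definition is not given in the abstract; it only shows how the two rates move as the threshold changes. The function name `rates_at_threshold` and the scores and labels below are hypothetical and chosen purely for illustration.

```python
from typing import List, Tuple


def rates_at_threshold(
    scores: List[float], labels: List[int], threshold: float
) -> Tuple[float, float]:
    """Return (TPR, FPR) when emails scoring >= threshold are flagged as malicious.

    labels use 1 for malicious and 0 for benign.
    """
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if label == 1:
            if flagged:
                tp += 1  # malicious email correctly flagged
            else:
                fn += 1  # malicious email missed
        else:
            if flagged:
                fp += 1  # benign email wrongly flagged
            else:
                tn += 1  # benign email correctly passed
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical classifier scores (estimated probability of "malicious")
# and ground-truth labels; these are NOT data from the paper.
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]

for threshold in (0.25, 0.50, 0.75):
    tpr, fpr = rates_at_threshold(scores, labels, threshold)
    print(f"threshold={threshold:.2f}  TPR={tpr:.3f}  FPR={fpr:.3f}")
```

On this toy data, raising the threshold lowers the FPR but can also lower the TPR, which is the trade-off a threshold-calibration measure has to balance.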