
SFEM: Structural feature extraction methodology for the detection of malicious office documents using machine learning methods


Abstract

Office documents are used extensively by individuals and organizations. Most users consider these documents safe for use. Unfortunately, Office documents can contain malicious components and perform harmful operations. Attackers increasingly take advantage of naive users and leverage Office documents in order to launch sophisticated advanced persistent threat (APT) and ransomware attacks. Recently, targeted cyber-attacks against organizations have been initiated with emails containing malicious attachments. Since most email servers do not allow executable files to be attached to emails, attackers prefer to use non-executable files (e.g., documents) for malicious purposes. Existing anti-virus engines primarily use signature-based detection methods, and therefore fail to detect new, unknown malicious code that has been embedded in an Office document. Machine learning methods have been shown to be effective at detecting known and unknown malware in various domains; however, to the best of our knowledge, machine learning methods have not been used for the detection of malicious XML-based Office documents (*.docx, *.xlsx, *.pptx, *.odt, *.ods, etc.). In this paper we present a novel structural feature extraction methodology (SFEM) for XML-based Office documents. SFEM extracts discriminative features from documents, based on their structure. We leveraged SFEM's features with machine learning algorithms for effective detection of malicious *.docx documents. We extensively evaluated SFEM with machine learning classifiers using a representative collection (16,938 *.docx documents collected "from the wild") which contains 4.9% malicious and approximately 95.1% benign documents. We examined 1,600 unique configurations based on different combinations of feature extraction, feature selection, feature representation, top-feature selection methods, and machine learning classifiers.
The results show that machine learning algorithms trained on features provided by SFEM successfully detect new, unknown malicious *.docx documents. The Random Forest classifier achieves the highest detection rates, with an AUC of 99.12% and a true positive rate (TPR) of 97%, accompanied by a false positive rate (FPR) of 4.9%. In comparison, the best anti-virus engine achieves a TPR which is 25% lower. © 2016 Elsevier Ltd. All rights reserved.
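The abstract states only that SFEM derives features from a document's structure; it does not spell out the procedure. As a rough illustration of what a structure-based feature could look like, the sketch below assumes (this is an assumption, not the authors' published method) that each path inside the *.docx ZIP container is treated as a binary feature. The function names, the demo archives, and the member name word/vbaProject.bin are all hypothetical and chosen only for illustration.

```python
import io
import zipfile


def structural_paths(docx_bytes: bytes) -> set[str]:
    """Return every file path and parent-directory path inside the archive."""
    paths: set[str] = set()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        for name in archive.namelist():
            parts = name.strip("/").split("/")
            # Add each prefix, so "word/document.xml" also yields "word".
            for depth in range(1, len(parts) + 1):
                paths.add("/".join(parts[:depth]))
    return paths


def to_binary_vector(paths: set[str], vocabulary: list[str]) -> list[int]:
    """Mark with 1 each vocabulary path that is present in the document."""
    return [1 if path in paths else 0 for path in vocabulary]


def make_demo_archive(member_names: list[str]) -> bytes:
    """Build an in-memory ZIP with empty members (hypothetical test data)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in member_names:
            archive.writestr(name, b"")
    return buffer.getvalue()


# Two toy archives; the member names are illustrative assumptions.
benign = make_demo_archive(["[Content_Types].xml", "word/document.xml"])
suspicious = make_demo_archive(
    ["[Content_Types].xml", "word/document.xml", "word/vbaProject.bin"]
)

# The vocabulary is the sorted union of all paths seen in the collection.
vocabulary = sorted(structural_paths(benign) | structural_paths(suspicious))
benign_vector = to_binary_vector(structural_paths(benign), vocabulary)
suspicious_vector = to_binary_vector(structural_paths(suspicious), vocabulary)
```

Vectors of this kind could then be passed to any standard classifier, such as the Random Forest mentioned in the abstract. The feature selection and representation steps that the paper evaluates are outside the scope of this sketch.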

Bibliographic record

  • Source
    Expert Systems with Applications | 2016, Issue 11 | pp. 324-343 | 20 pages
  • Author affiliations

    Ben Gurion Univ Negev, Dept Informat Syst Engn, IL-84105 Beer Sheva, Israel|Ben Gurion Univ Negev, Cyber Secur Res Ctr, Malware Lab, IL-84105 Beer Sheva, Israel;

    Ben Gurion Univ Negev, Dept Informat Syst Engn, IL-84105 Beer Sheva, Israel|Ben Gurion Univ Negev, Cyber Secur Res Ctr, Malware Lab, IL-84105 Beer Sheva, Israel;

    Ben Gurion Univ Negev, Dept Informat Syst Engn, IL-84105 Beer Sheva, Israel|Ben Gurion Univ Negev, Cyber Secur Res Ctr, Malware Lab, IL-84105 Beer Sheva, Israel;

    Ben Gurion Univ Negev, Dept Informat Syst Engn, IL-84105 Beer Sheva, Israel|Ben Gurion Univ Negev, Cyber Secur Res Ctr, Malware Lab, IL-84105 Beer Sheva, Israel;

  • Original format: PDF
  • Language: English
  • Keywords

    Machine learning; Malware detection; Static analysis; Structural features; Microsoft Office Open XML; Document
