Journal of Intelligent Information Systems

Hierarchical associative classifier (HAC) for malware detection from the large and imbalanced gray list

Abstract

Nowadays, numerous attacks by malware (e.g., viruses, backdoors, spyware, trojans and worms) pose a major security threat to computer users. Currently, the most significant line of defense against malware is anti-virus products, which focus on authenticating valid software from a whitelist, blocking invalid software from a blacklist, and running any unknown software (i.e., the gray list) in a controlled manner. The gray list, containing unknown software programs that could be either normal or malicious, is usually authenticated or rejected manually by virus analysts. Unfortunately, as malware-writing techniques develop, the number of file samples in the gray list that virus analysts must analyze every day is constantly increasing. The gray list is not only large in size but also has an imbalanced class distribution in which malware is the minority class. In this paper, we describe our research effort on building automatic, effective, and interpretable classifiers, based on the analysis of the Application Programming Interfaces (APIs) called by Windows Portable Executable (PE) files, for detecting malware in the large and imbalanced gray list. Our effort is based on associative classifiers because of their high interpretability and their capability of discovering interesting relationships among API calls. We first adapt several post-processing techniques of associative classification, including rule pruning and rule re-ordering, for building effective associative classifiers from large collections of training data. To help virus analysts detect malware in the imbalanced gray list, we then develop the Hierarchical Associative Classifier (HAC).
HAC constructs a two-level associative classifier to maximize the precision and recall of the minority (malware) class: in the first level, it uses high-precision rules of the majority (benign) class and low-precision rules of the minority class to achieve high recall; in the second level, it ranks the minority-class files and optimizes precision. Finally, since our case studies are based on a large, real data collection obtained from the Anti-virus Lab of Kingsoft Corporation, including 8,000,000 malware samples, 8,000,000 benign files, and 100,000 file samples from the gray list, we empirically examine the sampling strategy for building classifiers on such a large data collection so as to avoid over-fitting and achieve high effectiveness as well as high efficiency. Promising experimental results demonstrate the effectiveness and efficiency of the HAC classifier. HAC has already been incorporated into the scanning tool of Kingsoft's Anti-Virus software.
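The abstract does not give HAC's exact rule-selection thresholds or ranking function, so the following is only a minimal Python sketch of the two-level idea under stated assumptions, not the authors' implementation. It assumes each file is represented as the set of Windows API names it calls; rules are ordered CBA-style (higher confidence, then higher support, then shorter antecedent) and the first matching rule decides; simple confidence thresholds stand in for the paper's rule pruning; uncovered files default to benign; and files flagged as malware at level one are ranked by their strongest matching malware rule. The names `Rule`, `level_one`, `level_two`, the thresholds, and the toy rules and files are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class Rule:
    antecedent: FrozenSet[str]  # API calls that must all appear in the file
    label: str                  # "malware" or "benign"
    confidence: float           # rule precision measured on training data
    support: int                # number of training files the rule covers

    def matches(self, apis: Set[str]) -> bool:
        return self.antecedent <= apis


def order_rules(rules: List[Rule]) -> List[Rule]:
    # CBA-style precedence: higher confidence, then higher support, then shorter rule.
    return sorted(rules, key=lambda r: (-r.confidence, -r.support, len(r.antecedent)))


def classify(ordered: List[Rule], apis: Set[str], default: str = "benign") -> str:
    # The first matching rule decides; uncovered files get the (assumed) default class.
    for rule in ordered:
        if rule.matches(apis):
            return rule.label
    return default


def level_one(rules: List[Rule], benign_min_conf: float, malware_min_conf: float) -> List[Rule]:
    # Keep only high-precision benign rules but also low-precision malware rules,
    # so fewer malware files are absorbed by weak benign rules (favours recall).
    kept = [r for r in rules
            if (r.label == "benign" and r.confidence >= benign_min_conf)
            or (r.label == "malware" and r.confidence >= malware_min_conf)]
    return order_rules(kept)


def level_two(ordered: List[Rule], files: Dict[str, Set[str]]) -> List[str]:
    # Rank files flagged as malware by their strongest matching malware rule,
    # so analysts review the most confident detections first (favours precision).
    flagged = []
    for name, apis in files.items():
        if classify(ordered, apis) == "malware":
            score = max((r.confidence for r in ordered
                         if r.label == "malware" and r.matches(apis)), default=0.0)
            flagged.append((score, name))
    flagged.sort(key=lambda t: (-t[0], t[1]))
    return [name for _, name in flagged]


# Toy rules and gray-list files (hypothetical values, for illustration only).
RULES = [
    Rule(frozenset({"CreateRemoteThread", "WriteProcessMemory"}), "malware", 0.95, 120),
    Rule(frozenset({"RegSetValueExA"}), "malware", 0.55, 400),
    Rule(frozenset({"GetMessageW", "DispatchMessageW"}), "benign", 0.98, 5000),
    Rule(frozenset({"ReadFile"}), "benign", 0.70, 9000),
]
FILES = {
    "a.exe": {"CreateRemoteThread", "WriteProcessMemory", "ReadFile"},
    "b.exe": {"RegSetValueExA", "ReadFile"},
    "c.exe": {"GetMessageW", "DispatchMessageW", "RegSetValueExA"},
    "d.exe": {"ReadFile"},
}

if __name__ == "__main__":
    first_level = level_one(RULES, benign_min_conf=0.9, malware_min_conf=0.5)
    print(level_two(first_level, FILES))  # → ['a.exe', 'b.exe']
```

In this toy setup, ordering all rules as a plain CBA classifier would let the 0.70-confidence benign `ReadFile` rule fire on `b.exe` before the weaker malware rule, so `b.exe` would be labelled benign; dropping low-precision benign rules at the first level lets it through to the ranked list instead.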