Hierarchical associative classifier (HAC) for malware detection from the large and imbalanced gray list

Yanfang Ye; Tao Li; Kai Huang; Qingshan Jiang; Yong Chen


Abstract

Attacks by malware (e.g., viruses, backdoors, spyware, trojans, and worms) now pose a major security threat to computer users. The most significant line of defense against malware is currently anti-virus products, which authenticate valid software against a whitelist, block invalid software using a blacklist, and run any unknown software (i.e., the gray list) in a controlled manner. The gray list, which contains unknown programs that may be either benign or malicious, is usually authenticated or rejected manually by virus analysts. Unfortunately, as malware writing techniques develop, the number of gray-list file samples that virus analysts must analyze each day keeps increasing. The gray list is not only large but also has an imbalanced class distribution in which malware is the minority class. In this paper, we describe our research effort on building automatic, effective, and interpretable classifiers, based on analysis of the Application Programming Interfaces (APIs) called by Windows Portable Executable (PE) files, for detecting malware in the large and imbalanced gray list. Our effort is based on associative classifiers because of their high interpretability and their ability to discover interesting relationships among API calls. We first adapt several post-processing techniques of associative classification, including rule pruning and rule re-ordering, to build effective associative classifiers from large collections of training data. To help virus analysts detect malware in the imbalanced gray list, we then develop the Hierarchical Associative Classifier (HAC). HAC constructs a two-level associative classifier to maximize precision and recall of the minority (malware) class: in the first level, it uses high-precision rules of the majority (benign file samples) class and low-precision rules of the minority class to achieve high recall; in the second level, it ranks the minority-class files and optimizes precision. Finally, because our case studies are based on a large, real data collection obtained from the Anti-virus Lab of Kingsoft Corporation, comprising 8,000,000 malware samples, 8,000,000 benign files, and 100,000 file samples from the gray list, we empirically examine the sampling strategy used to build classifiers for such a large collection, to avoid over-fitting and to achieve both effectiveness and efficiency. Promising experimental results demonstrate the effectiveness and efficiency of the HAC classifier. HAC has already been incorporated into the scanning tool of Kingsoft's Anti-Virus software.
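
The two-level idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no rule format, thresholds, or ranking function, so the `Rule` class, the `benign_min_conf` threshold, the max-confidence score, the `top_k` cutoff, and the toy rules and files below are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical class association rule: a set of API calls implies a label."""
    apis: FrozenSet[str]   # antecedent: API calls that must all be present
    label: str             # consequent: "benign" or "malware"
    confidence: float      # assumed estimate of the rule's precision


def matches(rule: Rule, file_apis: Set[str]) -> bool:
    # A rule fires when every API in its antecedent is called by the file.
    return rule.apis <= file_apis


def level_one(file_apis: Set[str], rules: List[Rule],
              benign_min_conf: float = 0.95) -> bool:
    """Level 1 (assumed form): keep a file as a malware candidate or not.

    High-precision benign rules filter files out; any malware rule,
    even a low-precision one, keeps a file in, which favors recall.
    """
    for r in rules:
        if (r.label == "benign" and r.confidence >= benign_min_conf
                and matches(r, file_apis)):
            return False
    return any(r.label == "malware" and matches(r, file_apis) for r in rules)


def malware_score(file_apis: Set[str], rules: List[Rule]) -> float:
    # Assumed ranking score: best confidence among the matching malware rules.
    confs = [r.confidence for r in rules
             if r.label == "malware" and matches(r, file_apis)]
    return max(confs, default=0.0)


def hac_sketch(files: Dict[str, Set[str]], rules: List[Rule],
               top_k: int) -> List[Tuple[str, float]]:
    """Level 2 (assumed form): rank level-1 candidates and keep the top_k."""
    candidates = [(name, malware_score(apis, rules))
                  for name, apis in files.items() if level_one(apis, rules)]
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return candidates[:top_k]


# Toy rules and gray-list files, invented for this example only.
rules = [
    Rule(frozenset({"CreateWindowExA", "GetMessageA"}), "benign", 0.98),
    Rule(frozenset({"WriteProcessMemory", "CreateRemoteThread"}), "malware", 0.90),
    Rule(frozenset({"RegSetValueExA"}), "malware", 0.40),
]
files = {
    "a.exe": {"CreateWindowExA", "GetMessageA", "RegSetValueExA"},
    "b.exe": {"WriteProcessMemory", "CreateRemoteThread", "RegSetValueExA"},
    "c.exe": {"RegSetValueExA", "CreateFileA"},
    "d.exe": {"CreateFileA"},
}

print(hac_sketch(files, rules, top_k=2))
```

In this toy run, `a.exe` is dropped by the high-confidence benign rule and `d.exe` matches no malware rule, so only `b.exe` and `c.exe` reach the ranking step. Raising or lowering `top_k` is one simple way to trade recall against precision in the second level; the paper may use a different mechanism.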


Bibliographic details

Source
Journal of Intelligent Information Systems | 2010, Issue 1 | pp. 1-20 | 20 pages
Authors
Yanfang Ye; Tao Li; Kai Huang; Qingshan Jiang; Yong Chen
Author affiliations

Department of Computer Science, Xiamen University, Xiamen, 361005, People's Republic of China;

School of Computer Science, Florida International University, Miami, FL 33199, USA;

Software School, Xiamen University, Xiamen, 361005, People's Republic of China;

Software School, Xiamen University, Xiamen, 361005, People's Republic of China;

Anti-virus Laboratory, Kingsoft Corporation, Zhuhai, 519000, People's Republic of China

Original format: PDF
Language: English
Keywords
malware detection; gray list; class imbalance; hierarchical associative classifier (HAC)


