Identifying meaningful clusters in malware data

de Amorim Renato Cordeiro; Ruiz Carlos David Lopez


Abstract

Finding meaningful clusters in drive-by-download malware data is a particularly difficult task. Malware data tends to contain overlapping clusters with widely varying cardinalities. This happens because malware samples can be considerably similar to one another (some are even said to belong to the same family), and similar samples tend to appear in bursts. Clustering algorithms are usually applied to normalised data sets. However, normalisation only aims to give features with different value ranges a similar contribution to the clustering. It does not favour more meaningful features over less meaningful ones, an effect one should perhaps expect of the data pre-processing stage. In this paper we introduce a method that deals precisely with this problem. It is an iterative data pre-processing method that helps to increase the separation between clusters. It does so by calculating the within-cluster degree of relevance of each feature and then using these values as data rescaling factors. Repeating this process until convergence separated our malware data into clear clusters, leading to a higher average Silhouette width.
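The abstract does not give the formulas, so the following is only a rough sketch of what such an iterative rescaling loop might look like, not the authors' implementation. It assumes that feature relevance is the normalised inverse of the within-cluster dispersion (in the style of feature-weighted k-means), that the original data is rescaled with the current weights at every pass, and that convergence means the partition stops changing. The names `kmeans`, `feature_weights` and `iterative_rescale` are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels


def feature_weights(X, labels, k, eps=1e-9):
    """Assumed relevance: normalised inverse within-cluster dispersion.

    Returns a (k, n_features) array whose rows each sum to 1.
    """
    dispersion = np.full((k, X.shape[1]), eps)
    for j in range(k):
        members = X[labels == j]
        if len(members):
            dispersion[j] += ((members - members.mean(axis=0)) ** 2).sum(axis=0)
    inverse = 1.0 / dispersion
    return inverse / inverse.sum(axis=1, keepdims=True)


def iterative_rescale(X, k, max_iter=20, seed=0):
    """Cluster, weight features per cluster, rescale, and repeat."""
    X = np.asarray(X, dtype=float)
    labels = kmeans(X, k, seed=seed)
    for _ in range(max_iter):
        weights = feature_weights(X, labels, k)
        # Each point is rescaled by the weights of its current cluster.
        new_labels = kmeans(X * weights[labels], k, seed=seed)
        if np.array_equal(new_labels, labels):
            break  # assumed convergence test: the partition is stable
        labels = new_labels
    weights = feature_weights(X, labels, k)
    return X * weights[labels], labels, weights
```

The rescaled data returned by `iterative_rescale` could then be compared with the normalised original using an average Silhouette width, for instance with `sklearn.metrics.silhouette_score`, to check whether the separation between clusters actually improved on a given data set.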


Bibliographic details

Source
Expert Systems with Applications | 2021, Issue 9 | 114971.1-114971.11 | 11 pages
Authors
de Amorim Renato Cordeiro; Ruiz Carlos David Lopez
Author affiliations

Univ Essex Sch Comp Sci & Elect Engn Wivenhoe Pk Colchester CO4 3SQ Essex England;

Univ Essex Sch Comp Sci & Elect Engn Wivenhoe Pk Colchester CO4 3SQ Essex England;
Record information
Original format: PDF
Text language: English
Keywords
Feature rescaling; Drive-by-download malware; Clustering

