Finding common ground among experts' opinions on data clustering: With applications in malware analysis

Abstract

Data clustering is a basic technique for knowledge discovery and data mining. As the volume of data grows significantly, data clustering becomes computationally prohibitive and resource demanding, and sometimes it is necessary to outsource these tasks to third-party experts who specialize in data clustering. The goal of this work is to develop techniques that find common ground among experts' opinions on data clustering, which may be biased due to the features or algorithms used in clustering. Our work differs from the large body of existing approaches to consensus clustering, as we do not require that all data objects be grouped into clusters. Rather, our work is motivated by real-world applications that demand high confidence in how data objects, if they are selected, are grouped together. We formulate the problem rigorously and show that it is NP-complete. We further develop a lightweight technique based on finding a maximum independent set in a 3-uniform hypergraph to select data objects that do not form conflicts among experts' opinions. We apply our proposed method to a real-world malware dataset with hundreds of thousands of instances to find malware clusters based on how multiple major anti-virus (AV) products classify these samples. Our work offers a new direction for consensus clustering by striking a balance between the clustering quality and the number of data objects chosen to be clustered.
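The selection step described in the abstract can be illustrated with a small Python sketch. This is not the paper's algorithm: the abstract does not define what counts as a conflict or how the independent set is computed, so both parts below are assumptions. The hypothetical helper `conflict_triples` treats a triple of objects as conflicting when the experts' pooled "same cluster" relation is not transitive on it, and the hypothetical `greedy_independent_set` uses a generic greedy heuristic that gives an independent set but not necessarily a maximum one.

```python
from collections import Counter
from itertools import combinations


def conflict_triples(opinions):
    """Return the triples on which the pooled 'same cluster' relation is not transitive.

    Assumed rule (illustrative only, not taken from the paper): a pair is
    linked if at least one expert puts both objects in the same cluster,
    and a triple conflicts when exactly two of its three pairs are linked.
    """
    objects = sorted(opinions[0])

    def linked(u, v):
        return any(labels[u] == labels[v] for labels in opinions)

    return [
        frozenset(triple)
        for triple in combinations(objects, 3)
        if sum(linked(u, v) for u, v in combinations(triple, 2)) == 2
    ]


def greedy_independent_set(vertices, hyperedges):
    """Drop the most conflicted vertex until no hyperedge is fully kept."""
    kept = set(vertices)
    remaining = list(hyperedges)
    while remaining:
        degree = Counter(v for edge in remaining for v in edge)
        # Break ties by vertex order so the result is deterministic.
        worst = max(sorted(degree), key=lambda v: degree[v])
        kept.discard(worst)
        remaining = [edge for edge in remaining if worst not in edge]
    return kept


# Two hypothetical experts labelling four samples.
opinions = [
    {"a": 0, "b": 0, "c": 1, "d": 1},  # expert 1: {a, b}, {c, d}
    {"a": 0, "b": 1, "c": 1, "d": 2},  # expert 2: {a}, {b, c}, {d}
]
edges = conflict_triples(opinions)
kept = greedy_independent_set(opinions[0], edges)
```

The brute-force enumeration of triples is cubic in the number of objects, so this form only suits small inputs. A dataset with hundreds of thousands of instances, as in the abstract, would need a far more economical way to build the hypergraph.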


Bibliographic record

Source: IEEE International Conference on Data Engineering, 2014, pp. 15-27 (13 pages)
Author: Yan Guanhua
Original format: PDF

