Finding common ground among experts' opinions on data clustering: With applications in malware analysis

Abstract

Data clustering is a basic technique for knowledge discovery and data mining. As the volume of data grows significantly, data clustering becomes computationally prohibitive and resource demanding, and sometimes it is necessary to outsource these tasks to third-party experts who specialize in data clustering. The goal of this work is to develop techniques that find common ground among experts' opinions on data clustering, which may be biased due to the features or algorithms used in clustering. Our work differs from the large body of existing approaches to consensus clustering, as we do not require that all data objects be grouped into clusters. Rather, our work is motivated by real-world applications that demand high confidence in how data objects - if they are selected - are grouped together. We formulate the problem rigorously and show that it is NP-complete. We further develop a lightweight technique based on finding a maximum independent set in a 3-uniform hypergraph to select data objects that do not form conflicts among experts' opinions. We apply our proposed method to a real-world malware dataset with hundreds of thousands of instances to find malware clusters based on how multiple major AV (anti-virus) software products classify these samples. Our work offers a new direction for consensus clustering by striking a balance between the clustering quality and the number of data objects chosen to be clustered.
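The independent-set idea mentioned in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not say how conflicts are derived from the experts' clusterings, so the sketch assumes the conflicts are already given as triples of object identifiers (hyperedges of a 3-uniform hypergraph). The function names `greedy_independent_set` and `is_independent` are hypothetical. Because the underlying problem is hard in general, the sketch uses a simple greedy heuristic that returns an independent set, not necessarily a maximum one.

```python
from collections import Counter


def greedy_independent_set(vertices, triples):
    """Return a subset of vertices that fully contains no conflict triple.

    Assumption: each triple lists three distinct objects that must not all
    be selected together. Greedy heuristic only; the result is independent
    but not guaranteed to be of maximum size.
    """
    selected = set(vertices)
    # Keep only well-formed hyperedges whose members are all known vertices.
    live = [frozenset(t) for t in triples]
    live = [e for e in live if len(e) == 3 and e <= selected]
    while live:
        # Count how many surviving conflicts each vertex takes part in.
        degree = Counter(v for e in live for v in e)
        # Drop the vertex in the most conflicts (ties: smallest identifier).
        worst = max(sorted(degree), key=lambda v: degree[v])
        selected.discard(worst)
        # Every conflict containing the dropped vertex is now resolved.
        live = [e for e in live if worst not in e]
    return selected


def is_independent(selected, triples):
    """Check that no conflict triple lies entirely inside the selection."""
    return not any(set(t) <= set(selected) for t in triples)


if __name__ == "__main__":
    objects = ["a", "b", "c", "d", "e"]
    conflicts = [("a", "b", "c"), ("a", "c", "d"), ("c", "d", "e")]
    kept = greedy_independent_set(objects, conflicts)
    print(sorted(kept), is_independent(kept, conflicts))
```

In this toy input, object `c` takes part in every conflict, so removing it alone resolves all three triples and the other four objects are kept. This mirrors the trade-off the abstract describes between clustering confidence and the number of objects retained.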

Bibliographic details

Source: IEEE international conference on data engineering | 2014 | 13 pages
Author: Yan Guanhua
Original format: PDF
Chinese Library Classification: Computer software
