
Dealing with Class Noise in Large Training Datasets for Malware Detection


Abstract

This paper presents the approaches we have explored so far for detecting and handling class noise in the large annotated datasets used to train the classifiers we previously designed for industrial-scale malware identification. First, we established a number of distance-based filtering rules that allow us to identify different "levels" of potential noise in the training data. Second, we analysed the effects that either removing or "cleaning" the potentially noisy records has on the performance of our simplest classifiers. We show that careful distance-based filtering can lead to noticeably better results in malware detection.
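The abstract does not spell out the filtering rules themselves, so the following is only a minimal sketch of what one distance-based noise filter could look like. It is not the authors' method. It assumes numeric feature vectors, binary labels (0 for benign, 1 for malware), Euclidean distance, and a nearest-neighbour disagreement rule. The function names, the parameters `k` and `threshold`, and the interpretation of "cleaning" as flipping the label are all hypothetical choices made for illustration.

```python
import math


def nearest_neighbor_labels(records, labels, index, k):
    """Return the labels of the k records closest (Euclidean) to records[index]."""
    distances = []
    for j, other in enumerate(records):
        if j == index:
            continue  # a record is not its own neighbour
        distances.append((math.dist(records[index], other), j))
    distances.sort()  # nearest first; ties are broken by record index
    return [labels[j] for _, j in distances[:k]]


def flag_suspect_records(records, labels, k=3, threshold=0.5):
    """Flag records whose label disagrees with more than `threshold` of their k neighbours.

    Assumption: this disagreement rule stands in for the paper's unspecified rules.
    """
    suspects = []
    for i in range(len(records)):
        neighbors = nearest_neighbor_labels(records, labels, i, k)
        disagreement = sum(1 for lab in neighbors if lab != labels[i]) / len(neighbors)
        if disagreement > threshold:
            suspects.append(i)
    return suspects


def remove_suspects(records, labels, suspects):
    """Option 1: drop the flagged records from the training set."""
    drop = set(suspects)
    kept = [(r, l) for i, (r, l) in enumerate(zip(records, labels)) if i not in drop]
    return [r for r, _ in kept], [l for _, l in kept]


def clean_suspects(labels, suspects):
    """Option 2 (assumed meaning of "cleaning"): flip the binary label of flagged records."""
    flip = set(suspects)
    return [1 - l if i in flip else l for i, l in enumerate(labels)]


# Toy data: a benign cluster, a malware cluster, and one record that sits
# inside the benign cluster but carries a malware label.
records = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11),
           (0.5, 0.5)]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]

suspects = flag_suspect_records(records, labels, k=3, threshold=0.5)
filtered_records, filtered_labels = remove_suspects(records, labels, suspects)
cleaned_labels = clean_suspects(labels, suspects)
```

Varying `threshold` gives a rough analogue of the different "levels" of potential noise: a higher value flags only records that almost all of their neighbours contradict, while a lower value flags more borderline cases. The pairwise distance loop is quadratic in the number of records, so a dataset of industrial size would need an indexed or approximate neighbour search instead.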
