Journal: Digital Investigation

A comparative study of support vector machine and neural networks for file type identification using n-gram analysis

Abstract

File type identification (FTI) has become a major discipline for anti-virus developers, firewall designers, and forensic cybercrime investigators. Over the past few years, research has introduced several classifiers and features. One of these advances is n-gram analysis, which interprets statistical counts taken over classified fragments. N-gram based approaches have recently been combined successfully with computational intelligence classifiers. However, the literature offers little comprehensive explanation of machine learning based approaches such as neural networks (NN) or support vector machines (SVM); for example, how input parameters such as the learning rate or the value of n in the n-grams influence the results. In addition, very few studies have compared the scalability of NN and SVM approaches. A systematic comparison of the different approaches is therefore needed to address these questions. This paper undertakes such a comparison, focusing on n-gram analysis as the feature for two classifiers: SVMs and NNs. It details our experiments with two NNs and four SVMs, using linear and RBF kernels, on RealDC datasets. In general, we found that SVM-based approaches performed better than the NNs, but their scalability remains a challenge. (c) 2021 The Authors. Published by Elsevier Ltd.
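The abstract does not spell out how the n-gram features are computed, so the following is only a minimal sketch of one common reading: counting overlapping byte n-grams in a file fragment and, for n = 1, normalizing them into a fixed-length vector that a classifier such as an SVM or NN could consume. The function names `byte_ngram_counts` and `unigram_feature_vector` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import List


def byte_ngram_counts(fragment: bytes, n: int = 2) -> Counter:
    """Count every overlapping byte n-gram in a fragment (assumed feature)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of width n over the fragment, one byte at a time.
    return Counter(fragment[i:i + n] for i in range(len(fragment) - n + 1))


def unigram_feature_vector(fragment: bytes) -> List[float]:
    """Return a normalized 256-bin byte histogram (the n = 1 case)."""
    counts = Counter(fragment)   # iterating over bytes yields ints 0..255
    total = len(fragment) or 1   # avoid division by zero on empty input
    return [counts.get(value, 0) / total for value in range(256)]


# Illustrative use on a made-up fragment; real input would be file blocks.
bigrams = byte_ngram_counts(b"%PDF-1.7", n=2)
features = unigram_feature_vector(b"%PDF-1.7")
```

The number of possible n-grams grows as 256^n, which is one reason the choice of n affects both accuracy and scalability; the sketch keeps sparse counts for general n and a dense vector only for n = 1.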
