Computers & Security

Automated feature engineering for HTTP tunnel detection

Abstract

Generating discriminative input features is a key requirement for achieving highly accurate classifiers. The process of generating features from raw data is known as feature engineering, and it can take significant manual effort. In this paper we propose automated feature engineering to derive a suite of additional features from a given set of basic features, with the aims of improving classifier accuracy through discriminative features and of assisting data scientists through automation. Our implementation is specific to HTTP computer network traffic. To measure the effectiveness of our proposal, we compare the performance of a supervised machine learning classifier built with automated feature engineering against one built with human-guided features. The classifier addresses a problem in computer network security, namely the detection of HTTP tunnels. We use Bro to process network traffic into base features and then apply automated feature engineering to calculate a larger set of derived features. The derived features are calculated without favour to any base feature: they include entropy, length and N-grams for all string features, and counts and averages over time for all numeric features. Feature selection is then used to find the most relevant subset of these features. Testing showed that both classifiers achieved a detection rate above 99.93% at a false positive rate below 0.01%. For our datasets, we conclude that automated feature engineering, by removing the need for manual feature engineering, can speed up classifier development and reduce its technical difficulty. These benefits are achieved without a loss of classification accuracy.
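The abstract describes the derived features only at a high level. As a rough illustration, the sketch below computes length, Shannon entropy and character N-gram counts for every string field, and a per-host count and mean over a sliding time window for every numeric field, from records resembling Bro `http.log` entries. This is not the authors' implementation: the field names (`ts`, `id.orig_h`, `uri`, `request_body_len`), the 60-second window, the use of bigrams (N = 2), grouping by originating host and the helper names are all assumptions chosen for illustration.

```python
import math
from collections import Counter, deque
from typing import Deque, Dict, List, Tuple


def shannon_entropy(s: str) -> float:
    """Shannon entropy of a string in bits per character (0.0 for empty)."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def char_ngrams(s: str, n: int = 2) -> Counter:
    """Counts of overlapping character n-grams in s."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def string_features(name: str, value: str, n: int = 2) -> Dict[str, float]:
    """Length, entropy and n-gram counts for one string field."""
    feats = {f"{name}_len": float(len(value)),
             f"{name}_entropy": shannon_entropy(value)}
    for gram, count in char_ngrams(value, n).items():
        feats[f"{name}_{n}gram_{gram}"] = float(count)
    return feats


class WindowStats:
    """Count and mean of one numeric field over a sliding time window.

    Assumes window > 0 and that update() sees non-decreasing timestamps.
    """

    def __init__(self, window: float) -> None:
        self.window = window
        self.events: Deque[Tuple[float, float]] = deque()
        self.total = 0.0

    def update(self, ts: float, value: float) -> Tuple[int, float]:
        self.events.append((ts, value))
        self.total += value
        # Evict events outside the window, keeping those in (ts - window, ts].
        while self.events and self.events[0][0] <= ts - self.window:
            _, old = self.events.popleft()
            self.total -= old
        count = len(self.events)
        return count, self.total / count


def derive_features(records: List[dict], string_fields: List[str],
                    numeric_fields: List[str], window: float = 60.0,
                    n: int = 2) -> List[Dict[str, float]]:
    """Derive features for every record (output sorted by ts).

    Every string field gets the same treatment, and so does every numeric
    field, so no base feature is favoured.
    """
    stats: Dict[Tuple[str, str], WindowStats] = {}
    out: List[Dict[str, float]] = []
    for rec in sorted(records, key=lambda r: r["ts"]):
        feats: Dict[str, float] = {}
        for f in string_fields:
            feats.update(string_features(f, str(rec.get(f, "")), n))
        host = rec["id.orig_h"]  # hypothetical grouping key: originating host
        for f in numeric_fields:
            ws = stats.setdefault((host, f), WindowStats(window))
            count, mean = ws.update(rec["ts"], float(rec.get(f, 0)))
            feats[f"{f}_count_{window:g}s"] = float(count)
            feats[f"{f}_mean_{window:g}s"] = mean
        out.append(feats)
    return out
```

For example, calling `derive_features(records, ["uri"], ["request_body_len"], window=60.0)` on three requests from one host at t = 0, 10 and 90 s gives a windowed count of 2 for the second request and 1 for the third, because by t = 90 both earlier requests have left the 60-second window. The N-gram features make the output wide and sparse, which is one reason a feature-selection step is needed afterwards.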