Clustering of unknown protocol messages based on format comparison


Abstract

As a means of detecting and analysing unknown or proprietary protocols, Protocol Reverse Engineering (PRE) has developed rapidly in recent years. In this field, format-oriented message clustering is a fundamental step in differentiating unknown protocol messages. This paper addresses format-oriented message clustering for unknown protocols, including messages from proprietary or non-cooperative network environments whose specifications are unknown. Drawing on the basic rules of ABNF, we define Token Format Distance (TFD) and Message Format Distance (MFD) to represent the format similarity of tokens and of messages, and use Jaccard distance and an optimized sequence alignment algorithm (MFD measurement) to compute them. A distance matrix is then built from the MFD values and fed to the DBSCAN algorithm, which clusters the unknown protocol messages into classes with different formats. For this process we design an unsupervised clustering strategy that applies the Silhouette Coefficient and the Dunn Index to DBSCAN parameter selection. In experiments on two datasets, the v-measure (the harmonic mean of homogeneity and completeness) of the resulting clusters is above 0.91 on both, with FMI and coverage no less than 0.97. Boxplot analyses further give an IQR below 0.1 for the v-measure and below 0.03 for the FMI, so the method is shown to have remarkable validity and stability. Comprehensive analyses and comparisons on these indices also show considerable advantages of our method over previous work.
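The pipeline described in the abstract (pairwise format distance, distance matrix, DBSCAN) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `token_class` and `message_format` helpers and the sample messages are hypothetical. A plain Jaccard distance over sets of token classes stands in for the paper's TFD and MFD, whose ABNF-based definitions and sequence alignment step are not given here. The values of `eps` and `min_pts` are picked by hand rather than selected with the Silhouette Coefficient and Dunn Index.

```python
from typing import List, Optional, Set

NOISE = -1  # label for messages that belong to no cluster


def token_class(token: str) -> str:
    """Hypothetical format class of one token (an assumption, not the paper's TFD)."""
    if token.isdigit():
        return "NUM"
    if token.isalpha():
        # Treat upper-case words as protocol keywords and keep them literally.
        return token if token.isupper() else "WORD"
    return "OTHER"


def message_format(message: str) -> Set[str]:
    """Represent a message by the set of its token classes."""
    return {token_class(tok) for tok in message.split()}


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Jaccard distance: 1 - |A ∩ B| / |A ∪ B| (0 when both sets are empty)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(messages: List[str]) -> List[List[float]]:
    """Pairwise distances between all messages (stand-in for an MFD matrix)."""
    formats = [message_format(m) for m in messages]
    return [[jaccard_distance(x, y) for y in formats] for x in formats]


def dbscan(dist: List[List[float]], eps: float, min_pts: int) -> List[int]:
    """Textbook DBSCAN on a precomputed distance matrix."""
    n = len(dist)
    labels: List[Optional[int]] = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbours = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = [k for k in range(n) if dist[j][k] <= eps]
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # core point: keep expanding
    return [NOISE if label is None else label for label in labels]


# Hypothetical sample traffic: three request-like and three reply-like messages.
messages = [
    "GET /index.html HTTP/1.1",
    "GET /a.png HTTP/1.1",
    "POST /form HTTP/1.1",
    "220 ready",
    "250 ok",
    "221 bye",
]
labels = dbscan(distance_matrix(messages), eps=0.3, min_pts=2)
print(labels)
```

With these hand-picked parameters the two `GET` messages form one cluster, the three numeric replies form another, and the single `POST` message is left as noise because it has no neighbour within `eps`. A set-based distance ignores token order, which is one reason a sequence alignment step, as the abstract describes for MFD, would be needed in practice.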

Bibliographic details

  • Source
    Computer Networks | 2020, Issue 9 | 107296.1-107296.11 | 11 pages
  • Author affiliations

    Harbin Inst Technol Dept Comp Sci & Technol Harbin 150001 Peoples R China;

    Harbin Inst Technol Dept Comp Sci & Technol Harbin 150001 Peoples R China;

    Harbin Inst Technol Dept Comp Sci & Technol Harbin 150001 Peoples R China|China Acad Engn Phys Inst Comp Applicat Mianyang 621900 Sichuan Peoples R China;

    Harbin Inst Technol Dept Comp Sci & Technol Harbin 150001 Peoples R China;

  • Indexing information
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification
  • Keywords

    Protocol reverse engineering; Message clustering; Machine learning;

