IET Conference on Wireless, Mobile and Sensor Networks

An improved classification method for the common OLE file by N-gram analysis and vector space model



Abstract

Identifying a file's type by its extension is unreliable. The magic-bytes method, for its part, may fail to distinguish one type from another when files share similar header information, as the commonly used MS Office OLE files do. In this paper, an efficient classification method for common OLE files is proposed. To overcome a shortcoming of the original N-gram analysis technique, which cannot easily tell ambiguous file types apart, N-gram analysis and the vector space model are combined to identify common OLE files. Characteristic items are extracted from the most frequent byte values of each file class, and the cosine of the angle between two vectors is then used to classify ambiguous file types. The experimental results demonstrate that our mechanism is effective in identifying Office OLE files and performs better than the common N-gram method.
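The abstract does not specify N, how many characteristic items are kept, or how class vectors are built, so the Python sketch below is only an illustration under stated assumptions, not the authors' implementation. It assumes N = 1 (single byte values), summarizes each class by the mean byte-frequency vector of its training files, takes the characteristic items to be the union of each class's top-k most frequent byte values (k is an arbitrary, hypothetical parameter), and assigns a file to the class whose profile has the highest cosine similarity. All function names are hypothetical. The `OLE_MAGIC` constant is the standard 8-byte compound file signature shared by .doc, .xls and .ppt files, which is why a magic-bytes check alone cannot separate them.

```python
import math
from collections import Counter

# Standard OLE compound file signature: .doc, .xls and .ppt all begin with
# these same 8 bytes, so magic bytes alone cannot tell them apart.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def byte_frequency(data: bytes) -> list[float]:
    """Normalized 1-gram (single byte value) frequency vector of length 256."""
    counts = Counter(data)  # iterating bytes yields ints 0..255
    total = len(data) or 1
    return [counts[b] / total for b in range(256)]


def centroid(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of equal-length vectors (assumed class summary)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def top_k_bytes(vector: list[float], k: int) -> list[int]:
    """Byte values with the k highest frequencies (assumed 'characteristic items')."""
    return sorted(range(len(vector)), key=lambda b: vector[b], reverse=True)[:k]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either is a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def train(samples: dict[str, list[bytes]], k: int = 16):
    """Hypothetical trainer: per-class centroids projected onto the union
    of every class's top-k byte values (the shared vector space)."""
    centroids = {label: centroid([byte_frequency(d) for d in files])
                 for label, files in samples.items()}
    features = sorted({b for c in centroids.values() for b in top_k_bytes(c, k)})
    profiles = {label: [c[b] for b in features] for label, c in centroids.items()}
    return features, profiles


def classify(data: bytes, features: list[int],
             profiles: dict[str, list[float]]) -> str:
    """Return the class whose profile has the highest cosine similarity."""
    freq = byte_frequency(data)
    vec = [freq[b] for b in features]
    return max(profiles, key=lambda label: cosine(vec, profiles[label]))


if __name__ == "__main__":
    # Toy synthetic byte strings, not real Office documents.
    training = {
        "doc": [OLE_MAGIC + b"\x00" * 50 + b"text" * 20],
        "xls": [OLE_MAGIC + b"\xff" * 50 + b"\x09" * 40],
    }
    features, profiles = train(training, k=4)
    print(classify(OLE_MAGIC + b"\x00" * 40 + b"text" * 10, features, profiles))
```

Because every sample shares the same OLE header, the decision in this toy run is driven entirely by the body byte distribution, which is the gap the abstract says plain magic-byte checks leave open. Using cosine similarity also makes the comparison insensitive to file length, since only the direction of the frequency vector matters.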
