
TF-IDF-based Vector Conversion and Data Analysis Apparatus and Method

Abstract

The present invention relates to an apparatus and method for TF-IDF-based vector conversion and data analysis. In particular, it relates to an apparatus and method that assign weights to extracted strings using the TF-IDF (term frequency-inverse document frequency) scheme, which takes each string's relationship to the other data into account, so that malicious behavior characteristics are captured and processed as a vector, and that then analyze data using the generated vector.

According to the present invention, a TF-IDF-based vector conversion and data analysis apparatus and method are provided, comprising:

  • a reference vector generation module that extracts character strings from a training data set, builds a wordbook from a set of strings selected according to their frequency, and calculates reference TF-IDF vectors based on the wordbook;

  • a test vector generation module that extracts character strings from a test file to be analyzed and calculates a test TF-IDF vector based on the wordbook previously generated by the reference vector generation module; and

  • a malicious code detection module that compares the test TF-IDF vector with the reference TF-IDF vectors of the previously prepared training data set, providing malicious code detection and variant analysis.
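The three modules can be illustrated with a minimal Python sketch. The abstract does not say how strings are extracted, how the wordbook is chosen, which TF-IDF variant is used, or how vectors are compared, so each of those choices below is an assumption: strings are runs of at least four printable ASCII bytes (as the Unix `strings` tool extracts), the wordbook keeps the `size` strings with the highest document frequency in the training set, IDF uses the smoothed form log((1 + N) / (1 + df)) + 1 (the scikit-learn default), and detection assigns the label of the most cosine-similar reference vector. All function names and the byte blobs in the usage block are hypothetical toy data, not taken from the patent.

```python
import math
import re
from collections import Counter

# Assumption: a "string" is a run of >= 4 printable ASCII bytes, like the
# output of the Unix `strings` tool. The abstract does not specify the rule.
_PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def extract_strings(data: bytes) -> list[str]:
    """Return the printable ASCII runs found in a file's raw bytes."""
    return [m.decode("ascii") for m in _PRINTABLE_RUN.findall(data)]


def build_wordbook(docs: list[list[str]], size: int) -> list[str]:
    """Reference module (assumed rule): keep the `size` strings that occur
    in the most training documents, ties broken alphabetically."""
    df = Counter(s for doc in docs for s in set(doc))
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    return [s for s, _ in ranked[:size]]


def compute_idf(docs: list[list[str]], wordbook: list[str]) -> list[float]:
    """Smoothed IDF per wordbook entry: log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    doc_sets = [set(doc) for doc in docs]
    return [
        math.log((1 + n) / (1 + sum(w in d for d in doc_sets))) + 1
        for w in wordbook
    ]


def tfidf_vector(doc: list[str], wordbook: list[str], idf: list[float]) -> list[float]:
    """TF-IDF vector over the fixed wordbook. TF is normalized by the number
    of strings in the file, including strings outside the wordbook."""
    counts = Counter(doc)
    total = len(doc) or 1  # avoid division by zero for a file with no strings
    return [counts[w] / total * weight for w, weight in zip(wordbook, idf)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def nearest_reference(test_vec, ref_vecs, labels):
    """Detection module (assumed rule): label of the most similar reference."""
    sims = [cosine(test_vec, r) for r in ref_vecs]
    best = max(range(len(sims)), key=sims.__getitem__)
    return labels[best], sims[best]


if __name__ == "__main__":
    # Toy byte blobs standing in for files; not data from the patent.
    train = [
        b"\x00CreateRemoteThread\x00VirtualAllocEx\x00WriteProcessMemory\x00",
        b"\x00CreateRemoteThread\x00WriteProcessMemory\x00cmd.exe\x00",
        b"\x00GetMessageW\x00DispatchMessageW\x00user32.dll\x00",
    ]
    labels = ["malicious", "malicious", "benign"]
    docs = [extract_strings(f) for f in train]
    wordbook = build_wordbook(docs, size=5)
    idf = compute_idf(docs, wordbook)
    refs = [tfidf_vector(d, wordbook, idf) for d in docs]
    sample = b"\x00WriteProcessMemory\x00VirtualAllocEx\x00kernel32\x00"
    test = tfidf_vector(extract_strings(sample), wordbook, idf)
    label, sim = nearest_reference(test, refs, labels)
    print(label, f"{sim:.3f}")  # prints: malicious 0.855
```

Because the wordbook and IDF weights are fixed by the training set, a test file is projected onto the same coordinates as the reference vectors, which is what makes the comparison meaningful. Strings in the test file that are not in the wordbook (here `kernel32`) contribute nothing except to the term-frequency denominator.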

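Bibliographic data

  • Publication number: KR102246405B1

  • Publication date: 2021-04-30

  • Original format: PDF

  • Application number: KR1020190090032

  • Inventors: 이태진; 하지희

  • Filing date: 2019-07-25

  • Classification: G06F21/56; G06F16/31; G06F16/35; G06F40/20

  • Country: KR

  • Added to database: 2022-08-24 18:31:11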