Dictionary based deduplication of training set samples for machine learning based computer threat analysis

Abstract

The presence of malicious code can be identified in one or more data samples. A feature set extracted from a sample is vectorized to generate a sparse vector. A reduced dimension vector representing the sparse vector can be generated. A binary representation vector of the reduced dimension vector can be created by converting each value of a plurality of values in the reduced dimension vector to a binary representation. The binary representation vector can be added as a new element in a dictionary structure if the binary representation is not equal to an existing element in the dictionary structure. A training set for use in training a machine learning model can be created to include one vector whose binary representation corresponds to each of a plurality of elements in the dictionary structure.
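
The pipeline described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the abstract does not say how features are vectorized, how dimensionality is reduced, or how values are binarized, so the hashing step, the Gaussian random projection, the sign threshold, and all names and sizes (`SPARSE_DIM`, `REDUCED_DIM`, `vectorize`, `deduplicate`, and so on) are assumptions chosen for illustration.

```python
import hashlib
import random

SPARSE_DIM = 1024  # assumed size of the sparse feature space
REDUCED_DIM = 16   # assumed size of the reduced dimension vector


def vectorize(features):
    """Map a feature set to a sparse vector stored as {index: count}.

    Feature hashing is an assumption; the abstract only says "vectorized".
    """
    sparse = {}
    for feat in features:
        digest = hashlib.sha256(feat.encode("utf-8")).digest()
        idx = int.from_bytes(digest[:4], "big") % SPARSE_DIM
        sparse[idx] = sparse.get(idx, 0.0) + 1.0
    return sparse


def make_projection(seed=0):
    """Build a fixed Gaussian random projection matrix (an assumed reducer)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(SPARSE_DIM)]
            for _ in range(REDUCED_DIM)]


def reduce_dimension(sparse, projection):
    """Project the sparse vector down to REDUCED_DIM values."""
    # Sorting makes the summation order independent of feature order.
    items = sorted(sparse.items())
    return [sum(row[i] * v for i, v in items) for row in projection]


def binarize(reduced):
    """Convert each value to one bit (sign threshold is an assumption)."""
    return tuple(1 if v > 0 else 0 for v in reduced)


def deduplicate(samples, projection):
    """Keep one reduced vector per distinct binary representation."""
    dictionary = {}
    for name, features in samples:
        reduced = reduce_dimension(vectorize(features), projection)
        key = binarize(reduced)
        if key not in dictionary:  # add only if not already present
            dictionary[key] = (name, reduced)
    return dictionary


if __name__ == "__main__":
    projection = make_projection()
    samples = [
        ("sample_a", ["imports:kernel32", "section:.text", "entropy:high"]),
        ("sample_b", ["entropy:high", "imports:kernel32", "section:.text"]),
        ("sample_c", ["imports:ws2_32", "section:.rsrc"]),
    ]
    dictionary = deduplicate(samples, projection)
    # The training set holds one vector per dictionary element.
    training_set = [vec for _, vec in dictionary.values()]
    print(len(samples), "samples ->", len(training_set), "training vectors")
```

In this sketch `sample_b` carries the same features as `sample_a` in a different order, so it maps to the same dictionary key and is left out of the training set. Samples with different features can also share a key; how aggressively near-duplicates are merged depends on `REDUCED_DIM` and on the binarization rule.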

Bibliographic details

Publication number: US11373065B2

Patent type:
Publication date: 2022-06-28

Original format: PDF
Applicant/Assignee: CYLANCE INC.
Application number: US201815873673
Inventor: ANDREW DAVIS
Filing date: 2018-01-17
Classification: G06F21/56; G06N20; G06N3/08; G06K9/62; G06N20/10; G06N20/20; G06V10/40
Country:
Database entry time: 2024-06-14 23:19:41
