Dictionary based deduplication of training set samples for machine learning based computer threat analysis
Abstract
The presence of malicious code can be identified in one or more data samples. A feature set extracted from a sample is vectorized to generate a sparse vector. A reduced dimension vector representing the sparse vector can be generated. A binary representation vector of the reduced dimension vector can be created by converting each value of a plurality of values in the reduced dimension vector to a binary representation. The binary representation vector can be added as a new element in a dictionary structure if the binary representation vector is not equal to an existing element in the dictionary structure. A training set for use in training a machine learning model can be created to include one vector whose binary representation corresponds to each of a plurality of elements in the dictionary structure.
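The steps in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented method: the abstract does not say how features are vectorized, how the dimension is reduced, or how values are binarized. The sketch assumes feature hashing for vectorization, a fixed random ±1 projection for dimension reduction, and a one-bit sign threshold for the binary representation. All names (`vectorize`, `reduce_dimension`, `binarize`, `deduplicate`), the sizes `SPARSE_DIM` and `REDUCED_DIM`, and the sample feature strings are hypothetical.

```python
import hashlib
import random

SPARSE_DIM = 1024   # assumed size of the sparse feature space
REDUCED_DIM = 16    # assumed size of the reduced dimension vector


def _bucket(token):
    """Stable hash of a feature token into the range [0, SPARSE_DIM)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % SPARSE_DIM


def vectorize(features):
    """Assumed vectorizer: feature hashing into a sparse {index: count} map."""
    sparse = {}
    for token in features:
        idx = _bucket(token)
        sparse[idx] = sparse.get(idx, 0.0) + 1.0
    return sparse


def make_projection(seed=0):
    """Assumed reducer: fixed random +/-1 matrix, REDUCED_DIM x SPARSE_DIM."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(SPARSE_DIM)]
            for _ in range(REDUCED_DIM)]


def reduce_dimension(sparse, projection):
    """Project the sparse vector down to REDUCED_DIM values."""
    return [sum(row[i] * v for i, v in sparse.items()) for row in projection]


def binarize(reduced):
    """Assumed binarization: one bit per value, 1 if positive, else 0."""
    return tuple(1 if x > 0 else 0 for x in reduced)


def deduplicate(samples, projection):
    """Keep one reduced vector per distinct binary representation."""
    dictionary = {}
    for features in samples:
        reduced = reduce_dimension(vectorize(features), projection)
        key = binarize(reduced)
        # Add a new element only if no equal element is already present.
        if key not in dictionary:
            dictionary[key] = reduced
    return dictionary


# Hypothetical feature sets; the second repeats the first in another order.
samples = [
    ["CreateRemoteThread", "VirtualAllocEx", "section:.text"],
    ["VirtualAllocEx", "CreateRemoteThread", "section:.text"],
    ["RegSetValueExW", "section:.rsrc", "packed"],
]
projection = make_projection()
dictionary = deduplicate(samples, projection)

# The training set holds one vector per element of the dictionary.
training_set = list(dictionary.values())
```

Because the dictionary is keyed on the binary representation rather than on the raw reduced values, samples whose reduced vectors differ only slightly can also collapse to one entry. How coarse that grouping is depends on the binarization chosen, which here is an assumption.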