The present invention relates to an apparatus and method for TF-IDF-based vector conversion and data analysis. In particular, it relates to an apparatus and method that assign weights to extracted character strings through a TF-IDF scheme that considers the relationship with other data, characterize malicious behavior features, and analyze data through the resulting processed vector. According to the present invention, an apparatus and method for vector conversion and data analysis are provided, comprising: a reference vector generation module that extracts character strings from a training data set, constructs a wordbook from a set of character strings selected in consideration of their frequency, and calculates reference term frequency-inverse document frequency (TF-IDF) vectors based on the wordbook; a test vector generation module that extracts character strings from a test file to be analyzed and calculates a test TF-IDF vector based on the wordbook previously generated by the reference vector generation module; and a malicious code detection module that compares the test TF-IDF vector with the reference TF-IDF vectors of the previously constructed training data set and provides malicious code detection and variant analysis.
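
The three modules described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the abstract does not specify the string extraction rule, the wordbook selection criterion, the IDF formula, or the comparison metric, so the printable-ASCII extraction, document-frequency ranking, smoothed IDF, and cosine similarity used here are all assumptions, as are every function name and the toy byte strings.

```python
import math
import re
from collections import Counter


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Extract runs of printable ASCII (assumed extraction rule)."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]


def build_wordbook(docs: list[list[str]], max_size: int = 1000) -> list[str]:
    """Keep the strings found in the most training files (assumed criterion)."""
    doc_freq = Counter(s for doc in docs for s in set(doc))
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [s for s, _ in ranked[:max_size]]


def idf_weights(docs: list[list[str]], wordbook: list[str]) -> list[float]:
    """Smoothed inverse document frequency per wordbook entry (assumed formula)."""
    n = len(docs)
    doc_sets = [set(doc) for doc in docs]
    return [
        math.log((1 + n) / (1 + sum(w in ds for ds in doc_sets))) + 1.0
        for w in wordbook
    ]


def tfidf_vector(doc: list[str], wordbook: list[str], idf: list[float]) -> list[float]:
    """Relative term frequency times IDF, in wordbook order."""
    counts = Counter(doc)
    total = len(doc) or 1
    return [counts[w] / total * weight for w, weight in zip(wordbook, idf)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity (assumed comparison metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Reference vector generation: hypothetical training files as raw bytes.
training = {
    "sample_a": b"\x00\x01CreateRemoteThread\x00WriteProcessMemory\x00\x02kernel32.dll\x00",
    "sample_b": b"\x00GetProcAddress\x00kernel32.dll\x00\x03LoadLibraryA\x00",
}
train_docs = {name: extract_strings(blob) for name, blob in training.items()}
wordbook = build_wordbook(list(train_docs.values()))
idf = idf_weights(list(train_docs.values()), wordbook)
reference = {name: tfidf_vector(doc, wordbook, idf) for name, doc in train_docs.items()}

# Test vector generation: reuse the wordbook and IDF built from the training set.
test_blob = b"\x07WriteProcessMemory\x00CreateRemoteThread\x00\x00kernel32.dll"
test_vec = tfidf_vector(extract_strings(test_blob), wordbook, idf)

# Detection: rank reference files by similarity to the test file.
scores = {name: cosine(test_vec, vec) for name, vec in reference.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

In this sketch the test file is scored against every reference vector, and the closest match points to the most likely related sample. A real detector would also need a similarity threshold to separate known variants from unrelated files, which the abstract does not specify.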