The present invention relates to a method for reducing the false positive rate when diagnosing personal information exposure in document files and atypical image files. The method comprises: an extraction step (S10) of extracting text from a document file or an atypical image file; a first diagnosis step (S12) of diagnosing whether personal information is exposed in the text; a checksum application confirmation step (S14) of checking, if personal information exposure is diagnosed in the first diagnosis step (S12), whether a checksum is applicable to the exposed personal information; a second diagnosis step (S16) of diagnosing, using the checksum, whether the personal information is exposed; a personal information exposure determination step (S18) of determining exposure according to the results of the diagnosis steps and the checksum application confirmation step; a morpheme analysis step (S20) of generating structural units of a sequence by morpheme analysis of the text if personal information exposure is determined; an indexing step (S22) of generating and indexing a sequence pattern from the structural units of the sequence; a support loading step (S24) of loading the support for the sequence pattern; a false positive classification step (S26) of classifying the detection as a true detection or a false detection; a weighting step (S28) of adding a positive or negative weight according to the classification result; a false positive probability calculation step (S30) of calculating a true-positive or false-positive probability of the sequence pattern; and a false positive probability calculation step (S32) of calculating a true-positive or false-positive probability of the file.
COPYRIGHT KIPO 2020
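Steps S12 to S18 can be illustrated with a minimal sketch. The abstract does not say which patterns or checksums are used, so everything named here is an assumption made for illustration: the regular expressions, the choice of the Luhn algorithm as the card-number checksum, the `DETECTORS` table, and the rule that a match with no applicable checksum is kept on the strength of the first diagnosis alone.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple


def luhn_valid(digits: str) -> bool:
    """Luhn checksum, assumed here as the checksum for card numbers."""
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# Hypothetical detector table: pattern for the first diagnosis (S12) and an
# optional checksum (None means no checksum is applicable, S14).
DETECTORS: Dict[str, Tuple[re.Pattern, Optional[Callable[[str], bool]]]] = {
    "card": (re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"), luhn_valid),
    "email": (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), None),
}


def diagnose(text: str) -> List[Tuple[str, str, bool]]:
    """Return (kind, matched text, exposed) for every first-diagnosis hit."""
    results = []
    for kind, (pattern, checksum) in DETECTORS.items():
        for match in pattern.finditer(text):          # S12: first diagnosis
            value = match.group()
            if checksum is None:                      # S14: not applicable
                exposed = True                        # assumed S18 rule
            else:
                digits = re.sub(r"\D", "", value)
                exposed = checksum(digits)            # S16: second diagnosis
            results.append((kind, value, exposed))    # S18: determination
    return results


if __name__ == "__main__":
    sample = "Card 4111 1111 1111 1111, order no. 1234-5678-9012-3456, a@b.org"
    for row in diagnose(sample):
        print(row)
```

In this sketch the order number matches the card pattern in the first diagnosis but fails the checksum, so it is not reported as exposed. The later steps (S20 to S32), which score sequence patterns around each hit, are not covered here.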