METHOD FOR REDUCING FALSE POSITIVE RATE FOR DIAGNOSIS OF PERSONAL INFORMATION EXPOSURE IN DOCUMENT FILES AND ATYPICAL IMAGE FILES

Abstract

The present invention relates to a method for reducing the false positive rate when diagnosing personal information exposure in document files and atypical image files. The method comprises: an extraction step (S10) of extracting text from a document file or an atypical image file; a first diagnosis step (S12) of diagnosing whether the text exposes personal information; a checksum application confirmation step (S14) of checking, if exposure is diagnosed in the first diagnosis step (S12), whether a checksum is applicable to the exposed personal information; a second diagnosis step (S16) of diagnosing, using the checksum, whether the personal information is exposed; a personal information exposure determination step (S18) of determining exposure according to the results of the diagnosis steps and the checksum application confirmation step; a morpheme analysis step (S20) of generating, if personal information exposure is determined, structural units of a sequence by morpheme analysis of the text; an indexing step (S22) of generating and indexing a sequence pattern from the structural units of the sequence; a support loading step (S24) of loading the support for the sequence pattern; a false positive classification step (S26) of classifying the result as a positive detection or a false detection; a weighting step (S28) of adding a positive or negative weight according to the classification result; a false positive probability calculation step (S30) of calculating a positive or false probability of the sequence pattern; and a false positive probability calculation step (S32) of calculating a positive or false probability of the file. COPYRIGHT KIPO 2020
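
Steps S12 through S18 can be illustrated with a minimal sketch. The abstract does not say which kinds of personal information or which checksum the method uses, so everything below is an assumption made for illustration: the regular expressions, the `DETECTORS` table, the `diagnose` function, and the choice of the Luhn checksum for payment card numbers are all hypothetical and not taken from the patent. The sketch shows only the general idea that a pattern match is a candidate (S12), that a checksum is applied when one exists for that kind of data (S14, S16), and that candidates failing the checksum are dropped as likely false positives (S18).

```python
import re

# Hypothetical patterns; the patent abstract does not specify any.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn checksum."""
    digits = [int(c) for c in candidate if c.isdigit()]
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


# kind -> (pattern, checksum function or None when no checksum applies)
DETECTORS = {
    "card_number": (CARD_RE, luhn_valid),
    "email": (EMAIL_RE, None),
}


def diagnose(text: str) -> list[tuple[str, str]]:
    """Return (kind, value) pairs judged to be exposed personal information."""
    findings = []
    for kind, (pattern, checksum) in DETECTORS.items():
        for match in pattern.finditer(text):      # S12: pattern-based diagnosis
            value = match.group()
            if checksum is None:                  # S14: no checksum applicable
                findings.append((kind, value))    # S18: keep the first diagnosis
            elif checksum(value):                 # S16: checksum-based diagnosis
                findings.append((kind, value))    # S18: confirmed by checksum
            # A candidate that fails its checksum is dropped as a false positive.
    return findings


if __name__ == "__main__":
    sample = ("Contact alice@example.com. Card 4111 1111 1111 1111, "
              "order no. 1234 5678 9012 3456.")
    for kind, value in diagnose(sample):
        print(kind, value)
```

In this sketch the order number matches the card pattern but fails the Luhn check, so only the e-mail address and the valid test card number are reported. The later steps (S20 through S32) are not sketched here because the abstract does not define how support, weights, or probabilities are computed.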
