METHOD AND APPARATUS FOR DIGITALLY SHREDDING SIMILAR DOCUMENTS WITHIN LARGE DOCUMENT SETS IN A DATA PROCESSING ENVIRONMENT

Abstract

A method and apparatus are disclosed for comparing an input or query file with a set of files to detect similarities between them, and for digitally shredding, from within the comparison feature itself, the files that match the query file to some degree. Using a comparison program, the query file is compared with each non-query file in a data processing system, ranging from a stand-alone computer to an enterprise computing network. A list of the non-query files having some degree of similarity with the query file is compiled and presented to the user through a user interface within the comparison program. Some or all of the listed non-query files can then be deleted by marking their names in the list. The comparison program may use clustering, coalescing, or both; known hashing techniques; or other comparison algorithms.
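The abstract does not commit to a particular comparison algorithm, so the following is only a minimal Python sketch of the compare, list, and shred flow under stated assumptions. It assumes word-shingle hashing with Jaccard similarity as the "known hashing technique," a hypothetical similarity threshold of 0.5, and shredding implemented as overwriting a file's bytes with random data before deleting it. The function names (`shingle_hashes`, `similarity`, `find_similar`, `shred`, `shred_marked`) are hypothetical and are not taken from the patent.

```python
import hashlib
import os
from pathlib import Path


def shingle_hashes(text: str, k: int = 3) -> set[int]:
    """Hash every run of k consecutive words (a "shingle") to a 64-bit integer.

    Assumption: word shingles + BLAKE2b stand in for the unspecified hashing technique.
    """
    words = text.lower().split()
    if len(words) < k:
        shingles = [" ".join(words)] if words else []
    else:
        shingles = [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]
    return {
        int.from_bytes(hashlib.blake2b(s.encode("utf-8"), digest_size=8).digest(), "big")
        for s in shingles
    }


def similarity(a: set[int], b: set[int]) -> float:
    """Jaccard similarity of two shingle-hash sets (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_similar(query: Path, candidates: list[Path],
                 threshold: float = 0.5) -> list[tuple[Path, float]]:
    """Compare the query file with each non-query file; return matches, most similar first.

    The 0.5 threshold is a hypothetical default, not a value from the patent.
    """
    q = shingle_hashes(query.read_text(encoding="utf-8", errors="ignore"))
    matches = []
    for path in candidates:
        if path.resolve() == query.resolve():
            continue  # never compare the query file with itself
        score = similarity(q, shingle_hashes(path.read_text(encoding="utf-8", errors="ignore")))
        if score >= threshold:
            matches.append((path, score))
    return sorted(matches, key=lambda m: m[1], reverse=True)


def shred(path: Path, passes: int = 1) -> None:
    """Overwrite the file in place with random bytes, force it to disk, then delete it."""
    size = path.stat().st_size
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                chunk = min(remaining, 1 << 20)  # write at most 1 MiB at a time
                f.write(os.urandom(chunk))
                remaining -= chunk
            f.flush()
            os.fsync(f.fileno())
    path.unlink()


def shred_marked(matches: list[tuple[Path, float]], marked: set[str]) -> list[Path]:
    """Shred only the listed matches whose file names the user has marked."""
    shredded = []
    for path, _score in matches:
        if path.name in marked:
            shred(path)
            shredded.append(path)
    return shredded
```

Keeping `find_similar` separate from `shred_marked` mirrors the two-step flow in the abstract: the list is compiled and shown first, and only the files the user marks are destroyed. An in-place overwrite does not guarantee that the data is unrecoverable on SSDs or on journaling and copy-on-write filesystems, so a real implementation would need storage-specific handling.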