Method and apparatus for detecting and summarizing document similarity within large document sets
Abstract
A method and apparatus are described for comparing an input or query file to a set of files to detect similarities, and for formatting the output comparison data. An input query file that can be segmented into multiple query file substrings is received. A query file substring is selected and used to search a storage area containing multiple ordered file substrings that were taken from previously analyzed files. If the selected query file substring matches any of the ordered file substrings, match data describing the match between the selected query file substring and the matching ordered file substring is stored in a temporary file. The matching ordered file substring and a second ordered file substring are joined if they are in a particular sequence and if the selected query file substring and a second query file substring are in the same particular sequence. If the matching ordered file substring and the second query file substring match, a coalesced matching ordered substring and a coalesced query file substring are formed that can be used to format the output comparison data.
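The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how files are segmented or how the storage area is organized, so the overlapping three-word windows, the sorted in-memory list, and the names `segment`, `build_index`, `find_matches`, `coalesce`, and `Match` are all assumptions made for illustration. A plain list also stands in for the temporary file that holds match data.

```python
from bisect import bisect_left
from collections import namedtuple

# One match record: which file, where in that file, where in the query,
# and how many consecutive substrings the match covers.
Match = namedtuple("Match", "file_id file_start query_start length")

def segment(text, k=3):
    """Split text into overlapping k-word substrings (assumed segmentation)."""
    words = text.split()
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]

def build_index(files, k=3):
    """Sorted list of (substring, file_id, position) from previously analyzed files."""
    entries = [(sub, file_id, pos)
               for file_id, text in files.items()
               for pos, sub in enumerate(segment(text, k))]
    entries.sort()
    return entries

def find_matches(index, query, k=3):
    """Binary-search the ordered substrings for each query substring."""
    matches = []  # stands in for the temporary file of match data
    for q_pos, sub in enumerate(segment(query, k)):
        i = bisect_left(index, (sub,))
        while i < len(index) and index[i][0] == sub:
            _, file_id, f_pos = index[i]
            matches.append(Match(file_id, f_pos, q_pos, 1))
            i += 1
    return matches

def coalesce(matches):
    """Join matches that are consecutive in both the stored file and the query."""
    ordered = sorted(
        matches,
        key=lambda m: (m.file_id, m.file_start - m.query_start, m.query_start),
    )
    merged = []
    for m in ordered:
        last = merged[-1] if merged else None
        if (last is not None
                and last.file_id == m.file_id
                and last.file_start + last.length == m.file_start
                and last.query_start + last.length == m.query_start):
            # Same sequence on both sides: extend the previous run.
            merged[-1] = last._replace(length=last.length + 1)
        else:
            merged.append(m)
    return merged

# Hypothetical example data.
files = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "a slow brown fox sleeps",
}
index = build_index(files)
runs = coalesce(find_matches(index, "quick brown fox jumps high"))
print(runs)
```

Here the query shares two consecutive three-word substrings with file `a`, so the two individual matches are joined into a single run of length 2. Sorting matches by file and by the offset between file position and query position puts joinable matches next to each other, which keeps the coalescing step to a single pass.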