
MATCHING ENGINE WITH SIGNATURE GENERATION AND RELEVANCE DETECTION

Abstract

A system and a method generate at least one signature associated with a document. In one embodiment, a document comprising text is received and parsed to generate a token set. The token set includes a plurality of tokens. Each token corresponds to text in the document that is separated by a predefined character characteristic. A score is calculated for each token in the token set based on the frequency and distribution of the text in the document. Each token is then ranked based on its calculated score. A subset of the ranked tokens is selected, and a signature is generated for each occurrence of the selected tokens. The selected list of signatures is then output.
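The steps in the abstract (parse, score, rank, select, sign) can be illustrated with a minimal Python sketch. The abstract does not give the scoring formula, the delimiter rule, or the signature function, so everything concrete here is an assumption made for illustration: tokens are split on non-word characters, the score is frequency multiplied by the fraction of the document a token spans, and each signature is a truncated SHA-1 hash of the token with its neighbours. The names `generate_signatures`, `top_k`, and `window` are hypothetical and do not come from the patent.

```python
import hashlib
import re
from collections import defaultdict


def generate_signatures(text, top_k=3, window=2):
    """Return (token, position, signature) triples for the top-ranked tokens.

    Illustrative sketch only; not the patented method.
    """
    # Parse: split on non-word characters (assumed delimiter rule).
    tokens = [t.lower() for t in re.split(r"\W+", text) if t]
    if not tokens:
        return []

    # Record every position at which each token occurs.
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        positions[tok].append(i)

    n = len(tokens)

    def score(tok):
        # Assumed formula: frequency times spread, where spread is the
        # fraction of the document between first and last occurrence.
        pos = positions[tok]
        spread = (pos[-1] - pos[0] + 1) / n
        return len(pos) * spread

    # Rank by descending score; ties are broken alphabetically.
    ranked = sorted(positions, key=lambda t: (-score(t), t))
    selected = ranked[:top_k]

    # Generate one signature per occurrence of each selected token by
    # hashing the token together with its neighbouring tokens.
    signatures = []
    for tok in selected:
        for i in positions[tok]:
            context = " ".join(tokens[max(0, i - window): i + window + 1])
            digest = hashlib.sha1(context.encode("utf-8")).hexdigest()[:16]
            signatures.append((tok, i, digest))
    return signatures
```

Calling `generate_signatures("the cat sat on the mat the end", top_k=1)` selects only the token `the` and returns one signature for each of its three occurrences. Hashing a window of neighbouring tokens is one way to make each occurrence's signature depend on its local context; a real system could just as well use character offsets or a different hash.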
