Set similarity selection queries at interactive speeds

Abstract

The similarity between a query set comprising query set tokens and a database set comprising database set tokens is determined by a similarity score. The database sets belong to a data collection set, which contains all database sets from which information may be retrieved. If the similarity score is greater than or equal to a user-defined threshold, the database set has information relevant to the query set. The similarity score is calculated with an inverse document frequency (IDF) similarity measure that is independent of term frequency. The document frequency is based at least in part on the number of database sets in the data collection set and the number of database sets which contain at least one query set token. The length of the query set and the length of the database set are normalized.
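
The selection described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so the details here are assumptions: the weight idf(t) = log(1 + N / df(t)), the set length defined as the square root of the sum of squared weights, and the function names `idf_weights`, `set_length`, `idf_similarity` and `select` are all hypothetical choices made for illustration, not the patented method itself.

```python
import math
from collections import Counter


def idf_weights(collection):
    """Map each token to an IDF weight computed over the data collection.

    Assumed weighting: idf(t) = log(1 + N / df(t)), where N is the number of
    database sets and df(t) is the number of sets containing token t.
    """
    n = len(collection)
    df = Counter(tok for s in collection for tok in set(s))
    return {tok: math.log(1 + n / count) for tok, count in df.items()}


def set_length(tokens, idf):
    """Normalized length of a set: sqrt of the sum of squared IDF weights.

    Tokens absent from the collection get weight 0 (an assumption).
    """
    return math.sqrt(sum(idf.get(t, 0.0) ** 2 for t in set(tokens)))


def idf_similarity(query, db_set, idf):
    """IDF similarity that ignores term frequency (tokens are treated as sets)."""
    q, s = set(query), set(db_set)
    denom = set_length(q, idf) * set_length(s, idf)
    if denom == 0:
        return 0.0
    shared = sum(idf.get(t, 0.0) ** 2 for t in q & s)
    return shared / denom


def select(query, collection, threshold):
    """Return the database sets whose score meets the user-defined threshold."""
    idf = idf_weights(collection)
    return [s for s in collection if idf_similarity(query, s, idf) >= threshold]


collection = [{"main", "st"}, {"main", "ave"}, {"oak", "st"}]
matches = select({"main", "st"}, collection, threshold=0.9)
```

Because both lengths are normalized, a database set identical to the query scores 1 (up to floating-point rounding) and a set sharing no tokens scores 0, which makes a single threshold meaningful across sets of different sizes. This sketch scans every database set; it does not attempt the indexing needed for the interactive speeds named in the title.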
