
System and method for identifying query-relevant keywords in documents with latent semantic analysis

Abstract

A system and method for identifying query-relevant keywords in documents returned by a search, using latent semantic analysis. The documents are represented as a document-term matrix M containing one or more document term-weight vectors d, which may be term-frequency (tf) vectors or term-frequency inverse-document-frequency (tf-idf) vectors. This matrix is subjected to a truncated singular value decomposition. The resulting transform matrix U can be used to project a query term-weight vector q into the reduced N-dimensional space, after which the inverse of U expands it back into the full vector space, yielding q_expanded.

To perform a search, the similarity of q_expanded to each candidate document vector is measured in this space. Exemplary similarity functions are the dot product and cosine similarity. The terms with the highest values in q_expanded that also occur in at least one document are selected as keywords. Matching keywords from the query may be highlighted in the search results.
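The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation. It assumes M is stored with terms as rows and documents as columns, so that the left singular vectors play the role of U, and it uses the transpose of the truncated U as its (pseudo-)inverse, which is valid because its columns are orthonormal. The function names `expand_query`, `cosine_scores`, and `top_keywords`, and the parameters `n_dims`, `doc_indices`, and `k`, are hypothetical.

```python
import numpy as np


def expand_query(M, q, n_dims):
    """Project q into the reduced space and expand it back to full term space.

    Assumption: M has shape (terms, documents); q has shape (terms,).
    """
    U, _s, _Vt = np.linalg.svd(M, full_matrices=False)
    U_n = U[:, :n_dims]        # truncated transform, shape (terms, n_dims)
    q_reduced = U_n.T @ q      # projection into the N-dimensional space
    # U_n has orthonormal columns, so U_n.T acts as its pseudo-inverse.
    return U_n @ q_reduced     # q_expanded, shape (terms,)


def cosine_scores(M, q_expanded):
    """Cosine similarity between q_expanded and every document column of M."""
    denom = np.linalg.norm(M, axis=0) * np.linalg.norm(q_expanded)
    denom[denom == 0] = 1.0    # avoid division by zero for empty vectors
    return (q_expanded @ M) / denom


def top_keywords(M, q_expanded, vocab, doc_indices, k):
    """Return up to k terms with the highest q_expanded weight that occur
    in at least one of the documents listed in doc_indices."""
    present = (M[:, doc_indices] > 0).any(axis=1)
    order = np.argsort(-q_expanded)
    return [vocab[i] for i in order if present[i]][:k]
```

In a toy corpus where "car" and "auto" never co-occur but both co-occur with "engine", a query containing only "car" would receive a nonzero expanded weight for "auto", so a document that mentions "auto" but not "car" can still be matched and have "auto" highlighted.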
