METHOD, SYSTEM AND COMPUTER PROGRAM FOR GENERATING A QUERY REPRESENTATION OF A DOCUMENT, AND QUERYING A DOCUMENT RETRIEVAL SYSTEM USING SAID QUERY REPRESENTATION

Abstract

In a method and system for generating a query representation of an electronic query document, the query document is processed by a computer processor. The processor is configured to identify the words and sentences in the query document, generate for each word its corresponding part-of-speech (POS) category, identify every sequence of words whose POS categories match a predetermined sequence of POS categories, and store the identified word sequences as the query representation of the query document.

In a method and system for querying a document retrieval system, the document retrieval system is queried with a plurality of the stored word sequences, and target documents are retrieved from it. Each target document has at least one word sequence in common with the query document.

In a method and system for clustering similar documents in a set of electronic documents, one document of the set is designated as the query document and processed as described above, so that its identified word sequences are stored as its query representation. Each remaining document in the set is then queried with a plurality of the stored word sequences, a similarity value is determined for each such query, and the documents in the set are clustered based on the similarity values.
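
The abstract does not specify the POS tagger, the predetermined POS pattern, the retrieval backend, the similarity measure, or the clustering algorithm. The sketch below fills each gap with a labeled assumption so that the three steps can be run end to end: a tiny hand-written lexicon stands in for a real tagger, ADJ-NOUN and NOUN-NOUN are used as the predetermined POS sequences, plain phrase matching stands in for a document retrieval system, the similarity value is taken as the fraction of query sequences found in a document, and clustering is a simple greedy threshold scheme. All names (`LEXICON`, `PATTERNS`, `query_representation`, `retrieve`, `similarity`, `cluster`) are hypothetical and are not taken from the patent.

```python
import re

# Hypothetical toy lexicon standing in for a real POS tagger (assumption).
# Any word not listed is tagged "X" (other).
LEXICON = {
    "a": "DET", "an": "DET", "the": "DET",
    "electronic": "ADJ", "similar": "ADJ", "fast": "ADJ",
    "query": "NOUN", "document": "NOUN", "documents": "NOUN",
    "retrieval": "NOUN", "system": "NOUN",
    "is": "VERB", "are": "VERB", "stores": "VERB",
}

# Hypothetical "predetermined sequences of POS categories" (assumption).
PATTERNS = [("ADJ", "NOUN"), ("NOUN", "NOUN")]


def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?' (assumption)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def words_of(sentence):
    """Lower-cased alphabetic tokens of one sentence."""
    return re.findall(r"[a-z]+", sentence.lower())


def query_representation(text):
    """Collect every word sequence whose POS tags match one of PATTERNS."""
    sequences = set()
    for sentence in split_sentences(text):
        words = words_of(sentence)
        tags = [LEXICON.get(w, "X") for w in words]
        for pattern in PATTERNS:
            n = len(pattern)
            for i in range(len(words) - n + 1):
                if tuple(tags[i:i + n]) == pattern:
                    sequences.add(" ".join(words[i:i + n]))
    return sequences


def normalize(text):
    """Word stream with ' | ' between sentences, so that a sequence
    cannot match across a sentence boundary."""
    sentences = [" ".join(words_of(s)) for s in split_sentences(text)]
    return " " + " | ".join(sentences) + " "


def retrieve(query_seqs, corpus):
    """Stand-in for the retrieval system: return the documents that share
    at least one sequence with the query, with the shared sequences."""
    hits = {}
    for doc_id, text in corpus.items():
        body = normalize(text)
        shared = {s for s in query_seqs if f" {s} " in body}
        if shared:
            hits[doc_id] = shared
    return hits


def similarity(query_seqs, text):
    """Assumed similarity value: fraction of query sequences found in text."""
    if not query_seqs:
        return 0.0
    body = normalize(text)
    return sum(f" {s} " in body for s in query_seqs) / len(query_seqs)


def cluster(corpus, threshold=0.5):
    """Greedy threshold clustering (assumption): each still-unassigned
    document in turn is designated the query document."""
    remaining = list(corpus)
    clusters = []
    while remaining:
        query_id = remaining.pop(0)
        seqs = query_representation(corpus[query_id])
        members = [query_id]
        for doc_id in list(remaining):
            if similarity(seqs, corpus[doc_id]) >= threshold:
                members.append(doc_id)
                remaining.remove(doc_id)
        clusters.append(members)
    return clusters


CORPUS = {
    "d1": "The electronic query document is parsed. A retrieval system stores documents.",
    "d2": "Our retrieval system is fast.",
    "d3": "The weather is nice today.",
}

if __name__ == "__main__":
    seqs = query_representation(CORPUS["d1"])
    print(sorted(seqs))  # ['electronic query', 'query document', 'retrieval system']
    others = {k: v for k, v in CORPUS.items() if k != "d1"}
    print(retrieve(seqs, others))  # {'d2': {'retrieval system'}}
    print(cluster(CORPUS, threshold=0.3))  # [['d1', 'd2'], ['d3']]
```

Note that this similarity value is asymmetric (it is normalized by the size of the query document's representation), and the clustering result depends on document order. Replacing the toy lexicon with a trained tagger, or the substring match with a search engine's phrase queries, would leave the overall flow unchanged.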