Method and apparatus for establishing topic word classes based on an entropy cost function to retrieve documents represented by the topic words

Abstract

A computer-based method and system for establishing topic words to represent a document, the topic words being suitable for use in document retrieval. The method includes determining document keywords from the document; classifying each of the document keywords into one of a plurality of preestablished keyword classes; and selecting words as the topic words, each selected word from a different one of the preestablished keyword classes, to minimize a cost function on proposed topic words. The cost function may be a metric of dissimilarity, such as cross-entropy, between a first distribution of likelihood of appearance by the plurality of document keywords in a typical document and a second distribution of likelihood of appearance by the plurality of document keywords in a typical document, the second distribution being approximated using proposed topic words. The cost function can be a basis for sorting the priority of the documents.
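The selection step described in the abstract can be illustrated with a small sketch. This is only a toy reading of the abstract, not the patented procedure: the keyword classes, the reference distribution `p`, the conditional table `cond`, and the averaging rule in `approximate` are all hypothetical choices made for illustration, and the exhaustive search stands in for whatever optimization the patent actually uses.

```python
import math
from itertools import product


def cross_entropy(p, q):
    """H(p, q) = -sum_w p[w] * log q[w]; assumes q[w] > 0 wherever p[w] > 0."""
    return -sum(pw * math.log(q[w]) for w, pw in p.items() if pw > 0)


def approximate(topic_words, keywords, cond):
    """Hypothetical approximation of the keyword distribution from proposed
    topic words: average P(w | t) over the topic words, then renormalize."""
    raw = {w: sum(cond[t][w] for t in topic_words) / len(topic_words)
           for w in keywords}
    total = sum(raw.values())
    return {w: v / total for w, v in raw.items()}


def select_topic_words(keywords, word_class, p, cond):
    """Pick one keyword per class so that the cross-entropy between p and
    the approximated distribution is minimal (brute-force search)."""
    by_class = {}
    for w in keywords:
        by_class.setdefault(word_class[w], []).append(w)

    def cost(candidate):
        return cross_entropy(p, approximate(candidate, keywords, cond))

    best = min(product(*by_class.values()), key=cost)
    return best, cost(best)


# Toy data (all values invented for illustration).
keywords = ["stock", "bond", "goal", "match"]
word_class = {"stock": "finance", "bond": "finance",
              "goal": "sports", "match": "sports"}
# Assumed likelihood of each keyword appearing in a typical document.
p = {"stock": 0.4, "bond": 0.1, "goal": 0.4, "match": 0.1}
# Assumed conditional table: cond[t][w] plays the role of P(w | t).
cond = {
    "stock": {"stock": 0.70, "bond": 0.20, "goal": 0.05, "match": 0.05},
    "bond":  {"stock": 0.20, "bond": 0.70, "goal": 0.05, "match": 0.05},
    "goal":  {"stock": 0.05, "bond": 0.05, "goal": 0.70, "match": 0.20},
    "match": {"stock": 0.05, "bond": 0.05, "goal": 0.20, "match": 0.70},
}

best, cost = select_topic_words(keywords, word_class, p, cond)
print(best, round(cost, 4))
```

With this toy data the search picks one word from each class, the one whose conditional profile best matches `p`. The returned cost could also serve as a ranking key when ordering several documents, in line with the abstract's last sentence, though how the patent does this is not specified here.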

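Bibliographic details

  • Publication number: US6128613A
  • Publication date: 2000-10-03
  • Original format: PDF
  • Applicant/assignee: THE CHINESE UNIVERSITY OF HONG KONG
  • Application number: US19980069618
  • Inventors: AN QIN; WING S. WONG
  • Filing date: 1998-04-29
  • Classification: G06F17/30
  • Country: US
  • Record added: 2022-08-22 01:35:59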
