Method and apparatus for characterizing documents based on clusters of related words

Abstract

One embodiment of the present invention provides a system that characterizes a document with respect to clusters of conceptually related words. Upon receiving a document containing a set of words, the system selects "candidate clusters" of conceptually related words that are related to the set of words. These candidate clusters are selected using a model that explains how sets of words are generated from clusters of conceptually related words. Next, the system constructs a set of components to characterize the document, wherein the set of components includes components for the candidate clusters. Each component in the set indicates the degree to which its corresponding candidate cluster is related to the set of words.
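
The two steps in the abstract (selecting candidate clusters, then building one component per candidate) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy `CLUSTERS` table, its weights, the function names `select_candidates` and `characterize`, and the simple "sum of weights, then normalize" scoring. The abstract does not specify the generative model or how the degree of relatedness is computed, so this is not the patented method, only one simple way to read the description.

```python
from typing import Dict, List

# Hypothetical toy model: each cluster maps words to an illustrative weight
# describing how strongly the cluster "generates" that word.
# These clusters and numbers are invented for the example.
CLUSTERS: Dict[str, Dict[str, float]] = {
    "cooking": {"recipe": 0.5, "oven": 0.3, "flour": 0.2},
    "travel": {"flight": 0.6, "hotel": 0.4},
    "baking": {"flour": 0.5, "oven": 0.5},
}


def select_candidates(words: List[str], clusters: Dict[str, Dict[str, float]]) -> List[str]:
    """Keep clusters that could have generated at least one word of the document."""
    word_set = set(words)
    return [name for name, gen in clusters.items() if any(w in gen for w in word_set)]


def characterize(words: List[str], clusters: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Return one component per candidate cluster, normalized to sum to 1.

    Assumed scoring: sum the cluster's weights over the distinct document words.
    """
    word_set = set(words)
    raw = {
        name: sum(clusters[name].get(w, 0.0) for w in word_set)
        for name in select_candidates(words, clusters)
    }
    total = sum(raw.values())
    if total == 0:
        return {}
    return {name: score / total for name, score in raw.items()}


if __name__ == "__main__":
    document = ["flour", "oven", "recipe"]
    print(characterize(document, CLUSTERS))
```

In this sketch, a document with no words in any cluster yields an empty set of components, and clusters that share no words with the document (here, "travel") never become candidates.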

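Bibliographic data

  • Publication number: US7383258B2

  • Publication date: 2008-06-03

  • Original format: PDF

  • Applicant/patentee: GEORGES HARIK; NOAM M. SHAZEER

  • Application number: US20030676571

  • Inventors: NOAM M. SHAZEER; GEORGES HARIK

  • Filing date: 2003-09-30

  • Classification: G06F13/30

  • Country: US

  • Date added to database: 2022-08-21 20:10:04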
