
Automated taxonomy generation


Abstract

In a hierarchical taxonomy of documents, the categories of information may be structured as a binary tree whose nodes contain information relevant to a search. The binary tree may be "trained", or formed, by examining a training set of documents and separating those documents into two child nodes. Each of the resulting sets of documents may then be split again into two nodes, and so on, to create the binary tree data structure. The nodes may be generated so as to maximize the likelihood that every training document is in either or both of the two child nodes. In one example, each node of the binary tree is associated with a list of terms, and each term in the list is associated with the probability of that term appearing in a document, given that node. New documents may be categorized by the nodes of the tree. For example, a new document may be assigned to a particular node based on the statistical similarity between that document and the node.
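The abstract does not say how a node's documents are split or how "statistical similarity" is measured, so the following is only a minimal sketch under stated assumptions, not the patented method. It assumes each document is reduced to its set of distinct terms, models each node as independent per-term (Bernoulli) probabilities with Laplace smoothing, splits a node with a hard-assignment EM loop, and scores a new document by its log-likelihood under each child. Hard assignment places every document in exactly one child, which is a simplification of the abstract's "either or both". All names (`Node`, `estimate`, `split`, `build`, `classify`) and the toy documents are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

Doc = Set[str]  # assumption: a document is the set of distinct terms it contains


@dataclass
class Node:
    term_prob: Dict[str, float]              # P(term appears in a document | node)
    docs: List[Doc] = field(default_factory=list)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def estimate(docs: List[Doc], vocab: Set[str]) -> Dict[str, float]:
    """Laplace-smoothed P(term in document | node) for every vocabulary term."""
    n = len(docs)
    return {t: (sum(t in d for d in docs) + 1) / (n + 2) for t in vocab}


def log_likelihood(doc: Doc, term_prob: Dict[str, float]) -> float:
    """Log-probability of a document under a node's per-term Bernoulli model."""
    return sum(math.log(p if t in doc else 1.0 - p) for t, p in term_prob.items())


def split(docs: List[Doc], vocab: Set[str],
          iters: int = 20) -> Optional[Tuple[List[Doc], List[Doc]]]:
    """Hard-EM split of docs into two groups; None if one group ends up empty."""
    # Assumed deterministic seeding: the first document, and the document
    # that shares the fewest terms with it.
    seed_a = docs[0]
    seed_b = min(docs[1:], key=lambda d: len(d & seed_a))
    models = (estimate([seed_a], vocab), estimate([seed_b], vocab))
    groups = None
    for _ in range(iters):
        new: Tuple[List[Doc], List[Doc]] = ([], [])
        for d in docs:  # assign each document to the more likely child
            a, b = log_likelihood(d, models[0]), log_likelihood(d, models[1])
            new[0 if a >= b else 1].append(d)
        if not new[0] or not new[1]:
            return None
        if new == groups:  # assignments stopped changing
            break
        groups = new
        models = (estimate(groups[0], vocab), estimate(groups[1], vocab))
    return groups


def build(docs: List[Doc], vocab: Set[str], depth: int = 0, max_depth: int = 3) -> Node:
    """Recursively split the training documents into a binary tree."""
    node = Node(estimate(docs, vocab), docs)
    if len(docs) >= 2 and depth < max_depth:
        parts = split(docs, vocab)
        if parts is not None:
            node.left = build(parts[0], vocab, depth + 1, max_depth)
            node.right = build(parts[1], vocab, depth + 1, max_depth)
    return node


def classify(doc: Doc, root: Node) -> Node:
    """Walk down the tree, following the child under which doc is more likely."""
    node = root
    while node.left is not None and node.right is not None:
        node = max((node.left, node.right),
                   key=lambda child: log_likelihood(doc, child.term_prob))
    return node


# Toy training set (made up for illustration): three "programming" documents
# and three "football" documents.
docs = [
    {"python", "code", "compiler"},
    {"python", "code", "debugger"},
    {"code", "compiler", "debugger"},
    {"goal", "match", "league"},
    {"goal", "league", "striker"},
    {"match", "striker", "league"},
]
vocab = set().union(*docs)
root = build(docs, vocab, max_depth=1)
leaf = classify({"python", "debugger"}, root)
```

With this toy data the single split separates the two topics, and `classify` returns the child whose term probabilities best explain the new document. A soft-assignment EM, where each document carries a weight for both children, would be closer to the "either or both" wording, at the cost of a slightly longer sketch.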

Bibliographic data

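  • Publication number: EP1612701A3

  • Publication date: 2008-05-21

    Original format: PDF

  • Applicant/patentee: MICROSOFT CORPORATION

    Application number: EP20050105453

  • Inventor: WEARE CHRISTOPHER B.

    Filing date: 2005-06-21

  • Classification: G06F17/30

  • Country: EP

  • Database entry time: 2022-08-21 19:58:32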
