EFFICIENTLY BUILDING COMPACT MODELS FOR LARGE TAXONOMY TEXT CLASSIFICATION

Abstract

A taxonomy model is determined with a reduced number of weights. For example, the taxonomy model is a tangible representation of a hierarchy of nodes that represents a hierarchy of classes and that, when labeled with a representation of a combination of weights, is usable to classify documents having known features but unknown class. For each node of the taxonomy, the training example documents are processed to determine the features for which there are a sufficient number of training example documents having a class label corresponding to at least one of the leaf nodes of the subtree having that node as its root node. For each node of the taxonomy, a sparse weight vector is then determined, which includes setting to zero, for that node, the weights of those features determined not to appear at least a minimum number of times in a given set of leaf nodes in the subtree having that node as its root node. The sparse weight vectors can be learned by solving an optimization problem using a maximum entropy classifier, or a large margin classifier with a sequential dual method (SDM) with margin or slack rescaling. The determined sparse weight vectors are tangibly embodied in a computer-readable medium in association with the tangible representation of the nodes of the taxonomy.
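
The per-node feature selection described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the toy taxonomy, the training examples, the function names, and the `min_count` threshold are all hypothetical, and "appears at least a minimum number of times" is assumed here to mean "occurs in at least `min_count` training documents labeled with a leaf of the node's subtree". Weights are only initialized to zero; learning them (for example with a maximum entropy classifier) is left out.

```python
from collections import Counter

# Hypothetical toy taxonomy: each node maps to its children; leaves map to [].
TAXONOMY = {
    "root": ["sports", "science"],
    "sports": ["tennis", "golf"],
    "science": ["physics"],
    "tennis": [], "golf": [], "physics": [],
}

# Hypothetical training examples: (features present in the document, leaf label).
TRAIN = [
    ({"ball", "racket"}, "tennis"),
    ({"ball", "racket", "net"}, "tennis"),
    ({"ball", "club"}, "golf"),
    ({"quark", "energy"}, "physics"),
    ({"energy", "ball"}, "physics"),
]


def leaves_under(node, taxonomy):
    """Return the set of leaf nodes in the subtree rooted at `node`."""
    children = taxonomy[node]
    if not children:
        return {node}
    leaves = set()
    for child in children:
        leaves |= leaves_under(child, taxonomy)
    return leaves


def active_features(node, taxonomy, train, min_count):
    """Features seen in at least `min_count` documents labeled under `node`."""
    leaves = leaves_under(node, taxonomy)
    counts = Counter()
    for features, label in train:
        if label in leaves:
            counts.update(features)  # one count per document containing the feature
    return {f for f, c in counts.items() if c >= min_count}


def sparse_weight_vectors(taxonomy, train, min_count=2):
    """Per-node sparse vectors: only active features get a stored weight.

    Any feature missing from a node's dict has an implicit weight of zero.
    """
    return {
        node: {f: 0.0 for f in sorted(active_features(node, taxonomy, train, min_count))}
        for node in taxonomy
    }


vectors = sparse_weight_vectors(TAXONOMY, TRAIN, min_count=2)
for node, weights in vectors.items():
    print(node, sorted(weights))
```

Storing each node's vector as a dictionary keyed only by the surviving features is one way to keep the model compact: rare features cost no memory at nodes where they fall below the threshold.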

Bibliographic data

Publication number: US2010161527A1

Publication date: 2010-06-24

Original format: PDF
Applicant/patentee: SUNDARARAJAN SELLAMANICKAM; SATHIYA KEERTHI SELVARAJ

Application number: US20080342750
Inventors: SUNDARARAJAN SELLAMANICKAM; SATHIYA KEERTHI SELVARAJ

Filing date: 2008-12-23
Classification: G06F15/18
Country: US
Database entry time: 2022-08-21 18:54:16
