
Large-Scale Many-Class Learning

Abstract

A number of tasks, such as large-scale text categorization and word prediction, can benefit from efficient learning and classification when the number of classes (categories), in addition to the number of instances and features, is large, that is, in the thousands and beyond. We investigate learning of sparse category indices to address this challenge. An index is a weighted bipartite graph mapping features to categories. On presentation of an instance, the index retrieves and scores a small set of candidate categories. The candidates can then be ranked, and the ranking or the scores can be used for category assignment. We present novel online index learning algorithms. When compared to other approaches, including one-versus-rest and top-down learning and classification using support vector machines, we find that indexing is highly advantageous in terms of space and time efficiency, at both training and classification times, while yielding similar and often better accuracies. On problems with hundreds of thousands of instances and thousands of categories, the index is learned in minutes, while other methods can take orders of magnitude longer. As we explain, the design of the algorithm makes it convenient to maintain a constraint on the number of prediction connections a feature is allowed to make. This constraint is crucial in yielding efficient learning and classification.
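To make the index idea concrete, the following is a minimal Python sketch of a feature-to-category index with a cap on the number of outgoing connections per feature. It is an illustration only: the class name `SparseIndex`, the parameters `max_out_degree` and `learning_rate`, and the simple mistake-driven update are all assumptions made for this example and are not the online algorithms the abstract refers to.

```python
from collections import defaultdict


class SparseIndex:
    """Toy weighted bipartite graph from features to categories.

    Hypothetical sketch: the update rule below is a simple
    mistake-driven heuristic, not the authors' algorithm.
    """

    def __init__(self, max_out_degree=3, learning_rate=0.5):
        self.max_out_degree = max_out_degree  # cap on edges per feature
        self.learning_rate = learning_rate
        self.edges = defaultdict(dict)  # feature -> {category: weight}

    def score(self, features):
        """Retrieve and score candidate categories, best first."""
        scores = defaultdict(float)
        for feature, value in features.items():
            # Only categories connected to an active feature are touched.
            for category, weight in self.edges.get(feature, {}).items():
                scores[category] += value * weight
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

    def update(self, features, true_category):
        """Strengthen edges to the true category after a ranking mistake."""
        ranked = self.score(features)
        if ranked and ranked[0][0] == true_category:
            return  # already ranked first; leave the index unchanged
        for feature, value in features.items():
            out = self.edges[feature]
            out[true_category] = (
                out.get(true_category, 0.0) + self.learning_rate * value
            )
            # Enforce the out-degree constraint by dropping the weakest edge.
            if len(out) > self.max_out_degree:
                del out[min(out, key=out.get)]


if __name__ == "__main__":
    index = SparseIndex(max_out_degree=2)
    index.update({"goal": 1.0, "match": 1.0}, "sports")
    index.update({"vote": 1.0, "election": 1.0}, "politics")
    print(index.score({"goal": 1.0, "vote": 0.5}))
```

Because scoring only follows the edges of features present in the instance, the work per instance in this sketch depends on the number of active features and the out-degree cap rather than on the total number of categories, which is the efficiency argument the abstract makes.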
