Statistical topic models for multi-label document classification

Abstract

Machine learning approaches to multi-label document classification have to date largely relied on discriminative modeling techniques such as support vector machines. A drawback of these approaches is that performance rapidly drops off as the total number of labels and the number of labels per document increase. This problem is amplified when the label frequencies exhibit the type of highly skewed distributions that are often observed in real-world datasets. In this paper we investigate a class of generative statistical topic models for multi-label documents that associate individual word tokens with different labels. We investigate the advantages of this approach relative to discriminative models, particularly with respect to classification problems involving large numbers of relatively rare labels. We compare the performance of generative and discriminative approaches on document labeling tasks ranging from datasets with several thousand labels to datasets with tens of labels. The experimental results indicate that probabilistic generative models can achieve competitive multi-label classification performance compared to discriminative methods, and have advantages for datasets with many labels and skewed label frequencies.