Conference: Annual International Conference on Computational Linguistics and Intelligent Text Processing

Topics and Label Propagation: Best of Both Worlds for Weakly Supervised Text Classification

Abstract

We propose a Label Propagation based algorithm for weakly supervised text classification. We construct a graph in which each document is a node and edge weights represent similarities between documents. We then discover underlying topics using Latent Dirichlet Allocation (LDA) and enrich the document graph by adding the topics as additional nodes. The weight of an edge between a topic and a document represents the level of "affinity" between them. Our approach does not require document-level labelling; it expects manual labels only for the topic nodes. This significantly reduces the supervision needed, since labelling only a few topics is observed to be enough to achieve sufficiently high accuracy. The Label Propagation Algorithm is then run on this enriched graph to propagate labels among the nodes. Our approach combines the advantages of Label Propagation (through document-document similarities) and Topic Modelling (for minimal but smart supervision). We demonstrate the effectiveness of our approach on various datasets and compare it with state-of-the-art weakly supervised text classification approaches.
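
The abstract does not give the exact graph construction or update rule, so the following is only a minimal sketch of the general idea under stated assumptions: documents and topics share one weighted graph, only the topic nodes carry labels, and labels are spread by repeated averaging over neighbours with the topic labels clamped. The function name `propagate_labels`, the row normalisation, the fixed iteration count, and the toy similarity and affinity values are all illustrative choices, not taken from the paper. In practice the document-topic affinities might come from LDA topic proportions; here they are written by hand.

```python
import numpy as np

def propagate_labels(doc_sim, doc_topic, topic_labels, n_classes, n_iter=100):
    """Spread labels from labelled topic nodes to unlabelled document nodes.

    doc_sim:      (n_docs, n_docs) symmetric document-document similarities
    doc_topic:    (n_docs, n_topics) document-topic affinities (assumed given)
    topic_labels: one class index per topic node (the only supervision)
    """
    doc_sim = np.asarray(doc_sim, dtype=float)
    doc_topic = np.asarray(doc_topic, dtype=float)
    n_docs, n_topics = doc_topic.shape
    n = n_docs + n_topics

    # Joint weight matrix: document nodes first, topic nodes after them.
    W = np.zeros((n, n))
    W[:n_docs, :n_docs] = doc_sim
    W[:n_docs, n_docs:] = doc_topic
    W[n_docs:, :n_docs] = doc_topic.T
    np.fill_diagonal(W, 0.0)  # ignore self-similarity

    # Row-normalise so each node averages over its neighbours (an assumption).
    row_sums = W.sum(axis=1, keepdims=True)
    T = W / np.where(row_sums == 0, 1.0, row_sums)

    # One-hot labels for the topic nodes; documents start with no label mass.
    Y_topics = np.eye(n_classes)[np.asarray(topic_labels)]
    F = np.zeros((n, n_classes))
    F[n_docs:] = Y_topics

    for _ in range(n_iter):
        F = T @ F
        F[n_docs:] = Y_topics  # clamp the manually labelled topic nodes

    doc_scores = F[:n_docs]
    return doc_scores.argmax(axis=1), doc_scores

# Toy data (hypothetical): documents 0 and 1 are similar, as are 2 and 3.
doc_sim = [[1.0, 0.9, 0.1, 0.0],
           [0.9, 1.0, 0.0, 0.1],
           [0.1, 0.0, 1.0, 0.8],
           [0.0, 0.1, 0.8, 1.0]]
# Documents 1 and 3 are ambiguous with respect to the two topics.
doc_topic = [[0.9, 0.1],
             [0.5, 0.5],
             [0.1, 0.9],
             [0.5, 0.5]]
topic_labels = [0, 1]  # topic 0 -> class 0, topic 1 -> class 1

pred, scores = propagate_labels(doc_sim, doc_topic, topic_labels, n_classes=2)
print(pred)
```

In this toy graph, documents 1 and 3 have equal affinity to both topics, so their predicted class is decided by their similarity to documents 0 and 2 respectively. This is the combination the abstract describes: supervision enters only through the topic nodes, and document-document edges carry it further.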