Journal: Procedia Computer Science

Unsupervised Concept Hierarchy Learning: A Topic Modeling Guided Approach

Abstract

This paper proposes an efficient and scalable method for concept extraction and concept hierarchy learning from a large unstructured text corpus, guided by a topic modeling process. The method derives “concepts” from statistically discovered “topics” and then learns a hierarchy of those concepts by exploiting the subsumption relation between them. An advantage of the proposed method is that the entire process falls under the unsupervised learning paradigm, so no domain-specific training corpus is needed. Given a massive collection of text documents, the method maps topics to concepts using lightweight statistical and linguistic processes and then probabilistically learns the subsumption hierarchy. Extensive experiments with large text corpora such as the BBC News dataset and the Reuters News corpus show that the proposed method outperforms some existing methods for concept extraction, and that efficient concept hierarchy learning is possible when the overall task is guided by a topic modeling process.
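To make the idea of probabilistically learning a subsumption hierarchy concrete, here is a minimal sketch. It is not the paper's algorithm: the abstract does not give the exact test, so the sketch assumes a simple document co-occurrence rule in which a concept `parent` subsumes `child` when P(parent | child) is at least a threshold and exceeds P(child | parent). The function names, the 0.8 threshold, and the toy corpus are all hypothetical, and the concepts are taken as given rather than extracted from topics.

```python
from itertools import permutations


def doc_occurrences(docs, concepts):
    """Map each concept to the set of document indices that mention it."""
    return {c: {i for i, d in enumerate(docs) if c in d} for c in concepts}


def subsumption_pairs(docs, concepts, threshold=0.8):
    """Return (parent, child) pairs where parent is judged to subsume child.

    Assumed rule (illustrative, not taken from the paper):
    P(parent | child) >= threshold and P(child | parent) < P(parent | child).
    """
    occ = doc_occurrences(docs, concepts)
    pairs = []
    for parent, child in permutations(concepts, 2):
        if not occ[parent] or not occ[child]:
            continue  # skip concepts that never occur
        both = len(occ[parent] & occ[child])
        p_parent_given_child = both / len(occ[child])
        p_child_given_parent = both / len(occ[parent])
        if (p_parent_given_child >= threshold
                and p_child_given_parent < p_parent_given_child):
            pairs.append((parent, child))
    return pairs


# Toy corpus (hypothetical): each document is the set of concepts it mentions.
docs = [
    {"sport", "football"},
    {"sport", "football"},
    {"sport", "tennis"},
    {"sport"},
    {"economy", "inflation"},
    {"economy"},
]
concepts = ["sport", "football", "tennis", "economy", "inflation"]
hierarchy = subsumption_pairs(docs, concepts)
```

In this toy corpus the broader concepts ("sport", "economy") appear in every document that mentions their narrower concepts, but not the other way round, so they come out as parents. A real pipeline would first obtain the concepts from a topic model and would likely prune transitive edges to form a tree.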
