Topic Models and Fusion Methods: a Union to Improve Text Clustering and Cluster Labeling

Abstract

Topic modeling algorithms are statistical methods that aim to discover the topics running through a collection of text documents. Topic models are popular in machine learning and text mining because they can infer the latent topic structure of a corpus. In this paper, we present a document enrichment approach that uses state-of-the-art topic models and data fusion methods to enrich the documents of a collection, with the aim of improving the quality of text clustering and cluster labeling. We propose a bi-vector space model in which every document of the corpus is represented by two vectors: one generated by the fusion-based topic modeling approach, and one that is simply the traditional vector model. Our experiments on various datasets show that using a combination of topic modeling and fusion methods to create the documents' vectors can significantly improve the quality of document clustering results.
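
The bi-vector idea can be illustrated with a minimal Python sketch. The abstract names neither the topic models nor the fusion method, so everything here is an assumption chosen for illustration: the per-document term scores from two topic models are taken as given inputs, CombSUM with min-max normalization stands in for the fusion step, and the names `comb_sum`, `BiVector`, `similarity`, and the weight `alpha` are hypothetical. It should not be read as the authors' implementation.

```python
from dataclasses import dataclass
from math import sqrt


def min_max_normalize(scores):
    """Rescale one model's term scores to [0, 1] so models are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {term: 1.0 for term in scores}
    return {term: (s - lo) / (hi - lo) for term, s in scores.items()}


def comb_sum(model_scores):
    """CombSUM fusion (assumed here): sum normalized scores across models."""
    fused = {}
    for scores in model_scores:
        for term, s in min_max_normalize(scores).items():
            fused[term] = fused.get(term, 0.0) + s
    return fused


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


@dataclass
class BiVector:
    """A document represented by two vectors, as in the bi-vector model."""
    term_vector: dict   # traditional vector space model (e.g. term counts)
    topic_vector: dict  # fused term scores derived from the topic models


def similarity(a, b, alpha=0.5):
    """Hypothetical weighting of the two spaces; alpha is an assumption."""
    return (alpha * cosine(a.term_vector, b.term_vector)
            + (1 - alpha) * cosine(a.topic_vector, b.topic_vector))


# Illustrative term scores for one document from two topic models.
model_a_terms = {"cluster": 3.0, "topic": 2.0, "label": 1.0}
model_b_terms = {"topic": 4.0, "fusion": 2.0, "label": 0.0}
fused = comb_sum([model_a_terms, model_b_terms])

doc = BiVector(term_vector={"topic": 2, "cluster": 1}, topic_vector=fused)
other = BiVector(term_vector={"label": 1}, topic_vector={"fusion": 1.0})
```

A clustering algorithm that works from pairwise similarities could then use `similarity(doc, other)` in place of a single-vector cosine, and the highest-scoring fused terms of a cluster's documents are one plausible source of candidate cluster labels.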