Journal: Pattern Recognition: The Journal of the Pattern Recognition Society

Concept decompositions for short text clustering by identifying word communities

Abstract

Short text clustering is an increasingly important methodology but faces the challenges of sparsity and high-dimensionality of text data. Previous concept decomposition methods have obtained concept vectors via the centroids of clusters using k-means-type clustering algorithms on normal, full texts. In this study, we propose a new concept decomposition method that creates concept vectors by identifying semantic word communities from a weighted word co-occurrence network extracted from a short text corpus or a subset thereof. The cluster memberships of short texts are then estimated by mapping the original short texts to the learned semantic concept vectors. The proposed method is not only robust to the sparsity of short text corpora but also overcomes the curse of dimensionality, scaling to a large number of short text inputs due to the concept vectors being obtained from term-term instead of document-term space. Experimental tests have shown that the proposed method outperforms state-of-the-art algorithms. (C) 2017 Elsevier Ltd. All rights reserved.
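The pipeline summarized in the abstract (co-occurrence network, word communities, concept vectors, mapping of texts to concepts) can be illustrated with a minimal sketch. The abstract does not state the edge weighting, the community detection algorithm, or the mapping function, so everything below is an assumption: edges are weighted by the number of texts in which two words co-occur, a simple deterministic label propagation stands in for the community detector, concept vectors use within-community weighted degree, and texts are assigned by highest dot product. All function names are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict
from itertools import combinations
import math


def cooccurrence_network(docs):
    """Build a weighted word co-occurrence graph.

    Assumption: edge weight = number of texts containing both words.
    """
    weights = Counter()
    for tokens in docs:
        for u, v in combinations(sorted(set(tokens)), 2):
            weights[(u, v)] += 1
    graph = defaultdict(dict)
    for (u, v), w in weights.items():
        graph[u][v] = w
        graph[v][u] = w
    return graph


def word_communities(graph, max_rounds=20):
    """Group words with weighted label propagation (a stand-in detector)."""
    labels = {word: word for word in graph}
    for _ in range(max_rounds):
        changed = False
        for word in sorted(graph):  # fixed order keeps the result deterministic
            scores = Counter()
            for neighbor, w in graph[word].items():
                scores[labels[neighbor]] += w
            # Highest total weight wins; ties go to the alphabetically first label.
            best = min(scores, key=lambda lab: (-scores[lab], lab))
            if best != labels[word]:
                labels[word] = best
                changed = True
        if not changed:
            break
    groups = defaultdict(set)
    for word, lab in labels.items():
        groups[lab].add(word)
    return sorted(groups.values(), key=lambda c: sorted(c))


def concept_vectors(graph, communities):
    """One unit-length vector per community.

    Assumption: a word's weight is its weighted degree inside the community.
    """
    vectors = []
    for community in communities:
        vec = {
            w: sum(wt for n, wt in graph[w].items() if n in community)
            for w in community
        }
        norm = math.sqrt(sum(x * x for x in vec.values())) or 1.0
        vectors.append({w: x / norm for w, x in vec.items()})
    return vectors


def assign(tokens, vectors):
    """Return the index of the concept vector with the highest dot product."""
    counts = Counter(tokens)
    scores = [sum(counts[w] * x for w, x in vec.items()) for vec in vectors]
    return max(range(len(vectors)), key=scores.__getitem__)


if __name__ == "__main__":
    docs = [
        ["apple", "banana", "fruit"],
        ["banana", "fruit", "juice"],
        ["python", "java", "code"],
        ["java", "code", "compiler"],
    ]
    graph = cooccurrence_network(docs)
    communities = word_communities(graph)
    vectors = concept_vectors(graph, communities)
    for tokens in (["fruit", "juice"], ["python", "compiler"]):
        print(tokens, "-> cluster", assign(tokens, vectors))
```

In this sketch the concept vectors are computed from word-word relations only, so the cost of learning them depends on the vocabulary and its co-occurrences rather than on the number of texts, which is the property the abstract attributes to working in term-term space. A production version would likely replace the label propagation step with a stronger community detection method.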