
A practical algorithm for solving the sparseness problem of short text clustering


Abstract

Dirichlet Multinomial Mixture (DMM) models have been successful in clustering short texts. However, the word co-occurrence information that these models can capture is limited to the short text corpus itself. If two words are strongly related but rarely co-occur in short texts, these models cannot fully capture the semantic relatedness between them. In this paper, we propose WDMM, a novel model that incorporates word-word correlation into DMM. By constructing a sparse graph from word-word relationships, our model expands each short text with the neighboring words of the words it contains, which helps to alleviate the sparseness of short texts. As a result, the cluster label of each text is influenced not only by its own words but also by similar words in the corpus. Experimental results on real-world datasets demonstrate the substantial superiority of WDMM over state-of-the-art methods.
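
The expansion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' WDMM implementation: the abstract does not say how word-word relatedness is computed or how the graph is sparsified, so the similarity scores below are made-up toy values, and the names `build_sparse_graph`, `expand_text`, `k`, and `threshold` are hypothetical. The sketch only shows the general idea of keeping a few strong neighbors per word and appending them to a short text before clustering.

```python
def build_sparse_graph(similarity, k=2, threshold=0.5):
    """Keep at most k neighbors per word whose score is >= threshold.

    `similarity` maps word -> {other_word: score}. Where the scores come
    from (e.g. word embeddings) is an assumption, not stated in the abstract.
    """
    graph = {}
    for word, scores in similarity.items():
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        graph[word] = [w for w, s in ranked if s >= threshold and w != word][:k]
    return graph


def expand_text(tokens, graph):
    """Append graph neighbors of each token, skipping words already present."""
    expanded = list(tokens)
    seen = set(tokens)
    for tok in tokens:
        for neighbor in graph.get(tok, []):
            if neighbor not in seen:
                expanded.append(neighbor)
                seen.add(neighbor)
    return expanded


# Toy similarity scores, invented purely for illustration.
toy_similarity = {
    "nba": {"basketball": 0.9, "lakers": 0.7, "stock": 0.1},
    "stock": {"market": 0.8, "nba": 0.1},
}

toy_graph = build_sparse_graph(toy_similarity, k=2, threshold=0.5)
print(expand_text(["nba", "finals"], toy_graph))
```

In a pipeline of this kind, the expanded token lists would then be passed to a DMM-style clusterer, so that two texts sharing no surface words can still share expanded words.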

Bibliographic record

  • Source
    Intelligent Data Analysis | 2019, Issue 3 | pp. 701-716 | 16 pages
  • Author affiliations

    Yangzhou Univ, Dept Comp Sci, Yangzhou, Jiangsu, Peoples R China;

    Yangzhou Univ, Dept Comp Sci, Yangzhou, Jiangsu, Peoples R China;

    Yangzhou Univ, Dept Comp Sci, Yangzhou, Jiangsu, Peoples R China;

    Yangzhou Univ, Dept Comp Sci, Yangzhou, Jiangsu, Peoples R China;

    Univ Louisiana Lafayette, Sch Comp & Informat, Lafayette, LA 70504 USA;

  • Indexed in: Science Citation Index (SCI), USA;
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification
  • Keywords

    Short text; clustering; Dirichlet multinomial mixture;

