
A practical algorithm for solving the sparseness problem of short text clustering

Abstract

Dirichlet Multinomial Mixture (DMM) models have been successful in clustering short texts. However, the word co-occurrence information these models can capture is limited to the short-text corpus itself. If two words are strongly related but rarely co-occur in short texts, these models cannot fully capture the semantic relatedness between them. In this paper, we propose a novel model, called WDMM, that incorporates word-word correlation into DMM. By constructing a sparse graph from word-word relationships, our model expands each short text with the neighboring words of the words it contains, which helps alleviate the sparseness of short texts. As a result, the cluster label of each text is determined not only by its own words but also by their similar words in the corpus. Experimental results on real-world datasets demonstrate that WDMM substantially outperforms state-of-the-art methods.
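The graph-and-expansion step described in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not specify: word relatedness is taken to be cosine similarity between word vectors (the toy vectors in `TOY_VECTORS` are made up), each word keeps at most `k` neighbors above a similarity `threshold`, and expansion simply appends those neighbors to the text. The helper names `build_sparse_graph` and `expand` are hypothetical, and the sketch does not reproduce how WDMM folds the expanded words into DMM inference.

```python
# Hypothetical sketch of graph-based short-text expansion (not the WDMM code).
# Assumptions: relatedness = cosine similarity of word vectors; each word keeps
# at most k neighbors whose similarity is >= threshold; expansion appends them.
import math

# Made-up 3-dimensional word vectors, for illustration only.
TOY_VECTORS = {
    "iphone": [1.0, 0.1, 0.0],
    "android": [0.9, 0.2, 0.0],
    "banana": [0.0, 1.0, 0.1],
    "fruit": [0.1, 0.9, 0.2],
    "galaxy": [0.0, 0.1, 1.0],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_sparse_graph(vectors, k=2, threshold=0.5):
    """Map each word to at most k neighbors with similarity >= threshold."""
    graph = {}
    for w, vec in vectors.items():
        scored = [
            (cosine(vec, other_vec), u)
            for u, other_vec in vectors.items()
            if u != w
        ]
        # Drop weak links, then keep the k strongest (ties broken alphabetically).
        strong = sorted(
            ((s, u) for s, u in scored if s >= threshold),
            key=lambda pair: (-pair[0], pair[1]),
        )
        graph[w] = [u for _, u in strong[:k]]
    return graph


def expand(tokens, graph):
    """Return the tokens followed by their graph neighbors not already present."""
    seen = set(tokens)
    extra = []
    for w in tokens:
        for u in graph.get(w, []):  # words outside the graph add nothing
            if u not in seen:
                seen.add(u)
                extra.append(u)
    return list(tokens) + extra


if __name__ == "__main__":
    graph = build_sparse_graph(TOY_VECTORS)
    print(expand(["iphone", "cheap"], graph))  # ['iphone', 'cheap', 'android']
```

With these toy vectors, a text containing "iphone" gains "android", so it now shares a word with texts that mention "android" even though the two words never co-occur in the original texts; this is the kind of co-occurrence gap the abstract describes. Keeping only the top-`k` neighbors above a threshold is what keeps the graph sparse.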

Bibliographic information

  • Source
    Intelligent Data Analysis | 2019, No. 3 | pp. 701-716 | 16 pages
  • Author affiliations

    Yangzhou Univ Dept Comp Sci Yangzhou Jiangsu Peoples R China;

    Yangzhou Univ Dept Comp Sci Yangzhou Jiangsu Peoples R China;

    Yangzhou Univ Dept Comp Sci Yangzhou Jiangsu Peoples R China;

    Yangzhou Univ Dept Comp Sci Yangzhou Jiangsu Peoples R China;

    Univ Louisiana Lafayette Sch Comp & Informat Lafayette LA 70504 USA;

  • Indexed in: Science Citation Index (SCI)
  • Original format: PDF
  • Language: English
  • Keywords

    Short text; clustering; Dirichlet multinomial mixture