Workshop on Vector Space Modeling for Natural Language Processing

Short Text Clustering via Convolutional Neural Networks

Abstract

Short text clustering has become an increasingly important task with the popularity of social media, and it is a challenging problem because short-text representations are sparse. In this paper, we propose Short Text Clustering via Convolutional neural networks (STCC), which is better suited to clustering because it imposes one constraint on the learned features through a self-taught learning framework, without using any external tags or labels. First, we embed the original keyword features into compact binary codes with a locality-preserving constraint. Then, word embeddings are explored and fed into convolutional neural networks to learn deep feature representations, with the output units fitting the pre-trained binary codes during training. After obtaining the learned representations, we use K-means to cluster them. Our extensive experimental study on two public short text datasets shows that the deep feature representation learned by our approach achieves significantly better clustering performance than several existing features, such as term frequency-inverse document frequency, Laplacian eigenvectors and average embedding.
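The first and last stages described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. Binarizing by thresholding each dimension at its median is an assumption, since the abstract only says the codes are compact, binary and locality-preserving. The farthest-point initialisation of K-means is likewise a choice made here to keep the toy run deterministic. The convolutional network that would be trained to fit the binary codes is omitted, so the hypothetical `toy_features` stand in both for the low-dimensional embedding and for the learned deep features.

```python
from statistics import median


def binarize_by_median(embeddings):
    """Turn real-valued vectors into 0/1 codes, one threshold per dimension.

    Assumption: median thresholding is a stand-in for the binary-code step.
    """
    dims = len(embeddings[0])
    thresholds = [median(row[d] for row in embeddings) for d in range(dims)]
    return [[1 if row[d] > thresholds[d] else 0 for d in range(dims)]
            for row in embeddings]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Lloyd's K-means with farthest-point initialisation (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Pick the point farthest from every centroid chosen so far.
        farthest = max(points, key=lambda p: min(squared_distance(p, c)
                                                 for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy data: two well-separated groups of 2-D feature vectors.
toy_features = [[0.1, 0.2], [0.0, 0.3], [0.2, 0.1],
                [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]]

codes = binarize_by_median(toy_features)   # targets a network could be fit to
labels = kmeans(toy_features, k=2)         # clustering of the (stand-in) features
print(codes)
print(labels)
```

In the pipeline the abstract describes, the codes would serve as training targets for the network's output units, and K-means would then run on the network's learned representations rather than on the raw input vectors used here.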