Source: International Journal of Machine Learning and Cybernetics

A term correlation based semi-supervised microblog clustering with dual constraints

Abstract

Microblog clustering is important in many web applications. However, microblogs are short, so they provide too few word occurrences for traditional text clustering approaches to reach their full potential. To address this problem, we propose a novel semi-supervised learning scheme that exploits semantic information to compensate for the limited message length. The key idea is to use term correlation data, which captures semantic information for term weighting and provides greater context for microblogs. We then formulate the microblog clustering problem as a semi-supervised non-negative matrix factorization co-clustering framework, which takes advantage of both prior domain knowledge of data points (microblogs), in the form of pair-wise constraints, and category knowledge of features (terms). Our approach not only greatly reduces the labor-intensive labeling process, but also exploits hidden information in the microblogs themselves. Extensive experiments are conducted on two real-world microblog datasets. The results demonstrate the effectiveness of the proposed approach, which performs promisingly compared to state-of-the-art methods.
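
The abstract does not state the objective function or the correlation measure, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes cosine similarity between term columns as the term correlation, and a graph-regularized NMF in which must-link pairs are pulled together and cannot-link pairs are penalized for overlapping. The function names (`term_correlation`, `enrich`, `constrained_nmf`), the weight `lam`, and the toy data are all hypothetical. The term-side category knowledge mentioned in the abstract is left out.

```python
import numpy as np

def term_correlation(X):
    """Cosine similarity between term columns of a (docs x terms) matrix (assumed measure)."""
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0              # guard against terms that never occur
    Xn = X / norms
    return Xn.T @ Xn                     # (terms x terms)

def enrich(X, S):
    """Spread each document's weight onto correlated terms, then L2-normalize rows."""
    Xe = X @ S
    row = np.linalg.norm(Xe, axis=1, keepdims=True)
    row[row == 0] = 1.0
    return Xe / row

def constrained_nmf(X, k, must=(), cannot=(), lam=1.0, iters=200, seed=0, eps=1e-9):
    """Approximate X by V @ U.T with V, U >= 0, using pairwise document constraints.

    Assumed objective:
        ||X - V U^T||_F^2 + lam * tr(V^T (D - A + C) V)
    where A holds must-link pairs, D is its degree matrix, C holds cannot-link pairs.
    """
    n, m = X.shape
    rng = np.random.default_rng(seed)
    V = rng.random((n, k)) + 0.1         # document-cluster memberships
    U = rng.random((m, k)) + 0.1         # term-cluster loadings
    A = np.zeros((n, n))
    C = np.zeros((n, n))
    for i, j in must:
        A[i, j] = A[j, i] = 1.0
    for i, j in cannot:
        C[i, j] = C[j, i] = 1.0
    D = np.diag(A.sum(axis=1))

    def loss():
        fit = np.linalg.norm(X - V @ U.T) ** 2
        return fit + lam * np.trace(V.T @ (D - A + C) @ V)

    history = [loss()]                   # loss at the random initialization
    for _ in range(iters):
        # Multiplicative updates: negative gradient part over positive gradient part.
        U *= (X.T @ V) / (U @ (V.T @ V) + eps)
        V *= (X @ U + lam * (A @ V)) / (V @ (U.T @ U) + lam * ((D + C) @ V) + eps)
        history.append(loss())
    return U, V, history

# Toy data: 6 short "microblogs" over 6 terms, two loose topics (terms 0-2 and 3-5).
X = np.array([[1, 1, 0, 0, 0, 0],
              [0, 1, 1, 0, 0, 0],
              [1, 0, 0, 0, 0, 0],
              [0, 0, 0, 1, 1, 0],
              [0, 0, 0, 0, 1, 1],
              [0, 0, 0, 0, 0, 1]], dtype=float)

S = term_correlation(X)
Xe = enrich(X, S)                        # document 2 now also has weight on term 1
U, V, history = constrained_nmf(Xe, k=2, must=[(0, 2), (3, 5)], cannot=[(0, 3)])
labels = V.argmax(axis=1)                # hard cluster assignment per document
```

The enrichment step is what gives a one-word message some overlap with related messages, and the constraint terms only affect the update for `V`, so supervision enters through the document side alone in this sketch.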
