Journal: ACM Transactions on Knowledge Discovery from Data
Mining Event-Oriented Topics in Microblog Stream with Unsupervised Multi-View Hierarchical Embedding

Abstract

This article presents an unsupervised multi-view hierarchical embedding (UMHE) framework to sufficiently reveal the intrinsic topical knowledge in social events. Event-oriented topics are highly related to such events, as they can provide explicit descriptions of what has happened in the social community. In many real-world cases, however, it is difficult to include all attributes of microblogs; more often, only the textual aspects are available. Traditional topic modelling methods have failed to generate event-oriented topics from the textual aspects alone, since these methods often overlook the inherent relations between topics. Meanwhile, metrics in the original word vocabulary space might not effectively capture semantic distances. Our UMHE framework overcomes this severe information deficiency and poor feature representation. UMHE first develops a multi-view Bayesian rose tree to preliminarily generate prior knowledge for latent topics and their relations. With such prior knowledge, we design an unsupervised translation-based hierarchical embedding method to better represent these latent topics. By applying self-adaptive spectral clustering on the embedding space and the original space concomitantly, we eventually extract event-oriented topics in word distributions to express social events. Our framework is purely data-driven and unsupervised, without any external knowledge. Experimental results on the TREC Tweets2011 dataset and a Sina Weibo dataset demonstrate that the UMHE framework can not only construct a hierarchical structure with high fitness but also yield topic embeddings with salient semantics; therefore, it can derive event-oriented topics with meaningful descriptions.
