Journal: International Journal of Grid and High Performance Computing

MR-LDA: An Efficient Topic Model for Classification of Short Text in Big Social Data

Abstract

Latent Dirichlet Allocation (LDA) is an efficient method for text mining, but applying it directly to Chinese micro-blog texts does not work well, because micro-blogs are more social, brief, and closely related to one another. Building on LDA, this paper proposes a Micro-blog Relation LDA model (MR-LDA), which takes the relations among Chinese micro-blog documents into account to aid topic mining in micro-blogs. The authors extend LDA in two ways. First, they aggregate several Chinese micro-blogs into a single micro-blog document to address the short-text problem. Second, they model the generation process of Chinese micro-blogs more accurately by taking the relationships between micro-blog documents into account. As a result, MR-LDA is better suited to modeling Chinese micro-blog data. Gibbs sampling is used for model inference. Experimental results on real datasets show that MR-LDA offers an effective solution for text mining on Chinese micro-blogs.
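The aggregation step and the use of Gibbs sampling can be illustrated with a minimal sketch. It is not the authors' MR-LDA: the abstract does not say how micro-blogs are grouped or how relations enter the model, so the grouping key (here a hypothetical user id), the function names `aggregate_posts` and `lda_gibbs`, and the hyperparameter values are all assumptions, and the sampler is plain collapsed Gibbs sampling for standard LDA. Posts are assumed to be already segmented into tokens.

```python
import random
from collections import defaultdict


def aggregate_posts(posts):
    """Merge the tokens of all posts that share a group key into one pseudo-document.

    `posts` is a list of (group_key, tokens) pairs. The group key is a
    hypothetical choice (e.g. a user id); the paper's actual rule is not given.
    """
    groups = defaultdict(list)
    for key, tokens in posts:
        groups[key].extend(tokens)
    return dict(groups)


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for standard LDA (NOT the MR-LDA extension)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    n_dk = [[0] * num_topics for _ in docs]               # document-topic counts
    n_kw = [defaultdict(int) for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                                # tokens per topic
    z = []                                                # topic of each token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalised conditional probability of each topic.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Resample the topic and restore the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return n_dk, n_kw


if __name__ == "__main__":
    # Toy, made-up posts: (hypothetical user id, pre-segmented tokens).
    posts = [
        ("u1", ["match", "goal"]),
        ("u2", ["stock", "market"]),
        ("u1", ["team", "goal"]),
    ]
    pseudo_docs = list(aggregate_posts(posts).values())
    doc_topic, topic_word = lda_gibbs(pseudo_docs, num_topics=2)
    print(doc_topic)
```

Merging posts before sampling gives each pseudo-document more word co-occurrences than a single short post provides, which is the motivation the abstract states for aggregation.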
