Knowledge-Based Systems

Experimental explorations on short text topic mining between LDA and NMF based Schemes



Abstract

Learning topics from short texts has become a critical and fundamental task for understanding widely spread streaming social messages, e.g., tweets, snippets, and questions/answers. To date, there are two distinct topic-learning schemes: generative probabilistic graphical models and geometric linear-algebra approaches, with LDA and NMF as the representative works, respectively. Since both methods can uncover the latent topics hidden in unstructured short texts, two natural questions arise: which one is better, and why? Are there other, more effective extensions? To gain insight into LDA- and NMF-based learning schemes, we conduct a comprehensive series of experiments in two parts. In the first part, basic LDA and NMF are compared under different experimental settings on several public short-text datasets, showing that NMF tends to perform better than LDA. In the second part, we propose a novel model called Knowledge-guided Non-negative Matrix Factorization (abbreviated as KGNMF) for better short text topic mining, which leverages external knowledge as a semantic regularizer with low-rank formulations, yielding a time-efficient algorithm. Extensive experiments are conducted on three representative corpora against currently typical short-text topic models to demonstrate the effectiveness of the proposed KGNMF. Overall, learning with NMF-based schemes is another effective approach to short text topic mining, in addition to the popular LDA-based paradigms.
