The Dual-Sparse Topic Model: Mining Focused Topics and Focused Terms in Short Text

Abstract

Topic modeling has been proved to be an effective method for exploratory text mining. It is a common assumption of most topic models that a document is generated from a mixture of topics. In real-world scenarios, individual documents usually concentrate on several salient topics instead of covering a wide variety of topics. A real topic also adopts a narrow range of terms instead of a wide coverage of the vocabulary. Understanding this sparsity of information is especially important for analyzing user-generated Web content and social media, which are featured as extremely short posts and condensed discussions. In this paper, we propose a dual-sparse topic model that addresses the sparsity in both the topic mixtures and the word usage. By applying a "Spike and Slab" prior to decouple the sparsity and smoothness of the document-topic and topic-word distributions, we allow individual documents to select a few focused topics and a topic to select focused terms, respectively. Experiments on different genres of large corpora demonstrate that the dual-sparse topic model outperforms both classical topic models and existing sparsity-enhanced topic models. This improvement is especially notable on collections of short documents.
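The decoupling of sparsity and smoothness described in the abstract can be illustrated with a small sketch. This is not the authors' exact model or inference procedure: it only assumes that binary Bernoulli selectors (the "spike") decide which topics a document may use, and that a Dirichlet draw (the "slab") spreads probability over the selected topics, with a much smaller weak-smoothing value keeping unselected topics well defined. The function name and the parameters `select_prob`, `smooth`, and `weak_smooth` are hypothetical and chosen for illustration.

```python
import random


def sample_sparse_mixture(num_topics, select_prob, smooth, weak_smooth, rng):
    """Draw one spike-and-slab style topic mixture (illustrative sketch).

    Returns (selectors, theta): binary on/off flags per topic and a
    probability vector over topics.
    """
    # Spike: Bernoulli selectors decide which topics are switched on.
    selectors = [1 if rng.random() < select_prob else 0
                 for _ in range(num_topics)]
    # Slab: selected topics get the full smoothing prior, the rest only
    # the weak smoothing prior (assumed to be much smaller).
    params = [smooth * b + weak_smooth for b in selectors]
    # Sample a Dirichlet vector by normalizing independent Gamma draws.
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    if total == 0.0:
        # Guard against numerical underflow when every draw is tiny.
        return selectors, [1.0 / num_topics] * num_topics
    theta = [g / total for g in draws]
    return selectors, theta


if __name__ == "__main__":
    rng = random.Random(0)
    selectors, theta = sample_sparse_mixture(10, 0.3, 1.0, 0.01, rng)
    print(selectors)
    print([round(t, 3) for t in theta])
```

Under these assumptions, most of the probability mass tends to fall on the topics whose selector is 1. The same construction could be applied to topic-word distributions by drawing selectors over vocabulary terms instead of topics.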

Bibliographic details

Source
International Conference on World Wide Web, 2014, pp. 539-549 (11 pages)
Authors
Tianyi Lin; Wentao Tian; Qiaozhu Mei; Hong Cheng;
Original format: PDF
Keywords
Topic modeling; spike and slab; sparse representation; user-generated content;

Similar literature

1. Fuzzy topic modeling approach for text mining over short text [J]. Rashid Junaid, Shah Syed Muhammad Adnan, Irtaza Aun. Information Processing & Management. 2019, No. 6
2. Relational Biterm Topic Model: Short-Text Topic Modeling using Word Embeddings [J]. Li Ximing, Zhang Ang, Li Changchun, The Computer Journal. 2019, No. 3
3. In text mining: detection of topic and sub-topic using multiple spider hunting model [J]. Elakiya E., Rajkumar N. Journal of Ambient Intelligence and Humanized Computing. 2021, No. 3
4. The Dual-Sparse Topic Model: Mining Focused Topics and Focused Terms in Short Text [C]. Tianyi Lin, Wentao Tian, Qiaozhu Mei, International Conference on World Wide Web. 2014
5. Topic Modeling and Spam Detection for Short Text Segments in Web Forums [D]. Sun, Yingcheng. 2020
6. Text mining in a literature review of urothelial cancer using topic model [O]. Hsuan-Jen Lin, Phillip C.-Y. Sheu, Jeffrey J. P. Tsai, 2020
7. Online LDA: Adaptive Topic Model for Mining Text Streams with Application on Topic Detection and [O]. Loulwah Alsumait, Daniel Barbará, Carlotta Domeniconi. 2008