Labeled Phrase Latent Dirichlet Allocation and its online learning algorithm

Tang Yi-Kun; Mao Xian-Ling; Huang Heyan

Abstract

The Internet holds a large amount of user-labeled text, such as web pages with categories, papers with keywords, and tweets with hashtags. In recent years, supervised topic models such as Labeled Latent Dirichlet Allocation have been widely used to discover abstract topics in labeled text corpora. However, these models rely on the bag-of-words assumption and therefore ignore word order, which discards much semantic information. To model label semantics and word order jointly, we propose a novel topic model, Labeled Phrase Latent Dirichlet Allocation (LPLDA), which treats each document as a mixture of phrases and thereby partly accounts for word order. To estimate the parameters of LPLDA, we develop a batch inference algorithm based on Gibbs sampling. To speed up LPLDA on large-scale streaming data, we further propose an online inference algorithm. Extensive experiments compare LPLDA with four state-of-the-art baselines. The results show that (1) batch LPLDA significantly outperforms the baselines in the case study, in perplexity and scalability, and, in most cases, in the third-party task; and (2) the online algorithm for LPLDA is clearly more efficient than the batch method while still producing good results.
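The abstract does not give the LPLDA sampling equations, so the following is only a minimal sketch of the baseline idea it builds on: collapsed Gibbs sampling for plain Labeled LDA over single words, where each token's topic may only be drawn from its document's own labels. It is not the authors' phrase-based model or their online algorithm. The function name `labeled_lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy corpus are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def labeled_lda_gibbs(docs, labels, n_iters=50, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for plain Labeled LDA (words, not phrases).

    docs   -- list of token lists
    labels -- list of label lists, one per document; each label acts as a topic
    Returns (z, topic_word): z[d][i] is the label assigned to token i of
    document d, and topic_word[k][w] counts how often word w has label k.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [defaultdict(int) for _ in docs]          # n_{d,k}
    topic_word = defaultdict(lambda: defaultdict(int))    # n_{k,w}
    topic_total = defaultdict(int)                        # n_k

    # Random initialisation, restricted to each document's own labels.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.choice(labels[d])
            z[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k_old = z[d][i]
                doc_topic[d][k_old] -= 1
                topic_word[k_old][w] -= 1
                topic_total[k_old] -= 1

                # Unnormalised conditional over the document's labels only.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in labels[d]
                ]
                k_new = rng.choices(labels[d], weights=weights)[0]

                # Add the token back under its newly sampled label.
                z[d][i] = k_new
                doc_topic[d][k_new] += 1
                topic_word[k_new][w] += 1
                topic_total[k_new] += 1

    return z, topic_word


if __name__ == "__main__":
    toy_docs = [["goal", "match", "goal"], ["vote", "election"], ["goal", "vote"]]
    toy_labels = [["sports"], ["politics"], ["sports", "politics"]]
    assignments, counts = labeled_lda_gibbs(toy_docs, toy_labels, n_iters=20)
    print(assignments)
```

Restricting the candidate topics to `labels[d]` is what distinguishes this from unsupervised LDA; a phrase-level model such as the one proposed would presumably assign topics to multi-word units rather than to individual tokens, which this sketch does not attempt.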


Bibliographic record

Source
Data Mining and Knowledge Discovery | 2018, Issue 4 | 28 pages
Authors
Tang Yi-Kun; Mao Xian-Ling; Huang Heyan
Author affiliations

Beijing Inst Technol, Sch Comp Sci & Technol, Beijing Engn Res Ctr High Volume Language Informa, Beijing 100081, Peoples R China (listed identically for all three authors)

Indexing information
Original format: PDF
Language of text: English
Chinese Library Classification: Computing technology, computer technology
Keywords
Topic model; Labeled Phrase LDA; Batch Labeled Phrase LDA; Online Labeled Phrase LDA


