Topic Cube: Topic Modeling for OLAP on Multidimensional Text Databases



Abstract

As the amount of textual information grows explosively in various kinds of business systems, it becomes more and more desirable to analyze both structured data records and unstructured text data simultaneously. While online analytical processing (OLAP) techniques have been proven very useful for analyzing and mining structured data, they face challenges in handling text data. On the other hand, probabilistic topic models are among the most effective approaches to latent topic analysis and mining on text data. In this paper, we propose a new data model called topic cube to combine OLAP with probabilistic topic modeling and enable OLAP on the dimension of text data in a multidimensional text database. Topic cube extends the traditional data cube to cope with a topic hierarchy and store probabilistic content measures of text documents learned through a probabilistic topic model. To materialize topic cubes efficiently, we propose a heuristic method to speed up the iterative EM algorithm for estimating topic models by leveraging the models learned on component data cells to choose a good starting point for iteration. Experiment results show that this heuristic method is much faster than the baseline method of computing each topic cube from scratch. We also discuss potential uses of topic cube and show sample experimental results.
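The abstract does not spell out the heuristic, so the following is only a minimal sketch of the general idea it names: fit a topic model on each component cell, then use those models to pick the starting point for EM on the aggregated cell. It assumes a plain PLSA model, a size-weighted average of the component topic-word distributions as the starting point, and topics that are already aligned across cells (topic z means the same thing in every cell). All function names (`plsa_em`, `warm_start`, `random_topics`) and the toy data are hypothetical and are not taken from the paper.

```python
import math
import random

def normalize(row):
    total = sum(row)
    return [x / total for x in row]

def random_topics(k, vocab_size, rng):
    """Random topic-word distributions p(w|z), one row per topic."""
    return [normalize([rng.random() + 0.01 for _ in range(vocab_size)])
            for _ in range(k)]

def plsa_em(docs, topics, max_iter=200, tol=1e-6):
    """Fit PLSA by EM.

    docs: list of word-count lists; topics: initial p(w|z).
    Returns (topics, trace), where trace holds the log-likelihood
    evaluated at the start of each iteration.
    """
    k, v = len(topics), len(topics[0])
    topics = [row[:] for row in topics]
    mix = [[1.0 / k] * k for _ in docs]  # p(z|d), uniform start
    trace = []
    for _ in range(max_iter):
        # Tiny floor keeps every probability strictly positive.
        new_topics = [[1e-12] * v for _ in range(k)]
        new_mix = [[1e-12] * k for _ in docs]
        loglik = 0.0
        for d, counts in enumerate(docs):
            for w, c in enumerate(counts):
                if c == 0:
                    continue
                joint = [mix[d][z] * topics[z][w] for z in range(k)]
                p_w = sum(joint)
                loglik += c * math.log(p_w)
                for z in range(k):
                    r = c * joint[z] / p_w  # E-step: expected count for topic z
                    new_topics[z][w] += r
                    new_mix[d][z] += r
        # M-step: renormalize the expected counts.
        topics = [normalize(row) for row in new_topics]
        mix = [normalize(row) for row in new_mix]
        trace.append(loglik)
        if len(trace) > 1 and abs(trace[-1] - trace[-2]) < tol * abs(trace[-2]):
            break
    return topics, trace

def warm_start(component_topics, component_sizes):
    """Size-weighted average of the component cells' p(w|z).

    Assumption: topics are aligned across the component cells.
    """
    k, v = len(component_topics[0]), len(component_topics[0][0])
    total = float(sum(component_sizes))
    return [[sum(s * t[z][w] for t, s in zip(component_topics, component_sizes)) / total
             for w in range(v)]
            for z in range(k)]

rng = random.Random(0)
cell_a = [[5, 4, 0, 1], [4, 6, 1, 0]]  # toy word counts, vocabulary of 4 words
cell_b = [[0, 1, 5, 5], [1, 0, 6, 4]]
k = 2

models, sizes = [], []
for cell in (cell_a, cell_b):
    fitted, _ = plsa_em(cell, random_topics(k, 4, rng))
    models.append(fitted)
    sizes.append(sum(map(sum, cell)))  # number of tokens in the cell

parent = cell_a + cell_b  # aggregated cell contains all component documents
_, cold = plsa_em(parent, random_topics(k, 4, rng))    # from scratch
_, warm = plsa_em(parent, warm_start(models, sizes))   # from component models
print("iterations from scratch:", len(cold), "with warm start:", len(warm))
```

Comparing `len(cold)` with `len(warm)` shows how many EM iterations each starting point needed on this toy data. The sketch makes no claim about which is smaller in general, and the topics learned independently here are not guaranteed to be aligned, which is the assumption a real implementation would have to enforce.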


Bibliographic details

Source
SIAM International Conference on Data Mining | 2009 | 1244 p. | 12 pages
Authors
Duo Zhang; Chengxiang Zhai; Jiawei Han
Original format: PDF
Chinese Library Classification: TP274.2-53

Similar literature


1. Relational Biterm Topic Model: Short-Text Topic Modeling using Word Embeddings [J]. Li Ximing, Zhang Ang, Li Changchun. The Computer Journal. 2019, No. 3
2. Modeling and OLAP Cubes for Database of Ground and Municipal Water Supply [J]. Taskeen Zaidi, Annapurna Singh, Vipin Saxena. Computational Water, Energy, and Environmental Engineering. 2013, No. 3
3. NetCube: a comprehensive network traffic analysis model based on multidimensional OLAP data cube [J]. Daihee Park, Jaehak Yu, Jun-Sang Park, Myung-Sup Kim. International Journal of Network Management. 2013, No. 2
4. Topic Cube: Topic Modeling for OLAP on Multidimensional Text Databases [C]. Duo Zhang, Chengxiang Zhai, Jiawei Han. SIAM International Conference on Data Mining. 2009
5. Cop Topics: Topic Modeling-Assisted Discoveries of Police-Related Themes in African-American Journalistic Texts [D]. Lemire Garlic, Nicole. 2017
6. Topic models: A novel method for modeling couple and family text data [O]. David C. Atkins, Tim N. Rubin, Mark Steyvers
7. Topic cube: Topic modeling for olap on multidimensional text databases [O]. Duo Zhang, Chengxiang Zhai, Jiawei Han. 2009
