Conference: IEEE International Congress on Big Data

Topic modeling for management sciences: A network-based approach

Abstract

Big data mining and unsupervised pattern recognition from large corpora of text-based documents have been active research topics over the past decade. This paper presents a novel sequence of network-based models for identifying high-dimensional clustering patterns between topics, for quantitative and predictive modeling of trends in papers published in Management Science (an INFORMS journal) over the past 54 years. The proposed methods extrapolate a new spatial dimension from publication records to identify and assess topic interdependence and clustering trends over time. First, the optimal number of topics for trend analysis is identified from spatial clustering patterns using Self-Organizing Maps (SOM). Next, topic models are used to construct weighted and unweighted complex networks. Based on spatio-temporal clustering trends in these networks, the influence, importance, and uniqueness of topics are quantified. Finally, the dynamic trends in topic influence are modeled for predictive purposes using Hidden Markov Models (HMM). The proposed methods provide insights into topic-type co-existence patterns and topic-type rankings, identify about 40% of topics as unique, and predict topic importance with an average per-topic accuracy of 79-84%. Thus, the proposed methods provide the apparatus to translate time-series, text-intensive data sets into spatio-temporal models that can offer additional insights into data interdependencies and inter-data influences.
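The network-construction step can be illustrated with a minimal sketch. The abstract does not say how edges are defined, so everything here is an assumption made for illustration: the edge weight between two topics is taken as the sum, over documents, of the product of their topic proportions; the unweighted network keeps edges whose weight reaches a hypothetical `threshold`; node strength stands in for topic influence; and topics with no edges in the unweighted network stand in for "unique" topics. The function names are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def build_topic_network(doc_topic, threshold=0.1):
    """Build weighted and unweighted topic networks from document-topic proportions.

    doc_topic: list of rows, one per document, each a list of topic proportions.
    Returns (weighted, unweighted): a dict mapping topic pairs to weights, and
    the set of pairs whose weight is at least `threshold` (an assumed cutoff).
    """
    n_topics = len(doc_topic[0])
    weighted = {}
    for i, j in combinations(range(n_topics), 2):
        # Assumed co-occurrence weight: sum over documents of the product
        # of the two topics' proportions in that document.
        weighted[(i, j)] = sum(row[i] * row[j] for row in doc_topic)
    unweighted = {pair for pair, w in weighted.items() if w >= threshold}
    return weighted, unweighted


def node_strength(weighted, n_topics):
    """Sum of incident edge weights per topic (an assumed proxy for influence)."""
    strength = [0.0] * n_topics
    for (i, j), w in weighted.items():
        strength[i] += w
        strength[j] += w
    return strength


def isolated_topics(unweighted, n_topics):
    """Topics with no edges in the unweighted network (an assumed proxy for uniqueness)."""
    linked = {t for pair in unweighted for t in pair}
    return [t for t in range(n_topics) if t not in linked]


# Toy document-topic matrix: 4 documents, 4 topics (made-up illustrative values).
doc_topic = [
    [0.5, 0.5, 0.0, 0.0],
    [0.6, 0.4, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.1, 0.0, 0.0, 0.9],
]
weighted, unweighted = build_topic_network(doc_topic, threshold=0.1)
print(sorted(unweighted))
print(isolated_topics(unweighted, n_topics=4))
```

Repeating this construction per publication year would give a sequence of networks whose per-topic strengths form the kind of time series that a model such as an HMM could then be fitted to; that temporal step is not shown here.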