Journal of Intelligent Information Systems

Topic modeling for sequential documents based on hybrid inter-document topic dependency

Abstract

We propose two new topic modeling methods for sequential documents based on hybrid inter-document topic dependency. Topic modeling for sequential documents is the basis of many attractive applications such as emerging topic clustering and novel topic detection. For these tasks, most of the existing models introduce inter-document dependencies between topic distributions. However, in a real situation, adjacent emerging topics are often intertwined and mixed with outliers. These single-dependency based models have difficulties in handling the topic evolution in such multi-topic and outlier mixed sequential documents. To solve this problem, our first method considers three kinds of topic dependencies for each document to handle its probabilities of belonging to a fading topic, an emerging topic, or an independent topic. Secondly, we extend our first method by considering fine-grained dependencies in a given context for more complex topic evolution sequences. Our experiments conducted on six standard datasets on topic modeling show that our proposals outperform state-of-the-art models in terms of the accuracy of topic modeling, the quality of topic clustering, and the effectiveness of outlier detection.
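The abstract does not spell out the model, so the following is only an illustrative sketch of what a three-way hybrid dependency could look like, not the authors' formulation. It assumes that a "fading" dependency ties a document to its predecessor's topic distribution, an "emerging" dependency ties it to its successor's, and an "independent" document (an outlier) falls back to a uniform prior. The function name `hybrid_prior_mean`, its parameters, and the mixing weights are all hypothetical.

```python
import numpy as np

def hybrid_prior_mean(theta_prev, theta_next, weights):
    """Mix three candidate priors for one document's topic distribution.

    theta_prev, theta_next: topic distributions of the adjacent documents
        (assumed inputs; each is a length-K vector summing to 1).
    weights: (w_fading, w_emerging, w_independent), assumed to sum to 1.
    Returns a length-K prior mean over topics.
    """
    theta_prev = np.asarray(theta_prev, dtype=float)
    theta_next = np.asarray(theta_next, dtype=float)
    w = np.asarray(weights, dtype=float)
    if theta_prev.shape != theta_next.shape:
        raise ValueError("adjacent topic distributions must have equal length")
    if w.shape != (3,) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be three values summing to 1")

    k = theta_prev.shape[0]
    uniform = np.full(k, 1.0 / k)  # stand-in prior for an independent document

    # Convex combination of the three dependency types (an assumption,
    # chosen here only because it keeps the result a valid distribution).
    return w[0] * theta_prev + w[1] * theta_next + w[2] * uniform

prior = hybrid_prior_mean([0.8, 0.1, 0.1], [0.1, 0.1, 0.8], (0.5, 0.3, 0.2))
print(prior)
```

Setting the weights to `(0, 0, 1)` treats the document as an outlier and returns the uniform prior, while `(1, 0, 0)` reduces to a single-dependency model of the kind the abstract argues against.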