International Journal of Artificial Intelligence Tools: Architectures, Languages, Algorithms

TOPICVIEW: VISUAL ANALYSIS OF TOPIC MODELS AND THEIR IMPACT ON DOCUMENT CLUSTERING

Abstract

We present a new approach for analyzing topic models using visual analytics. We have developed TopicView, an application for visually comparing and exploring multiple models of text corpora, as a prototype for this type of analysis tool. TopicView uses multiple linked views to visually analyze conceptual and topical content, the document relationships identified by the models, and the impact of the models on the results of document clustering. As case studies, we examine models created using two standard approaches: Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA). Conceptual content is compared through the combination of (i) a bipartite graph matching LSA concepts with LDA topics based on the cosine similarities of model factors and (ii) a table listing the terms for each LSA concept and LDA topic in decreasing order of importance. Document relationships are examined through the combination of (i) side-by-side document similarity graphs, (ii) a table listing the weight of each document's contribution to each concept/topic, and (iii) a full-text reader for documents selected in either of the graphs or in the table. The impact of the LSA and LDA models on document clustering is explored through similar means, using the proximities between documents and cluster exemplars both as edge weights for the graph layout and as table entries. We demonstrate the utility of TopicView's visual approach to model assessment by comparing LSA and LDA models of several example corpora.
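The concept-to-topic comparison described above can be sketched as follows. This is a minimal illustration, not TopicView's implementation: it assumes each LSA concept and each LDA topic is available as a term-weight vector over one shared vocabulary, and the function names, the `threshold` cutoff, and the use of absolute cosine similarity (LSA singular vectors have arbitrary sign) are all hypothetical choices made for the example.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def concept_topic_edges(
    lsa_concepts: List[Sequence[float]],
    lda_topics: List[Sequence[float]],
    threshold: float = 0.5,  # hypothetical cutoff for drawing an edge
    use_abs: bool = True,  # assumption: ignore the arbitrary sign of LSA factors
) -> List[Tuple[int, int, float]]:
    """Edges (concept_index, topic_index, similarity) of a concept/topic bipartite graph."""
    edges = []
    for i, concept in enumerate(lsa_concepts):
        for j, topic in enumerate(lda_topics):
            s = cosine(concept, topic)
            if use_abs:
                s = abs(s)
            if s >= threshold:
                edges.append((i, j, s))
    # Strongest matches first.
    edges.sort(key=lambda e: e[2], reverse=True)
    return edges


def top_terms(weights: Sequence[float], vocab: Sequence[str], k: int = 3) -> List[str]:
    """Terms ordered by decreasing |weight|, as in a concept/topic term table."""
    order = sorted(range(len(vocab)), key=lambda idx: abs(weights[idx]), reverse=True)
    return [vocab[idx] for idx in order[:k]]


# Toy example with made-up numbers over a four-term vocabulary.
vocab = ["gene", "protein", "market", "stock"]
lsa = [[0.7, 0.7, 0.0, 0.1], [0.0, 0.1, -0.7, -0.7]]
lda = [[0.45, 0.45, 0.05, 0.05], [0.05, 0.05, 0.5, 0.4]]

for i, j, s in concept_topic_edges(lsa, lda):
    # Expected pairing: concept 0 <-> topic 0, concept 1 <-> topic 1.
    print(f"LSA concept {i} {top_terms(lsa[i], vocab, 2)} <-> "
          f"LDA topic {j} {top_terms(lda[j], vocab, 2)}: {s:.3f}")
```

Taking the absolute value lets an LSA concept whose dominant weights are negative (concept 1 above) still match the LDA topic built from the same terms; with `use_abs=False` only the positively aligned pair survives the threshold.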