Computer Graphics Forum: Journal of the European Association for Computer Graphics

iVisClustering: An Interactive Visual Document Clustering via Topic Modeling


Abstract

Clustering plays an important role in many large-scale data analyses, providing users with an overall understanding of their data. Nonetheless, clustering is not an easy task because of noisy features and outliers in the data, and thus the clustering results obtained from automatic algorithms often do not make clear sense. To remedy this problem, automatic clustering should be complemented with interactive visualization strategies. This paper proposes an interactive visual analytics system for document clustering, called iVisClustering, based on a widely used topic modeling method, latent Dirichlet allocation (LDA). iVisClustering provides a summary of each cluster in terms of its most representative keywords and visualizes soft clustering results in parallel coordinates. The main view of the system provides a 2D plot that visualizes cluster similarities and the relations among data items with a graph-based representation. iVisClustering provides several other views, which contain useful interaction methods. With the help of these visualization modules, we can interactively refine the clustering results in various ways. Keywords can be adjusted so that they characterize each cluster better. In addition, our system can filter out noisy data and re-cluster the data accordingly. A cluster hierarchy can be constructed using a tree structure, and for this purpose the system supports cluster-level interactions such as sub-clustering, removing unimportant clusters, merging clusters that have similar meanings, and moving certain clusters to any other node in the tree structure. Furthermore, the system provides document-level interactions such as moving mis-clustered documents to another cluster and removing useless documents. Finally, we present how interactive clustering is performed via iVisClustering by using real-world document data sets.
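
As a rough illustration of the soft clustering results and the refinement interactions described in the abstract, the sketch below operates on a document-topic membership table of the kind a topic model such as LDA produces. It is not the iVisClustering implementation: the names `hard_assign`, `filter_noisy`, and `merge_clusters`, the probability threshold used to flag noisy documents, and the choice to merge clusters by summing their memberships are all assumptions made for this example.

```python
from typing import Dict, List

# Hypothetical soft clustering result: each document maps to its membership
# probabilities over the clusters (topics); each row is assumed to sum to 1.
DocTopics = Dict[str, List[float]]


def hard_assign(doc_topics: DocTopics) -> Dict[str, int]:
    """Assign each document to its most probable cluster (argmax)."""
    return {
        doc: max(range(len(probs)), key=probs.__getitem__)
        for doc, probs in doc_topics.items()
    }


def filter_noisy(doc_topics: DocTopics, min_prob: float) -> DocTopics:
    """Keep only documents whose strongest membership reaches min_prob.

    Treating a flat membership row as "noisy" is an assumption of this
    sketch, not a rule taken from the paper.
    """
    return {doc: probs for doc, probs in doc_topics.items() if max(probs) >= min_prob}


def merge_clusters(doc_topics: DocTopics, keep: int, drop: int) -> DocTopics:
    """Merge cluster `drop` into cluster `keep` by summing memberships.

    The `drop` column is removed, so later cluster indices shift down by one.
    """
    if keep == drop:
        raise ValueError("keep and drop must be different clusters")
    keep_new = keep if keep < drop else keep - 1
    merged: DocTopics = {}
    for doc, probs in doc_topics.items():
        row = [p for k, p in enumerate(probs) if k != drop]
        row[keep_new] += probs[drop]
        merged[doc] = row
    return merged


# Made-up example data, for illustration only.
example: DocTopics = {
    "d1": [0.8, 0.1, 0.1],
    "d2": [0.1, 0.5, 0.4],
    "d3": [0.34, 0.33, 0.33],
}
```

With the made-up `example` table, `hard_assign(example)` gives one cluster index per document, `filter_noisy(example, 0.5)` drops `d3` because none of its memberships reaches 0.5, and `merge_clusters(example, 1, 2)` folds the third cluster into the second. An interactive system would typically re-run the topic model after such edits; this sketch only rearranges the existing memberships.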
