ACM Transactions on Interactive Intelligent Systems

A Visual Analytics Approach for Interactive Document Clustering

Abstract

Document clustering is a necessary step in various analytical and automated activities. When guided by the user, algorithms are tailored to imprint a perspective on the clustering process that reflects the user's understanding of the dataset. Beyond allowing customized adjustment of the clusters, a visual analytics approach provides tools for the user to draw new insights from the collection. While contributing his or her perspective, the user also acquires a deeper understanding of the dataset. To that effect, we propose a novel visual analytics system for interactive document clustering. We built our system on top of clustering algorithms that can adapt to user feedback. In the proposed system, an initial clustering is created based on the user-defined number of clusters and the selected clustering algorithm. A set of coordinated visualizations allows examination of the dataset and the clustering results. The visualizations provide the user with the highlights of individual documents and an understanding of how the documents evolve over the time period to which they relate. Users then interact with the process by changing the key-terms that drive it, according to their knowledge of the document domain. In key-term-based interaction, the user assigns a set of key-terms to each target cluster to guide the clustering algorithm. We have improved that process with a novel algorithm for choosing proper seeds for the clustering. Results demonstrate that the system has considerably improved not only its precision but also its effectiveness in document-based decision making. A set of quantitative experiments and a user study have been conducted to show the advantages of the approach for document analytics based on clustering. We also report on the use of the framework in a real decision-making scenario that relates users' email discussions to decision making for improving patient care. Results show that the framework is useful even for more complex datasets such as email conversations.
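
The key-term-based interaction described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the authors' seed-selection algorithm, so the code below substitutes a deliberately simple rule as an assumption: each cluster's initial centroid is built directly from the key-terms the user assigned to it, and a cosine-similarity k-means loop then refines the assignment. The function names (`vectorize`, `keyterm_seeded_clustering`) and the toy documents are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Return an L2-normalized term-frequency vector as a dict."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def cosine(a, b):
    """Dot product of two unit-length sparse vectors, i.e. their cosine."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def mean_vector(vectors):
    """Sum sparse vectors and renormalize to unit length."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values()))
    return {t: w / norm for t, w in total.items()} if norm else {}


def keyterm_seeded_clustering(docs, key_terms, iterations=10):
    """Cluster docs into len(key_terms) groups, seeded by user key-terms.

    Assumption: seeds are the key-term lists themselves, treated as tiny
    pseudo-documents. This is NOT the paper's seed-selection algorithm.
    """
    doc_vecs = [vectorize(d) for d in docs]
    centroids = [vectorize(" ".join(terms)) for terms in key_terms]
    labels = [-1] * len(docs)
    for _ in range(iterations):
        # Assign each document to its most similar centroid.
        new_labels = [
            max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
            for v in doc_vecs
        ]
        if new_labels == labels:  # converged
            break
        labels = new_labels
        # Recompute each centroid from its members; keep the seed if empty.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(doc_vecs, labels) if lab == k]
            if members:
                centroids[k] = mean_vector(members)
    return labels


docs = [
    "patient care ward nurse schedule",
    "nurse shift patient ward",
    "server outage network email",
    "network server email backup",
]
key_terms = [["patient", "nurse"], ["server", "network"]]
print(keyterm_seeded_clustering(docs, key_terms))
```

In an interactive setting, the user would edit `key_terms` after inspecting the result and rerun the clustering, which is the feedback loop the abstract describes at a high level.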
