IEEE Conference on Visual Analytics Science and Technology

Serendip: Topic model-driven visual exploration of text corpora

Abstract

Exploration and discovery in a large text corpus require investigation at multiple levels of abstraction, from a zoomed-out view of the entire corpus down to close-ups of individual passages and words. At each of these levels, there is a wealth of information that can inform inquiry, from statistical models to metadata to the researcher's own knowledge and expertise. Joining all this information together can be a challenge, and there are issues of scale to contend with along the way. In this paper, we describe an approach to text analysis that addresses these challenges of scale and multiple information sources, using probabilistic topic models to structure exploration through multiple levels of inquiry in a way that fosters serendipitous discovery. In implementing this approach in a tool called Serendip, we incorporate topic model data and metadata into a highly reorderable matrix to expose corpus-level trends; extend encodings of tagged text to illustrate probabilistic information at the passage level; and introduce a technique for visualizing individual word rankings, along with interaction techniques and new statistical methods to create links between different levels and information types. We describe example uses from both the humanities and visualization research that illustrate the benefits of our approach.
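
The reorderable matrix mentioned in the abstract can be pictured with a small sketch. The code below is not Serendip's implementation; it only assumes that a topic model yields a document-by-topic table of proportions, and it shows two plausible reorderings: sorting documents by one topic and sorting topics by their total weight in the corpus. The document names, topic names, numbers, and function names are all invented for illustration.

```python
# Hypothetical document-topic proportions (rows: documents, columns: topics).
# These values are made up for illustration; they are not taken from the paper.
doc_names = ["doc_a", "doc_b", "doc_c", "doc_d"]
topic_names = ["topic_0", "topic_1", "topic_2"]
theta = [
    [0.70, 0.20, 0.10],
    [0.10, 0.30, 0.60],
    [0.55, 0.35, 0.10],
    [0.05, 0.15, 0.80],
]


def reorder_rows_by_topic(matrix, row_labels, topic_index):
    """Sort documents by descending proportion of a single topic."""
    order = sorted(
        range(len(matrix)),
        key=lambda i: matrix[i][topic_index],
        reverse=True,
    )
    return [matrix[i] for i in order], [row_labels[i] for i in order]


def reorder_columns_by_weight(matrix, col_labels):
    """Sort topics by descending total proportion across all documents."""
    totals = [sum(row[j] for row in matrix) for j in range(len(col_labels))]
    order = sorted(range(len(col_labels)), key=lambda j: totals[j], reverse=True)
    return [[row[j] for j in order] for row in matrix], [col_labels[j] for j in order]


# Bring the documents that are heaviest in topic_2 to the top of the matrix.
rows_sorted, docs_sorted = reorder_rows_by_topic(theta, doc_names, topic_index=2)

# Put the topics that dominate the corpus in the leftmost columns.
cols_sorted, topics_sorted = reorder_columns_by_weight(theta, topic_names)

print(docs_sorted)
print(topics_sorted)
```

Reordering rows and columns in this way groups similar documents and prominent topics together, which is one way a matrix view could expose corpus-level trends. A metadata field such as date or genre could serve as the sort key in the same manner.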