Journal: IEEE Computer Graphics and Applications

Cartolabe: A Web-Based Scalable Visualization of Large Document Collections

Abstract

We describe Cartolabe, a web-based multiscale system for visualizing and exploring large textual corpora by topic, and introduce a novel mechanism for the progressive visualization of filtering queries. Initially designed to represent and navigate scientific publications across disciplines, Cartolabe has evolved into a generic framework that accommodates various corpora, ranging from Wikipedia (4.5M entries) to the French National Debate (4.3M entries). Cartolabe consists of two modules. The first relies on natural language processing methods to convert a corpus and its entities (documents, authors, and concepts) into high-dimensional vectors, compute their projection onto the two-dimensional plane, and extract meaningful labels for regions of the plane. The second module is a web-based visualization that displays tiles computed from the multidimensional projection of the corpus using the UMAP projection method. This visualization module aims to enable users with no expertise in visualization or data analysis to get an overview of their corpus and to interact with it: exploring, querying, filtering, panning, and zooming in on regions of semantic interest. Three use cases are discussed to illustrate Cartolabe's versatility and its ability to bring large-scale textual corpus visualization and exploration to a wide audience.
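
The tiling step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a quadtree-style grid, as used by web map viewers, in which zoom level z splits the projected plane into 2^z by 2^z tiles. The function names `tile_key` and `build_tiles` and the sample points are hypothetical and are not taken from Cartolabe; the abstract does not specify how its tiles are actually computed.

```python
from collections import defaultdict


def tile_key(x, y, zoom, bounds):
    """Return the (zoom, col, row) tile containing a 2D point.

    Assumption: a quadtree-style grid with 2**zoom tiles per axis
    covering the bounding box of the projection.
    """
    min_x, min_y, max_x, max_y = bounds
    n = 2 ** zoom
    # Guard against a degenerate (zero-width) bounding box.
    span_x = (max_x - min_x) or 1.0
    span_y = (max_y - min_y) or 1.0
    # Normalize to [0, 1], scale to tile indices, clamp the upper edge.
    col = min(int((x - min_x) / span_x * n), n - 1)
    row = min(int((y - min_y) / span_y * n), n - 1)
    return (zoom, col, row)


def build_tiles(points, zoom):
    """Group point indices by the tile they fall into at one zoom level."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    bounds = (min(xs), min(ys), max(xs), max(ys))
    tiles = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        tiles[tile_key(x, y, zoom, bounds)].append(idx)
    return dict(tiles)


# Hypothetical 2D coordinates standing in for a projected corpus.
points = [(0.0, 0.0), (1.0, 1.0), (0.2, 0.9), (0.9, 0.1)]
tiles_z0 = build_tiles(points, zoom=0)
tiles_z1 = build_tiles(points, zoom=1)
```

Under this assumption, a viewer would request only the tiles that intersect the current viewport at the current zoom level, so the amount of data sent to the browser stays bounded even for corpora with millions of entries.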