Journal: Computers & Graphics

Normalized compression distance for visual analysis of document collections



Abstract

In a world flooded by text from various sources, it is of strategic importance to find ways to map the information present in written documents into a form that helps users locate and associate important information within a particular text data set. Content-based maps can support extremely useful explorations of text data sets. This paper proposes and evaluates the use of Kolmogorov complexity approximations as a means to detect similarity between general textual documents, in order to support mapping and visualization techniques for corpora exploration. The calculation of this similarity measure requires no intermediate representation of a corpus (such as a vector representation) and therefore no pre-processing or parametrization steps. This makes it very attractive for a wider range of exploratory applications compared to conventional measures that need vector-based text representations. The visual layout used here is based on fast distance multi-dimensional projections. It is shown that the similarity measure and the resulting maps present very good precision and that the approach can be used successfully for visual analysis of automatically generated text maps.
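
The similarity measure described above can be illustrated with a minimal sketch. The abstract does not state the exact formula or compressor, so the code below assumes the standard definition of the normalized compression distance named in the title, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size in bytes. It uses Python's zlib as a stand-in compressor. The function names `compressed_size`, `ncd`, and `distance_matrix` are hypothetical and chosen only for illustration.

```python
import zlib


def compressed_size(data: bytes) -> int:
    # Compressed length in bytes: a computable stand-in for Kolmogorov complexity.
    # Assumption: zlib at maximum compression level; the paper may use another compressor.
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    # Standard NCD formula (assumed, not quoted from the paper):
    # (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(docs: list[str]) -> list[list[float]]:
    # Pairwise distances over raw text, with no vector representation or pre-processing.
    # The diagonal is set to 0.0 by construction; the matrix is filled symmetrically.
    n = len(docs)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = ncd(docs[i], docs[j])
    return dist
```

A matrix like this could then be passed to any distance-based projection method to produce a 2D layout. Which projection the paper actually uses is not specified beyond "fast distance multi-dimensional projections".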
