Visual summarization and exploration of text streams.

Abstract

We are in the midst of a data explosion. Text data, such as digitized textual archives and content from new social media like blogs and Twitter, is being generated at an unprecedented rate. For example, Google Books has scanned and digitized 15 million books, greatly increasing the accessibility of information around the world. Twitter publishes more than 300 new messages every second, and the number keeps increasing. However, exploring and analyzing this enormous amount of data becomes increasingly difficult. Information visualization can help analyze huge and complex data sets by turning them into visual representations that exploit the tremendous pattern-recognition capability of the human visual system.

In this thesis, we propose three advanced text visualization techniques for summarizing and exploring the various relation patterns that exist in large, time-varying collections of text documents. The thesis is composed of three main parts, each of which addresses an important problem in text visualization. In the first part, we present an enhanced word cloud layout that preserves the semantic relations between the displayed words across a sequence of word clouds generated over time for dynamic document data. In the second part, we introduce TextWheel to visualize complex micro-macro relations within news streams. In the last part, we deal with the splitting and merging patterns between topics extracted from text streams. We propose TextFlow, which is inspired by river flows, to show various topic evolution patterns at different granularities. The effectiveness of these methods has been demonstrated through extensive experiments using both synthetic data and data from real applications.

Bibliographic record

  • Author

    Cui, Weiwei

  • Author affiliation

    Hong Kong University of Science and Technology (Hong Kong)

  • Degree-granting institution Hong Kong University of Science and Technology (Hong Kong)
  • Subject Computer Science; Artificial Intelligence
  • Degree Ph.D.
  • Year 2011
  • Pages 130 p.
  • Total pages 130
  • Original format PDF
  • Language English
  • CLC classification
  • Keywords
