
A visual analytics approach to comparing cohorts of event sequences.



Abstract

Sequences of timestamped events are currently being generated across nearly every domain of data analytics, from e-commerce web logging to electronic health records used by doctors and medical researchers. Every day, this data type is reviewed by humans who apply statistical tests, hoping to learn everything they can about how these processes work, why they break, and how they can be improved upon.

To further uncover why these processes work the way they do, researchers often compare two groups, or cohorts, of event sequences to find the differences and similarities between outcomes and processes. With temporal event sequence data, this task is complex because of the variety of ways single events and sequences of events can differ between the two cohorts of records: the structure of the event sequences (e.g., event order, co-occurring events, or frequencies of events), the attributes of the events and records (e.g., gender of a patient), or metrics about the timestamps themselves (e.g., duration of an event). Running statistical tests to cover all these cases and determining which results are significant becomes cumbersome.

Current visual analytics tools for comparing groups of event sequences emphasize either a purely statistical or a purely visual approach to comparison. Visual analytics tools leverage humans' ability to easily see patterns and anomalies that they were not expecting, but are limited by uncertainty in findings. Statistical tools emphasize finding significant differences in the data, but often require researchers to have a concrete question and do not facilitate more general exploration of the data.

Combining visual analytics tools with statistical methods leverages the benefits of both approaches for quicker and easier insight discovery. Integrating statistics into a visualization tool presents many challenges on the frontend (e.g., displaying the results of many different metrics concisely) and in the backend (e.g., scalability challenges with running various metrics on multi-dimensional data at once). I begin by exploring the problem of comparing cohorts of event sequences and understanding the questions that analysts commonly ask in this task. From there, I demonstrate that combining automated statistics with an interactive user interface amplifies the benefits of both types of tools, thereby enabling analysts to conduct quicker and easier data exploration, hypothesis generation, and insight discovery. The direct contributions of this dissertation are: (1) a taxonomy of metrics for comparing cohorts of temporal event sequences, (2) a statistical framework for exploratory data analysis with a method I refer to as high-volume hypothesis testing (HVHT), (3) a family of visualizations and guidelines for interaction techniques that are useful for understanding and parsing the results, and (4) a user study, five long-term case studies, and five short-term case studies which demonstrate the utility and impact of these methods in various domains: four in the medical domain, one in web log analysis, two in education, and one each in social networks, sports analytics, and security.

My dissertation contributes an understanding of how cohorts of temporal event sequences are commonly compared and the difficulties associated with applying and parsing the results of these metrics. It also contributes a set of visualizations, algorithms, and design guidelines for balancing automated statistics with user-driven analysis to guide users to significant, distinguishing features between cohorts. This work opens avenues for future research in comparing two or more groups of temporal event sequences, opening traditional machine learning and data mining techniques to user interaction, and extending the principles found in this dissertation to data types beyond temporal event sequences.
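
The abstract does not spell out how high-volume hypothesis testing is implemented, so the following is only an illustrative sketch of the general idea of running one test per metric across two cohorts and then filtering for significance. Everything in it is an assumption made for illustration: the record layout (a list of `(timestamp, event_type)` tuples), the choice of a single metric (prevalence of each event type), the pooled two-proportion z-test, the Bonferroni correction, and all function names.

```python
from math import erf, sqrt


def count_records_with(cohort, event_type):
    """Number of records in the cohort that contain the event at least once."""
    return sum(any(e == event_type for _, e in record) for record in cohort)


def two_proportion_p_value(k1, n1, k2, n2):
    """Two-sided p-value of a pooled two-proportion z-test (normal approximation)."""
    pooled = (k1 + k2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Both cohorts have identical, degenerate prevalence (all or none).
        return 1.0
    z = (k1 / n1 - k2 / n2) / se
    # 2 * (1 - Phi(|z|)) simplifies to 1 - erf(|z| / sqrt(2)).
    return 1 - erf(abs(z) / sqrt(2))


def compare_event_prevalence(cohort_a, cohort_b, alpha=0.05):
    """Run one test per event type and flag results that survive Bonferroni."""
    event_types = sorted(
        {e for cohort in (cohort_a, cohort_b) for record in cohort for _, e in record}
    )
    if not event_types:
        return []
    threshold = alpha / len(event_types)  # assumed Bonferroni correction
    results = []
    for event_type in event_types:
        k_a = count_records_with(cohort_a, event_type)
        k_b = count_records_with(cohort_b, event_type)
        p = two_proportion_p_value(k_a, len(cohort_a), k_b, len(cohort_b))
        results.append({
            "event": event_type,
            "prevalence_a": k_a / len(cohort_a),
            "prevalence_b": k_b / len(cohort_b),
            "p_value": p,
            "significant": p < threshold,
        })
    # Most distinguishing metrics first, as a ranked list for the analyst.
    return sorted(results, key=lambda r: r["p_value"])


def make_cohort(n_records, n_with_icu):
    """Synthetic cohort: every record has Admit and Discharge, some have ICU."""
    cohort = []
    for i in range(n_records):
        record = [(0, "Admit")]
        if i < n_with_icu:
            record.append((5, "ICU"))
        record.append((9, "Discharge"))
        cohort.append(record)
    return cohort


cohort_a = make_cohort(100, 80)
cohort_b = make_cohort(100, 20)
results = compare_event_prevalence(cohort_a, cohort_b)
for row in results:
    print(row)
```

A fuller version in the spirit of the abstract would add metrics for event order, co-occurrence, record attributes, and durations, each with a test suited to its data type; the ranking-and-filtering step would stay the same.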

Bibliographic record

  • Author: Malik, Sana
  • Author affiliation: University of Maryland, College Park
  • Degree-granting institution: University of Maryland, College Park
  • Subject: Computer science
  • Degree: Ph.D.
  • Year: 2016
  • Pages: 177
  • Original format: PDF
  • Language: English
  • Date indexed: 2022-08-17 11:50:24
