Realizing a feature-based framework for scientific data mining.

Abstract

This dissertation presents an efficient realization of a feature-based framework for analyzing scientific data. The main components of the framework are feature detection, feature classification, feature verification, and modeling the evolutionary behavior of the features. The usefulness of the first three steps is shown on datasets originating from computational molecular dynamics. Modeling the evolutionary behavior of the features involves: (i) understanding the trajectory of an individual feature; (ii) discovering the changes that features undergo due to various interactions; and (iii) understanding and deriving various spatio-temporal relationships among features.

A rule-based feature detection algorithm extracts the features. The rules are developed from domain-specific properties. The algorithm is highly robust in the presence of noise: the features detected from noisy datasets are consistent with the features detected from noise-free data.

The trajectory of a feature is represented using physically meaningful parameters: linear velocity, angular velocity, and scale parameters. Most existing techniques abstract the feature to a single point and take into account only the change in position. The proposed representation scheme accounts for changes in the position, orientation, and size of the feature. The representation also aids in establishing various spatial and spatio-temporal relationships among the features. The usefulness of the scheme is evaluated on datasets originating from molecular dynamics and fluid flows.

The interactions among co-existing features are captured by a set of critical events: continuation, merging, bifurcation, creation, and dissipation. The algorithms establish correspondence among features based on the degree of overlap between the features in consecutive time steps.

Finally, a visual toolkit is developed that aids the user in establishing various spatial and spatio-temporal relationships. The toolkit achieves real-time performance. Its usefulness is shown on 2D fluid-flow datasets.

Before these algorithms were developed, manual analysis of a very small 100 MB dataset took around 6 weeks. Feature extraction and classification for a 10 GB molecular dynamics dataset can now be performed in 25 hours, which is faster than the 35 hours needed to generate the data. (Abstract shortened by UMI.)
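
The overlap-based correspondence and the five critical events named in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the abstract does not specify the overlap measure or any threshold, so the representation of a feature as a set of grid cells, the function name `classify_events`, and the `min_overlap` parameter are all assumptions made for illustration.

```python
from collections import defaultdict


def classify_events(prev, curr, min_overlap=0.1):
    """Label critical events between two consecutive time steps.

    prev, curr: dict mapping a feature id to the set of grid cells it
    occupies (a hypothetical representation, not taken from the source).
    min_overlap: assumed threshold on shared cells / size of the smaller
    feature; the abstract only says "degree of overlap".
    Returns a list of (event, ids_at_t, ids_at_t_plus_1) tuples.
    """
    succ = defaultdict(list)  # feature at t   -> overlapping features at t+1
    pred = defaultdict(list)  # feature at t+1 -> overlapping features at t
    for p_id, p_cells in prev.items():
        for c_id, c_cells in curr.items():
            shared = len(p_cells & c_cells)
            smaller = min(len(p_cells), len(c_cells))
            if smaller and shared / smaller >= min_overlap:
                succ[p_id].append(c_id)
                pred[c_id].append(p_id)

    events = []
    for p_id in prev:
        if not succ[p_id]:
            # No successor: the feature vanished.
            events.append(("dissipation", (p_id,), ()))
        elif len(succ[p_id]) > 1:
            # One feature overlaps several successors: it split.
            events.append(("bifurcation", (p_id,), tuple(sorted(succ[p_id]))))
    for c_id in curr:
        if not pred[c_id]:
            # No predecessor: the feature is new.
            events.append(("creation", (), (c_id,)))
        elif len(pred[c_id]) > 1:
            # Several predecessors overlap one feature: they merged.
            events.append(("merging", tuple(sorted(pred[c_id])), (c_id,)))
        elif len(succ[pred[c_id][0]]) == 1:
            # Exactly one predecessor with exactly one successor.
            events.append(("continuation", (pred[c_id][0],), (c_id,)))
    return events


if __name__ == "__main__":
    step_t = {"a": {(0, 0), (0, 1)}, "b": {(5, 5), (5, 6)}, "c": {(9, 9)}}
    step_t1 = {"x": {(0, 1), (0, 2), (5, 5)}, "y": {(20, 20)}}
    for event in classify_events(step_t, step_t1):
        print(event)
```

In the usage example, features `a` and `b` both overlap `x` (a merging), `c` has no successor (a dissipation), and `y` has no predecessor (a creation). Normalizing by the smaller feature is one possible choice; other overlap measures would change which correspondences pass the threshold.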

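Bibliographic record

  • Author

    Mehta, Sameep

  • Author affiliation

    The Ohio State University

  • Degree-granting institution: The Ohio State University
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2006
  • Pages: 197 p.
  • Total pages: 197
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Automation technology, computer technology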