
Detection of Outliers in Spatial-temporal Data.



Abstract

Outlier detection is an important data mining task that focuses on discovering objects that deviate significantly from a set of observations considered typical. It can reveal objects that behave anomalously with respect to other observations, and these objects may highlight current or future problems.

Previous outlier detection methods have focused primarily on a single non-spatial numerical attribute and have not successfully handled multiple dimensions. Many previous methods assume that the data follow a Gaussian distribution, an assumption that is probably a major flaw when determining outliers in spatial-temporal data. Most previous efforts did not provide a statistical confidence measure, although including one should improve outlier detection. Outlier detection is often complicated by noise in the data, so a good methodology should succeed in identifying outliers in noisy data. Global outlier methods calculate a single outlier statistic that summarizes the outliers for the entire geographic area and temporal duration, while local outlier methods calculate an outlier statistic for each feature based on its similarity to its neighbors. Previous methods have not been able to detect outliers as the attribute vector, location, and time all change.

The objective of my research is to devise a methodology that addresses these problems and challenges: to develop a robust method for diagnosing outliers and to extend it to detecting outliers in spatial-temporal data. A spatial-temporal outlier is an observation whose values differ significantly from those of the other spatially and temporally referenced objects in its spatial-temporal neighborhood.

Geographic phenomena are difficult to analyze with traditional data mining methods. Human analysts cannot determine relationships among phenomena as they move and change over time simply by inspecting spatial-temporal data streams. In addition, the volume of spatial-temporal data being collected is growing steadily because of cameras, sensors, and mobile devices (e.g., cell phones), and it is far too large for humans to analyze.

Unlike many detection methods in the literature, my method does not require the user to specify the number or percentage of outliers to be found, and it does not assume any particular distribution of the data (e.g., Gaussian). It requires only two input parameters, the statistical confidence level and the number of nearest neighbors, and of these only the statistical confidence level has a significant effect. The method supports different measures of the degree of nonconformity and works for high-dimensional data, noisy data, and data with or without clustering information.

The basic outlier detection method was extended to spatial-temporal data by using separate kernels for the attribute vector, space, and time. These kernels make it possible to focus outlier detection on local neighborhoods, and the user can assign a weight to each kernel. Local spatial-temporal outliers are outliers determined within a specific spatial area and time frame that form a subset of the entire spatial area and total temporal duration.

Empirical evaluation was conducted on several datasets of increasing complexity and dimensionality, with very good results. On these datasets, my method achieved a high true positive percentage and a low false positive percentage.
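The abstract describes the method only at a high level: a nonconformity score based on the k nearest neighbors, a statistical confidence level in place of a fixed outlier count or distributional assumption, and user-weighted kernels for attributes, space, and time. The sketch below shows one way such pieces can fit together; it is not the dissertation's algorithm. It assumes Gaussian kernels, a nonconformity score equal to the sum of the k smallest combined kernel dissimilarities, and an empirical (conformal-style) p-value compared against 1 - confidence. The record layout and every function name (`kernel_dissimilarity`, `st_dissimilarity`, `knn_nonconformity`, `detect_outliers`, `demo_records`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical record layout (an assumption): (attribute vector, (x, y) location, time).
Record = Tuple[Sequence[float], Tuple[float, float], float]


def kernel_dissimilarity(a: Sequence[float], b: Sequence[float], bandwidth: float) -> float:
    """Gaussian-kernel dissimilarity 1 - exp(-||a - b||^2 / (2 h^2)); 0 for identical inputs."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return 1.0 - math.exp(-sq / (2.0 * bandwidth ** 2))


def st_dissimilarity(r: Record, s: Record, weights=(1.0, 1.0, 1.0),
                     bandwidths=(1.0, 1.0, 1.0)) -> float:
    """User-weighted sum of attribute, spatial, and temporal kernel dissimilarities."""
    w_attr, w_space, w_time = weights
    h_attr, h_space, h_time = bandwidths
    return (w_attr * kernel_dissimilarity(r[0], s[0], h_attr)
            + w_space * kernel_dissimilarity(r[1], s[1], h_space)
            + w_time * kernel_dissimilarity((r[2],), (s[2],), h_time))


def knn_nonconformity(data: List[Record], k: int, **kw) -> List[float]:
    """Nonconformity of each record: sum of dissimilarities to its k nearest neighbors."""
    scores = []
    for i, r in enumerate(data):
        dists = sorted(st_dissimilarity(r, s, **kw) for j, s in enumerate(data) if j != i)
        scores.append(sum(dists[:k]))
    return scores


def detect_outliers(data: List[Record], k: int = 3, confidence: float = 0.95, **kw) -> List[int]:
    """Indices of records whose empirical p-value falls below 1 - confidence."""
    scores = knn_nonconformity(data, k, **kw)
    n = len(scores)
    outliers = []
    for i, a in enumerate(scores):
        # Fraction of records (including record i) that are at least as nonconforming.
        p_value = sum(1 for b in scores if b >= a) / n
        if p_value < 1.0 - confidence:
            outliers.append(i)
    return outliers


def demo_records() -> List[Record]:
    """Hypothetical demo: 25 hourly readings on a 5 x 5 sensor grid plus one anomaly."""
    records: List[Record] = [
        ([10.0 + 0.1 * (i % 3)], (float(i % 5), float(i // 5)), float(i)) for i in range(25)
    ]
    # A reading at the grid center whose attribute deviates sharply from its neighbors.
    records.append(([50.0], (2.0, 2.0), 12.0))
    return records


if __name__ == "__main__":
    print(detect_outliers(demo_records(), k=3, confidence=0.95))  # prints [25]
```

Because the empirical p-value moves in steps of 1/n, a 95% confidence level can flag a record only when n > 20; in the demo (n = 26) the injected reading is the only record whose p-value, 1/26, falls below 0.05. Changing `weights` shifts how much the attribute, spatial, and temporal kernels each contribute to the neighborhood used for scoring.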

Bibliographic details

  • Author: Rogers, James P.
  • Affiliation: George Mason University
  • Degree-granting institution: George Mason University
  • Discipline: Information Technology
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 110
  • Original format: PDF
  • Language: English
