Mining Massive Moving Object Datasets from RFID Data Flow Analysis to Traffic Mining

Abstract

Effective management of moving object data, originating in supply chain operations, road network monitoring, and other RFID applications, is a major challenge facing society today, with important implications for business optimization, city planning, privacy, and national security. Toward the solution of this problem, I have developed a comprehensive framework for warehousing, mining, and cleaning large moving object data sets.

The proposed framework addresses the following key challenges present in object tracking applications: (1) Datasets are massive: a single large retailer may generate terabytes of moving object data per day. (2) Data is usually dirty: many tags are not detected at all, or are detected at the wrong location. (3) Dimensionality is very large: there are spatio-temporal dimensions defined by object trajectories, sensor-related dimensions such as temperature or humidity recorded at different locations, and item-level dimensions describing the attributes of each object. (4) Data analysis and mining need to navigate and discover interesting patterns at different levels of abstraction, involving a large number of interrelated records in multiple datasets.

At the core of my dissertation is the RFID data warehousing engine. It receives clean data from the cleaning engine and provides highly compressed data, at multiple levels of abstraction, to the mining engine. The mining engine is composed of three modules. The first mines commodity flow patterns that identify general flow trends and significant flow exceptions in a large supply chain operation. The second makes route recommendations based on observed driving behavior and traffic conditions. The third discovers and characterizes a wide variety of traffic anomalies on a road network.

RFID Data Warehousing. A data warehouse is an enterprise-level data repository that collects and integrates organizational data in order to provide decision support analysis. At the core of the data warehouse is the data cube, which computes an aggregate measure (e.g., sum, avg, count) for all possible combinations of dimensions of a fact table (e.g., sales for 2004, in the northeast). Online analytical processing (OLAP) operations provide the means for exploration and analysis of the data cube. My research in this direction has extended the data cube to handle moving object data sets by significantly compressing such data and proposing a new aggregation mechanism that preserves its path structure. The RFID warehouse is built around the concept of the movement graph, which records both spatio-temporal and item-level information in a compact model. We show that compression and query processing efficiency can be significantly improved by partitioning the movement graph around gateway nodes, which are special locations connecting different spatial regions in the graph.

RFID Data Cleaning. Efficient and accurate data cleaning is an essential task for the successful deployment of applications based on RFID technology, such as object tracking and inventory management systems. Most existing data cleaning approaches do not consider the overall cost of cleaning in an environment that possibly includes thousands of readers and millions of tags. We propose a cleaning framework that takes an RFID data set and a collection of cleaning methods, with associated costs, and induces a cleaning plan that optimizes the overall accuracy-adjusted cleaning costs. The cleaning plan determines the conditions under which inexpensive cleaning methods can be safely applied, the conditions under which more expensive methods are absolutely necessary, and the cases in which a combination of several methods is the optimal policy. Through a variety of experiments we show that our framework can achieve better accuracy, at a fraction of the cost, than applying any single technique.

Mining Flow Trends. An important application of moving objects is mining movement patterns of objects in supply chain operations. In this context, one may ask questions regarding correlations between time spent at quality control locations and laptop return rates, salient characteristics of dairy products discarded from stores, or ships that spent abnormally long periods at intermediate ports before arrival. The gigantic size of such data and the diversity of queries over flow patterns pose great challenges to traditional workflow induction and analysis technologies, since processing may involve retrieval and reasoning over a large number of inter-related tuples through different stages of object movements. Creating a complete workflow that records all possible commodity movements and that incorporates time would be prohibitively expensive, since there can be billions of different location and time combinations. I propose the FlowGraph, a compressed probabilistic workflow that captures the general flow trends and significant exceptions of a data set. The FlowGraph achieves compression by recording the set of major flow trends and the set of non-redundant flow exceptions (i.e., abnormal transitions or durations) present in the data. I extended the concept of the FlowGraph to incorporate multiple levels of abstraction of object and path characteristics, and defined the FlowCube, a data cube that records FlowGraphs as measures and allows OLAP reasoning on object flows.

Mining Route Recommendations. Modern highway networks provide several mechanisms for automatic vehicle identification. The most common are the use of toll collection transponders to detect vehicles at multiple points in the network, and the use of cameras to automatically identify license plates. Such information provides valuable patterns useful to online navigation systems and route planning applications. Most existing route planning applications use a fastest-path algorithm based on static or dynamic models of road speeds, but such models in general disregard observed driver behavior and other important factors such as weather, car-pool availability, or vehicle type. Existing solutions may, for example, provide a route that is the fastest one but that goes through a high-crime area and is thus avoided by experienced drivers. We propose a traffic-mining-based path-finding method that mines speed and driving models from historic traffic data, and uses them to compute fast routes that are well supported by historic driving behavior under the set of relevant driving and traffic conditions.

Mining Traffic Anomalies. Identification and characterization of traffic anomalies on massive road networks is a vital component of traffic monitoring. Anomaly identification can be used to reduce congestion, increase safety, and provide transportation engineers with better information for traffic forecasting and road network design. However, due to the size, complexity, and dynamics of such transportation networks, it is challenging to automate the process. We propose a multi-dimensional mining framework that can be used to identify a concise set of anomalies from massive traffic monitoring data, and to further overlay, contrast, and explore such anomalies in multi-dimensional space.
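As a rough illustration of the FlowGraph idea described in the abstract (recording major flow trends separately from rare transitions), the following minimal Python sketch estimates transition probabilities between locations from observed item paths and splits them by a probability threshold. It is not the dissertation's algorithm: the function name `build_flow_sketch`, the `min_prob` threshold, the input format, and the sample locations are all assumptions made for illustration, and durations and redundancy pruning of exceptions are ignored.

```python
from collections import Counter, defaultdict


def build_flow_sketch(paths, min_prob=0.2):
    """Split observed location transitions into frequent trends and rare exceptions.

    paths: list of location sequences, one per tracked item (hypothetical format).
    min_prob: assumed cutoff; transitions at or above it count as trends.
    Returns (trends, exceptions), each mapping (src, dst) -> estimated probability.
    """
    # Count how often each destination follows each source location.
    counts = defaultdict(Counter)
    for path in paths:
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1

    trends, exceptions = {}, {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        for dst, n in dsts.items():
            prob = n / total
            target = trends if prob >= min_prob else exceptions
            target[(src, dst)] = prob
    return trends, exceptions


# Illustrative data: nine items follow the usual route, one is returned.
sample_paths = [["factory", "warehouse", "store"]] * 9 + [
    ["factory", "warehouse", "returns"]
]
trends, exceptions = build_flow_sketch(sample_paths)
print(trends)
print(exceptions)
```

With this sample data, the rare warehouse-to-returns transition falls below the assumed threshold and is reported separately from the dominant flow. A fuller model along the lines of the abstract would also track durations at each location and keep only exceptions that are not implied by others.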

Bibliographic record

  • Author

    Gonzalez, Hector

  • Affiliation
  • Year 2008
  • Total pages
  • Original format PDF
  • Language English
  • CLC classification
