RFID Big Data Warehousing and Analytics in Cloud Computing Environment.

Abstract

Radio Frequency Identification (RFID) technology is a prevalent tool for tracking moving objects. In supply chain management systems, most major retailers use RFID systems to track the movement of products from suppliers to warehouses, store backrooms, and eventually points of sale. The amount of information generated by such systems can be enormous, since each individual item (a pallet, a box, or a SKU) leaves a trail of data as it moves to different locations. Data warehousing provides architectures and tools for business executives to systematically organize, understand, and use their data to make strategic decisions. Warehousing and analyzing massive RFID data sets is an important problem with great potential benefits for inventory management, object tracking, and product procurement processing.

Many industries that have been collecting digital data are having difficulty scaling up their systems because of the large size of the data. Because the data sets are so large and complex, they become difficult and expensive to process with traditional database management tools and data processing applications. Cloud computing services and big data platforms, such as Hadoop, can scale to handle much larger data sets.

In this thesis, I propose two RFID data warehouse designs, a normalized schema and a denormalized schema, that can handle massive amounts of RFID data and support a variety of OLAP queries as well as location- and path-related queries. This thesis implements the proposed schemas using a relational database system (PostgreSQL) and a big data platform (Hadoop/Hive), and then conducts performance tests on a cloud computing service. I closely studied how the schema designs, database systems, data storage formats, and the number of Hadoop nodes affected the performance of each type of query I implemented.

Many businesses are interested in switching from relational databases to big data platforms, expecting this to improve query performance. This thesis shows that a big data platform does not always outperform a relational database when there are fewer than a few billion records. Also, when the data set is not large enough, increasing the number of Hadoop nodes is not always effective, because wait time accounts for a larger share of the total run time than query time. Once the characteristics of the data and the database query optimizer are understood, there are extensive opportunities to improve query performance in both systems.
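
The abstract does not give the actual schemas, so the following is only a minimal sketch of the general idea of a normalized versus a denormalized layout for RFID stay records. Every table name, column name, and data value here (`product`, `location`, `stay`, `stay_flat`, `epc`, the sample rows) is a hypothetical assumption for illustration, and SQLite stands in for PostgreSQL and Hive so that the example is self-contained.

```python
import sqlite3


def build_demo():
    """Create an in-memory database holding both hypothetical layouts."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        -- Normalized layout (assumed): dimension tables plus a fact table.
        CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE location (location_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE stay (
            epc TEXT,
            product_id  INTEGER REFERENCES product(product_id),
            location_id INTEGER REFERENCES location(location_id),
            time_in INTEGER, time_out INTEGER
        );
        -- Denormalized layout (assumed): one wide table, no joins needed.
        CREATE TABLE stay_flat (
            epc TEXT, product_name TEXT, location_name TEXT,
            time_in INTEGER, time_out INTEGER
        );
    """)
    con.executemany("INSERT INTO product VALUES (?, ?)",
                    [(1, "milk"), (2, "soap")])
    con.executemany("INSERT INTO location VALUES (?, ?)",
                    [(1, "supplier"), (2, "warehouse"),
                     (3, "backroom"), (4, "point of sale")])
    # Made-up sample stays: (tag id, product, location, time in, time out).
    con.executemany("INSERT INTO stay VALUES (?, ?, ?, ?, ?)", [
        ("E1", 1, 1, 0, 10), ("E1", 1, 2, 12, 30),
        ("E1", 1, 3, 33, 40), ("E1", 1, 4, 41, 42),
        ("E2", 2, 1, 0, 5), ("E2", 2, 2, 8, 50),
    ])
    # Fill the wide table by joining the normalized tables once, up front.
    con.execute("""
        INSERT INTO stay_flat
        SELECT s.epc, p.name, l.name, s.time_in, s.time_out
        FROM stay s
        JOIN product p ON p.product_id = s.product_id
        JOIN location l ON l.location_id = s.location_id
    """)
    return con


def dwell_by_location_normalized(con):
    """OLAP-style aggregate on the normalized layout (needs a join)."""
    return con.execute("""
        SELECT l.name, SUM(s.time_out - s.time_in)
        FROM stay s JOIN location l ON l.location_id = s.location_id
        GROUP BY l.name ORDER BY l.name
    """).fetchall()


def dwell_by_location_flat(con):
    """The same aggregate on the denormalized layout (no join)."""
    return con.execute("""
        SELECT location_name, SUM(time_out - time_in)
        FROM stay_flat GROUP BY location_name ORDER BY location_name
    """).fetchall()


def path_of(con, epc):
    """Path query: locations visited by one tag, in order of arrival."""
    rows = con.execute(
        "SELECT location_name FROM stay_flat WHERE epc = ? ORDER BY time_in",
        (epc,),
    ).fetchall()
    return [r[0] for r in rows]


if __name__ == "__main__":
    con = build_demo()
    print(dwell_by_location_normalized(con))
    print(path_of(con, "E1"))
```

Both aggregate functions return the same rows. The trade-off this sketch is meant to show is that the denormalized table avoids joins at query time at the cost of repeating product and location names in every row.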

Bibliographic details

  • Author

    Woo, Yei-Sol.

  • Author affiliation

    Purdue University.

  • Degree-granting institution: Purdue University.
  • Subject: Computer science; Information technology.
  • Degree: M.S.
  • Year: 2015
  • Pages: 104 p.
  • Total pages: 104
  • Original format: PDF
  • Language: English
