Conference: International Conference on Computing for Geospatial Research and Application

Organizing and Storing Method for Large-Scale Unstructured Data Set with Complex Content

Abstract

With the arrival of the big data era, traditional geological industries still produce and collect data in traditional ways, and geoscience information is represented as unstructured data in various forms. These data are often grouped together in a relatively simple way, forming a number of datasets with complex internal structure. However, this does not express well the rich geoscience information carried by unstructured data, makes it inconvenient to express complex relationships among pieces of information, and even hinders the discovery of in-depth knowledge across datasets. Meanwhile, the forms in which such data exist also impede the application of advanced technological methods. In an attempt to solve this problem, this paper proposes a multi-granularity content tree model and a pay-as-you-go mode to support evolving data modeling. These features help to split the data model, locate data content precisely, and expand the dimensions of the main features described according to the data subject, and then to gradually discover the information contained in the data and the relationships among that information. Considering the large size of the data features, this paper designs a data persistence mode based on HBase, so that the data can be processed using technologies within the Hadoop system. This paper also presents algorithms for data content extraction and for the initial state of the content tree under the MapReduce framework, as well as dynamic loading and local caching algorithms for the content tree, thus forming a basic extract-store-load process. An application example of the model in the geological industry is given at the end.
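
The abstract does not give the content tree's schema or its algorithms, so the following is only a minimal sketch of how a multi-granularity content tree with on-demand loading and a local cache might look. Everything in it is an assumption made for illustration: the names `ContentNode`, `ContentTreeLoader`, and `build_demo_store`, the path-like row keys, the granularity labels, and the choice of LRU eviction. A plain Python dict stands in for an HBase table, and no real HBase API is used.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class ContentNode:
    """One node of a hypothetical content tree (schema is assumed, not from the paper)."""
    row_key: str                                   # path-like key, e.g. "survey/report/sec1"
    granularity: str                               # e.g. "dataset", "document", "section"
    features: dict = field(default_factory=dict)   # open-ended attributes, extended as needed
    child_keys: list = field(default_factory=list)


class ContentTreeLoader:
    """Loads nodes on demand from a key-value store and keeps a small LRU cache."""

    def __init__(self, store, cache_size=2):
        self.store = store              # a plain dict standing in for an HBase table
        self.cache_size = cache_size
        self.cache = OrderedDict()      # row_key -> ContentNode, least recently used first
        self.store_reads = 0            # counts how often the backing store is hit

    def get(self, row_key):
        if row_key in self.cache:
            self.cache.move_to_end(row_key)        # mark as most recently used
            return self.cache[row_key]
        node = self.store[row_key]                 # simulated remote read
        self.store_reads += 1
        self.cache[row_key] = node
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)         # evict the least recently used node
        return node

    def children(self, row_key):
        """Load the children of a node only when they are requested."""
        return [self.get(key) for key in self.get(row_key).child_keys]


def build_demo_store():
    """Build a tiny three-level tree keyed by row key (demo data, entirely made up)."""
    nodes = [
        ContentNode("survey", "dataset", {"subject": "geology"}, ["survey/report"]),
        ContentNode("survey/report", "document", {}, ["survey/report/sec1"]),
        ContentNode("survey/report/sec1", "section", {"topic": "lithology"}),
    ]
    return {node.row_key: node for node in nodes}
```

In this sketch, a pay-as-you-go style of modeling would amount to adding entries to a node's `features` dict only when they are first needed, rather than fixing the full schema up front.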
