Organizing and Storing Method for Large-Scale Unstructured Data Set with Complex Content

Abstract

With the arrival of the big data era, traditional geological industries still produce and collect data in traditional ways, and geoscience information is represented as unstructured data in various forms. These data are often grouped together in a relatively simple way, forming a number of datasets with complex internal structure. However, this grouping does not express the rich geoscience information carried by unstructured data well, makes it inconvenient to express complex relationships among pieces of information, and even hinders the discovery of in-depth knowledge across datasets. Meanwhile, the forms in which such data exist also impede the application of advanced technological methods. In an attempt to solve this problem, this paper proposes a multi-granularity content tree model and a pay-as-you-go mode to support evolutionary data modeling. These features help to split the data model, to position data content precisely, and to expand the dimensions of the main features described according to the data subject, and then to gradually discover the information contained in the data and the relationships among that information. Considering the large size of the data features, this paper designs a data persistence mode based on HBase, so that the data can be processed with technologies from the Hadoop ecosystem. The paper also presents algorithms for data content extraction and for building the initial state of the content tree under the MapReduce framework, as well as algorithms for dynamic loading and local caching of the content tree, thus forming a basic extract-store-load process. An application example of the model in the geological industry is given at the end.
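
The abstract does not give the paper's actual schema or algorithms, so the following is only a minimal sketch of the general idea, under stated assumptions: a content tree whose nodes are stored under row keys, loaded on demand, and kept in a small local cache. Every name here (ContentNode, DictStore, ContentTree, add_feature) is hypothetical, the slash-separated path used as a row key is an assumption, the in-memory DictStore merely stands in for an HBase table, and the least-recently-used eviction policy is one possible caching choice rather than the one the authors describe.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class ContentNode:
    """One node of a content tree; deeper nodes hold finer-grained content."""
    path: str                                        # hypothetical row key, e.g. "report/ch1"
    features: dict = field(default_factory=dict)     # extensible feature dimensions
    child_names: list = field(default_factory=list)  # names of direct children


class DictStore:
    """In-memory stand-in for a row-keyed table (assumption: HBase would play this role)."""

    def __init__(self):
        self.rows = {}
        self.reads = 0  # counts store lookups so cache hits can be observed

    def put(self, node):
        self.rows[node.path] = node

    def get(self, path):
        self.reads += 1
        return self.rows[path]


class ContentTree:
    """Loads nodes on demand and keeps recently used ones in a local LRU cache."""

    def __init__(self, store, cache_size=2):
        self.store = store
        self.cache = OrderedDict()
        self.cache_size = cache_size

    def load(self, path):
        if path in self.cache:
            self.cache.move_to_end(path)    # mark as most recently used
            return self.cache[path]
        node = self.store.get(path)         # cache miss: read from the store
        self.cache[path] = node
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the least recently used node
        return node

    def children(self, path):
        node = self.load(path)
        return [self.load(f"{path}/{name}") for name in node.child_names]

    def add_feature(self, path, name, value):
        """Pay-as-you-go step: attach a new feature dimension to an existing node."""
        node = self.load(path)
        node.features[name] = value
        self.store.put(node)
        return node
```

In this sketch, only the nodes that a caller actually visits are read from the store, and new feature dimensions can be attached to a node later without redefining the whole model, which is the incremental behaviour the abstract attributes to the pay-as-you-go mode.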

Bibliographic record

Source
International Conference on Computing for Geospatial Research and Application | 2014 | pp. 70-76 | 7 pages
Authors
Wei Dongqi; Li Chaoling; Naheman Wumuti; Wei Jianxin; Yang Junlu
Original format
PDF
Keywords
Data Model; Geosciences Information; Large-scale Data; Unstructured Data
