An Efficient Parallel Approach of Parsing and Indexing for Large-Scale XML Datasets

Abstract

MapReduce is a widely adopted computing framework for data-intensive applications running on clusters. We propose an approach that exploits data parallelism in XML processing using MapReduce in Hadoop. Our solution integrates data storage, labelling, indexing, and parallel queries to process massive amounts of XML data. Specifically, we introduce an SDN labelling algorithm and a distributed hierarchical index using DHTs, and we develop an efficient data retrieval approach called B-SLCA. More importantly, we design a two-phase MapReduce solution that efficiently addresses labelling, indexing, and query processing on big XML data. We implemented our solution on a real-world Hadoop cluster and evaluated it on real-world datasets. Our experimental results show that SDN outperforms NCIM by up to a factor of 1.36 with an average of 1.17, and that B-SLCA outperforms BwdSLCA by up to a factor of 1.96 with an average of 1.2.

Bibliographic details

Source
IEEE International Conference on Parallel and Distributed Systems | 2016 | pp. 184-191 | 8 pages
Authors
Kunfang Song; Hongwei Lu; Xiao Qin
Original format: PDF
Keywords
XML; Labeling; Indexing; Semantics; Keyword search; Query processing; Encoding

