Venue: IEEE International Conference on Parallel and Distributed Systems

An Efficient Parallel Approach of Parsing and Indexing for Large-Scale XML Datasets

Abstract

MapReduce is a widely adopted computing framework for data-intensive applications running on clusters. We propose an approach that exploits data parallelism in XML processing using MapReduce in Hadoop. Our solution seamlessly integrates data storage, labelling, indexing, and parallel queries to process massive amounts of XML data. Specifically, we introduce an SDN labelling algorithm and a distributed hierarchical index using DHTs, and we develop an efficient data retrieval approach called B-SLCA. More importantly, we design an advanced two-phase MapReduce solution that efficiently addresses labelling, indexing, and query processing on big XML data. We implemented our solution on a real-world Hadoop cluster processing real-world datasets. Our experimental results show that SDN outperforms NCIM by up to a factor of 1.36, with an average of 1.17, and that B-SLCA outperforms BwdSLCA by up to a factor of 1.96, with an average of 1.2.
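
The abstract does not describe how SDN labels are built or how B-SLCA works, so the sketch below shows neither. It only illustrates the kind of query such a method answers, assuming "SLCA" has its usual meaning in XML keyword search: the smallest lowest common ancestors, that is, the deepest nodes whose subtrees contain a match for every keyword. The Dewey-style tuple labels, the function names `ancestors_or_self` and `naive_slca`, and the sample data are all hypothetical, and the approach is a naive single-machine reference, not a MapReduce job.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical Dewey-style label: the path of child positions from the root,
# e.g. (1, 2, 1) is the first child of the second child of the root (1,).
Label = Tuple[int, ...]


def ancestors_or_self(label: Label) -> Set[Label]:
    """Return the label itself plus every ancestor label (all non-empty prefixes)."""
    return {label[:i] for i in range(1, len(label) + 1)}


def naive_slca(matches: Dict[str, List[Label]]) -> List[Label]:
    """Naive SLCA: deepest nodes whose subtree holds a match for every keyword."""
    covering: Optional[Set[Label]] = None
    for labels in matches.values():
        # Nodes whose subtree contains at least one match for this keyword.
        covered: Set[Label] = set()
        for label in labels:
            covered |= ancestors_or_self(label)
        # Keep only nodes that cover every keyword seen so far.
        covering = covered if covering is None else covering & covered
    if not covering:
        return []
    # The covering set is closed under "parent of", so a node has a covering
    # descendant exactly when it is the parent of some covering node.
    non_minimal = {label[:-1] for label in covering if len(label) > 1}
    return sorted(covering - non_minimal)


# Made-up example: two <book> elements under the root, plus a stray match.
example = {
    "xml": [(1, 1, 1), (1, 2, 1)],    # titles of book 1 and book 2
    "smith": [(1, 1, 2), (1, 3)],     # author of book 1, and a node outside any book
}
print(naive_slca(example))  # [(1, 1)]
```

In this example only the first book, labelled `(1, 1)`, contains both keywords in its subtree; the root also contains both but is excluded because a descendant already qualifies.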
