Venue: International Conference on Advanced Cloud and Big Data
Fast Construction of an Index Tree for Large Non-ordered Discrete Datasets Using Multi-way Top-Down Split and MapReduce

Abstract

Effective indexing schemes are crucial for supporting efficient queries on large datasets from multidimensional Non-ordered Discrete Data Spaces (NDDSs) in many applications, such as genome sequence analysis in bioinformatics. Although constructing an index structure for a large dataset in an NDDS via a bulk loading technique is quite efficient compared with conventional tuple loading, existing bulk loading techniques cannot meet the scalability requirements imposed by the fast-growing dataset sizes in contemporary NDDS applications. To tackle this challenge, we propose a new bulk loading method for the fast construction of an index structure, called the PND-tree, for large datasets in NDDSs. Specifically, utilizing the characteristics of an NDDS and a priori knowledge of the given dataset, we suggest an effective multi-way top-down dataset split strategy with a MapReduce implementation for our bulk loading procedure. Experiments demonstrate that the proposed bulk loading method is quite promising in terms of index construction efficiency and resulting index quality, compared with the conventional tuple loading method and a popular serial bulk loading method for a state-of-the-art index tree in NDDSs.
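The abstract does not spell out the split strategy, so the following is only an illustrative sketch of what a multi-way top-down split expressed as map and reduce steps could look like. It is not the PND-tree algorithm. Everything in it is an assumption made for illustration: vectors are plain strings over a small alphabet, the hypothetical pick_dimension heuristic chooses the unused dimension with the most distinct letters, and map_phase and reduce_phase are ordinary in-process Python functions standing in for a real MapReduce job.

```python
from collections import defaultdict


def map_phase(vectors, dim):
    """Emit (key, vector) pairs; the key is the letter at the split dimension."""
    return [(v[dim], v) for v in vectors]


def reduce_phase(pairs):
    """Group all vectors that share a key into one partition."""
    groups = defaultdict(list)
    for key, vec in pairs:
        groups[key].append(vec)
    return dict(groups)


def pick_dimension(vectors, used):
    """Hypothetical heuristic: the unused dimension with the most distinct letters."""
    candidates = [d for d in range(len(vectors[0])) if d not in used]
    return max(candidates, key=lambda d: len({v[d] for v in vectors}))


def build(vectors, leaf_capacity, used=frozenset()):
    """Split top-down into as many children as there are letters at the chosen dimension."""
    # Stop when the partition fits in a leaf or no dimension is left to split on.
    if len(vectors) <= leaf_capacity or len(used) == len(vectors[0]):
        return {"leaf": sorted(vectors)}
    dim = pick_dimension(vectors, used)
    parts = reduce_phase(map_phase(vectors, dim))
    children = {
        key: build(part, leaf_capacity, used | {dim})
        for key, part in sorted(parts.items())
    }
    return {"dim": dim, "children": children}


def collect(node):
    """Return every vector stored under a node, in sorted order."""
    if "leaf" in node:
        return list(node["leaf"])
    out = []
    for child in node["children"].values():
        out.extend(collect(child))
    return sorted(out)


if __name__ == "__main__":
    data = ["acgt", "aagt", "ccgt", "cagt", "gcgt", "tagt"]
    tree = build(data, leaf_capacity=2)
    print(tree["dim"], sorted(tree["children"]))
```

In this toy version each level produces one child per distinct letter, which is what makes the split multi-way rather than binary. A real bulk loader would also have to bound fan-out, balance partition sizes, and build the bounding information that the index nodes need, none of which is attempted here.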