International Conference on Database Systems for Advanced Applications

Parallel Top-k Query Processing on Uncertain Strings Using MapReduce


Abstract

The top-k query is an essential operator for data analysis over string collections. When the data are uncertain, however, efficient query processing over large-scale uncertain strings calls for new parallel algorithms. In this paper, we propose MUSK, a MapReduce-based parallel algorithm for answering top-k queries over large-scale uncertain strings. MUSK uses probabilistic n-grams to generate key-value pairs. To improve performance, we derive a novel lower bound on the expected edit distance, based on a newly defined function called the gram mapping distance, and use it to prune strings. Integrating this bound with TA strengthens filtering in the Map stage and thereby reduces the transmission cost. Comprehensive experiments on both real-world and synthetic datasets show that MUSK outperforms the baseline approach, with speedups of up to 6 times in the best case, and indicate good scalability over large datasets.
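
To make the terms in the abstract concrete, the following is a minimal single-machine sketch, not the MUSK algorithm itself. It assumes a character-level uncertainty model (each position holds a distribution over characters), which the abstract does not specify, and it computes the expected edit distance by brute-force enumeration of possible worlds rather than with the paper's lower bound or TA. All names here (`possible_worlds`, `expected_edit_distance`, `map_ngrams`, `top_k`) are hypothetical and chosen for illustration only.

```python
import heapq
from itertools import product


def edit_distance(a, b):
    """Classic Levenshtein distance using a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def possible_worlds(ustring):
    """Yield (instance, probability) for every instantiation.

    Assumption: `ustring` is a list of {char: prob} dicts, one per
    position, with positions treated as independent.
    """
    for combo in product(*(pos.items() for pos in ustring)):
        word, prob = "", 1.0
        for ch, p in combo:
            word += ch
            prob *= p
        yield word, prob


def expected_edit_distance(ustring, query):
    """Exact expected edit distance; exponential in the string length."""
    return sum(p * edit_distance(w, query)
               for w, p in possible_worlds(ustring))


def map_ngrams(sid, ustring, n=2):
    """Hypothetical map step: emit (gram, (sid, position, prob)) pairs."""
    for i in range(len(ustring) - n + 1):
        for gram, p in possible_worlds(ustring[i:i + n]):
            yield gram, (sid, i, p)


def top_k(collection, query, k):
    """Return the k (expected distance, string id) pairs with smallest distance."""
    scored = ((expected_edit_distance(u, query), sid)
              for sid, u in collection.items())
    return heapq.nsmallest(k, scored)


collection = {
    "s1": [{"c": 1.0}, {"a": 0.5, "o": 0.5}, {"t": 1.0}],
    "s2": [{"d": 1.0}, {"o": 1.0}, {"g": 1.0}],
}
result = top_k(collection, "cat", 1)
grams = list(map_ngrams("s1", collection["s1"], n=2))
```

The enumeration cost grows exponentially with the number of uncertain positions, which is the kind of expense that a pruning bound applied before the exact computation is meant to avoid.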
