Journal: Cluster Computing

Toward a new approach for sorting extremely large data files in the big data era


Abstract

The volume of data and content generated today will require a paradigm shift in how these data are processed and managed. Sorting is one of the most important data processing operations, and extremely large files are normally sorted with external algorithms such as merge sort. Because swapping time dominates in many applications that handle large files, algorithms that minimize swap operations are normally superior to those that focus only on optimizing CPU time. We show that making multiple passes over the data set in external merge sort, as our algorithm proposes, greatly reduces the number of swaps and therefore the overall sorting time. The proposed technique is also suited to emerging parallelization platforms such as GPUs. The reported results show the superiority of the proposed technique for both "CPU only" and hybrid CPU-GPU implementations.
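For readers unfamiliar with external merge sort, the following is a minimal sketch of the generic textbook two-phase version: sorted runs are written to disk, then merged in a single k-way pass. It is not the algorithm proposed in the paper, whose details the abstract does not give. The function names (`write_sorted_runs`, `merge_runs`, `external_sort`), the `run_size` parameter, and the input format (one integer per line, no blank lines) are all assumptions made for illustration.

```python
import heapq
import os
import tempfile
from itertools import islice


def write_sorted_runs(input_path, run_size, tmp_dir):
    """Phase 1: read run_size lines at a time, sort in memory, write each run to disk."""
    run_paths = []
    with open(input_path) as src:
        while True:
            # Assumed input format: one integer per line, no blank lines.
            chunk = [int(line) for line in islice(src, run_size)]
            if not chunk:
                break
            chunk.sort()
            fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run")
            with os.fdopen(fd, "w") as run:
                run.writelines(f"{value}\n" for value in chunk)
            run_paths.append(path)
    return run_paths


def merge_runs(run_paths, output_path):
    """Phase 2: k-way merge of the sorted runs, streaming values lazily from each run."""
    files = [open(path) for path in run_paths]
    try:
        streams = [(int(line) for line in f) for f in files]
        with open(output_path, "w") as out:
            # heapq.merge consumes its sorted inputs lazily and yields them in order.
            out.writelines(f"{value}\n" for value in heapq.merge(*streams))
    finally:
        for f in files:
            f.close()


def external_sort(input_path, output_path, run_size=100_000):
    """Sort a file of integers while holding at most run_size values in memory per run."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        runs = write_sorted_runs(input_path, run_size, tmp_dir)
        merge_runs(runs, output_path)
```

In this sketch `run_size` stands in for the available memory: a smaller value produces more runs and more disk traffic during the merge, which is the kind of I/O cost the abstract identifies as dominant for very large files.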
