Toward a new approach for sorting extremely large data files in the big data era

Shatnawi Ali; AlZahouri Yathrip; Shehab Mohammed A.; Jararweh Yaser; Al-Ayyoub Mahmoud

Abstract

The volume of data and content generated today will require a paradigm shift in the techniques used to process and manage it. Sorting is one of the most important data processing operations. Extremely large files are normally sorted with external algorithms such as external merge sort, and using multiple passes in external merge sort can substantially speed up the sorting of such files. Because swapping time dominates in many applications that handle large files, algorithms that minimize swapping operations are normally superior to those that focus only on optimizing CPU time. We show that making multiple passes over the data set, as proposed in our algorithm, greatly reduces the number of swaps and therefore the overall sorting time. Moreover, the proposed technique is suitable for use with emerging parallelization platforms such as GPUs. The reported results show the superiority of the proposed technique for both "CPU only" and hybrid CPU-GPU implementations.
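
The abstract does not give the details of the proposed algorithm, so the following is only a minimal sketch of the textbook external merge sort it builds on, not the authors' method. It assumes the input is a text file with one integer per line and no blank lines. The names external_sort, chunk_size and fan_in are hypothetical and chosen for illustration. The sketch first cuts the file into sorted runs that fit in memory, then merges at most fan_in runs at a time, so a large number of runs leads to several merge passes over the data.

```python
import heapq
import os
import tempfile
from itertools import islice


def _write_run(values, tmp_dir):
    """Write an iterable of ints to a new temporary run file, one per line."""
    fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run", text=True)
    with os.fdopen(fd, "w") as f:
        f.writelines(f"{v}\n" for v in values)
    return path


def _read_run(path):
    """Lazily yield the ints stored in a run file."""
    with open(path) as f:
        for line in f:
            yield int(line)


def external_sort(in_path, out_path, chunk_size=100_000, fan_in=8):
    """Sort a text file of integers (one per line) using bounded memory.

    chunk_size: records held in memory while creating the initial runs.
    fan_in: how many runs are merged at once; having more runs than this
            forces more than one merge pass over the data.
    Returns the number of merge passes performed.
    """
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Run generation: cut the input into sorted runs that fit in memory.
        runs = []
        with open(in_path) as f:
            while True:
                chunk = [int(line) for line in islice(f, chunk_size)]
                if not chunk:
                    break
                chunk.sort()
                runs.append(_write_run(chunk, tmp_dir))

        # Merge passes: combine up to fan_in runs at a time until one is left.
        passes = 0
        while len(runs) > 1:
            merged = []
            for i in range(0, len(runs), fan_in):
                group = runs[i:i + fan_in]
                stream = heapq.merge(*map(_read_run, group))
                merged.append(_write_run(stream, tmp_dir))
                for path in group:
                    os.remove(path)
            runs = merged
            passes += 1

        # Copy the single remaining run (if any) to the output file.
        with open(out_path, "w") as out:
            if runs:
                with open(runs[0]) as f:
                    for line in f:
                        out.write(line)
        return passes
```

In this sketch, fan_in trades the number of passes against the number of run files that are open at once: a larger value means fewer passes over the data but more simultaneous read buffers. How the paper's algorithm chooses its passes, and how it reduces swapping, is not stated in the abstract and is not modeled here.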

Bibliographic record

Source: Cluster Computing, 2019, Issue 3, 10 pages
Authors: Shatnawi Ali; AlZahouri Yathrip; Shehab Mohammed A.; Jararweh Yaser; Al-Ayyoub Mahmoud
Affiliation (all five authors): Jordan Univ Sci & Technol, Box 3030, Irbid 22110, Jordan
Original format: PDF
Language: English
Chinese Library Classification: Molecular biology
Keywords: Big data; Sorting; External merge sort; Large file processing; Hybrid CPU-GPU
