A fast and vectorizable alternative to binary search in O(1) with wide applicability to arrays of floating point numbers

Fabio Cannizzo

首页> 外文期刊>Journal of Parallel and Distributed Computing >A fast and vectorizable alternative to binary search in O(1) with wide applicability to arrays of floating point numbers

【24h】

A fast and vectorizable alternative to binary search in O(1) with wide applicability to arrays of floating point numbers

机译：O（1）中二进制搜索的一种快速且可矢量化的替代方法，广泛适用于浮点数数组

获取原文

获取原文并翻译 | 示例

开具论文收录证明 >>

页面导航

摘要
著录项
引文网络
相似文献
相关主题

摘要

Abstract

Given an arrayXofN+1strictly ordered floating point numbers and a floating point numberzbelonging to the interval[X0,XN), a common problem in numerical methods is to find the indexiof the interval[Xi,Xi+1)containingz, i.e. the index of the largest number in the arrayXwhich is smaller or equal thanz. This problem arises for instance in the context of spline interpolation or the computation of empirical probability distribution from empirical data. Often it needs to be solved for a large number of different valueszand the same arrayX, which makes it worth investing resources upfront in pre-processing the arrayXwith the goal of speeding up subsequent search operations. In some cases the valueszto be processed are known simultaneously in blocks of sizeM, which offers the opportunity to solve the problem vectorially, exploiting the parallel capabilities of modern CPUs. The common solution is to sequentially invokeMtimes the well knownbinary searchalgorithm, which has complexityO(log2N)per individual search and, in its classic formulation, is not vectorizable, i.e. it is not SIMD friendly. This paper describes technical improvements to thebinary searchalgorithm, which make it faster and vectorizable. Next it proposes a new vectorizable algorithm, based on an indexing technique, applicable to a wide family ofXpartitions, which solves the problem with complexityO(1)per individual search at the cost of introducing an initial overhead to compute the index and requiring extra memory for its storage. Test results using streaming SIMD extensions compare the performance of the algorithm versus various benchmarks and demonstrate its effectiveness. Depending on the test case, the algorithm can produce a throughput up to two orders of magnitude larger than the classicbinary search. Applicability limitations and cache-friendliness related aspects are also discussed.

Highlights

•The problem of finding the insertion point in a sorted numerical array is discussed.

•Optimized and vectorizable variations of the binary search algorithm are proposed.

•A new algorithm significantly faster with complexityO(1)is proposed.

•Applicability and limitations of the new algorithm are discussed.

•Performance test results including various SIMD implementations are presented.

机译：

摘要

给出一个数组 X N + 1 严格排序的浮点数和浮点数 z 属于区间 [ X 0 ， X N < / mml：mi> ），数值方法中的常见问题是找到索引 i 间隔 [ X i ， X i + 1 ）包含 z ，即最大的索引数组中的数字 X ，小于或等于 z 。例如，在样条插值或根据经验数据计算经验概率分布的情况下会出现此问题。通常需要解决大量不同的值 z 和相同的数组 X ，因此值得在预处理数组 X ，目的是加快后续搜索操作的速度。在某些情况下，值 z 在大小为 < mml：mi> M ，可以利用现代CPU的并行功能来矢量地解决问题。常见的解决方案是依次调用 M 乘以众所周知的二进制搜索算法，其复杂度 O （ l o g < mml：mn> 2 N ）逐个搜索，并且按照其经典格式，不能向量化，即，它不是SIMD友好的。本文介绍了对二进制搜索算法的技术改进，该算法使其速度更快且可向量化。接下来，它提出了一种基于索引技术的新矢量化算法，适用于广泛的 X 分区，这解决了复杂性问题 < mml：mi> O （ 1 ）进行单个搜索的代价是引入了计算索引的初始开销，并且需要额外的存储空间来存储索引。使用流SIMD扩展的测试结果将算法的性能与各种基准进行了比较，并证明了其有效性。根据测试用例，该算法可以产生比经典二进制搜索大两个数量级的吞吐量。还讨论了适用性限制和与缓存友好性有关的方面。

突出显示

• 讨论了在排序的数值数组中查找插入点的问题。 < / ce：list-item> • 提出了二进制搜索算法的优化和矢量化变体。

• 采用复杂度 O （ 1 ）。

• 讨论了新算法的适用性和局限性。

• 给出了包括各种SIMD实现的性能测试结果。

著录项

来源
《Journal of Parallel and Distributed Computing》 |2018年第3期|37-54|共18页
作者
Fabio Cannizzo;
展开▼
作者单位

展开▼
收录信息
原文格式 PDF
正文语种 eng
中图分类
关键词

相似文献

外文文献
中文文献
专利

1. A SEARCH FOR FAST RADIO BURSTS AT LOW FREQUENCIES WITH MURCHISON WIDEFIELD ARRAY HIGH TIME RESOLUTION IMAGING [J] . S. J. Tingay12, C. M. Trott12, R. B. Wayth12, The Astrophysical journal . 2015,第6期

机译：利用Murchison宽阵列高分辨率成像进行低频快速广播搜索
2. Erratum: Pulsar Timing Array Based Search for Supermassive Black Hole Binaries in the Square Kilometer Array Era [Phys. Rev. Lett. 118, 151104 (2017)] [J] . Wang Yan, Mohanty Soumya D. Physical review letters . 2020,第16期

机译：错误：基于Pulsar时序阵列的Square公里阵列ERA中的超迹象黑洞二进制文件[PHY。 rev. lett。 118,151104（2017）]
3. Pulsar Timing Array Based Search for Supermassive Black Hole Binaries in the Square Kilometer Array Era [J] . Wang Yan, Mohanty Soumya D. Physical review letters . 2017,第15期

机译：在平方公里阵列时代中基于脉冲星时序阵列的超大质量黑洞二元搜索
4. Fast AFC technique using a code estimation and binary search algorithm for wideband frequency synthesis [C] . Kun-Seok Lee, Eun-Yung Sung, In-Chul Hwang, . 2005

机译：使用码估计和二进制搜索算法的快速AFC技术进行宽带频率合成
5. Design and analysis of scalable floating point fast Fourier transforms on field programmable gate arrays. [D] . Montano Martinez, Victor Brian. 2010

机译：在现场可编程门阵列上设计和分析可伸缩浮点快速傅里叶变换。
6. A Fast Framework for Abrupt Change Detection Based on Binary Search Trees and Kolmogorov Statistic [O] . Jin-Peng Qi, Jie Qi, Qing Zhang 2016

机译：基于二叉搜索树和Kolmogorov统计量的突变检测快速框架
7. Fast and Vectorizable Alternative to Binary Search in O(1) Applicable to a Wide Domain of Sorted Arrays of Floating Point Numbers [O] . Cannizzo, Fabio 2017

机译：O（1）中二进制搜索的快速可矢量替代方法一个宽域的浮点数排序数组

A fast and vectorizable alternative to binary search in O(1) with wide applicability to arrays of floating point numbers

摘要

著录项

引文网络

相似文献

相关主题

期刊订阅