首页> 外文期刊>Journal of Parallel and Distributed Computing >A fast and vectorizable alternative to binary search in O(1) with wide applicability to arrays of floating point numbers
【24h】

A fast and vectorizable alternative to binary search in O(1) with wide applicability to arrays of floating point numbers

机译:O(1)中二进制搜索的一种快速且可矢量化的替代方法,广泛适用于浮点数数组

获取原文
获取原文并翻译 | 示例
获取外文期刊封面目录资料

摘要

AbstractGiven an arrayXofN+1strictly ordered floating point numbers and a floating point numberzbelonging to the interval[X0,XN), a common problem in numerical methods is to find the indexiof the interval[Xi,Xi+1)containingz, i.e. the index of the largest number in the arrayXwhich is smaller or equal thanz. This problem arises for instance in the context of spline interpolation or the computation of empirical probability distribution from empirical data. Often it needs to be solved for a large number of different valueszand the same arrayX, which makes it worth investing resources upfront in pre-processing the arrayXwith the goal of speeding up subsequent search operations. In some cases the valueszto be processed are known simultaneously in blocks of sizeM, which offers the opportunity to solve the problem vectorially, exploiting the parallel capabilities of modern CPUs. The common solution is to sequentially invokeMtimes the well knownbinary searchalgorithm, which has complexityO(log2N)per individual search and, in its classic formulation, is not vectorizable, i.e. it is not SIMD friendly. This paper describes technical improvements to thebinary searchalgorithm, which make it faster and vectorizable. Next it proposes a new vectorizable algorithm, based on an indexing technique, applicable to a wide family ofXpartitions, which solves the problem with complexityO(1)per individual search at the cost of introducing an initial overhead to compute the index and requiring extra memory for its storage. Test results using streaming SIMD extensions compare the performance of the algorithm versus various benchmarks and demonstrate its effectiveness. Depending on the test case, the algorithm can produce a throughput up to two orders of magnitude larger than the classicbinary search. Applicability limitations and cache-friendliness related aspects are also discussed.HighlightsThe problem of finding the insertion point in a sorted numerical array is discussed.Optimized and vectorizable variations of the binary search algorithm are proposed.A new algorithm significantly faster with complexityO(1)is proposed.Applicability and limitations of the new algorithm are discussed.Performance test results including various SIMD implementations are presented.
机译: 摘要 给出一个数组 X N + 1 严格排序的浮点数和浮点数 z 属于区间 [ X 0 X N < / mml:mi> ,数值方法中的常见问题是找到索引 i 间隔 [ X i X i + 1 包含 z ,即最大的索引数组中的数字 X ,小于或等于 z 。例如,在样条插值或根据经验数据计算经验概率分布的情况下会出现此问题。通常需要解决大量不同的值 z 和相同的数组 X ,因此值得在预处理数组 X ,目的是加快后续搜索操作的速度。在某些情况下,值 z 在大小为 < mml:mi> M ,可以利用现代CPU的并行功能来矢量地解决问题。常见的解决方案是依次调用 M 乘以众所周知的二进制搜索算法,其复杂度 O l o g < mml:mn> 2 N 逐个搜索,并且按照其经典格式,不能向量化,即,它不是SIMD友好的。本文介绍了对二进制搜索算法的技术改进,该算法使其速度更快且可向量化。接下来,它提出了一种基于索引技术的新矢量化算法,适用于广泛的 X 分区,这解决了复杂性问题 < mml:mi> O 1 进行单个搜索的代价是引入了计算索引的初始开销,并且需要额外的存储空间来存储索引。使用流SIMD扩展的测试结果将算法的性能与各种基准进行了比较,并证明了其有效性。根据测试用例,该算法可以产生比经典二进制搜索大两个数量级的吞吐量。还讨论了适用性限制和与缓存友好性有关的方面。 突出显示 讨论了在排序的数值数组中查找插入点的问题。 < / ce:list-item> 提出了二进制搜索算法的优化和矢量化变体。 采用复杂度 O 1 讨论了新算法的适用性和局限性。 给出了包括各种SIMD实现的性能测试结果。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号