A fused multiply-add (FMA) unit increases the latency of separate floating-point add/subtract and multiply operations. To address this shortcoming, this work studies how floating-point performance is affected when the latency of separate add/subtract and multiply operations executed on the FMA unit is reduced from 6 cycles to 4 cycles. The RTL design of a domestically developed processor with an FMA unit is modified accordingly, and the optimization is evaluated by running the SPEC CPU2000 floating-point benchmarks on a hardware emulation acceleration platform. The results show that the latency optimization benefits floating-point performance: benchmark performance improves by up to 5.25% and by 1.61% on average.
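As a rough illustration of why a 6-to-4-cycle latency reduction yields only a few percent overall, the sketch below uses a simple back-of-the-envelope model. It is not the evaluation method of this work, which runs SPEC CPU2000 on a hardware emulation platform. The function names, the fully serialized dependency chain, and the Amdahl-style `latency_bound_fraction` parameter are all assumptions introduced here for illustration.

```python
def chain_cycles(num_dependent_ops: int, latency_cycles: int) -> int:
    """Cycles for a chain in which each operation waits for the previous result.

    Assumption: the chain is fully serialized, so no other work hides the latency.
    """
    return num_dependent_ops * latency_cycles


def overall_speedup(latency_bound_fraction: float,
                    old_latency: int,
                    new_latency: int) -> float:
    """Amdahl-style estimate of whole-program speedup.

    Assumption: only the given fraction of run time scales with the
    add/subtract and multiply latency; the rest is unchanged.
    """
    scaled_part = latency_bound_fraction * new_latency / old_latency
    return 1.0 / ((1.0 - latency_bound_fraction) + scaled_part)


if __name__ == "__main__":
    # A hypothetical chain of 100 dependent floating-point adds.
    print(chain_cycles(100, 6), "cycles at 6-cycle latency")
    print(chain_cycles(100, 4), "cycles at 4-cycle latency")
    # Hypothetical fractions of run time bound by this latency.
    for fraction in (0.05, 0.10, 0.20):
        print(fraction, round(overall_speedup(fraction, 6, 4), 4))
```

Under this model, a program spends only part of its time waiting on separate add/subtract and multiply results, so the whole-program gain is much smaller than the 1.5x gain on a fully serialized chain. That is in line with single-digit percentage improvements, although the actual fraction for each benchmark is not given in the source.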