
CUDA Flux: A Lightweight Instruction Profiler for CUDA Applications



Abstract

GPUs are powerful, massively parallel processors that require a vast amount of thread parallelism to keep their thousands of execution units busy and to tolerate latency when accessing their high-throughput memory system. Understanding the behavior of massively threaded GPU programs can be difficult, even though recent GPUs provide an abundance of hardware performance counters that collect statistics about certain events. Profiling tools that assist users with such analysis on their GPUs, such as NVIDIA's nvprof and CUPTI, represent the state of the art. However, instrumentation based on reading hardware performance counters can be slow, particularly when the number of metrics is large. Furthermore, the results can be inaccurate because instructions are grouped to match the available set of hardware counters.
