In some cases, processor graphics with a slower local memory can compensate by using another memory in place of the lowest-level, or L3, cache. For example, some processors have a large register space that can serve as local memory when the local memory is allocated within those registers. Because the registers do not operate with barriers, barriers can be simulated by having one execution unit thread execute more SIMD instructions. For example, one execution unit thread may simulate a whole work-group in the OpenCL API.
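As a rough illustration of how a single thread might stand in for a whole work-group, the sketch below splits a kernel at its barrier into two phases and runs each phase for every work-item in turn. This is only one possible reading of the idea, not the method described in the source. The names `run_work_group`, `local_mem`, and `WORK_GROUP_SIZE`, and the toy kernel itself, are hypothetical, and a plain Python list stands in for register-allocated local memory.

```python
WORK_GROUP_SIZE = 8  # hypothetical work-group size, chosen for illustration


def run_work_group(inputs):
    """Simulate one work-group on a single thread (illustrative sketch)."""
    n = len(inputs)
    local_mem = [0] * n  # stands in for local memory allocated in registers
    outputs = [0] * n

    # Phase 1: the kernel code before the barrier, run for every work-item.
    for lid in range(n):
        local_mem[lid] = inputs[lid] * 2

    # No explicit barrier is needed here: phase 1 has already finished for
    # all work-items before phase 2 begins, which is what a barrier ensures.

    # Phase 2: the kernel code after the barrier, run for every work-item.
    for lid in range(n):
        neighbor = (lid + 1) % n  # read a value written by another work-item
        outputs[lid] = local_mem[lid] + local_mem[neighbor]

    return outputs


result = run_work_group(list(range(WORK_GROUP_SIZE)))
print(result)
```

In this sketch, each loop over `lid` plays the role of the additional SIMD instructions that the single thread executes. On real hardware those iterations would be spread across SIMD lanes rather than run one by one.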