As technology makes it possible to integrate real-time, good-quality 3D rendering on a single chip, the classic problem of the gap between internal data bandwidth and external memory bandwidth arises. Texture mapping requires a tremendous number of texture accesses, and many past implementations have relied on costly high-bandwidth external memory. Our study of the impact of a texture cache, using representative commercial 3D software of today, shows that it is possible to render 100 million pixels per second with an internal cache smaller than 32 KB and a PC memory bus for textures. Texture blocking and the number of requests on the cache are the main contributors to these results. Building a high-performance parallel subsystem based on such a chip may become an interesting opportunity for 3D graphics manufacturers targeting leading performance, although many problems have been observed with caches in parallel machines. As far as texture accesses are concerned, we show that image parallelism performs poorly with caches, whereas triangle parallelism scales with multiple rendering processors, each having its own texture cache, with nearly linear speedup.