A microprocessor integrated with DRAM on the same die has the potential to improve system performance by reducing the memory latency and improving the memory bandwidth. However a high performance microprocessor will typically send more accesses than the DRAM can handle due to the long cycle time of the embedded DRAM, especially in applications with significant memory requirements. A multi-bank DRAM can hide the long cycle time by allowing the DRAM to process multiple accesses in parallel, but it will incur a significant area penalty and will therefore restrict the density of the embedded DRAM main memory. In this paper we propose a hierarchical multi-bank DRAM architecture to achieve high system performance with a minimal area penalty. In this architecture, the independent memory banks are each divided into many semi-independent subbanks that share I/O and decoder resources. A hierarchical multi-bank DRAM with 4 main banks each composed of 32 subbanks occupies approximately the same area as a conventional 4 bank DRAM while performing like a 32 bank one-up to 65% better than a conventional 4 bank DRAM when integrated with a single-chip multiprocessor.
展开▼