While delayed branch mechanisms were popular with the designers of RISC processors, most superscalar processors deploy dynamic branch prediction to minimise run-time branch penalties. We propose a generalised branch delay mechanism that is more suited to superscalar processors. We then quantitatively compare the performance of our delayed branch mechanism with run-time branch prediction, in the context of a high-performance superscalar architecture that uses aggressive compile-time instruction scheduling.