Prefix computation is a fundamental problem with many applications, such as fast adders. Most proposed parallel prefix circuits assume that the width of the circuit equals the input size. In this paper, we present a class of parallel prefix circuits that perform well when the input size, n, is greater than the width of the circuit, m. That is, the proposed circuit is almost optimal in speed when n > m. Specifically, we derive a lower bound for the depth of the circuit and prove that the circuit requires one time step more than the optimal number of time steps needed to generate its first output. We also show that the size of the circuit is within one of optimal. The input is divided into subsets, each of width m-1, which are presented to the circuit in successive time steps. We compare the circuit with other circuits and show that it is faster than any other circuit of the same width and fan-out.
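
The blocking scheme described above can be illustrated with a small software model. This is only a sketch of the data flow, not the circuit proposed in the paper: it assumes that the one input of a width-m stage not used by the m-1 new values carries the last prefix produced in the previous time step, and it models each stage with a sequential scan. The names `chunked_prefix`, `carry`, and `stage_in` are hypothetical.

```python
from itertools import accumulate
from operator import add


def chunked_prefix(xs, m, op=add):
    """Compute all prefixes of xs, feeding m-1 new inputs per time step.

    Assumption (not stated in the abstract): the remaining input of the
    width-m stage holds the last prefix of the previous time step.
    """
    if m < 2:
        raise ValueError("m must be at least 2")
    out = []
    carry = None  # no carry exists before the first time step
    for start in range(0, len(xs), m - 1):
        block = list(xs[start:start + m - 1])  # at most m-1 new inputs
        stage_in = block if carry is None else [carry] + block
        stage_out = list(accumulate(stage_in, op))  # stand-in for the stage
        # The first output only repeats the carry, so it is not emitted again.
        out.extend(stage_out if carry is None else stage_out[1:])
        carry = out[-1]
    return out


if __name__ == "__main__":
    # n = 7 inputs, width m = 4, so inputs arrive in blocks of 3.
    print(chunked_prefix([1, 2, 3, 4, 5, 6, 7], 4))
```

With n = 7 and m = 4, the model takes three time steps (blocks of 3, 3, and 1 inputs) and produces the same prefixes as a single scan over the whole input. Any associative operator can be passed as `op`.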