To multiply two multi-word operands, a number e of caching registers is used to cache the values of operand words. The multiplication is done using several runs, which each com¬ prise several parts (R0Q1, R0Q2, R1Q4). In an initial part (R0Q1, R1Q1) words of the operands are loaded into caching registers, and a first set of partial products are processed; the initial part leaves a number e of words of a first operand in caching registers. Because of the cached words of one operand, a sequential inner part (R0Q2, R1Q2; R0Q3, R1Q3) re-uses cached operand words without requiring load operations for that operand, and only words of the other operand are loaded for processing of partial products, preferably according to a product-scanning multiplication method, namely, by grouping together operations for partial products of the same product index (k); each inner part again leaves a number of operand words in caching registers, though of the respective other operand. A final part (R0Q4, R1Q4) processed a final set of partial products using cached operand words.
展开▼