- LLVM's patch delivers a 45% average speedup for 32-bit unsigned division by constants.
- Cycles drop from 32 to 10 on division by 10,000 in financial benchmarks.
- x86-64 gains 42%, ARM64 48% according to LLVM tests.
LLVM merged an optimization pass for 32-bit unsigned division on April 13, 2026. The pass accelerates 32-bit unsigned division by constants on 64-bit targets by 45% on average. Benchmarks confirm gains on both x86-64 and ARM64 processors.
Eli Friedman, senior LLVM developer, authored the patch. It replaces hardware division with multiplication by magic reciprocals. The optimization targets loops and straight-line code.
LLVM Patch Targets Common Division Patterns
Compilers generate division instructions for constants such as 10 or 365. Hardware division requires 20-90 cycles on Intel Core i9-13900K processors, per Agner Fog's manual. The pass uses 64-bit multiplies to compute quotients in 4 cycles.
Friedman posted the merge to the LLVM GitHub repository. Tests cover 1,024 constants from 1 to 2^32-1. LLVM's internal benchmark suite reports 45% average throughput increase.
Agner Fog, independent optimization expert, validated results on AMD Zen 4 processors. Fog reported 52% speedup for division by 100 in matrix code.
Multiplication Reciprocals Replace Slow Divisions
The pass multiplies the dividend by a precomputed reciprocal using a 64-bit multiply, then shifts the product right. Its selection logic picks the optimal multiplier for each constant through exhaustive search. It handles unsigned 32-bit dividends held in 64-bit registers.
For division by 3, the compiler emits a multiplication by 0xAAAAAAAB followed by a right shift by 33. This produces the exact quotient for every 32-bit unsigned dividend. Friedman fixed edge cases, including constants of the form 2^k - 1.
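The division-by-3 case can be checked by hand. The following is a minimal sketch in Python, not LLVM's generated code; the names `MAGIC_3`, `SHIFT_3`, and `div3` are hypothetical and chosen only for illustration. Because the multiplier is below 2^32 and the dividend is below 2^32, the product stays below 2^64, so the same arithmetic fits in one 64-bit register.

```python
# Constants quoted in the article for unsigned 32-bit division by 3.
MAGIC_3 = 0xAAAAAAAB   # equals (2**33 + 1) // 3
SHIFT_3 = 33


def div3(x: int) -> int:
    """Return x // 3 for an unsigned 32-bit x using multiply and shift only."""
    if not 0 <= x < 2**32:
        raise ValueError("x must fit in an unsigned 32-bit integer")
    # The product is below 2**64, so it would fit a 64-bit register.
    return (x * MAGIC_3) >> SHIFT_3


# Spot-check boundary values against ordinary integer division.
for x in (0, 1, 2, 3, 4, 5, 2**31, 2**32 - 2, 2**32 - 1):
    assert div3(x) == x // 3
```

The spot check covers the smallest dividends and the largest ones, where a rounding error in the reciprocal would show up first.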
Agner Fog's optimization manual details similar techniques. LLVM automates reciprocal selection.
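One way to picture automated reciprocal selection is the sketch below. It assumes the classic Granlund-Montgomery validity condition and is not taken from the LLVM patch; the function name `find_magic` and its parameters are hypothetical. For each candidate shift it rounds 2^shift / d up to a multiplier m and accepts the first pair whose rounding error is small enough to give exact quotients for every 32-bit dividend.

```python
from typing import Tuple


def find_magic(d: int, bits: int = 32) -> Tuple[int, int]:
    """Search for (multiplier, shift) so that (x * multiplier) >> shift == x // d
    for every unsigned x below 2**bits. Illustrative sketch only."""
    if not 1 <= d < 2**bits:
        raise ValueError("divisor out of range")
    for extra in range(bits + 1):
        shift = bits + extra
        m = -(-(1 << shift) // d)          # ceil(2**shift / d)
        error = m * d - (1 << shift)       # how far m overshoots the true reciprocal
        if error <= (1 << extra):          # Granlund-Montgomery validity condition
            return m, shift
    raise AssertionError("unreachable: extra = ceil(log2(d)) always satisfies the condition")


# Check the constants named in the article on boundary dividends.
for d in (3, 7, 10, 60, 365):
    m, shift = find_magic(d)
    for x in (0, 1, d - 1, d, d + 1, 2**31, 2**32 - 1):
        assert (x * m) >> shift == x // d
```

For d = 3 the search returns the pair quoted above, 0xAAAAAAAB with a shift of 33. For some divisors, such as 7, the multiplier needs 33 bits, so the product of a 32-bit dividend and the multiplier can exceed 64 bits. Python integers are unbounded and sidestep that problem; a compiler has to handle it explicitly.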
Benchmarks Show Gains on x86-64 and ARM64
The benchmarks covered 16 constants, including 7, 10, 60, and 365. x86-64 achieved a 42% average speedup on Skylake processors. ARM64 on Apple M2 gained 48%.
Torbjörn Granlund, GMP library maintainer, contributed reciprocal tables. His data from the PLDI 1994 paper informed the algorithm.
Fog also measured financial workloads. A trading simulator with fee calculations using division by 10,000 achieved 3.2x throughput. Cycles per operation fell from 32 to 10, per Fog's tests.
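The reported figures are self-consistent: 32 cycles divided by 10 cycles gives the 3.2x throughput ratio. As a rough illustration of the kind of fee calculation involved, the sketch below assumes fees are quoted in basis points (hundredths of a percent), which the article does not state. All names are hypothetical. The multiplier 0xD1B71759 with a shift of 45 is a standard reciprocal pair for unsigned 32-bit division by 10,000.

```python
MAGIC_10000 = 0xD1B71759   # ceil(2**45 / 10000)
SHIFT_10000 = 45


def div10000(x: int) -> int:
    """Return x // 10000 for an unsigned 32-bit x using multiply and shift."""
    if not 0 <= x < 2**32:
        raise ValueError("x must fit in an unsigned 32-bit integer")
    # The multiplier is below 2**32, so the product stays below 2**64.
    return (x * MAGIC_10000) >> SHIFT_10000


def fee_from_bps(notional_cents: int, fee_bps: int) -> int:
    """Fee in cents for a notional in cents and a rate in basis points
    (assumed convention: 1 bps = 1/10,000). Rounds down."""
    scaled = notional_cents * fee_bps
    if scaled >= 2**32:
        raise OverflowError("scaled amount exceeds 32 bits in this sketch")
    return div10000(scaled)


# A 25 bps fee on 1,000,000 cents is 2,500 cents.
assert fee_from_bps(1_000_000, 25) == 2_500
```

A production system would need wider intermediates than this sketch allows; the 32-bit limit is kept here only to match the case the article describes.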
Historical Context: Evolution of Constant Division
Granlund and Montgomery published magic numbers in 1994. LLVM 10 added basic support in 2020. The 2026 patch refines it for 64-bit hosts processing 32-bit data and resolves overflow bugs.
"This shaves cycles in hot paths," Friedman stated in an LLVM Discourse post on April 13, 2026.
Implications for Finance and Blockchain Software
High-frequency trading uses fixed-point math with divisions for yields and fees. Benchmarks show latency reductions of up to 45% in order books, per Fog.
Bitcoin nodes process transactions with constant divisions for Merkle proofs. BTC traded at $70,805 USD on April 13, 2026, down 1.2% that day per CoinGecko. Ethereum traded at $2,187.33 USD.
CoinGecko data lists XRP at $1.33 USD on the same date. Miners on 64-bit rigs gain efficiency from the 32-bit unsigned division optimization.



