Performance comparison

Comparing `sqrt_ratio` function implementations

Comparison of performance of sqrt_ratio implementations. The new implementation (new_method below) merges the exponentiations in the Tonelli–Shanks algorithm with those in the inversion (as described in Scott's paper). Thus 3 exponentiations (1 for each squareroot and 1 for the inversion) become 1 (computing the "projenitor" for num^3*den). When p - 1 is 2-adically small one expects this to perform much better. On the other hand when p - 1 = 2^s t with t small, most of the time is spent on the loop and little is saved. The worst case of this is the Fermat primes, the best case is p = 3 (mod 4), but performance is pretty good when t is smallish.

Each instance consists of computing sqrt_ratio(&num, &den) for 1000 random pairs (num, den) (so that the average 6.75ms of new_method for 2²⁵⁵ - 19 equates to average 6.75μs per instance). This was on my machine, your milage may vary.

See individual benchmark pages below for more details.

The favourable cases

sqrt_ratio_Curve25519

sqrt_ratio_Secp256k1

sqrt_ratio_Random256_1

sqrt_ratio_Random256_2

sqrt_ratio_Random320

sqrt_ratio_Random384

sqrt_ratio_Random448

The unfavourable cases

sqrt_ratio_Fermat65537

sqrt_ratio_Bls381K12

sqrt_ratio_JubJub

sqrt_ratio_Pallas

sqrt_ratio_Vesta

Comparing sqrt_ratio function implementations

The favourable cases

The unfavourable cases

Comparing `sqrt_ratio` function implementations