How to find square root accurately

Sure - no problem. In many areas, simple “degenerate” cases provide an easy and solid base to build on: the sum of an empty set of ints is 0, the product is 1, factorial(0) is 1, and so on. Less well known are these examples of Python’s logical quantifiers in degenerate cases:

>>> all([])
True
>>> any([])
False
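For completeness, the other degenerate cases mentioned above behave like this (a quick sketch; `math.prod` has been available since Python 3.8):

```python
import math

print(sum([]))            # 0: the empty sum
print(math.prod([]))      # 1: the empty product
print(math.factorial(0))  # 1
```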

It looks like you are using some brute-force search, instead of known parametric solutions like Heronian triangle - Wikipedia. (BTW, SymPy/Diofant can’t solve the generic equation.)

2 Likes

Yes, we should seek exact solutions where they exist; I didn’t say we shouldn’t. But if you want to focus on square roots, I think you can argue that with problems involving square roots of integers you can usually only get approximate solutions: a randomly chosen integer is unlikely to be a perfect square.

No one said that you did. (I think maybe you have misunderstood my comments above.)

No, it is not the case that you can only get approximate solutions. You can choose to work with approximate solutions but there are other options:

  • Do you need to compute the square root at all? Maybe it is better to work with a number that is the exact square of something e.g. you can use x**2 + y**2 < r**2 rather than sqrt(x**2 + y**2) < r.
  • You may only be interested in having the exact square roots of integers that are perfect squares as is the case here.
  • You can also do exact calculations with irrational square roots as I demonstrated above.

Using floating point is a reasonable choice in many situations but it is not the only way.
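To make the first two bullets concrete, here is a minimal sketch of my own (the `inside_circle` and `exact_sqrt` helpers are hypothetical names for illustration):

```python
import math

# Compare squared quantities: no square root, and no rounding error.
def inside_circle(x: int, y: int, r: int) -> bool:
    return x * x + y * y < r * r

# math.isqrt gives the exact root whenever n is a perfect square.
def exact_sqrt(n: int):
    root = math.isqrt(n)
    return root if root * root == n else None

print(inside_circle(3, 4, 6))  # True, since 9 + 16 < 36
print(exact_sqrt(1764))        # 42
print(exact_sqrt(1765))        # None: not a perfect square
```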

No problem on the first point, thanks for clarifying. On the second point, perhaps I should clarify: when I need to use roots (e.g. math.sqrt or Decimal.sqrt), which is not very often but often enough, using an approximate value of the root to some number of decimal places is OK. I don’t use float, for exactly the reasons described above, but I have found decimal very useful, since I can set the precision to my preference (decimal.getcontext().prec). It is also true that the solutions to many numerical problems are not integer-valued but real-valued, and that’s the point I was trying to make.
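For instance, the precision knob mentioned above works like this (a sketch; 50 digits is an arbitrary choice):

```python
from decimal import Decimal, getcontext

getcontext().prec = 50   # 50 significant digits
root = Decimal(2).sqrt()
print(root)              # 1.41421356237309504880... to 50 digits
```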

Hi Oscar,

Thank you for your enlightening response, very interesting. I admit to using brute force, but I only do it to find the first six solutions to the sequence. From there I can establish one simple formula that will find all of the successive triangles in the sequence.

Much appreciate your help and assistance.

[Meta: I messed up by trying to use Markdown syntax in the WYSIWYG editor. This apparently broke things to the extent that I couldn’t then edit to fix my mistake. Here’s a second attempt.]

On the use of brute force: I (finally) had the opportunity to run the brute-force version of the script overnight. It took a bit under 4 hours to get to the 6th row of output (which works out to around 500 nanoseconds per b value). I’m guessing that @Furler was also able to get that 6th output. At that rate, it would have taken around two months of continuous computation to complete the iteration of the while loop.

But I also got interested enough to work through the problem by hand. @Furler I’m not sure what your mathematics background is, or where this particular problem originated, but there’s some very satisfying old-school algebraic number theory (quadratic number fields, rings of integers, factorisation into prime ideals, class numbers, fundamental units, etc.) to be had here. Not surprisingly, it leads to families of solutions roughly equivalent to those @oscarbenjamin obtained with SymPy (though we end up with three families rather than six).

In brief:

  • We’re looking for solutions in nonnegative integers to 16A^2 = (a + b + c)(a + b - c)(a - b + c)(-a + b + c)
  • Substituting in a = 37 and c = b + 3 and simplifying, we want solutions to A^2 - 85 (2b + 3)^2 = -85·37^2
  • Writing B = 2b+3, we’re looking for solutions in nonnegative integers to A^2 - 85 B^2 = -85·37^2. (Any such solution must have B odd and at least 37, so we can always recover a nonnegative integer b from B.)
  • In other words, we’re looking for elements A + √85 B of norm -85·37^2 in the commutative ring Z[√85].
  • But Z[√85] isn’t a particularly nice ring to work in since it’s not the ring of integers of the corresponding quadratic field Q[√85], so none of the theory about rings of integers applies to it. The easier ring to work with is the slightly larger ring R = Z[(1 + √85)/2], which is the ring of integers of Q[√85]. So let’s first find all elements of norm -85·37^2 in R, then we can figure out how to filter down to just the elements in Z[√85].
  • R is unfortunately not a unique factorization domain (its class group has order two), so we can’t easily make use of prime factorization of elements. But …
  • Any element of norm -85·37^2 in R generates a principal ideal of norm 85·37^2 in R.
  • The general theory about rings of integers (see also Dedekind domains) says that any nonzero ideal of R can be written uniquely as a product of prime ideals.
  • Analysis that’s easily performed by hand establishes that R has exactly one prime ideal I of norm 5, one prime ideal J of norm 17, and two prime ideals K and L of norm 37. None of these ideals is principal - they all belong to the unique class of non-principal ideals in the class group.
  • So there are exactly three distinct ideals of norm 85·37^2: IJK^2, IJL^2 and IJKL. They’re all principal, and it’s easy to find generators (either by careful multiplication, or more simply and quickly, by a brute force search that takes only milliseconds to execute).
  • Given one generator g of a principal ideal, any other generator is a product of g with a unit of R. So we can describe all generators of a principal ideal if we can describe all units of R.
  • General theory says that the units of the ring of integers of a real quadratic field are all of the form ±u^k, where u is a fundamental unit and k is allowed to vary over all integers.
  • The element (9 + √85)/2 is a fundamental unit for R, with norm -1. (There’s an algorithm to find fundamental units via continued fractions, but I cheated and used GP/PARI’s quadunit function.)

Given all the above, you can describe all elements of R of norm -85·37^2 - they occur in three families corresponding to the three ideals of norm 85·37^2 (two of the three families are conjugate to one another), and then filter down to elements of Z[√85]. Then it’s not hard to code up a Python script that will produce all the solutions up to a given limit on A or b, in a very short time. I will confess that I didn’t manage to work through all of the above before the brute force script produced its sixth solution. So maybe brute force wins in this case.
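To make a couple of the steps above concrete, here’s a quick sketch of my own (not the code from the post): it checks that u = (9 + √85)/2 has norm −1, and performs the milliseconds-scale brute-force search for small elements (x + y√85)/2 of R (with x ≡ y (mod 2)) of norm −85·37².

```python
import math

# Norm of the element (x + y*sqrt(85))/2 of R; when x and y have the same
# parity, x**2 - 85*y**2 is divisible by 4, so this is an exact integer.
def half_norm(x: int, y: int) -> int:
    return (x * x - 85 * y * y) // 4

# The fundamental unit u = (9 + sqrt(85))/2 has norm -1.
assert half_norm(9, 1) == -1

# Brute-force search for elements of R of norm -85 * 37**2 = -116365.
TARGET = -85 * 37 * 37
hits = []
for y in range(1, 200):
    x2 = 85 * y * y + 4 * TARGET      # need x**2 = 85*y**2 + 4*TARGET
    if x2 >= 0:
        x = math.isqrt(x2)
        if x * x == x2 and (x - y) % 2 == 0:
            hits.append((x, y))

print(hits[:3])
```

The first hit, (x, y) = (0, 74), is the element 37√85 corresponding to the degenerate area-0 solution; hits with x and y both odd, such as (255, 79), lie in R but not in Z[√85] and get filtered out.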

3 Likes


Hi Mark ,

Likewise I have developed formulas to find higher solutions, and they also contain the value of sqrt(85). I tried to send through a file containing my solution, but because it was a Word document it was rejected. Send me your email and I will copy it to you.


2 Likes

Ya, I’ve found the WYSIWYG editor to be pretty much a nightmare. What I see is what I get, but I’ve too often found it impossible to force what I see to be what I want to see. Then I resort to inserting markdown, and that backfires as often as not. I don’t bother with it anymore.

Over on the Discourse “meta” site, I learned that they think the editor is still a work in progress, with a bunch of known gotchas. What I didn’t learn is why they released it before it was in better shape. But there’s hope it will work more smoothly in the future.

What did you use to run it? It took a bit over 7 hours for me under pypy (which is far faster than CPython until the ints exceed the range of a 64-bit signed int). Gave up waiting under CPython. Even slower using CPython with gmpy2. While the ints here exceed the precision of floats, they’re still small by gmpy2 standards - nowhere near even “crypto size”.

They did! They later posted a JPEG image of a doc that shows the first 6 outputs.

I’ll skip repeating your delightful but involved analysis of the problem, which could have resulted in code that ran very fast instead. It’s easier to trust brute force anyway :wink:

Thank you for putting in all that work. It’s impressive.

@Furler Very nice - thank you for sharing that!

For what it’s worth, here’s one version of the code I ended up with. It prints the side lengths of the smallest 15 Heron triangles in the sequence. (I had a more elaborate version that involved setting up a class for working with elements of \mathbf Z[(1 + \sqrt{85}) / 2], but it’s really overkill to write your own class for this when packages like SymPy exist.)

import math

def issquare(n: int) -> bool:
    """Return True if n is a perfect square, else False."""
    return n >= 0 and (a := math.isqrt(n)) * a == n

# Find all pairs (A, B) with A*A - 85*B*B = -85 * 37^2
# such that 37√85 <= A + B√85 < 37√85(285769 + 30996√85).
triangles = [
    (85 * B, math.isqrt(37 * 37 + 85 * B * B))
    for B in range(37 * 30996)
    if issquare(37 * 37 + 85 * B * B)
]

# Cycle through, multiplying each pair by the multiplier and
# placing it back on the queue.
for _ in range(15):
    A, B = triangles.pop(0)
    a, b, c = 37, (B - 3) // 2, (B + 3) // 2
    print((a, b, c))
    triangles.append((285769 * A + 85 * 30996 * B, 30996 * A + 285769 * B))

Output:

(37, 17, 20)
(37, 1105, 1108)
(37, 44197, 44200)
(37, 5286725, 5286728)
(37, 632362597, 632362600)
(37, 25261121185, 25261121188)
(37, 3021565090337, 3021565090340)
(37, 361419254820385, 361419254820388)
(37, 14437690680645637, 14437690680645640)
(37, 1726939268596598885, 1726939268596598888)
(37, 206564838060901696837, 206564838060901696840)
(37, 8251688856209585815825, 8251688856209585815828)
(37, 987011415692141369302097, 987011415692141369302100)
(37, 118059654415290214752862225, 118059654415290214752862228)
(37, 4716153745485876567325200517, 4716153745485876567325200520)

Explanation (which is much longer than the code): as before, we recast the problem in terms of finding pairs (A, B) of nonnegative integers satisfying A^2 - 85B^2 = -85\cdot 37^2. I’ll call those pairs “solution pairs”. Here A is the area of the Heron triangle, and B = 2b + 3, so the side lengths of the triangle are 37, (B - 3)/2 and (B + 3)/2.

Every solution pair (A, B) corresponds to a real number A + B\sqrt{85} that’s greater than or equal to 37\sqrt{85} (in the usual ordering of the real numbers).

Now the key observation is that if the real number A + B\sqrt{85} comes from a solution pair, then so does the product (A + B\sqrt{85})(285769 + 30996\sqrt{85}) (this is easily verified just by multiplying everything out). Furthermore, we can go backwards, too: if A + B\sqrt{85} corresponds to a solution pair, then so does the quotient (A + B\sqrt{85})/(285769 + 30996\sqrt{85}), provided only that that quotient is still greater than or equal to 37\sqrt{85}.

So the solutions naturally fall into a collection of (three, as it turns out) geometric progressions, each with multiplier 285769 + 30996\sqrt{85}: every solution is uniquely of the form (A + B\sqrt{85})(285769 + 30996\sqrt{85})^k for some nonnegative integer k and some solution pair (A, B) for which 37\sqrt{85} \le A + B\sqrt{85} < 37\sqrt{85}\cdot(285769 + 30996\sqrt{85}).

The code above simply does a brute force search to find all solution pairs (A, B) such that A + B\sqrt{85} lies in the interval [37\sqrt{85}, 37\sqrt{85}\cdot(285769 + 30996\sqrt{85})). It then repeatedly pops the smallest solution from the front of its list, prints it, multiplies it by the multiplier to get the next largest solution, and pushes that back to the end of the list. The brute force search part is very quick (it takes around a tenth of a second on my laptop), and finds exactly three solutions. There’s a little bit of sleight of hand there - we make use of the fact that if A^2 - 85 B^2 = -85\cdot 37^2 then A must be divisible by 85, so we actually search for all pairs (A, B) such that A^2 - 85B^2 = 37^2 and A + B\sqrt{85} lies in the interval [37, 37\cdot(285769 + 30996\sqrt{85})), and then scale the corresponding numbers A + B\sqrt{85} up by a factor of \sqrt{85}.
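A quick sanity check on the scaling step (my sketch): multiplying z = A + B\sqrt{85} by \sqrt{85} gives 85B + A\sqrt{85}, which multiplies the norm A^2 - 85B^2 by -85 (the norm of \sqrt{85}).

```python
# Norm of A + B*sqrt(85) in Z[sqrt(85)].
def norm(A: int, B: int) -> int:
    return A * A - 85 * B * B

# Multiplying by sqrt(85) sends (A, B) to (85*B, A) and scales the norm by -85.
for A, B in [(37, 0), (12, 5), (7, 3)]:
    assert norm(85 * B, A) == -85 * norm(A, B)
print("scaling check passed")
```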

As to where the magic multiplier m = 285769 + 30996\sqrt{85} comes from: it’s the smallest norm-1 unit that’s larger than 1 in \mathbf Z[\sqrt{85}]. It can be found either (1) as the sixth power of the fundamental unit (9 + \sqrt{85})/2 that I mentioned in my last post, or (2) via a very quick brute force search: 30996 is the smallest positive B for which 1 + 85B^2 is a square, and then 285769 is the square root of that square.

>>> import itertools, math
>>> B = next(B for B in itertools.count(1) if issquare(1 + 85*B*B))
>>> B
30996
>>> math.isqrt(1 + 85*B*B)
285769

And of course it’s closely related to the value 571538 that appears in your recurrence.

>>> from math import sqrt
>>> 285769 + 30996 * sqrt(85)
571537.9999982503

In fact, your recurrence is coming from the fact that m exactly satisfies the quadratic equation m^2 = 571538m - 1: if (A_i, B_i) and (A_{i+1}, B_{i+1}) are two successive solution pairs in one of our three families, then the next solution pair is (A_{i+2}, B_{i+2}) = (571538 A_{i+1} - A_i, 571538 B_{i+1} - B_i).
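We can check that recurrence directly against the b-values printed earlier (my sketch; the rows split into three families, one per residue of the row index mod 3):

```python
# b-values from the program output above; B = 2*b + 3.
bs = [17, 1105, 44197, 5286725, 632362597, 25261121185,
      3021565090337, 361419254820385, 14437690680645637]

for start in range(3):                    # one family per residue class mod 3
    B = [2 * b + 3 for b in bs[start::3]]
    assert B[2] == 571538 * B[1] - B[0]   # B_{k+2} = 571538*B_{k+1} - B_k

print("recurrence holds for all three families")
```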

Nothing special: home laptop (MacBook Pro, M4 Pro CPU), and CPython 3.13.5 as installed by uv.

3 Likes

Looks like this will remain a mystery! I ran it again under the released Win64 Python 3.13.6 on a capable desktop box. Started yesterday night, and it was still running when I got up today. It printed the 6th output after about 11.6 hours, on an otherwise mostly very quiet 4-core box.

I wasn’t surprised PyPy was much faster at the start - I have real programs that run 50(!) times faster if they stick to arithmetic on 64-bit signed ints (which PyPy often manages to treat directly as native machine ints, without the expense of turning them into “Python objects”). I’m also not surprised that PyPy slowed way down when ints exceeded that range (then they’re always wrapped in “Python objects”, and their implementation of multi-word fat integers seems significantly slower than CPython’s C-coded implementation of those).

So the mystery is why it ran so much faster (than even PyPy did) for you under CPython. Maybe the M4 Pro is that much faster than my aging 7th-gen Intel CPU?

(Sorry for some off-topic)

I suspect that, at least sometimes, this is not only due to a special implementation of arithmetic on small integers :wink:

Something similar will probably be more and more true for upcoming CPython versions as well (e.g. some bytecode specialization for native integers has already been added). This is why I think that “use gmpy2 if you need fast arithmetic on large integers” is bad advice in general: it comes with a big speed penalty for small integers, which can’t be mitigated by an extension like gmpy2.

Just tried the mpmath CI tests as a poor man’s benchmark: v1.14 beta1 by skirpichev · Pull Request #962 · mpmath/mpmath · GitHub. It looks like PyPy3.11 is not much worse than CPython with GMP-backed (gmpy2 or python-gmp packages) arithmetic (5m22s vs 4m50s) and definitely better than native integer arithmetic on CPython (6m36s). (BTW, using gmpy2/python-gmp on PyPy makes things much slower in this “benchmark”.) Tracing shows that roughly 95% of the integers here are less than 10 bits long and 98% less than 53 bits. IMO, that’s typical for applications like a CAS (e.g. SymPy): large integers aren’t too common there.

1 Like

Could be. It’s certainly appreciably faster than the ancient macOS / Intel machine it replaced.

Here are some precise timings (or at least more precise than my “a bit under four hours”) on my machine on a subset of the search space: b between 25 billion and 26 billion (which should pick up just that 6th entry). I’ll run the following script, which has a few minor simplifications compared to the original:

import math
import time

start = time.monotonic()

a = 37
g = 3
b = 25_000_000_000
while b < 26_000_000_000:
    c = b + g
    sp = (a + b + c) // 2
    SquaredArea = sp * (sp - a) * (sp - b) * (sp - c)
    Area = math.isqrt(SquaredArea)
    if Area * Area == SquaredArea:
        print(a, b, c, Area)
    b = b + 1

end = time.monotonic()
print(f"Time taken (s): {end - start:.2f}")

Results (including the MD5 of the script so you can check you’re running the same thing, though I guess Windows line-endings might throw that off):

mdickinson@MacBook-Pro Desktop % uv run python -VV
Python 3.13.7 (main, Aug 28 2025, 17:02:04) [Clang 20.1.4 ]
mdickinson@MacBook-Pro Desktop % ls -l heron_subset.py 
-rw-r--r--@ 1 mdickinson  staff  384 Sep  3 18:16 heron_subset.py
mdickinson@MacBook-Pro Desktop % md5 heron_subset.py 
MD5 (heron_subset.py) = 960769fae3a622eff36a23a47aa9baea
mdickinson@MacBook-Pro Desktop % time uv run heron_subset.py        
37 25261121185 25261121188 465792059640
Time taken (s): 336.42
uv run heron_subset.py  335.30s user 1.05s system 99% cpu 5:36.44 total

So that’s 336.42 seconds for the last billion. Assuming previous billions are at least as fast, that gives me a predicted upper bound of 2hrs 25 minutes for a full run starting at b=17. So I’m reasonably confident that my previous “a bit under four hours” was accurate.

1 Like

More like totally destroy it :wink:. A single bit difference in the input is expected to change about half the bits in the output checksum.

But getting rid of the carriage returns is easy, and then the checksum matched yours. So we’re certainly running the same code. Thanks!

Again much slower on my Windows box:

3.13.6
37 25261121185 25261121188 465792059640
Time taken (s): 1516.80

pypy
37 25261121185 25261121188 465792059640
Time taken (s): 924.31

Those are more-or-less consistent with the full run. The ints here really aren’t all that large - it’s only the square of the area that overflows a 64-bit int. So pypy still enjoys a significant advantage over CPython (well, at least for that reason; there’s also, e.g., no speed penalty for using globals under pypy).

1 Like