Hi Dominik,

What is wrong is that the Discuss software does not email images, it emails an internal markdown code that is unviewable.

But why are you taking screenshots of text? Screenshots of GUI applications make sense: if you want to show a problem with a GUI, you need to see the graphical elements. But your results are *text*.

Not everyone can see screenshots *at all*. I know that there is at least one highly respected core Python developer whose eyesight has deteriorated to the point that he is (I think) legally blind. I’ve worked with no fewer than three developers who are likewise legally blind, all of whom had extreme problems dealing with screenshots. I know at least one person who is completely blind and uses a screen reader to read emails. I never realised that he was blind until he apologised in advance for not being able to understand something I had written, because his screen reader was mispronouncing the words and he couldn’t tell what they were.

Screenshots of text discriminate against the visually impaired and the blind. They are also inconvenient and annoying for many fully sighted readers.

I said:

"I would expect that your timing code is measuring almost nothing but noise."

and you disagreed:

“It seems that your expectation is wrong.”

What makes you say that?

I asked what you were hoping to demonstrate with the cumulative errors, and you responded by saying:

"The cumulative error is divided by the number of test runs at the end, please have a look at the `__str__` method."

Yes, I know that. But I still don’t know *why* you are calculating those errors in the first place. What do you think the errors show?

Are you trying to find the algorithm with the smallest error? The smallest error comes from the identity function, which just returns the number unchanged. Walk me through your logic as if I were five, please.

You said:

"I am calculating the differences between the result of each limit*-function and the random FLOAT it has been fed with: see “abs(float(fraction) - randomValueF) / abs(randomValueF)”"

Yes, I know that. Every float is equal to an exact fraction. (To be pedantic: only the finite floats. NANs and INFs excepted.) For example, the float math.pi is exactly equal to the fraction 884279719003555/281474976710656.

If we reduce that fraction to have a smaller denominator, we are naturally going to introduce some difference. The fraction 22/7 is the closest fraction to math.pi with a denominator less than 10. The difference between math.pi and 22/7 is 0.0012644892673497412.

(I know that you are calculating the relative error, not the absolute error. You are dividing by the original number. But that doesn’t change my point, so for simplicity I am going to continue with absolute error.)

The difference between math.pi and 311/99 is even smaller, just 0.00017851217565170184, so that’s better, right?

Not if we want to limit the denominator to be less than ten! 99 is not less than ten.
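
For what it’s worth, the fractions module in the standard library already does this sort of denominator-limiting, so you can check the numbers above for yourself (a quick interactive sketch; note that limit_denominator knows nothing about your extra numerator constraint):

```
>>> import math
>>> from fractions import Fraction
>>> Fraction(math.pi)                        # math.pi as an exact fraction
Fraction(884279719003555, 281474976710656)
>>> Fraction(math.pi).limit_denominator(9)   # closest fraction with denominator <= 9
Fraction(22, 7)
>>> Fraction(math.pi).limit_denominator(99)  # allow denominators up to 99
Fraction(311, 99)
```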

In your case, you have an additional constraint: you also want to limit the numerator, not just the denominator. But the principle applies: we are not just trying to find the smallest difference between the original number and the new number. For that, we could just use the original number unchanged.

So the bottom line here is: if we have four different algorithms for limiting the numerator and denominator of a fraction, they **must** return the same (numerator, denominator) pair. If they don’t, some of them are simply wrong and returning the wrong result.

To be clear:

If any two algorithms return a different (num, den) pair, then *at least* one of them is wrong. They might both be wrong, but they can’t both be right. Only one num/den fraction can be the closest to the original number.

If all four return the same pair, that’s reasonable evidence that they are probably correct. It is unlikely that four independent algorithms would all just happen to return the same wrong answer. (Very unlikely, but not impossible.) If any one of them is different, that’s a good sign that it is the wrong answer.

But if all four are different, that shows that *at least* three of them are wrong, maybe all four are wrong. It’s not a matter of pulling out the one which is least wrong *on average* and calling it the best.

Looking at the error terms and picking the function with the smallest error terms is the wrong way to analyse this. You are measuring the wrong thing.
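
The check I would expect to see looks more like this rough sketch, where limit_v1 is just a placeholder that delegates to the standard library; imagine your four limit*-functions in its place:

```
import random
from fractions import Fraction

def limit_v1(x, max_den):
    # stand-in for one of your limit*-functions
    return Fraction(x).limit_denominator(max_den)

# add the other three versions here
candidates = [limit_v1]

for _ in range(1000):
    x = random.uniform(1e-6, 1e6)
    results = {f.__name__: f(x, 1000) for f in candidates}
    if len(set(results.values())) > 1:
        # any disagreement means at least one function is wrong
        print(f"disagreement for {x!r}: {results}")
```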

You suggested that you are trying to:

"COMPARE the (mean) performance of the four algorithm versions (so, your “noise” should cancel out sufficiently, right?)."

I’m afraid not. Errors only cancel out when they are equally likely to be positive or negative. If I have a piece of wood exactly 78.3 cm long, and ask six people to measure it, they are likely to get results something like this:

```
77.9, 78.0, 78.2, 78.5, 78.5, 78.7
```

and the errors cancel out. (In this case, they cancel out *exactly*, because I made the numbers up. In the real world, we are rarely that lucky.)

But noise when timing code is not like that. The errors are always one-sided: your measured time will never have a negative error.

Suppose the “true” time it takes your CPU to compute a result is, let’s say, exactly 10µs. Then your timing code will probably measure something like:

```
12, 15, 16, 18, 19, 26 µs
```

and the best estimate of the true time to run the code is the minimum, 12µs, not the mean, 17.7µs.
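
You can verify both claims with the statistics module if you like (same made-up numbers as above):

```
from statistics import mean

wood = [77.9, 78.0, 78.2, 78.5, 78.5, 78.7]
print(mean(wood))    # ~78.3: the two-sided errors cancel out

times = [12, 15, 16, 18, 19, 26]
print(mean(times))   # ~17.7: averaging one-sided noise just averages the noise
print(min(times))    # 12: the least-noisy run is the best estimate
```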

*Noise does not cancel out* because it always adds time to the measurement; it never subtracts it. Taking the average just averages the noise. It gets you no closer to the actual execution cost of the code; it just moves you further away.

Worse, if the code snippet is small and fast enough, the noise might completely overwhelm the measurement. If your results are the 12µs etc. above, we have no way of knowing that the true computational cost is actually 10µs. It might be 1µs, and the rest of the measurement is all noise.

This is why we have the timeit module. The timeit module is designed to measure the running cost of small code snippets, with as little noise as possible.

It cannot eliminate all noise – nothing can. But it can try to make it as small as possible, within the Python interpreter at least. It can’t do anything about noise from the OS, e.g. other processes running.
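
For example, something along these lines is a more meaningful comparison (only a sketch, with limit_v1 again standing in for one of your functions; the timeit documentation itself makes the same point about preferring the minimum of repeated runs):

```
import timeit
from fractions import Fraction

def limit_v1(x, max_den):
    # stand-in for one of your limit*-functions
    return Fraction(x).limit_denominator(max_den)

best = min(timeit.repeat(
    "limit_v1(3.141592653589793, 1000)",
    globals=globals(),   # so the timed statement can see limit_v1
    number=10_000,       # calls per measurement
    repeat=5,            # repeat the measurement and keep the minimum
))
print(best / 10_000 * 1e6, "µs per call")
```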

Regarding the use of eval on the function names passed as strings: no, I did not overlook the use of self.testFctNames to print the names of the functions. But you might have overlooked that functions know their own names, and you can print those:

```
>>> def func(): pass
...
>>> func.__name__
'func'
```

Inside a Python script, there is little need to pass function names separately from the functions themselves. The only exception I can think of is if you are getting the names from user-provided data, e.g. read from a file, or a command line, or a GUI input widget (say, for a calculator app). You can only get user input as strings.

Even there, *especially* there, you should not use eval to evaluate the string to get the function. Using eval on untrusted user input is very dangerous. Instead, use a global lookup:

```
func = globals()[name]
```

But in your case, you don’t need that since you can just pass the function object and retrieve the name from the function, not the function from the name.
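
In other words, something shaped roughly like this (just a sketch, not your class; limit_v1 is again a placeholder):

```
from fractions import Fraction

def limit_v1(x, max_den):
    # stand-in for one of your limit*-functions
    return Fraction(x).limit_denominator(max_den)

def run_tests(functions, value=3.141592653589793, max_den=1000):
    # receive the function objects themselves: no strings, no eval
    for func in functions:
        print(func.__name__, "->", func(value, max_den))

run_tests([limit_v1])   # pass the objects; the names come along for free
```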