Timeit: Make CLI function available from python

yoavdw · May 14, 2024, 9:33pm

Original title: Warn about unreliable tests in timeit’s python interface, not just in the command line

Currently, when using timeit’s command line, if the test results vary too much, you get a very nice warning:

> python -m timeit -n 1000 "import this"
The Zen of Python, by Tim Peters
...
Namespaces are one honking great idea -- let's do more of those!
1000 loops, best of 5: 88.8 nsec per loop
:0: UserWarning: The test results are likely unreliable. The worst time (8.1 usec) was more than four times slower than the best time (88.8 nsec).

But when running the same thing in Python, no warning is shown:

>>> import timeit
>>> timeit.timeit("import this", number=1000)
# The Zen of Python, by Tim Peters
# ...
# Namespaces are one honking great idea -- let's do more of those!
# 0.0014674999983981252

I think it would be better to move this check for unreliable tests outside of timeit.main (which is used for timeit’s command-line interface), to timeit.Timer.repeat, which timeit.main internally calls.

This should be a pretty simple change, and it would be more consistent. Since the warning is already deemed useful for users of timeit’s command-line interface, and the Python interface is usually used for the same purposes, I don’t see why it wouldn’t be useful there too.
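For readers who want the check today, here is a minimal sketch of the idea applied to the results of timeit.repeat. The names looks_unreliable and repeat_with_warning are hypothetical helpers, not part of timeit; the factor of four is taken from the wording of the CLI warning quoted above.

```python
import timeit
import warnings


def looks_unreliable(timings, factor=4.0):
    """Return True if the slowest timing is at least `factor` times the fastest."""
    best, worst = min(timings), max(timings)
    return worst >= best * factor


def repeat_with_warning(stmt="pass", setup="pass", repeat=5, number=1000):
    """Hypothetical wrapper: call timeit.repeat and warn on a wide spread."""
    raw = timeit.repeat(stmt, setup, repeat=repeat, number=number)
    # timeit.repeat returns total seconds per run; convert to seconds per loop.
    per_loop = [total / number for total in raw]
    if looks_unreliable(per_loop):
        warnings.warn(
            "The test results are likely unreliable "
            f"(worst {max(per_loop):.3g}s vs best {min(per_loop):.3g}s per loop).",
            UserWarning,
            stacklevel=2,
        )
    return per_loop
```

Calling repeat_with_warning("import this", number=1000) would be expected to trigger the warning, since only the first import does real work.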

Disclaimer: I haven't faced this issue myself

I want to be open about the fact that I haven’t faced this issue myself. I understand that this makes the idea “artificial”, as I can’t confirm it has been a problem for anyone.

The reason I’m still posting is that I believe it’s an objective improvement. I’ve also been reading (and commenting on) the Ideas section for a few months in preparation for participating more, and I believe the best way to start is with small ideas.

I still ask that people take this idea for what it is and not dismiss it because of this, though I understand it has an effect.

Waiting to hear what you all think,
Thanks in advance!

yoavdw · May 14, 2024, 9:43pm

Also, since this idea is pretty minor, let me know if I should just directly open a GitHub PR instead of posting here. I wasn’t sure so I went here first for safety.

kknechtel · May 14, 2024, 11:17pm

You might be interested in my previous third-party work on the timeit interface:

And corresponding blog post:

yoavdw · May 14, 2024, 11:42pm

Thank you! I went ahead and checked what you did with the warning, and it looks like in your version you completely removed it here. Do you happen to remember why?

Off-topic

Marking this as off-topic since I saw in your blog your intention is not to get this merged into cpython.

I really like your refactoring and blog, nice work! I also really liked the motto and linked article about “There must be a better way!”, that’s a really good way of thinking.

tjreedy · May 15, 2024, 1:47am

I think posting here first was a good idea. I assume that the feature was added after the initial code. You might see if git blame will point you to any discussion about when to give the warning. I don’t know if timeit is used non-interactively.

kknechtel · May 15, 2024, 2:29am

It’s still there (notice the message appears in both red and green parts of the diff); I just didn’t do very good commit splitting at that point in the process and I had moved around or reworked a few different things, so the diff for that commit has a lot in it.

I did leave it as being specific to the command-line tool, because in API use it would be impolite to print something as a side effect when calling a function that is supposed to just give a result back. (There would be no clean way to prevent it from printing.)

The credit here goes to @rhettinger of course.

yoavdw · May 15, 2024, 6:46am

Thanks! I initially did this and GitHub pointed to the wrong issue, since the commit was before the issues were migrated to GitHub.

After a bit of googling, I found the original issue, in which @rhettinger (who, funnily, was already mentioned here for unrelated reasons) explicitly says:

either in timeit.repeat() or in the command-line interface

However, this was not mentioned in the rest of the discussion, and in the patch by Raymond and @storchaka there is no mention of why it was only added to the command line.

Raymond and Serhiy, long shot, but do you by any chance know why this decision was made, or know if this was discussed somewhere else?

Here’s a link to the discussion: Issue 23552: Have timeit warn about runs that are not independent of each other - Python tracker

storchaka · May 15, 2024, 9:17am

timeit.repeat() is an API. It is intended to be used programmatically. Unlike the CLI, it does not print to stdout and does not determine the number of repeats automatically. It is application-dependent which results are considered unreliable and how they should be handled. First, the emitted warning would pollute the output of programs, even if it is normal to get a wide range of results or if such a case is already handled. Every user would need to wrap the timeit.repeat() call in code that silences the warning.

On the other hand, the timeit CLI is not an API. It is a tool intended to be used directly by a human. Its output is human-readable and its format is not fixed.
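To make the cost concrete, this is roughly the wrapping each caller would need if the function warned. It is only a sketch: noisy_repeat is a hypothetical stand-in that always warns, since the real timeit.repeat emits no such warning.

```python
import timeit
import warnings


def noisy_repeat(stmt, repeat=5, number=1000):
    """Hypothetical stand-in for a timeit.repeat that warns (it always warns here)."""
    warnings.warn("The test results are likely unreliable.", UserWarning, stacklevel=2)
    return timeit.repeat(stmt, repeat=repeat, number=number)


def quiet_repeat(stmt, repeat=5, number=1000):
    """What callers would have to write to keep their output clean."""
    with warnings.catch_warnings():
        # Ignore UserWarning only inside this block; filters are restored on exit.
        warnings.simplefilter("ignore", UserWarning)
        return noisy_repeat(stmt, repeat=repeat, number=number)
```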

yoavdw · May 15, 2024, 12:14pm

Thank you for the feedback!
That’s a good point, and I did think of it.

However, the way I think of it, and correct me if I’m wrong, timeit’s main use case, both in Python and on the command line, is quick timing checks of your snippets.

When I use timeit, I usually open the python REPL instead of using the command line, because it’s just more ergonomic.

I think this is the common use case. I don’t think many users are using timeit as an API in “production”. I think the use cases of timeit from the command line and from the Python API are usually the same, and so users of timeit would benefit from this being in the Python API as well.

I think timeit isn’t an API that is used in the same way most other stdlib modules are, and is pretty much only used interactively. I don’t think many people using timeit will end up wrapping it in something to catch the warnings.

I know this is an assumption I’m making about all users, and I can’t just assume everyone uses timeit the same way I do, but:

This isn’t a behavior change that really affects execution of programs, so it’s not as bad.
Timeit’s title “for small code snippets” implies this is the intended use case.
If my assumption is flat out wrong, and I’m unaware of some very common use case of timeit in production, is there some way to only warn in interactive environments?
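On the last question, one common best-effort heuristic is to check for sys.ps1, which is only defined in an interactive session, together with sys.flags.interactive, which is set by python -i. The sketch below assumes that heuristic is good enough; is_interactive and maybe_warn are hypothetical names, and the check can miss some embedded or notebook environments.

```python
import sys
import warnings


def is_interactive():
    """Best-effort guess: True in the REPL or when started with `python -i`."""
    return hasattr(sys, "ps1") or bool(sys.flags.interactive)


def maybe_warn(message):
    """Emit the warning only when a human is likely watching. Returns True if warned."""
    if is_interactive():
        warnings.warn(message, UserWarning, stacklevel=2)
        return True
    return False
```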

Stefan2 · May 15, 2024, 12:37pm

I’ve written hundreds if not thousands of benchmarking scripts and really wouldn’t want my outputs cluttered with such warnings.

Stefan2 · May 15, 2024, 2:33pm

Although I mostly use timeit() now, not repeat(), so mostly wouldn’t be affected…

Nineteendo · May 15, 2024, 2:46pm

Maybe a new parameter or function could be added? That wouldn’t break anything.

yoavdw · May 15, 2024, 3:52pm

I see. I’m not sure if you would get them a lot since it shouldn’t happen much, but I understand the point being made.

New idea: timeit.auto / timeit.run

What do you, and especially @storchaka, think about a timeit function that is made for human use, and not as a general API?

I really do like the features of the command-line like the warning and output, but I don’t want to use it from the command-line (It’s much easier to write even a few lines of code in the REPL).

How about a timeit.auto function, that does pretty much what timeit.main does, but without the command-line argument parsing?

I feel like that would be useful for a lot of people and would also give a solution to what I’m asking without disturbing the status quo

From the criticism I’ve gotten here so far, it seems like that would be the best solution, though it would require a bit more changes. I would still be glad to open a PR for that.

storchaka · May 15, 2024, 4:31pm

The timeit.repeat() function returns the results of all repeats. You can do whatever you want with them: calculate the minimum, maximum, median, average, range, deviation, whatever is meaningful to you. And you can add new data to the data already collected to make the aggregated result more reliable. The timeit CLI outputs a single number. Without an additional warning you cannot know if the results vary too much. This is the main difference.
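As an illustration of that point, a small sketch that aggregates the list returned by timeit.repeat using the statistics module. The summarize helper and the timed statement are illustrative choices, not anything from timeit itself.

```python
import statistics
import timeit


def summarize(raw_timings, number):
    """Summarize total-time results from timeit.repeat as per-loop statistics."""
    per_loop = [total / number for total in raw_timings]
    return {
        "min": min(per_loop),
        "max": max(per_loop),
        "median": statistics.median(per_loop),
        "mean": statistics.fmean(per_loop),
        "stdev": statistics.stdev(per_loop) if len(per_loop) > 1 else 0.0,
        "range": max(per_loop) - min(per_loop),
    }


number = 1000
raw = timeit.repeat("sorted(range(100))", repeat=5, number=number)
# More data can be appended to what was already collected.
raw += timeit.repeat("sorted(range(100))", repeat=5, number=number)
summary = summarize(raw, number)
```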

timeit is for fast and cheap tests. If this is not enough for you, it is not difficult to create a wrapper. If this is still not enough (for example, you want to collect the time of every iteration in the loop, not only the total time, or collect additional data), it is better to write specialized code using Python primitives. It does not need to be as general as timeit. At some point, the configuration of an overgeneralized function becomes more complex than writing specialized code from scratch.
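A sketch of what such specialized code might look like for the per-iteration case, using time.perf_counter directly. The function name is hypothetical, and the timer overhead is included in each measurement, so very fast callables will be dominated by it.

```python
import time


def time_each_iteration(func, iterations=100):
    """Return a list with the duration in seconds of every individual call."""
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


durations = time_each_iteration(lambda: sorted(range(1000)), iterations=50)
slowest = max(durations)
fastest = min(durations)
```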

yoavdw · May 15, 2024, 4:44pm

The thing is, I don’t need a wrapper to add functionality, I want functionality that is already available in timeit, but currently only available in the CLI.

That’s the main thing I’m proposing: make the functionality provided in the CLI also available in Python. I think that would greatly improve the quality of life of using timeit.

yoavdw · May 15, 2024, 5:14pm

Another thought

In my mind, this is similar to the approach in cProfile. A simple API is provided both on the command line and in Python (cProfile.run is similar to my proposed timeit.auto), and it does print human-readable output, even in the Python API.

A more advanced API is provided in Python only for those who need it.
(Sorry for the edit, accidentally published before finishing writing)
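For comparison, a short sketch of the two cProfile levels referred to here: cProfile.run prints a report directly, while cProfile.Profile with pstats.Stats gives programmatic control. The profiled statement is an arbitrary example.

```python
import cProfile
import pstats

# Simple interface: runs the statement and prints a human-readable report.
cProfile.run("sum(range(1000))")

# More advanced interface: collect the data, then decide how to present it.
profiler = cProfile.Profile()
profiler.enable()
sum(range(1000))
profiler.disable()

stats = pstats.Stats(profiler).sort_stats("cumulative")
stats.print_stats(5)  # show at most five entries
```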

yoavdw · May 17, 2024, 7:56am

Hey @storchaka, sorry if this is spammy, but I didn’t see you respond to this argument (unless I missed it somewhere, in which case let me know), and I’m really curious to hear what you think about it.

storchaka · May 17, 2024, 11:39am

Sorry, I am not interested in the new API. timeit is much simpler than cProfile. Everything you need can usually be implemented in a few lines of code faster than reading the cProfile documentation.

But you can try to find other core developers who are interested in this.

yoavdw · May 17, 2024, 12:31pm

Okay, thank you.

I’ll leave a response to this, but I understand if you won’t respond since you’re not interested:

It still feels like a missed opportunity to me that there is functionality which was deemed useful for timeit, and already exists in it, yet is only usable by CLI users.

I know I stated at the start of this thread that I didn’t encounter this, but now that I think about it, I think a human-only timeit.run could be really useful for a lot of things. The REPL is, for me, a much better place for quick tests than a CLI, and I would love the formatting, warnings, etc. to be available.

yoavdw · May 17, 2024, 1:00pm

Pinging core developers who I think might be interested in this, and I want their opinions on it.
If this is not the forum etiquette - I am very sorry, please let me know. I tried looking in the experts index in the dev guide, but there were none.

@tjreedy Since you showed interest
@vstinner @malemburg For relatively recent (last few years) contributions to timeit

Not core devs:
@kknechtel For interest in timeit, though I think I already know your opinion on this.

How would you all feel about a timeit.run function that provides functionality of timeit that is currently only in the CLI, for the python interface?

Here is a draft PR for it I made: Draft: Add timeit.run which provides the same functionality as the CLI by WolfDWyc · Pull Request #1 · WolfDWyc/cpython · GitHub, but the gist is this:

timeit.run

Provides (roughly) the same functionality as the CLI.
This functionality was previously in timeit, but only in main. I split main into the argument-parsing and return-code part, and moved everything else to run.

This includes:

Print and format to human-readable results instead of returning them
Time unit selection for results
Verbose mode
Determine number using autorange, if not provided
Warn if the tests are unreliable

Example usage

import random
import time
import timeit

def test_random():
    if random.random() > 0.9:
        time.sleep(1)

timeit.run(test_random, verbose=True, number=0, time_unit="nsec")
# 1 loop -> 1.84e-05 secs
# 2 loops -> 2.09e-05 secs
# 5 loops -> 1.001 secs
#
# raw times: 1e+09 nsec, 5.04e+04 nsec, 1e+09 nsec, 2.001e+09 nsec, 3.69e+04 nsec
#
# 5 loops, best of 5: 7380 nsec per loop
# :0: UserWarning: The test results are likely unreliable. The worst time (4.001e+08 nsec) was more than four times slower than the best time (7380 nsec).
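
Until something like this exists in the stdlib, similar behaviour can be approximated with the public timeit API alone. The following is a rough sketch, not the code from the draft PR: run_like_cli and format_time are hypothetical names, the unit selection is a simplification, and the output format only loosely imitates the CLI.

```python
import timeit
import warnings

_UNITS = {"nsec": 1e-9, "usec": 1e-6, "msec": 1e-3, "sec": 1.0}


def format_time(seconds, unit=None):
    """Format a duration; pick the largest unit that fits if none is given."""
    if unit is None:
        for name, scale in sorted(_UNITS.items(), key=lambda kv: -kv[1]):
            if seconds >= scale:
                unit = name
                break
        else:
            unit = "nsec"
    return f"{seconds / _UNITS[unit]:.3g} {unit}"


def run_like_cli(stmt="pass", setup="pass", repeat=5, number=0, time_unit=None):
    """Hypothetical helper: autorange, repeat, print a summary, warn on wide spread."""
    timer = timeit.Timer(stmt, setup)
    if number == 0:
        # Timer.autorange picks a loop count so that one run takes >= 0.2 s.
        number, _ = timer.autorange()
    raw = timer.repeat(repeat=repeat, number=number)
    per_loop = [total / number for total in raw]
    best, worst = min(per_loop), max(per_loop)
    plural = "" if number == 1 else "s"
    print(f"{number} loop{plural}, best of {repeat}: "
          f"{format_time(best, time_unit)} per loop")
    if worst >= best * 4:
        warnings.warn(
            "The test results are likely unreliable. The worst time "
            f"({format_time(worst, time_unit)}) was more than four times slower "
            f"than the best time ({format_time(best, time_unit)}).",
            UserWarning,
            stacklevel=2,
        )
    return per_loop
```

In the REPL this would be called as run_like_cli("sum(range(100))"), or with a callable in place of the statement string, since timeit.Timer accepts both.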