Proposed unittest rewrite/overhaul

doublevcodes · May 13, 2021, 7:07pm

The `unittest` library

I propose a massive overhaul of the unittest module in the standard library. Scrolling through the source code, I’ve found a number of outdated Python practices, such as:

Making classes inherit from object
```
class TestCase(object):
    ...
```
This is unnecessary as classes don’t have to inherit from object anymore.
Naming methods in camelCase, which is just plain un-Pythonic
```
class TestCase:
    ...
    def assertEqual(self, first, second, msg=None):
        ...
```
Python code should be in snake_case, so these methods should be renamed to things like assert_equal. Renaming alone would break backwards compatibility, so we can simply add a method “alias” of sorts by setting assert_equal equal to assertEqual towards the end of the class definition.
Using the str.format() method with placeholders in strings
```
class _SubTest(TestCase):
    ...
    def __str__(self):
        return "{} {}".format(self.test_case, self._subDescription())
```
It is generally agreed that the f-strings introduced in PEP 498 are much easier to read and have a slight performance advantage.
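
The alias and f-string points can be sketched roughly as follows. This is only an illustration: `SnakeCaseTestCase`, `assert_equal`, and `describe` are hypothetical names invented for the example, not part of unittest, and the alias is assumed to target the existing `assertEqual` method.

```python
import unittest


class SnakeCaseTestCase(unittest.TestCase):
    """Hypothetical subclass; not part of the unittest library."""

    # Class-level alias: both names refer to the same function object,
    # so existing camelCase callers keep working unchanged.
    assert_equal = unittest.TestCase.assertEqual

    def describe(self, sub_description):
        # f-string equivalent of "{} {}".format(self, sub_description)
        return f"{self} {sub_description}"


case = SnakeCaseTestCase()
case.assert_equal(1 + 1, 2)  # passes silently, exactly like assertEqual
```

Because the alias is a second name for the same function, it adds no wrapper call and cannot drift out of sync with the original method.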

With the introduction of typing in PEP 484, I believe it would be a good idea to add type hints to most of the library to improve the developer’s experience when using unittest.
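
To give an idea of what inline hints look like, here is a minimal sketch. The standalone `assert_equal` function and its exact annotations are assumptions made for illustration; they are not taken from the unittest source.

```python
from typing import Any, Optional, get_type_hints


def assert_equal(first: Any, second: Any, msg: Optional[str] = None) -> None:
    """Hypothetical, annotated stand-in for an equality assertion."""
    if first != second:
        raise AssertionError(msg or f"{first!r} != {second!r}")


# The annotations are ordinary metadata that editors and type checkers can read.
hints = get_type_hints(assert_equal)
```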

I’m sure this is only beginning to reveal the “unpythonicness” of the unittest library.

Now for my question: am I allowed to try to implement this and make a PR?

steven.daprano · May 13, 2021, 9:51pm

Hi Vivaan,

Of course you are allowed to do this: the unittest library is open
source, and you can make a fork and modify it however you like.

The real question is whether such a massive overhaul would be accepted
into the standard library. My guess is that the core developers would
not really be keen on it:

- The core devs are extremely overworked as it is, they are drowning
  in open bug reports and PRs that they don’t have time to review;
  probably nobody will have the time to review a massive PR that
  just makes cosmetic changes.
- We don’t usually like purely cosmetic changes with no functional
  improvement. If the code is not broken, why touch it? It only risks
  accidentally introducing bugs into the code, for very little or no
  improvement.
- Adding aliases for backwards compatibility is possibly even less
  Pythonic than just leaving the existing camelCase style names. If
  this change wasn’t made during the Python 2-3 transition, it
  probably will never be.
- Something so big will probably need a PEP. I wouldn’t expect it to
  be a big or complicated PEP.

To discuss your concrete points:

There is nothing wrong with explicitly inheriting from object. As the
Zen says, “Explicit is better than implicit.” If you remove it, I
guarantee that somebody will propose fixing it by re-adding it. (I know
I would.)

Replacing str.format with f-strings at least has the potential of being
a very small performance improvement. But probably not a meaningful or
significant one.

I don’t remember the type-hinting policy for the standard library. I
think that the policy is that type-hints should be placed in stub files
rather than directly in the .py file, but maybe that rule has changed.
Check what PEP 8 says.

Sorry to be so negative, but I doubt this PR would be accepted. But it
isn’t up to me. I think the best place to ask would be on the Python-Dev
mailing list.

On the other hand, you probably only need one senior core developer
willing to review the PR for this to go through.

I recommend you do some quick and simple benchmarks to see whether the
use of f-strings actually does make a measurable difference. If it does,
that will give you some ammunition to argue for the change.

doublevcodes · May 14, 2021, 3:23pm

String formatting benchmarking

Hey Steven,

I have benchmarked all the string interpolation methods that I know of using the timeit module, and f-strings do seem to be quicker, especially compared with str.format(); the gap is smaller against the % placeholders (not sure what to call that method). I’m not sure what counts as a measurable difference, though. The code snippets I used are below.

`%-formatting`

>>> import timeit
>>> timeit.timeit(
... """
... name = "Foo"
... age = 50
... "%s is %s." % (name, age)
... """, number = 100000)
0.02678303599999765

`str.format()`

>>> import timeit
>>> timeit.timeit(
... """
... name = "Foo"
... age = 50
... "{} is {}.".format(name, age)
... """, number = 100000)
0.0725739259999898

`f-strings`

>>> import timeit
>>> timeit.timeit(
... """
... name = "Foo"
... age = 50
... f"{name} is {age}."
... """, number = 100000)
0.021585681000004797
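
A slightly more robust variant of the same comparison might look like the sketch below. It is an assumption about how one could structure the measurement, not the code used for the numbers above: the variable assignments move into `setup` so that only the formatting expression is timed, and the minimum of several repeats is reported to reduce noise. `SETUP`, `STATEMENTS`, and `benchmark` are names made up for this example.

```python
import timeit

# Shared setup so that only the formatting expression itself is timed.
SETUP = 'name = "Foo"; age = 50'

STATEMENTS = {
    "%-formatting": '"%s is %s." % (name, age)',
    "str.format()": '"{} is {}.".format(name, age)',
    "f-string": 'f"{name} is {age}."',
}


def benchmark(number=100_000, repeat=5):
    """Return the best (minimum) total time in seconds for each statement."""
    return {
        label: min(timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat))
        for label, stmt in STATEMENTS.items()
    }


if __name__ == "__main__":
    for label, seconds in benchmark().items():
        print(f"{label:>14}: {seconds:.4f} s")
```

Absolute timings depend on the machine and the Python version, so only the relative ordering from a single run is meaningful.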

I will look into this, thanks for the insight!

erlendaasland · May 17, 2021, 9:40am

I’d check string format() performance on the current main branch as well. @storchaka has made recent performance changes in that area, so it should be as fast as f-strings now. (I don’t remember if this made it into 3.10.)

Also, check out pyperf for stable benchmarking.

kj0 · May 17, 2021, 11:37am

Yep, the old-style %s string formatting should be as fast as f-strings for most cases in 3.11 onwards, thanks to Serhiy. See “What’s New In Python 3.11” in the Python 3.11.0a0 documentation.

pf_moore · May 17, 2021, 11:44am

Is it %s formatting or str.format() that has been optimised? I’ve pretty much abandoned %-formatting altogether, so that’s of basically no interest to me. But sometimes dynamic “template”.format(args) is needed rather than an f-string, so I’d welcome speedups in that area.

Python 3.9 gives

❯ py -m timeit -s "foo=12" "x=f'a string {foo}'"
1000000 loops, best of 5: 190 nsec per loop
❯ py -m timeit -s "foo=12" "x='a string {foo}'.format(foo=foo)"
500000 loops, best of 5: 530 nsec per loop
❯ py -m timeit -s "foo=12" "x='a string %s' % foo"
1000000 loops, best of 5: 220 nsec per loop

It’s the second one I’d like to see improvement in, if that’s possible.
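
A small sketch of the dynamic case, where the template text is only known at run time and an f-string therefore cannot be used. `TEMPLATES` and `render` are hypothetical names chosen for the illustration.

```python
# Templates chosen at run time, e.g. loaded from configuration or a
# translation catalogue; an f-string cannot be used because the text is
# not known when the source code is written.
TEMPLATES = {
    "en": "{name} is {age}.",
    "fr": "{name} a {age} ans.",
}


def render(language, **fields):
    """Fill in the template selected by `language`."""
    return TEMPLATES[language].format(**fields)


print(render("en", name="Foo", age=50))  # Foo is 50.
print(render("fr", name="Foo", age=50))  # Foo a 50 ans.
```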

kj0 · May 17, 2021, 11:52am

From what I can glean from the bpo issue, only %s was improved; .format wasn’t affected.

For unittest my guess is that it uses more %s due to its age. I spot quite a few .format uses from a cursory search though.

https://bugs.python.org/issue28307

erlendaasland · May 17, 2021, 1:15pm

Ah, I thought it was format()

tjreedy · June 29, 2021, 10:38pm

Vivaan, as an idlelib maintainer, I have made similar changes in idlelib.

Most of idlelib is private (see PEP 434 and idlelib.__init__), so external backward compatibility is not an issue.
I will have to deal with any mistakes. I prefer a 2nd person to be involved. I either make one kind of change or change only one file at a time. The latter is usually done when making other changes: adding docstrings, fixing comments, adding tests, and maybe followed by substantive changes.

So (object) is gone from class statements. Within the last year I also did away with imports that let 2.x code run on 3.x, such as “from tkinter import messagebox as tkMessageBox” (tkMessageBox being the 2.x name used in the module code).

Many camelCase functions remain. I only change these within a module when other changes are made. I am more likely to do so when there is already a mixture.

There are still more % formats than {} formats. I would like to change some for readability.

If your interest in re-formatting could extend to idlelib, let me know your bpo name (private message if you wish). Or you might consider a project to write a %-format to f-string converter. I looked on pypi.org and could not find one. It might be possible to use ast.parse and look for binary operator nodes with a % operator, a string left operand, and an expression or tuple right operand.
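
The detection half of that idea could be sketched as below. This only locates candidate expressions and does not rewrite them; `find_percent_formats` is a hypothetical helper name, and the code assumes Python 3.8 or later, where string literals are parsed as `ast.Constant` nodes.

```python
import ast


def find_percent_formats(source):
    """Return sorted (line, column) pairs of `"literal" % value` expressions."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.BinOp)
            and isinstance(node.op, ast.Mod)
            and isinstance(node.left, ast.Constant)
            and isinstance(node.left.value, str)
        ):
            hits.append((node.lineno, node.col_offset))
    return sorted(hits)


sample = '''\
a = "%s is %s." % (name, age)
b = 10 % 3
c = "x=%d" % x
'''
print(find_percent_formats(sample))  # [(1, 4), (3, 4)]
```

The integer modulo on the second line is skipped because its left operand is not a string literal. An actual converter would still have to parse the format specifiers and handle dict and tuple right operands.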
