Proposed unittest rewrite/overhaul

The unittest library

I propose a massive overhaul of the unittest standard library module (link here). Scrolling through the source code, I’ve found a number of outdated Python practices, such as:

  • Making classes inherit from object

    class TestCase(object):
        ...
    

    This is unnecessary as classes don’t have to inherit from object anymore.

  • Method names that follow the camelCase convention, which is just plain un-Pythonic

    class TestCase:
        ...
        def assertEquals(self, first, second, msg=None):
            ...
    

    Python code should use snake_case, so these should be renamed to things like assert_equals. Renaming alone would break backwards compatibility, so we can simply add a method “alias” of sorts by setting assert_equals equal to assertEquals towards the end of the class definition.

  • Using the str.format() method and using placeholders in strings (such as here)

    class _SubTest(TestCase):
        ...
        def __str__(self):
            return "{} {}".format(self.test_case, self._subDescription())
    

    It is generally agreed that the f-strings introduced in PEP 498 are much easier to read and have a slight performance advantage.
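The aliasing idea from the second bullet could look roughly like the sketch below. It uses a stand-in class (TestCaseSketch is a hypothetical name, not the real unittest.TestCase) with a heavily simplified assert body, only to show that both spellings end up bound to the same function.

```python
class TestCaseSketch:
    """Stand-in for unittest.TestCase; hypothetical, for illustration only."""

    def assertEquals(self, first, second, msg=None):
        # Simplified body: the real method does far more than this
        # (type-specific comparison, diff output, and so on).
        if first != second:
            raise AssertionError(msg or f"{first!r} != {second!r}")

    # snake_case alias bound to the same function object as the camelCase name
    assert_equals = assertEquals


case = TestCaseSketch()
case.assert_equals(1, 1)  # passes silently
case.assertEquals(1, 1)   # the old spelling keeps working
```

Because a class statement without bases still inherits from object implicitly in Python 3, TestCaseSketch.__bases__ is (object,) here as well.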

With the introduction of typing in PEP 484, I believe it would be a good idea to add type hints to most of the library so as to improve the developer’s experience when using unittest.
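To illustrate what inline hints might look like, here is a sketch on a stand-in class. The name HintedCaseSketch and the chosen annotations (object for the compared values, Optional[str] for msg) are my own assumptions, not the real unittest signatures.

```python
from typing import Optional, get_type_hints


class HintedCaseSketch:
    """Hypothetical stand-in, not the real unittest.TestCase."""

    def assert_equal(self, first: object, second: object,
                     msg: Optional[str] = None) -> None:
        # Simplified comparison, for illustration only.
        if first != second:
            raise AssertionError(msg or f"{first!r} != {second!r}")


# The annotations can be inspected at runtime by tools and IDEs.
hints = get_type_hints(HintedCaseSketch.assert_equal)
```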

I’m sure this is only beginning to reveal the “unpythonicness” of the unittest library.

Now for my question: am I allowed to try to implement this and make a PR?

Hi Vivaan,

Of course you are allowed to do this: the unittest library is open
source, and you can make a fork and modify it however you like.

The real question is whether such a massive overhaul would be accepted
into the standard library. My guess is that the core developers would
not really be keen on it:

  • The core devs are extremely overworked as it is, they are drowning
    in open bug reports and PRs that they don’t have time to review;
    probably nobody will have the time to review a massive PR that
    just makes cosmetic changes.

  • We don’t usually like purely cosmetic changes with no functional
    improvement. If the code is not broken, why touch it? It only risks
    accidentally introducing bugs into the code, for very little or no
    improvement.

  • Adding aliases for backwards compatibility is possibly even less
    Pythonic than just leaving the existing camelCase style names. If
    this change wasn’t made during the Python 2-3 transition, it
    probably will never be.

  • Something so big will probably need a PEP. I wouldn’t expect it to
    be a big or complicated PEP.

To discuss your concrete points:

There is nothing wrong with explicitly inheriting from object. As the
Zen says, “Explicit is better than implicit.” If you remove it, I
guarantee that somebody will propose fixing it by re-adding it. (I know
I would.)

Replacing str.format with f-strings at least has the potential of being
a very small performance improvement. But probably not a meaningful or
significant one.

I don’t remember the type-hinting policy for the standard library. I
think that the policy is that type-hints should be placed in stub files
rather than directly in the .py file, but maybe that rule has changed.
Check what PEP 8 says.

Sorry to be so negative, but I doubt this PR would be accepted. But it
isn’t up to me. I think the best place to ask would be on the Python-Dev
mailing list.

On the other hand, you probably only need one senior core developer
willing to review the PR for this to go through.

I recommend you do some quick and simple benchmarks to see whether the
use of f-strings actually does make a measurable difference. If it does,
that will give you some ammunition to argue for the change.

String formatting benchmarking

Hey Steven,

I have benchmarked all the string interpolation methods that I know of using the timeit module. f-strings do seem to be quicker, especially compared with str.format(); formatting with % placeholders (I’m not sure what to call that method) comes close. I’m not sure what counts as a measurable difference, though. I’ve attached the code snippets I used below.

%-formatting

>>> import timeit
>>> timeit.timeit(
... """
... name = "Foo"
... age = 50
... "%s is %s." % (name, age)
... """, number = 100000)
0.02678303599999765

str.format()

>>> import timeit
>>> timeit.timeit(
... """
... name = "Foo"
... age = 50
... "{} is {}.".format(name, age)
... """, number = 100000)
0.0725739259999898

f-strings

>>> import timeit
>>> timeit.timeit(
... """
... name = "Foo"
... age = 50
... f"{name} is {age}."
... """, number = 100000)
0.021585681000004797
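One way to tighten the measurements above would be to move the variable assignments into timeit’s setup argument, so that only the formatting expression is timed, and to take the minimum over several repeats. This is a sketch rather than a rigorous benchmark; the SETUP and STATEMENTS names and the repeat count are arbitrary choices, and the numbers will vary by machine and Python version.

```python
import timeit

# Run once per timing run, outside the timed statement.
SETUP = 'name = "Foo"; age = 50'

STATEMENTS = {
    "%-formatting": '"%s is %s." % (name, age)',
    "str.format()": '"{} is {}.".format(name, age)',
    "f-string": 'f"{name} is {age}."',
}

results = {}
for label, stmt in STATEMENTS.items():
    # repeat() returns one total time per run; the minimum is the least noisy.
    timings = timeit.repeat(stmt, setup=SETUP, number=100_000, repeat=5)
    results[label] = min(timings)

for label, seconds in results.items():
    print(f"{label:<14} {seconds:.4f} s per 100,000 loops")
```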

I will look into this, thanks for the insight!

I’d check string format() performance on the current main branch as well. @storchaka has made recent performance changes in that area, so it should be as fast as f-strings now. (I don’t remember if this made it into 3.10.)

Also, check out pyperf for stable benchmarking.

Yep the old style %s string formatting should be as fast as f-strings for most cases in 3.11 onwards thanks to Serhiy. What’s New In Python 3.11 — Python 3.11.0a0 documentation

Is it %s formatting or str.format() that has been optimised? I’ve pretty much abandoned %-formatting altogether, so that’s of basically no interest to me. But sometimes dynamic “template”.format(args) is needed rather than an f-string, so I’d welcome speedups in that area.

Python 3.9 gives

❯ py -m timeit -s "foo=12" "x=f'a string {foo}'"
1000000 loops, best of 5: 190 nsec per loop
❯ py -m timeit -s "foo=12" "x='a string {foo}'.format(foo=foo)"
500000 loops, best of 5: 530 nsec per loop
❯ py -m timeit -s "foo=12" "x='a string %s' % foo"
1000000 loops, best of 5: 220 nsec per loop

It’s the second one I’d like to see improvement in, if that’s possible.

From what I can glean from the bpo issue, only %s was improved; .format wasn’t affected.

For unittest my guess is that it uses more %s due to its age. I spot quite a few .format uses from a cursory search though.

https://bugs.python.org/issue28307

Ah, I thought it was format() :)

Vivaan, as an idlelib maintainer, I have made similar changes in idlelib.

  • Most of idlelib is private (see PEP 434 and idlelib.__init__), so external backward compatibility is not an issue.

  • I will have to deal with any mistakes. I prefer a 2nd person to be involved. I either make one kind of change or change only one file at a time. The latter is usually done when making other changes: adding docstrings, fixing comments, adding tests, and maybe followed by substantive changes.

So (object) is gone from class statements. Within the last year I did away with imports that let 2.x code run on 3.x, such as “from tkinter import messagebox as tkMessagebox” (where tkMessagebox is the 2.x name used in the module code).

Many camelCase functions remain. I only change these within a module when other changes are made. I am more likely to do so when there is already a mixture.

There are still more % formats than {} formats. I would like to change some for readability.

If your interest in re-formatting could extend to idlelib, let me know your bpo name (private message if you wish). Or you might consider a project to write a %-format to f-string converter. I looked on pypi.org and could not find one. It might be possible by using ast.parse and looking for binary operator nodes with a % operator, a string left operand, and an expression or tuple right operand.
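As a rough starting point for the detection half of such a converter, the sketch below walks the tree from ast.parse and reports every binary % operation whose left operand is a string literal. The function name find_percent_formats and the sample source are made up for illustration, and the sketch only locates candidates; it does not attempt the rewrite to f-strings.

```python
import ast


def find_percent_formats(source):
    """Return sorted (line, column) pairs for '"literal" % value' expressions."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.BinOp)
            and isinstance(node.op, ast.Mod)
            and isinstance(node.left, ast.Constant)
            and isinstance(node.left.value, str)
        ):
            found.append((node.lineno, node.col_offset))
    return sorted(found)


# Hypothetical input: two %-format expressions and one plain modulo.
sample = '''\
greeting = "%s is %s." % (name, age)
remainder = total % 7
label = "id=%d" % ident
'''
locations = find_percent_formats(sample)
```

The plain modulo on the second sample line is skipped because its left operand is a name, not a string literal. A format string that is held in a variable would be missed for the same reason.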