Ideas for a 100 Python Mistakes book

csm10495 · January 28, 2023, 10:50pm

Another one I’ve seen is:

try:
    ... stuff
except:
    print("something went wrong")
more code

Basically: doing giant try/excepts without specifying an exception type AND not at least logging the exception that was caught. It bites so hard because when debugging, it’s tough to know what code raised. Even worse: the thing that raised could be something completely unrelated to what you think it is.
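A minimal sketch of the narrower alternative, assuming a hypothetical `parse_port` helper invented for illustration: the `try` body holds only the call that can fail, only the expected exception type is caught, and the traceback is logged rather than replaced by a fixed message.

```python
import logging

logger = logging.getLogger(__name__)


def parse_port(text):
    """Return text as an int, or None if it is not a valid integer string.

    parse_port is a hypothetical helper used only for this illustration.
    """
    try:
        # Keep the try body as small as possible: one call that may fail.
        return int(text)
    except ValueError:
        # logger.exception records the message plus the full traceback.
        logger.exception("could not parse port from %r", text)
        return None


print(parse_port("8080"))
print(parse_port("http"))
```

Because only `ValueError` is caught, an unrelated problem such as `parse_port(None)` still surfaces as a `TypeError` instead of being swallowed.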

Melendowski · January 29, 2023, 12:04am

I’m also going to reach out a bit into numeric computing (i.e. NumPy/Pandas), but deliberately not venture as far as dataviz libraries, deep neural networks, possibly just one or two examples with scikit-learn, but also not much with web frameworks and such matters (maybe one or two “mistakes” where the web touches security though).

Iterating over dataframe rows with the iterrows() method and/or excessive apply() when the operation can be done on the columns directly, either with DataFrame methods or by passing the data straight to NumPy ufuncs.

Row ops in pandas are slow.

Far too many times I’ve seen something like this


import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(100,3), columns=list("xyz"))

df.apply(np.linalg.norm, axis=1).rename("magnitude")

When you could do

pd.Series(np.linalg.norm(df, axis=1), name="magnitude")

Which is orders of magnitude faster

steven.daprano · January 29, 2023, 3:27am

Here’s a specific TDD mistake that you might like.

Don’t rely on “no output == no errors” when doing testing.

Many moons ago, I wrote a library which used a custom descriptor to wrap all the methods in a class. I had extensive doctests, and was developing using a variant of TDD for each method:

write the method;
add the docstring with some tests;
run doctest on the module and see what broke.

Most of the methods were relatively small, and I expected them to pass, so I felt pretty smug when all of the tests passed first time.

Until I got to a method with tests that I knew wouldn’t pass, because it was more complex and I hadn’t written all the implementation yet, but I did have all the tests. I ran doctest, and … everything passed.

Cut to the chase: I was relying on no output == no errors. When I ran doctest in verbose mode, I discovered that it wasn’t running any of my tests. Not one. The custom descriptor was interfering with doctest’s ability to recognise them as methods, so they weren’t being picked up and the doctests weren’t being run.

(If I recall correctly, this was back in 2.4 days, when doctest was more finicky about what it considered a method.)

Cue much wailing and gnashing of teeth. I had to re-engineer the whole module to get doctest to work, at which point I discovered that nearly all of the tests were failing.

The lessons I learned from this:

run doctest in verbose mode occasionally, to make sure it is actually running what you think it is running;
when doing TDD, having tests visibly fail first is a good idea;
don’t trust “no output == no errors”, since no output can also mean nothing ran at all.
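One way to guard against the "nothing ran" case is to check how many examples doctest actually attempted, not just how many failed. The sketch below uses the standard `doctest.DocTestFinder` and `doctest.DocTestRunner` classes; the functions `double`, `undocumented`, and `run_doctests` are hypothetical names made up for this illustration.

```python
import doctest


def double(x):
    """Return twice x.

    >>> double(2)
    4
    """
    return 2 * x


def undocumented(x):
    """Return x unchanged (this docstring contains no examples)."""
    return x


def run_doctests(func):
    """Run the doctests attached to func; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failed = attempted = 0
    # module=False tells the finder not to look up the defining module.
    for test in finder.find(func, func.__name__, module=False,
                            globs={func.__name__: func}):
        result = runner.run(test)
        failed += result.failed
        attempted += result.attempted
    return failed, attempted


for func in (double, undocumented):
    failed, attempted = run_doctests(func)
    # Zero failures with zero attempts is not a pass; flag it.
    status = "ok" if attempted and not failed else "SUSPICIOUS"
    print(f"{func.__name__}: {attempted} attempted, {failed} failed -> {status}")
```

Both functions report zero failures, but only `double` actually had an example run. A "0 attempted" result is the silent case described above.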

Rosuav · January 29, 2023, 4:44am

To expand slightly on this:

TEST YOUR TESTS.

If you have never seen your test fail, you have never validated your test, and cannot trust it (even if you can see from verbose mode that the test really did run). Remember that untested code should be presumed to be buggy, and tests are themselves code, so an untested test is a buggy test that can’t be relied upon.

One of the simplest ways to test a test is to write it before writing the code, as per TDD. But that’s only one form of failure, so check your other tests too.

(If anyone has a system for automated testing of test suites, I am morbidly curious, because I’d love to know whether you can automatically test your automatic test suite tester.)

DavidMertz · January 29, 2023, 5:27pm

@csm10495 It’s already in my list, but I’m acknowledging you.

Same thing for @Melendowski, whose good suggestion I’ve already thought of and included in the TOC.

Ditto @steven.daprano .

@ryan-duve : Yup, got mutable defaults. But adding you to acknowledgment.

@abessman : I hadn’t added the late binding closures, but that’s worth adding (even if I remove it later).

@cooperlees : I think mentioning bugbear in the intro or somewhere is worthwhile, although probably not as any particular individual mistake.

@Quercus : Hmm… yeah, references versus copies (send me your full name for acknowledgment).

@rhpvorderman : Well, CLEARLY it is a mistake to edit Python using something that isn’t vim. I guess I should add that :-).

ryan-duve · January 30, 2023, 5:26am

Do you already have mutable parameter defaults?

def combine(a, b, output=[]):
    output.append(a)
    output.append(b)
    return output

print(f"{combine(1,2)=}")
print(f"{combine(3,4)=}")

# combine(1,2)=[1, 2]
# combine(3,4)=[1, 2, 3, 4]

It took me a long time to stop thinking of parameter defaults as something that are evaluated each invocation.
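The usual workaround, sketched here with the same `combine` function, is to default to `None` and create the list inside the body, so that a fresh list is built on every call that does not supply one.

```python
def combine(a, b, output=None):
    # The default is evaluated once, at definition time, so use None as a
    # sentinel and build the new list when the function is actually called.
    if output is None:
        output = []
    output.append(a)
    output.append(b)
    return output


print(f"{combine(1, 2)=}")
print(f"{combine(3, 4)=}")
```

With this version the second call no longer sees the values from the first.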

abessman · January 30, 2023, 6:45am

Late binding closures are a classic:

>>> funcs = []
>>> for i in range(3):
...     funcs.append(lambda: print(i))
...
>>> for f in funcs:
...     f()
...
2
2
2

To get the output most newcomers expect, force early binding by assigning default arguments:

>>> funcs = []
>>> for i in range(3):
...     funcs.append(lambda x=i: print(x))
...
>>> for f in funcs:
...     f()
...
0
1
2
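An alternative to the default-argument trick is a small factory function, which gives each closure its own scope. This is only a sketch; `make_getter` is a hypothetical name, and the functions return values instead of printing so the results are easy to compare.

```python
def make_getter(value):
    """Return a function that remembers the value passed in right now."""
    def getter():
        return value
    return getter


# Late binding: every lambda looks up i when it is called, after the loop.
late = [lambda: i for i in range(3)]

# Early binding: each call to make_getter captures its own value.
early = [make_getter(i) for i in range(3)]

print([f() for f in late])
print([f() for f in early])
```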

cooperlees · January 30, 2023, 9:05pm

Everything flake8-bugbear (GitHub: PyCQA/flake8-bugbear, a plugin for Flake8 finding likely bugs and design problems in your program; contains warnings that don't belong in pyflakes and pycodestyle) looks for …

d_n · January 30, 2023, 11:24pm

Failing to appreciate that Python’s for is really a for-each, not a for-index construct.
for pointer in range( len( collection ) ):
    do_something( collection[ pointer ] )
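For comparison, a short sketch of the for-each style, using made-up sample data: iterate over the items directly, and reach for `enumerate()` or `zip()` only when a position or a second collection is really needed.

```python
collection = ["spam", "eggs", "ham"]

# Iterate over the items directly; no index bookkeeping needed.
for item in collection:
    print(item)

# When the position is genuinely needed, enumerate() supplies it.
for position, item in enumerate(collection, start=1):
    print(position, item)

# To walk two collections in step, zip() pairs up their items.
prices = [1.5, 2.0, 3.25]
for item, price in zip(collection, prices):
    print(item, price)
```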

How about the old chestnut:

a = a + 1 # add one to a
or a += 1

Using algebra-like short identifiers, instead of taking advantage of the IDE’s capability to expand and reproduce names on-the-fly.

Making an (additional) copy of a list:
collection2 = collection1 # same id, if ‘one’ changes, so does ‘the other’

Regards =dn

Quercus · January 31, 2023, 1:23am

Consider including a section on the general topic of mistakes concerning references to, versus copies of, mutable objects. The issue can arise not only with assignment statements, but also with the passing of references to mutable objects as arguments to functions. This section would also deal with mistakes related to deep versus shallow copies of mutable objects, or hybrids of these. Also included could be improper creation of or handling of objects that contain references to themselves or to portions of themselves, either directly or through references to other objects, that in turn reference them.
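The difference between a second reference, a shallow copy, and a deep copy can be seen in a few lines with the standard `copy` module. The variable names below are only illustrative.

```python
import copy

original = [[1, 2], [3, 4]]

alias = original                 # a second name for the same list
shallow = copy.copy(original)    # new outer list, same inner lists
deep = copy.deepcopy(original)   # new outer list and new inner lists

original.append([5, 6])    # changes the outer list: only the alias sees it
original[0].append(99)     # changes an inner list: alias and shallow see it

print("alias:  ", alias)
print("shallow:", shallow)
print("deep:   ", deep)
```

The same reasoning applies when a list is passed to a function: the parameter is another reference to the caller's object, not a copy of it.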

Also, when initializing a nested list, one must be sure to avoid having the contained level consist of multiple references to the same list, for example:

chessboard = [[""] * 8] * 8
chessboard[7][3] = "White Queen"
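A small sketch that makes the problem visible and shows one common fix, a list comprehension that builds a separate row on each iteration. The names `shared_board` and `board` are chosen only for this illustration.

```python
# Eight references to the SAME row list.
shared_board = [[""] * 8] * 8
shared_board[7][3] = "White Queen"
queens_on_shared = sum(row.count("White Queen") for row in shared_board)

# Eight independent row lists, one created per loop iteration.
board = [[""] * 8 for _ in range(8)]
board[7][3] = "White Queen"
queens_on_board = sum(row.count("White Queen") for row in board)

print("shared rows:     ", queens_on_shared, "queens")
print("independent rows:", queens_on_board, "queen")
```

With the shared rows, the single assignment shows up in every row of the board.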

There exist many exercises within instructional materials that ask the learner to write a function that manipulates a list or other object, but that do not specify whether the original should be kept intact while a copy is modified. In cases where the result is to be returned by the function, one can usually safely assume that the exercise intends for a copy of the object to be created and modified, with the original left untouched. The exercise instructions ought to be explicit about this, so that students are made aware of the issue early on. Perhaps the book might include in its collection of Python mistakes, such omissions that are commonly made by authors when writing instructions within educational material.

fancidev · January 31, 2023, 4:24am

Interesting topic!

Some of the more subtle but well-known mistakes:

function default argument = dict (pycharm would warn about that)
list multiplication of mutable object, e.g. [[]]*10 and then modifying the elements
closure capture for each variable
passing an Iterable to functions that expect a sequence
SQL (and other types of) injection
floating number equality comparison
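Two of the items above are quick to demonstrate. The sketch below shows exact float comparison versus `math.isclose`, and a function that quietly misbehaves when given a one-shot iterator instead of a sequence. `normalize` and `normalize_safely` are hypothetical names.

```python
import math

# Floating-point equality: compare with a tolerance, not with ==.
print(0.1 + 0.2 == 0.3)
print(math.isclose(0.1 + 0.2, 0.3))


def normalize(values):
    """Divide each value by the total. Assumes values can be iterated twice."""
    total = sum(values)                  # first pass consumes an iterator
    return [v / total for v in values]   # second pass then finds nothing


def normalize_safely(values):
    """Same idea, but materialize the input so two passes are safe."""
    values = list(values)
    total = sum(values)
    return [v / total for v in values]


print(normalize([1, 1, 2]))
print(normalize(iter([1, 1, 2])))        # silently returns an empty list
print(normalize_safely(iter([1, 1, 2])))
```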

rhpvorderman · January 31, 2023, 9:11am

One that I often see in my field: not using generators for large data structures such as large files. Instead reading everything into a dict or list and then passing the list/dict through several functions that each return a modified list/dict. This makes the code orders of magnitude slower and the memory usage orders of magnitude higher.
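A rough sketch of the generator style, assuming a made-up "name,value" line format and hypothetical function names. Each stage pulls one record at a time from the stage before it, so the whole file never needs to sit in memory. An `io.StringIO` stands in for a real file here.

```python
import io


def read_records(lines):
    """Yield stripped, non-empty lines one at a time."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


def parse_fields(records):
    """Yield each record split into its comma-separated fields."""
    for record in records:
        yield record.split(",")


def long_names(rows, min_length):
    """Yield (name, value) pairs whose name has at least min_length characters."""
    for name, value in rows:
        if len(name) >= min_length:
            yield name, int(value)


# Stand-in for a large file; a real file object is iterated the same way.
fake_file = io.StringIO("alpha,1\n\nbeta,2\ngamma,3\n")

pipeline = long_names(parse_fields(read_records(fake_file)), min_length=5)
result = list(pipeline)   # nothing is read until the pipeline is consumed
print(result)
```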

A common beginner mistake (inspired by @fancidev): NOT using an IDE but a text editor. (Be it IDLE, notepad etc.). Interactive debuggers are so tremendously useful. You are really hurting yourself when you are not using them.

I once contributed to a Java project, even though I had never written Java before (though I had some experience with Scala). I knew which data structures to use and how to solve the problem, I just didn’t know Java. The IDE (IntelliJ) saved me there and I was able to write Java without any Java experience. Syntax highlighting, autocomplete, highlighting mistakes and anti-patterns. IDEs help so much! The Java PR was merged without much comment and I never had to write Java again!

Eclips4 · January 31, 2023, 1:08pm

A large number of newcomers use list comprehensions not for the intended purpose:

[print(i) for i in range(10)]

instead of

for i in range(10):
    print(i)

pcb21 · January 31, 2023, 4:46pm

There are some builtins that are very easy to overwrite without thinking. My personal two bêtes noires are ‘id’ and ‘dir’. A secondary downside, apart from changing the behaviour of the program, is that they cause noise in static analysers. E.g. if you overwrite a builtin, PyCharm will put a warning on that line. You are then left with the options of suppressing warnings via #noqa or having a noisy inspection report where other more serious problems may be overlooked. To avoid this I have learnt to spell them ‘idd’ and ‘dirr’.
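To show what the shadowing costs at runtime, here is a small sketch with hypothetical function names. Once `id` is rebound to a string inside a function, the builtin is unreachable by that name there, while a more descriptive name such as `record_id` avoids the clash.

```python
def describe_shadowed(obj):
    id = "record-42"   # shadows the builtin id() inside this function
    try:
        return id(obj)  # now tries to call a string
    except TypeError as exc:
        return f"TypeError: {exc}"


def describe(obj):
    record_id = "record-42"   # descriptive name, builtin left alone
    return record_id, id(obj)


thing = object()
print(describe_shadowed(thing))
print(describe(thing)[0])
```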

Rosuav · January 31, 2023, 5:38pm

So you use a less-clear variable name just to silence a PyCharm warning? That seems counter-productive.

pcb21 · January 31, 2023, 6:20pm

Well, as I said, it doesn’t just suppress a warning, it also avoids the shadowing itself.
But typing it out caused me to look deeper into the PyCharm settings, and it is possible to configure this warning on a variable-name-by-variable-name basis, which may well be preferable.

mattwelke · February 1, 2023, 4:30am

This book sounds cool. I reviewed two manuscripts for Manning “100 Mistakes” books - for Go and Java. Based on my experience reviewing them, I’ll share my thoughts on what I liked and didn’t like.

With the Go book, I liked that the author kept things short. Each “mistake” I could read in about a minute. If I wanted to fully grok it by re-reading it a few times and running the code myself, it was no more than 5 minutes each, maybe 10-15 for some more complex ones. The author also focused on things that were low level, so things that many Go devs could relate to. Variable shadowing mistakes, string handling quirks, slice handling quirks, and error handling come to mind.

Overall, I really liked it. I know you probably want to go for your own style, but I do recommend giving the final published version a glance if you’re looking for inspiration. Based on my experience working with Manning, they’re willing to open up their entire library to you to help you create the content you’re contracted to create for them. So, you could take advantage of that.

With the Java book, I found it a bit dry and unmemorable. I was seeing a lot of stuff that was either too basic for me to be interested in (where I felt it was hard to make a “mistake” with it as long as you read the docs) or it was very verbose content that was about arranging your project and setting up various tools like linters and annotation processors. That’s going to get out of date, so I felt it didn’t make much sense being there. The one thing I remember the most is a chapter on creating your own static code analysis plugin. That was cool, but then again, does it make sense for an entire chapter to be a tutorial like that in a “100 Mistakes” book? Probably not, so that still felt weird to me.

Based on those experiences, I think my ideal “100 Python Mistakes” book would be one that kept each mistake short and sweet and focused on language fundamentals (instead of subjective coding styles, project set up, favorite libraries, etc).

With regard to popular libraries, the exception I think I would make for Python is that I’d be okay with a “mistake” like “don’t implement data frames, use x” because Python doesn’t have a lot of the things it does best in its standard library. Instead, it’s considered best practice to look to particular popular 3rd party libraries for these use cases. Pandas for data frames comes to mind. But, don’t show me how to use x. That would take too long to read, get out of date quickly, and veer into being off topic. Just quickly let me know of its existence, with maybe a quick comparison between the verbose, error-prone monstrosity I’d have to write if using vanilla Python vs. the short, efficient, code I’d be able to write by using x.

With regard to Python language fundamentals, as someone who plans to skill up on Python soon, there are a few things about Python programming I’m interested in. But I don’t know if they’re best for a “100 Mistakes” book or a book dedicated to learning Python programming in general. They are:

How to do async programming (mistakes could be “don’t block threads more often than you need to” etc)
How to stream data (mistakes could be “don’t store more data in memory than you need to” etc)
How to package Python applications (here be dragons - this is probably so hard to do in a book, I know. Also, Manning already has a book dedicated to this)
How to Dockerize Python applications, be they CLIs or long lived processes like web apps (maybe out of scope)
Anything that modern versions of Python have retroactively made into anti-patterns (mistakes could be “don’t do x because Python y introduced z” etc)

Best of luck!

mattwelke · February 1, 2023, 4:39am

Correct me if I’m wrong, but the problem with list comprehension for this is that list comprehension is meant for succinctly creating new data using the resulting expression, whereas loops are meant for iteration where you want to perform a side effect multiple times. So, relying on side effects that occur during list comprehension is less clear. Is this right?

Eclips4 · February 1, 2023, 2:59pm

Yep. You’re right.

Quercus · February 1, 2023, 3:18pm

Yes, let’s be guided by the following documents:

The first one states this as a rationale:

List comprehensions provide a more concise way to create lists in situations where map() and filter() and/or nested loops would currently be used.

The second one offers this:

List comprehensions provide a concise way to create lists. Common applications are to make new lists where each element is the result of some operations applied to each member of another sequence or iterable, or to create a subsequence of those elements that satisfy a certain condition.

So, upon encountering a list comprehension in code, one’s initial impression would generally be that its purpose is to create a list that is to be used subsequently, rather than one that is to be immediately discarded after side effects are accomplished.
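A brief sketch of what the side-effect comprehension actually builds: since `print()` returns `None`, the result is a throwaway list of `None` values. The variable names are illustrative only.

```python
# Used for side effects: the list that gets built is just a pile of None.
leftovers = [print(i) for i in range(3)]
print(leftovers)

# Used as intended: the list itself is the point.
squares = [i * i for i in range(3)]
print(squares)
```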