Ideas for a 100 Python Mistakes book

Coming from C, I had all sorts of beginner confusion: pass by value and pointers versus the Python object-reference model, particularly with regard to mutable and immutable types.

I kept getting caught out by function argument re-assignment (thinking the argument was some kind of automatically dereferenced pointer). The idea of str being immutable was a totally new concept. Without the help of an IDE, there are no warnings.
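A minimal sketch of the str point, using made-up variable names: string methods return new objects, and assigning to an element of a str fails outright.

```python
# Strings are immutable: methods return new objects, and item assignment fails.
s = "hello"
t = s              # t refers to the same str object as s
s = s.upper()      # rebinds s to a brand-new str; t is untouched

try:
    t[0] = "J"     # str does not support item assignment
    item_assignment_failed = False
except TypeError:
    item_assignment_failed = True

print(s)                        # HELLO
print(t)                        # hello
print(item_assignment_failed)   # True
```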

TL;DR:

These apparently do the same thing and print the same result, but the first creates a new object while the second mutates the existing object.

df = df * 2
print(df)

df[:] = df[:] * 2 
print(df)

To a beginner this could be very confounding. It’s not until you have a mental model of mutable types and object references that it becomes second nature. Coming from C pointers, the foundations were there, but the pythonic way was still confusing, especially with immutable strings and extended pandas notation.

Full example:

import pandas as pd

data_dict = {'col_a': [1, 2, 3], 'col_b': [1, 2, 3]}
df = pd.DataFrame(data_dict)

df
Out[5]: 
   col_a  col_b
0      1      1
1      2      2
2      3      3

df * 2
Out[6]: 
   col_a  col_b
0      2      2
1      4      4
2      6      6

def double(df):
    df = df * 2
    print(df)
    
double(df)
   col_a  col_b
0      2      2
1      4      4
2      6      6

df
Out[9]: 
   col_a  col_b
0      1      1
1      2      2
2      3      3
# aaargh why didn't the double stick?? 
# Even printed it in the function to make sure!!

The same applies to any mutable object.

Of course, this works:

df[:] = df[:] * 2 

df
Out[14]: 
   col_a  col_b
0      2      2
1      4      4
2      6      6

That’s a nice example, @Brendan, perhaps a good one for the book.

Inspired by that, here’s a simplified example of reassignment versus slice replacement, maybe also one that could be adapted for the book:

a = [1, 2, 3, 4, 5]
b = a         # b is an alias for a
a = a * 2     # reassignment makes new list
print(a)
print(b)

output:

[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
[1, 2, 3, 4, 5]

a = [1, 2, 3, 4, 5]
b = a           # b is an alias for a
a[:] = a[:] * 2 # slice replacement modifies original list
print(a)
print(b)

output:

[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]

The takeaway lesson would be to always understand whether a name is being rebound to a new object or an existing object is being modified through a reference.


Or better: Always know what you’re assigning to. It’s the last part of whatever assignment statement you did.

a.b.c[2] = 123

It’s the last part that’s assigned (in this case, element [2]), and everything else is being referenced.
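To make that concrete, here is a small sketch. The object graph is hypothetical, built with SimpleNamespace only so that a.b.c[2] = 123 has something to run against.

```python
from types import SimpleNamespace

# Hypothetical object graph, built only to make a.b.c[2] = 123 runnable.
a = SimpleNamespace(b=SimpleNamespace(c=[0, 0, 0]))

inner_list = a.b.c          # another reference to the same list
a.b.c[2] = 123              # a, a.b and a.b.c are only looked up; element 2 is assigned

print(inner_list)           # [0, 0, 123]
print(a.b.c is inner_list)  # True

a.b.c = [9, 9, 9]           # now the attribute c is the target; the old list is unchanged
print(inner_list)           # [0, 0, 123]
print(a.b.c is inner_list)  # False
```

In the first assignment the list is mutated, so every reference sees the change. In the second, only the attribute c is rebound, so the old list keeps its contents.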


The book could include examples of mistakes that relate to unawareness of how some operators have been overridden within certain popular Python libraries. This would include how bitwise operators are used in a pandas.DataFrame object. Perhaps someone who is more thoroughly practiced with pandas than myself could present some good examples of such mistakes within this discussion.
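One candidate, sketched with plain ints standing in for pandas columns (an assumption made so the snippet runs without pandas): & binds more tightly than the comparison operators, so leaving out the parentheses changes what the expression means. With Series operands the unparenthesized form would normally raise a ValueError about an ambiguous truth value rather than return a wrong answer.

```python
# Plain ints stand in for pandas columns here; only the parsing rule is shown.
a, b = 1, 2

# Intended: "a greater than 1 AND b less than 3".
with_parens = (a > 1) & (b < 3)     # False & True -> False

# Without parentheses, & binds tighter than the comparisons, so this parses
# as the chained comparison  a > (1 & b) < 3.
without_parens = a > 1 & b < 3      # 1 > 0 < 3 -> True

print(with_parens)      # False
print(without_parens)   # True
```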

Not just less clear, but less efficient.

If you are looping ten thousand times in a comprehension just for the side effects, you are telling the interpreter to carefully construct a list containing 10000 items, and then to immediately garbage collect it and all 10000 items inside it.

The day may come that the interpreter is clever enough to optimize away that unneeded construction and destruction, but the CPython interpreter at least is not yet clever enough to do so.
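A small sketch of the difference; record is a hypothetical side-effecting function invented for the illustration.

```python
results = []

def record(n):
    """Hypothetical side-effecting function, used only for illustration."""
    results.append(n * n)

# Discouraged: builds a throwaway list of 10000 None values.
throwaway = [record(n) for n in range(10_000)]
print(len(throwaway), throwaway[0])   # 10000 None

results.clear()

# Preferred: a plain loop says what it means and allocates no extra list.
for n in range(10_000):
    record(n)
print(len(results))                   # 10000
```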


Does it matter? How often do you use the id and dir builtins? Who cares if you shadow them inside a short function?

One of the more interesting, and I mean that as a good thing, design choices of “Refactoring (Ruby Edition)” by Fields, Harvie and Fowler (that’s Martin Fowler) is that they will often recommend one refactoring technique, and then immediately recommend the opposite.

E.g. they have

  • Decompose Conditional
  • Recompose Conditional
  • Add Parameter
  • Remove Parameter

I think that shadowing is a good example for when this juxtaposition is helpful:

  • Don’t shadow builtins.
  • Don’t be afraid to shadow builtins.

Sometimes we intend to shadow builtins. Shadowing is not just a mistake, sometimes it’s a feature, and the mistake is to avoid it unnecessarily.

It is a mistake to shadow len or list in the top level of your module, where it has the potential to break your code in all sorts of places.

But it’s also a mistake to use an unclear or unnatural name as a local variable inside a short function merely to avoid shadowing a builtin you don’t care about.

Why use a worse name just to silence some opinionated linter or colleague? :wink:

Rules like “don’t shadow builtins” exist so that you think before you break them.
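A sketch of both sides, with hypothetical function names. The wider-scope mistake is simulated inside a function so that the snippet does not break anything else in the script.

```python
def describe(id, type):
    # Shadows the builtins id and type, but only inside this short function,
    # where the builtins are not needed.
    return f"{type} #{id}"

print(describe(7, "order"))   # order #7
print(type(7).__name__)       # int  (the builtin is intact outside the function)

def shadow_at_wider_scope():
    # Simulates the module-level mistake inside a function so the example
    # does not break the rest of the script.
    list = [1, 2, 3]          # shadows the builtin for the rest of this scope
    try:
        return list("abc")    # fails: the name no longer refers to the builtin
    except TypeError as exc:
        return str(exc)

print(shadow_at_wider_scope())   # 'list' object is not callable
```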


A technique I picked up from older tutorials: if you really don’t want to shadow something that’s already defined, add a trailing underscore, as in len_ for len. I don’t use it that often, though.

+1. There are good reasons to shadow builtins. There are also good reasons to have your editor highlight builtins in a different colour, so you don’t unexpectedly find that you’ve shot yourself in the foot.

Spot on. For the mental model, understanding that arguments are passed to functions and methods by object reference is also important.


On forums connected with introductory Python tutorials, I have often seen participants thinking that if they pass a variable to a function, that the function receives a reference to the variable itself, and that therefore the function should be able to change the value of that external variable, via something such as an assignment to the corresponding formal parameter. Consistent with that belief, when they observe that modifying a mutable object within the function can modify the external object referred to by the argument, they sometimes think it was because the external variable itself was modified.
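A minimal sketch of the two cases described above, using made-up function names: assigning to the formal parameter rebinds a local name only, while mutating the object it refers to is visible to the caller.

```python
def rebind(items):
    items = items * 2        # binds the local name to a new list; caller unaffected
    return items

def mutate(items):
    items[:] = items * 2     # replaces the contents of the caller's list in place

original = [1, 2, 3]
doubled = rebind(original)
print(original)              # [1, 2, 3]
print(doubled)               # [1, 2, 3, 1, 2, 3]

mutate(original)
print(original)              # [1, 2, 3, 1, 2, 3]
```

In neither case is the caller's variable itself touched: original refers to the same list object throughout, and only that object's contents change in the second call.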

If the book does contain a section on mistakes concerning references to, versus copies of, objects, that section would be especially interesting to beginners. In order to keep that component of the book’s audience happy, free of misconceptions, and ready to move forward, it would be helpful to explain at the start of that section, with the aid of a diagram and example code, the mechanism of pass-by-object-reference.

EDITED for clarification via using the phrase “assignment to the corresponding formal parameter”.

@Quercus great summary. It’s not necessarily an easy topic for a beginner book, but it is one that is really important, IMHO, for gaining competence. It’s also a different model from other languages (that I’d used in the past). I think it was a good post on SO that cleared it up for me, and I recall seeing some diagrams too - will see if I can dig up the reference.

In the meantime, here’s a great example of all 100 mistakes in one script!


Then the concept must by necessity be introduced really early in the book - right on the cover, in fact. Instead of merely listing the name of the author there, have it state:

Written by a man named David Mertz

:grin:


You may have seen some diagrams here …

… along with …

“Hamlet was not written by Shakespeare; it was merely written by a man named Shakespeare.”

And “…proclivity toward double abstractions”. Gold :slight_smile:

These also may be useful:
https://nedbatchelder.com/text/names.html

In the answers on this one:


Overusing lambda. You’re almost always better with a list comprehension, generator expression, or something from the operator module.
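A few side-by-side cases as a rough sketch; the data is invented for the illustration.

```python
from functools import reduce
from operator import itemgetter, mul

pairs = [("pear", 3), ("apple", 5), ("fig", 1)]

# lambda versions
by_count_lambda = sorted(pairs, key=lambda p: p[1])
names_lambda = list(map(lambda p: p[0].upper(), pairs))
product_lambda = reduce(lambda x, y: x * y, [1, 2, 3, 4])

# Alternatives: operator module, list comprehension
by_count = sorted(pairs, key=itemgetter(1))
names = [name.upper() for name, _ in pairs]
product = reduce(mul, [1, 2, 3, 4])

print(by_count)   # [('fig', 1), ('pear', 3), ('apple', 5)]
print(names)      # ['PEAR', 'APPLE', 'FIG']
print(product)    # 24
```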

Overusing regexes. Python is not Perl (fortunately). Usually if things can be done using str methods instead, that’s a win.
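For instance, here is a sketch with an invented input line, where str methods do the same job as the regexes:

```python
import re

line = "  key = value  "

# regex versions
key_re, value_re = re.match(r"\s*(\w+)\s*=\s*(\w+)\s*", line).groups()
is_py_re = re.search(r"\.py$", "script.py") is not None

# str-method versions
key, value = (part.strip() for part in line.split("=", 1))
is_py = "script.py".endswith(".py")

print(key, value)   # key value
print(is_py)        # True
```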


Thanks for the links. All of them offer ideas that might be useful for the book.

Note that Programming FAQ: How do I write a function with output parameters (call by reference)? from the official Python documentation states:

Remember that arguments are passed by assignment in Python.

The object reference, of course, is the thing assigned, making that equivalent to stating that it is pass-by-object-reference. But is a new learner likely to recognize that? Somewhere I noticed the passing of arguments in Python described as pass-by-value, with the value that is passed being the object reference; however, I cannot remember where that was stated. The book could take on the challenge of explaining the variety of terminology that has been used to describe the same process, ultimately standardizing on the terminology that describes it best, namely pass-by-object-reference.

I would hope that a new learner would read the next sentence after the one you quoted, which says:

Since assignment just creates references to objects…

and then goes on to give a very thorough explanation.

The next sentence in its entirety is this, and I’m not sure the beginner would understand all of the terminology within:

Since assignment just creates references to objects, there’s no alias between an argument name in the caller and callee, and so no call-by-reference per se.

For example, the beginner might not know what an alias is. The very thorough explanation that follows it is also good, and I’m not critical of it. But a beginner would probably need some additional help in order to understand it.


These are so common I’m sure you have them, but I’ll list them anyway

  • Mutating an object you are iterating over
  • Expecting assignment to make a copy
  • Using a mutable default argument
  • Confusing list.append() with list.extend()
  • Expecting floating point numbers to be able to represent all decimals (possibly Python specific since decimal.Decimal exists)
  • Mutating a list rather than using a comprehension
  • Not understanding the difference between is and ==
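Several of the items above fit in a few lines each. A rough sketch, with function and variable names made up for the illustration:

```python
from decimal import Decimal

# Mutable default argument: the default list is created once, at definition time.
def add_item_buggy(item, bucket=[]):
    bucket.append(item)
    return bucket

def add_item(item, bucket=None):
    if bucket is None:
        bucket = []          # a fresh list on every call
    bucket.append(item)
    return bucket

first = add_item_buggy("a")
second = add_item_buggy("b")
print(second)                        # ['a', 'b']
print(first is second)               # True
print(add_item("a"), add_item("b"))  # ['a'] ['b']

# append adds one element; extend adds each element of an iterable.
nested = [1, 2]
nested.append([3, 4])
flat = [1, 2]
flat.extend([3, 4])
print(nested)                        # [1, 2, [3, 4]]
print(flat)                          # [1, 2, 3, 4]

# Binary floats cannot represent every decimal fraction exactly.
print(0.1 + 0.2 == 0.3)                                   # False
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True

# == compares values; is compares identity.
x = [1, 2]
y = [1, 2]
print(x == y, x is y)                # True False

# Mutating a list while iterating over it skips the element after each removal.
nums = [1, 2, 2, 3]
for n in nums:
    if n == 2:
        nums.remove(n)
print(nums)                          # [1, 2, 3]

# A comprehension builds the filtered list without touching the original.
kept = [n for n in [1, 2, 2, 3] if n != 2]
print(kept)                          # [1, 3]
```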

Thanks Matt.

I’ve looked at the Go title already, which was apparently very successful (and I agree that Teiva has done a very good job). I have not seen the Java one, although from your description it doesn’t seem like I’m missing a lot.

I’ve added you to acknowledgements, although I think everything you suggest is already in the TOC. The idea of “use the right library rather than rolling your own” is touched on a couple times in different “mistakes” (not ones I’ve actually written yet, but I have the topics).

I’m not going to do packaging. It’s too big, and there are too many opinions. But specifically saying that I’m not doing it is something I should add to the front matter, so thank you.