Ideas for a 100 Python Mistakes book

Just looking for thoughts here. Manning has asked me to write a book in their series 100 {lang} Mistakes and How to Avoid Them. I’m working on a formal proposal, and it’s largely in place (but not yet to 100 topics).

To some extent, “mistakes” are similar to “antipatterns,” and I’ll definitely overlap somewhat with some of the numerous lists of antipatterns that many people have published (inasmuch as I do, I might use the same concept, but my own examples and explanation, of course). But I also think of mistakes as something a little bit different from antipatterns; the latter mostly gets you “inelegant” rather than “broken” … but clearly it’s a continuum.

I think I’m almost happy with my “Basic” and “Advanced” Python mistakes. I’d like to include some more about testing and test-driven development. I’m opinionated that TDD is usually the way to go, or at least should be a big part of how one develops. But “Not doing TDD” is a bit too broad for a single mistake/solution. I’d also like to flesh out my ideas about “data structure mistakes”, but my current sketch is pretty good.

I’m also going to reach a bit into numeric computing (i.e. NumPy/Pandas), but deliberately not venture as far as dataviz libraries or deep neural networks, with possibly just one or two examples using scikit-learn. I also won’t do much with web frameworks and such matters (maybe one or two “mistakes” where the web touches security, though).

Obviously, if anyone has any ideas I use (or even ones I decide not to use, but are still smart), I’ll add you to the acknowledgements. I may also be able to suggest reviewers or get a few free books into people’s hands. But not much more than that, and I’m not looking for an actual co-author on this project currently.

  • Treating Python as <insert other language here>, e.g. defining a class and then filling it with getter/setter functions and maybe properties
  • Creating a file called re.py
  • Arguing about whether tabs or spaces are better

@Rosuav You’ve earned your acknowledgement. :slight_smile:

I think the “treating as other language” one is too generic for one of the 100, although several individual ones are basically examples of that, and I’ll put something like that in the intro. The tabs vs. spaces feels a bit too stylistic.

However, the local file with name conflicting with stdlib is a great mistake to discuss, and I’ll definitely include that.

As written, it is indeed very generic. Here’s a more specific version:

  • Treating Python as if it were Java: writing dozens of lines of boilerplate getter and setter methods in every class

You could (alternatively or as well) cite other languages, with other mistakes, but I think this one is probably the most common.

True, but what I said was a mistake was getting into arguments about the distinction :slight_smile: Dunno whether you want some of the more tongue-in-cheek mistakes.

I actually have a mistake in my list about using getters and setters already. So yes, that is a good example that I will include in the book.
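
To make the getter/setter point concrete, here is a rough sketch. The class names are invented for illustration, and the property version is only one possible way to add validation later without changing how callers use the attribute:

```python
class JavaStylePoint:
    """Boilerplate accessors that add nothing over a plain attribute."""

    def __init__(self, x):
        self._x = x

    def get_x(self):
        return self._x

    def set_x(self, value):
        self._x = value


class Point:
    """Plain attribute: callers simply read and assign p.x."""

    def __init__(self, x):
        self.x = x


class CheckedPoint:
    """If validation is needed later, a property keeps the p.x syntax."""

    def __init__(self, x):
        self.x = x  # goes through the setter below

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        if not isinstance(value, (int, float)):
            raise TypeError("x must be a number")
        self._x = value
```

Code written against Point keeps working unchanged if the class later grows into something like CheckedPoint, which is why the up-front accessors buy nothing.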

Not too sure what kind of “mistakes” you are looking for, but I would cite SQL injection attacks, as one area where it’s easy, for coders that are unaware of the way that a cursor.execute() statement should be formed, to get it wrong and open up the DB to a number of different exploits.

I’ve assumed that

with sqlite3.connect(..) as conn:
    ...

would close the connection when the context manager exits… But it doesn’t! It only commits or rolls back the transaction. I recommend wrapping the connection in contextlib.closing to ensure it gets closed.

This bit me by leaving a file write handle open, which messed with the app later on.
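
A small sketch of the difference, using an in-memory database so it runs anywhere. The table name t and the variable names are made up for illustration:

```python
import sqlite3
from contextlib import closing

# "with conn" only manages the transaction: commit on success, rollback on error.
conn = sqlite3.connect(":memory:")
with conn:
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("INSERT INTO t VALUES (1)")

# The connection is still open and usable after the with block.
still_open = conn.execute("SELECT count(*) FROM t").fetchone()[0]
conn.close()

# closing() calls conn2.close() on exit; the inner "with conn2" handles the transaction.
with closing(sqlite3.connect(":memory:")) as conn2:
    with conn2:
        conn2.execute("CREATE TABLE t (x INTEGER)")

try:
    conn2.execute("SELECT 1")
    closed = False
except sqlite3.ProgrammingError:
    closed = True  # operating on a closed connection raises ProgrammingError
```

Nesting the two context managers keeps both behaviours: the transaction is committed or rolled back, and the connection is then closed.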

Another one I’ve seen first hand:

Do not call pip from inside your app to install dependencies.

Installing into a virtualenv is usually much better. If you need to distribute without dependencies, consider pyinstaller, shiv, etc.

TBH I consider this a wart in the DB API; it shouldn’t be “with conn” to open/close a transaction, but “with conn.transact()” or something. That way, the thing you’re opening and closing IS the context manager.

@rob42 I like this one. I have a “Security” chapter among the mistakes, but had not included this. There’s a danger of not being Python specific enough in my “mistakes” … but in this case, I think using the DB-API to sanitize data is Pythonic, and should be mentioned (as opposed to f"INSERT (foo) VALUES ({bar})").
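
As a hedged illustration of the contrast, here is a sketch using sqlite3, whose placeholder is "?" (other DB-API drivers may use a different placeholder style). The table users, the column name, and the hostile string are assumptions made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

bar = "x'); DROP TABLE users; --"  # hostile input, invented for illustration

# Risky: building the statement with an f-string lets the input rewrite the SQL.
# conn.executescript(f"INSERT INTO users (name) VALUES ('{bar}')")

# Safer: pass the value separately and let the driver bind it to the placeholder.
conn.execute("INSERT INTO users (name) VALUES (?)", (bar,))
conn.commit()

# The hostile text was stored as plain data, and the table still exists.
rows = conn.execute("SELECT name FROM users").fetchall()
conn.close()
```

The risky line is left commented out on purpose; with the bound parameter, the input is never parsed as SQL.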

I’d like to put you in the acknowledgements; if you want to tell me your full name for that at mertz@kdm.training, that would be cool.

You’re welcome and I’m pleased to have helped.

I’m not too sure of the policies or even if you would consider this: rather than individual acknowledgements (which could be dozens, given time) I propose that you simply ‘acknowledge’ the members of this forum, for their input; just an idea.

It’s a book, not a Tweet; I’m sure they can find room to include a page of the names of contributors :slight_smile:

Dozens of acknowledgements may accumulate, especially if updated editions follow the original, but they shouldn’t prove too onerous to publish. The Contributor List in the second edition of Think Python by Allen B. Downey (2016) contains roughly 125 entries, including the printed version. I rather enjoyed seeing what each contributor offered, and was eventually inspired to submit at least one observation of my own in anticipation of the possibility of a third edition.

See Think Python 2nd Edition by Allen B. Downey.

Inspired by a post that I made, and corrected: a trap that is all too easy to fall into is using Python reserved words for one’s own object names. Do you have that covered?

This manner of trap would include reusing names of built-in functions or existing types. A common example of a mistake of this type is the use of the name list for a variable that refers to a list, especially if it is in the global namespace. This often occurs among beginners, which brings up an issue related to this:

Would these “Basic” mistakes include ones predominantly made by beginners?

By “reserved words”, do you mean keywords, builtins, or standard library modules? You fundamentally can’t use keywords as names (pass = 1 will just error out), but the other two aren’t necessarily wrong. For instance, I have frequently used names like dir, id, cmd, and site as variable names; although it would be a more common trap if you were to use list or super in your own code. Maybe pick out a small handful of names that are actually likely to be a problem - although you’ll find disagreement on which ones are likely to be an issue. Personally, I’d say that list, re, json, and type are probably all within that zone of “plausible enough to use as your own variables, but also plausible that you’d want the original meaning”, but Python is namespaced for a reason, and it’s generally fine to use these sorts of names for your own purposes.

It could turn out that David has this covered already, but if not, then I mean the likes of input = [20, 14, 45, 78, 12] or list = [20, 14, 45, 78, 12], which will cause a TypeError if the function is subsequently called without first issuing a del command. Issuing del will mitigate the error, but I’d say it is bad practice to do that, although I’m sure that some will, and are at liberty to, disagree with me on this.

So, really, I mean any name clash (be that with keywords, builtins, or standard library modules) that will pass a syntax check, but is as likely as not going to come back and bite the coder (a beginner, leastways) further down the road.
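
A short sketch of that scenario, written as module-level code (inside a function the del step would behave differently). The name numbers is just an illustrative alternative:

```python
# Module-level code: rebinding the name hides the builtin in this namespace.
list = [20, 14, 45, 78, 12]

try:
    list("abc")  # the name now refers to a list object, which is not callable
    error = None
except TypeError as exc:
    error = exc

del list  # removes the shadowing global, so the builtin is visible again
restored = list("abc")

# Choosing a different name avoids the clash in the first place.
numbers = [20, 14, 45, 78, 12]
```

The del works here only because it removes the module-level name and lets lookup fall through to the builtin again.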

This is called “shadowing”, as in your use of the name puts the original in the shade where the interpreter can no longer see it.

You can’t shadow keywords:

import = "something"  # SyntaxError

but you can shadow builtins, stdlib modules, and even your own globals with a local variable.

As the last example suggests, shadowing is not always bad. We have separate local and global namespaces so that locals can use names without caring that they already exist as globals or locals of other functions.
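
A rough sketch of both points: keywords are rejected at compile time, while a local name that shadows a builtin affects only its own function. The function total is a made-up example:

```python
import keyword

# Keywords are rejected by the compiler, so they can never be shadowed.
try:
    compile('import = "something"', "<example>", "exec")
    keyword_assignable = True
except SyntaxError:
    keyword_assignable = False

print(keyword.iskeyword("import"))  # True
print(keyword.iskeyword("list"))    # False: a builtin, not a keyword


def total(values):
    # This local "sum" shadows the builtin only inside this function.
    sum = 0
    for value in values:
        sum += value
    return sum


result = total([1, 2, 3])
builtin_still_works = sum([1, 2, 3])  # the builtin is untouched outside total()
```

The local shadowing is harmless here, but it would become a bug if total() itself later tried to call the builtin sum.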

Shadowing of builtins is sometimes neutral, occasionally very useful, but is usually a mistake.

Shadowing of stdlib modules is almost always a mistake.

TBH I’d say there’s no difference between these categories. I’ve used variables called code and cmd plenty of times, not concerned that I’m shadowing a stdlib module. And does it count as “shadowing” when you say from pprint import pprint? I’ve done that plenty of times.

Namespaces exist to be used. If shadowing were “almost always a mistake”, we’d just reserve names globally (which is the case for True and False, for instance). There’s nothing special about a standard library module that means its name can’t be used for anything else.

Also, do please let me know if you have the entire Python standard library memorized, such that you know any time you’re shadowing a module. There’s only, what, 200ish top-level modules, that should be fine right?

This will indeed be a book. So having hundreds of acknowledgements really wouldn’t be a problem. That said, Manning’s formal structure in the series is to have 100 mistakes, not some larger number. It’s possible that one mistake might merit more than one acknowledgement, but my feeling is that each one will require fewer than one (I do think of some stuff by myself).

My wonderful colleague and friend Allen Downey has a nice list. Many, many years ago, for a different book, I had this: https://gnosis.cx/TPiP/acknowledgments.txt. It’s not as long as Allen’s, but it’s a fair number. That said, my current thought is just to list some names in paragraph form without going into who contributed which idea.

@rob42 So it’s not technically possible to use reserved words for your own names. But it is possible to use built-ins, which is also often bad. I think I’ll tentatively add that idea, but might not keep it in the book. Using sum as a local variable is a genuine danger; using classmethod as a local variable feels like it’s almost surely malice rather than accident.