Python has a footgun: the naked trailing comma. I got bitten by this again today:
url = invoice.url[len(prefix):],
The trailing comma shouldn’t be there, so url is now a tuple, and that fails much later because the url variable is passed around a bit before it’s used.
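A minimal reproduction of the effect, as a sketch: `prefix` and `invoice_url` below are made-up stand-ins for the real invoice object, which isn't shown here.

```python
# Hypothetical stand-ins for the real prefix and invoice URL.
prefix = "https://example.com"
invoice_url = "https://example.com/invoices/42"

# The stray trailing comma silently turns the right-hand side into a 1-tuple.
url = invoice_url[len(prefix):],
print(type(url).__name__)  # tuple
print(url)                 # ('/invoices/42',)

# Without the comma the result is the string that was intended.
fixed = invoice_url[len(prefix):]
print(fixed)               # /invoices/42
```

Nothing fails at the assignment itself, which is why the error only shows up wherever the value is finally used as a string.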
In my time on the Unofficial Django Discord I see this mistake quite often. And very often it takes quite a while for anyone to notice what the problem is, even when several very senior developers are looking at a full traceback and the code.
I think naked trailing commas for 1-tuples should be deprecated and then changed to causing a syntax error directing the user to use the explicit (x,) syntax.
I think linters are the right solution here. And perhaps a corresponding PEP 8 recommendation? Something like “The form (x,) is preferred for 1-element tuples”. This would ensure that linters are soon updated accordingly.
Beginners are learning, and there are many tools to help them learn. IDEs often recommended to beginners, such as PyCharm and VS Code, can have built-in linting and help.
Python as a language can’t anticipate every mistake a beginner might make, and trying to code for them at the expense of everyone else in the language doesn’t seem like a good trade-off. Making a syntactically backwards-incompatible change for this one case, a beginner not using a tool to help them out, doesn’t seem worth it.
But perhaps you had thought of some other solution?
In the sense that it doesn’t make a dent in the problem
Python as a language can’t anticipate every mistake a beginner might make, and trying to code for them at the expense of everyone else in the language doesn’t seem like a good trade-off.
That’s not the case here, though. I’m not suggesting a change that will make things easier for beginners and worse for experienced developers. I’m suggesting a change that makes it better for everyone, or at worst neutral for experienced devs and those who run black on save.
And I am still arguing for a, potentially very long, deprecation period. Could be a decade! But the future is long, so I think that’s fine.
Backwards-incompatible changes to the language are worse for large code bases that use Python, and they have caused a lot of pain in Python in the past, as they can significantly delay projects from upgrading to new versions of Python. The issue is that they can affect code you don’t control, as the problem can arise in your dependencies or further downstream in transitive dependencies. Any benefit needs to be weighed against this.
One useful case for this is concisely unpacking an iterable into an ordered collection if you’re in an interactive Python session or feel like code golfing:
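For instance (my own illustration of the idea, not necessarily the exact form meant above), a trailing comma combined with a star unpacks any iterable into a list or a tuple. The names `numbers`, `as_list`, and `as_tuple` are made up for the example.

```python
numbers = range(3)

# Starred assignment target with a trailing comma: collects into a list.
*as_list, = numbers
print(as_list)   # [0, 1, 2]

# Starred expression with a trailing comma: builds a tuple.
as_tuple = *numbers,
print(as_tuple)  # (0, 1, 2)
```

Both forms depend on the naked trailing comma, so they would presumably be affected by the proposed change.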
It is not just valid syntax, it is widely used syntax. I think that raising an error, or even just a warning, would lead to much higher costs for testing, debugging, and rewriting existing code, and for adding wrappers around third-party code that is not updated as quickly. This would lead to chaos for several years.
Widely used? Do you have a source for that? Since pylint warns about it and black reformats it, it seems to me it should not be common at all.
And if it is in fact common, it would be interesting to know what percentage of those uses are latent bugs that someone either hasn’t been bitten by yet, or has worked around by adding a [0] somewhere else because they don’t know why they got a 1-tuple.
Hah. Googled a bit and found there’s a third option: a 1-tuple created by mistake and directly thrown on the ground, causing no issue. A majority of these for example: PYL-R1707 · Trailing comma tuple detected
I imagine it varies wildly by codebase. Some never use it, others don’t hesitate to.
The standard library has its share: I found 61 in 3.11. Some are innocuous errors, but most are clearly intentional. None are obvious errors, though that can be hard to tell.
My script to find them:
import tokenize, token, sys, glob

total_count = 0
for arg in sys.argv[1:]:
    for fn in glob.glob(arg):
        count = 0
        prev_tok = None
        try:
            with open(fn, encoding='utf-8') as f:
                for tok in tokenize.generate_tokens(f.readline):
                    # NEWLINE ends a logical line; a comma token right before it
                    # means the statement ends in a naked trailing comma.
                    if tok.type == token.NEWLINE:
                        if prev_tok is not None and prev_tok.string == ',':
                            print("%s:%s:%s: %s" % (fn, tok.start[0], tok.start[1], tok.line.rstrip()))
                            count += 1
                    prev_tok = tok
        except UnicodeDecodeError:
            print("couldn't read", fn)
        else:
            # Only count files that were tokenized without a decode error.
            total_count += count
print(total_count, "total")
Note that this only looks for statement-ending commas, and will not find trailing commas in assignment targets.
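To illustrate that limitation, here is a small sketch of the same check applied to an in-memory string instead of files. The helper name `find_statement_ending_commas` and the sample source are made up for this example.

```python
import io
import token
import tokenize


def find_statement_ending_commas(source):
    """Return line numbers of logical lines whose last token is a comma."""
    lines = []
    prev_tok = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == token.NEWLINE and prev_tok is not None and prev_tok.string == ',':
            lines.append(tok.start[0])
        prev_tok = tok
    return lines


sample = (
    "a = 1,\n"      # statement-ending comma: reported
    "b, = [2]\n"    # comma in the assignment target: not reported
    "c = (3,)\n"    # explicit 1-tuple: not reported
)
print(find_statement_ending_commas(sample))  # [1]
```

Only the first line is reported; the target form on the second line ends in `]`, so this kind of check never sees it.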
I’ve used it quite a few times and wouldn’t like to lose it.
Some thoughts:
Parentheses for the 1-tuple would reduce consistency with the other lines:
foos = 1, 2
bars = 3,
quxs = 4, 5, 6
I couldn’t quickly comment out items like this anymore:
tests = test1, #test2, test3
If it applies to targets as well, I couldn’t do
for value, in query_results:
anymore and it would reduce consistency with loops like for x, y in points:. If it doesn’t apply to targets, then we lose consistency between targets and tuples.
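For readers who haven't seen the target form before, a short sketch of what it does. The `query_results` rows here are hypothetical single-column rows, not data from any real query.

```python
# Hypothetical rows, e.g. what a single-column database query might return.
query_results = [("alice",), ("bob",)]

names = []
for value, in query_results:   # unpacks each 1-tuple into `value`
    names.append(value)
print(names)                   # ['alice', 'bob']

# The comma also acts as a length check: a row with two items fails loudly.
try:
    for value, in [("alice", 30)]:
        pass
except ValueError:
    unpack_failed = True
else:
    unpack_failed = False
print(unpack_failed)           # True
```

Without the comma, `value` would be bound to the whole tuple and the length check would be lost.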
I think all of those examples make reading your code unnecessarily more difficult. The for loop example probably needs a comment to remind the reader that there’s a comma there.
I still sometimes forget to remove a trailing comma, not notice it, and try to figure out why something’s not working. So I don’t think blocking this syntax is “at my expense”. On the contrary, it protects code writers from these mistakes.
Blocking the syntax also protects code readers from having to make sense of code that doesn’t have clear comments.
I think everyone would benefit from such a change in the long run. As others have mentioned, there would be a little bit of pain in the short run. But since many of the large codebases are linted, it’s not clear that there would be that much pain.