PEP 640: Unused variable syntax

I posted this to python-dev yesterday. I meant to post it here as well, but forgot.

One of the problems I have with the Pattern Matching proposal (PEP 622 originally, now PEPs 634, 635, 636) is the special-casing of ‘_’ to not actually assign to the name, which is a subtle but meaningful divergence from the rest of Python. In discussions with the authors I proposed using ‘?’ instead and extending that to existing unpacking syntax. The PEP authors were understandably a little hesitant to tack that onto their already quite extensive proposal, so I suggested making it a separate proposal. That proposal is at https://www.python.org/dev/peps/pep-0640, and also included below.

This proposal doesn’t necessarily require pattern matching to be accepted – the new syntax stands well enough on its own – but I’m recommending this not be accepted if pattern matching using the same syntax is not also accepted. The benefit without pattern matching is real but small, and in my opinion it’s not worth the added complexity.

The PEP (slightly edited for Markdown syntax):

PEP: 640
Title: Unused variable syntax
Author: Thomas Wouters thomas@python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 04-Oct-2020
Python-Version: 3.10
Post-History: 19-Oct-2020

Abstract

This PEP proposes new syntax for unused variables, providing a pseudo-name
that can be assigned to but not otherwise used. The assignment doesn’t
actually happen, and the value is discarded instead.

Motivation

In Python it is somewhat common to need to do an assignment without actually
needing the result. Conventionally, people use either "_" or a name such
as "unused" (or with "unused" as a prefix) for this. It’s most
common in unpacking assignments:

   x, unused, z = range(3)
   x, *unused, z = range(10)

It’s also used in for loops and comprehensions::

   for unused in range(10): ...
   [ SpamObject() for unused in range(10) ]

The use of "_" in these cases is probably the most common, but it
potentially conflicts with the use of "_" in internationalization, where
a call like gettext.gettext() is bound to "_" and used to mark strings
for translation.

In the proposal to add Pattern Matching to Python (originally PEP 622, now
split into PEP 634, PEP 635 and PEP 636), "_" has an additional
special meaning. It is a wildcard pattern, used in places where variables
could be assigned to, to indicate anything should be matched but not
assigned to anything. The choice of "_" there matches the use of "_"
in other languages, but the semantic difference with "_" elsewhere in
Python is significant.

This PEP proposes to allow a special token, "?", to be used instead of
any valid name in assignment. This has most of the benefits of "_"
without affecting other uses of that otherwise regular variable. Allowing
the use of the same wildcard pattern would make pattern matching and
unpacking assignment more consistent with each other.

Rationale

Marking certain variables as unused is a useful tool, as it helps clarity of
purpose of the code. It makes it obvious to readers of the code as well as
automated linters, that a particular variable is intentionally unused.

However, despite the convention, "_" is not a special variable. The
value is still assigned to, the object it refers to is still kept alive
until the end of the scope, and it can still be used. Nor is the use of
"_" for unused variables entirely ubiquitous, since it conflicts with
conventional internationalization, it isn’t obvious that it is a regular
variable, and it isn’t as obviously unused like a variable named
"unused".

In the Pattern Matching proposal, the use of "_" for wildcard patterns
side-steps the problems of "_" for unused variables by virtue of it
being in a separate scope. The only conflict it has with
internationalization is one of potential confusion, it will not actually
interact with uses of a global variable called "_". However, the
special-casing of "_" for this wildcard pattern purpose is still
problematic: the different semantics and meaning of "_" inside pattern
matching and outside of it means a break in consistency in Python.

Introducing "?" as special syntax for unused variables both inside and
outside pattern matching
allows us to retain that consistency. It avoids
the conflict with internationalization or any other uses of _ as a
variable
. It makes unpacking assignment align more closely with pattern
matching, making it easier to explain pattern matching as an extension of
unpacking assignment.

In terms of code readability, using a special token makes it easier to find
out what it means ("what does question mark in Python do" versus "why is my _ variable not getting assigned to"), and makes it more obvious that
the actual intent is for the value to be unused – since it is entirely
impossible to use it.

Specification

A new token is introduced, "?", or token.QMARK.

The grammar is modified to allow "?" in assignment contexts
(star_atom and t_atom in the current grammar), creating a Name
AST node with identifier set to NULL.

The AST is modified to allow the Name expression’s identifier to be
optional (it is currently required). The identifier being empty would only
be allowed in a STORE context.

In CPython, the bytecode compiler is modified to emit POP_TOP instead of
STORE_NAME for Name nodes with no identifier. Other uses of the
Name node are updated to handle the identifier being empty, as
appropriate.

The uses of the modified grammar nodes encompass at least the following
forms of assignment:

   ? = ...
   x, ?, z = ...
   x, *?, z = ...
   for ? in range(3): ...  # including comprehension forms
   for x, ?, z in matrix: ...  # including comprehension forms
   with open(f) as ?: ...
   with func() as (x, ?, z): ...

The use of a single "?", not in an unpacking context, is allowed in
normal assignment and the with statement. It doesn’t really make sense
on its own, and it is possible to disallow those specific cases. However,
for ? in range(3) clearly has its uses, so for consistency reasons if
nothing else it seems more sensible to allow the use of the single "?"
in other cases.

Using "?" in augmented assignment (? *= 2) is not allowed, since
"?" can only be used for assignment. Having multiple occurrences of
"?" is valid, just like when assigning to names, and the assignments do
not interfere with each other.

Backwards Compatibility

Introducing a new token means there are no backward compatibility concerns.
No valid syntax changes meaning.

"?" is not considered an identifier, so str.isidentifier() does not
change.

The AST does change in an incompatible way, as the identifier of a Name
token can now be empty. Code using the AST will have to be adjusted
accordingly.

How to Teach This

"?" can be introduced along with unpacking assignment, explaining it is
special syntax for ‘unused’ and mentioning that it can also be used in other
places. Alternatively, it could be introduced as part of an explanation on
assignment in for loops, showing an example where the loop variable is
unused.

PEP 636 discusses how to teach "_", and can simply replace "_" with
"?", perhaps noting that "?" is similarly usable in other contexts.

Reference Implementation

A prototype implementation exists at
https://github.com/Yhg1s/cpython/tree/nonassign.

Rejected Ideas

Open Issues

Should "?" be allowed in the following contexts:

   # imports done for side-effect only.
   import os as ?
   from os import path as ?

   # Function defined for side-effects only (e.g. decorators)
   @register_my_func
   def ?(...): ...

   # Class defined for side-effects only (e.g. decorators, __init_subclass__)
   class ?(...): ...

   # Parameters defined for unused positional-only arguments:
   def f(a, ?, ?): ...
   lambda a, ?, ?: ...

   # Unused variables with type annotations:
   ?: int = f()

   # Exception handling:
   try: ...
   except Exception as ?: ...

   # With blocks:
   with open(f) as ?: ...

Some of these may seem to make sense from a consistency point of view, but
practical uses are limited and dubious. Type annotations on "?" and
using it with except and with do not seem to make any sense. In the
reference implementation, except is not supported (the existing syntax
only allows a name) but with is (by virtue of the existing syntax
supporting unpacking assignment).

Should this PEP be accepted even if pattern matching is rejected?

Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.


Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:

3 Likes

Echoing this response from the the mailing list, I like to use _ as a prefix for named throwaway variables as a form of documentation.

Example:

scheme, _netloc, _path, params, query, fragment = urlparse(url)
instead of this–
scheme, _, _, params, query, fragment = urlparse(url)

That said, I don’t think that I would use this new syntax very much if adopted.

1 Like

This would make the proposal analogous to Rust’s convention—the compiler by default warns if any variable (including argument) is unused, unless you prefix it with an underscore. That is not as common in Python right now AFAIK, but not a bad idea in general and can be a reasonable recommendation for the community to adopt.

1 Like

I agree with sinply using the _XX convention. In the next step, might the bytecode compiler be taught to throw away unused variables immediately if they’re not used in the block they’re defined for?

If I understand well that is a backwards incompatible change that will break all current python code. Is that right? Is it fine to be included in minor version change? 3.9 -> 3.10

Sorry if I am not aware of the versioning scheme and policies of python or if I misunderstood the proposal.

If this proposal is accepted, then it would be new syntax that does
not break existing code, because the syntax chosen is not currently
legal:

a, b, ?, c = (1, 2, 3, 4)
      ^
SyntaxError

So nothing will break. But being a new feature it would only be
permitted in a new minor version, like 3.10 or 3.11, and not a bug-fix
release like 3.9.2 or 3.9.3.

That means that
a, b, _, c = (1, 2, 3, 4)

will keep working as well. I understood that ? would be the only valid syntax. Thanks for clarifying.