Has there been any discussion on deprecating the use of _ in the RHS?

NeilGirdhar · October 15, 2024, 1:04pm

_ has a variety of uses:

often to discard unwanted return values:

_, x, y = f()

in match statements as a catch-all:

match c:
    case _:
        ...

as an ordinary variable for, e.g., internationalization:

_ = gettext.gettext
t = _("lkjsdfa alskdjf lsdjf")

Discouraging the third case would

prevent bugs when interstitial code assigns to _ (to discard some value) thereby overwriting an important value stored in _.
prevent confusion with case _, which has nothing to do with the _ variable.
make code easier to understand (using _ can be very confusing considering that it’s now so rarely used in the RHS)

If we were to deprecate the third case, then

assigning to _ would not hold a reference to the variable, which could allow it to be deleted sooner,
it would reduce confusion when using Jupyter (is _ referring to the previous value or was it assigned to?)
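
As a rough illustration of the first point, here is today's behavior: _ is an ordinary name, so a "discarded" object stays alive for as long as _ refers to it. Payload and produce are hypothetical names, and the second result relies on CPython's reference counting:

```python
import weakref


class Payload:
    """Hypothetical stand-in for a large object."""


def produce():
    """Hypothetical function returning an unwanted object and a wanted value."""
    return Payload(), 1


def discard_keeps_object_alive():
    _, count = produce()
    probe = weakref.ref(_)  # Observe the object without keeping it alive.
    alive_while_bound = probe() is not None
    del _  # Drop the only strong reference.
    alive_after_del = probe() is not None
    return alive_while_bound, alive_after_del
```

On CPython the function should return (True, False): the object survives until _ is unbound.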

Can we very slowly deprecate the third case? Maybe start by just:

encouraging linters like Ruff to nudge users away from reading from _,
allowing mypy and other type checkers not to report errors when _ is assigned to with different types, and
changing the Python documentation to no longer recommend _ as the function for internationalization.

ericvsmith · October 15, 2024, 1:16pm

That would break a ton of code. I can’t imagine we’d ever do that.

chepner · October 15, 2024, 1:33pm

Your objection seems to be about explicit assignment to _, whereas Jupyter and the Python REPL are special in implicitly assigning to _ after each expression statement.

NeilGirdhar · October 15, 2024, 3:04pm

Sorry if it wasn’t clear, but in the title RHS means “right hand side”.

I’m not proposing changes to those implicit assignments, or even to the use of _ in the RHS in the REPL.

Maybe not, but I’m not proposing deprecation today. I’m proposing three things (linter changes, type checker changes, and documentation changes) that don’t break any code. In ten years, we can see how much code still uses _ in odd ways.

barry · October 15, 2024, 3:27pm

Here, _ is just a function. It’s just a name binding in a namespace. There really is nothing special about it – it’s just a convention, originally inherited from GNU gettext and internationalization conventions for C. In fact, with a library like flufl.i18n you already don’t have to use it.

The _() function has two purposes; one is a callable that actually performs the catalog lookup and placeholder interpolation at runtime, and the other is to act as a marker for the off-line extraction of source strings. These days, standard gettext tools handle Python just fine^[1]. I haven’t looked in ages, but gettext probably supports customizable marker functions.

But that’s all kind of beside the point! _ is a perfectly valid, general-purpose identifier in Python. It might have special uses or other conventions in other cases, but it really is just an identifier.

[1] We used to have a pygettext tool to do that in the olden days.

NeilGirdhar · October 15, 2024, 3:39pm

Right, that’s the status quo today. My proposal is to suggest to linters that they nudge users away from that convention (e.g., tr is also a conventional name for the internationalization function), and to change the documentation to match.

oscarbenjamin · October 15, 2024, 4:05pm

The benefits that you suggest are predicated on eventually forcing all of the code that uses _() to stop using it. You can talk about doing it slowly or changing linters or docs, etc., but there is no benefit here until the eventual breaking change. So either that breaking change is part of the proposal, or it is not worth changing anything.

NeilGirdhar · October 15, 2024, 4:18pm

Right, exactly. It’s a small cost today for no immediate benefit, and then one day there could be a larger cost for some benefit. What this proposal does is reduce that larger cost. After years of nudging, there may not be much code left that would be broken by such a change.

chepner · October 15, 2024, 4:57pm

I realize that, but your reasons for deprecating case 3 all apply to _ = gettext.gettext more than t = _(...). Did you mean to say “LHS”?

NeilGirdhar · October 15, 2024, 5:10pm

No. There’s nothing wrong with writing

_ = gettext.gettext

or any other RHS if you mean to discard it. My suggested warning would be for using _ in the RHS (as per the title of this post). For example:

x = _("lksjdf")  # Use of _ in RHS!

barry · October 15, 2024, 5:21pm

What about

print(_('This $item is $color'))

?

NeilGirdhar · October 15, 2024, 5:35pm

As I said above, I think it would be nice to nudge people towards using names other than _ for cases like this. Seems like a code smell to use _ here when you could just use an actual word that:

is easy to search for and easy to read,
won’t get clobbered by an unsuspecting person who writes to _ to discard values, and
won’t be confused with case _, which has nothing to do with the _ variable.

What’s wrong with:

print(f('This $item is $color'))

if you must have brevity? Or decode would be even better.

ayhanfuat · October 15, 2024, 5:49pm

That’s a well established convention used in tons of places. Isn’t it a bit too much to suggest deprecating it just because it seems like a code smell to you?

NeilGirdhar · October 15, 2024, 5:55pm

I don’t think that’s a fair comment at all. I gave many reasons why I think deprecating would be beneficial well beyond personal preference.

I agree that it’s a historical convention for some very niche uses. I gave reasons why I think it’s bad code, and many reasons why deprecation would be beneficial in the long run.

ayhanfuat · October 15, 2024, 6:24pm

I gave many reasons why I think deprecating would be beneficial well beyond personal preference

Fair enough. But I still think these reasons are not even close to the bar required for deprecation.

assigning to _ would not hold a reference to the variable, which could allow it to be deleted sooner

Might just be me, but I’ve never come across a situation where a variable takes up so much space that it should be garbage collected right away, yet at the same time can be deemed a throwaway variable that is appropriate to discard implicitly.

it would reduce confusion when using Jupyter (is _ referring to the previous value or was it assigned to?)

If the problem is Jupyter’s use of _ that’s still a problem. If there is a confusion, it is still a confusion. Is _ a throwaway variable or is it the output of the previous cell? How does this suggestion solve it? And where do we stop? Do we deprecate the names In and Out too? IPython uses them.

it would make code easier to understand (using _ can be very confusing considering that it’s now so rarely used in the RHS)

I think this is no different from saying it is a code smell.

it would prevent bugs when interstitial code assigns to _ (to discard some value) thereby overwriting an important value stored in _.

That argument can be made for every possible variable name. If I import a name total from another module and assign to it by mistake there might be bugs, too.

it would prevent confusion with case _, which has nothing to do with the _ variable.

Very close to the second. If there is confusion it won’t remove it. _ is still a valid variable name. If I am confused about its usage deprecating _ in i18n is not going to help me because it can still be used. Maybe I am testing the case against the value that was supposed to be a throwaway value.

NeilGirdhar · October 15, 2024, 6:38pm

I didn’t propose immediate deprecation. This proposal is more of a 10-year-prelude to possible deprecation with some minor benefits.

If you look, that benefit is listed under deprecation. If _ cannot be used in the RHS (it’s essentially never a variable), then Jupyter would only ever show the previous cell’s value. There’s no _ variable to display.

It is a very well-established convention that _ is the discard variable. It is so well established that many linters and type checkers go out of their way to treat it as such. Thus, it’s particularly easy to assign to _ without checking if anyone is using it as a variable.

Even worse, if someone is using _ and you need a discard, then when you pick another assignment target, your linter may require you to silence its unused variable warning.

unused, x, unused = f()  # Linter warns that unused is unused.
g(x)

The idea would make it so that _ would not be used as a variable, which conceptually makes room for _ to mean fewer things.

I’m not sure what you mean, but it’s possible this is exactly the kind of confusion I’m addressing with that point. case _ does not “test the case against the value that was supposed to be a throwaway value”. It is the catch-all. E.g., what do you think this does?

_ = 42  # Here _ is a variable.
match 10:
  case _:  # What is it here?
    print("A")

ayhanfuat · October 15, 2024, 7:04pm

For that to work, Jupyter also needs to change its current behavior. Even then it wouldn’t solve the confusion because Jupyter is not the one that is confused, it is supposed to be the user. And if the user is currently confused about what _ is in Jupyter, there is nothing that would prevent them from being confused about it in the future. _ can still mean the throwaway value I assigned to a few cells ago or the output of the previous cell. If I don’t know how Jupyter is handling it I still have no additional information.

It is a very well-established convention that _ is the discard variable. It is so well established that many linters and type checkers go out of their way to treat it as such. Thus, it’s particularly easy to assign to _ without checking if anyone is using it as a variable.

I am having trouble imagining a situation where I am in a codebase that uses _ for i18n, in a module that needs _ and imports it, and in the very scope where my shadowing _ can affect its other uses without my realizing it. How could that be possible? For example, am I in a function where I shadow _ with a throwaway variable but also use _ for translation afterwards, in the same function?

_ = 42  # Here _ is a variable.
match 10:
  case _:  # What is it here?
    print("A")

Exactly my point. How does deprecating the use of _ in the RHS help with this? If I am confused about it, it is because I don’t know the rules. If you add another rule it doesn’t eliminate the confusion. That means one more rule to learn.

Rosuav · October 15, 2024, 7:07pm

case spam does not test against the contents of the variable spam either. What do you think this does?

spam = 42
match 10:
    case spam:
        print("A")

The match statement is not looking up variables, it is assigning to them. There’s a minor special case in that print(spam) would print 10, but print(_) would print 42; but that’s minor.

NeilGirdhar · October 15, 2024, 7:14pm

Like I said, this is listed as a benefit to deprecation. Under deprecation, Jupyter would change its behavior, so the user would not be confused since _ would always mean the same thing.

After deprecation, it can’t mean that.

When many people are working on a project, it’s very common for one change to ignore the global situation. Or, for example, someone does a search and replace of verify_invariants() to _ = verify_invariants(). Etc. It’s not hard to clobber _.

Because after deprecation, _ is never a variable and there is no way for it to mean 42. I’m not proposing “adding another rule”. I’m proposing removing one of the many meanings of _. _ would simply never be a variable.

Yes, I know how it works, but it seems that the person I was responding to might not have.

ajoino · October 15, 2024, 7:29pm

I agree with Ayhan, I think the arguments are weak and I have never come across this as a problem.