Add None coalescing operator in Python

That’s fair. I’m not asking people to stop talking. All I’m saying is to be clear about what you’re expressing. Are you saying that you don’t like this syntax? Then say “I don’t like this syntax”. That’s a perfectly valid viewpoint - you don’t have to reword it as “this is more implicit” or “this is less readable” just to make it valid. Why can’t people express opinions as opinions instead of trying to appeal to facts that don’t exist?

Often, design considerations DO come down to whether people like something or not (although more commonly it’s the opinions of core devs rather than random people discussing on mailing lists, but still). So go ahead and say that you don’t like it.

BTW, it’s also perfectly reasonable to say “I love the idea of a None-coalescing operator, but I hate the x ?? y syntax”. Opinions are allowed to be complex :slight_smile:

Simply saying “I don’t like it” is not helpful to the audience because it doesn’t explain why I don’t like it.

On the other hand it’s helpful to elaborate that I find it less readable than “is None” so that the original proposal author gets some meaningful feedback.

To add another angle to the discussion:

I’m not sure whether the discussions have already mentioned this (PEP 505 doesn’t list it), but SQL has a function called COALESCE(), which is commonly used for handling NULL values; see e.g. the PostgreSQL documentation (version 16, section 9.18, Conditional Expressions).

Making this a builtin in Python would solve many of the situations listed in PEP 505 in an explicit and elegant way.

The function would also go beyond just checking one value for None. It returns the first non-None argument, so you don’t have to chain operators and you can use the builtin in a functional way with iterators.
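A minimal sketch of that basic behaviour. It assumes a hypothetical builtin named coalesce (nothing like it exists in Python today) that returns None when every argument is None, mirroring SQL’s COALESCE returning NULL. The iterator use falls out of unpacking with *.

```python
def coalesce(*args):
    """Hypothetical builtin: return the first argument that is not None, else None."""
    for value in args:
        if value is not None:
            return value
    return None


# Falsy values such as 0 and "" count as present; only None is skipped.
print(coalesce(None, 0, 5))   # 0
print(coalesce(None, None))   # None

# Functional use with an iterator: unpack it into the call.
readings = iter([None, None, 42, 7])
print(coalesce(*readings))    # 42
```

For an existing iterable, next((v for v in values if v is not None), None) already gives the same result without a new builtin.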

Furthermore, we could optionally extend this to also skip N/A values (math.isnan()), empty strings, empty lists/tuples, etc., to address other areas where “this value is not available/usable” pops up. Here’s a sketch:

coalesce(*args, check=None, logic=False)

Return the first value from args that is not the check object (which defaults to None). If check is a callable, return the first value from args for which check(value) is False. Setting logic to True swaps the test logic.
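One way the proposed signature could behave, as a sketch only. Several details are assumptions not settled by the proposal: a callable check is always called (never compared by identity), its result is tested for truthiness rather than strictly “is False”, and None is returned when every value is rejected.

```python
import math


def coalesce(*args, check=None, logic=False):
    """Hypothetical sketch of the proposed coalesce(), not a real builtin.

    A value is rejected when it *is* the check object or, if check is
    callable, when check(value) is truthy; logic=True inverts that test.
    Returns None if every value is rejected (an assumption).
    """
    for value in args:
        if callable(check):
            rejected = bool(check(value))
        else:
            rejected = value is check
        if logic:
            rejected = not rejected
        if not rejected:
            return value
    return None


print(coalesce(None, 0, 1))                                      # 0
print(coalesce(math.nan, 2.5, check=math.isnan))                 # 2.5
print(coalesce(math.inf, 3.0, check=math.isfinite, logic=True))  # 3.0
print(coalesce("", "text", check=bool, logic=True))              # text
```

With check=bool and logic=True the function behaves like a chain of or, which shows how the logic flag lets one check function serve both directions.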

We could then add a few handy methods to use as the check function, e.g.

  • tuple.isempty() - check for empty tuples
  • list.isempty() - check for empty lists
  • str.isempty() - check for empty strings
  • etc.
    or a more generic isempty() function, which checks the length and the type of an object.
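A sketch of what the generic variant might look like. isempty is a hypothetical name (no such builtin or method exists), and treating “checks the type” as “only objects that support len() can be empty” is an assumption.

```python
from collections.abc import Sized


def isempty(obj):
    """Hypothetical helper: True only for sized objects of length 0.

    Non-sized objects such as None, 0 or 1.5 are never "empty", so they
    are not confused with an empty string or container.
    """
    return isinstance(obj, Sized) and len(obj) == 0


print(isempty(""), isempty(()), isempty([]), isempty({}))  # True True True True
print(isempty(None), isempty(0), isempty("x"))             # False False False

# First non-empty value, written with today's next() idiom.
values = ["", [], "first real value", "second"]
print(next((v for v in values if not isempty(v)), None))   # first real value
```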

Other functions which come in handy as check functions:

  • math.isnan()
  • math.isfinite(), with logic set to True
  • math.isinf()
  • len()
  • bool()
  • operator.attrgetter()
  • operator.itemgetter()
  • cmath.isnan()

Yeah, the same function exists in R/Tidyverse as well.

The limitation with COALESCE in Python (without additional language support) is that it does not lazily evaluate the arguments, which I imagine will be a necessary requirement.
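A small demonstration of that limitation, using a plain-function coalesce (hypothetical) and a hypothetical expensive_fallback that records each call: the fallback runs even when it is not needed, whereas today’s conditional expression skips it.

```python
calls = []


def expensive_fallback():
    # Record every call so we can see whether the fallback was evaluated.
    calls.append("expensive_fallback")
    return "fallback"


def coalesce(*args):
    # By the time this body runs, Python has already evaluated every argument.
    for value in args:
        if value is not None:
            return value
    return None


x = "cached"
result = coalesce(x, expensive_fallback())
print(result, calls)  # cached ['expensive_fallback']  (fallback ran anyway)

calls.clear()
result = x if x is not None else expensive_fallback()
print(result, calls)  # cached []  (the conditional expression short-circuits)
```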

If a special case is made in the Python language to lazily evaluate the arguments of COALESCE, it would be an interesting approach. It also opens the door as for whether to support lazy evaluation / short circuit in more contexts. (I feel a “macro” system is knocking at the door!)

What does “less readable” mean for you? Without understanding your personal sense of readability, it is difficult to know how to interpret your claims about readability.

On its own, “less readable” carries about as much information as “I don’t like it”. What makes it less readable?

  • Is it too verbose?

  • Too terse?

  • Too ambiguous? E.g. Python uses the * symbol for nearly a dozen different things, if we include the stdlib as well as the syntax itself.

  • Full of weird symbols that have to be memorised by rote?

  • Are the symbols visually hard to distinguish, e.g. in many fonts $ and S look very similar?

  • How does it compare to other syntax in Python?

Regarding the last point, is x + y less readable than add(x, y)? How about x**y compared to pow(x, y)?

If you answered “Yes”, then maybe you just don’t like symbolic operators, and prefer words, so of course you will dislike the ?? symbolic operator.

But if you find the + and ** operators more readable than the named function calls, and yet find the ?? operator less readable, that possibly means you are confusing familiarity with readability. You find + and ** readable because you are used to them, while ?? is unfamiliar.

Never underestimate the difference familiarity makes to readability. The first time I tried to read Python code, I found it an unreadable mess. It was full of weird symbols like [:] and {x: y} and I had no idea what was going on. Now I find Python so readable that every time I try to read code in another language, I cry :slight_smile:

My personal feelings are:

  • I think that dealing with None is a minor pain point. It would be nice to have a better (easier, more terse) way to deal with it.

  • I like the look of the ?? operator. It feels right to me, it’s not too weird, and it’s easy to remember.

  • I expect that as other languages introduce the same operator, it will get more familiar and more people will come to expect it.

  • I’m neutral towards the ?. and ?[] symbols. They don’t look as nice, but I can’t think of a better alternative.

  • I don’t think it is worth implementing just the ?? and not the other two.

So I guess that overall I’m positive towards the PEP.


Short-circuiting is natural for operators (Python already short-circuits “or” and “and”) and useful for lazy evaluation in:
x ?? calculate_expensive_fallback().

Old, broken x or default constructs can easily be fixed to x ?? default, without the risk of introducing new bugs in a rewrite to coalesce(x, default), which requires editing in three places instead of one.
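The bug in x or default is that or falls back whenever the left side is falsy, not only when it is None, so legitimate values such as 0 or "" get replaced. An illustration with a hypothetical timeout setting:

```python
DEFAULT_TIMEOUT = 30


def timeout_with_or(timeout=None):
    # Buggy: 0 is falsy, so an explicit timeout of 0 is silently replaced.
    return timeout or DEFAULT_TIMEOUT


def timeout_with_is_none(timeout=None):
    # Correct today; PEP 505 would spell this: timeout ?? DEFAULT_TIMEOUT
    return timeout if timeout is not None else DEFAULT_TIMEOUT


print(timeout_with_or(0))       # 30  (wrong)
print(timeout_with_is_none(0))  # 0
print(timeout_with_is_none())   # 30
```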

Chaining is much clearer and less error prone with operators:
(override ?? fallback).name ?? default
coalesce(coalesce(override, fallback).name, default)
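For comparison, a sketch of how the same chain can be spelled in current Python, wrapped in a hypothetical resolve_name function with a temporary so nothing is evaluated twice:

```python
from types import SimpleNamespace


def resolve_name(override, fallback, default):
    # Proposed (PEP 505):  (override ?? fallback).name ?? default
    source = override if override is not None else fallback
    name = source.name
    return name if name is not None else default


print(resolve_name(None, SimpleNamespace(name=None), "anonymous"))  # anonymous
print(resolve_name(None, SimpleNamespace(name="bob"), "anonymous"))  # bob
```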

The word coalesce is difficult to remember and spell.

A coalesce function cannot replace ?. etc., I think?

A keyword-based coalesce operator x coalesce default could be plausible, but seems implausible for ?. etc.

Overall ?? wins IMO. (And is already more familiar from other languages.)

This cannot be done with a function, because function arguments have to be evaluated before the function is called. In other words, they are eagerly evaluated.

Like the ternary if and the and/or operators, this has to be lazy, evaluating the right operand only if the left operand is None.
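The only way to get that laziness from a function today is to have callers pass zero-argument callables (thunks). A sketch, with coalesce_lazy and expensive_fallback as hypothetical names:

```python
def coalesce_lazy(*thunks):
    """Hypothetical lazy variant: each argument is a zero-argument callable.

    Thunks are called in order and evaluation stops at the first result
    that is not None, so later thunks never run.
    """
    for thunk in thunks:
        value = thunk()
        if value is not None:
            return value
    return None


calls = []


def expensive_fallback():
    calls.append("expensive_fallback")
    return "fallback"


x = "cached"
print(coalesce_lazy(lambda: x, expensive_fallback), calls)  # cached []
x = None
print(coalesce_lazy(lambda: x, expensive_fallback), calls)  # fallback ['expensive_fallback']
```

The cost is visible: every caller has to remember to wrap operands in lambda, which is exactly the boilerplate an operator avoids.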

PEP 505 suggests three new operators, ??, “maybe dot” and “maybe subscript”. If you want to propose alternative spelling that doesn’t use a question mark, you need to propose an alternative for all three, not just ??.
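For readers unfamiliar with the other two operators, a rough, simplified picture of what PEP 505 proposes, written as hypothetical helper functions. The PEP also short-circuits the rest of a longer chain such as a?.b.c, and as functions these helpers evaluate attr and key eagerly; neither detail is modelled here.

```python
def maybe_dot(obj, attr):
    # Roughly  obj?.attr  : None if obj is None, else getattr(obj, attr)
    return None if obj is None else getattr(obj, attr)


def maybe_subscript(obj, key):
    # Roughly  obj?[key]  : None if obj is None, else obj[key]
    return None if obj is None else obj[key]


config = {"db": {"host": "localhost"}}
print(maybe_subscript(config.get("db"), "host"))     # localhost
print(maybe_subscript(config.get("cache"), "host"))  # None
print(maybe_dot(None, "upper"))                      # None
```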

Python uses words for and, or, and x if cond else y where a lot of C-like languages use symbols &&, ||, and cond ? x : y. So the ?? operator is a bit of an odd fit here. On the other hand, Python uses punctuation for . and [], so it makes full sense to go with ?. and ?[]. If someone wants to propose an alternate spelling for ??, go for it; personally, I can’t think of anything better, so the slight oddity won’t be all that bad. Also, there’s no concept of x and= y but we do have x *= y, so being able to write x ??= y is a win for the punctuation spelling.
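A common place where x ??= y would apply is the mutable-default-argument idiom. The function below is a hypothetical example of today’s spelling:

```python
def add_tag(tag, tags=None):
    # Today's idiom; with PEP 505 the two lines below could be written:
    #     tags ??= []
    if tags is None:
        tags = []
    tags.append(tag)
    return tags


print(add_tag("a"))         # ['a']
print(add_tag("b"))         # ['b']  (a fresh list on each call)
print(add_tag("c", ["x"]))  # ['x', 'c']
```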

That said, though, PEP 505’s semantics are NOT the same as the semantics in other languages, so this is still going to be something to learn, regardless of the spelling used.

By lack of readability I mean “??” is too terse. I prefer the more verbose (current) version.

The comparison to (x + y) is not helping because addition is a well-known operator that everyone learned in primary school.

While ?? and ?. are familiar in some other languages (I personally got to know them from C# in a “recent” edition, maybe C# 6?), I find the operation too specific to IT professionals, and not friendly to the casual reader. I believe Python’s advantage (and beauty) lies in the fact that one doesn’t have to be an IT professional to read and write Python (compared to e.g. C or JavaScript). I hope Python can keep this advantage as it evolves.


I think ?. and ?[] could be a parameter of sorts:

override = coalesce(override, fallback, atom=True).name.

Method chaining is a fair point. R solves this problem with a pipe operator %>% which does not exist in Python.

coalesce(override, fallback, atom=True).name %>% coalesce(., default)

Okay, finally something that can be reasonably debated!

So what you’re saying is that you prefer x if x is not None else y over x ?? y. That’s perfectly reasonable, but I disagree, partly because THAT much verbosity is extremely annoying, and partly because it forces you to write x twice - not a big deal if it’s a simple variable lookup, but it does make it harder to use when the left side is a function call or something.
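The “write x twice” problem is concrete when the left side is a call. A sketch with a hypothetical lookup_user that counts its calls; PEP 505 would write the expression as lookup_user() ?? "guest".

```python
calls = 0


def lookup_user():
    # Stand-in for an expensive or side-effecting call.
    global calls
    calls += 1
    return "alice"


# Verbose form with a call on the left: lookup_user() runs twice.
name = lookup_user() if lookup_user() is not None else "guest"
print(name, calls)  # alice 2

# Avoiding the repeat today needs an assignment expression (Python 3.8+).
calls = 0
name = user if (user := lookup_user()) is not None else "guest"
print(name, calls)  # alice 1
```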

Maybe, but I have seldom seen a programmer have trouble with extending that to x ** y, which I don’t recall learning in primary school (exponentiation was done with superscripts, but never a double asterisk), and even multiplication and division aren’t spelled the way I learned them in my youth (x * y vs x × y or simply xy). The “modulo” or “remainder” operator in programming, which varies in meaning from language to language, doesn’t really even exist in mathematics - but it’s not a problem to have x % y with an operator.
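As an illustration of the remainder operator varying between languages: Python’s % follows floor division, while C (since C99) truncates toward zero, which is what math.fmod reproduces.

```python
import math

# Python's % follows floor division, so the result takes the divisor's sign.
print(-7 % 3)            # 2
print(divmod(-7, 3))     # (-3, 2)

# C-style truncation toward zero gives the dividend's sign instead.
print(math.fmod(-7, 3))  # -1.0
```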

So I suspect the “familiarity” argument is far less about grade-school mathematics (which really only covers addition and subtraction), and more about what we’re accustomed to from other programming languages.

I’m not sure there’s as much difference as you might think. Simple features are pretty easy to use (“Python as a calculator” is a great tool - just fire up the REPL and type expressions to be evaluated, no programming knowledge needed), but to be able to read and write arbitrary Python code, you still need to be at least broadly familiar with a good number of concepts. The barrier to entry is notably higher in C (though I wouldn’t say it’s all that much higher in JS), but the upper reaches of the language are going to still need some programming skill. For instance, I wouldn’t expect a non-programmer to understand this:

await asyncio.gather(*[cancel_task(t) for t in tasks])

A None-coalescing operator wouldn’t be something that you need for Python-as-a-calculator, and it’s far FAR less to get your head around than all the concepts of async/await (and asynchronicity in general).

I would reword the strength you’re describing. Rather than being “one doesn’t have to be an IT professional to read and write Python”, I would say, instead, that “a non-programmer can become a Python programmer in less time” (than, say, a C programmer). If you take someone who isn’t a programmer (say, a research scientist) and invite him/her to learn some Python in order to be able to better analyze the raw data, how much time would that take? How many days of research get sacrificed to the initial learning process, in order to get this benefit?

Obviously it’s impossible to put a simple figure on this, as it depends on the person’s background and the level of code complexity needed, but I would say that Python still has a quite considerable advantage here - partly because of the immense expressiveness of the language. We are not restricted to just what we can intuitively understand from grade school; operators like matrix multiplication are utterly meaningless to someone who’s just finished fourth-grade arithmetic, but are incredibly useful to a scientist who expresses concepts in matrices because it’s the most natural form for them.

We have a symbol for matrix multiplication because it is useful, not because it is pre-known by every single potential programmer. A None-coalescing operator isn’t taught in grade school, but that doesn’t mean it’s not useful.


Well, researchers in bioinformatics learn regular expressions for their research. Most people learn the basics of regex quickly and learn advanced regex concepts like lookahead assertions and backtracking shortly after. And yet, I would still consider regular expressions without f-strings and re.VERBOSE to be borderline unreadable write-only code. So ease of acquisition and readability are not necessarily coextensive.


Agreed; a regex without re.VERBOSE is an exercise in compactness, but not particularly readable. That said, though, the expressiveness of a simple regex is quite good - it’s only really when they get overly complicated that they become hard to read, and that’s what re.VERBOSE is great at handling.
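A small example of the difference re.VERBOSE makes, using an arbitrary date pattern: both compile to the same matcher, but the verbose form ignores whitespace and allows # comments.

```python
import re

# Compact form: correct, but dense.
compact = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

# The same pattern with re.VERBOSE: whitespace is ignored and # starts a comment.
verbose = re.compile(
    r"""
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
    """,
    re.VERBOSE,
)

text = "released on 2023-10-02"
print(compact.search(text).groups())  # ('2023', '10', '02')
print(verbose.search(text).groups())  # ('2023', '10', '02')
```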

Is it more terse than **, //, <= and the other dozen or so operators we use in Python? Do you find them “unreadable” too?

If it is no more terse than the other operators you accept, then it isn’t the terseness that you object to. It must be something else.

How about operators like &, |, %, ^? I didn’t learn about bitwise operators in primary school, or even secondary school. They are twice as terse as the ?? operator. Do you dislike them twice as much?

Programming languages are not designed for the casual reader. The casual reader might, just barely, grok functions from maths class in school, but they won’t grok async, type declarations, globals and locals, closures, zip, map, regexes, Unicode, classes, imports, context managers, exceptions, etc.

The beauty of Python is that it is accessible to casual programmers. You don’t have to use null-coalescing operators any more than you have to write classes, or use closures, or use threads.

But we didn’t let those casual programmers stand in the way of Python getting classes, closures, threads, async, regexes etc. Let the casual programmers continue to write using the basic features, and the power users use the power features.


I feel like mathematical notation by convention is taught from very early on to be extremely terse, and bitwise operations are effectively mathematical binary operations with a modified syntax.

Are you saying that, because of that, mathematical operators are readable while terse, but other operators are unreadable while terse? If so, please explain to me the readability of the Willans formula for the nth prime number, which uses standard mathematical notation - plug in a number n and you get back the nth prime number, guaranteed! Try converting that into Python code and tell me whether it’s more readable in the terse mathematical form, or in a wordier form. And then, based on that, explain your above statement and how it affects a None-coalescing operator that has nothing to do with mathematics.
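Taking up that challenge, here is one wordier Python rendering of Willans’ formula. One substitution is made deliberately: the term floor(cos²(π((j−1)!+1)/j)) equals 1 exactly when j divides (j−1)!+1 (Wilson’s theorem, so j = 1 or j prime) and 0 otherwise, so the sketch uses that exact integer test, because floating-point cos cannot cope with the huge factorials. Likewise floor((n/k)^(1/n)) is 1 when k ≤ n and 0 otherwise. The outer sum runs to 2^n, so this is only practical for small n.

```python
from math import factorial


def willans_prime(n):
    """nth prime via Willans' formula, with its terms rewritten exactly.

    p_n = 1 + sum_{i=1}^{2^n} floor((n / sum_{j=1}^{i} floor(cos^2(pi*((j-1)!+1)/j)))^(1/n))
    """
    result = 1
    inner_sum = 0  # running value of the inner sum over j = 1..i
    for i in range(1, 2 ** n + 1):
        # floor(cos^2(...)) is 1 exactly when i divides (i-1)! + 1, i.e. i == 1 or i is prime.
        if (factorial(i - 1) + 1) % i == 0:
            inner_sum += 1
        # floor((n / inner_sum) ** (1/n)) is 1 while inner_sum <= n, else 0.
        if inner_sum <= n:
            result += 1
    return result


print([willans_prime(n) for n in range(1, 7)])  # [2, 3, 5, 7, 11, 13]
```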

I don’t think there’s anything more going on here than the common phenomenon of familiarity. What you are already accustomed to is ALWAYS going to sit better in your brain than something you are not accustomed to. There’s nothing wrong with that phenomenon; just, please acknowledge it for what it is. Unless there really is something magical about mathematics, that’s not the factor here.

No, I feel like mathematical notation is unreadable by a very established convention, one we are introduced to from early school arithmetic onward. This is why most people are often afraid of college-level mathematics, and why popular culture has so much respect for the few physicists and mathematicians with natural mathematical talent, like Albert Einstein or Kurt Gödel. Capital-sigma notation, for example, is pretty much nothing but a terse for loop that would be a two- or three-liner in Python if spelled out in a readable way.
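To make the capital-sigma comparison concrete, take the sum of i² for i = 1..n (an arbitrary example): the spelled-out loop and the one-line generator form compute the same value.

```python
def sum_of_squares_loop(n):
    # Sigma_{i=1}^{n} i^2 spelled out as a loop.
    total = 0
    for i in range(1, n + 1):
        total += i * i
    return total


def sum_of_squares_builtin(n):
    # The same sum, about as terse as the notation.
    return sum(i * i for i in range(1, n + 1))


print(sum_of_squares_loop(10), sum_of_squares_builtin(10))  # 385 385
```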

A person who does not believe that mathematical notation is in any way sui generis is probably better off stating his case for Perl and Haskell than for Python.

So how far does that convention go?

  • Addition and subtraction x+y
  • Multiplication x*y
  • Division x/y
  • Modulo/remainder operator x%y
  • Exponentiation x**y
  • Bitwise operators x&y
  • Logical operators x||y (as spelled in C-like languages)
  • None-coalescing operator x??y
  • Subscripting x[y]
  • Comparisons x == y
  • Assignment x = y

Please tell me which of these are part of the “very established convention”, which ones are reasonable extensions to that convention, and which ones are not. All of them use terse punctuation in Python. I would say that addition is the only one that really counts as part of the established convention, with all the others being some degree of extension from that.

But mathematics is all about those kinds of extensions. We can count; but reversing the effect of counting might get us to meaningless numbers, so we extend numbers to include zero and negatives. We can multiply; but reversing that can lead to non-integers, so we extend numbers to include rationals. We can square numbers, but finding the square root might not work in all cases, so we extend numbers to include complex numbers. The Riemann zeta function’s defining series diverges unless x>1, so we use analytic continuation to figure out what its value should be for other x.

I’m not sure about you, but I never learned about “the modulo operator” in grade school. Does that mean it’s bad, and we should instead have a mod(x, y) function? No! It’s a very useful operator. I certainly didn’t learn about assignment or comparison operators in algebra, yet we absolutely would not want to get rid of those. They are extensions to that original “very established convention”, but that convention is already a massive pile of extensions upon extensions just to get us the concepts that we grew up with.

I am not saying that the notation that Python has is well established in mathematics, I am saying that terse notation in general is a well-established convention in mathematics. People are socialized from early on to express arithmetic operations concisely.