Reduction comprehensions

Guido has talked before about his dislike of the functional-programming map, filter and reduce, preferring comprehensions. So [f(x) for x in L] instead of map, and [x for x in L if <condition>] instead of filter.

He has also expressed a general dislike for reduce as confusing, especially given the need for a lambda, and relegated it to functools in Python 3, suggesting it is better to use a loop with an accumulator.

I think there is still a case for reduction, but agree that reduce is ugly and confusing as it stands. I’d like to suggest that just as map and filter are more naturally expressed pythonically using comprehensions, so could reduction, by introducing a reduction comprehension:

So, taking a trivial example, instead of

acc = 0
my_list = [2, 3, 4, 5]
for n in my_list:
    acc += n

or the confusing and ugly

acc = functools.reduce(lambda acc, n: acc+n, my_list)

we could have something that looks a lot more pythonic:

acc = (acc+n for acc, n reducing my_list)

with the choice of () over other brackets because we are returning a single item as if from a single call to a generator expression. Obviously the keyword reducing could be something else. The word from would be nice, but is already taken.

I think this especially improves things when there is no natural “null” initial state for an accumulator (i.e. 0 for addition or 1 for multiplication), in which case you need to start with the first item. Take for example wanting to join a list of dataframes together. We might do one of the below:

dfs = [df1, df2, df3, df4]

# slicing to pick the 0th and remaining items
acc = dfs[0]
for df in dfs[1:]:
    acc = acc.merge(df)

# create an iterator and use next
dfs_iter = iter(dfs)
acc = next(dfs_iter)
for df in dfs_iter:
    acc = acc.merge(df)

# pop the first item if willing to mutate dfs
acc = dfs.pop(0)
for df in dfs:
    acc = acc.merge(df)

# use reduce
acc = functools.reduce(lambda acc, df: acc.merge(df), dfs)

Alternatively, we could have an elegant reduction comprehension

acc = (acc.merge(df) for acc, df reducing dfs)

I don’t think this is a much more difficult concept for a learner to get their head around than the other comprehensions.

4 Likes

It doesn’t look like that syntax solves the objection:

… almost every time I see a reduce() call with a non-trivial function argument, I need to grab pen and paper to diagram what’s actually being fed into that function before I understand what the reduce() is supposed to do. So in my mind, the applicability of reduce() is pretty much limited to associative operators, and in all other cases it’s better to write out the accumulation loop explicitly.

map and filter look good with comprehensions because they are “building a set”. So, a syntax similar to the set builder notation, that many are familiar with, is easy to guess what it means, even without knowledge of Python.

But reduce produces a value. I find reduce(...) easier to understand than something that begins looking like set builder notation, until at some point, possibly far inside the line, there is a keyword reducing telling what it really was.


Speculating …

Something that makes reduce(...) “need to grab pen and paper” is that there are many choices of what the reduction could have been defined to be doing: from the left, from the right, or otherwise, and which element is the initial one.

If that is the reason, it doesn’t look like a nice compact syntax will solve any of that. That can only be solved (I think) by being explicit, like in a FOR loop.

6 Likes

He’s referring there to non-trivial functions. I’d suggest that, idiomatically at least, comprehensions are designed for simple computations on their arguments. We wouldn’t use a list comprehension for something complicated either - we’d use a loop. Obviously you can use a non-trivial pre-defined function or method, but essentially it’s the body of a one-line for loop; it can’t be that complex to understand.

I take your point on having to wait for the keyword until later in the line; however, I’d point out that we already have to wait for an if in a comprehension to see that it’s a filter, not a map, and therefore returns potentially fewer items than the input.

I see at least three fundamental problems here:

  • Reduction is rarely useful outside of computing a sum or product.
  • Reduction is generally hard to explain when it’s any more complicated than that.
  • Comprehension (generator) syntax is expected to produce a sequence (resp. iterable), not a scalar value.
4 Likes

I don’t think reduction is hard to explain in most cases, and I find it useful for more than just the quick math people tend to associate it with:

from functools import reduce
from operator import getitem

def nested_getitem(d: dict, dot_delimited_key: str):
    return reduce(getitem, dot_delimited_key.split("."), d)
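
For instance, with some made-up nested data:

>>> nested_getitem({"a": {"b": {"c": 42}}}, "a.b.c")
42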

Now, there are plenty of reasons someone might write that differently, but that’s pretty clear to me. Everyone’s definition of “clear” is going to vary on things like this, especially for those not used to working with functional folds (reduce is a lazily accumulated left-associative fold).

3 Likes

I think that one should read “non-trivial” only as the negation of trivial, rather than as meaning “complicated”. In other words, as soon as the function is non-associative, one needs “inside knowledge” of the choices that could be specific to Python in defining reduce(f, L). That is: in which order is f going to consume L, which initial value is used if one is not given, and where is the output of f fed back into f?
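
A small sketch makes those choices visible, using a deliberately non-associative function so the grouping shows up in the output:

>>> from functools import reduce
>>> reduce(lambda acc, x: f"({acc}+{x})", "abcd")
'(((a+b)+c)+d)'

That is, reduce consumes the iterable left to right, seeds the accumulator with the first item when no initial value is given, and feeds each result back in as the first argument.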

Genuine question, is that always the case? If one forgets or doesn’t know about Python, but has knowledge of functional programming, potentially in many other languages, is that the interpretation of reduce that people will assume?

I think reduce’s documentation[1] makes its behavior clear to anyone who would understand “lazily accumulated left-associative fold”, but there are other folds. Whether people would naturally realize this or not is not something I can weigh in on. I think that in the cases Python programmers would actually write using reduce, it’s likely to be obvious that the order either doesn’t matter (such as with a sum of numeric values) or is clear from context.


  1. The very first line of documentation for reduce reads: Apply function of two arguments cumulatively to the items of iterable , from left to right, so as to reduce the iterable to a single value. ↩︎

I understand exactly what you mean, but I would strongly urge you to reconsider the target audience for Python. Very few people come to Python from a Haskell family language. (Or even a Lisp family language.)

1 Like

I’m very open to the idea in principle, but I’m against the suggested new syntax (I’m sorry but I find it difficult to understand, and even messy).

All that is required to implement reduction comprehensions from generator expressions and assignment expressions is more_itertools.last or tail(.., n=1).

A basic implementation of that is already in the docs:

import collections
def last(iterable):
    # Taken from tail https://docs.python.org/3/library/itertools.html#itertools-recipes
    return next(iter(collections.deque(iterable, maxlen=1)))

Then for example this is possible:

>>> x = 0
>>> last((x := x * 10 + y) for y in range(5))
1234
2 Likes

Not Pythonic in my opinion, but even if it is, you could probably make yourself something that works this way. What’s the point though? reduce is often unreadable, not because of the function call, but because of the confusing information flow.

For this specific example, we already have a really good alternative: sum. It takes a perfectly ordinary list comprehension or genexp, and it gives you back the result of reducing it via addition. No, it doesn’t; it gives you back the sum of those values. You don’t think in terms of “reduce”, unless you have a massively warped brain [1]. If what you’re doing is more complicated, it can still be wrapped up the same way:

def product(items):
    ret = 1
    for item in items: ret *= item
    return ret

print(product(x + x + 1 for x in range(20)))

Don’t want to name the function? That’s fine too!

product = 1
for n in (x + x + 1 for x in range(20)):
    product *= n
print(product)

It’s more imperative that way, but still readable. (Or, of course, you could stick the expression into the assignment and skip the genexp, but that feels even more imperative, which may or may not be a good thing.)

Using map or filter with a lambda function is a code smell, but they still serve a very useful purpose when the function already exists, e.g. map(str, stuff) to stringify everything. And if you have that situation with reduce (which is FAR less common than with map/filter), you still have it available in functools, and it’ll do that job just fine. But when you aren’t using a pre-existing function, what does it gain you?


  1. yeah, sorry Lisp programmers, I have huge respect for you but your brains are weird!! :slight_smile: ↩︎

3 Likes

I totally agree with the main point that reduce is generally less readable because of the confusing information flow.

However, sum is slightly different from a reduction. I use reduce(add, some_iterable) instead of sum with Jax (which compiles code) because the sum produces 0 + x_1 + x_2 + ... whereas the reduction produces x_1 + x_2 + ..., which is slightly more efficient. Last I checked, the compiler doesn’t fold out the zero constant. I agree this is very niche.
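
In plain Python terms (a minimal sketch with made-up values), the difference is that sum(xs) behaves like a fold with an explicit 0 start, while reduce(add, xs) seeds the fold with the first element. The values agree here; it’s only under a tracing compiler that the extra leading constant shows up:

>>> from functools import reduce
>>> from operator import add
>>> xs = [1.5, 2.5, 3.0]
>>> sum(xs)            # 0 + 1.5 + 2.5 + 3.0
7.0
>>> reduce(add, xs)    # 1.5 + 2.5 + 3.0
7.0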

If reductions were more common, this might be a more motivating idea. Maybe if you were adding map-reduce to Python. But I think even that’s not that common.

BTW, I think this was an unfortunate decision. The argument that “the applicability of reduce() is pretty much limited to associative operators” doesn’t make much sense to me. After all, Guido mentions “few examples involving + or *”, but in many cases these operators aren’t actually associative (floating-point addition or multiplication, eh?).

It’s not a big deal to do “import functools”, but… from time to time people tell me “I don’t think most people know about functools.reduce”.

1 Like

Which was the goal: hide away a function that he thinks should not be used.

This is somewhat similar to the discussions around multiline lambdas. It might be possible to come up with a syntax or move reduce back into the builtins, but there have to be really good examples that are fundamentally and obviously clearer when written in this form instead of as an explicit loop. Just because something can be written in one line doesn’t mean it should be. Almost always when I go for reduce I want to use mul or maybe or_ [1] from the operator module.
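
For example, a set union via or_ (a minimal sketch):

>>> from functools import reduce
>>> from operator import or_
>>> reduce(or_, [{1, 2}, {2, 3}, {4}])
{1, 2, 3, 4}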

And often, reduce can be replaced by better APIs. For example, the example in the OP could ideally be written as

acc = dfs[0].merge(*dfs[1:])

or

acc = DataFrame.merge(*dfs)

which would IMO be the cleanest solution of all.

Even in the absence of that convenience [2], it can already be written as

acc = reduce(DataFrame.merge, dfs)

which IMO is the cleanest currently possible.


  1. for set/dict union ↩︎

  2. And yes, I know that this would be a breaking api change ↩︎

4 Likes

Outside of pathological cases, those ARE associative. And if you’re worried about pathological cases, you really should be using math.fsum instead, which is able to give an accurate result regardless of the order of operands.
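
For instance, with the classic repeated-0.1 example:

>>> import math
>>> sum([0.1] * 10)
0.9999999999999999
>>> math.fsum([0.1] * 10)
1.0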

2 Likes

This is clean for the simplest case, but quickly becomes less clean if you wish to pass an extra argument to the function, e.g. on='key'. You’d need to use something like

reduce(partial(DataFrame.merge, on='key'), dfs)
# or
reduce(lambda acc, df: acc.merge(df, on='key'), dfs)

with a reduction comprehension that would be expressed as

(acc.merge(df, on='key') for acc, df reducing dfs)

An alternative syntax would be to adopt a leading character before a comprehension, in a similar way to how we prefix strings with an f or r, and have acc (or something similar) become a keyword representing the accumulator, so that we get

r[acc.merge(df, on='key') for df in dfs]

This could also be extended to the equivalent of itertools.accumulate with an a prefix so that

>>> a[acc+n for n in range(10)]
[0, 1, 3, 6, 10, 15, 21, 28, 36, 45]
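
For comparison, that is the result itertools.accumulate already gives today, just without the comprehension syntax:

>>> from itertools import accumulate
>>> list(accumulate(range(10)))
[0, 1, 3, 6, 10, 15, 21, 28, 36, 45]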
1 Like

But why?

A list comprehension looks like a list display, because it’s doing basically the same thing: constructing a new list. You can create it from a series of values, or you can create it from an existing list, and either way you start with the square brackets. Why should reduce/accumulate be an adorned list comprehension? How does this make sense?

People coming from a mathematical background will obviously see comprehensions as based on set builder notation. But I’d argue the average Python user sees them more as syntactic sugar for writing a simple for loop as an expression. This expands the universe of simple for loops that can be simplified in this way.

I don’t think this is a great line of reasoning for adding syntax. You’re approaching this by justifying why such syntax could be teachable, but there’s not actually a strong motivator for it. It’s also a sub-par line of reasoning to call this a parallel to existing comprehensions. Just because both involve iteration, and simplified explanations of them may treat them as just a loop with a little extra, doesn’t mean the differences are negligible or that it’s actually a good parallel.

I’m very glad to have reduce in python, but I don’t see a use case for reduction comprehension syntax. Ironically, despite disagreeing more generally, I think the only cases where something like this would be written are the cases where I would agree with the original quote which was the justification to move reduce out of the builtins.

It’s fine to write a loop for things like this. It’s also fine to pass a comprehension to functools.reduce. I don’t think this simplifies anything; it just shoves it into a single, more terse statement that no longer has a direct parallel to either common functional or common imperative designs.

This expands the universe of simple for loops that can be simplified in this way.

That’s not necessarily a good thing. Comprehensions of all types can easily become unreadable.

I’m not a fan of any of the three suggestions, but I favour for … reducing.

I wondered if we could mimic one of the better ideas of JavaScript and give Sequence types, and even iterators, a .reduce method. This reminded me that iterators, and it seems all objects, already have a .__reduce__ dunder method. So adding reduce to core Python terminology (instead of just letting it continue living happily in functools) could easily create confusion.
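
A quick check (the existing .__reduce__ is part of the pickle protocol, nothing to do with folding):

>>> hasattr(object(), "__reduce__")
True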

1 Like