Introduce funnel operator, i.e. '|>', to allow for generator pipelines

Nodd · October 22, 2024, 4:52pm

Leaving aside the exact semantics, I don’t understand why a new operator is needed when >> exists. It looks like the perfect symbol to indicate the flow of the data. The bar would be lower to introduce the functionality.

elis.byberi · October 22, 2024, 5:44pm

Printing the generator would only print the generator object itself:

def gen():
    for i in range(4):
        yield i


print(gen())  # <generator object gen at 0x7bd6b414e200>

I got confused by the |> pipeline operator and the term ‘generator’ used in your arguments. The UNIX pipe operator allows data to flow in chunks, while the pipe operator used in some programming languages simply chains functions together.

It seems that you want the UNIX pipe, but none of your examples process data in chunks.

So it is just chaining? In this case, it doesn’t matter what the input or output is. You can use a generator or any data type; the next function simply needs to know how to handle the argument.

def first_gen():
    for i in range(1, 5):
        yield i

def second_gen(gen):
    for value in gen:
        yield value * 2

def third_gen(gen):
    for value in gen:
        yield value + 1

# Function chaining using generators
result = third_gen(second_gen(first_gen()))

for val in result:
    print(val)

sirosen · October 22, 2024, 6:48pm

Yes, my suggestion for a new keyword bound to a specific context avoids the problem that any valid identifier… is a valid identifier.
So it answers the question of what this sample code does:

PIPE = "hi"
x = range(100) |> filter(lambda x: x % 2 == 0, PIPE)

by saying that PIPE may be a valid name in both contexts, but it only has its special meaning in one of them.

The alternative path is to define some new symbol whose name is not a valid identifier.

Just so it’s clear, I’m not devoting too much time and thought to this because I think it’s not likely to work well. If in a future Python version, a piping syntax appears, I’ll dutifully and happily eat these words though!

blhsing · October 23, 2024, 1:21am

Because if we are to implement the exact proposal of the OP’s, where the right operand is not a normal expression of an immediate call but rather a specification of a callable and its second argument and onwards, we need a new dedicated grammar for that to possibly happen. With an existing operator the right operand will be evaluated immediately as a normal call.
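A minimal sketch of that last point, assuming a hypothetical Piped wrapper that overloads >> and a stand-in windowed function (neither is a real library API): the right operand is called before the operator ever sees it, so it only receives its explicit arguments.

```python
calls = []

def windowed(*args):
    # Hypothetical stand-in: record how it was called, then return a callable.
    calls.append(args)
    return lambda iterable: list(iterable)

class Piped:
    """Hypothetical wrapper that overloads >> purely for illustration."""
    def __init__(self, value):
        self.value = value

    def __rshift__(self, func):
        # By the time we get here, `func` is whatever windowed(2) returned.
        return Piped(func(self.value))

result = Piped("abc") >> windowed(2)
print(calls)         # [(2,)]  windowed never saw "abc" as an argument
print(result.value)  # ['a', 'b', 'c']
```

With an ordinary operator, the only way to defer the call is to have the right operand return a callable, which is what the stand-in does here.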

That said, I think the call specification syntax proposed by the OP is too easily confused with a normal call, e.g. it is not immediately clear from brackets |> windowed(2) that there is an implicit first argument to windowed and it looks as if windowed is called with an argument of 2 because that’s what windowed(2) normally represents.

zhangyx · October 23, 2024, 1:34am

Overloading the bitwise or operator seems more elegant to me (as long as it does not break any existing code).

mikeshardmind · October 23, 2024, 2:22am

bitwise or would conflict with callable types being usable this way since | is used for unions of types.

blhsing · October 23, 2024, 2:41am

If you’re referring to the syntax I suggested above, it actually won’t conflict with the | operation for a type: the __or__ method of the left operand, which is a pipe object, would be called successfully, so the __ror__ method of the right operand would never be attempted. It therefore doesn’t matter if the right operand is a type.
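A small sketch of that dispatch order, assuming a minimal pipe wrapper whose __or__ applies the right operand to the wrapped object (the class shape is an assumption for illustration):

```python
class pipe:
    """Minimal wrapper, assumed for illustration only."""
    def __init__(self, obj):
        self.obj = obj

    def __or__(self, func):
        return pipe(func(self.obj))

# Between two types, e.g. int | str, | builds a union (Python 3.10+).
# Here the left operand is a pipe instance, so pipe.__or__ runs first
# and the union machinery of `int` is never consulted.
result = pipe("42") | int
print(type(result).__name__, result.obj)  # pipe 42
```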

blhsing · October 23, 2024, 4:01am

Here’s a simple implementation of the syntax I suggested above.

Since in practically all use cases a piped object would be passed as either the first or the second argument, I decided to simply use the >> operator to denote passing the object as the second argument, which avoids specifying the position with a sentinel object:

class pipe:
    def __init__(self, obj):
        self.obj = obj

    def __or__(self, func):
        return pipe(func(self.obj))

class using:
    def __init__(self, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs

    def __rlshift__(self, func):
        return lambda obj: func(obj, *self.args, **self.kwargs)

    def __rrshift__(self, func):
        first, *rest = self.args
        return lambda obj: func(first, obj, *rest, **self.kwargs)

Sample usage:

from itertools import batched

pipe('abcde') | batched << using(2) | map >> using(''.join) | list | print
# outputs ['ab', 'cd', 'e']

sayandipdutta · October 23, 2024, 7:41am

Since now functools.Placeholder is available, something like the following could be done:

from itertools import batched
from operator import add
Placeholder = object()  # meant to emulate functools.Placeholder

class Pipeable:
    def __init__(self, func):
        self.func = func
    def __ror__(self, value):
        return self.func(value)

class Pipe(Pipeable):
    def __call__(self, *args, **kwds):
        def _lambda(x):
            filled_args = [x if arg is Placeholder else arg for arg in args]
            filled_kwds = {k: (x if val is Placeholder else val) for k, val in kwds.items()}
            return self.func(*filled_args, **filled_kwds)

        return Pipeable(_lambda)

_ = Placeholder
range(5) | Pipe(map)(add, _, _) | Pipe(batched)(_, n=3) | Pipe(list)(_) | Pipe(print)
# NOTE: for single arg funcs (e.g. list, print), calling is optional
# prints:
# [(0, 2, 4), (6, 8)]

What would be nice is, if there were an operator, that enabled the following behavior:

Then that would enable the following:

range(5) |> map(add, _, _) |> batched(_, n=3) |> list(_) |> print(_)

Here _ represents functools.Placeholder. This would match the semantics associated with functools.Placeholder.

I would even go so far as to suggest there could be a placeholder symbol (say, ? or <>, or ~, or -), that would create implicit lambda/partial like functions, e.g. greater_than_5 = gt(~, 5), but that’s a separate discussion.

EDIT: On further thought, combination of |> and Placeholder isn’t flexible enough, because it cannot be reduced. With the Pipe construct, or with ~, one could use reduce like this: reduce(operator.pipe, [map(double, ~), sum(~)], initial=range(5))

dg-pb · October 23, 2024, 9:55pm

I would just like to point out that there is a fairly simple solution to this.

Although it would not be as convenient as a new operator, it is simple and works well.

Implement a performant pipe and use it in conjunction with partial. Now that Placeholder exists, it can cover most variations.

pipeline = pipe(partial(map, partial(mul, 2)), partial(batched, n=3), list, print)
pipeline(range(5))
# [(0, 2, 4), (6, 8)]

I have been thinking about a pipe function for a while now and I think it would be very useful.

I think it would be good to implement a minimal set of modular components for functional programming first before jumping to syntax changes and functional frameworks.

vovavili · October 24, 2024, 1:47am

dg-pb:

I would just like to point out that there is a fairly simple solution to this.

Although it would not be as convenient as a new operator, it is simple and works well.

Implement a performant pipe and use it in conjunction with partial. Now that Placeholder exists, it can cover most variations.
pipeline = pipe(partial(map, partial(mul, 2)), partial(batched, n=3), list, print)
pipeline(range(5))
# [(0, 2, 4), (6, 8)]
I have been thinking about a pipe function for a while now and I think it would be very useful.

I think it would be good to implement a minimal set of modular components for functional programming first before jumping to syntax changes and functional frameworks.

I think that would be the way to do it. While I love pipe operators in languages that support them, they were a head-scratcher for me when I was a total beginner. pipe or functools.pipe would be a lovely middle ground between Python’s English-like readability and the ease of code readability you can get with |> .

blhsing · October 24, 2024, 1:52am

The | operator as a pipe becomes second nature once you get to develop in a *NIX-like environment. Although Windows supports pipes in its command line too, it comes with far fewer built-in tools to make pipes useful, so it’s understandable why pipes may be less intuitive for beginners.

vovavili · October 24, 2024, 2:04am

I personally got introduced to Python before I got introduced to Linux/macOS shell scripts, and these days I can’t be the only one.

blhsing · October 24, 2024, 2:32am

dg-pb:

Implement a performant pipe and use it in conjunction with partial. Now that Placeholder exists, it can cover most variations.
pipeline = pipe(partial(map, partial(mul, 2)), partial(batched, n=3), list, print)
pipeline(range(5))
# [(0, 2, 4), (6, 8)]

This can currently be done with reduce and a helper function like apply as below:

from operator import mul
from itertools import batched
from functools import reduce, partial

apply = lambda obj, func: func(obj)
pipe = partial(partial, reduce, apply)
pipeline = pipe((partial(map, partial(mul, 2)), partial(batched, n=3), list, print))
pipeline(range(5))
# [(0, 2, 4), (6, 8)]

mikeshardmind · October 24, 2024, 2:49am

You may want to find better motivating examples and work out full comparisons, because this is easily going to devolve into “why would you add that” given the comparisons possible with existing methods.

from itertools import batched

print([*batched((2*i for i in range(5)),n=3)])

dg-pb · October 24, 2024, 3:09am

A C implementation of pipe would still be good because:
a) This solution will be fairly slow. E.g. using it as a predicate for filter would be a significant bottleneck (not for everything, but certainly when using operator functions on simple objects).
b) It would be convenient to have an object, so that one can access the component functions, e.g. pipe.funcs. (I have some use cases in mind for this.)
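To illustrate point b) only, here is a pure-Python sketch of what such an object might look like. It says nothing about speed, and the class layout and the funcs attribute name are assumptions, not an existing API:

```python
from functools import partial
from operator import mul

class pipe:
    """Pure-Python sketch of a composed callable; not a C implementation."""
    def __init__(self, *funcs):
        self.funcs = funcs  # component functions stay inspectable

    def __call__(self, obj):
        # Feed the output of each function into the next one.
        for func in self.funcs:
            obj = func(obj)
        return obj

double_all = pipe(partial(map, partial(mul, 2)), list)
print(double_all(range(5)))   # [0, 2, 4, 6, 8]
print(len(double_all.funcs))  # 2
```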

blhsing · October 24, 2024, 3:12am

I think the “why” of this discussion has been pretty clear–that it is rather often that we see a pattern of applying a large number of functions to an object to arrive at a final result, but by nesting the calls it makes the order of function appearance opposite from the order of execution. With a pipeline pattern the order of function appearance aligns with the order of execution so the code is easier to read. That’s the main point. The toy examples are for easier demonstration of the various syntaxes that people have come up with, but for a good real-world example we can find it in @kalekundert’s post.

dg-pb · October 24, 2024, 3:52am

So one thing about straightforward piping that I don’t particularly like is that it doesn’t cover function composition, but only a one-off piped call.

And to achieve function composition with this one would need to use lambda:

pipeline = lambda arg: arg |> func |> func2

Which is not very elegant. And given all the work that this would require, it is a somewhat disappointing outcome.

Alternatively, if pipeline composition is made the target, then it can be used in conjunction with infix operators.

Building on top of @sayandipdutta’s idea, pipe and infix operators:

from operator import add, sub

class pipe:
    """C-implemented pipe"""

class A:
    """User-defined infix-apply"""

class C:
    """User-defined infix-composition"""

class CC:
    """User-defined infix-composition that accepts sequence of positionals"""

def final_func(a, b, *, c):
    return (a - b) / c

# `partial` at parser level
# ~N indicates N'th positional argument will be placed there
add1 = func(add, ~0, 1)
print(type(add1))        # partial

# Usage
pipeline = add(~0, 1) |C| sub(~0, 1)
print(type(pipeline))    # pipe
pipeline(1)    # 1

pipeline = range(~0, ~0 + 3) |C| list |CC| final_func(~2, ~0, c=~1)
pipeline(2)    # 0.6666

2 |A| range(~0, ~0 + 3) |C| list |CC| final_func(~2, ~0, c=~1)    # 0.6666

Later on, could implement specialised operators, but I think it is a bit early for that.

dg-pb · October 24, 2024, 3:58am

Given that pipe exists, one can already use the above for simple cases (having to use partial explicitly).

P = functools.partial
_ = functools.Placeholder
pipeline = P(add, 1) |C| P(sub, _, 1)
pipeline(2)    # 2

2 |A| P(add, 1) |C| P(sub, _, 1)    # 2
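For anyone who wants to try this today, here is a rough sketch of how A and C could be emulated in current Python. The helper classes _Infix and _Bound are hypothetical names, and a lambda stands in for P(sub, _, 1) so the sketch does not depend on functools.Placeholder:

```python
from functools import partial
from operator import add, sub

class _Infix:
    """Hypothetical helper: turns a two-argument function into a |NAME| infix."""
    def __init__(self, func):
        self.func = func

    def __ror__(self, left):
        # `left | NAME` captures the left operand.
        return _Bound(self.func, left)

class _Bound:
    def __init__(self, func, left):
        self.func = func
        self.left = left

    def __or__(self, right):
        # `... | right` supplies the right operand and runs the function.
        return self.func(self.left, right)

C = _Infix(lambda f, g: (lambda x: g(f(x))))  # infix composition
A = _Infix(lambda value, f: f(value))         # infix apply

add1 = partial(add, 1)
sub1 = lambda x: sub(x, 1)  # stands in for P(sub, _, 1)

pipeline = add1 |C| sub1
print(pipeline(2))            # 2
print(2 |A| (add1 |C| sub1))  # 2
```

Note the parentheses in the last line: because | groups left to right, this emulation would otherwise apply add1 to 2 first and then try to compose the resulting number with sub1.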

blhsing · October 24, 2024, 4:21am

The | operator can just as easily be used on a Pipe object that does not make calls right away but rather composes the functions for a later call to __call__.
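A minimal sketch of what I mean, assuming a hypothetical Pipe class where | only collects functions and nothing runs until the object is called:

```python
class Pipe:
    """Hypothetical composing pipe: | collects functions, __call__ runs them."""
    def __init__(self, *funcs):
        self.funcs = funcs

    def __or__(self, func):
        # Return a new Pipe so existing pipelines stay reusable.
        return Pipe(*self.funcs, func)

    def __call__(self, obj):
        for func in self.funcs:
            obj = func(obj)
        return obj

pipeline = Pipe() | (lambda xs: (2 * x for x in xs)) | sorted | sum
print(pipeline(range(5)))  # 20
```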

dg-pb:

Given that pipe exists, one can already use the above for simple cases (having to use partial explicitly).
P = functools.partial
_ = functools.Placeholder
pipeline = P(add, 1) |C| P(sub, _, 1)
pipeline(2)    # 2

2 |A| P(add, 1) |C| P(sub, _, 1)    # 2

This makes the usage less verbose but more cryptic with too many one-letter aliases IMHO.