Leaving aside the exact semantics, I don’t understand why a new operator is needed when `>>` exists? It looks like the perfect symbol to indicate the flow of the data, and reusing it would lower the bar to introducing the functionality.
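As a rough illustration (a hypothetical `Flow` wrapper, not part of any proposal), `>>` can already express left-to-right data flow today through operator overloading, without any grammar change:

```python
# Hypothetical sketch: overloading >> on a wrapper object to get
# left-to-right data flow with existing syntax.
class Flow:
    def __init__(self, value):
        self.value = value

    def __rshift__(self, func):
        # Apply func to the wrapped value and re-wrap the result.
        return Flow(func(self.value))

result = Flow([3, 1, 2]) >> sorted >> (lambda xs: xs[0])
print(result.value)  # 1
```

Of course, a real `>>`-based proposal would work on bare expressions rather than requiring a wrapper; this only shows that the symbol itself reads naturally for data flow.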
Printing the generator would only print the generator object itself:
```python
def gen():
    for i in range(4):
        yield i

print(gen())  # <generator object gen at 0x7bd6b414e200>
```
I got confused by the `|>` pipeline operator and the term ‘generator’ used in your arguments. The UNIX pipe operator allows data to flow in chunks, while the pipe operator used in some programming languages simply chains functions together. It seems that you want the UNIX pipe, but none of your examples process data in chunks. So is it just chaining? In that case, it doesn’t matter what the input or output is. You can use a generator or any data type; the next function simply needs to know how to handle the argument.
```python
def first_gen():
    for i in range(1, 5):
        yield i

def second_gen(gen):
    for value in gen:
        yield value * 2

def third_gen(gen):
    for value in gen:
        yield value + 1

# Function chaining using generators
result = third_gen(second_gen(first_gen()))
for val in result:
    print(val)
```
Yes, my suggestion for a new keyword bound to a specific context avoids the problem that any valid identifier… is a valid identifier.
So it answers the question of what this sample code does:
```python
PIPE = "hi"
x = range(100) |> filter(lambda x: x % 2 == 0, PIPE)
```
by saying that `PIPE` may be a valid name in both contexts, but it only has its special meaning in one of them.
The alternative path is to define some new symbol whose name is not a valid identifier.
Just so it’s clear, I’m not devoting too much time and thought to this, because I think it’s not likely to work well. If a piping syntax appears in a future Python version, I’ll dutifully and happily eat these words, though!
Because if we are to implement the OP’s exact proposal, where the right operand is not a normal expression of an immediate call but rather a specification of a callable and its second argument onwards, we need new dedicated grammar for that to be possible. With an existing operator, the right operand would be evaluated immediately as a normal call.
That said, I think the call specification syntax proposed by the OP is too easily confused with a normal call. For example, it is not immediately clear from `brackets |> windowed(2)` that there is an implicit first argument to `windowed`; it looks as if `windowed` is called with a single argument of 2, because that’s what `windowed(2)` normally represents.
Overloading the bitwise or operator seems more elegant to me (as long as it does not break any existing code).

Bitwise or would conflict with callable types being usable this way, since `|` is used for unions of types.
If you’re referring to the syntax I suggested above, it actually won’t conflict with the `|` operation for a type: the `__or__` method of the left operand, which is a `pipe` object, would be successfully called, so the `__ror__` method of the right operand would never be attempted. It therefore doesn’t matter if the right operand is a type.
Here’s a simple implementation of the syntax I suggested above. Since in practically all use cases a piped object would be passed as either the first or the second argument, I decided to simply use the `>>` operator to denote passing the object as the second argument, to avoid specifying the position with a sentinel object:
```python
class pipe:
    def __init__(self, obj):
        self.obj = obj

    def __or__(self, func):
        return pipe(func(self.obj))

class using:
    def __init__(self, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs

    def __rlshift__(self, func):
        return lambda obj: func(obj, *self.args, **self.kwargs)

    def __rrshift__(self, func):
        first, *rest = self.args
        return lambda obj: func(first, obj, *rest, **self.kwargs)
```
Sample usage:
```python
from itertools import batched

pipe('abcde') | batched << using(2) | map >> using(''.join) | list | print
# outputs ['ab', 'cd', 'e']
```
Since `functools.Placeholder` is now available, something like the following could be done:
```python
from itertools import batched
from operator import add

Placeholder = object()  # meant to emulate functools.Placeholder

class Pipeable:
    def __init__(self, func):
        self.func = func

    def __ror__(self, value):
        return self.func(value)

class Pipe(Pipeable):
    def __call__(self, *args, **kwds):
        def _lambda(x):
            filled_args = [x if arg is Placeholder else arg for arg in args]
            filled_kwds = {k: (x if val is Placeholder else val) for k, val in kwds.items()}
            return self.func(*filled_args, **filled_kwds)
        return Pipeable(_lambda)

_ = Placeholder
range(5) | Pipe(map)(add, _, _) | Pipe(batched)(_, n=3) | Pipe(list)(_) | Pipe(print)
# NOTE: for single-arg funcs (e.g. list, print), calling is optional
# prints:
# [(0, 2, 4), (6, 8)]
```
What would be nice is if there were an operator that enabled the following:

```python
range(5) |> map(add, _, _) |> batched(_, n=3) |> list(_) |> print(_)
```

Here `_` represents `functools.Placeholder`, so this would match the semantics associated with `functools.Placeholder`.
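For readers on versions where `functools.Placeholder` isn’t available, those semantics can be sketched with an ordinary sentinel (a hypothetical helper, not the real `functools` API):

```python
# Sketch emulating Placeholder-style partials: sentinel marks positional
# slots that are filled left-to-right by call-time arguments; remaining
# call-time arguments are appended after the frozen ones.
SENTINEL = object()

def partial_ph(func, *args):
    def wrapper(*call_args):
        it = iter(call_args)
        filled = [next(it) if a is SENTINEL else a for a in args]
        return func(*filled, *it)
    return wrapper

sub_from_10 = partial_ph(lambda a, b: a - b, 10)            # freezes a=10
sub_10 = partial_ph(lambda a, b: a - b, SENTINEL, 10)       # freezes b=10
print(sub_from_10(3))  # 7
print(sub_10(3))       # -7
```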
I would even go so far as to suggest there could be a placeholder symbol (say, `?`, `<>`, `~`, or `-`) that would create implicit lambda/partial-like functions, e.g. `greater_than_5 = gt(~, 5)`, but that’s a separate discussion.
EDIT: On further thought, the combination of `|>` and `Placeholder` isn’t flexible enough, because it cannot be reduced. With the `Pipe` construct, or with `~`, one could use `reduce` like this: `reduce(operator.pipe, [map(double, ~), sum(~)], initial=range(5))`
I would just like to point out that there is a fairly simple solution to this. Although it would not be as convenient as a new operator, it is simple and works well: implement a performant `pipe` and use it in conjunction with `partial`. Now that `Placeholder` exists, this can satisfy most of the variations.

```python
pipeline = pipe(partial(map, partial(mul, 2)), partial(batched, n=3), list, print)
pipeline(range(5))
# [(0, 2, 4), (6, 8)]
```
I have been thinking about a `pipe` function for a while now, and I think it would be very useful.
I think it would be good to implement a minimal set of modular components for functional programming first before jumping to syntax changes and functional frameworks.
I think that would be the way to do it. While I love pipe operators in languages that support them, they were a head-scratcher for me when I was a total beginner. `pipe` or `functools.pipe` would be a lovely middle ground between Python’s English-like readability and the ease of reading code that `|>` gives you.
The `|` operator as a pipe becomes second nature once you get to develop in a *NIX-like environment. Although Windows supports pipes in its command line too, it comes with far fewer built-in tools to make pipes useful, so it’s understandable why pipes may be less intuitive for beginners. I personally got introduced to Python before I got introduced to Linux/macOS shell scripts, and these days I can’t be the only one.
This can currently be done with `reduce` and a helper function like `apply`, as below:

```python
from operator import mul
from itertools import batched
from functools import reduce, partial

apply = lambda obj, func: func(obj)
pipe = partial(partial, reduce, apply)

pipeline = pipe((partial(map, partial(mul, 2)), partial(batched, n=3), list, print))
pipeline(range(5))
# [(0, 2, 4), (6, 8)]
```
You may want to find better motivating examples and work out full comparisons, because this will easily devolve into “why would you add that” given the comparisons possible with the existing methods:

```python
from itertools import batched

print([*batched((2 * i for i in range(5)), n=3)])
```
A C `pipe` implementation would still be good because:

a) This solution will be fairly slow. E.g. using it as a predicate to `filter` would be a significant bottleneck (well, not for everything, but certainly when using `operator` functions on simple objects).

b) It would be convenient to have an object, so that one can access the component functions, e.g. `pipe.funcs`. (I have some use cases in mind for this.)
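Point b) could be sketched like this (a hypothetical pure-Python `pipe` object; the names are illustrative, not an actual API):

```python
# Sketch: a pipe object that keeps its component functions accessible
# via a .funcs attribute, for introspection or later recomposition.
class pipe:
    def __init__(self, *funcs):
        self.funcs = funcs

    def __call__(self, obj):
        for f in self.funcs:
            obj = f(obj)
        return obj

double_then_str = pipe(lambda x: x * 2, str)
print(double_then_str(21))    # '42'
print(double_then_str.funcs)  # the component functions, in order
```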
I think the “why” of this discussion has been pretty clear: it is rather often that we see a pattern of applying a large number of functions to an object to arrive at a final result, but nesting the calls makes the order of function appearance the opposite of the order of execution. With a pipeline pattern, the order of function appearance aligns with the order of execution, so the code is easier to read. That’s the main point. The toy examples are for easier demonstration of the various syntaxes that people have come up with, but for a good real-world example we can look at @kalekundert’s post.
So one thing about straightforward piping that I don’t particularly like is that it doesn’t cover function composition, only a one-off piped call. To achieve function composition with this, one would need to use a `lambda`:

```python
pipeline = lambda arg: arg |> func |> func2
```

which is not very elegant, and given all the work this would require, a somewhat disappointing outcome.
Alternatively, if `pipeline` composition is to be the target, then it can be used in conjunction with infix operators. Building on top of @sayandipdutta’s idea, `pipe` and `infix` operators:
```python
from operator import add, sub

class pipe:
    """C-implemented pipe"""

class A:
    """User-defined infix-apply"""

class C:
    """User-defined infix-composition"""

class CC:
    """User-defined infix-composition that accepts sequence of positionals"""

def final_func(a, b, *, c):
    return (a - b) / c

# `partial` at parser level
# ~N indicates N'th positional argument will be placed there
add1 = func(add, ~0, 1)
print(type(add1))  # partial

# Usage
pipeline = add(~0, 1) |C| sub(~0, 1)
print(type(pipeline))  # pipe
pipeline(1)  # 1

pipeline = range(~0, ~0 + 3) |C| list |CC| final_func(~2, ~0, c=~1)
pipeline(2)  # 0.6666

2 |A| range(~0, ~0 + 3) |C| list |CC| final_func(~2, ~0, c=~1)  # 0.6666
```
Later on, could implement specialised operators, but I think it is a bit early for that.
Given `pipe` exists, one can already use the above for simple cases (having to use `partial` explicitly):
```python
P = functools.partial
_ = functools.Placeholder

pipeline = P(add, 1) |C| P(sub, _, 1)
pipeline(2)  # 2

2 |A| P(add, 1) |C| P(sub, _, 1)  # 2
```
The `|` operator can just as easily be used on a `Pipe` object that does not make calls right away but rather composes the functions for a later call to `__call__`.
This makes the usage less verbose but more cryptic, with too many one-letter aliases IMHO.