Introduce funnel operator, i.e. '|>', to allow for generator pipelines

Agreed, this is in no way an acceptable final result of such an endeavour, but it can be a fairly good starting point.

This is what I have managed to pull together:

_ = Placeholder
pipeline = partial(opr.add, 1) -C- partial(opr.sub, _, 1)
pipeline(2)                                          # 2
2 |A| partial(opr.add, 1) -C- partial(opr.sub, _, 1) # 2
[1, 2] |AS| opr.sub -C- partial(opr.mul, 2)          # -2
[11, 3] |AS| divmod -CS- opr.mul                     # 6
[1, 2] |AS| opr.sub -C- split([opr.pos, opr.neg]) -C- sum  # 0
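For reference, custom infix operators like the -C- compose operator above can be built with the classic Python "infix wrapper" trick, abusing operator overloading on a helper object. A minimal sketch under that assumption (the names C, add1, and sub1 are illustrative, not the actual implementation behind the demo, and this version avoids the 3.14-only functools.Placeholder):

```python
from functools import partial

class Infix:
    """Wrap a binary function so it can be spelled as `left -OP- right`."""
    def __init__(self, func):
        self.func = func

    def __rsub__(self, left):          # handles `left - OP`
        return Infix(partial(self.func, left))

    def __sub__(self, right):          # handles `OP - right`
        return self.func(right)

# -C- composes two functions left-to-right
C = Infix(lambda f, g: lambda *a, **kw: g(f(*a, **kw)))

add1 = lambda x: x + 1
sub1 = lambda x: x - 1

pipeline = add1 -C- sub1   # parses as (add1 - C) - sub1
assert pipeline(2) == 2
```

Because functions don't define __sub__, `add1 - C` falls through to C.__rsub__, which is what makes the spelling work.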

This is working code.
Design and functionality improvements can now be addressed separately:

  1. partial improvements to specify positional order of inputs
  2. partial at parser level
  3. pipe implementation
  4. other useful utilities
  5. more convenient operators

As a demonstration, I've made my pipe compose functions when it has not yet been given an object:

_NOTSET = object()

class Pipe:
    def __init__(self, obj=_NOTSET):
        self.obj = obj
        self.funcs = []

    def __or__(self, func):
        if self.obj is _NOTSET:
            if not self.funcs:
                self = Pipe()
            self.funcs.append(func)
            return self
        return Pipe(func(self.obj))

    def __ror__(self, obj):
        if self.funcs:
            for func in self.funcs:
                obj = func(obj)
            return obj
        return Pipe(obj)

    __call__ = __ror__

pipe = Pipe()
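To make the behaviour concrete, here is a self-contained run of the class above (restated verbatim) exercising both the deferred-composition mode and the immediate mode; the step functions are arbitrary stand-ins:

```python
_NOTSET = object()

class Pipe:
    def __init__(self, obj=_NOTSET):
        self.obj = obj
        self.funcs = []

    def __or__(self, func):
        if self.obj is _NOTSET:
            if not self.funcs:
                self = Pipe()   # start a fresh pipe; keeps the global `pipe` reusable
            self.funcs.append(func)
            return self
        return Pipe(func(self.obj))

    def __ror__(self, obj):
        if self.funcs:
            for func in self.funcs:
                obj = func(obj)
            return obj
        return Pipe(obj)

    __call__ = __ror__

pipe = Pipe()

# deferred composition, then called two ways
inc_str = pipe | (lambda x: x + 1) | str
assert inc_str(41) == '42'
assert (41 | inc_str) == '42'

# immediate mode: once an object is piped in, each `|` applies the
# function right away and rewraps the result in a Pipe
assert ((2 | pipe) | (lambda x: x * 3)).obj == 6
```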

With the using class I suggested in this post, it can both perform immediate calls and compose functions for a later call:

'abcde' | pipe | batched << using(2) | map >> using(''.join) | list | print
# outputs ['ab', 'cd', 'e']

pairs = pipe | batched << using(2) | map >> using(''.join) | list | print

'abcde' | pairs
# outputs ['ab', 'cd', 'e']

pairs('abcde')
# outputs ['ab', 'cd', 'e']

Demo here

2 Likes

As a comment on readability, I'll point out that even though I'm familiar with languages that have a pipe operator, and I've been reading this thread, I have no idea what that pipeline function is intended to do (specifically, what the point of the initial lambda is).

If I saw something like that in code I was reviewing, I'd ask for it to be rewritten so the intent was clearer. Which suggests that the goal of more readable pipelines is not being achieved…

2 Likes

I like the using idea.

But I would use it with Placeholder, for a simpler implementation and to avoid needing two operators. This way, one can just use the mental model of partial.

Also, to keep Pipe simple, one can use brackets.

from itertools import batched
from functools import partial, Placeholder as _


class using:
    def __init__(self, *args, **kwds):
        self.args = args
        self.kwds = kwds

    def __call__(self, func):
        return partial(func, *self.args, **self.kwds)

    __rlshift__ = __call__


class Pipe:
    def __init__(self, *funcs):
        self.funcs = funcs

    def __or__(self, func):
        funcs = self.funcs + (func,)
        return type(self)(*funcs)

    def __ror__(self, obj):
        for func in self.funcs:
            obj = func(obj)
        return obj

    __call__ = __ror__


'abcde' | (Pipe() |
           batched << using(_, 2) |
           map << using(''.join) |
           list |
           print)
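A quick self-contained check of the same using/Pipe combination (classes restated here), avoiding batched and the 3.14-only functools.Placeholder so it also runs on older Pythons; the step functions str.upper and ord are arbitrary stand-ins for demonstration:

```python
from functools import partial

class using:
    def __init__(self, *args, **kwds):
        self.args = args
        self.kwds = kwds

    def __call__(self, func):
        return partial(func, *self.args, **self.kwds)

    __rlshift__ = __call__

class Pipe:
    def __init__(self, *funcs):
        self.funcs = funcs

    def __or__(self, func):
        return type(self)(*self.funcs, func)

    def __ror__(self, obj):
        for func in self.funcs:
            obj = func(obj)
        return obj

    __call__ = __ror__

# `<<` binds tighter than `|`, so each `using` attaches to its function first
result = 'abcde' | (Pipe()
                    | str.upper
                    | (map << using(ord))   # builds partial(map, ord)
                    | list)
assert result == [65, 66, 67, 68, 69]
```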
1 Like

I was aiming to make the usage involve as few symbols as possible but yes, your version would make the implementation simpler.

I think this would cover 90% of cases.

What about this?

from operator import add

def final_func(a, b, *, c):
    return (a - b) / c

obj = 1
a = add(obj, 1)
b = add(obj, 4)
c = range(a, b)
d = list(c)
e = final_func(d[2], d[0], c=d[1])
print(e)    # 0.6666666

pipeline = ?
print(pipeline(1))    # 0.6666666
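For what it's worth, as an ordinary function the computation is easy to write; the difficulty this example points at is that the fan-out to a and b and the re-indexing of d form a DAG rather than a chain, which is exactly what a linear pipeline struggles to express. A plain-function answer (not a pipe) for comparison:

```python
def final_func(a, b, *, c):
    return (a - b) / c

def pipeline(obj):
    # a = obj + 1, b = obj + 4, d = list(range(a, b))
    d = list(range(obj + 1, obj + 4))
    return final_func(d[2], d[0], c=d[1])

assert abs(pipeline(1) - 2 / 3) < 1e-9   # 0.6666...
```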

I also have reservations about readability. You remove some brackets, but that's not much of a win when I can't figure out what about 80% of the examples proposed on this page are supposed to do. And for all of the few that I can understand (or that helpfully have the output included in the example), I'm thinking: wow, that would have been so much clearer as a comprehension loop!

The analogy to UNIX shells raises more questions than it answers. A shell pipe can only receive one input, and it's up to the receiving executable to decide whether stdin.read() is one object, a newline/null-delimited array to loop over, or an xargs-style sequence of arguments. That concept doesn't transfer to a language where functions take arbitrary arguments and keyword arguments.

FWIW, I should mention that a pipeline operator should be general, i.e. not limited to generators. For example, I have a few microservices that pass around different upload IDs (UUIDs). If there were a pipe operator, I could use it like this:

path_to_image
|> upload  # returns uploadID
|> remove_spots(~, max_area=30)  # returns id after spot removal
|> convert_to_png  # returns id of the png image
|> ocr # returns text
|> retrieve(query, ~)  # returns relevant results

Now granted, I can do this with reduce:

reduce(
    lambda x, f: f(x),
    [
        upload,
        partial(remove_spots, max_area=30),
        convert_to_png,
        ocr,
        lambda text: retrieve(query, text),
    ],
    path_to_image,
)
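The reduce version does run as advertised; here is a self-contained check with hypothetical stub services standing in for the real microservices (the string-returning stubs are assumptions purely for demonstration):

```python
from functools import reduce, partial

# hypothetical stand-ins for the microservice calls
def upload(path):                    return f"id({path})"
def remove_spots(uid, *, max_area):  return f"spotless{max_area}({uid})"
def convert_to_png(uid):             return f"png({uid})"
def ocr(uid):                        return f"text({uid})"
def retrieve(query, text):           return f"{query}: {text}"

query = "q"
result = reduce(
    lambda x, f: f(x),        # thread the value through each stage
    [
        upload,
        partial(remove_spots, max_area=30),
        convert_to_png,
        ocr,
        lambda text: retrieve(query, text),
    ],
    "img.jpg",
)
assert result == "q: text(png(spotless30(id(img.jpg))))"
```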

With Placeholder:

_ = Placeholder
reduce(
    lambda x, f: f(x),
    [
        upload,
        partial(remove_spots, max_area=30),
        convert_to_png,
        ocr,
        partial(retrieve, query, _),
    ],
    path_to_image,
)

I find the pipeline operator to be much more elegant and readable.

Of course, there is the other version, which is the most readable of all:

upload_id = upload(path_to_image)
spotless_id = remove_spots(upload_id, max_area=30)
png_id = convert_to_png(spotless_id)
text_results = ocr(png_id)
answer = retrieve(query, text_results) 

But note, here I am having to come up with names for variables that are throwaway in this case. The other option is:

_ = upload(path_to_image)
_ = remove_spots(_, max_area=30)
_ = convert_to_png(_)
_ = ocr(_)
answer = retrieve(query, _) 

Now _ = does behave like a poor man's pipeline operator, but when I look at this code, it makes me sad.

1 Like

If that is the issue, then I don't think there's anything to solve. First, the current function chaining isn't what you're referring to, and second, if you're converting a generator to a list, you're misusing generators.

I'm not understanding the other posts: what exactly is the problem being solved?

1. I remember someone earlier in this thread mentioning that if a callable is chained in the pipeline, it will be automatically called, and a generator is expected to be returned.

I think this idea can be replaced by non-generator functions that handle one item at a time:

generator = range(3) | str
# Equivalents to `map(str, range(3))`

list(generator) # ["0", "1", "2"]

# or even:

generator = range(3) | (lambda x: x + 1) | print

list(generator) # [None, None, None]

# Side effect - prints:
# 1
# 2
# 3
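For comparison, the proposed semantics of both snippets is exactly what nested map gives today; this just restates the "Equivalents to" comments above in runnable form:

```python
# first example: lazy elementwise conversion
gen = map(str, range(3))
assert list(gen) == ["0", "1", "2"]

# second example: map the lambda, then map print over the results
gen = map(print, map(lambda x: x + 1, range(3)))
assert list(gen) == [None, None, None]   # prints 1, 2, 3 as a side effect
```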

The reason behind this is that if a function returns a generator, it very likely needs additional arguments besides the pipelined values. For example:

def add(b):
    a = yield
    while True:
        a = yield a + b

# Usage
range(3) | add(10) # 10, 11, 12

Hence it might be too wasteful to automatically call a callable just to save a pair of empty parentheses. Not to mention that a callable object can also be an iterable (supporting __iter__) and a generator (supporting send) at the same time.
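The send-based add example can already be driven explicitly with a small helper, which is roughly what the proposed operator would have to do under the hood for generator stages (pump is an assumed name, not part of any proposal):

```python
def add(b):
    a = yield
    while True:
        a = yield a + b

def pump(iterable, gen):
    next(gen)                        # prime the generator up to the first yield
    return [gen.send(x) for x in iterable]

assert pump(range(3), add(10)) == [10, 11, 12]
```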

2. Also, as pointed out in multiple posts, the funnel operator will return a generator; nothing will be executed until the generator is (later) iterated.

Solutions have been proposed: chain a list, set, or dict whenever the pipeline is supposed to drain itself immediately. However, this semantic could also be used to convert each item into the corresponding type (e.g. enumerate(range(3)) | list should return a generator that generates [0, 0], [1, 1], [2, 2] instead of immediately draining the pipeline and returning [(0, 0), (1, 1), (2, 2)]). Therefore, a "finalizer" helper might be helpful:

range(3) | str | finalize(list) # Returns ["0", "1", "2"]
# Equivalents to `list(map(str, range(3)))`
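As a toy model of these semantics (elementwise | plus a draining finalize), one can sketch a wrapper today; the piped iterable has to be wrapped explicitly, since | on a range object can't be overloaded, and that gap is precisely what the proposal targets. All names here are illustrative assumptions:

```python
class Finalize:
    def __init__(self, func):
        self.func = func

def finalize(func):
    return Finalize(func)

class P:
    """Toy pipeline: `| f` maps f lazily over the stream,
    `| finalize(f)` applies f to the whole stream and drains it."""
    def __init__(self, it):
        self.it = it

    def __or__(self, step):
        if isinstance(step, Finalize):
            return step.func(self.it)
        return P(step(x) for x in self.it)

assert (P(range(3)) | str | finalize(list)) == ["0", "1", "2"]
# elementwise `list` converts each item; `finalize(list)` drains
assert (P(enumerate(range(3))) | list | finalize(list)) == [[0, 0], [1, 1], [2, 2]]
```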

I think there needs to be a clear separation between "iterator piping" and "function composition".

And leave such experimentations for 3rd party libraries. At least for now.

4 Likes

Great example, but I think the problem is that some people, looking at the various syntaxes proposed so far, find them confusing because there's no clear indication that a pipeline pattern is about to follow the object being piped.

To improve clarity, I think we can introduce a dedicated statement with a new keyword, so that its body consists unmistakably of call specifications rather than expressions. Something like:

pipe path_to_image:
    |> upload  # returns uploadID
    |> remove_spots(_, max_area=30)  # returns id after spot removal
    |> convert_to_png  # returns id of the png image
    |> ocr # returns text
    |> retrieve(query, _)  # returns relevant results
    => result # assigns the final return value to result

And there's precedent in the match-case statement, in which Point(x, y) is not treated as a call to Point with arguments x and y but rather as a specification of a match pattern, and where _ has special meaning.

Furthermore, following the logic of partial, we need a _/Placeholder in the call specification only if the piped object isn't the last positional argument. As a toy example for easier illustration:

pipe 'abcde':
    |> batched(_, 2) # or batched(n=2) to avoid using a placeholder
    |> map(''.join) # no need for _ because the piped object follows ''.join
    |> list
    => paired # paired becomes ['ab', 'cd', 'e']
    |> print # possible to continue piping after an intermediate assignment
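Desugared into today's Python, the toy pipe block above would behave roughly like the following (using a simplified stand-in for itertools.batched so the sketch also runs before 3.12):

```python
from itertools import islice

def batched(iterable, n):
    # simplified stand-in for itertools.batched (Python 3.12+)
    it = iter(iterable)
    while chunk := tuple(islice(it, n)):
        yield chunk

_ = 'abcde'
_ = batched(_, 2)
_ = map(''.join, _)
paired = list(_)         # the `=> paired` intermediate assignment
assert paired == ['ab', 'cd', 'e']
print(paired)            # the trailing |> print step
```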
1 Like

One disadvantage of scope syntax is that it prevents us from using it in a lambda. For me that's okay; stylistically that should be avoided anyway. But the advantage (more on that later) is that we don't have to wrap it in parentheses to make it multiline, and a formatter can ensure that when it is in a pipe block.

But I am starting to feel an explicit call and placeholder should be mandated. For example, in map(''.join), although it is clear to me that it functions like a partial, map(''.join, _) may be more explicit. Which also brings it closer to structural pattern matching. And typing 3 or 4 extra characters is a small price to pay. Similarly, I am hesitant about the intermediate assignment syntax; even here we could repurpose as, just like match-case:

pipe 'abcde':
    |> batched(_, 2)
    |> map(''.join, _)
    |> list(_) as paired
    |> print(_)  # or print(paired)

Where paired could be used nominally in subsequent functions as an argument if needed. And when I use _ as a placeholder here, I don't mean _ = functools.Placeholder but a soft keyword (I think that is what you were suggesting).

Although => as the final assignment looks pretty, there isn't much being added there to warrant a new token.

Some questions that come to mind:

  1. Is it okay to unpack _? e.g. |> print(*_)
  2. Is it okay to use it in an f-string? i.e.
    |> print(f"paired = {_}") (if we hadn't done an intermediate assignment)
  3. Can the scope block be made atomic if needed? e.g. atomic pipe 'abcde'
  4. Can the pipeline itself be assigned? e.g.
pipe as pipeline:
    |> batched(_, 2)
    |> map(''.join, _) as paired # what does intermediate assignment mean here?
    |> list(_) # return type inferred from here

pipeline(iterable1)
pipeline(iterable2)

The first two seem fine to me. I am not sure about the feasibility of the 3rd.
I am not convinced about 4 myself. I would prefer writing a wrapper function if I want to reuse a pipeline.

Speaking of advantages, as a consequence of using pipe as a block, we don't need ~. And if needed, |> could be dropped as well; any of |, >, >>, -> could be used to represent the pipeline operator inside a pipe block.

pipe 'abcde':
    >> batched(_, 2)
    >> map(''.join, _)
    >> list(_) as paired
    >> print(_)

EDIT: scratch the atomic idea; I don't think it is possible. There could be arbitrary functions with side effects.

1 Like

With the current toolkit one can implement:

path_to_image | (pipe
    >> upload
    >> remove_spots@sub(_, max_area=30)
    >> convert_to_png
    >> ocr
    >> retrieve@sub(query, _)
) == result

The == operator cannot perform an assignment.

You are right; then it would have to be this:

result = path_to_image | (pipe
    >> upload
    >> remove_spots@sub(_, max_area=30)
    >> convert_to_png
    >> ocr
    >> retrieve@sub(query, _)
)

I'd rather use partial(remove_spots, _, max_area=30) instead of this trick. And TBH, I have quite a few codebases where I use this kind of pattern. But having to use object | (pipe | ...) or object | (pipe >> ...) leaves a bad taste. It works quite well when I have some object of my own, where I have overloaded __or__/__rshift__ and their siblings, to begin with. That is why having an operator/some mechanism that is applicable to naive objects seems so useful.

Furthermore, @blhsing's version has a nice side effect of intermediate assignments, which could be useful once in a while.

Yeah I believe mixing a pipeline with other expressions can easily make the code unreadable when the pipeline has different grammar rules.

I like the as idea to avoid spending a line just to name an intermediate value.

Agreed about mandating a placeholder when there are other arguments in the specification, but I think we can still allow the simplest use case to be written with a bare callable like list and print above, because it would remain unambiguous where the piped object is placed when it is the only argument. But then, when there's an as clause, I do think a placeholder should be mandated, to avoid list as paired looking as if we're assigning list to paired.

Good idea about making intermediate variables immediately reusable for subsequent calls, as they should be, with a subsequent call specification evaluated only after its preceding call returns.

In fact, _ can also simply be implemented as an intermediate variable storing the last returned value.

So yes, if _ is simply a normal variable storing the last returned value, then it can be used in any expression like the two above.

I don't see a need for a separate scope. The pipe statement should be more like match-case rather than def and class.

I agree. An alternative syntax that defines a pipeline function would be convenient but may make the statement too confusing as performing immediate calls within the current scope and defining a function with its own scope are two distinctly different operations.

Agreed that we don't need to introduce new tokens within a dedicated pipe block. I still like | slightly better because I associate | with a pipe, and if bare callables are allowed in the simplest case as I suggested, the above would become:

pipe 'abcde':
    | batched(_, 2)
    | map(''.join, _)
    | list(_) as paired
    | print
1 Like

I am a fan of this idea! In fact, I created an account to propose almost the same thing. I would really like to see @ (__matmul__) used for this. I think it would fit into the language quite nicely, given that the symbol is already used for decorators. Although I have no idea how prevalent the use of matrix multiplication is; anecdotally, I've used it maybe 4 times in 6 years. I also think that | is already used extensively in the language: standard bitwise or, merging dicts/mappings, and type unions. I think it would be good to distinguish this with different syntax, although | does remind me of piping and makes sense in that way.

I also really hope that unpacking is built into it as well, so that something like:
(g @ f)(x) is equivalent to g(f(x))
(g *@ f)(x) is equivalent to g(*f(x))
(g **@ f)(x) is equivalent to g(**f(x))

and in my dreams this would also be true
(g ***@ f)(x) is equivalent to g(*y[0], **y[1]) # where y = f(x)
but that's a whole other topic, I think.
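The three proposed flavours desugar to ordinary higher-order helpers; here is a sketch of what g @ f, g *@ f, and g **@ f would mean as plain functions (the helper names are made up for illustration):

```python
def compose(g, f):        # proposed g @ f
    return lambda x: g(f(x))

def compose_star(g, f):   # proposed g *@ f
    return lambda x: g(*f(x))

def compose_dstar(g, f):  # proposed g **@ f
    return lambda x: g(**f(x))

assert compose(str, len)('abc') == '3'
assert compose_star(divmod, lambda x: (x, 3))(7) == (2, 1)
assert compose_dstar(dict, lambda x: {'n': x})(5) == {'n': 5}
```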

Regardless, I'm quite excited about the prospect of this. I'll have to re-read the whole thing more carefully when I have some more time.

edit: add brackets and inputs to the compose statements.

1 Like

+1 to using @ for function composition.
But I think that's a different topic.
(f @ g @ h)(x) :== f(g(h(x)))
Whereas this thread discussed the syntax
x |> f() |> g() |> h() :== h(g(f(x))) (and variations thereof)

1 Like