Introduce funnel operator, i.e. '|>', to allow for generator pipelines

OK, I see now. One problem with focusing on the convenience wrappers is that it makes it harder to evaluate the |> proposal on its own merits. If people only like it in conjunction with the wrappers, then you either need to include the wrappers in the proposal (scope creep) or the proposal risks failing from lack of support for its “bare” form. Personally, I’d prefer it if this topic were to evaluate the bare form of the proposal, as it was originally. Maybe put “add convenience wrappers” into a “deferred ideas” section of the PEP, but otherwise leave them out of consideration for now.

Right now, I’m struggling to understand whether the simple |> proposal is viable, because no-one is posting examples that use it without some form of extension or helper.

For example,

map(lambda x: x + 2, lst) 

gets translated to

lst |> partial(map, lambda x: x + 2)
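(For concreteness: assuming `x |> f` simply means `f(x)`, the two spellings are equivalent and the piped form can be checked in today's Python.)

```python
from functools import partial

lst = [1, 2, 3]

# The conventional spelling:
direct = map(lambda x: x + 2, lst)

# What the piped spelling would desugar to, assuming `x |> f` means `f(x)`:
piped = partial(map, lambda x: x + 2)(lst)

assert list(direct) == list(piped) == [3, 4, 5]
```

Either way, both spellings compute the same thing; the question is only about readability.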

That doesn’t feel like much of an improvement to me, but I can see that in some contexts it might be. I’d rather people focused on finding the cases where the improvement is compelling, and not on trying to improve “toy” examples like this by focusing on the lambda and the partial. Everyone knows that lambda and partial are clumsy. There have been many fruitless years of discussion about better syntax for them, and if a case for |> can’t be made without resolving those discussions, then it’s going to have a really hard time getting accepted.

All of which is to say, can we please focus on what |> has to offer without helpers or library code that experience has shown people simply won’t use?

3 Likes

I think it is because people are trying to cover the full scope, to prevent deadlocks in future development and bad decisions being made at this early stage (at least this is what I was doing up to now).

My speculative guess is that the distribution of use cases would be as follows:

  1. ~50% of operands would be vanilla
sequence |> list
pairs |> dict
iterable |> sorted |> enumerate

Also, a real-life use case in deep learning: Introduce funnel operator, i.e. '|>', to allow for generator pipelines - #131 by sayandipdutta

  2. ~20% would need the current partial
sequence |> partial(map, ''.join)
func |> partial(filter, _, [0, 1, 0, 1])
  3. ~10% would need partial extensions
1 |> partial(lambda x: x, x=_)
[[0, 1], [2, 3]] |> partial(zip, *_)
{'a': 1, 'b': 2} |> partial(lambda a, b: a + b, DblStarPlaceholder)
  4. ~10% would be method calls and attributes

These can be done with simple utility.

'a' |> X.upper() |> X.replace('a', 'b')
  5. 5-10% could be something that has not yet been thought of

Thus, |> with a partial extension would potentially cover a significant application scope (~80%). But better analysis is needed.
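For what it's worth, the "vanilla" tier above can already be emulated in today's Python with a small wrapper (a sketch; `Pipe` here is a hypothetical helper, not stdlib), which may help calibrate how much a dedicated operator actually adds:

```python
class Pipe:
    """Minimal emulation of a left-to-right pipeline using the
    existing | operator (hypothetical helper, not stdlib)."""
    def __init__(self, value):
        self.value = value

    def __or__(self, func):
        # Each | step applies the function and rewraps the result
        return Pipe(func(self.value))

# Equivalent to the proposed: [3, 1, 2] |> sorted |> enumerate |> list
result = (Pipe([3, 1, 2]) | sorted | enumerate | list).value
assert result == [(0, 1), (1, 2), (2, 3)]
```

The `.value` unwrapping at the end is the main ergonomic cost this emulation carries that the proposed syntax would not.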


Maybe it would not offer all of the conveniences straight away - those are potential later additions - but I think a simple binary |> implementation has an actual chance to go through, while aiming for something bigger is a commitment to a very long ride and endless discussions without any results any time soon, which I fear would be at significant risk of ending up like: PEP 505 – None-aware operators | peps.python.org


It is unlikely that |> operator syntax is going to be useful for anything else than this.

And simple binary |> does not introduce any new paradigms, constructs, new ways of thinking.

Thus, the simplicity of such a proposal might actually have a chance to materialise in the near future.

1 Like

I think the most attractive thing about piping is ability to see the flow clearly:

result = (
    sequence
    |> partial(batched, _, 2)
    |> zip(*_)
    |> list
)

As opposed to:

result = list(zip(*batched(sequence, 2)))

So although making this as convenient as possible is indeed a good idea, even if it is not perfectly convenient and "clumsy" partial and lambda need to be used for the time being (until more convenient ways are invented), it still provides the main benefit.
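That visual flow can also be approximated today with a small helper (a sketch; `flow` is a hypothetical utility, not stdlib, and `batched` below is a simplified stand-in for `itertools.batched` from Python 3.12+):

```python
from functools import reduce

def flow(value, *funcs):
    """Thread a value through each function in turn
    (hypothetical helper, not a stdlib utility)."""
    return reduce(lambda acc, f: f(acc), funcs, value)

def batched(seq, n):
    # Simplified stand-in for itertools.batched (Python 3.12+)
    return [tuple(seq[i:i + n]) for i in range(0, len(seq), n)]

result = flow(
    [1, 2, 3, 4],
    lambda s: batched(s, 2),   # stands in for: |> partial(batched, _, 2)
    lambda ps: zip(*ps),       # stands in for: |> zip(*_)
    list,                      # stands in for: |> list
)
assert result == [(1, 3), (2, 4)]
```

The steps still read top-to-bottom, but each one has to be wrapped in a lambda, which is exactly the clumsiness being discussed.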

1 Like

Toy examples are utterly unconvincing here, as sequence |> list provides no benefit over list(sequence). To demonstrate any value for these cases, real-world examples are needed. The one from @sayandipdutta is a good example here. I’ve not seen code like that myself, so maybe it’s common in neural network applications, but not elsewhere? More information (and examples) would be useful.

Again, toy examples don’t demonstrate much. Rewriting without |> or partial is generally simpler and clearer.

Same. And even more so, as we don’t currently have those extensions.

But all the evidence suggests that people wouldn’t write those utilities.

But by your own argument, |> on its own covers ~70%. That extra 10% simply isn’t worth complicating the proposal, when partial extensions can be covered in a separate PEP (and the failure of such a PEP wouldn’t stop |> being accepted).

This I agree with, which is why I’m so confused that you seem to continually be promoting extensions and helpers rather than focusing on the core proposal.

That’s irrelevant, as the fact that we can use it for this, doesn’t imply that we should.

The original proposal isn’t for a binary |> operator, though. It’s for |> as syntax, explicitly not a “normal” operator. No-one has yet really explored the implications of that, though, and honestly I think that’s far more of a problem with the original proposal than trivia like whether partial is sufficiently powerful.

Only if we keep the discussion focused on that proposal, and stop getting side-tracked by unnecessary extra features.

This, I agree with 100%. But if what we have to do to make the individual functions suitable for a flow-based expression is too clumsy, the clarity of the flow is lost. And whether we like it or not, current Python is not designed around anonymous functions and composition.

It’s like the “fluent” approach - methods that modify the object and return it - that is natural and popular in JavaScript. It’s used sometimes in Python, but it can feel out of place and awkward, because Python has a design principle that mutating methods return None. So what works in JavaScript doesn’t translate well to Python. At the moment, flow-based expressions feel similar - a nice idea that works well in languages designed around it, but potentially an awkward fit for Python.
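To make the mismatch concrete (a small illustration; `Query` is a made-up class, not a real library):

```python
# Mutating methods on built-ins return None by design,
# so JavaScript-style chaining breaks on them:
nums = [3, 1, 2]
assert nums.sort() is None   # not the sorted list

# Fluent style only works when a class opts in by returning self:
class Query:
    def __init__(self):
        self.clauses = []

    def where(self, clause):
        self.clauses.append(clause)
        return self   # returning self is what enables chaining

q = Query().where("a > 1").where("b < 2")
assert q.clauses == ["a > 1", "b < 2"]
```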

@jamsamcam this was originally your idea, and @sadaszewski you provided the reference implementation that restarted the discussion. Do either of you want to distil things down to an actual proposal, and if so, do you want to include the various “extras” being discussed, or would you prefer to keep things to the basic |> construct? I think we’re at a point here where we need an actual proposal (even if it’s not yet a formal PEP), along with a clear owner (or owners) for that proposal to keep the discussion on track. Otherwise, I’m pretty sure things will simply keep on getting debated with no real progress towards a solution.

Personally, I’m sympathetic to the idea, but only if it’s kept focused and clear, and shows real potential for benefits in actual code. I don’t think that abstract “functional is good” principles are enough here, and I’m concerned that as I said above, function pipelining might not be a good fit for Python as it exists at the moment. A formal proposal would help address these concerns.

5 Likes

I would be more than happy to, with the input of others. Allow me to write a more structured informal proposal here after this response, and hopefully others can chime in with things that could be improved. Hopefully this can be submitted as a formal proposal, but as I said, this would be my first time submitting something.

I think I will mention some of these other proposals and discussions as an interesting side note, but otherwise they won’t be considered part of the proposal, because I still think this operator is useful and can be justified on its own merits.

I feel that without additional context, it will likely turn into a discussion about “why do we need this when we can just do X?”, so even if none of these side discussions bear fruit, they could illustrate how it could evolve in the future, or show more clearly what it’s trying to solve and why we couldn’t go down a certain route.

Does this make sense?

Indeed - but then maybe we could aim for a stdlib addition of something similar - maybe even via PEPifying a standard behavior, implementing that in a published lib, and then going for adoption.

It turns out I am halfway through a “Pipe” class, which emerged here a couple of months ago - but aiming at different things: a way to control (and throttle) the parallelization of asyncio tasks.
The proponent’s (@missing-stardust at Add task pipeline to asyncio with capped parallelism and lazy input reading) problem was not about having a nice syntax - but actually about having a nice pattern to do that.

It turns out I have a nice “utility box for asyncio” package that is a lot less toyish than that linked code, meaning: this one is published, and I’d use it in prod: GitHub - jsbueno/extraasync: Extra utilities for Python asynchronous programming

So, after some invitation, Martin did contribute his code. I started fiddling around with it, got his contributions - and then had to pause due to (life). I came back a couple of weeks ago and decided on a different approach than his initial nested-function based one. And having a class for the Pipe will allow me to do all the niceties of customizing (existing) operators, such as seen in this thread.

One of Martin’s core ideas is to be agnostic to the source being either a sync or an asynchronous generator. And then comes one pattern he found missing: the ability to parallelize / throttle the item generation in the source itself, which is orthogonal to having operators.

(Indeed, Python can always build a readable, very nice pipeline just adding a sequence of callables as args to an orchestrator - readable enough, without even needing the operator overriding part)

Feel free to join the discussion in the project there - I will likely resume working on it this week.

1 Like

I want to make another implementation addressing some of the requests (chaining with method calls, using the placeholder arbitrarily - i.e. in any valid Python syntax, which is how I understood your suggestion, @pf_moore, and also how I did this initially, although the implementation was not good for technical reasons - etc.). After the above discussion, all things considered, I am leaning towards treating it as new syntax rather than as an operator. Some things will never be possible to achieve if we treat it as an operator, and we have not touched even once upon the innovative uses that an implementation as a BinOp with magic methods is supposed to enable. All I am hearing is that we can craft workarounds if it is implemented as such an operator. The argument that we need to propose as little as possible because more comprehensive changes could be rejected is not valid, since, as you said, without those “extras” the solution essentially does not offer anything useful at all. We need certain coverage and elegance to make this happen - I second this line of thinking. I’d definitely like to write this up when I have the learnings from the new implementation. All help is welcome.

OK, that’s fair. If you haven’t been able to find compelling cases for the simple “|> only” proposal, then I agree, it’s not worth pursuing. However, I’m concerned that by treating this as syntax and adding special constructs for placeholders and partials, it will look like an embedded “specialist language” within Python, and that might be hard to sell, as well.

But I’ll reserve my opinion until there is an actual proposal on the table.

One thing that will need to be resolved is the question of whether this is an operator or syntax. If the two of you don’t agree on that basic point, we essentially have two independent proposals.

I think it may be helpful to look at how the current implementation by @sadaszewski looks in real* code. I built cpython from @sadaszewski 's repo at df9e6f. Here is a solution to part 1 of Advent of Code 2023 day 2 challenge.

from collections.abc import Sequence
from itertools import starmap
from operator import attrgetter, le
from typing import NamedTuple
from functools import partial


class CubeSet(NamedTuple):
    red: int = 0
    green: int = 0
    blue: int = 0

    @classmethod
    def from_str(cls, segment: str):
        return (
            segment.strip().split(", ")
            |> map(partial(str.split, maxsplit=1))
            |> starmap(lambda amount, color: (color.strip(), int(amount)))
            |> dict()
            |> cls(**_)
        )

class Game(NamedTuple):
    AVAILABLE_CUBESETS = CubeSet(red=12, green=13, blue=14)

    Id: int
    cubesets: tuple[CubeSet, ...]

    @classmethod
    def from_str(cls, line: str):
        left, right = line.split(":", maxsplit=1)
        _, game_id_str = left.split()
        cubesets = right.split(";") |> map(CubeSet.from_str) |> tuple()
        return cls(Id=int(game_id_str), cubesets=cubesets)

    def is_playable(self):
        return (
            zip(*self.cubesets)
            |> map(max)
            |> zip(_, Game.AVAILABLE_CUBESETS)
            |> starmap(le)
            |> all()
        )


def part1(text: str) -> int:
    return (
        text.splitlines()
        |> map(Game.from_str)
        |> filter(Game.is_playable)
        |> map(attrgetter("Id"))
        |> sum()
    )

if __name__ == "__main__":
    text = """\
Game 1: 3 blue, 4 red; 1 red, 2 green, 6 blue; 2 green
Game 2: 1 blue, 2 green; 3 green, 4 blue, 1 red; 1 green, 1 blue
Game 3: 8 green, 6 blue, 20 red; 5 blue, 4 red, 13 green; 5 green, 1 red
Game 4: 1 green, 3 red, 6 blue; 3 green, 6 red; 3 green, 15 blue, 14 red
Game 5: 6 red, 1 blue, 3 green; 2 blue, 1 red, 2 green"""
    print(part1(text))
How this looks currently

from typing import NamedTuple

class CubeSet(NamedTuple):
    red: int = 0
    green: int = 0
    blue: int = 0

    @classmethod
    def from_str(cls, segment: str):
        cubestrs = segment.strip().split(", ")
        splitted = (cubestr.split(" ", maxsplit=1) for cubestr in cubestrs)
        return cls(**{color: int(amount) for amount, color in splitted})


class Game(NamedTuple):
    AVAILABLE_CUBESETS = CubeSet(red=12, green=13, blue=14)

    Id: int
    cubesets: tuple[CubeSet, ...]

    @classmethod
    def from_str(cls, line: str):
        left, right = line.split(":", maxsplit=1)
        _, game_id_str = left.split()
        cubesets = tuple(CubeSet.from_str(item) for item in right.split(";"))
        return cls(Id=int(game_id_str), cubesets=cubesets)

    def is_playable(self):
        max_cubes = (max(color) for color in zip(*self.cubesets))
        return all(
            needed <= available
            for needed, available in zip(max_cubes, Game.AVAILABLE_CUBESETS)
        )

def part1(text: str) -> int:
    games = (Game.from_str(line) for line in text.splitlines())
    return sum(game.Id for game in games if game.is_playable())

Regarding this:

Here’s a use case I have, which I shared earlier.

But, yes, this pattern is quite common in data science and AI workflows, where you would like to build pipelines to morph the data in various ways, and you don’t care about the intermediate results.

It is indeed. Personally, I don’t like the rewritten version, it’s less clear to me what’s going on, and crucially, the lack of named intermediate values makes it much more difficult to understand. If I were asked to maintain this code, I’d find the non-pipelined version far more maintainable. Of course, that’s a matter of preference, rather than saying there’s anything objectively wrong with the pipelined version.

I’ll note that this requires the placeholder syntax for partial. I have all sorts of reservations about that (for instance, is remove_spots(~, max_area=30) a valid way of writing partial(remove_spots, max_area=30) outside of a |> construct?) but even using partial it’s still a useful example.

Thanks. I think I’ll take the question of “will this be useful in real world code” as answered in the affirmative at this point. But I will also point out that realistic examples like yours are far easier to discuss than the “toy” examples being used in a lot of posts.

2 Likes

I associate this more closely to an implicit lambda, but in principle, yes, that was the intention. I had thought about posting this as a standalone idea, but I couldn’t make my mind up on some of the questions that arose. So I feel I understand some of your reservations.

Here are a few of the open questions I had regarding implicit lambda/partial:
  1. For example, I couldn’t figure out how the precedence/binding/tokenization would work. So if I wrote x = map(~ < 5, [1, 2, 3]), does this get translated to x = lambda arg: map(arg < 5, [1, 2, 3]) or x = map(lambda arg: arg < 5, [1, 2, 3])? I want the latter, but I don’t know how it would be possible for the parser to know that, unless it has semantics like the := operator, where you must parenthesize when ambiguous, and the innermost parens define the expression.
  2. What about more than one argument? strict_zip_two = zip(~, ~, strict=True); strict_zip_two(a, b). Should this be allowed?
  3. What about this: (print(~{0}, ~{2}))(a, b, c) which prints a and c?
  4. It is impossible to correctly infer the type of something like (~.some_method()). This restriction applies to lambda as well, but if the newer proposal doesn’t solve things like this, is there enough benefit to justify adding this much complexity to the language?
  5. What to do about the following: a = (~(y) + ~(z))? Is it a = lambda f: f(y) + f(z) or a = lambda f1, f2: f1(y) + f2(z) or a plain-old bitwise-not? To rule out bitwise-not, it will probably have to be a = ((~)(y) + (~)(z)). If I saw this, I will scratch my eyes out.
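For reference, the two competing readings in question 1 are both expressible with today's lambda, which is exactly what makes the implicit form ambiguous:

```python
# Two candidate desugarings of the hypothetical x = map(~ < 5, [1, 2, 3]):
outer = lambda arg: map(arg < 5, [1, 2, 3])   # placeholder binds the whole expression
inner = map(lambda arg: arg < 5, [1, 2, 3])   # placeholder binds only the comparison

# Only the "inner" reading produces the intended result:
assert list(inner) == [True, True, True]
```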

There are many more questions. I won’t enumerate all of them here, since this is not the thread for that. If it is decided that |> operator needs such a partial/lambda shorthand, we can then delve deeper on these issues. All this to say, although I would like this syntax to be useful on its own, it raises too many problems in practice.

1 Like

Yup, this is one of the reasons (among others) why I made an effort to convey the possibility of a more modular approach.

Well, if some enclosed DSL is made for piping, partial will inevitably have to be upgraded to match its functionality (even if not with the same syntax). It would be really awkward if an enclosed DSL provided functionality that is not available outside of it.

Thus, if the new DSL does not make use of the existing partial, and the partial available within the DSL is not available outside of it, this would potentially introduce a situation where two independent partial functionalities need to be kept in sync.

Here is one from me:

from more_itertools import ilen
class rpipe: "feed pipe (same as |>)"
class P: "partial convenience"
class Filter: "walk results filter"

nlines = (
    rpipe(os.getcwd())
    | os.walk
    | Filter.exts("py")
    | map@P(open)
    | map@P(ilen)
    | sum
).value

With the addition of |> it would look like:

nlines = (
    os.getcwd()
    |> os.walk
    |> Filter.exts("py")
    |> map@P(open)
    |> map@P(ilen)
    |> sum
)

With the DSL proposal it would/could look like:

nlines = (
    os.getcwd()
    |> os.walk()
    |> Filter.exts("py")
    |> map(open)
    |> map(ilen)
    |> sum()
)

With statement:

nlines = pipe:
    os.getcwd()
    |> os.walk()
    |> Filter.exts("py")
    |> map(open)
    |> map(ilen)
    |> sum()

And without piping:

nlines = sum(map(ilen, map(open, Filter.exts("py")(os.walk(os.getcwd())))))

And without piping and not in one line:

walk_it = os.walk(os.getcwd())
py_paths = Filter.exts("py")(walk_it)
files = map(open, py_paths)
line_lens = map(ilen, files)
nlines = sum(line_lens)

P.S. yes yes, opened files are never closed, but this is kind of the whole program so it terminates immediately.

nlines = sum(ilen(p.open()) for p in Path.cwd().rglob("*.py"))

Or my preference (because naming things is better for maintainability):

def line_count(p: Path) -> int:
    with p.open() as f:
        return ilen(f)

nlines = sum(line_count(p) for p in Path.cwd().rglob("*.py"))

(Seriously, do you have an actual script where you use that rpipe version rather than the pathlib approach above? I find that a little surprising…)

Many of these examples feel like they are being forced into the “pipeline” format when there are perfectly readable approaches already available. For me, they suggest that there’s a problem with the proposal, in that it will encourage people to use it in places where it isn’t appropriate.

What I like about @sayandipdutta’s examples are that they don’t have an immediately obvious alternative that’s just as readable.

5 Likes

So I’ve been catching up on the thread, and I am now leaning towards a specific syntax or construct like @sadaszewski suggests, as I can see how an operator could be too limiting or confusing.

I think if I was to strip this feature to its conceptual core: we are trying to construct an array of partials, which are then themselves wrapped in a function that takes an item and iterates through each partial, passing the item through all of them until it is transformed into the final result.

We can declare that already using this definition of a function called pipe


from functools import partial, reduce

def pipe(funcs):
    return lambda initial_value: reduce(lambda acc, func: func(acc), funcs, initial_value)

# items: some iterable of objects with a .name attribute
item_pipeline = pipe([
    partial(map, lambda item: item.name),
    partial(filter, lambda name: "delete red" not in name),
    list,
])

processed_items = item_pipeline(items)
   

This itself is enough to satisfy my original motivation and perhaps is a good candidate for functools.

Earlier in the thread, someone suggested that they would rather have the variable which stores the result on the right-hand side.

It’s obviously more limited, but the pipe object could instead be a context manager class:


from functools import reduce

class pipe:
    def __init__(self, funcs):
        self.funcs = funcs
        self.result = None
    
    def __call__(self, initial_value):
        """Allows direct invocation like a function."""
        return reduce(lambda acc, func: func(acc), self.funcs, initial_value)

    def __enter__(self):
        """Context manager entry — simply return the result."""
        return self.result
    
    def __exit__(self, exc_type, exc_val, exc_tb):
        """Context manager exit — nothing special needed."""
        pass
    
    def __getitem__(self, initial_value):
        """Allows `with p[6] as result:` syntax."""
        self.result = self(initial_value)
        return self


p = pipe([
    lambda x: x + 2,
    lambda x: x * 3,
    lambda x: x - 1
])

# As a callable:
print(p(5))  # Output: 20

# As a context manager (subscription stores the result):
with p[6] as result:
    print(result)  # Output: 23

So, okay, perhaps at this point we could say that proposing this simple thing for functools would be enough, and a nice little win for people who need this use case.

But the problem remains that this only works for functions which accept the data argument as their last positional argument.

The use of placeholder objects would make this cumbersome and hard to read.

And of course you can wrap using lambdas, but I don’t think people would use such a feature:


p = pipe([
    partial(str.replace, _, "a", "b"),
    lambda x: x.upper(),
    partial(lambda a, b: f"{a} ends with {b}", _, "!"),
])

We could replace this with partial literals or operators with special behaviours, but it doesn’t feel like such a proposal would be accepted, as this grammar would only really be used for this feature and may overcomplicate Python:


p = pipe([
    ***map(process, *_)
])

Therefore I agree we do need special grammar, with the caveat that I have no idea how hard this would be to do.

But what if a pipe was just a special type of lambda?

Like a lambda, the rules of the expression in the body of the statement would be a specialised subset focused on declaring a pipeline.

This would allow us to reuse existing tokens and only introduce at most one new reserved keyword

What would that look like?

items = pipe: 
       map(lambda x: x > 2) |
       list()

This code would be the same as declaring a lambda function which passes an item to map, then the result of map to list, before returning that.
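A rough desugaring of that pipe block into current syntax (assuming the pipeline receives the iterable as its sole argument):

```python
# Hypothetical desugaring of:
#   items = pipe:
#       map(lambda x: x > 2) |
#       list()
items_fn = lambda it: list(map(lambda x: x > 2, it))

assert items_fn([1, 2, 3]) == [False, False, True]
```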

Regarding placeholders, we could reuse the “as” keyword, which is already a convention from with statements, together with the fact that lambda-style statements have named arguments:


items = pipe item: 
       fancymap(lambda x: x > 2, item) as result |
       fancyzip(result, 6)

Note these won’t be intermediate variables available outside of the scope; they are just ways of creating a named reference which the Python compiler can use to know where to place the result of the previous step, and the item from the start of the pipeline.

Each named reference can only be referred to by the following pipeline step statement (the partial call declared before each |).
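What that `pipe item:` block might desugar to, expressed with today's syntax (a sketch; `fancymap` and `fancyzip` are the hypothetical functions from the example, stubbed here so the snippet runs):

```python
# Stubs standing in for the hypothetical fancymap/fancyzip above:
def fancymap(f, seq):
    return [f(x) for x in seq]

def fancyzip(seq, n):
    return [(x, n) for x in seq]

# `result` is visible only to the following step, never outside the pipeline:
def pipeline(item):
    result = fancymap(lambda x: x > 2, item)   # ... as result
    return fancyzip(result, 6)

assert pipeline([1, 3]) == [(False, 6), (True, 6)]
```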

Yes, you are right, it is somewhat forced to use pipe here, but there is a reason for it. This is an example of a utility made to blend into a bash pipeline. And I don’t use pathlib.

It was rejected, but I am still hopeful it will be reconsidered once things fall into place. It is a bit too early to tell if this is a good addition, I suppose. See: Addition: functools.pipe · Issue #127029 · python/cpython · GitHub

In this example, I need to refer to the documentation to see what arguments each function takes. You could argue that these are built-in functions and I should already know them. Still, I don’t think it’s necessary to memorize them, especially since I don’t use them regularly. I might occasionally use all or any, but that’s about it. My point is that reading someone else’s code filled with their own custom functions isn’t exactly a fun experience.

Honestly, I’m not a fan of the original version either. It could have been much easier to read. It seems like the author was determined to use sum no matter what.

def part1(text: str) -> int:
    games = (Game.from_str(line) for line in text.splitlines())
    return sum(game.Id for game in games if game.is_playable())

Compare it to this version (assuming I translated the original code correctly):

def part1(text: str) -> int:
    total = 0
    for line in text.splitlines():
        game = Game.from_str(line)

        if not game.is_playable():
            continue

        total += game.Id

    return total
1 Like

It feels like the proposal in that form would perhaps be accepted if it didn’t overload operators.

Basically, a more minimal implementation.

Probably a good idea. Also, there was feedback in the rejection. If you are interested in polishing it and re-submitting, we can discuss in: `functools.pipe`.

1 Like

So with pipe as a lambda syntax, perhaps it would be better:

def part1(text: str) -> int:
    process_lines = pipe lines:
        map(Game.from_str, lines) as games
        | filter(Game.is_playable, games) as playable_games
        | map(attrgetter("Id"), playable_games) as totals
        | sum(totals)

    return process_lines(text.splitlines())