Introduce funnel operator, i.e. '|>', to allow for generator pipelines

This is a bit of a misleadingly reductive example.

This^^^ perhaps goes in circles a bit more than would be realistic, but the structure is representative of data manipulation pipelines. This is the kind of situation where piping would be useful.

The realistic examples where you’re not going in circles are situations where you have functions that you want to apply in a method chain, and those functions are also 10 lines long, and there are 2 of them… Not great for demonstrating a minimal viable example on the forum.

Another example you might consider is

for file in list(Path(".").parent.parent.glob("*.py"))[2:3]: ...

which is (obviously, to me) less readable than

for file in Path(".").parent.parent.glob("*.py") |> list()[2:3]: ...

would be.

There are probably better ways to do this specific example, but it’s a moderately common need for me to turn a glob into a list. And if it were easier to use a custom globbing function (because you could pipe them instead of having to put them at the start of the line), I would use them more often.

Many people (myself included) would argue that making an intermediate list here is also the wrong approach, and that this is a place to use itertools.islice.

I think there’s at least some merit to this proposal, but I would make sure the examples you use don’t run into “we already have an idiomatic option that doesn’t require creating a temporary container in memory”, as the obvious objection here is that syntax is now leading people to worse solutions.
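For reference, a minimal sketch of the islice version of the glob example above, which stays lazy and avoids the intermediate list:

from itertools import islice
from pathlib import Path

# Yields only items 2..3 of the glob, without materializing a list first
for file in islice(Path(".").parent.parent.glob("*.py"), 2, 3):
    ...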

Regarding examples, I had listed a handful of examples in another thread, see my last comment for the link. Adding to that, a very common and repetitive pattern emerges in Deep Neural Networks.

import torch
from torch import nn

class CNN(nn.Module):
    def __init__(
        self,
        input_channels: int,
        output_channels: int,
        dropout: float,
    ):
        super().__init__()
        self.conv0 = nn.Conv2d(input_channels, 16, kernel_size=3, padding=1)
        self.bn0 = nn.BatchNorm2d(16)
        self.conv1 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(32)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(64)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.conv4 = nn.Conv2d(128, 256, kernel_size=3, padding=1)

        self.pool = nn.MaxPool2d(2, 2)

        self.fc1 = nn.Linear(256 * 20 * 20, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, output_channels)

        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pool(self.relu(self.bn0(self.conv0(x))))
        x = self.pool(self.relu(self.bn1(self.conv1(x))))
        x = self.pool(self.relu(self.bn2(self.conv2(x))))
        x = self.pool(self.relu(self.conv3(x)))
        x = self.pool(self.relu(self.conv4(x)))

        x = x.view(-1, 256 * 20 * 20)
        x = self.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.relu(self.fc2(x))
        x = self.dropout(x)
        x = self.sigmoid(self.fc3(x))

        return x

The forward pass could be like this:

    def forward(self, x):
        return (
            x                                         # 640 x 640 (features)
            |> self.conv0 |> self.bn0 |> self.relu    # 3 -> 16   (channels)
            |> self.pool                              # 640 -> 320
            |> self.conv1 |> self.bn1 |> self.relu    # 16 -> 32
            |> self.pool                              # 320 -> 160
            |> self.conv2 |> self.bn2 |> self.relu    # 32 -> 64
            |> self.pool                              # 160 -> 80
            |> self.conv3 |> self.relu                # 64 -> 128
            |> self.pool                              # 80 -> 40
            |> self.conv4 |> self.relu                # 128 -> 256
            |> self.pool                              # 40 -> 20
            |> ~.view(-1, 256 * 20 * 20)              # flatten
            |> self.fc1 |> self.relu |> self.dropout  # fully conn 1 + dropout
            |> self.fc2 |> self.relu |> self.dropout  # fully conn 2 + dropout
            |> self.fc3 |> self.sigmoid               # output layer
        )    # fmt: skip

Not only is it representative of how the data flows through the network, but also it is extremely adaptive to change. For example, if I wanted to turn off batch normalization in some layers, and turn on in others, it would be trivial. Whereas with the previous version, I would have had to hop around parentheses, and make sure they are balanced. Not to mention, not having to allocate a single variable throughout the process.

Arguably, the first version is marginally easier to debug, but for simple networks, it is generally a set pattern, and you don’t really have to debug at batch-norm levels. And even if you had to, it is quite straightforward to split it up.

Although, this raises another question. How do formatters go about formatting? I guess it will be like any other binary operator. Also, as mentioned earlier in the thread, a way to store a pipeline would be very handy. One suggestion was pipeline = lambda arg: arg |> f1 |> f2 |> ...

But if ~ (tilde) is used as implicit partial/lambda, maybe it makes sense to consider expressions beginning with ~ |> ... as a pipeline for later use?

e.g.

# in constructor
def __init__(self):
    self.layer1 = (~ |> nn.Conv2d(3, 16, 3, 1) |> nn.BatchNorm2d(16) |> nn.ReLU())
    ...

# then in the forward pass
def forward(self, x):
    return x |> self.layer1 |> self.layer2 |> ...

NOTE: PyTorch itself provides several ways of composing modules and functions (e.g. nn.Sequential, torchvision.transforms.Compose, etc.).
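For comparison, the status-quo composition that the note refers to looks like this with nn.Sequential (runnable today):

from torch import nn

# One conv block expressed as a single composed module
block0 = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.MaxPool2d(2, 2),
)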


The example I have in my code base was a system that needed to process a list of images: chunk them into pairs, calculate the relationship of each pair by passing it to an algorithm, and return the final results to a function that uploads them along with the index of the pair (which can be used to figure out which images were part of it), all while using tqdm to show the processing status.

Doing all of this without generators requires a large amount of hand-written for loops, which hurts readability since it’s less explicit.

On top of this, generators allowed us to lazily evaluate the list of items, because we didn’t need to load or generate the images up front, which has a cost.

Unfortunately this presented two problems. First, it was easy to introduce bugs from the order of operations:

pairs = windowed(images, 2)
pairs = map(process, pairs)
pairs = tqdm(pairs)
all_pairs = list(pairs)   # consumes the iterator here...
pairs = enumerate(pairs)  # ...so this wraps an already-exhausted iterator

return all_pairs          # the enumeration is silently lost

We can of course put this on one line to prevent line-ordering bugs, but you now need to read the calls right to left and balance brackets:

all_pairs = list(enumerate(tqdm(map(process, windowed(images, 2)))))

So it’s difficult to maintain in another sense.

I would argue Python already has a construct that tries to solve similar problems: list comprehensions.

Before it was introduced, you similarly had to spell out the same logic across multiple lines:

items = []
for i in range(1, 10):
    if i % 2:
        items.append(i)

Most languages decided to replace this pattern with functional programming; Python decided that wasn’t a good fit for the language.

Instead, list comprehensions turned that into something more readable and more maintainable by building the list in one line:

a = [i for i in range(1, 10) if i % 2]

But it still had some serious downsides: it executed immediately, which made iterating through large sequences difficult compared to hand-rolled loops.

The syntax was foreign compared to the functional programming people were used to from other languages.

And the syntax was both too limited for complex sequence-generation logic, and could still be hard to read with nested iteration:

[i for w in items for j in w for i in j]

And so Python took another step towards functional programming when it introduced generators and lambdas (so that a predicate can be colocated with the statement it’s used for):

x = list(filter(lambda i: i % 2, range(1, 10)))

So now we finally have a new way to build sequences that’s flexible enough to unlock new use cases without polluting the code with loads of for loops and callbacks for lazy evaluation

But we still haven’t quite cracked readability, as the execution order is now reversed. This is what the funnel operator, or a pipe object in functools, could solve:

x = range(1,10) |> filter(lambda i: i % 2) |> list()

Think of this, if you will, as “method call comprehension”.

In many ways Python crossed the functional programming bridge a while ago, but in a way that lets teams adopt as much functional programming as they want.

This concept would be the same: you can easily write the same code without it, but with the downside of the reversed call order I mentioned above.

Just like you would have to drop down to for loops or list comprehension without the use of generators

Regarding readability: like all concepts in Python, you can find cases where this would be less readable, and of course there are multiple ways of doing the same thing in Python, with different levels of readability and different advantages and disadvantages.

In your example I don’t think anyone would use it like that; for i in enumerate(items) would be preferable, and we could restrict the use of this operator in bare for loops to promote readability, just like we do with list comprehensions.

Now, I’m not sure about all of the syntax proposals, but it feels like the ability to compose functions like this would be the final missing puzzle piece in a long history of Python slowly adopting more concepts to make it easy to work with streams.

For example, imagine this HTML DSL being used to construct a todo list of visible items:

with html.ul() as ul:
    [ul.append(html.li(item.text)) for item in items if item.visible]

Would become:

with html.ul() as ul:
    list_items = items |> filter(lambda item: item.visible) |> map(lambda item: html.li(item.text))
    [ul.append(item) for item in list_items]

Essentially we get a nice way to declare reusable predicates
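For comparison, the lazy part of that pipeline is already expressible today with a generator expression (assuming the same hypothetical html DSL as above):

with html.ul() as ul:
    # Lazily yields an <li> per visible item
    visible_items = (html.li(item.text) for item in items if item.visible)
    for li in visible_items:
        ul.append(li)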


The things that I don’t like about this version are:

  1. The weird X symbol which has appeared from nowhere. Presumably it was imported somewhere at the top of the file, but why doesn’t it have a meaningful name?
  2. The @P symbol. map@P is presumably a matrix multiplication of the map function and another badly-named P object. What does matrix multiplication have to do with this?

To be a realistic example, I’d want to see the actual names proposed for P and X. If those are the proposed names, the proposal is an immediate non-starter as “stealing” such commonly used names isn’t realistic.

Looking back at the original post that led to this proposal, I can’t think of a reasonable name for P - it’s essentially a way to “cheat” the operator overloading machinery to make @P look like a special construct. I don’t like that approach - apart from being error-prone, it’s a misuse of the language. At best, it’s a “cute” trick that might be used in a 3rd party library, but IMO it has no place in the standard library or as an example of idiomatic Python code.

For object-oriented programming, it’s best to use a class:

from more_itertools import windowed
from tqdm import tqdm

class ImagePipeline:
    def __init__(self, data):
        self.data = data

    def window(self, size):
        self.data = windowed(self.data, size)
        return self

    def map(self, func):
        self.data = map(func, self.data)
        return self

    def progress(self, desc=None):
        self.data = tqdm(self.data, desc=desc)
        return self

    def to_list(self):
        self.data = list(self.data)
        return self

    def enumerate(self):
        self.data = enumerate(self.data)
        return self

    def get(self):
        return self.data

pipeline = (
    ImagePipeline(images)
    .window(2)
    .map(process)
    .progress("Processing pairs")
    .to_list()
    .enumerate()
    .get()
)

for idx, pair in pipeline:
    ...

The same applies to the PyTorch example:

model = (
    LayerChain()
    .conv(3, 16, 3, padding=1)
    .bn(16)
    .relu()
    .pool(2)
    .conv(16, 32, 3, padding=1)
    .relu()
    .pool(2)
)
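For what it’s worth, a rough sketch of what such a LayerChain builder could look like on top of nn.Sequential (the class and its method names are illustrative, not an existing API):

from torch import nn

class LayerChain:
    """Hypothetical fluent builder that accumulates layers."""
    def __init__(self):
        self.layers = []
    def conv(self, c_in, c_out, k, padding=0):
        self.layers.append(nn.Conv2d(c_in, c_out, k, padding=padding))
        return self
    def bn(self, channels):
        self.layers.append(nn.BatchNorm2d(channels))
        return self
    def relu(self):
        self.layers.append(nn.ReLU())
        return self
    def pool(self, size):
        self.layers.append(nn.MaxPool2d(size))
        return self
    def build(self):
        # Collapse the accumulated layers into one composed module
        return nn.Sequential(*self.layers)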

I think the part I’m most hesitant on with this is that people could already write in this style with only a very little bit of reusable boilerplate:

from functools import reduce, partial

def process(val, *funcs):
    return reduce(lambda x, f: f(x), funcs, val)

def app_filter(predicate):
    return partial(filter, predicate)

def app_map(op):
    return partial(map, op)

# This is now ordered starting value, operations in the order they happen
process(range(1, 10), app_filter(lambda i: i % 2), list)

Not to mention that many of the examples that don’t involve the unused app_map above are things people could just write more idiomatically:

[*range(1, 10, 2)]

I would need examples that show this actually improves over what’s possible and that people would actually write things that way to go from “this has potential” to “this is worth adding syntax for”


So the problem is that to pull off this pattern you need to wrap the function you want to call, so that you are effectively passing it the arguments you want and getting back a partial that will receive the data from the last step in the pipeline.

So far we’ve seen a few ways of pulling this off.

  1. Make one massive class containing every function you would ever want to chain, which knows how to chain them; but now you have a kind of god object in your application:

pipe = (
    GodClass(images)
    .window(2)
    .map(process)
    .progress("Processing pairs")
    .to_list()
    .enumerate()
    .get()
)

  2. Your example of a function that accepts a list of functions to call, the functions themselves being partials you’ve had to wrap manually or with a decorator:

from functools import reduce, partial

# (process, app_filter and app_map as defined in the quoted post above)

# This is now ordered: starting value, then operations in the order they happen
pipe = process(
    images,
    partial(window, 2),
    partial(map, calculate),
    partial(tqdm),
    partial(list),
)

  3. A generic class that can handle doing this via operators (see the pipes sample above) or a chain method, accepting partial functions:

items = (
    pipe(images)
    | f(window, 2)
    | f(map, calculate)
    | f(tqdm)
    | f(list)
)

Now, I am open-minded; it seems that this pattern is popular enough to at least warrant a pipe-like object being included in functools. It happens frequently enough that I don’t think it’s reasonable to expect users to hand-roll it, nor is it overly complex or niche.

That said, an operator would provide advantages, especially if, like me, you consider partial to be so frequently used in codebases that it might as well be as much a part of the language as the map or filter builtins are (rather than living in functools).

Firstly, I would argue the examples above either violate good software architecture or obscure what is going on: in the class examples it looks like you are calling a random function and doing a bitwise operation, when the meaning in this context is different.

Or you need to mentally reconstruct what the function call will be based on all the partials you are passing to the process function

And this is before you get to the verbosity of the samples in the thread regarding using placeholders to handle partials for functions where the argument that consumes your item from the pipeline is in a different place.

items = images |> windowed(2) |> map(process) |> list()

To me this is more explicit, and therefore more Pythonic.

You know that the data, as with any other operator, is flowing left to right, and you can see straight away how the steps of the pipeline are configured by looking at the arguments at the call sites. No need to wrap everything, or to make it less explicit by wrapping everything in partials.

With the suggestions people have made above, it seems like we could even add the ability to declare placeholders using this syntax in a fairly easy-to-read way too.

This reminds me of the match statement: there were lots of other ways to do matching in Python using existing constructs, but they usually involved the same tricks as what you are proposing now, which resulted in less explicit code.



I don’t think it’s productive to try building a pipeline workflow using the existing Python syntax and data model because:

  1. If you can build it with Python code now, it means it can be packaged as a third-party library, of which there are already literally dozens on PyPI and as recipes.
  2. The result of a pipeline should be conceptually on the right, but it is impossible to perform assignment to a right operand with the current Python grammar, so all the existing solutions make the compromise of assigning the result to the LHS of a pipeline.

And all the talk about using a placeholder to indicate where a piped object should be passed in is not very helpful IMHO because:

  1. It makes the syntax look more cluttered and complex and therefore less likely to be accepted by most of those who believe that being Pythonic means “simple is better than complex”.
  2. Having to search for the placeholder in all the arguments in every call adds significant performance overhead.
  3. The vast majority of functions used in a pipeline take the incoming object as either the first or the second argument, so I don’t think there is any need for a generic placeholder just to allow arbitrary positioning for very rare use cases, which will be possible with functools.partial anyway (see the sketch after this list).
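A minimal sketch of that partial-based routing (scale is a made-up function just for illustration):

from functools import partial

# Hypothetical function whose "data" parameter is not first
def scale(factor, values):
    return [factor * v for v in values]

# partial fixes the leading argument, so a piped iterable
# would arrive as the last positional argument
by_three = partial(scale, 3)
assert by_three([1, 2]) == [3, 6]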

For this proposal to provide value over the status quo of third-party libraries I think it needs to have a dedicated syntax like match-case in which things have semantics distinctly different from the regular grammar. It’ll require more work, but it’s the only path to making the proposal actually meaningful.

Here’s my revised proposal:

  1. A dedicated pipe statement. Making the pipeline an expression would mix the pipeline with conventional operators and easily confuse the readers.
  2. Like match-case, where Foo(x=1) in a case clause is a specification of a pattern rather than a call, a callable target in a pipe is also a specification of a call rather than an actual call, so:
    • To avoid ambiguity, arbitrary expression is not allowed as a pipe target. Rather, only a bare name, a dotted name, or one of them followed by a pair of parentheses enclosing an argument list is allowed. Names without parentheses are treated as if they are followed by ().
    • A piped object is always passed as the last positional argument of a call. If there are more arguments after the piped object, use keyword arguments instead. Use a wrapper or functools.partial for the very rare use cases where keyword arguments are not possible.
    • Arbitrary expressions are allowed inside the outermost parentheses.
    • Use | as the pipe operator as opposed to the more exotic |> both because it looks cleaner and because it can be unambiguously supported when we have a dedicated pipe statement.
  3. Use the as clause for assignment, allowed after every call specification.

For example:

pipe 'abcde' | itertools.batched(n=2) | map(''.join) | list as pairs | print

will be equivalent to:

print(pairs := list(map(''.join, itertools.batched('abcde', n=2))))

I think this syntax is clean and intuitive enough to be understood by even first-time readers.

The advantage over the status quo in terms of cleanliness and expressiveness may then be apparent enough for the proposal to be possibly justified.

You’re right, a couple of hours after I posted this I realized the more proper example is

from itertools import islice
from pathlib import Path

print(list(islice(Path(".").parent.parent.joinpath("tests/data").glob("*"), None, 5)))

Perhaps there is a better way, but this currently feels cumbersome enough that I resort to printing the path and using ls in the terminal.
Or playing around with joining paths and using “.exists()” repeatedly to find out where I am.

Being able to write

print(Path(".").parent.parent.joinpath("tests/data").glob("*") |> islice(None, 5) |> list())

might switch the order of convenience.

Maybe I should have waited with the post until today.

Those were defined in: Introduce funnel operator i,e '|>' to allow for generator pipelines - #99 by dg-pb

X is a method-caller utility defined in there, while P is a partial wrapper, also defined in there.

The names I took are arbitrary - I just picked a couple of 1-letter shorthands. The operator is also arbitrary - any with convenient precedence can be used.

So I agree with a lot of your comments. I think, though, that “|” in match makes more sense than it does for pipes, because there it retains the semantic meaning of “this” or “that”. With pipes it would be about saying that data flows from left to right, and I’m not sure any existing operator has that meaning. Closest would be “+” or “-”.

I think adding things like “as” needlessly complicates it in another way. Suppose we looked for a more Pythonic grammar that didn’t involve adding another operator, and simplified the feature to avoid complex grammar around placeholders, at least for an initial version.

Your pipe keyword makes sense, but I don’t think it suits Python’s existing grammar in a way that wouldn’t make it harder to read. Perhaps the Pythonic grammar for this, one which makes it clear that what is being defined is a deferred pipeline of calls, is to introduce something like a “pipeline comprehension”.

If list comprehension is this
image_paths = [image.path for image in images]

And dictionary comprehension is this

paths_and_images = {image.path:image for image in images}

Then perhaps pipeline comprehension AKA tunneling would be this:

existing_images = <images in map(lambda x: x.path) in filter(lambda x: os.path.exists(x))>

What is this grammar saying? It’s saying: take images, pass it “IN” to the map function, then pass that “IN” to the filter function.

The advantage is no new tokens at least.

I think there are 2 main paths:

  1. monolith statement
  2. modular design

To compare potential final results:

pipe 'abcde':
    | ~.upper()
    | itertools.batched(n=2)
    | map("".join)
    | list as pairs
    | max(*~)
    | print

Versus:

pairs = (
    'abcde'
    |> ~.upper()
    |> itertools.batched(~, n=2)
    |> map("".join, ~)
    |> list
    |> max(*~)
)
print(pairs)

For the match statement there is a clear case for being monolithic - case constructs are not meaningful outside of it - while here all components are meaningful on their own, thus I am not so certain which path is better.

Pros of monolithic design:

  1. Likely much easier to reach optimal performance
  2. Mid-pipeline assignment is possible
  3. Syntax would likely be a bit more succinct, as it is easier to work within an enclosed environment in this respect.

Pros of modular approach:

  1. The majority (if not all) of the components are meaningful/useful on their own outside the statement.
  2. Does not have to be implemented in one go.
  3. Syntax would most likely be more in line with the existing one, as it is naturally constrained by it.
  4. The learning curve would be more natural (e.g. the learning curve of match is pretty bad - you need to learn most of the thing in one go, and that knowledge is hardly transferable anywhere else)

e.g. a |> binary operator implementation on its own, combined with a couple of convenience utilities, already allows for something quite acceptable:

class X:
    """<methodcaller utility>"""
    # Pick any name you want

class P:
    """<partial utility>"""
    # Pick any name and operator you like
    # or just use partial as it is. See below:

pairs = (
    'abcde'
    |> X.upper()
    |> partial(itertools.batched, n=2)
    |> map@P("".join)
    |> list
    |> max>>P(*_)
)
print(pairs)
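For reference, a minimal runnable sketch of such a method-caller utility (the name X and the metaclass approach are just one possibility):

class _MethodCallerMeta(type):
    def __getattr__(cls, name):
        # X.upper(*args) returns a callable that invokes .upper(*args)
        # on whatever object is later passed to it
        def bind(*args, **kwds):
            def caller(obj):
                return getattr(obj, name)(*args, **kwds)
            return caller
        return bind

class X(metaclass=_MethodCallerMeta):
    """X.upper() -> a callable that applies .upper() to its argument."""

assert X.upper()("abcde") == "ABCDE"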

Also, for the modular design the path is much clearer, while the monolithic design leads to exploration of endless possibilities (not sure if this is a pro or a con :person_shrugging:).

(let me know if you see other pros/cons - I will update the list to keep it fair)

After a second look, unfortunately this does not seem possible: func(**{'': 1}) is a valid call to def func(**kwds): ..., and such usage would clash with it. So without a dedicated syntactic convenience, only partial(func, *_) is achievable.

@dg-pb would something in-between like a “method pipeline” work as another option to be considered?

existing_images = <images in map(lambda x: x.path) in filter(lambda x: os.exists(x))>

:person_shrugging:

Angle brackets would be hard to pull off. There are countless proposals that never happened with both { and <. There were various arguments against each of these. Many of them were about ambiguities and potential clashes with existing syntax. E.g. see: Syntax for Generator iterables using angle brackets - #7 by chepner

Exactly. If this is a proposal for a change to the language/stdlib, naming is important. Demonstrating your proposal using arbitrary names that frankly stand no chance of being accepted into the stdlib is going to misrepresent your proposal. You should choose realistic names and show what code would look like using them, instead.

If I’ve misunderstood, and you’re proposing that people write their own versions of P and X, so we don’t need a stdlib change, then I apologise. Although I’m surprised if that is what you’re proposing, given that this discussion seems to be based very heavily on the idea[1] that existing “roll your own” solutions aren’t enough.


  1. which I have some sympathy for ↩︎


Yes, the path that I was talking about is a gradual easing into this. I am not suggesting naming anything as P or X in stdlib - that would indeed be inappropriate.

The first step would be to implement a simple binary operator |> and allow people to define their own utilities. Yes, they could name one P, but P possibly makes use of partial, and a subsequent partial extension would allow people to write such a utility in a few lines of code and name it whatever they want.

So this stage (the |> implementation) would eliminate the need to wrap the initial object in some Pipe class. It also eliminates the need for final unwrapping. E.g.:

Pipe([1, 2]) | sum | unwrap

# Versus:

[1, 2] |> sum
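To make the comparison concrete, here is a rough sketch of the kind of wrapper class being alluded to (Pipe and unwrap are illustrative names, not an existing library):

class Pipe:
    def __init__(self, value):
        self.value = value
    def __or__(self, func):
        # unwrap terminates the chain and returns the bare value
        if func is unwrap:
            return self.value
        return Pipe(func(self.value))

def unwrap(x):
    return x

assert (Pipe([1, 2]) | sum | unwrap) == 3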

Once it is there, there is time to experiment and figure out what other parts can be made more convenient, maybe some sort of syntactic convenience for partial, so that:

partial(func, _, 1)
func@P(_, 1)    # Or user-made this

Can become:

func(~, 1)

So the potential final convenience would be similar, but the approach is more gradual - figuring out the next most beneficial part to be made more convenient, as opposed to doing the whole thing in one go, while at the same time considering each part in a wider context so that it is useful everywhere and not only in piping.

So I just think that |> is the bit which would provide the most convenience at a reasonable cost. The reason is that, as can be seen above, it eliminates more of the currently necessary boilerplate than a syntactic convenience for partial would, and at the same time its implementation is much more straightforward (just copy-paste binary operator logic) compared to what would be needed for partial.

I don’t deny other conveniences, but if this approach is taken each bit can be addressed separately and sequentially.

Also, while |> is being implemented, partial could be extended at the same time orthogonally, such that it supports:

1 |> partial(lambda x: x, x=_)
[1, 2] |> partial(sum, *_)
{'a': 1, 'b': 2} |> partial(lambda a, b: a + b, DblStarPlaceholder)

So that it covers all necessary transforms for piping.

That way the user has a fully functional partial to use in conjunction with it. And for the time being (until a better convenience is produced) they can make it a bit more convenient, if they wish, with a few lines of code:

from functools import partial

class P:
    def __init__(self, *args, **kwds):
        self.args = args
        self.kwds = kwds
    def __call__(self, func):
        return partial(func, *self.args, **self.kwds)
    # support the func@P(...), func|P(...) and func>>P(...) spellings
    __rmatmul__ = __ror__ = __rrshift__ = __call__

Which is quite a reasonable temporary solution on the user side (if using partial in a direct manner is not enough), giving plenty of time to figure out what sort of syntactic conveniences for partial or other bits could be made, and whether they are necessary.
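A quick usage check of that wrapper, runnable today (scale is a made-up example function):

def scale(factor, value):
    return factor * value

# func @ P(args) builds partial(func, args) via __rmatmul__
double = scale @ P(2)
assert double(5) == 10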

Ah yes, that’s a pain. The only way I can think of doing this is to use “_” as a prefix and suffix for the comprehension, to indicate a “generator list comprehension”:

active_names = _[ images in filter(lambda x: x.active) in map(lambda x: x.name) ]_

In this case “_[” isn’t valid in Python today and would be unambiguous. But it certainly feels like an operator is a better and more powerful way to go.


_[ is actually valid Python, as in _=(1,2); _[0]. The parser needs to look ahead all the way to the closing ]_ bracket to make a decision.