Introduce funnel operator, i.e. '|>', to allow for generator pipelines

So I’ve been letting the comments in this thread stew in my brain a little, and I’ve come to the conclusion that this discussion in its current form will never be resolved or implemented.

Here is why.

Reason One: The most useful form of pipes can already be done in Python

The most useful features of being able to declare a pipe (with or without an operator) are being able to see at a glance the order of operations, how they transform the data, how you can modify them, and potentially how that work is scheduled across cores.

As @pf_moore has mentioned, you can write a simple pipe function that handles this:

result = pipe(data, step_1, step_2)
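For reference, a minimal sketch of such a pipe helper (assuming every step takes the piped data as its only argument) is just a left-to-right fold over the steps:

def pipe(data, *steps):
    # Apply each step to the result of the previous one, left to right.
    for step in steps:
        data = step(data)
    return data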

I think the reason most people opt not to do this is the time and effort it takes to build, and more importantly maintain, such a construct in your own code while keeping it compatible with libraries and functions that were written without any such feature existing in Python.

The only thing I can see a “|>” unlocking is the ability to mix expressions with the list of steps in the pipeline:

result = x |> [item for item in _] |> step

But I don’t think this is actually a net benefit.

  1. It allows these statements to become hard to read again, which defeats the purpose of this proposal.
  2. One of the benefits of pipelines, as I see it, is the ability to declare how the work gets scheduled (similar to Rx). In some data pipelines you need to be able to tune performance, for example how many cores get dedicated to a certain task.

You can either push this logic into the code where you are running your pipeline, which makes that code hard to grok, or you can push it into the functions themselves, which can make resources harder to manage because you might miss that one function spins up a bunch of threads with no way to configure it.

Allowing immediately executed expressions like this would prevent patterns that separate declaring what work should be done from declaring how and when it should be done (similar to languages like Halide, where the pipeline and the scheduling are split):

results = pipe(data, step_1, step_2, scheduler=concurrent.futures.ThreadPoolExecutor(max_workers=4))
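To make that split concrete, the pipe sketch from earlier could grow a scheduling hook while the step list stays purely declarative (the scheduler keyword here is my own illustration, not an existing or proposed API):

def pipe(data, *steps, scheduler=None):
    # The steps remain a flat, declarative list; this helper alone decides
    # whether each one runs inline or on the supplied executor.
    for step in steps:
        if scheduler is None:
            data = step(data)
        else:
            data = scheduler.submit(step, data).result()
    return data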

Therefore we should just focus on getting a pipe-like construct into functools; there is already a discussion and proposed implementation happening for this.

Reason Two: Most of what makes pipes cumbersome to use isn’t the lack of syntax to define them but the lack of syntax to define placeholders or functions that can be used in a pipe

Even with Reason 1 basically solved, pipes still aren’t as ergonomic as I would like. One problem is that there is no agreement on where the parameter that accepts the data should live. This makes sense, since Python wasn’t built for functional programming, so no convention has emerged beyond a few built-in functions like map and filter taking the data as the last argument.

This means you need to use a combination of partials and placeholders, which can make pipelines just cumbersome enough to write that I wonder whether the people who would benefit from them would actually use them, and it also hurts readability:

result = pipe(data, functools.partial(step_1, functools.Placeholder, 2), step_2)

You can of course wrap the function call in a lambda for the same effect, but a lambda strips away any ability for the pipe function to figure out what function the user is actually trying to call (unlike a partial). That means that if we one day wanted to let a library author detect when two of their functions are being called as consecutive steps of a pipeline, this (and other such use cases) would simply not be possible without providing a pipe-aware interface such as “function” and “function_pipeable” (or a decorator that lets them provide a version the pipe machinery can switch to).

result = pipe(data, lambda i: cv2.boostContrast(i), cv2.saturation)  # cv2 can't detect that the two commands are adjacent and configure them to happen in place
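To illustrate that point (this is my own example, not anything cv2 or functools actually does for you), a partial keeps the wrapped function inspectable while a lambda is opaque to any pipe implementation that wants to examine its steps:

import functools

def step_1(data, amount):
    return data * amount

step_a = functools.partial(step_1, amount=2)   # a pipe() can still see step_1 via step_a.func
step_b = lambda data: step_1(data, 2)          # a pipe() only sees an anonymous callable

print(step_a.func, step_a.keywords)            # -> <function step_1 ...> {'amount': 2}
# step_b exposes no equivalent, so a pipe-aware library cannot tell what it wraps.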

Therefore I think the only way to solve this particular speed bump is to start a new discussion to figure out how we could easily declare such partials. I know this discussion has happened in Python many times, and it feels like a large part of the discussion above essentially descended into a full-blown debate around what was, in all but name, a new way of declaring partials.

Next Steps:

Therefore I want to suggest some next steps:

  • For Reason 1: let’s try to land a pipe implementation in functools. Let’s contribute to the proposal already in the works: `functools.pipe` - Function Composition Utility - #50 by mikeshardmind
  • For Reason 2: continue in a new discussion around a simple way to declare partials. There are many good ideas by @sadaszewski, and I quite like the idea of simply placing an @ prefix on a function call to make it a partial: @my_func(1, 2, _) (a rough sketch of how this could be approximated today follows below). Being able to have literals for declaring partials like this is what unlocked the ability to construct pipelines in languages like Swift, which also struggles with inconsistent interfaces, something more functional languages like JavaScript don’t have (mainly because in JS you can just call “.apply” on any function).
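Purely as an illustration of the kind of thing a partial literal would replace (the helper and names below are my own sketch, not an existing or proposed API), the placeholder idea can be approximated today with a small wrapper:

_ = object()  # placeholder sentinel, mirroring the `_` in the proposal above

def defer(func, *args, **kwargs):
    # Return a one-argument callable that slots the piped value into
    # whichever positional slot holds the placeholder.
    def call(value):
        filled = [value if arg is _ else arg for arg in args]
        return func(*filled, **kwargs)
    return call

# Usage in a pipeline: result = pipe(data, defer(my_func, 1, 2, _), step_2)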