Introduce funnel operator, i.e. '|>', to allow for generator pipelines

This could be possible with some further engineering:

(
  (_1 := number_list) |>
  (_2 := abs_func()) |>
  (_3 := [ x for x in _ if x > 5 ]) |>
  (_4 := await count_func()) |>
  (_5 := show_result())
)

Although not trivial to achieve, this would retain the functionality you describe 1:1 and still afford much cleaner and more robust syntax. Would this truly be of interest and worth pursuing?
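For comparison, something close to this intermediate-naming idea can already be approximated in today's Python with a parenthesized tuple of walrus assignments - a rough sketch only, assuming each stage takes the previous result explicitly and setting aside the await step, which would additionally need an async context:

(
    (_1 := number_list),
    (_2 := abs_func(_1)),
    (_3 := [x for x in _2 if x > 5]),
    (_4 := count_func(_3)),
    (_5 := show_result(_4)),
)

Each _n is then an ordinary local name that a debugger can inspect; the cost is writing the plumbing out by hand.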

Do you know any tooling for any language that can do this though? Certainly in Java land (where I get most of my exposure to and dislike of functional programming), even the uber fancy >1.5GB IDEs like IntelliJ (JetBrains’s Java equivalent of PyCharm) which can debug inside Vagrant VMs or across SSH connections still can’t do this one fundamental thing that you’re taking for granted.

How would you even specify where the breakpoint should be to a debugger? It’s one statement so even if you write it across multiple lines, those stages in the pipeline still all have the same line number.

Pdb can already step through the lines of the pipeline expression with next, as far as I can tell. I will test some more tomorrow. The missing part is actually printing the value of the evaluation of the “line” (which I would define as the last expression evaluated on that line).

By line.

This is most definitely not true. The lambda calls I generate map to separate lines if the pipeline stages are written on separate lines - guaranteed.

Python debuggers could go bytecode by bytecode, and some of them support it. That is probably too fine-grained for common usage, but I am pretty sure some also support stopping at every call (and if not, writing one on top of Python's stdlib debugger shouldn't be difficult at all).
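For what it's worth, here is a minimal sketch of the "stop at every call" idea on top of sys.settrace; a real tool would build on bdb/pdb and pause instead of printing, and the stage functions (square_all and friends) are invented for the example:

import sys

def square_all(xs):
    return [x ** 2 for x in xs]

def stringify(xs):
    return [str(x) for x in xs]

def join_commas(xs):
    return ", ".join(xs)

def call_tracer(frame, event, arg):
    # 'call' fires when a Python-level frame is entered; C builtins would
    # need sys.setprofile and its 'c_call' events instead.
    if event == "call":
        print(f"entering {frame.f_code.co_name} at line {frame.f_lineno}")
    return call_tracer

sys.settrace(call_tracer)
try:
    print(join_commas(stringify(square_all([1, 2, 3]))))   # prints 1, 4, 9
finally:
    sys.settrace(None)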

I’ve updated the implementation and finished the first draft of the PEP. I would like @pf_moore , @tzengshinfu , @sayandipdutta , @dg-pb , @jamsamcam , @elis.byberi , @jsbueno , @blhsing , @petercordia , @mikeshardmind , peterc , bwoodsend , syntaxfiend , MegaIng , psarka , GalaxySnail , nemocpp and others who contributed to the discussion since my original post or before to review the PEP and let me know or submit pull requests, as well as let me know if you are fine to be on the list of Authors. @pf_moore could you be our Sponsor?

This syntax:

(
        (_1 := image) |>
        (_2 := hist_eq()) |>
        (_3 := threshold(t=128)) |>
        (_4 := dilate((3, 3))) |>
        (_5 := erode((3, 3))) |>
        (_6 := connected_components())
)

now works as expected.

Pyodide deployment: here

PEP: here

One thing we can mention in the PEP is that it's not just about readability:

it would unlock optimizations that are not currently possible without the developer explicitly opting in via a special API.

Libraries could detect use of this operator and do things in place that currently require the library to expose a bespoke API.
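Purely as a hypothetical illustration of that kind of opt-in hook (the __pipeline_stage__ name and the whole protocol below are invented for this sketch; nothing like it is part of the proposal today), a library object could collect the stages it receives and decide itself how to execute them, much as lazy dataframe APIs batch operations today:

class LazyTable:
    # Toy object that records pipeline stages instead of running them eagerly.
    def __init__(self, source):
        self.source = source
        self.stages = []

    # Hypothetical hook: if |> were specified to prefer a dunder like this
    # when the left operand defines it, a library could fuse stages, run
    # them in place, or push them down to a query engine.
    def __pipeline_stage__(self, stage):
        self.stages.append(stage)
        return self

    def collect(self):
        value = self.source
        for stage in self.stages:
            value = stage(value)
        return value

With such a hook, a pipeline over a LazyTable would merely record its stages, and collect() (or the library's own optimizer) would decide how to run them.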

James


  1. Please don’t mass ping individuals. Anyone who is still interested in this topic will see your post anyway, pings like this just spam people.
  2. You need to own the PEP yourself, you should not expect everyone who contributed to be part of the author list.
  3. Please don’t include me as an author.
  4. Sorry, but I’m not willing to sponsor this PEP. It has some merit, but I’m not sufficiently convinced that this particular variation of the various “pipeline” proposals is the right answer to be willing to sponsor a PEP.
2 Likes

Hi Paul. Many thanks for your feedback, it's noted. Regarding 4): how would you suggest I proceed? Would you be willing to elaborate on the gaps or particularities of the proposal that put you off? If they were addressed, do you think you could be convinced? Are there ways to submit such a draft PEP for consideration for sponsorship to other Python team members?

As far as I know this is the only pipeline proposal with a draft PEP. Am I missing alternatives at a similar stage of maturity? I think the discussions around the concept bring a lot, but they do not meet the bar of being actual proposals. What I am proposing is immediately useful (i.e. without auto-partials), aligned with the syntax in other languages, and more powerful than most (all?) implementations I know of (i.e. it allows arbitrary expressions on the RHS, not just calls).

I think it's a great proposal. With that said, it might have some shortcomings that I am missing. Even if this PEP ends up not being successful, it would be great to capture this feedback in a formal fashion. Maybe contributing your reservations/critique to the Open Issues section of the PEP would be a viable course of action? Thank you for your consideration.

I like this idea. I would love to come up with the API and implementation before putting it in the proposal. Any idea what it could look like?

Well, you need a sponsor for a PEP, so “find a sponsor” has to be part of the answer. But I also think the proposal needs to be developed further before it’s ready to be a PEP.

See below.

(To be clear, what makes me unwilling to be a sponsor is more that you seem unwilling to address the gaps that other people have mentioned, rather than me having any specific technical issues.)

To be the sponsor? Not with the proposal in the state it’s currently in. I don’t have the time available to commit to working with you to get the PEP to a submittable state, which IMO is the key job of a sponsor. I’ll offer help here when I have the time, but that’s all I can commit to.

If you ever reach a point where I think the PEP is good enough to simply submit, and I’ve been convinced that the feature is worth having, and at that point you still don’t have a sponsor, then I may reconsider - but I’m not promising. You have a lot of work still to do before it’s at that point.

Posting here is the correct route. There are enough core developers who read this category that if your proposal is good, someone should offer to sponsor.

Not really - the key point here is the “stage of maturity”. You need to refine your proposal first, and only when it’s ready should you worry about getting a sponsor.

Agreed. But what they do demonstrate is that there are a lot of open questions around the final form of any “pipeline” solution. At the moment, your PEP hasn’t addressed any of those proposals, beyond stating that they are not included. You haven’t demonstrated that they aren’t needed, and you haven’t (as far as I can see) persuaded any of the people who brought up those points that your proposal is sufficient on its own. So to someone like me who has followed the discussion, the PEP reads like it’s ignoring many of the points raised here. And to someone who hasn’t read the discussion here, the PEP misrepresents the community view.

Also, if a PEP leaves too many things open “for future improvement”, it’s likely to get rejected because it isn’t complete on its own. A PEP needs to stand alone as a useful feature, not just act as a starting point (except in very special cases, and frankly this is nowhere near significant enough to qualify).

So convince people. Why don’t others in this thread agree with you? You can’t go to the steering council with something that you think is great, but no-one else agrees. You need to show community consensus that this is the right solution.

The process of creating a successful PEP takes a lot of time, and much of that time is made up of consensus building. I’ve seen a lot of proposals (and I’m PEP delegate for packaging PEPs, so I’ve made the decision on a fair number of them) and for me, the single most important factor in a proposal being successful is whether the community is unified behind it. And this discussion still feels like it’s a long way from that sort of unified sense that people think “yes, this is what data pipelines in Python should look like”.

It’s not just about shortcomings. It’s also about demonstrating that you’ve addressed feedback. I had a paragraph here pointing out that you’d not addressed comments that the proposal didn’t handle method calls, or function calls where the piped data wasn’t the last argument. It took me a couple of readings of the PEP to realise that you now did handle that, by means of the special behaviour of _. You should be making that clearer in the PEP, with examples of usage and a discussion of the design choices you made - and getting feedback from the people who had those concerns that your new proposal addresses the problem.

I’m not sure the debuggability question has been addressed. I don’t use debuggers enough to know, but my understanding is that you can’t set a breakpoint on a line within an expression - breakpoints are set on statements. So I don’t think your response of stepping into the calls addresses that. In a complex pipeline, with multiple expensive steps, manually stepping through each one may not be enough. Also, what about things like tracing, or event monitoring? Maybe there’s no technical issue with these, but by encouraging a style of coding where significant chunks of processing are done in a huge pipeline expression, rather than multiple statements, are we going to harm the usefulness of breakpoints, trace hooks, audit events and the like?

Honestly, it’s your job as the PEP author to capture and accurately represent community feedback. I don’t feel it’s up to me to try to do that for you - most of the points I raised above aren’t even my concerns. I’m just summarising feedback others have given.

7 Likes

That is a nice workaround - but keep in mind that the stages designated by the funnel or pipe operator have to be "collected" in the expression and then called by whatever is driving the operator. In library code, without a dedicated operator, we use a Pipe() instance, for example, which implements __or__ - and the same code which implements the actual call of each stage could carry some debugging instrumentation, so that intermediate results are kept in some internal attribute.

Of course, this is no job for something in the stdlib or the language itself - not for now - but a lib implementing a universal Pipe could do that.
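For concreteness, a minimal sketch of such a library-level Pipe with that debugging instrumentation might look like this (class and attribute names are made up for the example):

class Pipe:
    # Chain callables with | and remember every intermediate result.
    def __init__(self, value, history=None):
        self.value = value
        self.history = (history or []) + [value]

    def __or__(self, stage):
        return Pipe(stage(self.value), self.history)

result = Pipe([1, 2, 3]) | (lambda xs: [x ** 2 for x in xs]) | sum
print(result.value)    # 14
print(result.history)  # [[1, 2, 3], [1, 4, 9], 14]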

As for your post, I am not sure that in the end this is any "cleaner" or "more robust" syntax than the straightforward plain Python assignments in the grandparent. Of course that is subjective, but it is definitely not cleaner for me.

A quick follow-up on this, to offer a related perspective:

I’ve been barely following this discussion. I think the core idea of pipelines has merit but I’m scared off by a lot of the questions about debuggability, etc. Following an active discussion, even silently, can take a lot of time – and when the idea doesn’t seem to be advancing, I know my own level of interest in the thread tends to go down over time.

Plus, even if I read something four months ago, I may not have retained it.
Write for a reader who has not read this thread at all; that’s the safe assumption.

6 Likes

This could be any programming language, but it’s not Python:

(
    pd.read_csv("my_file") |>
    _.query("A > B").filter(items=["A"])
     .to_numpy().flatten().tolist() |>
    map(lambda x: x + 2) |>
    list() |>
    np.array() |>
    _.prod()
)
1 Like

Hi Stanislaw

I admire your passion for the pipeline operator. Unfortunately, there’s something I don’t quite understand


# The example from the PEP
[1, 2, 3] |> [ x ** 2 for x in _ ] |> map(str) |> ", ".join() |> print()

# could be written as:
[1, 2, 3] |>
[ x ** 2 for x in _ ] |>
map(str) |>
", ".join() |>
print()

# since explicit is better than implicit
# and "continuation" operators typically go at the beginning of the line,
# it might be written as:
[1, 2, 3]
|> [ x ** 2 for x in _ ]
|> map(str, _)              # BTW: mind the position!
|> ", ".join(_)
|> print(_)        

# and this *can* be written as:
_= [1, 2, 3]
_= [ x ** 2 for x in _ ]
_= map(str, _)
_= ", ".join(_)
_= print(_)
# which shows 1, 4, 9 here

So essentially you are proposing a new operator |> which can be seen as a replacement for the _= (or, with the usual spacing, _ =) assignments above? Is this really superior?

It could be that I am missing the key aspect of your proposal! Please help me.


4 Likes

I am excited about this PEP, but for me too debuggability is an important topic, and, as tzengshinfu rightly mentions, I am afraid people may adopt the similar but debuggable alternative he shows, which has worse readability - to me that would be a regression in Python's readability, even if it is more or less the programmer's fault.

Though I do not know whether the walrus version of the syntax is debuggable and can take breakpoints.

Sorry but why not?

If I understand your point correctly (that the code you provided is hard to read), this is pandas' syntax, not Python's. Also, I believe your example is deceptively messy and can be rewritten more clearly*, such as:

(
    pd.read_csv("my_file")
    |> _.query("A > B")
    |> _.filter(items=["A"])
    |> _.to_numpy()
    |> _.flatten()
    |> _.tolist()
    |> map(lambda x: x + 2)
    |> list()
    |> np.array()
    |> _.prod()
)

Or if you wish not to add more funnel operators:

(
    pd.read_csv("my_file")
    |> _.query("A > B")
        .filter(items=["A"])
        .to_numpy()
        .flatten()
        .tolist()
    |> map(lambda x: x + 2)
    |> list()
    |> np.array()
    |> _.prod()
)

Tbf, I've done transformations that could be even longer, especially when adding polars and sklearn to the mix.

*Note 1: I am not talking about the partial lambdas or the use of _.
Note 2: Also, off topic, your code could be simplified so it doesn't convert from NumPy to a list and back to NumPy:

(
    pd.read_csv("my_file")
    |> _.query("A > B")
    |> _.filter(items=["A"]) + 2
    |> _.prod()
    |> _.prod()
)

#  |> _.prod(axis=None)  <- Will replace both prods in a future version of Pandas

That’s an example from the PEP. I agree it is unnecessarily convoluted.

This could already be done with method chaining though. What is the advantage?

(
    pd.read_csv("my_file")
        .query("A > B")
        .filter(items=["A"])
        .add(2)
        .prod()
        .prod()
)

For me, what’s missing from the PEP is still examples that really benefit from this. I need to see APIs


  • where the pipeline of information really is commonly a long, strictly 1:1:1:1 chain without a single stage that takes or returns two bits of information
  • that can’t be simplified into something that doesn’t need the operator like the pandas example
  • that isn’t superseded by comprehension loops (i.e. every example involving map() and/or filter())

The only example that isn't immediately and obviously better written some other way is the one starting with image, but that example doesn't even say what library it's referencing, and it does nothing to explain why its API isn't designed so that all those functions are methods of image, why you'd have zero interest in the intermediate results, or why operations aren't done in place if the intermediate results are throwaway.
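For reference, the "written some other way" version of that image example is just a handful of assignments (function names taken from the earlier snippet; I am assuming each function takes the image as its first positional argument, since the example does not name the underlying library):

out = hist_eq(image)
out = threshold(out, t=128)
out = dilate(out, (3, 3))
out = erode(out, (3, 3))
components = connected_components(out)

Every intermediate is an ordinary local variable that can be inspected, logged or breakpointed without any new syntax.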

2 Likes

I like how this is fully functional and able to incorporate a wide range of situations.

But the costs of this are non-trivial.

It is a monolithic syntactic construct, where a completely independent piece of syntax mimics existing syntax while also deviating from it.

Thus, the end result introduces a fair bit of complexity in pretty much every possible dimension.

The benefit would have to be immense to justify such an extension.

I would like to see improvements in functional programming in Python, but to me this seems like something 5 steps ahead.

I would like to see how this would look when it is 1 step ahead.

The proposal offers a solution to a non-existent problem. The code examples presented in the rationale, both in the original post and the PEP, are not idiomatic Python, and this style is generally discouraged.

Once again, what actual problem is being solved?

I mean, can you provide real-world code examples that follow the Zen of Python and sound coding practices, where the proposed syntax acts as syntactic sugar rather than introducing a different paradigm? Python is not a functional programming language.