A native pipeline syntax is uniquely suitable for building conditional pipelines and/or pipelines that can short-circuit (i.e. terminate early depending on a condition). For example:
x |> (
    (
        _["content"] |> [ x["text"] for x in _ if "text" in x ] |> _[0]
        if len(_) > 0
        else None
    ) if isinstance(_.get("content"), list) else
    (
        _["content"]
    ) if isinstance(_.get("content"), str) else
    None
)
It should be more flexible than match..case, for example for matching items in the middle of a list. More importantly, it can be nicely decomposed.
handle_list = |> (
    _["content"] |> [ x["text"] for x in _ if "text" in x ] |> _[0]
    if len(_) > 0
    else None
)
handle_str = |> _["content"]
x = {'content': [{'text': 'foo'}]}
x |> (
    handle_list(_) if isinstance(_.get("content"), list) else
    handle_str(_) if isinstance(_.get("content"), str) else
    None
)
>>> 'foo'
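For comparison, here is a minimal sketch of the same decomposition in today's Python, using plain functions and if statements instead of |>. handle_list and handle_str mirror the stages above. extract_text is a hypothetical name for the dispatching step.

```python
from typing import Optional


def handle_list(d: dict) -> Optional[str]:
    """Return the first "text" entry of a list-valued "content", or None."""
    texts = [item["text"] for item in d["content"] if "text" in item]
    return texts[0] if texts else None


def handle_str(d: dict) -> str:
    """Return a string-valued "content" unchanged."""
    return d["content"]


def extract_text(d: dict) -> Optional[str]:
    """Dispatch on the type of d["content"]; anything else yields None."""
    content = d.get("content")
    if isinstance(content, list):
        return handle_list(d)
    if isinstance(content, str):
        return handle_str(d)
    return None


print(extract_text({"content": [{"text": "foo"}]}))  # foo
```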
The fact that it is an expression, whereas match..case is a statement, doesn't hurt; the two can be complementary.
The PEP doesn't say what the parameters to __pipe__() are (and they even vary between examples!). That in turn makes it hard to tell from the def __pipe__() examples what makes it any more than just another operator to overload.
Minus the deep learning case, these don't help me see a contrast between something that's impossible or messy to write without the operator but clean and easy with it. I'd try doing more comparisons of code with |> versus without |>, making sure to use the cleanest, most readable form you can for the version without it, so that we readers aren't sitting here thinking, "well, if I just rewrote that some other way, I wouldn't need the operator after all."
Nobody should use "Pythonic" to make arguments for or against features, just as nobody should make arguments which rest on shallow interpretations of the Zen.
These are useful shorthands in casual conversation, but highly subjective and imprecise.
"Pythonic" just means "has good vibes". Usually, when it comes up, there's an idea behind it which can be clearly stated without jargon, but requires more effort to elucidate.
The Python community and maintainers typically prize clarity and legibility very highly. Brevity is a secondary goal, and is useful to pursue when it can enhance readability. Beyond that, I don't think there are pithy and easily stated principles. (The Zen is pretty good, but too many people stop thinking when they quote it.)
I'm not bothered, but perhaps a bit concerned for you when you raise the question of "what is Pythonic?" in this way. There is no underlying "truth" of what that word means, and if you see your task in proposing this feature as shifting that (illusory) definition, I think you will spend a lot of wasted time and effort.
Regarding these examples, I would say that the complexity you are showing in some of them is not very helpful; on the contrary, it is actually harmful. For example, large ternary if expressions are a red flag because they bury branching logic within expressions. Are they necessary to the examples? Could they be replaced with simple named functions? High cognitive complexity mixed with a new feature makes evaluation of the proposal harder.
When I see pipelines in R, the input is some simple data object (maybe the raw text of a CSV) and the output is some useful analysis (perhaps a data frame with statistics). Perhaps you could find similar Python code and demonstrate the usefulness of pipes in a simpler context?
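As a toy illustration of that kind of flow, here is a sketch in today's Python that turns raw CSV text into a few summary statistics with three small named steps. The data and the function names (parse_rows, scores, summarize) are made up for the example. The nested call at the end is the part a pipeline would read left to right.

```python
import csv
import io
import statistics

# Made-up sample data
RAW_CSV = """name,score
ada,91
bob,78
cy,85
"""


def parse_rows(text: str) -> list:
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def scores(rows: list) -> list:
    """Pull the numeric "score" column out of the parsed rows."""
    return [float(row["score"]) for row in rows]


def summarize(values: list) -> dict:
    """Compute a few summary statistics."""
    return {"n": len(values), "mean": statistics.mean(values), "max": max(values)}


# The nested form reads inside-out. Under the proposal this might become
# RAW_CSV |> parse_rows() |> scores() |> summarize() (hypothetical syntax).
summary = summarize(scores(parse_rows(RAW_CSV)))
print(summary)
```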
I also find it very concerning that only one of the examples, "use case 2", appears to be sourced from real existing code, and that the analysis declares an improvement where I, as a reader, see an inferior result. The original had very clear control flow (a linear series of method calls) and each expression is small and readable. Perhaps there's some real improvement in there, but it's extremely hard to identify, even with a very generous reading.
Most people have dropped off of this thread. I think most participants have given as much feedback and devoted as much volunteer time to this topic as they're willing to do right now.
You should take a bit of time to address the most substantive criticisms in this thread before posting more. I'd rather come back to this topic and read significant progress all at once than get updates as minor changes are made, and I suspect that this feeling is shared by some other readers.
The variations are only due to a lack of synchronization between examples. The examples with the most arguments (__pipe__(self, rhs, rhs_noinject, last, name, unparsed)) are the ones that reflect the current state of the implementation.
Indeed, I should stress that what makes __pipe__() absolutely unique are its parameters. rhs and rhs_noinject represent the right-hand side (respectively with and without parameter injection), which has not been evaluated yet. This is the core difference compared to other operators, which can also act on the RHS, but any deferral must be done manually (using partial or lambda), whereas here it is automatic. Furthermore, last indicates whether the current call corresponds to the last stage of the pipeline. name is set to the corresponding identifier if a named expression is used at the given stage of the pipeline, and None otherwise. Last but not least, unparsed is a string representation of the RHS. It can be parsed using ast.parse and enables all sorts of customizations.
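To make the deferral point concrete, here is a sketch that emulates a pipeline with today's | operator. The Deferred class and its stop-on-None behavior are illustrative assumptions, not part of the proposal. Because | evaluates its right operand eagerly, each stage has to be written as a lambda or a plain callable by hand. That is the wrapping |> would do automatically.

```python
from typing import Any, Callable


class Deferred:
    """Hypothetical pipeline wrapper built on the existing | operator."""

    def __init__(self, value: Any) -> None:
        self.value = value

    def __or__(self, rhs: Callable[[Any], Any]) -> "Deferred":
        # rhs arrives as an unevaluated callable, so this hook can decide
        # whether to run it at all: here, a None value skips every later stage.
        if self.value is None:
            return self
        return Deferred(rhs(self.value))


ok = Deferred({"content": "foo"}) | (lambda d: d.get("content")) | str.upper
missing = Deferred({}) | (lambda d: d.get("content")) | str.upper
print(ok.value, missing.value)  # FOO None
```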
This makes the operator extremely powerful. Which use cases the syntax actually fits is another story. I agree that the deep learning example is the most convincing one so far, not only because of its organic origin but because it simply tells the story better after the rewrite (IMHO). Surprisingly, it is not a case where we gain compactness. That's a useful observation, I guess: clarity and legibility before brevity, as mentioned by @sirosen.
Good point. I need more examples sourced from real code.
Since Codon already has the |> (and ||>) operators and is otherwise extremely similar to Python, the most logical place to look for good examples seems to be Codon code bases. Unfortunately, I have not been able to find any projects written in Codon.
Here, fn is the provided static function, and *args and **kwargs are its call arguments.
For the injection, we could replace the placeholder (e.g. _) with self.
This way the class can iterate over the args or kwargs to find self if there is a need for more complex implementations.
I understand that explicitly placing the placeholder is slower to write, but it removes the need to remember which convention is implemented.
This also assumes we don't follow the auto-lambda approach, so the scope could be reduced.
Also, for this to be more useful, I believe object should implement a naive version of __pipe__ like in the previous example, and ideally there should be a custom error message when an exception is raised during a pipeline.
As for a non-ML example of real code, this is from a personal project of mine (not sure if it counts):
Non-ML real code, converted
For context, this is a handshake with a new user using two passwords with the Signal protocol.
I wouldn't say it is a big improvement.
To me (personal opinion), it makes the code a bit harder to read, but it makes it easier to understand that the steps to create a new user are to decrypt the user data twice and then deserialize it.
The RHS is not necessarily a function. Internally it is wrapped in a lambda that takes _ (and only _) as its parameter, so that it lends itself to treatment by __pipe__(); any *args and **kwargs would have been captured by that time already.
I am not sure I am following this at all.
This is not what is expected of a pipeline operator. The __pipe__() does not control where the injection happens and it happens as the last positional argument to the leftmost call (if present) on the RHS.
We can play with names, but what these actually are is composable pipelines, and IMHO they are an integral part of a package that goes beyond "syntax sugar".
??? No clue what this is about. Could you explain with an example?
Although you can squeeze things onto one line in the given example, you shouldn't assume that you can do so for an arbitrary (useful) function call.
Keeping lines short is a pretty well-established element of readability, with some implications for accessibility as well. (See also: universal design.)
If a function has four or five parameters, you'll naturally run out of space and it will need to wrap.
What is your preferred formatting style for a pipeline including functions which wrap lines?
In case you'd want to use a placeholder other than self (like self.value), the code in __pipe__ could iterate over the args and replace every reference to self.
For example:
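from typing import Callable

class MyClass:
    value: float

    # Hypothetical hook signature proposed in this post
    def __pipe__(self, fn: Callable, *args, **kwargs):
        # Swap every positional argument that is this object for self.value
        args = tuple(
            arg.value if arg is self else arg
            for arg in args
        )
        # Apply the same substitution to keyword arguments
        kwargs = {
            kw_key: (kw_value.value if kw_value is self else kw_value)
            for kw_key, kw_value in kwargs.items()
        }
        return fn(*args, **kwargs)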
I will admit I am not 100% sure what you mean, but in case I guessed correctly:
Peter Suter mentioned a table comparing many implementations of the operator, most of which pass the implicit argument first:
So I believe that for a programmer who uses Python as a secondary language, it would not be obvious whether the implicit placeholder goes first or last, which makes a forced explicit placeholder an option worth exploring.
Sorry for the bad wording. I meant that Python's primitives should implement a minimal __pipe__ so that you can do basic out-of-the-box things like 3 |> str # '3'
This is subjective. I use linters that enforce PEP8, so more often than not one-line calls are not possible for me.
Edit: I've now realized you've removed the explicit placeholder. I agree that the implicit placeholder looks cleaner, but I also think it can become a bigger learning barrier for the reason stated before.
Edit 2: I like that you start with peer_bundle |>. I didn't think of that, and even if it is just a detail, I like it more that way. I agree that my code is a bit more convoluted than it should be.
This is the adapted code, so I tried to minimize changes. As I imagine it, those unpacked kwargs would end up in the kwargs of the __pipe__ call.
Also, Peers can take a dict because it is a dataclass and the dict is deserialized in the step before.
In the current implementation this is just not necessary, and I would be hard pressed to find a use case for what you are proposing. That said, you can achieve it by calling ast.parse(unparsed) and handling the RHS manually. The current implementation just does not assume that the RHS is a function call, so there is no place for *args and **kwargs in this logic.
I don't think other implementations should influence such a decision. So far, injecting it as the last argument has seemed more practical to me, and a better fit for existing functions in Python (map, filter). You can always achieve injection as the first argument by passing all other arguments as keywords. It would not work the other way around. To be perfectly fair, though, we should probably chart out more use cases before making a definite decision here.
BTW, the leftmost call on the RHS is, for example, the call to map() in this example, as opposed to the call to pow(), which will not get the injection:
[1, 2, 3, 4, 5] |> map(lambda x: pow(x, 2))
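The injection rule and the keyword trick described above can be sketched with an ordinary helper. inject_last is a hypothetical name. It appends the piped value as the last positional argument of the call, which is what the RHS map(lambda x: pow(x, 2)) would receive.

```python
from typing import Any, Callable


def inject_last(value: Any, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Call fn(*args, value, **kwargs): the piped value goes last among positionals."""
    return fn(*args, value, **kwargs)


# [1, 2, 3, 4, 5] |> map(lambda x: pow(x, 2))
squares = list(inject_last([1, 2, 3, 4, 5], map, lambda x: pow(x, 2)))

# range(6) |> filter(lambda n: n % 2 == 0)
evens = list(inject_last(range(6), filter, lambda n: n % 2 == 0))


def scale(data, factor=1):
    """A function whose data parameter comes first."""
    return [factor * d for d in data]


# Passing every other argument by keyword makes the injected value the
# only positional argument, i.e. effectively the first one.
scaled = inject_last([1, 2, 3], scale, factor=10)
print(squares, evens, scaled)  # [1, 4, 9, 16, 25] [0, 2, 4] [10, 20, 30]
```

The reverse does not hold. With first-position injection, map would receive the list where it expects the function.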
IMHO, a forced placeholder is a downgrade in all possible ways. It doesn't have to be obvious, especially to newcomers and people with biases; it just has to be properly documented. That the meaning of the injection rules should be obvious, I can agree with, but not that the rules should become known magically by looking at the symbol.
You can already do that (3 |> str(), that is). Please check it out.
It should be obvious, then, that visually a pipeline works best with short calls. I would strive to keep them that way.
An implicit placeholder is practically a defining feature of a pipeline. A forced one is superfluous, annoying to read, and distracts from the logic.
Something a bit more fleshed out and with as little vertical spread as possible - let the indentation do the work. The difference to PEP8 is small but to my eyes it looks WAY better. What would be your preference? PEP8 or something new?
I wouldnât even make comparisons to or think about PEP8.
IMO it's important that you think about and formulate an opinion, as the proposer, more so than it is important what in particular it is. Function calls can span multiple lines, and if a feature can't compose "well" with that, it's dead in the water.
I was concerned by the fact that you fixed an example to be more readable by squishing things into single lines: it suggested that you might not have a plan for how multiline calls should look.
I like black's and ruff-format's rule that when arguments expand over multiple lines, the trailing paren does so as well.
While we're on the subject of style, I'll raise a related item regarding readability:
The removal of an explicit placeholder value for the pipe output is incredibly destructive, to the point that I cannot imagine this proposal succeeding with that behavior included. I assume that if this ever moves to the stage of being a PEP, that would get removed, but it's so severe that I suspect it's one of the things which will make it hard or impossible to find a core dev to sponsor this.
The assumption that pipe output should be the first or last argument to a call is a huge one to bake in at the language level, and it basically blows up linting in all kinds of ways, since an ast.Call node is not guaranteed to contain a valid suite of arguments to the called function.
Plus, it introduces confusion and ambiguity for a human reader seeing foo().bar(), regarding which function gets the implicit argument. I don't particularly care that there is a clear rule for the interpreter to follow; the human who reads and writes this is now being asked to learn and memorize a special-case rule for something as fundamental as calling a function.
I would rather we just bound a name as a local. I think _ is a worse choice than PIPE, which I suggested back in October, but I'd much rather see _ get bound than nothing.
Calls like bson.loads() look for all the world like errors, and if we stick with _ as the bound name, you are literally saving one character for a very, very costly decision.
I think pipelines are a cool tool and, if done right, might make it into the language. But to me that implicit argument passing stuff is a deal breaker, and I'd hate to see a useful feature not advance for that reason alone.
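Here is a minimal sketch of the linting concern using the standard inspect module. A checker that validates the call exactly as written finds loads() missing its argument. loads here is a hypothetical one-argument stand-in, not the actual bson API.

```python
import inspect


def loads(data: bytes) -> dict:
    """Hypothetical one-argument deserializer (stand-in, not bson's API)."""
    return {"size": len(data)}


sig = inspect.signature(loads)


def call_is_valid(*args, **kwargs) -> bool:
    """Return True if loads(*args, **kwargs) matches the signature."""
    try:
        sig.bind(*args, **kwargs)
    except TypeError:
        return False
    return True


# The RHS of `payload |> loads()` as a tool sees it: a call with no arguments.
print(call_is_valid())            # False
# The same call with the piped value made explicit.
print(call_is_valid(b"payload"))  # True
```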
_ conflicts with two common existing patterns: the i18n convention and unused variables in unpacking. While the latter conflict might not be a big deal, the former is.
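Here is a minimal sketch of the i18n clash, assuming the common _ = gettext.gettext convention. The second function only emulates what binding _ to the piped value inside a scope would do. Both function names are hypothetical.

```python
import gettext

# Common i18n convention: `_` is the translation function.
_ = gettext.gettext


def save_label() -> str:
    # With no translation catalog installed, gettext returns the message unchanged.
    return _("Save")


def save_label_in_pipeline(piped: str) -> str:
    # Emulates a pipeline stage that binds `_` to the piped value in this scope:
    # the local `_` now shadows the translation function.
    _ = piped
    return _("Save")  # TypeError: 'str' object is not callable


print(save_label())  # Save
```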
An explicit value sentinel similar to functools.Placeholder would be ideal here rather than an implicitly bound name.
That, or syntax at the beginning of the pipeline that chooses the name of the pipe
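One possible shape for an explicit sentinel is sketched below as a plain function over (function, arguments) tuples. PIPE, _PipeSentinel and pipe are hypothetical names. This does not use the real functools.Placeholder; it only borrows the idea of a dedicated object that marks where a value goes. Keyword arguments are left out for brevity.

```python
from typing import Any


class _PipeSentinel:
    """Hypothetical explicit placeholder, in the spirit of functools.Placeholder."""

    def __repr__(self) -> str:
        return "PIPE"


PIPE = _PipeSentinel()


def pipe(value: Any, *stages: tuple) -> Any:
    """Run stages of the form (fn, arg1, arg2, ...).

    Every argument that is PIPE is replaced by the running value; a stage
    without PIPE receives no implicit argument at all.
    """
    for fn, *args in stages:
        value = fn(*(value if arg is PIPE else arg for arg in args))
    return value


result = pipe(
    "3,1,2",
    (str.split, PIPE, ","),
    (map, int, PIPE),
    (sorted, PIPE),
)
print(result)  # [1, 2, 3]
print(PIPE)    # PIPE (outside a pipeline the sentinel is just an inert object)
```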
Good point. I was thinking that this would be normal name shadowing, but if the name is fixed, it plays havoc with the _() convention.
Playing this out, what is the value of that sentinel, though?
E.g. PIPE is a name for builtins.PIPE, and that's the sentinel. What happens if I print(PIPE) outside of a pipeline?
But maybe you're right that the name shouldn't be implicitly bound. In that case, should it always be explicitly bound?
e.g.
Here, this approach does not look as nice as keeping the closing paren on the same line as the last argument. IMHO it makes the whole construct look like a weird, brittle mix of symbols. Maybe the presence of the explicit placeholder here adds to the problem. Too many symbols are made too visible: they might help someone who has trouble noticing symbols or memorizing conventions, but they annoy and distract (from the semantics) someone who is acutely aware of their presence through decades of practice.
That is true about the ast.Call node, but you would first be parsing the ast.Pipeline node, which comes on top and captures the calls, and you would therefore have an opportunity to handle them accordingly (i.e. actually perform the injection, OR tag the ast.Call for further processing, indicating that it is subject to auto-injection). You could even infer what is being injected from the LHS in the same way any type inference is done currently. Here is the AST of the example above:
It's not much different from the convention that arguments without defaults have to come before those with default values in a function declaration, OR that positional arguments MUST precede keyword arguments in a function call. The whole language is a set of rules, some more special than others. Here, it is obvious and instantaneously clear that the injection happens in the leftmost call (i.e. foo()). We could run a competition with all sorts of expressions, and I could literally indicate to you in <0.5s for each case where the injection happens. That said, if this were truly a blocker, I'd be fine with letting it go. My impression, however, is that use cases are the most burning issue for the core devs.
The current implementation DOES bind _; it does NOT change anything about _ at the parser level. I haven't seen _( in any of the pipeline examples so far, so I don't see how it would interfere with i18n tools detecting those. Do you mean using i18n in the pipelines? If so, it's not any different from currently using _( elsewhere where _ is assigned to.
Out of context, maybe. But preceded by |>, it's clear where the seemingly missing argument comes from. Realistically, how long can such a state of confusion last once a programmer is told what |> means?
That's great to hear. I am of the same opinion, and I am sure we can find a middle ground. It would be easier if the PEP could contain alternatives for the core devs to consider, rather than settle on choices made under assumptions as to what would be tolerable for the core devs. Keeping the alternatives in Rejected Ideas is not good enough, as it suggests that they were rejected by the community, whereas we would be rejecting the implicit placeholder largely for the sake of the core devs, if I understand correctly?
That's exactly the assumption we are making. Do we KNOW that it would not advance because the core devs do not approve of this feature? Or is it really the majority of the community who wants an explicit placeholder? Most (almost all?) languages (R, Julia, F#, Elixir, OCaml, etc.) use an implicit placeholder. It is very much a defining feature of what is meant by a "pipeline" these days. It saves a lot of typing and redundancy. An explicit placeholder is something relatively unique and IMHO would have to add tangible value to justify deviating from a proven (and, to some, preferable) canon. Are there any use cases for this approach other than subjective clarity? Can you do something with an explicit placeholder that you cannot do without one?
BTW, to be clear: you still CAN use an explicit placeholder. 1 |> print() and 1 |> print(_) do the same thing in the current implementation. It's when the explicit placeholder is MISSING that the injection behavior kicks in. Does that help?
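Here is a small emulation of the rule just described. A hypothetical sentinel P stands in for _, so it does not collide with anything else named _. pipe_step is also a hypothetical name. str stands in for print so the results can be compared, and the pow lines show where the two forms diverge.

```python
from typing import Any, Callable


class _Placeholder:
    """Hypothetical stand-in for the explicit `_` of the current implementation."""

    def __repr__(self) -> str:
        return "_"


P = _Placeholder()


def pipe_step(value: Any, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Apply one pipeline stage under the rule described above.

    If P appears among the arguments, the piped value is substituted there;
    only when P is absent is the value injected as the last positional argument.
    """
    if any(a is P for a in args) or any(v is P for v in kwargs.values()):
        args = tuple(value if a is P else a for a in args)
        kwargs = {k: (value if v is P else v) for k, v in kwargs.items()}
        return fn(*args, **kwargs)
    return fn(*args, value, **kwargs)


print(pipe_step(1, str), pipe_step(1, str, P))  # 1 1
print(pipe_step(2, pow, P, 3))  # 8 (explicit: pow(2, 3))
print(pipe_step(2, pow, 3))     # 9 (injected last: pow(3, 2))
```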
Rejected Ideas is for things rejected by the PEP authors.
Authors have to lead the discussion, and ideally also be receptive to feedback.
"The community" is a nebulous thing. Nobody can know what "the community" wants. We debate from our various perspectives and try to capture that in a document. Most of the user community isn't even in the discussion; we just do our best.
I don't know all of these particular languages, but several of them (e.g., Julia) emphatically do not do any implicit argument injection. The consumers of pipelines are uncalled functions, and they are invoked on pipe output.
It's not 1 |> print(), but 1 |> print.
The difference between those two is very significant.
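The distinction can be illustrated with a plain helper in the style of the languages mentioned above. Each stage is an uncalled callable that the pipeline itself invokes on the previous output. pipe is a hypothetical helper name.

```python
from functools import reduce
from typing import Any, Callable


def pipe(value: Any, *stages: Callable[[Any], Any]) -> Any:
    """Apply each uncalled callable to the running value, left to right."""
    return reduce(lambda acc, stage: stage(acc), stages, value)


print(pipe(3, str))                               # prints 3
print(pipe("  foo ", str.strip, str.upper, len))  # prints 3
# By contrast, pipe(1, print()) would not work: print() runs immediately,
# prints an empty line, and hands None (not a callable) to pipe.
```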
_ is classically the name given to the internationalization function in many projects. So that particular name has a special meaning and preventing its use would be problematic.
I think I've said my piece at this point. I feel like it's not landing, which is disappointing, given that a case which I think is inherently ambiguous is being called "obvious and instantaneously clear".
I'll just leave a few examples of nasty cases here, with the note that the point is not "how clear is it to a subject matter expert" but instead "how clear is it to a novice".
x |> (f(), g())
x |> f(g())
x |> f(g)()
x |> f()()
x |> (f := g())()
x |> [f() for f in funcs][0]()
x |> y[f()](g)()
x |> (f() * g(),)
x |> f() if g() else h()
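To make the point concrete, the helper below uses the standard ast module to list every call in each right-hand side. call_sites is a hypothetical name, and ast.unparse needs Python 3.9+. Every case contains more than one candidate call, and that is exactly what a reader has to disambiguate under a "leftmost call" rule.

```python
import ast

# Right-hand sides of the cases above, with the "x |> " prefix removed so each
# one parses as ordinary Python. (For the last case this assumes |> binds more
# loosely than the conditional expression, which is itself part of the question.)
RHS_CASES = [
    "(f(), g())",
    "f(g())",
    "f(g)()",
    "f()()",
    "(f := g())()",
    "[f() for f in funcs][0]()",
    "y[f()](g)()",
    "(f() * g(),)",
    "f() if g() else h()",
]


def call_sites(source: str) -> list:
    """Return the source of every call in an expression, in ast.walk (breadth-first) order."""
    tree = ast.parse(source, mode="eval")
    return [ast.unparse(node) for node in ast.walk(tree) if isinstance(node, ast.Call)]


for rhs in RHS_CASES:
    sites = call_sites(rhs)
    print(f"{rhs:28} {len(sites)} calls: {sites}")
```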
I think that might be all of the help I can offer. I hope you can refine this to a point where it can advance.
I can get behind a predefined PIPE, but this could be onto something. Having it be a "fake step" is not the best option, IMO. Since the currently discussed operator has two characters, I thought about putting the variable name in between (like arguments between parentheses) or adding a third character, but I think it could be confused with other operators: data | PIPE > parse(PIPE) and data |> PIPE > parse(PIPE) with greater-than, data | PIPE |> parse(PIPE) with bitwise or, and data ~ PIPE |> parse(PIPE) with bitwise not. Maybe the closest could be something like data $ PIPE |> parse(PIPE), which has a bit of precedent in string.Template. This could have the option to "rename" or to have multiple temporary variables, if we deem it readable and useful:
I don't think the operator is a bad idea, but it is definitely not easy to get right, especially since I agree with you that it can be very ambiguous in many parts of the theoretical operator.
A PEP ultimately has to propose a single alternative. The purpose of the ideas thread is to engage with as many members of the community (including core devs) as possible and converge on a single proposal that the community can stand behind. But I feel as though in this case, the discussion is more about you defending your vision of what the feature should look like than about adapting to feedback. That's fine, if you're sufficiently sure that your vision is the right solution, but it does increase the risk that the SC won't agree with you (simply because there's already evidence that people have different views, and the SC are no different in that regard).
Not in the slightest. We would be rejecting the implicit placeholder because that's the choice that the community feels is best for Python. It may not be your preference, but designing a new language feature is always an exercise in compromise.
As a community member, I find @sirosen's arguments against the implicit placeholder compelling: Python does not have partial function application like functional languages do, so the implicit placeholder doesn't fit naturally with other language structures the way it does in functional languages. That's a personal view though, not some sort of proclamation of my "core developer opinion"; there's no core dev gatekeeping going on here, just a community trying to put together a proposal[1].
Not at all. The core devs have no say in this, beyond the fact that a PEP needs one core dev to sponsor it. What matters is the SC's decision, ultimately, and one of the factors they will take into account is the community consensus around the proposal. So at the moment, the biggest concern for the acceptance of the PEP is that you're not managing to get community consensus that your proposed design is the right approach (specifically around implicit argument passing, but possibly in other areas as well).
While I'm commenting, I'll pick up on this point. First, it's not about "for core devs" (again). It's about the fact that having clear use cases is what makes for a good proposal. You're not "ticking boxes" on some sort of checklist here; you're being given advice and trying to follow it. There are lots of things that make for a good proposal. Use cases are one, consensus is another. You need to address them all.
And second, the use cases you've added to the PEP still seem weak to me. I've only skimmed them, so I may be missing something, but they are great examples of the power of the proposal, which is a good start, but they don't seem to include any examples of real-world code that would be improved by the proposal. To give an example, use case 6 (picked completely at random):
a, b, c, d, e = (None,) * 5
extract(self) |> (a, b, c, d, e)
a += 1
b += 2
c += 3
d += 4
e += 5
update(self) |> (a, b, c, d, e)
I can't imagine ever bothering to rewrite code that is currently written:
in that form, especially if I had to develop and maintain the extract and update classes that are included in the example. So this use case doesn't give me any intuition of where real-world code would benefit from the addition of this feature. Sure, it's a neat example of the power of the pipeline operator, but power without applications is wasted.
That's my current view as well. I like the idea of a powerful pipelining operator, but it's hard to come up with a design that feels natural, addresses real needs, and doesn't interact badly with existing conventions and coding styles. I wish we could make progress here, but as long as the discussion takes the form of @sadaszewski defending the existing design against concerns from community members, I think we're at a bit of an impasse.
Totally agree with Paul. In theory, pipelining is useful and fun. I'm a big fan of Linux shell pipes; I use them a lot nearly every day. And I love Python's syntax and design.
But the current proposal for Python pipes doesn't feel viable, and I'm strongly -1 on it.
Thank you for your kind feedback. I am grateful for the revived interest and the dedicated time.
OK, let's gather the community feedback on the following issues:
Right-hand side
Only a call, e.g. func()
Only a callable, e.g. func
An arbitrary expression, e.g. [ x ** 2 for x in PIPE ]
Placeholder
Implicit (injected) placeholder when explicit placeholder absent, i.e. func() gets the injection treatment whereas func(_) doesnât
Implicit (injected) placeholder always, i.e. func(_) gets an extra _ anyway and is transformed into func(_, _)
Predefined explicit placeholder, e.g. _ or PIPE mandatory on the right-hand side
Configurable explicit placeholder, e.g. (PIPE := x) |> func(PIPE) |> func2(PIPE) (the named expression becomes mandatory on the left-hand side and defines the variable that will be auto-updated)
If the placeholder is predefined and/or has a default, which symbol would you choose between these two?
_
PIPE
If the placeholder is configurable, which of these ways of configuring it would you choose?
(PIPE := x) |> func1(PIPE) |> func2(PIPE)
x |PIPE> func1(PIPE) |> func2(PIPE)
(x as PIPE) |> func1(PIPE) |> func2(PIPE)
x $ PIPE |> func1(PIPE) |> func2(PIPE)
If the injection behavior is kept, at which position should it happen?
As the first positional argument
As the last positional argument
Let's keep the selection narrowed down to these choices as an exercise in finding alignment. Does that make sense? Looking forward to the results; I hope there are many votes.
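For reference when weighing the configurable-placeholder options, the first form in the poll, (PIPE := x) |> func1(PIPE) |> func2(PIPE), is already close to what assignment expressions can express today. func1 and func2 are hypothetical stages. The sketch chains one := per stage inside a list display and takes the last element as the result.

```python
def func1(n: int) -> int:
    """Hypothetical first stage."""
    return n + 1


def func2(n: int) -> int:
    """Hypothetical second stage."""
    return n * 10


x = 4
# (PIPE := x) |> func1(PIPE) |> func2(PIPE), emulated with one assignment
# expression per stage; the last element of the list is the result.
result = [(PIPE := x), (PIPE := func1(PIPE)), func2(PIPE)][-1]
print(result)  # 50
print(PIPE)    # 5
```

As with any assignment expression, PIPE remains bound in the enclosing scope afterwards.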