Dear All,
Thanks for all the feedback.
I believe this discussion has been going in circles for almost a year now because everyone is talking about a different thing while pretending to be talking about the same one. I see at least 4 topics here:
1) “A pipeline statement” (represented by Kurt Bischoff & others)
- good for debuggability
- already possible with the syntax below
_ = x
_ = f(_)
_ = g(_)
_ = h(_)
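To make the statement form concrete, here is a minimal runnable sketch. The functions f, g and h are placeholders invented for illustration; they are not taken from anyone's real code in this thread.

```python
# Placeholder stages, assumed purely for illustration.
def f(v):
    return v + 1

def g(v):
    return v * 2

def h(v):
    return v - 3

# The "pipeline statement": each step rebinds _ to the previous result,
# so every intermediate value can be inspected in a debugger.
_ = 5
_ = f(_)
_ = g(_)
_ = h(_)
result = _

# Same value as the nested-call spelling.
assert result == h(g(f(5)))
```

Each line is a separate statement, which is what makes it debuggable, and also why it cannot be used inside a lambda or inlined into a larger expression.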
2) “An operator for chaining partials” (represented by dg-pb and others)
- modular
- each stage evaluates to a valid object alone
- already possible with any overloadable operator if the first item is a special Pipe object and/or some helpers are used
(Pipe(pd.read_csv("my_file"))
>> X.query("A > B")
>> X.filter(items=["A"])
>> X.to_numpy().flatten().tolist()
>> map@P(lambda x: x + 2)
>> list
>> np.array@P(_, order='K')
>> X.prod()
)
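For readers who have not seen this approach, a rough sketch of how it works with today's syntax follows. The Pipe class and the P helper below are simplified stand-ins written for this post; they are assumptions and not the actual helpers used in the example above (which also rely on an X placeholder and the @ operator).

```python
class Pipe:
    """Hypothetical wrapper: holds a value and feeds it to callables via >>."""

    def __init__(self, value):
        self.value = value

    def __rshift__(self, func):
        # Apply the next stage to the wrapped value and re-wrap the result.
        return Pipe(func(self.value))


def P(func, *args, **kwargs):
    """Hypothetical helper: a partial that takes the piped value as the
    last positional argument."""
    return lambda value: func(*args, value, **kwargs)


out = (
    Pipe([1, 2, 3])
    >> P(map, lambda x: x + 2)
    >> list
    >> P(map, str)
    >> ", ".join
).value
```

Every stage on the right of >> is an ordinary callable that is valid on its own, which is the defining property of this topic and the main difference from the pipeline expression in topic 4.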
3) “an auto-lambda/auto-partial syntax” (dg-pb and others)
~ + 2
map(~ + 2)
foo~("bar")
etc.
- a related topic at best
- given the history of programming languages, the basic expectation is for a pipeline to deal with calls, as in other languages (or, better, with arbitrary expressions), not with callables (unless the syntactic distinction between the two is removed, see below)
- the best auto-lambda would be to simply make an incomplete call (i.e. missing arguments) not an error but rather an auto-lambda generator
- the proposed pipeline expression could accommodate such a change without a change in syntax, i.e. [1,2,3] |> map(str) would remain [1,2,3] |> map(str) even if map(str) became a partial/lambda; we would just start invoking those callables - still it would be more complicated for arbitrary expressions so potentially we need to drop back to allowing calls on the RHS only (still this would be in line with most/all other languages)
- if anything, this would be an actual trait of functional programming
4) a “pipeline expression” (the OP, IIRC, myself and others)
[1, 2, 3] |> map(str) |> ", ".join()
Btw I use these short examples because they are well… short and easy to write. The community seems to desire real life examples at every step. I think it’s safe to just refer to real-life examples contained in this thread, use “toy” examples for brevity and let imagination do the rest.
- actually pretty inconvenient, error-prone and hard to maintain using the existing Python syntax
- well-defined and established in the broader programming ecosystem providing best practices, usage patterns and education materials
- can be used in lambda expressions
- can be easily inlined anywhere
- not meant for debugging any more than + or * are - why is there no outrage at that? - when debugging - drop back to something debuggable like you do with other expressions
- not any more “syntax sugar” / “convenience” than the matrix multiplication operator - that one was introduced for a single application domain and I have not encountered any good uses of overriding that one; yet it was accepted; this here is much more general!
- can be heavily optimized if RHS is limited to be a call (like in other languages or in the first implementation) OR more advanced code analyses and generation techniques are employed to retain current behavior while removing the use of nameop for the placeholder
- for those demanding some degree of customizability / extensibility - can be easily extended to accommodate a __pipe__() magic method on the LHS and pass the RHS lambda to it
- __pipe__() magic would allow to use |> in its native capacity as well as instead of the alternative operator in the partial chaining approach (provided that the chain starts with a special class Pipe or rather PartialsPipe?) - delivering a somewhat unified syntax for chaining calls and chaining partials
- not any more “functional” than functions themselves, we are talking about conveniently passing an argument here, it’s not going to make Python a “functional programming language”
- allows to avoid the “God” classes (also, a nice alternative to Extension Members)
- with custom __pipe__(), L being one of the dg-pb-style helpers, (L |> _ ** 2) could double as an auto-lambda / lambda shorthand, i.e. generate lambda _: _ ** 2 - just saying.
map(L |> _ ** 2, x)
- actually this made me think that omitting the LHS altogether could potentially also be made to generate the auto-lambda, this aligns nicely with the syntax so far. |> map(str) would generate lambda _: map(str, _), whereas |> map(str) |> ", ".join() would generate lambda _: ", ".join(map(str, _)) - like this we could hit two birds (topics 3 and 4) with one stone.
map(|> _ ** 2, x)
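Since |> is not valid Python today, the semantics sketched in topic 4 can only be emulated. The following is a rough sketch under stated assumptions: the pipe() function stands in for the operator, the RHS is written out by hand as the lambda the compiler would generate, and the lookup of __pipe__ on the type of the LHS is my assumption about how such a protocol might dispatch. AutoLambda is a hypothetical stand-in for the L helper.

```python
def pipe(lhs, rhs):
    """Emulates `lhs |> <call>`; rhs is the lambda built from the RHS call."""
    hook = getattr(type(lhs), "__pipe__", None)
    if hook is not None:
        # Customization point: the LHS decides what to do with the RHS lambda.
        return hook(lhs, rhs)
    # Default ("native") behavior: just pass the LHS into the RHS.
    return rhs(lhs)


class AutoLambda:
    """Hypothetical L helper: piping into it returns the RHS lambda itself."""

    def __pipe__(self, rhs):
        return rhs


L = AutoLambda()

# [1, 2, 3] |> map(str) |> ", ".join()
joined = pipe(pipe([1, 2, 3], lambda _: map(str, _)), lambda _: ", ".join(_))

# map(L |> _ ** 2, x)
square = pipe(L, lambda _: _ ** 2)
squares = list(map(square, [1, 2, 3]))
```

The first example shows the default behavior on a plain list; the second shows how a custom __pipe__() on the LHS could turn the same syntax into a lambda shorthand without any change to the operator itself.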
Going forward I would propose to focus the discussion on the “pipeline expression”: the other topics (1, 2) are already implemented/possible in Python and do different things, whereas (3) is a massive discussion on its own and completely unnecessary for the “pipeline expression”, which can work equally well with and without it, and vice versa! At most, topic 3 could be incorporated, as outlined above.
On the other hand if the community keeps insisting that all of the topics (including the pointless [already implemented] ones) should be folded into a single ill-defined concept, then it’s clear to me that we have a situation here of trying to be “too many things to too many people” and I definitely want to drop the effort as the PEP author in this scenario due to lack of bandwidth for such a massive and (IMHO) misguided and doomed endeavor.
If the impasse is due to me not capturing this in the PEP, then I am happy to capture these considerations and how we agreed to pursue the “pipeline expression” in the PEP but I don’t want to participate in any more of the circular arguments.
- I need a pipeline expression because the pipeline statement does not satisfy my use case
- You don’t need a pipeline expression because you have a pipeline statement.
and so on
Do we agree to pursue the “pipeline expression” PEP?