I don’t see why we should limit pipelines to iterables when the idea applies equally well to regular functions. I also think requiring an additional method such as `__pipe__` or `__pipe_iter__` for a callable to support a pipeline is far too much work, and it unnecessarily excludes regular functions from working with a pipeline.
I also find proposals that require a placeholder in every call of the pipeline excessively noisy to read. A placeholder isn’t always needed: in many real-world cases the piped object goes in the last argument position anyway, so why not make that the default behavior?
I think with just 2 rules we can make the pipeline syntax both intuitively readable and generally applicable:
- If there is any placeholder `?` on the RHS of the pipe operator `|>`, the RHS is simply evaluated as an expression with `?` replaced by the piped object from the LHS.
- If there is no placeholder on the RHS, the RHS must be a call, and the call is made with the piped object from the LHS inserted as the last positional argument.
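The two rules can be approximated at runtime in today’s Python, which may help make their semantics concrete. In this sketch the `pipe` helper is hypothetical (not part of the proposal or any library): rule 1 corresponds to a `lambda` standing in for the placeholder expression, and rule 2 corresponds to `functools.partial`, which appends the piped object after any bound positional arguments and keeps the bound keyword arguments. Unlike the proposed compile-time rewrite, this runtime version does pay for the extra function calls.

```python
from functools import partial, reduce


def pipe(obj, *steps):
    """Hypothetical runtime stand-in for `obj |> step1 |> step2 ...`.

    Each step is a one-argument callable that receives the previous result.
    """
    return reduce(lambda acc, step: step(acc), steps, obj)


result = pipe(
    'CAB',
    lambda s: s.lower(),            # rule 1: like `|> ?.lower()`
    partial(map, str.upper),        # rule 2: like `|> map(str.upper)`
    partial(sorted, reverse=True),  # rule 2: like `|> sorted(reverse=True)`
)
print(result)  # ['C', 'B', 'A']
```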
The tasks of searching for placeholders, replacing them with the piped object, and inserting the piped object into calls are all performed by the compiler, so a pipeline adds no runtime overhead compared with writing the equivalent calls by hand.
The new placeholder token `?` is chosen because of the feedback that `_` conflicts with i18n usage, and because `?` is both currently unused and already widely understood as a placeholder (particularly in SQL). It is interpreted by the compiler and does not occupy a name in any namespace.
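Both parts of that rationale are easy to check: `?` is rejected by the current Python tokenizer, so it cannot clash with existing code, and the standard library’s `sqlite3` module already uses `?` as its parameter placeholder (its “qmark” style).

```python
import sqlite3

# `?` is not a valid token in current Python syntax.
try:
    compile("?.lower()", "<pipeline>", "eval")
    question_mark_is_free = False
except SyntaxError:
    question_mark_is_free = True

# In SQL, `?` already means "a value supplied from elsewhere goes here".
conn = sqlite3.connect(":memory:")
row = conn.execute("SELECT lower(?)", ("ABCDEFGH",)).fetchone()
conn.close()

print(question_mark_is_free, row)  # True ('abcdefgh',)
```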
So as a toy example for easier illustration, the following pipeline:
```
result = (
    'ABCDEFGH'
    |> ?.lower()
    |> itertools.batched(n=2)
    |> map(''.join)
    |> zip(?, ?)
    |> dict()
    |> ? | {'default': ''}
)
# result = {'ab': 'cd', 'ef': 'gh', 'default': ''}
```
would be compiled into the equivalent of the following, minus the intermediate assignments:
```python
import itertools  # itertools.batched requires Python 3.12+

result = 'ABCDEFGH'
result = result.lower()
result = itertools.batched(result, n=2)
result = map(''.join, result)
result = zip(result, result)
result = dict(result)
result = result | {'default': ''}
# result = {'ab': 'cd', 'ef': 'gh', 'default': ''}
```
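The `zip(?, ?)` step works only because both placeholders receive the same iterator object produced by `map`: `zip` pulls alternately from that one iterator and so pairs consecutive items. This is presumably also why the expansion binds the piped value to a name once rather than re-evaluating the LHS for each placeholder. A quick standalone check:

```python
# One shared iterator: zip alternates between its "two" arguments,
# so consecutive items end up paired together.
shared = iter(['ab', 'cd', 'ef', 'gh'])
from_shared = list(zip(shared, shared))
print(from_shared)  # [('ab', 'cd'), ('ef', 'gh')]

# A list is iterated independently by each argument, so every item
# is paired with itself instead.
items = ['ab', 'cd', 'ef', 'gh']
from_list = list(zip(items, items))
print(from_list)  # [('ab', 'ab'), ('cd', 'cd'), ('ef', 'ef'), ('gh', 'gh')]
```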
EDIT: On second thought, inventing a new placeholder token `?` just to avoid clashing with the relatively rare existing uses of `_` (such as i18n) seems like overkill. Instead, we can let the LHS name the placeholder for the RHS with an optional `as` clause, like this:
```
result = (
    'ABCDEFGH'
    |> _.lower()
    |> itertools.batched(n=2)
    |> map(''.join) as pairs
    |> zip(pairs, pairs)
    |> dict()
    |> _ | {'default': ''}
)
# result = {'ab': 'cd', 'ef': 'gh', 'default': ''}
```
With this syntax, both the default placeholder `_` and the named placeholder `pairs` are actual variable names in the namespace, so they can be reused in later expressions or statements. The compiler still searches the RHS for the placeholder name to determine which of the two rules applies.
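The compile-time check described here can be sketched with the standard `ast` module (Python 3.9+ for `ast.unparse`). The `desugar_step` function is hypothetical and handles a single step given as source text. It walks the RHS looking for a `Name` node that matches the placeholder. If one is found, the RHS is left unchanged (rule 1, since the placeholder is now an ordinary variable). Otherwise the RHS must be a call, and the placeholder is appended as its last positional argument (rule 2). The result is the equivalent assignment statement as source.

```python
import ast


def desugar_step(rhs_src: str, placeholder: str = "_", bind_to: str = "_") -> str:
    """Hypothetical sketch: rewrite one `|> rhs` step as an assignment.

    Assumes the piped object is already bound to the variable `placeholder`.
    """
    rhs = ast.parse(rhs_src, mode="eval").body
    uses_placeholder = any(
        isinstance(node, ast.Name) and node.id == placeholder
        for node in ast.walk(rhs)
    )
    if not uses_placeholder:
        # Rule 2: without a placeholder the RHS must be a call, and the piped
        # object becomes its last positional argument.
        if not isinstance(rhs, ast.Call):
            raise SyntaxError(f"{rhs_src!r} has no {placeholder!r} and is not a call")
        rhs.args.append(ast.Name(id=placeholder, ctx=ast.Load()))
    # Rule 1 needs no rewriting: the placeholder is already a real variable.
    return f"{bind_to} = {ast.unparse(rhs)}"


print(desugar_step("itertools.batched(n=2)"))
# _ = itertools.batched(_, n=2)
print(desugar_step("map(''.join)", bind_to="pairs"))
# pairs = map(''.join, _)
print(desugar_step("zip(pairs, pairs)", placeholder="pairs"))
# _ = zip(pairs, pairs)
```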
This also opens the door to nested pipeline expressions, since the outer piped object can be given a placeholder name different from the one used by the inner pipeline.