Introduce funnel operator, i.e. '|>', to allow for generator pipelines

As a reader of this discussion I just want to suggest to all parties to consider that the person you are arguing with most likely does not have bad intentions, and that most frustrations come from misunderstandings. Arguing things out to show you are right never leads to a productive outcome.

3 Likes

Because it’s a minor UX detail when there are major issues like:

  • We still haven’t seen more than that one deep learning example where this gives the best (in our eyes) possible way to write something
  • Where did all this scope creep come from?

Paul isn’t on some vendetta to sabotage your polls. He’s just trying to get you to focus on the real issues preventing the rest of us from thinking that this proposal is worthwhile.

Please focus on those and drop this meta-discussion and accusations of childishness or bad intent. It won’t take much more of either before this thread gets locked.

3 Likes

I wouldn’t be making all these efforts to have them ended in a *storm. If I had bad intentions, I would have gone for the *storm right away. Tbh I have zero ill feelings towards anyone here. I have a good stomach for way more heated discussions but 1. I would need to keep them symmetrical to my content and 2. indeed, I don’t know how the moderators would take it. I am happy to drop the meta discussion and focus on the core issues, which all things considered, I dare guess might be that I didn’t interrogate the skeptical voices nearly enough. Let’s do this then, please. If there is more feedback other than missing/weak use cases or if we can align on some “axioms” as defined above, please let me hear it. In the meantime, I will keep up my work on the use cases and interrogate skeptics more directly only once you had the chance to look at the new batch of real life examples. Hope that makes sense.

1 Like

Before continuing, please read this: Guidelines - Discussions on Python.org

If |> (relu := nn.ReLU()) is considered more readable than self.relu = nn.ReLU(), then I don’t think we need to see more examples. I understand your point of view, but at that point, it wouldn’t really be Python anymore.

I did notice that you overrode the pipe operator in almost all of your examples, but I found them difficult to read. What’s the point of the pipe operator if it isn’t useful without being overridden?

In this example, you’re just creating variables in the local scope, and I have no idea what you’re actually piping:
|> (relu := nn.ReLU()) |> (sigmoid := nn.Sigmoid()) |> (dropout := nn.Dropout(dropout)))

@bwoodsend Is the example above the one you were referring to?

2 Likes

Yes, although I realize now that I phrased that pretty misleadingly. I meant that the deep learning example was the only real-world example, and that any examples given must not be more cleanly rewritable without the operator. I wasn’t trying to comment specifically on whether I think that deep learning is better with the operator (although if you are asking, torch looks like a bunch of made-up words to me, so I can’t really follow either the before or after code well enough to decide which is clearer).

I’ll probably be stepping back from this thread after this comment, unless there’s a change in the mode of discussion or my lack of impulse control gets the better of me. It’s taking a lot of effort to communicate here and the possibility of a rich and enjoyable technical discussion is fading.


First, to an issue with the discussion, not the proposal.

@sadaszewski, you have been making comments that are unhelpful but not problematic as regards the CoC. When you say “it’s obvious”, “it’s clear that”, “everyone agrees”, etc., you are hurting the discussion. The quip about punchcards being Turing complete falls into this category as well. It may not seem this way to you, but these comments can make it look like you don’t respect dissenting voices.
Even if you have a great deal of respect for other opinions, please try to be aware of the fact that such responses can read as subtly hostile.

By contrast, ad hominem attacks – calling your interlocutors childish and unprofessional – are a CoC violation. We don’t do that here.

Okay. That’s out of the way.


Regarding the proposal, I’d like to see less energy poured into tweaks, and more consideration of what a bare-bones, stripped-down version would be. Perhaps an analysis of such a proposal would conclude that it needs some of the additional features in the current draft, but I’d like to see that analysis – an analysis which should be carried out with real use cases in hand – proving that such features are needed.

In other languages, e.g. Julia, the output of a pipe operator is passed to a callable on the right-hand side. Taking such an approach would radically simplify the proposed language addition because it would eliminate any idea of special binding or arg injection.

I will share what I think is an interesting version of pipes for Python, oriented solely around iterables, below. I’m not asking anyone to pursue it or even look too closely at the details. It is meant as a demonstration of a proposal for pipes, and is not something I intend to pursue. However, if it can prompt good broader discussion of “what should pipes do and how should they work?”, that would be worth our time.

A Simplified Pipe Proposal: Iterable Pipelines

This document proposes a new binary operator, |>, which should be read aloud as “pipe”. Python pipes allow users to pass iterables across various stages, optionally culminating in some final “reduce” or “fold” step.

Pipes are inspired by Unix pipes (|), R pipes (%>%), Julia pipes (|>), and many other paradigms. The strongest inspiration is Unix-like processing of line-oriented data – the nearest analogy for which in Python’s data model is an iterable.

Additionally, map and filter are extended to make them more usable within pipelines. When map or filter are given a single argument, they produce new callables as outputs, which take iterables as their inputs and produce iterables as their outputs.

Definition

Pipes are a special calling syntax. In short, x |> f is an alternative form for f(iter(x)), with the following caveats:

  • If x defines __pipe_iter__, then f(x.__pipe_iter__()) will be used instead
  • If x is an asynchronous iterable or iterator and does not provide __pipe_iter__, then f(aiter(x)) is used instead
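As an illustration only (the proposal is dedicated syntax, not a runtime function), the semantics above can be sketched as a plain helper; the name `pipe` here is invented purely for demonstration:

```python
from collections.abc import AsyncIterable

def pipe(x, f):
    # Sketch of the semantics described for `x |> f`.
    if hasattr(type(x), "__pipe_iter__"):
        # Caveat 1: objects defining __pipe_iter__ control what is piped.
        return f(x.__pipe_iter__())
    if isinstance(x, AsyncIterable):
        # Caveat 2: async iterables are passed through aiter() instead.
        return f(aiter(x))
    # Default case: f receives iter(x).
    return f(iter(x))

# `range(5) |> sum` would behave like:
print(pipe(range(5), sum))  # 10
```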

The map and filter builtins are updated as follows. When given a single argument, these act as partial application of map or filter.
Therefore, the added valid signatures are

def map(function, /) -> Callable[[Iterable], Iterable]: ...
def filter(function, /) -> Callable[[Iterable], Iterable]: ...

__pipe_iter__, when defined, must return an iterable.

Callables used on the right-hand-side of a pipe should always expect an iterable input.
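The single-argument forms of map and filter described above would behave roughly like partial application; a minimal sketch, shadowing the builtins purely for demonstration:

```python
import builtins
from functools import partial

def map(*args):
    # Single-argument form: return an iterable -> iterable callable.
    if len(args) == 1:
        return partial(builtins.map, args[0])
    return builtins.map(*args)

def filter(*args):
    if len(args) == 1:
        return partial(builtins.filter, args[0])
    return builtins.filter(*args)

# Each single-argument call produces a reusable pipeline stage:
odds = filter(lambda x: x % 2 == 1)
squared = map(lambda x: x * x)
print(list(squared(odds(range(6)))))  # [1, 9, 25]
```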

Example Usage

Using the new tools, we can easily perform simple tasks like “count the multiples of 3 which are not divisible by 2 in a range”:

from itertools import pipemap, pipefilter

count: int = (
    range(100)
    |> filter(lambda x: x % 3 == 0) |> map(lambda x: (x + 1) % 2)
    |> sum
)

Because many existing Python objects define iteration, like open file objects, we can leverage this to create data pipelines like classical Unix tools.
For example, here is a simplified pipeline which takes log output and finds relevant log lines for some criteria.

from datetime import datetime
from typing import Iterable

_WINDOW_START = datetime.fromisoformat('2012-11-04T00:05:23+00:00')
_WINDOW_END = datetime.fromisoformat('2012-11-05T00:00:00+00:00')

# we assume some well-defined parsed type
def parse_log(log_line: str) -> ParsedLog: ...

def not_localhost(item: ParsedLog) -> bool:
    return not item.remote_addr.startswith(("localhost", "127.0.0.1", "::1"))

def in_target_window(item: ParsedLog) -> bool:
    return _WINDOW_START < item.date_time_utc < _WINDOW_END

# define a series of pipeline filters for the data
def filter_logs(pipe_data: Iterable[ParsedLog]) -> Iterable[ParsedLog]:
    return pipe_data |> filter(not_localhost) |> filter(in_target_window)

with open("access.log", "r") as logfile:
    data: list[ParsedLog] = map(parse_log, logfile) |> filter_logs |> list

# analysis of `data` follows below

Note that in both of these cases, the expressions can be written without the pipe syntax and builtin updates today. For comparison purposes, here is the pipeline usage from the log example without pipes, and relying on comprehensions instead:

with open("access.log", "r") as logfile:
    data: list[ParsedLog] = [
        item for item in
        map(parse_log, logfile)
        if not_localhost(item) and in_target_window(item)
    ]

Again, my primary purpose in outlining this alternative proposal is to show a very different notion of what a pipe could mean in the language.
(I actually think it’s a pretty good way for pipes to work, but I don’t have the kind of time or energy needed to figure out if it’s actually worthwhile to add.)

And there are variations on it, like making the pipe operator itself act as map (and then you need to work out clever ways of reconstructing filter, but it’s possible), or adding new functions (e.g. variants of functools.reduce).

8 Likes

I believe you meant to use pipemap and pipefilter:

from itertools import pipemap, pipefilter

count: int = (
    range(100)
    |> pipefilter(lambda x: x % 3 == 0) |> pipemap(lambda x: (x + 1) % 2)
    |> sum
)

The entire proposal is complete and well-focused.

I don’t see why we should limit pipelines to work only for iterables when the idea applies equally well to regular functions. I also think requiring an additional method such as __pipe__ or __pipe_iter__ for a callable to support a pipeline is way too much work and unnecessarily excludes regular functions from working with a pipeline.

I also find proposals that require a placeholder in every call in the pipeline excessively noisy to read. A placeholder isn’t always needed: in many real-world cases the last argument is where the piped object would be placed anyway, so why not make that the default behavior?

I think with just 2 rules we can make the pipeline syntax both intuitively readable and generally applicable:

  1. If there is any placeholder ? on the RHS of the pipe operator |>, the RHS is simply evaluated as an expression with ? replaced with the piped object from the LHS.
  2. If there is no placeholder on the RHS, the RHS must be a call, and the call will be made with the piped object from the LHS inserted as the last positional argument.

The tasks of searching for placeholders, replacing placeholders with piped objects, and inserting piped objects into calls are all performed by the compiler so there will be no performance penalty at runtime.

The new placeholder token ? is chosen because of the feedback that _ conflicts with i18n usage and because ? is both currently unused and already widely understood as a placeholder (particularly in SQL). It is interpreted by the compiler and does not occupy a namespace.

So as a toy example for easier illustration, the following pipeline:

result = (
    'ABCDEFGH'
    |> ?.lower()
    |> itertools.batched(n=2)
    |> map(''.join)
    |> zip(?, ?)
    |> dict()
    |> ? | {'default': ''}
)
# result = {'ab': 'cd', 'ef': 'gh', 'default': ''}

would be transformed into, without the intermediate assignments:

result = 'ABCDEFGH'
result = result.lower()
result = itertools.batched(result, n=2)
result = map(''.join, result)
result = zip(result, result)
result = dict(result)
result = result | {'default': ''}
# result = {'ab': 'cd', 'ef': 'gh', 'default': ''}

EDIT: On second thought, inventing a new placeholder token ? just to accommodate the rather rare use cases of _ for an existing purpose seems a bit overkill. Instead, we can allow the LHS to specify the name of the placeholder for the RHS with an optional as clause like this:

result = (
    'ABCDEFGH'
    |> _.lower()
    |> itertools.batched(n=2)
    |> map(''.join) as pairs
    |> zip(pairs, pairs)
    |> dict()
    |> _ | {'default': ''}
)
# result = {'ab': 'cd', 'ef': 'gh', 'default': ''}

With this syntax, both the default placeholder _ and the specified placeholder pairs will be actual variable names in the namespace so there’s the benefit of possible reuse in later expressions or statements. The compiler will still search for the presence of the placeholder name in the RHS to determine which of the two rules applies.

And there’s also the benefit of possibly allowing a nested pipeline expression by specifying a different placeholder name for the outer piped object.

2 Likes

Thanks for the proposal, @sirosen !

This code contains an error (it counts numbers divisible by 2), or have I missed something?

Nope, you’re quite right, it’s an error! I was scribbling out a small numerical example and thought “mod piped to sum” was a nice demo, but didn’t pay enough attention to how I wrote it.

Likewise, there’s another mistake, noted above, which reveals my waffling on updates to map and filter.


I don’t intend to update the post with corrections but there may be other minor errors.

Primarily I came back in thread to reply to this.

The question a pipe operator must answer is
“Why is x |> f better than f(x)?”

The iterable-centric framing says “pipes consume iterables, by definition”, and therefore x |> f tells the reader that x is iterable and f takes a single input, also an iterable.

The idea only applies equally well to arbitrary inputs and outputs if you define it to do that. Being specialized is one way to answer the question of “why would I use this alternative function call syntax?”

4 Likes

Okay, it makes sense.

Basically, because x |> f |> g |> h looks slightly better than h(g(f(x))).
A lot of parentheses, with all 3 functions being nontrivial, is harder to read.
But, on the other hand, we could rewrite this with 3 lines and some local variables.
And that is a good question: why is pipelining better than that approach?
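For concreteness, here is the comparison being discussed, with stand-in functions f, g, and h invented purely for illustration:

```python
def f(x): return x + 1
def g(x): return x * 2
def h(x): return x - 3

x = 5

# Nested-call form: reads inside-out.
nested = h(g(f(x)))

# Local-variable form: three lines, reads top-down.
step = f(x)
step = g(step)
unrolled = h(step)

print(nested, unrolled)  # 9 9
```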

I can see some advantages, though:

  1. If it becomes standard syntax and is widely used, then coding in this style will become familiar and homogeneous.
  2. Linters can check something useful in those constructions, whereas it’s hard to recognize “pipelining” when it’s written with local variables.
  3. It is already present in many languages, and it’s natural for Python to adopt best practices inspired by them.

Of course, it’s just my opinion.

1 Like

So, as the original poster, I think this is the proposal we should go for.

I agree the complicated placeholder syntax is a distraction.

Here’s why:

  1. In most cases I think setting a convention of adding the piped value as the last argument would be fine. The built-in map and filter would work with this, and whilst it’s not technically explicit, there are many examples of non-explicit features, like generators, which you have no idea are being used unless the developers follow a convention that highlights their use.
  2. Most of the examples of the special placeholder syntax feel to me like a way of working around this convention not having existed previously, and thus the iterable not being in the last position.

The special pipe function would solve this.

I could imagine, for example, offering a decorator; let’s call it @pipeable.

It would work much like @property: we decorate the non-pipeable function and then register a separate function which handles the call when it’s part of a pipeline.

All it does is define that pipe dunder.

Old code works as normal; functions that already accept an iterable as the last argument can be used out of the box without boilerplate, and for the cases where that isn’t true you can use this decorator to upgrade them:

@pipeable
def legacy_func():
    # old code here
    ...

@legacy_func.pipe
def pipeable_legacy_func():
    # new code here
    ...
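To make the idea concrete, here is a hedged sketch of what such a @pipeable decorator might look like; the class, the .pipe registration method, and the __pipe__ dunder are all hypothetical names, modeled loosely on how @property registers a setter:

```python
class pipeable:
    """Hypothetical decorator: a function plus an optional alternative
    implementation to use when invoked via a pipeline."""

    def __init__(self, func):
        self._func = func
        self._pipe_func = func  # default: pipeline calls use the original

    def __call__(self, *args, **kwargs):
        # Normal calls are unchanged.
        return self._func(*args, **kwargs)

    def pipe(self, func):
        # Register the pipeline-specific implementation,
        # in the spirit of @some_property.setter.
        self._pipe_func = func
        return self

    def __pipe__(self, piped):
        # What a hypothetical `piped |> self` could translate to.
        return self._pipe_func(piped)

@pipeable
def total(items):
    return sum(items)

@total.pipe
def total(items):
    # Pipeline variant: consume any iterable, not just a sequence.
    return sum(x for x in items)

print(total([1, 2, 3]))                 # 6 (normal call)
print(total.__pipe__(iter([1, 2, 3])))  # 6 (what a pipe could invoke)
```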

As to why x |> f is better than f(x):

One such case is that many libraries provide pipeline-style functions, for example OpenCV.

In many cases you have to manually configure whether the library allocates new memory or applies the transformation in place.

There is currently no way to do this in Python without forcing the application user to memorize this (in OpenCV you pass in a dst keyword argument with a variable containing a NumPy array for in-place operation) or wrapping it in code that handles this (in OpenCV this is usually done by wrapping into a class, since the functions in that library are just the C functions).

In theory, with this new pipe, if we could somehow pass along enough information that the iterable it’s going to receive comes from another OpenCV function or is a NumPy array, it could automatically apply the operation in place.

The library user doesn’t need to do anything, and OpenCV doesn’t need to introduce some special Python class wrapper to keep track of when it should do this or not.

You can see this with the list sort method in Python:

my_list.sort()  # sorts in place and returns None; if the user wanted a sorted copy instead, they have to hunt for another way to do it
messages = my_list.sort() |> map(lambda i: i * 2)  # sort knows it's part of a pipeline, so it could sort in place and then return the list, or whatever makes sense

All whilst keeping the same function interface from its C library.

I see you repeat this argument in almost every post, but isn’t piping simply another way to call a function? If so, why can’t it just be done using regular function calls?

In the example above, sort or any function has no way of knowing who called it, let alone whether it’s part of a one-liner using pipes. We can inspect who called a function, but that’s not what functions are meant to be used for.

map(lambda x: (x + 1) % 2)  # -> TypeError: map() must have at least two arguments.

IMPORTANT FACT: There is no semantically consistent way to do away with the explicit placeholder syntax, because the interpreter would need to infer the precedence between calls and pipe operators.

This argument has been (partially) given previously and is grounded in practical reality. You cannot keep providing examples with an implicit placeholder without first demonstrating the consistency of their usage (but I guess you would need a set of exception rules to do so, which would overcomplicate the discussion). Otherwise this discussion will loop forever.

1 Like

Could you please demonstrate what you mean about the precedence problem? The current implementation supports an implicit placeholder and it works flawlessly in all my tests so far.

In case you didn’t receive a notification, could you please explain why you’re creating local variables here and what the pipe operator is supposed to do in this context?

I’m replying late because I had difficulty finding time to do so. Thank you for your understanding.

Indeed nothing is being piped here. It’s just a demonstration of the power of overriding __pipe__ in order to set the variables locally and as attributes in self at the same time using the tee() on the LHS. Does that help to clarify?

So it looks like a pipe, but it’s not actually a pipe. Thanks for your reply!