Introduce funnel operator, i.e. '|>', to allow for generator pipelines

Dear All,

Thanks for all the feedback.

I believe this discussion has been going in circles for almost a year now because everyone is talking about a different thing while pretending to be talking about the same one. I see at least 4 topics here:

1) “A pipeline statement” (represented by Kurt Bischoff & others)

  • good for debuggability
  • already possible with the syntax below
_ = x
_ = f(_)
_ = g(_)
_ = h(_)
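As a concrete, runnable illustration of that statement form (the stages here are my own toy choices):

```python
# Pipeline statement style: each assignment is a separate,
# breakpoint-friendly step operating on the running value.
_ = [1, 2, 3]
_ = map(str, _)
result = ", ".join(_)
print(result)  # 1, 2, 3
```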

2) “An operator for chaining partials” (represented by dg-pb and others)

  • modular
  • each stage evaluates to a valid object alone
  • already possible with any overloadable operator if the first item is a special Pipe object and/or some helpers are used
(Pipe(pd.read_csv("my_file"))
    >> X.query("A > B")
    >> X.filter(items=["A"])
    >> X.to_numpy().flatten().tolist()
    >> map@P(lambda x: x + 2)
    >> list
    >> np.array@P(_, order='K')
    >> X.prod()
)
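The `X` and `@P` helpers above are hypothetical; a minimal sketch of the underlying `Pipe` wrapper, built today with an overloaded `>>`, could look like this:

```python
class Pipe:
    """Wrap a value so that `>>` feeds it into the next callable."""

    def __init__(self, value):
        self.value = value

    def __rshift__(self, func):
        # Each stage applies a callable and re-wraps the result
        return Pipe(func(self.value))


result = (
    Pipe([1, 2, 3])
    >> (lambda xs: map(str, xs))
    >> ", ".join
).value
print(result)  # 1, 2, 3
```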

3) “an auto-lambda/auto-partial syntax” (dg-pb and others)

~ + 2
map(~ + 2)
foo~("bar")

etc.

  • a related topic at best
  • given the history of pipelines in other programming languages, the basic expectation is for a pipeline to work with calls (or, better, with arbitrary expressions) like any other language construct, not with callables (unless the syntactic distinction between the two is removed, see below)
  • the best auto-lambda would be to simply make an incomplete call (i.e. missing arguments) not an error but rather an auto-lambda generator
  • the proposed pipeline expression could accommodate such a change without a change in syntax, i.e. [1,2,3] |> map(str) would remain [1,2,3] |> map(str) even if map(str) became a partial/lambda; we would just start invoking those callables
  • still, arbitrary expressions would be more complicated to support, so we might need to fall back to allowing only calls on the RHS (which would still be in line with most, if not all, other languages)
  • if anything, this would be an actual trait of functional programming

4) a “pipeline expression” (the OP, IIRC, myself and others)

[1, 2, 3] |> map(str) |> ", ".join()
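With the default “last positional argument” placement, this would evaluate exactly like today's nested form:

```python
# Today's equivalent of the proposed
#   [1, 2, 3] |> map(str) |> ", ".join()
result = ", ".join(map(str, [1, 2, 3]))
print(result)  # 1, 2, 3
```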

Btw I use these short examples because they are well… short and easy to write. The community seems to desire real life examples at every step. I think it’s safe to just refer to real-life examples contained in this thread, use “toy” examples for brevity and let imagination do the rest.

  • actually pretty inconvenient, error-prone and hard to maintain using the existing Python syntax
  • well-defined and established in the broader programming ecosystem providing best practices, usage patterns and education materials
  • can be used in lambda expressions
  • can be easily inlined anywhere
  • not meant for debugging any more than + or * are - why is there no outrage at that?
  • when debugging - drop back to something debuggable like you do with other expressions
  • not any more “syntax sugar” / “convenience” than the matrix multiplication operator - that one was introduced for a single application domain and I have not encountered any good uses of overriding that one; yet it was accepted; this here is much more general!
  • can be heavily optimized if RHS is limited to be a call (like in other languages or in the first implementation) OR more advanced code analyses and generation techniques are employed to retain current behavior while removing the use of nameop for the placeholder
  • for those demanding some degree of customizability / extensibility - can be easily extended to accommodate a __pipe__() magic method on the LHS and pass the RHS lambda to it
  • the __pipe__() magic would allow using |> in its native capacity as well as in place of the alternative operator in the partial-chaining approach (provided that the chain starts with a special class Pipe, or rather PartialsPipe?) - delivering a somewhat unified syntax for chaining calls and chaining partials
  • not any more “functional” than functions themselves, we are talking about conveniently passing an argument here, it’s not going to make Python a “functional programming language”
  • makes it possible to avoid “God” classes (also, a nice alternative to Extension Members)
  • with custom __pipe__(), L being one of the dg-pb-style helpers, (L |> _ ** 2) could double as an auto-lambda / lambda shorthand, i.e. generate lambda _: _ ** 2 - just saying. map(L |> _ ** 2, x)
  • actually this made me think that omitting the LHS altogether could also be made to generate the auto-lambda, which aligns nicely with the syntax so far: |> map(str) would generate lambda _: map(str, _), whereas |> map(str) |> ", ".join() would generate lambda _: ", ".join(map(str, _)) - this way we could kill two birds (topics 3 and 4) with one stone. map(|> _ ** 2, x)
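To make the omitted-LHS auto-lambda idea concrete, here is how the hypothetical |> map(str) |> ", ".join() would behave, emulated with an explicit lambda in today's Python:

```python
# Hypothetical `|> map(str) |> ", ".join()` spelled as an explicit
# lambda; `_` stands for the value piped in when it is finally called.
auto = lambda _: ", ".join(map(str, _))

print(auto([1, 2, 3]))  # 1, 2, 3
```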

Going forward I would propose to focus the discussion on the “pipeline expression” since the other topics (1, 2) are already implemented/possible in Python and they do different things, whereas (3) is a massive discussion on its own and completely unnecessary for the “pipeline expression” as it can work equally well with and without it and vice versa! At maximum, topic 3 could be incorporated, as outlined above.

On the other hand, if the community keeps insisting that all of the topics (including the pointless [already implemented] ones) should be folded into a single ill-defined concept, then it’s clear to me that we are trying to be “too many things to too many people”. In that scenario I definitely want to drop the effort as the PEP author, due to lack of bandwidth for such a massive and (IMHO) misguided and doomed endeavor.

If the impasse is due to me not capturing this in the PEP, then I am happy to capture these considerations and how we agreed to pursue the “pipeline expression” in the PEP but I don’t want to participate in any more of the circular arguments.

- I need a pipeline expression because the pipeline statement does not satisfy my use case
- You don’t need a pipeline expression because you have a pipeline statement.
and so on

Do we agree to pursue the “pipeline expression” PEP?

Here is what this bit could look like with the recent auto-lambda idea.

class CubeSet(NamedTuple):
    red: int = 0
    green: int = 0
    blue: int = 0

    @classmethod
    def from_str(cls, segment: str):
        return (
            segment.strip().split(", ")
            |> map(|> str.split(maxsplit=1))
            |> map(|> ((color := _[0]).strip(), int(amount := _[1])))
            |> dict()
            |> cls(**_)
        )
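For comparison, an equivalent written in today's Python (assuming the input looks like "3 blue, 4 red" - the exact input format is my guess from context):

```python
from typing import NamedTuple


class CubeSet(NamedTuple):
    red: int = 0
    green: int = 0
    blue: int = 0

    @classmethod
    def from_str(cls, segment: str):
        # "3 blue, 4 red" -> CubeSet(red=4, blue=3)
        pairs = (part.split(maxsplit=1) for part in segment.strip().split(", "))
        return cls(**{color.strip(): int(amount) for amount, color in pairs})


print(CubeSet.from_str("3 blue, 4 red"))  # CubeSet(red=4, green=0, blue=3)
```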

For me, not unless you can explain why a pipeline statement (to use your term) doesn’t satisfy your use cases - and what those use cases are.

As far as I can see, the pipeline expression in the PEP is little more than syntax sugar for the statement form. I’ve seen no compelling use cases where having an expression rather than a statement is important.

1 Like

Easy, many of the points in 4) can be translated into use cases:

  • Pass short inline pipeline expressions to a function without intermediate variables / breaking up into multiple statements
  • Use short “pipelines” in lambda expressions
  • Inline short pipelines in generator expressions and other constructs
  • Have pipeline expressions run as fast as the corresponding nested calls where applicable
  • Have the pipeline expression construct lambdas and pass them to the __pipe__() of the LHS to allow innovative use
  • Expression-level chaining without “God” classes
  • Auto-lambda syntax for partials and pipelines

Unless one just rejects use cases in bulk, there is more than one reason to have this expression.

I have never seen any real life code that would benefit from these features. You can’t just translate benefits into use cases like this, you have to find actual code that needs the proposed syntax and is significantly improved by the pipeline syntax you propose. I think you’re missing the difference between “potential benefits” and “use cases”.

I’m definitely not rejecting use cases in bulk, because I haven’t seen use cases in bulk yet. I’ve seen a couple of cases where pipelining would be useful, but they have all been ones where a sequence of statements (which you term a “pipeline statement”) would work just as well.

Show us real-world code that would benefit from the proposed feature. You don’t need “cases in bulk” - two or three significant[1] examples would be enough. Persuade us that simply using the “pipeline statement” form doesn’t work just as well. Then you might have a proposal worth discussing. But remember - “it looks nicer” is very subjective, and is considered a weak argument for making changes to the language. So the benefits you identify need to be more than that.

The impasse is because you’ve stripped the proposal down to a point where people can’t see enough benefits (that matter to them) in it. People are suggesting extra features in an attempt to make the proposal useful to them - you’re not willing to extend the proposal, but people don’t find it useful in its current (limited) form. That’s the impasse here.

The work and energy you’ve put into this proposal is appreciated, and I’d like to see it result in something useful. But unless you understand the feedback you’re seeing (and adapt the proposal based on it) I’m not sure how that will happen.


  1. not made up ↩︎

7 Likes

Find at least one example in real code – in a library or in the stdlib – and then do the following:

  • provide a link to the source for reference
  • copy the example
  • rewrite the example using the proposed new feature
  • explain the benefit(s) gained

I’m still quite suspicious of this without a soft keyword to act as the name of the pipe output in intermediate expressions. I see things like map(str) and there’s an implicit expression rewrite being done there. I’d rather see map(str, PIPE) and know the shape of the expression which will actually run.
But with these toy examples, I’d really rather see a generator expression than any of this.

4 Likes

I agree very strongly with this. The proposal allows _ to be used like this - map(str, _). It’s just that “the last positional argument” is the default place for _. But that’s the problem: I find that omitting the _ is less clear, and if you require the _ to be explicit, there’s very little benefit over

_ = data
_ = map(str, _)
...

Or if you really want an expression form:

(
    _ := data,
    _ := map(str, _),
    ...
)[-1]
3 Likes

My bad, I didn’t remember it.
But this could be updated if the PEP goes forward at any point, since it may not be the most realistic of use cases.

In this example, not much.

If we are using the operator, we know that the results are temporary, and libraries can adjust to ensure that the operations are automatically performed in place.

For libraries not written with method chaining in mind, it allows the same functionality with similar syntax.

However, even in that case it is debatable whether it is enough of an improvement over the alternatives, though I liked

which I interpret as

result = [
  a |> fn1 |> fn2 |> fn3
  for a in my_list
]

rather than

result = [
  fn3(fn2(fn1(a)))
  for a in my_list
]

but then again, this could be done with a function, so I am not saying that I can justify its addition to Python.

Thank you for such a thorough review. This has given me some food for thought and nicely put things into place.

(1) “A pipeline statement”

I completely agree that this covers the “pipeline statement” and there is no real need for anything else. Despite the maybe slightly hacky syntax, the expressiveness this allows (which is pretty much full Python syntax :)) can hardly be beaten by any new invention. (I can’t believe this came up only so far down this thread.)


(2) “An operator for chaining partials”

For this I think `functools.pipe` - Function Composition Utility would provide sufficient cover. It is a simple solution with a minimal addition that would allow both “performant function composition” for reusable pipelines and “eager pipeline evaluation” (with users implementing the operators themselves). Operators could perhaps be added down the line if broad consensus were reached on which ones are most appropriate.
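`functools.pipe` does not exist yet; a minimal stand-in for the reusable-composition half can be sketched with `functools.reduce` (the name `pipe` here is my own, not a real stdlib API):

```python
from functools import reduce


def pipe(*funcs):
    """Compose funcs left to right into one reusable callable."""
    def piped(value):
        # Thread the value through each stage in order
        return reduce(lambda acc, f: f(acc), funcs, value)
    return piped


stringify = pipe(lambda xs: map(str, xs), ", ".join)
print(stringify([1, 2, 3]))  # 1, 2, 3
```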

(3) “an auto-lambda/auto-partial syntax”

Agreed, this is beyond the scope of this thread.

(4) a “pipeline expression”

I agree that we (well, at least I) can narrow the scope of this thread to this, as the rest is ruled out.

I agree with others here. Given that (1) is covered, (2) has its own potential simple track and (3) is out of scope, then what is left?

  1. So what is the actual proposal here? Or at least could you put the concept that you are currently at (or the state that you are at) into some code? I had a bit of trouble understanding all of the points. E.g. How would map(L |> _ ** 2, x) and map(|> _ ** 2, x) exactly work? What does __pipe__ do? How does L look? and how does |> without LHS work and how would its implementation look?
  2. What are the use cases for this?
  3. Which of those cases can not be sufficiently satisfied by (1), (2) and (3)?

Let’s review this toy example:

from functools import reduce

DISCOUNT_RATE = 0.10


def is_eligible(user):
    return user["age"] > 30


def apply_discount(user):
    # Create a new user dict to maintain immutability
    return {**user, "amount": user["amount"] * (1 - DISCOUNT_RATE)}


def get_amount(user):
    return user["amount"]


def functional(users):
    eligible_users = filter(is_eligible, users)
    discounted_users = map(apply_discount, eligible_users)
    amounts = map(get_amount, discounted_users)
    return reduce(lambda acc, x: acc + x, amounts, 0)


def imperative(users):
    total = 0
    for user in users:
        if user["age"] > 30:
            discounted_amount = user["amount"] * (1 - DISCOUNT_RATE)
            total += discounted_amount
    return total


if __name__ == "__main__":
    purchases = [
        {"name": "Alice", "age": 25, "amount": 100},
        {"name": "Bob", "age": 35, "amount": 200},
        {"name": "Charlie", "age": 40, "amount": 300},
    ]

    total = functional(purchases)
    print(f"Total discounted spending for users over 30: ${total:.2f}")

    total = imperative(purchases)
    print(f"Total discounted spending for users over 30: ${total:.2f}")

The functional paradigm clearly comes with its own caveats, such as the reliance on helper functions, much like the object-oriented paradigm depends on helper methods.

On the other hand, imperative code tends to read much closer to natural language. I honestly wouldn’t even know how to read the functional version aloud to someone else.

I’d recommend against doing this:

def functional(users):
    return reduce(
        lambda acc, x: acc + x,
        map(get_amount,
            map(apply_discount,
                filter(is_eligible, users)
                )
            ),
        0
    )

As far as I can see, the pipeline expression pre-PEP doesn’t provide any way for libraries to adjust the way you suggest - all of the proposal’s work is done at AST building time, so that by the time the library’s code runs, there’s no indication remaining that this is part of a |> expression.

100% agreed. But isn’t it true that all this demonstrates is that the example you constructed is not suitable for the proposed syntax?

It’s pretty easy to find examples of things where the new syntax doesn’t work. What’s proving far more difficult is to find examples where it does work. Someone who wants the proposal to succeed needs to find a few such examples, otherwise we’ll simply end up reaching the conclusion that there’s no evidence that pipeline expressions are useful in the real world.

My goal is to have at least a simple example when a real one isn’t available. I also pointed out that functional programming comes with helper functions.

Now the question is, how would the proposed syntax improve readability?

def functional(users):
    eligible_users = filter(is_eligible, users)
    discounted_users = map(apply_discount, eligible_users)
    amounts = map(get_amount, discounted_users)
    return reduce(lambda acc, x: acc + x, amounts, 0)

What could be more readable than this?

My mistake. I thought you were demonstrating that for this example, procedural code is superior (it certainly looks like it is to me).

But if you do think the functional version is better, then I agree, it seems fine to me. I could argue that naming all of the intermediates is a bit much. It depends on how meaningful the function names are, for example. But you can easily use anonymous names if you want something terser but still functional:

def functional(users):
    _ = filter(is_eligible, users)
    _ = map(apply_discount, _)
    _ = map(get_amount, _)
    return reduce(lambda acc, x: acc + x, _, 0) # or just return sum(_)

So either way, I don’t see the |> syntax helping much here.

Yes, it is in this particular case. I included it, along with a review, to show that there’s no real problem to solve. You can use imperative style for simple algorithms.

That said, imperative code can get lengthy and annoying, which might push you toward functional or object-oriented approaches. Python supports both. The functional version, in particular, looks quite readable using Python’s existing syntax.

Show us real-world code that would benefit from the proposed feature. You don’t need “cases in bulk” - two or three significant examples would be enough. Persuade us that simply using the “pipeline statement” form doesn’t work just as well.

I think a lot of real-life code that would benefit from this type of pattern already implements a method on a class to allow a chainable, fluent API.

The pipe method on a pandas dataframe.

From the docs:

Instead of writing

subtract_national_insurance(
    subtract_state_tax(subtract_federal_tax(df), rate=0.12),
    rate=0.05,
    rate_increase=0.02)

You can write

(
    df.pipe(subtract_federal_tax)
    .pipe(subtract_state_tax, rate=0.12)
    .pipe(subtract_national_insurance, rate=0.05, rate_increase=0.02)
)
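The mechanics behind `pipe` are tiny - `df.pipe(func, *args, **kwargs)` essentially returns `func(df, *args, **kwargs)` - so any class can offer it. A toy sketch (the `Frame` class and `subtract_tax` are made up for illustration, not pandas):

```python
class Frame:
    """Toy stand-in for a DataFrame-like object with a .pipe() method."""

    def __init__(self, rows):
        self.rows = rows

    def pipe(self, func, *args, **kwargs):
        # Feed self into func, forwarding any extra arguments
        return func(self, *args, **kwargs)


def subtract_tax(frame, rate):
    return Frame([amount * (1 - rate) for amount in frame.rows])


result = Frame([100.0, 200.0]).pipe(subtract_tax, rate=0.5)
print(result.rows)  # [50.0, 100.0]
```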

I use method chaining with querysets in django

From the docs:

The result of refining a QuerySet is itself a QuerySet, so it’s possible to chain refinements together. For example:

>>> Entry.objects.filter(headline__startswith="What").exclude(
...     pub_date__gte=datetime.date.today()
... ).filter(pub_date__gte=datetime.date(2005, 1, 30))

A good blog post about the pattern is here

I’d say both pandas and django are pretty popular Python libraries. Chaining things can come with some downsides, but I don’t think it has to be a zero-sum game. Sometimes I make intermediate variables with names, other times it’s one big chain.

Generally I look at a pipe operator as another way to chain functions, without needing to implement method chaining via a custom class with a fluent API.

    df = (
        subtract_federal_tax(df)
        |> subtract_state_tax(rate=0.12)
        |> subtract_national_insurance(rate=0.05, rate_increase=0.02)
    )

If introducing some magic here such as |> implicitly passing an argument seems jarring, what about the magic of a decorator

From the docs:

def f(arg):
    ...
f = staticmethod(f)

@staticmethod
def f(arg):
    ...
1 Like

Is this not already simpler as a comprehension (with or without helper functions)?

def comprehension(users):
    return sum(
        user["amount"] * (1 - DISCOUNT_RATE)
        for user in users
        if user["age"] > 30)
2 Likes

Maybe some real examples can be found in e.g. the pytorch examples repository:

Example 1: MNIST forward

Source

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output

Similar to the _ = f(_) trick, this turns out to be surprisingly difficult to improve.

Example 2: GCN forward

Source

def forward(self, input_tensor, adj_mat):
    # Perform the first graph convolutional layer
    x = self.gc1(input_tensor, adj_mat)
    x = F.relu(x)  # Apply ReLU activation function
    x = self.dropout(x)  # Apply dropout regularization

    # Perform the second graph convolutional layer
    x = self.gc2(x, adj_mat)

    # Apply log-softmax activation function for classification
    return F.log_softmax(x, dim=1)

The same pattern again.

Example 3: GCN features

Source

    # Process features
    features = torch.FloatTensor(content_tensor[:, 1:-1].astype(np.int32)) # Extract feature values
    scale_vector = torch.sum(features, dim=1) # Compute sum of features for each node
    scale_vector = 1 / scale_vector # Compute reciprocal of the sums
    scale_vector[scale_vector == float('inf')] = 0 # Handle division by zero cases
    scale_vector = torch.diag(scale_vector).to_sparse() # Convert the scale vector to a sparse diagonal matrix
    features = scale_vector @ features # Scale the features using the scale vector

Similar patterns also occur often with numpy. An improvement for this kind of code would be nice, but it’s not obvious any of the proposals so far would really help.

2 Likes

It is, but my point was to show a typical functional programming structure, similar to how you’d structure a class in object-oriented programming.

One important detail in the PyTorch examples is the presence of comments. While they may have skipped them in the first example for brevity, the other examples are heavily commented.

Maybe this was mentioned before: |> exists in various languages but there are some differences, for example:

Language   Operator   Implicit argument   Explicit argument placeholder   Comments
F#         |>         last                no
Elixir     |>         first               no
OCaml      |>         first               no
PHP        |>         first               no
Julia      |>         first               no                              .|> vectorized version
Haskell    &          first               no
Clojure    ->         first               no                              ->> for last implicit argument
R          %>%        first               yes: .                          provided by packages like magrittr
C#         -          -                   -                               multiple unaccepted proposals, similar discussions as here
4 Likes