Introduce funnel operator, i.e. '|>', to allow for generator pipelines

There is also Codon, which comes with pre-implemented pipelines, including support for parallel pipelines.

Examples from the Codon documentation:

def add1(x):
    return x + 1

2 |> add1  # 3; equivalent to add1(2)

def calc(x, y):
    return x + y**2
2 |> calc(3)       # 11; equivalent to calc(2, 3)
2 |> calc(..., 3)  # 11; equivalent to calc(2, 3)
2 |> calc(3, ...)  # 7; equivalent to calc(3, 2)

def gen(i):
    for i in range(i):
        yield i

5 |> gen |> print # prints 0 1 2 3 4 separated by newline
range(1, 4) |> iter |> gen |> print(end=' ')  # prints 0 0 1 0 1 2 without newline
[1, 2, 3] |> print   # prints [1, 2, 3]
range(100000000) |> print  # prints range(0, 100000000)
range(100000000) |> iter |> print  # not only prints all those numbers, but it uses almost no memory at all
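For readers without Codon installed, the shape of these examples can be approximated in today's Python by overloading an existing operator. A minimal sketch (using `|` instead of the proposed `|>`, with lambdas standing in for the `...` placeholder):

```python
class Pipe:
    """Minimal emulation of the proposed |> using the existing | operator."""
    def __init__(self, value):
        self.value = value

    def __or__(self, func):
        # feed the wrapped value into the next callable in the chain
        return Pipe(func(self.value))

def add1(x):
    return x + 1

def calc(x, y):
    return x + y ** 2

print((Pipe(2) | add1).value)                    # 2 |> add1           -> 3
print((Pipe(2) | (lambda v: calc(v, 3))).value)  # 2 |> calc(..., 3)   -> 11
print((Pipe(2) | (lambda v: calc(3, v))).value)  # 2 |> calc(3, ...)   -> 7
```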

Yes, I think I mixed things up. I was thinking inplace=True would be set in the __pipe__ method, but now that I think about it, it doesn't make much sense, since we're just replacing the placeholder and running each step sequentially.


This:

can be written as:

def with_pipe(users):
    return (users
        |> filter(|> (_["age"] > 30))
        |> map(|> (_["amount"] * (1 - DISCOUNT_RATE)))
        |> sum())

with the current syntax.
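For comparison, the same computation without the operator in today's Python (DISCOUNT_RATE and the shape of the user dicts are assumptions reconstructed from the quoted example):

```python
DISCOUNT_RATE = 0.1  # assumed value; not defined in the quoted example

def without_pipe(users):
    # filter by age, discount each amount, then total
    over_30 = filter(lambda u: u["age"] > 30, users)
    discounted = map(lambda u: u["amount"] * (1 - DISCOUNT_RATE), over_30)
    return sum(discounted)

users = [{"age": 35, "amount": 100.0}, {"age": 25, "amount": 50.0}]
print(without_pipe(users))  # only the first user passes the filter
```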

I now tend to think in terms of two concepts: a "headless pipeline" / auto-lambda (i.e. a pipeline without an LHS that can be called by providing the LHS as the argument), and a "pipeline instance" / pipeline (the exact naming would have to be decided), which is what I have been calling a pipeline so far.

Depending on the associativity order we could get rid of some of those brackets. The current operator is in line with R’s implementation of |>.

I will look for some good use cases. This can take some time but I am sure there are good uses for such a nice syntax :slight_smile:

Is that just a new way of writing lambda x: x["age"] > 30?

While I’d love to see a more streamlined way of writing lambda functions in the language, I don’t think you should be sneaking one in as a side effect of a “pipeline expression” proposal.

There is nothing sneaky about it. One can have:

process = |> filter(lambda message: not message.deleted) |> map(lambda message: translate(message, 'french'))

result = process([ message1, message2, message3 ])

or

tail = |> windowed(2) |> enumerate() |> tqdm()
for index, bracket in brackets |> tail():
  # Code here
   pass

or:

for index, bracket in tail(brackets):
  # Code here
   pass

or:

for index, bracket in brackets |> windowed(2) |> enumerate() |> tqdm():
  # Code here
   pass

It all starts falling nicely in place together.
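The headless-pipeline idea can be approximated in current Python with a small compose helper. A sketch, where dicts stand in for the message objects and an upper-case transform stands in for translate(), neither of which is a real API from this thread:

```python
from functools import reduce

def pipeline(*funcs):
    """Return a callable that threads its argument through funcs, left to right."""
    def run(value):
        return reduce(lambda v, f: f(v), funcs, value)
    return run

# stand-ins: dicts instead of message objects, upper() instead of translate()
process = pipeline(
    lambda msgs: [m for m in msgs if not m["deleted"]],
    lambda msgs: [m["text"].upper() for m in msgs],
)

print(process([{"deleted": False, "text": "hi"}, {"deleted": True, "text": "bye"}]))
```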

At the very least, I think the PEP needs to explicitly state something along the lines of:

This syntax also provides an alternative to the traditional lambda expression, with |> _ + 3 being an alternative form of lambda _: _ + 3.

It may be that for some people this will be a compelling advantage of the proposal. On the other hand, it may cause some people to complain that it’s not as flexible as lambda, or they don’t like the syntax, etc. “A shorter way of writing lambda functions” is another of the endless debates that never seems to go anywhere.
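For context, the stdlib's operator module already provides shorthands for some of these cases today, though not for arbitrary expressions such as a comparison:

```python
from operator import itemgetter, methodcaller

users = [{"age": 35}, {"age": 25}]

# itemgetter replaces lambda u: u["age"]
print(list(map(itemgetter("age"), users)))

# methodcaller replaces lambda s: s.split(maxsplit=1)
print(methodcaller("split", maxsplit=1)("a b c"))

# but a comparison such as _["age"] > 30 still needs a lambda
print(list(filter(lambda u: u["age"] > 30, users)))
```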

I’ll note that neither your current PEP nor the Pyodide implementation currently allows unary |>. So you seem to have moved away from your earlier insistence that we should focus solely on the “pipeline expression” idea. On re-reading your post, I see that you introduced the “auto-lambda” idea, as well as the idea of a __pipe__ method, in the long list of bullet points. My apologies - I thought you were calling for a focus on the PEP as you’d written it, and I now see that wasn’t the case. But I’m now confused as to what precisely you are willing to include or exclude.

:person_shrugging: I guess it doesn’t matter. The discussion can continue exploring ideas for as long as people want. But at the point where you want to actually write a PEP for this, you’ll need to focus down on what you’re actually proposing, and refocus the discussion on just the features you plan on including.

Thanks. You are correct. I am still exploring, like everyone else (although in the narrow direction that is, IMHO, viable, as outlined above). I think the new addition, “composable pipelines” (a.k.a. the alternative lambda syntax; currently only available if you compile from source, and not entirely finished, e.g. auto-injection is missing in the auto-lambdas), is a good idea, but in the end I’d expect the community to “calibrate” this by deciding what to include/exclude and how to “configure” the respective pieces. I’d just rather have a “maximalist” draft which we then trim down and fine-tune than have people endlessly pushing ideas out of the blue, since, as you mentioned, if we ever hope to get this done, we/I need to focus. What I want to do in the immediate future is add __pipe__() (and get back to dg-pb with an explanation of how it works) and rewrite the PEP draft. Regarding the flexibility of auto-lambdas / headless pipelines, I guess that’s the point: not to be an alternative lambda syntax, but rather a shorthand for a well-defined and important use case.

@dg-pb with the __pipe__() magic (available when compiled from source) you can do the following:

from functools import partial


class PartialsPipe:
  def __init__(self, value):
    self.value = value  

  def __pipe__(self, rhs):
    p = rhs(self.value)
    return PartialsPipe(p(self.value))

  def __repr__(self):
    return repr(self.value)


def second(a, b):
  return b


my_split = partial(str.split, maxsplit=2)
print(PartialsPipe("a b c d e f") |> my_split)

print(PartialsPipe("a b c d e f") |> partial(second(_, str.split), maxsplit=2))

The trick with second() is needed due to the auto-injection behavior (i.e. the explicit use of _ eliminates the auto-injection). BTW I’ve updated the auto-injection behavior to be more intelligent. If the placeholder is shadowed or overwritten it does not count as explicit use of the placeholder and auto-injection will still happen, i.e. the following will not be the same as calling second():

print(PartialsPipe("a b c d e f") |> partial(_ := str.split, maxsplit=2))

In the latter case, PartialsPipe("a b c d e f") will be passed as the third argument to the partial() call, i.e. it will end up as the first argument to str.split, and "a b c d e f" will also end up as the sep argument to str.split once the pipeline is evaluated. At least this is the current state; I think overwriting needs to be counted as use (and eliminate auto-injection), since otherwise the value of the placeholder is already lost, which doesn’t make any sense.
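Assuming the injection semantics described above, the net effect can be reproduced step by step in plain Python; note how the piped value ends up both as the string being split and as the sep argument:

```python
from functools import partial

value = "a b c d e f"

# what the (hypothetical) auto-injection would build from partial(..., maxsplit=2):
p = partial(str.split, value, maxsplit=2)

# __pipe__ then calls p(value), so the value is passed again, this time as `sep`:
result = p(value)  # str.split(value, value, maxsplit=2)

# splitting a string on itself as the separator yields two empty strings
print(result)
```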

The purpose of sharing these ideas is to gather feedback on the proposal. Since the original poster isn’t actively participating in the discussion, the proposal feels like it lacks clear ownership. That might explain the missing feedback loop in this thread.

I’m not trying to explore anything here. This just isn’t the right place for it.

Please start a thread in Python Help to discuss possible pipeline implementations, how other programming languages handle them, the trade-offs, and any related caveats.

The pipeline itself isn’t a new idea. What’s important is having a well-defined pipeline operator. Take a look at Codon’s example for reference. It’s as simple as that. No implementation is needed either.

I’ve created the topic here. I don’t have high hopes as I would imagine we are more advanced in this area here than the general public, including studying some of the other languages - at least me (R, Ocaml, Haskell, Idris, JavaScript). But who knows.

I’ve updated the Pyodide deployment: here.

You can do many interesting things with the __pipe__() magic now, including:

from functools import partial


class PartialsPipe:
  def __init__(self, value):
    self.value = value  

  def __pipe__(self, rhs, last):
    p = rhs(self.value)
    if last:
      return p(self.value)
    else:
      return PartialsPipe(p(self.value))
    

def second(a, b):
  return b


def mypartial(func, *args, **kwargs):
  def inner(_):
    return func(_, *args[:-1], **kwargs)
  return inner


def mypartial2(func, *args, **kwargs):
  def inner(_):
    return func(*args[:-1], _, **kwargs)
  return inner


my_split = partial(str.split, maxsplit=2)
print(PartialsPipe("lorem ipsum dolor sit amet") |> my_split)

print(PartialsPipe("lorem ipsum dolor sit amet") |> partial(second(_, str.split), maxsplit=2))

print(PartialsPipe("lorem ipsum dolor sit amet") |> partial(_ := str.split, maxsplit=2))

print(PartialsPipe("lorem ipsum dolor sit amet") |> mypartial(str.split, maxsplit=2) |> mypartial2(map, str.capitalize) |> mypartial(list))

@dg-pb how does it look to you?

Disclaimer: I’m pushing the evil side a bit…


Taking this example:

The placeholder syntax (which already sugarifies lambda and partial) could also be used as some kind of minilanguage to sugarify filter and map… For example, rewriting this (~ is already a builtin operator, so I use $ as the placeholder in the following):

process = |> not $f.deleted |> translate($m, 'french')

(Perhaps this can be made (much) more beautiful)
There might also be some possible syntax for iterators (generators) with $i ($g)… idk.

Currently this:

process = |> filter(lambda message: not message.deleted) |> map(lambda message: translate(message, 'french'))

can also be rewritten simply as (and I think that’s enough):

process = |> filter(|> (not _.deleted)) |> map(|> translate(_, 'french'))

I just didn’t want the more conservative colleagues to get a heart attack :wink:


So last is True if it is the last RHS value in a chain?
I’m in two minds about this.
Such an approach seems to offer reasonable value by enabling fancier things than just passing the value forward, but it also adds some complexity. I am still figuring out where you are going with this, so maybe it is justified…

Why [:-1]? (same in mypartial2?)
This shouldn’t work?

So mypartial and mypartial2 just add the placeholder at the beginning or at the end?
Couldn’t you just use functools.partial with the Placeholder functionality (if it can serve this purpose, of course)?
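A placeholder-aware partial() of the kind referred to here can be sketched with a sentinel (recent Python versions expose similar functionality as functools.Placeholder; the sketch below avoids depending on it):

```python
PLACEHOLDER = object()  # sentinel marking where the piped value should go

def ppartial(func, *args, **kwargs):
    """Like functools.partial, but the single call argument fills the sentinel slot."""
    def inner(value):
        filled = [value if a is PLACEHOLDER else a for a in args]
        return func(*filled, **kwargs)
    return inner

split2 = ppartial(str.split, PLACEHOLDER, maxsplit=2)
print(split2("lorem ipsum dolor sit amet"))

second = ppartial(lambda a, b: b, "ignored", PLACEHOLDER)
print(second("kept"))
```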

my_split = partial(str.split, maxsplit=2)
print(PartialsPipe("lorem ipsum dolor sit amet") |> my_split)
# Ok

print(PartialsPipe("lorem ipsum dolor sit amet") |> partial(second(_, str.split), maxsplit=2))
# What’s with the `second`? `second(_, str.split)` is just the same as `str.split`. What are you trying to show here?

print(PartialsPipe("lorem ipsum dolor sit amet") |> partial(_ := str.split, maxsplit=2))
# Same here, what is the point of this walrus?

print(PartialsPipe("lorem ipsum dolor sit amet") |> mypartial(str.split, maxsplit=2) |> mypartial2(map, str.capitalize) |> mypartial(list))

At first glance, this looks more complex than I would like to see.
However, I think I still don’t see the potential capabilities of this that you have in mind.


So the __pipe__ stuff is above. But what does L look like?

How is |> (not _.deleted) implemented? And how can I understand the mechanics of it?

These tricks with mypartial() and mypartial2() are due to the auto-injection behavior. I will resolve this tomorrow.


@dg-pb Now this is possible:

from functools import partial


class PartialsPipe:
  def __init__(self, value):
    self.value = value  

  def __pipe__(self, rhs, rhs_noinject, last):
    p = rhs_noinject(None)
    return p(self.value) if last else PartialsPipe(p(self.value))


print(
  PartialsPipe("lorem ipsum dolor sit amet")
  |> partial(str.split, maxsplit=2)
  |> partial(map, str.capitalize)
  |> list
)
['Lorem', 'Ipsum', 'Dolor sit amet']
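For comparison, the same chain can be written in today's Python by folding the value through the steps with reduce:

```python
from functools import partial, reduce

steps = [
    partial(str.split, maxsplit=2),
    partial(map, str.capitalize),
    list,
]

result = reduce(lambda value, step: step(value), steps, "lorem ipsum dolor sit amet")
print(result)
```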

There is nothing particularly magical that I have thought of for this functionality yet, apart from offering a unified operator (|>) for chaining calls or callables (but not a mix, for obvious reasons) or anything else, which IMHO is already good value. Having a symbol for “chaining” that can accommodate many use cases is a win. Some more exotic ideas probably involve logging / callbacks / transformations / exception handling, etc.

Regarding your question about L - it is no longer necessary, since the expression |> _ is now valid and corresponds to lambda _: _. This is how |> (not _.deleted) works. Does it make sense? However, the other helpers, such as those for method calling or easier creation of partials, still make sense to use with the PartialsPipe if that is the preferred style/design for a given project/programmer.

PS. The Pyodide deployment is up to date if you’d like to test.

Async example

import asyncio


class AsyncPipe:
  def __init__(self, value):
    self.value = value
    self.tasks = []

  async def run(self):
    value = self.value
    for i, (t, fut) in enumerate(self.tasks):
      try:
        value = await t(value)
        fut.set_result(value)
      except Exception as e:
        fut.set_exception(e)
        for _, fut_cancel in self.tasks[i + 1:]:
          fut_cancel.cancel()
        raise
    return value

  def __pipe__(self, rhs, rhs_noinject, last):
    p = rhs_noinject(None)
    fut = asyncio.Future()
    self.tasks.append((p, fut))
    return (self, asyncio.create_task(self.run())) if last else (self, fut)
  

def async_partial(func, *args, **kwargs):
  async def inner(*inner_args, **inner_kwargs):
    await asyncio.sleep(1)
    return func(*args, *inner_args, **kwargs, **inner_kwargs)
  return inner


async def async_list(x):
  await asyncio.sleep(1)
  return list(x)
x = await (
  AsyncPipe("lorem ipsum dolor sit amet")
  |> (_1 := async_partial(str.split, maxsplit=2))
  |> (_2 := async_partial(map, str.capitalize))
  |> (_3 := async_list)
)
>>> x
['Lorem', 'Ipsum', 'Dolor sit amet']
>>> _
<PyodideTask finished name='Task-217' coro=<AsyncPipe.run() done, defined at <console>:5> result=['Lorem', 'Ipsum', 'Dolor sit amet']>
>>> _1
<Future finished result=['lorem', 'ipsum', 'dolor sit amet']>
>>> _2
<Future finished result=<map object at 0xd9a010>>
>>> _3
<PyodideTask finished name='Task-217' coro=<AsyncPipe.run() done, defined at <console>:5> result=['Lorem', 'Ipsum', 'Dolor sit amet']>
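The core of AsyncPipe.run, awaiting each stage in order and feeding the result forward, can be sketched with today's asyncio (the coroutines below are simplified stand-ins for async_partial and async_list, not code from the thread):

```python
import asyncio

async def run_pipeline(value, steps):
    # await each stage in order, feeding the result forward
    for step in steps:
        value = await step(value)
    return value

async def split2(s):
    await asyncio.sleep(0)  # simulate async work
    return s.split(maxsplit=2)

async def capitalize_all(items):
    await asyncio.sleep(0)
    return [s.capitalize() for s in items]

result = asyncio.run(
    run_pipeline("lorem ipsum dolor sit amet", [split2, capitalize_all])
)
print(result)
```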

Clearly these examples will not be fundamentally different from overriding any other operator. However, the added value could indeed be in last, and the newly added support for the walrus operator along with __pipe__() allows neatly capturing all the intermediate async results. @dg-pb, what do you think? Currently, this definitely cannot be done as cleanly with just operator overloading.

I am getting back to PEP writing.

I’ve added use case 1 (None-aware access) and use case 2 (cleaner syntax for layers in deep learning) to the PEP. WIP. As mentioned, the goal is to have a “maximalist” proposal which can then be trimmed and fine-tuned. Please don’t panic (but do let me know) if some things look too exotic or not convincing enough - the goal for now is to explore broadly.

Added use case 3 (expression debugging).

Added use case 4 (code builder) and its twin, use case 5 (SQL query builder). Admittedly this one is very un-Pythonic but potentially very powerful - it grants some degree of correctness checking at parse time rather than at runtime!

Please take a look at use case 5 - it’s pretty exciting although as “un-Pythonic” as 4. Maybe we should reconsider what “Pythonic” actually means. It could be so much cleaner, more readable and more intuitive than what currently goes by the name “Pythonic”. Sometimes “Pythonic” sounds like “assembler” :wink:

Added use case 6 - extract members into local scope.