Introduce funnel operator, i.e. '|>', to allow for generator pipelines

dg-pb · May 18, 2025, 2:52pm

# No extensions in standard library required:
rpipe(pd.read_csv(...)) | MC.query("A > B") | MC.filter@P(items=["A"], _) | MC.to_numpy | MC.flatten | MC.tolist | map@P(lambda x: x + 2) | list | np.array | MC.prod | rpipe
pd.read_csv(...) |x| MC.query("A > B") |x| MC.filter@P(items=["A"], _) |x| MC.to_numpy |x| MC.flatten |x| MC.tolist |x| map@P(lambda x: x + 2) |x| list |x| np.array |x| MC.prod |x| rpipe |x

# Pure addition of `|>` operator
X = MethodCaller()
pd.read_csv(...) |> X.query("A > B") |> X.filter@P(items=["A"], _) |> X.to_numpy |> X.flatten |> X.tolist |> map@P(lambda x: x + 2) |> list |> np.array |> X.prod

# Yours
pd.read_csv{...} |> .query{"A > B"} |> .filter{items=["A"], ?} |> .to_numpy |> .flatten |> .tolist |> map{lambda x: x + 2, ?} |> list |> np.array |> .prod

dg-pb · May 18, 2025, 2:53pm

Using only |> addition in combination with available utilities

(pd.read_csv("my_file")
    |> X.query("A > B")
    |> X.filter@P(items=["A"], _)
    |> X.to_numpy().flatten().tolist()
    |> map@P(lambda x: x + 2)
    |> list
    |> np.array
    |> X.prod()
)
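
For anyone who wants to try the shape of this today, here is a rough emulation in current Python. Everything in it is an assumption made for illustration: `Pipe` is a hypothetical wrapper that uses `|` in place of `|>`, `MethodCaller` is a guess at how `X` could record and replay method chains, and `functools.partial` stands in for `@P`. It is a sketch of the idea, not the proposed implementation.

```python
from functools import partial


class MethodCaller:
    """Hypothetical recorder: remembers attribute lookups and calls, replays them later."""

    def __init__(self, steps=()):
        self._steps = steps

    def __getattr__(self, name):
        if name.startswith("__"):
            raise AttributeError(name)  # keep dunder lookups (copy, pickle, ...) sane
        return MethodCaller(self._steps + (("attr", name),))

    def __call__(self, *args, **kwargs):
        return MethodCaller(self._steps + (("call", args, kwargs),))

    def _replay(self, value):
        steps = self._steps
        # A trailing bare attribute (X.upper) is treated as a zero-argument call.
        if steps and steps[-1][0] == "attr":
            steps = steps + (("call", (), {}),)
        for step in steps:
            if step[0] == "attr":
                value = getattr(value, step[1])
            else:
                _, args, kwargs = step
                value = value(*args, **kwargs)
        return value


class Pipe:
    """Hypothetical wrapper: `Pipe(v) | f` plays the role of `v |> f`."""

    def __init__(self, value):
        self.value = value

    def __or__(self, func):
        if isinstance(func, MethodCaller):
            return Pipe(func._replay(self.value))
        return Pipe(func(self.value))


X = MethodCaller()

result = (
    Pipe(" b,a,c ")
    | X.strip().split(",")       # chained method calls on the piped value
    | sorted
    | partial(map, str.upper)    # stands in for map@P(str.upper)
    | list
    | X.index("B")
).value
print(result)
```

The explicit `Pipe(...)` wrapper and the trailing `.value` are exactly the noise a real binary `|>` would remove.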

sadaszewski · May 18, 2025, 2:54pm

dg-pb:


# Pure addition of `|>` operator
X = MethodCaller()
pd.read_csv(...) |> X.query("A > B") |> X.filter(items=["A"], ?) |> X.to_numpy |> X.flatten |> X.tolist |> map@P(lambda x: x + 2) |> list |> np.array |> X.prod

Nice. However the ? would not come with the |> operator if we follow through with the component approach… it would have to be part of a “new lambda”. PS. I think we should keep placeholders in this example even if they don’t make sense - to maintain generalizability.

To me the lack of MethodCaller and @P is a clear win but YMMV of course. We should also benchmark against the regular

_ = ...
_ = ...(_)

sequence I guess. Character count matters.

dg-pb · May 18, 2025, 2:57pm

I am not sure what that specific case is meant to do. In |> .filter{items=["A"], ?}, where does the value get piped to: is it the object on which the method is called, or is it passed as the second argument of filter?

I removed ? from that place in that post. I think it is a simple method call given your previous examples.

sadaszewski · May 18, 2025, 2:58pm

It’s a dummy example. Should come up with a better one.

dg-pb · May 18, 2025, 3:02pm

Ok, I put them back and added partial to that place. As long as we know that that specific place doesn’t make sense and it is only there to show a placeholder in partial.

sadaszewski · May 18, 2025, 3:09pm

BTW. Operator alone looks increasingly viable with the ____ shorthands etc. Keeping the partials high-level and having to manage symbols like that is not as tidy as I would like - but it could be a start.

dg-pb · May 18, 2025, 3:11pm

Let’s put placeholder in a place that makes sense after all…

(pd.read_csv("my_file")
    |> X.query("A > B")
    |> X.filter(items=["A"])
    |> X.to_numpy().flatten().tolist()
    |> map@P(lambda x: x + 2)
    |> list
    |> np.array@P(_, order='K')
    |> X.prod()
)

sadaszewski · May 18, 2025, 3:15pm

Could you put together an example with argument placeholder(s), keyword argument placeholder(s), star expression placeholder(s) and double star expression placeholder(s) mixed in different sequences? I think it is not necessarily intuitive in which order those would be passed to the resulting call.

Also, do we need ___ = functools.KwdsPlaceholder ? Can’t we just have X.whatever(keyword=functools.Placeholder) ?

Similarly some magic should be possible to be able to do X.whatever(*functools.Placeholder) and X.whatever(**functools.Placeholder) no? Could Placeholder have conversions to list / tuple and dict respectively? Then the tuple / dict should contain a special marker.

dg-pb · May 18, 2025, 3:27pm

This is for a double star expansion. And those are just my initial scribblings, not the final concept. If I were to implement it, a lot of polishing and rethinking would need to be done.

So, as I said, this is only my latest scribblings, but the latest logic that I have in mind is as follows:

_ = Placeholder
__ = StarPlaceholder
___ = DoubleStarPlaceholder

def foo(*args, **kwds):
    return args, kwds

1 |> foo@P(_, 2)      # ((1, 2), {})
[1, 2] |> foo@P(__)   # ((1, 2), {})

1 |> foo@P(2, a=_)    # ((2,), {'a': 1})
{'a': 1, 'b': 2} |> foo@P(___)    # ((), {'a': 1, 'b': 2})

I am still in the process of working out what exactly would happen when all of them are present:

foo@P(_, _, 1, __, ___, a=_, b=_, c=1)

But for piping there is always one and only one placeholder, so not worth going into this too much now.

I see where you are aiming at, and for sure - actually using * as if it was a call would be ideal.
But I think we need a realistic case. It is uncertain what this needs to do:

X.whatever(*functools.Placeholder)

For piping it is either:

foo@P(__)
# or
X.whatever(<complete set of arguments>)
# as X is placeholder here

sadaszewski · May 18, 2025, 3:30pm

dg-pb:

_ = Placeholder
__ = StarPlaceholder
___ = DoubleStarPlaceholder

def foo(*args, **kwds):
    return args, kwds

1 |> foo@P(_, 2)      # ((1, 2), {})
[1, 2] |> foo@P(__)   # ((1, 2), {})

1 |> foo@P(2, a=_)    # ((2,), {'a': 1})
{'a': 1, 'b': 2} |> foo@P(___)    # ((), {'a': 1, 'b': 2})

I’d like to have this:

_ = Placeholder

def foo(*args, **kwds):
    return args, kwds

1 |> foo@P(_, 2)      # ((1, 2), {})
[1, 2] |> foo@P(*_)   # ((1, 2), {})

1 |> foo@P(2, a=_)    # ((2,), {'a': 1})
{'a': 1, 'b': 2} |> foo@P(**_)    # ((), {'a': 1, 'b': 2})

For *_ it’s possible - just checked. ~~**_ is problematic.~~ **_ is also easy after all.

dg-pb · May 18, 2025, 3:33pm

Nice one! Indeed:

class Placeholder:
    ...
    def __iter__(self):
        return iter((StarPlaceholder,))

Yup, might be possible to find a way for this too. But ** is used much less often - wouldn’t be tragic if there was no solution as such.

dg-pb · May 18, 2025, 3:34pm

Tell me please?

sadaszewski · May 18, 2025, 3:38pm

>>> class X(dict):
...   def __init__(self):
...     super().__init__()
...     self['_'] = 'DoubleStarPlaceholder'
...   def __iter__(self):
...     yield 'StarPlaceholder'
...
>>> x=X()
>>> dict(x)
{'_': 'DoubleStarPlaceholder'}
>>> list(x)
['StarPlaceholder']
>>>
>>> def fn(*args, **kwargs):
...   return (args, kwargs)
...
>>> fn(*x, **x)
(('StarPlaceholder',), {'_': 'DoubleStarPlaceholder'})

dg-pb · May 18, 2025, 3:43pm

Unfortunately, _ is a valid argument name… A clash is very unlikely to occur in practice, but a solution with such an edge case is unlikely to be accepted into the stdlib.

def foo(_=1):
     pass

sadaszewski · May 18, 2025, 3:45pm

True, but that’s not the crux of the solution. It can be a dict containing DoubleStarPlaceholder as any value. The key can be None or "" or anything else that does not work as an argument name. Then the interpretation should be done by your partial.

dg-pb · May 18, 2025, 3:47pm

Ok, this works. The key must be a string, so None does not work. So that is on the table:

In [81]: def foo(**kwds):
    ...:     return kwds
    ...:

In [82]: foo(**{'': 1})
Out[82]: {'': 1}

pf_moore · May 18, 2025, 3:54pm

I agree. The |> syntax looks too much like an operator for it to not behave like one.

I think you’re going to get a lot of resistance to anything that’s not simple ASCII. In general, anything that is too punctuation heavy doesn’t look natural in Python.

Personally, I’m fine with |>, but I’d want it to be a normal operator, so I could mix it with method calls the way @petercordia showed. I’d prefer a syntax that could do |> map(~ + 2) or map(lambda x: x + 2) rather than map(lambda x: x + 2, ~).

I’m quite impressed by the basic |> proposal, but at the moment it feels like a prototype, and the final changes needed to make it fit well with the language are going to be the difficult part. I don’t particularly like the direction @dg-pb is going in - for me, that’s moving away from the elegance and simplicity a proposal like this needs to be successful.

dg-pb · May 18, 2025, 4:12pm

I am just trying to keep this as simple as possible. i.e. binary |> operator that behaves as any other and there is nothing particularly special about it. And satisfy the rest of chaining needs via improving utilities without further syntactic changes.

And yes, it is definitely not as elegant as if it was made into a larger specialised construct, but given relative cost, to me, this doesn’t look like a particularly bad direction (keeping in mind that various parts of it can be improved later - further syntactic conveniences, optimizations, new utilities, etc…).

So given all that has been considered here, I would likely be open to extending partial in parallel to |> being implemented as a simple binary operator.

That said, I think by now @sadaszewski and others involved have digested my POV - I am open to exploring other directions. And if there are better ones - that’s great. There are surely ways to make this more elegant - I am just not sure what sort of complexity such additions would require.

elis.byberi · May 18, 2025, 5:35pm

I meant real-life examples, either from CPython or from projects in the wild. These should demonstrate whether the new syntax is truly syntax sugar, making code easier and more readable, or just another syntax without significant benefit.

>>> lst = [1, 2, 3]
>>> for index, item in lst |> enumerate():
...     print(index, item)

>>> lst = [1, 2, 3]
>>> for index, item in enumerate(lst):
...     print(index, item)

Which of these examples is more readable?