`functools.pipe`

1. Proposal:

class pipe:
    def __init__(self, *funcs):
        ...
    def __call__(self, obj, /, *args, **kwds):
        ...
    def append(self, *funcs, inplace=True):
        ...

# No arg case
identity = pipe()
identity(1)    # 1
signature(identity)    # functools.pipe(x)

# First member function can have any signature
def func(foo, bar):
    return foo + bar
new_func = pipe(func, lambda x: x + 3)
signature(new_func)    # functools.pipe(foo, bar)
new_func(1, 2)    # 6

It is a fairly simple addition.
It would probably be the simplest component in the functools module.
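
For readers who want to experiment, the behaviour shown in the examples above can be approximated in pure Python. This is only a sketch: the name `pipe_sketch`, the `funcs` attribute, and the choice to return `self` from an in-place `append` are assumptions, not part of the proposal, and the signature handling is not reproduced.

```python
class pipe_sketch:
    """Pure-Python approximation of the proposed functools.pipe (hypothetical)."""

    def __init__(self, *funcs):
        self.funcs = funcs

    def __call__(self, obj, /, *args, **kwds):
        if not self.funcs:
            # No member functions: behave as the identity function.
            return obj
        first, *rest = self.funcs
        # Only the first function sees the full argument list.
        result = first(obj, *args, **kwds)
        for func in rest:
            result = func(result)
        return result

    def append(self, *funcs, inplace=True):
        if inplace:
            self.funcs += funcs
            return self
        return type(self)(*self.funcs, *funcs)


def func(foo, bar):
    return foo + bar

identity = pipe_sketch()
new_func = pipe_sketch(func, lambda x: x + 3)
print(identity(1))      # 1
print(new_func(1, 2))   # 6
```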

2. Applications

a) Missing identity function (@sayandipdutta’s idea)

No such function exists in the standard library:

  • github /lambda x: x( )?(\)|,)/ - 128k files
# No argument pipeline acts as identity function
pipeline = pipe()
pipeline(1)    # 1

I am not a big fan of defining a lambda for identity, knowing it will be one extra function call. And identity functions tend to be used in places that are called over and over. I suspect pipe() would be ~2x faster than a lambda.
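
To illustrate where such identity functions typically sit, here is a small helper with a `key` parameter that defaults to `lambda x: x`. The helper name `unique_by` is made up for this example; the point is only that the default is called once per element, which is where a cheaper identity function would matter.

```python
def unique_by(items, key=lambda x: x):
    """Yield items whose key has not been seen before (hypothetical helper)."""
    seen = set()
    for item in items:
        k = key(item)  # the identity default is called for every element
        if k not in seen:
            seen.add(k)
            yield item

print(list(unique_by([1, 2, 1, 3, 2])))                  # [1, 2, 3]
print(list(unique_by(["a", "A", "b"], key=str.lower)))   # ['a', 'b']
```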

b) function composition / iterator recipes

This would speed up a certain portion of iterator function calls that take predicates and keys.
E.g.:

lst = [{'a': [2]}, {'a': [1]}, {'a': [0]}]
pred = pipe(itemgetter('a'), itemgetter(0))
list(filter(pred, lst))
# [{'a': [2]}, {'a': [1]}]
sorted(lst, key=pred)
# [{'a': [0]}, {'a': [1]}, {'a': [2]}]

From my experience, a certain number of iterator recipes are not currently competitive, but they would be, given efficient function composition.

Also, it is sometimes the case that function composition would make code a bit more intuitive than alternatives that use iterator functions in ingenious but involved ways.
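
Until something like `pipe` exists, left-to-right composition can be written with `functools.reduce`. This is a pure-Python sketch (the name `compose_left` and the sample data are assumptions), so it shows the readability side of the argument, not the speed side.

```python
from functools import reduce
from itertools import groupby
from operator import itemgetter

def compose_left(*funcs):
    """Return a function applying funcs left to right to one argument."""
    def composed(obj):
        return reduce(lambda acc, f: f(acc), funcs, obj)
    return composed

records = [{'a': [1, 'x']}, {'a': [1, 'y']}, {'a': [2, 'z']}]
first_of_a = compose_left(itemgetter('a'), itemgetter(0))

# Group consecutive records by the first element of their 'a' list.
groups = {k: len(list(g)) for k, g in groupby(records, key=first_of_a)}
print(groups)   # {1: 2, 2: 1}
```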

c) convenient piping

I think this is a missing component in the stdlib, one that would cover the basics of efficient and convenient function composition. It would allow code that is slightly less elegant than some of the proposed ideas to achieve what was asked for in: funnel operator

With 2 minimal classes (one of them a subclass of functools.pipe), a user can enjoy a fairly pleasant syntax that is flexible enough for a large part of use cases, and can choose operators to one’s liking (or not use them at all).

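import functools
import operator as opr
# Assumption: `_` is a partial-application placeholder,
# e.g. functools.Placeholder (Python 3.14+).
from functools import Placeholder as _

class pipe(functools.pipe):
    __ror__ = functools.pipe.__call__
    __rshift__ = functools.pipe.append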

class prtl:
    def __init__(self, *args, **kwds):
        self.args = args
        self.kwds = kwds
    def __call__(self, func):
        return functools.partial(func, *self.args, **self.kwds)
    __rmatmul__ = __call__

result = 1 | (pipe()
    >> opr.sub@prtl(_, 2)
    >> opr.neg
    >> opr.getitem@prtl({1: 'a'})
)
result     # 'a'

pipeline = pipe(
    opr.sub@prtl(_, 2),
    opr.neg,
    opr.getitem@prtl({1: 'a'})
)
# Or
pipeline = (pipe()
    >> opr.sub@prtl(_, 2)
    >> opr.neg
    >> opr.getitem@prtl({1: 'a'})
)
pipeline(1)    # 'a'
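
Since `functools.pipe` does not exist yet, the operator idea can be tried out against a pure-Python stand-in. In this sketch `base_pipe` is a hypothetical substitute for the proposed class, and `opr.add@prtl(-2)` replaces the placeholder-based `opr.sub` step so that it runs without a placeholder object.

```python
import functools
import operator as opr

class base_pipe:
    """Pure-Python stand-in for the proposed functools.pipe (hypothetical)."""
    def __init__(self, *funcs):
        self.funcs = funcs
    def __call__(self, obj, /, *args, **kwds):
        for i, func in enumerate(self.funcs):
            obj = func(obj, *args, **kwds) if i == 0 else func(obj)
        return obj
    def append(self, *funcs, inplace=True):
        if inplace:
            self.funcs += funcs
            return self
        return type(self)(*self.funcs, *funcs)

class pipe(base_pipe):
    __ror__ = base_pipe.__call__       # value | pipeline  -> pipeline(value)
    __rshift__ = base_pipe.append      # pipeline >> func  -> pipeline.append(func)

class prtl:
    def __init__(self, *args, **kwds):
        self.args = args
        self.kwds = kwds
    def __call__(self, func):
        return functools.partial(func, *self.args, **self.kwds)
    __rmatmul__ = __call__             # func @ prtl(...)  -> partial(func, ...)

pipeline = (pipe()
    >> opr.add@prtl(-2)                # x -> -2 + x
    >> opr.neg                         # x -> -x
    >> opr.getitem@prtl({1: 'a'})      # x -> {1: 'a'}[x]
)
print(pipeline(1))     # a
print(1 | pipeline)    # a
```

Note that `@` binds more tightly than `>>`, so each `func@prtl(...)` is turned into a partial before it is appended.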

3. PyPI:

  1. function-pipes · PyPI
  2. function-pipe · PyPI
  3. functional-pipeline · PyPI
  4. GitHub - JulienPalard/Pipe: A Python library to use infix notation in Python
  5. funcpipe · PyPI

All of the above are written in pure Python.
Some of them are simple, some are complex.
IMO the above solution has nicer syntax than most of those.
Furthermore, it would be more efficient (with the exception of the one that attempts to create a new function by parsing ASTs).

This produces a ton of false positives, including lambda x: x[1], lambda x: x != '', and various others. Of the first page of search results that I browsed, only ONE result was actually an identity function.

Thanks! Ok, here is a number that should not have any false positives:

  • /lambda x: x( )?(\)|,)/ - 128k files

Plus ~5K more in total with a, arg, y, z, or foo as the parameter name.

I don’t get it. How would new_func(1, 2, 3) be handled to result in 6, and how would list(filter(pred) be handled to result in 2, 1, 0?

They wouldn’t… Just mistakes. Thank you.

list(filter(pred) still can’t work. Missing a ) and probably lst.

If you “describe” a feature not with text but just “by examples”, better make sure they’re correct. (And I think at least a little text describing it should be there anyway.)

Little benchmark for iterators.

OSX. Python 3.12

from collections import deque
from operator import itemgetter
import cython

BHOLE = deque(maxlen=0).extend
iga = itemgetter('a')
ig0 = itemgetter(0)
lst = [{'a': [0]}] * 100_000

@cython.compile
def pipe(*funcs):
    def inner(obj):
        for f in funcs:
            obj = f(obj)
        return obj
    return inner

%timeit BHOLE(map(ig0, map(iga, lst)))          # 5.50 ms
%timeit BHOLE(map(lambda x: x['a'][0], lst))    # 8.70 ms
%timeit BHOLE(map(pipe(iga, ig0), lst))         # 7.40 ms

A C implementation should be a bit faster than Cython.

Although double-map is the fastest, its code is the least readable IMO.

So pipe could potentially find its uses where readability is more important than speed, while still being slightly faster than a lambda.
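
For anyone without IPython or Cython at hand, roughly the same comparison can be run with the stdlib `timeit` module. This is a sketch, not a reproduction of the numbers above: `py_pipe` is a plain pure-Python composition (so it should be slower than the Cython-compiled version), and the list size and repeat counts are arbitrary choices.

```python
import timeit
from collections import deque
from operator import itemgetter

consume = deque(maxlen=0).extend   # exhausts an iterator, discarding items
iga = itemgetter('a')
ig0 = itemgetter(0)
lst = [{'a': [0]}] * 10_000

def py_pipe(*funcs):
    """Pure-Python composition, used here in place of a compiled pipe."""
    def inner(obj):
        for f in funcs:
            obj = f(obj)
        return obj
    return inner

candidates = {
    'double map': lambda: consume(map(ig0, map(iga, lst))),
    'lambda':     lambda: consume(map(lambda x: x['a'][0], lst)),
    'py_pipe':    lambda: consume(map(py_pipe(iga, ig0), lst)),
}
for name, stmt in candidates.items():
    # Best of 3 runs, each making 20 passes over the list.
    best = min(timeit.repeat(stmt, number=20, repeat=3))
    print(f'{name:>10}: {best / 20 * 1e3:.3f} ms per pass')
```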

Readability: Plain non-functional code would read even better than pipe example.

Flexibility: Functional code needs more than a chain of maps. Along the way, pipelines need starmap, filter, filterfalse, islicing, teeing, and grouping. The functools.pipe does not support any of that, nor does it handle keyword arguments or multiple positional arguments.

Debuggability: Mushing the calls into a single call would make debugging challenging, because it becomes hard to add logging or breakpoints.

This concept is for single argument function composition. Not to be confused with iterator piping.

Having said that, composing iterator functions via function composition achieves linear iterator piping.

E.g.:

pipeline = pipe(
    partial(filter, lambda x: x % 2),
    partial(map, lambda x: x/2),
    list
)
pipeline([0, 1, 2, 3, 4])
# [0.5, 1.5]

For more complex iterator piping (multiple argument function composition) there is a FunctionType for that.

For outer (outside of member functions) logging / debugging, one can write a function to mix into a pipeline. E.g.:

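def log_debug(x, is_iterator=False):
    # Non-iterator case: inspect the value and pass it through unchanged.
    if not is_iterator:
        breakpoint()
        print(x)
        return x

    # Iterator case: a nested generator keeps log_debug a regular function,
    # so the branch above can really return x.
    def logged():
        for el in x:
            breakpoint()
            print(el)
            yield el
    return logged()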
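
Mixed into a pipeline, such a pass-through step might look as follows. This sketch uses `print` only (no `breakpoint()`), and `compose_left` and `tap` are hypothetical helper names, with `compose_left` standing in for the proposed `pipe`.

```python
from functools import partial

def compose_left(*funcs):
    """Apply funcs left to right to a single argument."""
    def composed(obj):
        for f in funcs:
            obj = f(obj)
        return obj
    return composed

def tap(label, iterable):
    """Lazily print each element as it flows past, then pass it on unchanged."""
    for el in iterable:
        print(f'{label}: {el!r}')
        yield el

pipeline = compose_left(
    partial(filter, lambda x: x % 2),
    partial(tap, 'after filter'),      # logging step between filter and map
    partial(map, lambda x: x / 2),
    list,
)
result = pipeline([0, 1, 2, 3, 4])
print(result)   # [0.5, 1.5]
```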

@sobolevn

  • Performance. Please, compare your proposal with direct function calls. It should not be much slower. Maybe some parts can be written in C to make this faster?

The proposal is to implement this class in C. That is the whole point. A pure Python implementation is too slow for some applications, e.g. map predicate composition.

  • Typing. Type safety would be a strong point to add this feature. Because regular function calls are type safe. So, if you switch from type safe code to unsafe one - that’s a minus.

Type safety of this component would be in line with other components of functools.

I think type safety should be addressed for the whole functools module separately.

  • DSL discussion.

This proposal doesn’t introduce a DSL. I only gave examples of how a user could implement operators to achieve something similar to what was proposed in Introduce funnel operator i.e. '|>' to allow for generator pipelines.

  • Async support? This is optional, but it would be great to read the discussion about this topic. Will it work? Do we need any specific changes in the design?

This also will be in line with other components of functools. Do any other functools components provide specific async support?

I have been playing with a pure Python implementation of this for a fair while, and I am at a point where I am happy with my proposed concept.

However, the issue was closed due to “insufficient discussion”.

So if someone likes this I am ready to discuss and address concerns.

I’m neutral to negative on this. It seems reasonable enough, but I doubt I’d use it much in practice. And there seems to be very little evidence that this change would actually be worth it.

I know your examples were “toy” ones, but I’d write them as

def new_func(foo, bar):
    return func(foo, bar) + 3
def pred(el):
    return el["a"][0]

My versions seem to me to be obviously clearer than the ones using pipe.

Do you have any compelling examples of (relatively) complex real-world code that would be improved by rewriting using the new pipe function? And by “improved”, I specifically don’t accept that function composition is by definition better than procedural code.

Just to note, you aren’t defining a “pipe”; this is just ordinary function composition. Any comparison with shell pipes is dubious, because shell pipes read from and write to a file handle, rather than taking arguments and returning a value.

The proposed | overload is just function application, and >> is just an operator version of function composition.

I think you might not appreciate how small this proposal is.

For its size, the benefits across different applications seem non-trivial to me.

Yes, the concept at its core is a function composition. And naming is in line with equivalent tools in other programming languages. Also, it would be in functools namespace, so that would make it clear that it is a function composition pipe.

No, I understand it’s small. But it’s not negligible, and it’s up to you to provide justification. I’m just saying that I don’t see that justification.

Nobody’s doubting that you see the benefit. You need to show it to others if you want to get the proposal accepted.

Ok, I see. Maybe I can make a better effort here.

So I think it would be best to compare it to some other proposal.

Let’s take `itemgetter` split into 2 objects - #27 by dg-pb, which I am not sure has been officially accepted, but there seems to be an agreement for it being useful addition.

By my best estimate, the implementation of functools.pipe is about 1.5x larger than that of operator.itemtuplegetter, and its “invasiveness” is also ~1.5x larger in comparison. The additional “invasiveness” comes from a few extra lines of code needed in the inspect module, but that pretty much covers it.

Benefits of operator.itemtuplegetter can be summed up to:

  1. Consistent return value as opposed to current operator.itemgetter
  2. Performance benefit compared to a solution where one wraps operator.itemgetter in pure Python to get the behaviour of the proposed operator.itemtuplegetter.

While benefits of functools.pipe:

  1. slightly faster identity function
  2. Creation of efficient pipelines, where users can implement operators to get the DSL they prefer. This has been requested and proposed several times, and the suggested solutions were custom statement implementations that take tens of times more work and code than this proposal.
  3. Utility for predicate piping, which would not be the best solution in all cases, but would be the preferred approach in certain cases.
  4. Since pipe would have the signature of its 1st callable, it would offer convenience for method customization (see github issue)
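
The signature point (item 4) can be sketched in pure Python with a `__signature__` property, which `inspect.signature` consults. The class name `pipe_with_signature` is hypothetical, and the empty-pipeline signature of one positional-only `x` is an assumption based on the `functools.pipe(x)` example in the first post.

```python
import inspect

class pipe_with_signature:
    """Hypothetical stand-in showing only the signature behaviour."""
    def __init__(self, *funcs):
        self.funcs = funcs

    def __call__(self, obj, /, *args, **kwds):
        for i, func in enumerate(self.funcs):
            obj = func(obj, *args, **kwds) if i == 0 else func(obj)
        return obj

    @property
    def __signature__(self):
        if self.funcs:
            # Report the parameters of the first member function.
            return inspect.signature(self.funcs[0])
        # Empty pipeline: a single positional-only argument.
        x = inspect.Parameter('x', inspect.Parameter.POSITIONAL_ONLY)
        return inspect.Signature([x])

def func(foo, bar):
    return foo + bar

new_func = pipe_with_signature(func, lambda x: x + 3)
print(inspect.signature(new_func))                # (foo, bar)
print(inspect.signature(pipe_with_signature()))   # (x, /)
```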

So these aren’t large benefits, but in contrast to other proposals, I don’t see anything unreasonable given its benefit-cost ratio.

Cost is fairly minimal, while there is a non-trivial benefit: it is a fairly universal modular component that integrates well into Python’s toolkit in more than one aspect, without ambiguities.

The biggest benefit, IMO, is that this would lift the weight from pipeline/funnel syntax/operator proposals. These take time and effort (both from people requesting/proposing and from people answering). With this, there would be a good-enough take-it-and-play-with-it-as-you-like solution that would pretty much cover the most common needs.

I would say 50% of benefit is the above and the rest combined comprises remaining 50%.

You certainly need to if you expect to convince a core developer to support the proposal and merge your PR (I assume your plan is to write a PR for this at some point?)

I explicitly said that what would be best is to show real-world examples of code that would be improved by using the proposed function. I stand by that - if you can’t find any existing code that your new function would improve, then IMO your proposal is dead in the water.

I’m not convinced. All of what you said is theoretical, and a significant amount of it is basically just your opinion. I want to see real-world code, not comparisons with other proposals of yours.

As it stands, I’m not actively against the proposal, I simply don’t care about it. And there are a lot of things I do care about, so I have no intention of spending my limited free time supporting this proposal. The most I’m willing to do is give you some guidance on how to put together a more convincing argument - but if you prefer to ignore that advice, then that’s fine.

Could you pinpoint where you gave advice and how I ignored it?

Here are some

@jamestwebber just quoted the exact two places I was going to quote!