`functools.pipe`

1. Proposal:

class pipe:
    def __init__(self, *funcs):
        ...
    def __call__(self, obj, /, *args, **kwds):
        ...
    def append(self, *funcs, inplace=True):
        ...

# No arg case
identity = pipe()
identity(1)    # 1
signature(identity)    # functools.pipe(x)

# First member function can have any signature
def func(foo, bar):
    return foo + bar
new_func = pipe(func, lambda x: x + 3)
signature(new_func)    # functools.pipe(foo, bar)
new_func(1, 2)    # 6

It is a fairly simple addition.
It would probably be the simplest component in the functools module.
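To make the intended behaviour concrete, here is a minimal pure-Python sketch of the class outlined above. It is only a stand-in under stated assumptions, not the proposed C implementation: the `funcs` attribute, the `TypeError` raised for an empty pipeline called with extra arguments, and `append` returning the pipeline are all guesses that the proposal does not spell out.

```python
class pipe:
    """Pure-Python stand-in for the proposed functools.pipe (a sketch)."""

    def __init__(self, *funcs):
        self.funcs = list(funcs)

    def __call__(self, obj, /, *args, **kwds):
        if not self.funcs:
            # Assumption: the empty pipeline is a strict one-argument identity.
            if args or kwds:
                raise TypeError("empty pipe takes exactly one argument")
            return obj
        first, *rest = self.funcs
        # Only the first member function receives the extra arguments.
        obj = first(obj, *args, **kwds)
        for f in rest:
            obj = f(obj)
        return obj

    def append(self, *funcs, inplace=True):
        # Assumption: append returns the pipeline so that calls can be chained.
        if inplace:
            self.funcs.extend(funcs)
            return self
        return type(self)(*self.funcs, *funcs)


def func(foo, bar):
    return foo + bar

new_func = pipe(func, lambda x: x + 3)
```

With this sketch, `pipe()(1)` returns `1` and `new_func(1, 2)` returns `6`, matching the examples above.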

2. Applications

a) Missing identity function (@sayandipdutta’s idea)

No such function exists in the standard library:

  • GitHub search for /lambda x: x( )?(\)|,)/ - 128k files

# No argument pipeline acts as identity function
pipeline = pipe()
pipeline(1)    # 1

I am not a big fan of defining a lambda for identity, knowing that it adds one extra function call. And identity functions tend to be used in places that are called over and over. I suspect pipe() would be ~2x faster than the lambda.

b) function composition / iterator recipes

This would speed up some of the iterator function calls that take predicates or keys.
E.g.

lst = [{'a': [2]}, {'a': [1]}, {'a': [0]}]
pred = pipe(itemgetter('a'), itemgetter(0))
list(filter(pred, lst))
# [{'a': [2]}, {'a': [1]}]
sorted(lst, key=pred)
# [{'a': [0]}, {'a': [1]}, {'a': [2]}]

In my experience, a certain number of iterator recipes are not currently competitive, but they would be, given efficient function composition.

Also, function composition sometimes makes code a bit more intuitive than alternatives that use iterator functions in ingenious but involved ways.

c) convenient piping

I think this is the component the stdlib is missing to cover the basics of efficient and convenient function composition. It would make it possible to achieve what the funnel operator idea aims for, with code that is slightly less elegant than some of the ideas suggested there.

With two minimal helper classes, a user gets a fairly pleasant syntax that is flexible enough for a large share of use cases, and can choose operators to their liking (or not use them at all).

class pipe(functools.pipe):
    __ror__ = functools.pipe.__call__
    __rshift__ = functools.pipe.append

class prtl:
    def __init__(self, *args, **kwds):
        self.args = args
        self.kwds = kwds
    def __call__(self, func):
        return functools.partial(func, *self.args, **self.kwds)
    __rmatmul__ = __call__

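# Assumes `import operator as opr`, and that `_` is a positional
# placeholder understood by functools.partial.
result = 1 | (pipe()
    >> opr.sub@prtl(_, 2)
    >> opr.neg
    >> opr.getitem@prtl({1: 'a'})    # getitem, since contains would return True
)
result     # 'a'

pipeline = pipe(
    opr.sub@prtl(_, 2),
    opr.neg,
    opr.getitem@prtl({1: 'a'})
)
# Or
pipeline = (pipe()
    >> opr.sub@prtl(_, 2)
    >> opr.neg
    >> opr.getitem@prtl({1: 'a'})
)
pipeline(1)    # 'a'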
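Since neither `functools.pipe` nor the `_` placeholder can be assumed to exist, the following is a self-contained sketch of the same idea that runs on current Python. `basepipe` is a hypothetical pure-Python stand-in for the proposed class, and the first step uses `opr.add` with `-2` instead of `opr.sub` with a placeholder; the final step uses `opr.getitem` for the dictionary lookup.

```python
import operator as opr
from functools import partial


class basepipe:
    """Hypothetical stand-in for the proposed functools.pipe."""

    def __init__(self, *funcs):
        self.funcs = list(funcs)

    def __call__(self, obj, /, *args, **kwds):
        for i, f in enumerate(self.funcs):
            # Only the first member function receives the extra arguments.
            obj = f(obj, *args, **kwds) if i == 0 else f(obj)
        return obj

    def append(self, *funcs, inplace=True):
        if not inplace:
            return type(self)(*self.funcs, *funcs)
        self.funcs.extend(funcs)
        return self


class pipe(basepipe):
    __ror__ = basepipe.__call__     # value | pipeline
    __rshift__ = basepipe.append    # pipeline >> func


class prtl:
    def __init__(self, *args, **kwds):
        self.args = args
        self.kwds = kwds

    def __call__(self, func):
        return partial(func, *self.args, **self.kwds)

    __rmatmul__ = __call__          # func @ prtl(...)


result = 1 | (pipe()
    >> opr.add@prtl(-2)             # x -> -2 + x
    >> opr.neg                      # x -> -x
    >> opr.getitem@prtl({1: 'a'})   # key -> {1: 'a'}[key]
)
```

Here `result` is `'a'`: the value goes 1, -1, 1, and is then looked up in the dictionary. Note that `>>` mutates the pipeline in place in this sketch, because `append` defaults to `inplace=True`.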

3. PyPI:

  1. function-pipes · PyPI
  2. function-pipe · PyPI
  3. functional-pipeline · PyPI
  4. GitHub - JulienPalard/Pipe: A Python library to use infix notation in Python
  5. funcpipe · PyPI

All of the above are written in pure Python.
Some of them are simple, some are complex.
IMO the solution above has nicer syntax than most of them.
Furthermore, it would be more efficient (with the exception of the one that attempts to create a new function by parsing ASTs).

This produces a ton of false positives, including lambda x: x[1], lambda x: x != '', and various others. Of the first page of search results that I browsed, only ONE result was actually an identity function.

Thanks! Ok, here is a number that should not have any false positives:

  • /lambda x: x( )?(\)|,)/ - 128k files

Plus ~5K more in total when the argument is named a, arg, y, z or foo.

I don’t get it. How would new_func(1, 2, 3) be handled to result in 6, and how would list(filter(pred) be handled to result in 2, 1, 0?

They wouldn’t… Just mistakes. Thank you.

list(filter(pred) still can’t work. Missing a ) and probably lst.

If you “describe” a feature not with text but just “by examples”, better make sure they’re correct. (And I think at least a little text describing it should be there anyway.)

A little benchmark for iterators.

OSX. Python 3.12

from collections import deque
from operator import itemgetter
import cython

BHOLE = deque(maxlen=0).extend
iga = itemgetter('a')
ig0 = itemgetter(0)
lst = [{'a': [0]}] * 100_000

@cython.compile
def pipe(*funcs):
    def inner(obj):
        for f in funcs:
            obj = f(obj)
        return obj
    return inner

%timeit BHOLE(map(ig0, map(iga, lst)))          # 5.50 ms
%timeit BHOLE(map(lambda x: x['a'][0], lst))    # 8.70 ms
%timeit BHOLE(map(pipe(iga, ig0), lst))         # 7.40 ms

C implementation should be a bit faster than Cython.

Although the double map is the fastest, its code is the least readable IMO.

So pipe could potentially find its use where readability is more important than speed, while still being slightly faster than a lambda.
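For anyone who wants to rerun the comparison without IPython or Cython, here is a stdlib-only sketch using `timeit`. The `pipe` here is a plain Python closure, so it will be slower than the Cython-compiled one measured above; `candidates` and `bench` are hypothetical helper names, and no timings are claimed.

```python
import timeit
from collections import deque
from operator import itemgetter

consume = deque(maxlen=0).extend    # exhausts an iterator, like BHOLE above
iga = itemgetter('a')
ig0 = itemgetter(0)


def pipe(*funcs):
    """Pure-Python stand-in for the proposed functools.pipe."""
    def inner(obj):
        for f in funcs:
            obj = f(obj)
        return obj
    return inner


def candidates(lst):
    """Map a label to a factory for each of the three equivalent iterators."""
    return {
        "double map": lambda: map(ig0, map(iga, lst)),
        "lambda": lambda: map(lambda x: x['a'][0], lst),
        "pipe": lambda: map(pipe(iga, ig0), lst),
    }


def bench(size=100_000, number=20, repeat=5):
    """Return the best time per pass, in milliseconds, for each candidate."""
    lst = [{'a': [0]}] * size
    results = {}
    for name, make in candidates(lst).items():
        times = timeit.repeat(lambda: consume(make()),
                              number=number, repeat=repeat)
        results[name] = min(times) / number * 1e3
    return results
```

Calling `bench()` returns a dict such as `{"double map": ..., "lambda": ..., "pipe": ...}` whose values depend on the machine and Python version.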

Readability: Plain non-functional code would read even better than pipe example.

Flexibility: Functional code needs more than a chain of maps. Along the way, pipelines need starmap, filter, filterfalse, islicing, teeing, and grouping. The proposed functools.pipe does not support any of that, nor does it handle keyword arguments or multiple positional arguments.

Debuggability: Mushing the calls into a single call would make debugging challenging, since it becomes hard to add logging or breakpoints.

This concept is for single argument function composition. Not to be confused with iterator piping.

That said, composing iterator functions via function composition achieves linear iterator piping.

E.g.:

pipeline = pipe(
    partial(filter, lambda x: x % 2),
    partial(map, lambda x: x/2),
    list
)
pipeline([0, 1, 2, 3, 4])
# [0.5, 1.5]

For more complex iterator piping (multiple argument function composition) there is a FunctionType for that.

For outer (outside of member functions) logging / debugging, one can write a function to mix into a pipeline. E.g.:

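def log_debug(x, is_iterator=False):
    if is_iterator:
        # Build the generator in a nested function so that log_debug itself
        # stays a plain function and the other branch can return x.
        def logged():
            for el in x:
                breakpoint()
                print(el)
                yield el
        return logged()
    breakpoint()
    print(x)
    return x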
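As an illustration of mixing such a stage into a pipeline, here is a small sketch without breakpoints. `tap` is a hypothetical helper, and `pipe` is again a pure-Python stand-in for the proposed class (single-argument case only).

```python
from functools import partial


def pipe(*funcs):
    """Pure-Python stand-in for the proposed functools.pipe."""
    def inner(obj):
        for f in funcs:
            obj = f(obj)
        return obj
    return inner


def tap(label, sink=print):
    """Return a pass-through stage that reports the value it sees."""
    def stage(x):
        sink(f"{label}: {x!r}")
        return x
    return stage


seen = []    # collected instead of printed, to keep the example quiet
pipeline = pipe(
    partial(filter, lambda x: x % 2),
    list,                               # materialise so the value can be logged
    tap("after filter", seen.append),
    partial(map, lambda x: x / 2),
    list,
    tap("after map", seen.append),
)
result = pipeline([0, 1, 2, 3, 4])
```

After the call, `result` is `[0.5, 1.5]` and `seen` holds one message per tapped stage. The intermediate `list` calls are only there so that the logged values are readable; for lazy iterators a per-element wrapper like the generator branch of `log_debug` would be needed instead.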