`functools.pipe` - Function Composition Utility

I think this is up for debate.

I understand why users of typing are wary of such things, but at the same time I know very good developers who don’t use typing.

I think it largely depends on the situation. E.g. for small expert teams, typing would potentially just slow things down.

And the CPython standard library itself is not using typing… So although I understand the POV of those who care about such things, to me it seems that while keeping typing in mind is important, the weight given to it in decisions like this should be much smaller.

And also, rather than new things being blocked because they are “hard to type”, such things could instead serve as pointers to where typing could / should be improved.

Yes, that makes sense. I would need to check what type checkers do for `partial`.
But I would rather see typing extensions for “conditionals”.

Also, I have just realised there is one more nuance in the proposed variation:

f = pipe()
# is the same as:
f = lambda x: x

To me this seems like a fun puzzle for typing experts. :slight_smile:

P = ParamSpec('P')
I = TypeVarIter('I')

class pipe[**P]:
    def __new__(cls, *funcs: Tuple[Callable[P, T[0]], Callable[T[I], T[I+1], 0, None]] | None[0]) -> Self: ...

    def __call__(self, *args: P.args if T.min == 0 else T, **kwargs: None if None[0] else P.kwargs) -> T1 if None[0] else T[0]

:slight_smile:

I do as well. Being a good Python developer and using typing are fully orthogonal.

I am very firmly not saying that everyone should or must use a type checker. What I’m saying is that anyone thinking about new features for the language and stdlib ought to consider what they are asking of type checkers and typing design.

You don’t have to agree with that, but I want to be very clear that this opinion is about language features, not language users.

I agree. It’s no different than asking how a new feature interacts with any other existing language feature. If a new feature didn’t work well with multiple inheritance, that would be a problem. Replace “multiple inheritance” with typing.

Typing is a first class language feature, just like multiple inheritance. You don’t have to use multiple inheritance if you prefer not to either.

1 Like

It is and it isn’t.

Multiple inheritance doesn’t penetrate every aspect of the language, while typing does.

Multiple inheritance is a part with which all other parts need to play nicely.

While typing is an extra layer on the whole.

There is a POV from which these are the same.

However, in practice I think the implications are different. Failing to effectively extend something which is an “extra layer on the whole” might have a much higher impact on progress.

I am not sure what exact difference it currently makes, but I think there should be a pronounced emphasis on developing typing to the degree that it doesn’t obstruct progress.

And I believe exactly that might be happening, but as I am not exposed to conversations on high-level strategy, I just don’t know.

I suppose this might be a sensible goal for typing.


I am just saying that if this violated “multiple inheritance”, it would be a definite no-go.

While given this is typing, a very valid POV is: “OK, we need to put more emphasis on typing to accommodate more of the language.”

And yes, this is a standard library extension proposal, but it has been shown that people are writing exactly the same thing in their own libraries, and typing cannot support such “perfectly valid language constructs”.



So I am happy to put this on hold until the right time, but I don’t think typing should be the decisive factor for the fate of this. Rather, this could serve as a pointer to where typing needs improving.

This is what has been done for typing for a near-identical utility: Expression/expression/core/compose.py at main · dbrattli/Expression · GitHub

1 Like

Yeah, it is quite a neat way to work around the limitations of the type system (up to n=9). Do you know if it works robustly?

So I am happy to put this on hold until the right time, but I don’t think typing should be the decisive factor for the fate of this. Rather, this could serve as a pointer to where typing needs improving.

Personally I think the benefits of type safety outweigh the benefits of piping by a large margin, so I would never use this feature if it were not compatible with the type system.

1 Like

No idea. To me this is just an example of how I would not like to do it.

However, it might be a good enough temporary solution.

I am still fairly convinced that this could be a good addition to functools. I think it is the next best addition to complement the basic set of utilities for robust functional programming.

I have laid out the benefits in several iterations. I will not repeat those, but in short:

  1. performance
  2. serialisable ad-hoc compositions (sketched below)
  3. utility to build upon for syntactic piping
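
On benefit 2, a minimal sketch of what “serialisable” buys, assuming a class-based pipe roughly like the current version below (the class body here is a stand-in, not the proposed implementation): a lambda composition cannot be pickled, while a pipe of module-level functions can.

import pickle

class pipe:                        # stand-in for the proposed class
    def __init__(self, *funcs):
        self.funcs = funcs
    def __call__(self, obj, /):
        for f in self.funcs:       # assuming left-to-right application
            obj = f(obj)
        return obj

def double(x):
    return 2 * x

def increment(x):
    return x + 1

f = pipe(double, increment)
assert f(3) == 7

# pickle.dumps(lambda x: increment(double(x)))  # raises PicklingError
g = pickle.loads(pickle.dumps(f))               # round-trips fine
assert g(3) == 7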

The soft conclusion of Introduce funnel operator i,e '|>' to allow for generator pipelines - #358 by jamsamcam was to:

  1. Try to see if this can end up in stdlib
  2. Concentrate efforts on macros

I still can’t tell what the main reason for the pushback is:

  • typing, which was the latest focus
  • Some other issues that others sense intuitively but haven’t been able to articulate yet
  • This is simply more undesirable than desirable

Regarding typing, there are solutions, such as Expression/expression/core/compose.py at main · dbrattli/Expression · GitHub. They aren’t perfect, but in the grand scheme of things they could be a good enough temporary solution (I am sure there are worse cases than this) until typing is able to handle such cases completely.


If this got traction, I would iterate it a few more times, but the current version is:

class pipe:
    def __init__(self, *funcs): ...
    def __call__(self, obj, /, *args, **kwds): ...  # args, kwds go to the first `func`
    def __get__(self, obj, objtype=None): ...       # in line with `partial`
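
For concreteness, a minimal sketch of how those three methods might behave. The left-to-right application order, forwarding extra arguments to the first func only, and the method-binding behaviour are my assumptions read off the signatures above, not a definitive implementation:

from types import MethodType

class pipe:
    def __init__(self, *funcs):
        self.funcs = funcs

    def __call__(self, obj, /, *args, **kwds):
        for f in self.funcs:
            obj = f(obj, *args, **kwds)  # extra args reach the first func only
            args, kwds = (), {}
        return obj                       # pipe() acts as the identity function

    def __get__(self, obj, objtype=None):
        if obj is None:                  # descriptor protocol, so a pipe stored
            return self                  # on a class binds like a method
        return MethodType(self, obj)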

To use operators, for the time being, users would need to subclass and implement them themselves (a sketch follows below). If over a (long) time some convention becomes generally accepted, operators could potentially be added later.
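
For example, a hypothetical subclass adding | chaining on top of the sketch above (both the operator choice and the name opipe are illustrative, not part of the proposal):

class opipe(pipe):
    def __or__(self, other):
        return type(self)(*self.funcs, other)

clean = opipe(str.strip) | str.casefold
assert clean("  MiXeD  ") == "mixed"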

Thus, poll time: functools.pipe?

  • Yes, functools.pipe.
  • Yes, but functools.<other name>.
  • Yes, but wait until typing is evolved enough to handle this case better.
  • Maybe, I am not convinced yet. Let’s wait and think more about this.
  • No, I don’t think this is useful enough. (some evidence contradicting the benefits, findings, examples, and use cases presented so far would be appreciated)
  • No, this is a bad idea. (please tell more)
  • Other. (care to elaborate?)
0 voters

I don’t have any evidence like that. The evidence I have has been presented already, and no case has (IMO) been provided refuting that evidence. Issues with typing are one thing. Another is that I don’t see enough compelling use cases (you’ve presented use cases; I’m telling you I don’t find them compelling). And I haven’t seen any real evidence that this can’t live outside the stdlib.

I didn’t vote “let’s wait and think more about this” as I don’t think there’s any point repeatedly coming back to this. I’d prefer “Let’s drop this - if it’s really important, someone else will bring up the idea in the future, and we can revisit it if that happens”. But that wasn’t an option.

2 Likes

I’m pretty certain my reasons for saying no have also already been posted, but for the sake of completeness: I have yet to see any of these map(filter(partial(operator.attrgetter(), itertools.what_on_earth_is_a_starmap), more_itertools.make_my_head_hurt())) concoctions that couldn’t be written more legibly as a comprehension or loop without loss of performance [1]. Adding pipe to the arsenal feels like it would encourage more of them.
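
(To make the comparison concrete with a toy example of my own, hypothetical data included:)

raw_names = ["  alice ", "", "BOB", "  "]

# functional chain:
names = list(map(str.title, filter(str.strip, map(str.strip, raw_names))))

# the comprehension equivalent, which I find more legible:
names = [s.strip().title() for s in raw_names if s.strip()]

assert names == ["Alice", "Bob"]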


  1. not that I’m likely to care even if I did lose a few microseconds ↩︎

3 Likes


Would the comprehension equivalent of your example be simpler? Legibility is also a matter of perception: if you’re trying to implement these kinds of pipe-like operations with functional constructs, the readability of the code will depend on a few things, foremost probably indentation, comments, etc. Functional expressions are often nicer than the imperative equivalents, and much more elegant.

And I should also add that there are many legitimate use cases for pipe operations in the real world: a lot of data engineering / ETL code is just pipes, featuring a sequence of transformations, each transformation implemented by chaining multiple smaller steps into a single statement and consuming the output of the previous transformation. E.g. there’s a lot of Pandas code that can be written this way. Provided it is indented and commented properly, it is quite readable, and nice too.

Here are two examples from recent work projects, the first from catastrophe loss modelling:

(
    pd.merge_asof(
        sorted_loss_tiv_ratios_df,
        enhancement_params_df,
        left_on='loss_tiv_ratio',
        right_on='mean_damage_ratio',
        direction='nearest'
    )
    .set_index('loss_tiv_ratio')
    .loc[loss_tiv_ratios, :]
    .reset_index(drop=True)
    ['enhancement_factor']
)

A second example (also from catastrophe loss modelling):

(
    hazard_table
    .drop(columns=drop_columns)
    .loc[1:, keep_columns]
    .astype(float)
    .fillna(0)
    .rename(columns=output_columns)
    .reset_index(drop=True)
)

I won’t attempt the imperative equivalent of this in Pandas, but it won’t look pretty: when you see people doing it, it often involves iterating over rows and performing row-wise operations. Slow and horrible.

My examples use Pandas, but I’m trying to draw an analogy with piped operations in general. What the OP (@dg-pb) proposes would, I believe, enable users to write this kind of piped code, where you can build and compose several pipelines cleanly into one, in a very concise way.

I’ve still not seen convincing evidence that the proposal solves a problem which is not better solved by the def keyword. It gives you concision, but in exchange for that, I think you lose a lot of clarity.

It’s telling to me that if you have a composed pipeline and you need to troubleshoot it because it’s not working, you stop using that syntax and start doing things more imperatively, so that you can set breakpoints and add logging more easily.

That is, if I have

mypipe = pipe(f, g, h)

and I think g has a bug, I’m likely to rewrite it to what, IMO, it should have been in the first place:

def mypipe(data):
    h_result = h(data)
    g_result = g(h_result)
    return f(g_result)

so that I can look at h_result and g_result and understand what they are.

Also, naming h_result and g_result and putting a docstring on mypipe will improve the clarity of my code.

I don’t think I’m going to be convinced, but I’m also not here to advocate against functools.pipe(). I don’t think it’s a terrible idea which will cause big problems; I just know I’d never use it myself and would probably have opinions about its use during code reviews.

Note that Pandas has created an API for you here which is based around chaining and vectorized operations which are internally optimized. The correct point of comparison for this case is not a for loop. It is instead to declare functions which name the different steps, and then explicitly chain those functions using the already-available def and function call syntax.

I’m not saying that “it’s pretty” or “this is how I would write it”; I just think it’s important to understand that for all of the map and filter cases above, we’re talking about things which are equivalent to in-Python iteration. With an internally optimized tool like Pandas or numpy, the point of comparison is “that same exact function call”, since anything else changes the operations dramatically.

I’m not familiar with the problem domain, so this will be a bit clumsy, I’m sure. But I think the rewrite looks something like…

from pandas import DataFrame  # needed for the annotations below

hazard_table = ...

def filter_columns(table: DataFrame) -> DataFrame:
    return table.drop(columns=drop_columns).loc[1:, keep_columns]

def normalize_floats(table: DataFrame, *, fill_value: float = 0.0) -> DataFrame:
    return table.astype(float).fillna(fill_value)

def compute_results(table: DataFrame) -> DataFrame:
    filtered = filter_columns(table)
    normed = normalize_floats(filtered)
    return normed.rename(columns=output_columns).reset_index(drop=True)

result = compute_results(hazard_table)

If the above reads poorly to you, then maybe this is a use-case where pipe would also read poorly.

2 Likes

For me, this is still a combination of “not every easily copied recipe needs to be in the standard library”, along with the fact that the few places that seem like they might benefit from it are the same kind that speed things up natively and already chain reasonably.

That reads fine: it’s not about how it looks, but about being able to express this as a chained process in a concise way that also reads intuitively. I don’t see a great deal of difference between the Pandas examples and function composition: the common factor is the idea of chaining processes, to use a more abstract word, in a way that the output of one becomes the input of another, with the final output being the value returned to the caller.

With the hazard table example, and the OP’s pipe function, I’d like to be able to express it like this:

hazard_table = functools.pipe(
    hazard_table,
    filter_columns,
    normalize_floats,
    ...
)

to express hazard_table = ...(normalize_floats(filter_columns(hazard_table))).

It doesn’t look bad to me, and I’m speaking from a user’s point of view. This would clean up and simplify a lot of code.

A more general hypothetical example format for pipe:

<output> = functools.pipe(
    <initial data>,
    <callable_1>,
    <callable_2>,
    ...
    <callable_n>
)

It’s perhaps not cleanly expressed, but I think the idea is clear enough: initial data for the first callable, and then one or more callables that are to be strictly chained. Whether the callables provided are correct, satisfy the function composition requirements, and are given in the right order would be entirely up to the caller. (A rough sketch of this variant follows below.)
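
A minimal sketch of this immediate-application variant (the name pipe_apply is hypothetical, used here only to distinguish it from the class-based proposal above):

from functools import reduce

def pipe_apply(value, /, *funcs):
    # feed `value` through `funcs` left to right, returning the final result
    return reduce(lambda acc, f: f(acc), funcs, value)

assert pipe_apply("  MiXeD  ", str.strip, str.casefold) == "mixed"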

I don’t see how these are relevant. They’re using methods so how would pipe() help them?

I kinda think something like this might be better as a PyPI package. There’s invoke, which allows defining a task pipeline from functions that are basically piped input/output, but it’s quite CLI-oriented. I’d say this proposal is in a similar vein, and you’d probably get feature requests at a rate that would better suit PyPI than the standard library.

PHP recently implemented a pipe operator on a third proposal.

Apparently there were various existing PHP pipe libraries.
Maybe they have some convincing examples and arguments to learn from?

1 Like

First of all, I tried to read through the proposal, but many parts were off-topic or irrelevant for my reply, so sorry if I’m repeating what has already been said or argued against.


I voted “no” because there already exists a package for that, namely pydash, which is the Python equivalent of the JavaScript lodash package and essentially has many utilities that users want despite their not being part of the standard JS library. The helper in question is Method Chaining - pydash 8.0.5 documentation. The doc is a bit sparse, but you can essentially invoke any pydash method on an object, effectively making a pipeline. There is an initial value, but it can be replaced with .plant(), so the laziness you wanted is here. It’s not exactly a pure pipe as you had, but I think it boils down to the same idea.
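
(A small illustration, adapted from the pydash chaining docs; I have not re-verified it against the latest release:)

import pydash as py_

# chains are lazy: nothing runs until .value() is called
chained = py_.chain([1, 2, 3, 4]).without(2, 3).reject(lambda x: x > 1)
assert chained.value() == [1]

# .plant() re-seeds the same chain with a new initial value
assert chained.plant([0, 1, 2]).value() == [0, 1]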

A simpler pipeline is exposed through funcy.compose and funcy.rcompose ( Functions — funcy 2.0 documentation ), so I honestly don’t think we need to incorporate the pipeline into the standard library.
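
(For instance, relying on funcy’s documented behaviour of compose applying right to left and rcompose left to right:)

from funcy import compose, rcompose

def double(x):
    return 2 * x

def increment(x):
    return x + 1

assert compose(increment, double)(3) == 7   # increment(double(3))
assert rcompose(increment, double)(3) == 8  # double(increment(3))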

In Java, we have a similar interface, though only on sequences: the Stream API. It’s not using pipes but is based on method chaining, which is essentially what a pipe could achieve without having to implement the methods on the objects themselves. It mimics what we can do in JavaScript with .map() etc., but not everyone in the Java community is well-versed in functional programming, so I don’t think it’s that widely used (and last time I checked, which was some time ago, it was slower than plain for-loops). I have personally used it in my projects, but plain loops or separate functions are usually better, both for readability and debugging.


While PHP does now have a pipeline operator, JavaScript still doesn’t, and JavaScript has much more functional programming than PHP. The rationale behind JavaScript’s pipeline operator ( proposal-pipeline-operator/README.md at main · tc39/proposal-pipeline-operator · GitHub ) is that jQuery provides method chaining but only for methods that are implemented, thereby making it sometimes not that useful, and it lacks support for await and yield constructions, for instance. I don’t know why PHP actually wanted to add this to their language when JS still hadn’t.

The rationale for JS stems from real-world examples that can be found in React, e.g.:

// https://github.com/facebook/react/blob/12adaffef7105e2714f82651ea51936c563fe15c/scripts/jest/jest-cli.js#L295-L303
console.log(
  chalk.dim(
    `$ ${Object.keys(envars)
      .map(envar =>
        `${envar}=${envars[envar]}`)
      .join(' ')
    }`,
    'node',
    args.join(' ')));

Using pipe operators would make it:

Object.keys(envars)
  .map(envar => `${envar}=${envars[envar]}`)
  .join(' ')
  |> `$ ${%}`
  |> chalk.dim(%, 'node', args.join(' '))
  |> console.log(%);

In Python, I don’t think I’ve ever seen production code where we really have multiple nested calls without intermediate variables, comments, or intermediate functions. So, unless there are popular libraries that could benefit from a pipe operator, I don’t think it warrants addition to the standard library for now.


Now, let’s discuss the pipe object in this proposal. If I were to choose, I would rather have a pipe operator than a pipe object. A pipe object is essentially a huge composed function. However, it lacks support for await, and that could be a huge no. The proposal doesn’t explain how await my_func() could work. This would still need something like this:

async def my_func(x):
    y = do(x)
    z = await async_then(y)
    return z

Similarly, it’s not possible to use the yield constructions that regular functions allow. On the other hand, a pipe operator could solve the async and yield problem at the interpreter level by essentially creating the equivalent function (e.g., my_func = do |> await async_then would be equivalent to the above explicit function definition). Since the current proposal lacks support for async and intermediate yield, I’m against it (well, mostly because it doesn’t support async well; yield is rarer).
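
(For what it’s worth, one hypothetical way to paper over the async gap would be an awaiting variant of the pipe object; this is my own sketch, not part of the proposal, and it forces every composition to become a coroutine:)

import asyncio
import inspect

class apipe:
    # hypothetical async-aware pipe: awaits any step that returns an awaitable
    def __init__(self, *funcs):
        self.funcs = funcs

    async def __call__(self, obj, /):
        for f in self.funcs:
            obj = f(obj)
            if inspect.isawaitable(obj):
                obj = await obj
        return obj

def do(x):
    return x * 2

async def async_then(y):
    await asyncio.sleep(0)
    return y + 1

assert asyncio.run(apipe(do, async_then)(3)) == 7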

1 Like