Addition: `functools.argorder`

dg-pb · May 15, 2024, 5:27pm

Proposal: add the function with the following logic to functools module

def argorder(func, order):
    def inner(*args, **kwds):
        return func(*[args[i] for i in order], **kwds)
    return inner

What issues is this a potential answer to:
a) It would improve efficient interactions between different modules in standard library. Specifically: functools, itertools and operator
b) There are already a lot of packages that have invented various variations of partial to address issue which can be solved using this proposal. To name a few packages:
* partiell · PyPI
* partial-apply · PyPI
* partial · PyPI
* better-partial · PyPI
c) Also this is related to threads:
* Support for using partial with positional only argument functions - #3 by mariocj89
* Functools.partial extension to support specific positional arguments
c) Also directly related to: Builtins.any performance - #28 by dgrigonis
d) And somewhat related to: For any & every - #53 by Nineteendo as it improves performance of “short-circuiting” solutions that do not involve loop, but use map function.

why is this better than all of the above?
a) More efficient solution to the problem when combined with functools.partial compared to pure python external packages or to lambda solution.
b) It doesn’t require any changes to existing code in contrast to modification proposals.

The most common response to making changes to functional tools regarding arguments is:

“you can just use lambda”

However, this is a good argument when the cost is making existing tools (e.g. partial) implementation more complex. While in this case, the proposal is well contained in itself and doesn’t require any changes to existing code.

It non-invasively improves performance by more than 50% compared to lambda solution, which amounts to significant improvements when predicates are called on every iteration.

initial implementation

static PyObject *
argorder_call(argorder *self, PyObject *args, PyObject *kwds)
{
    PyObject *new_args = PyTuple_New(self->nargs);
    PyObject *arg, *index;
    PyObject *result;
    Py_ssize_t pos;
    for (int i=0; i < self->nargs; i++) {
        index = PySequence_Fast_GET_ITEM(self->order, i);
        pos = PyLong_AsSize_t(index);
        arg = PyTuple_GetItem(args, pos);
        PyTuple_SET_ITEM(new_args, i, arg);
    }
    result = PyObject_Call(self->func, new_args, kwds);
    Py_DECREF(new_args);
    return result;
}

naming
Didn’t give much thought about it yet, I am open to suggestions, if this was considered.

Stefan2 · May 15, 2024, 8:51pm

Just another Python version (Attempt This Online!):

def argorder(func, order):
    ig = operator.itemgetter(*order)
    def inner(*args, **kwds):
        return func(*ig(args), **kwds)
    return inner

kknechtel · May 15, 2024, 8:56pm

Could you be a bit more concrete about what problems you hope to solve with this tool, please?

pf_moore · May 15, 2024, 9:02pm

Also, why is it not sufficient to add this 3-line function to your code when you need it?

dg-pb · May 15, 2024, 10:34pm

It is a matter of micro-optimization here. Lambda function is more efficient than this. It only makes sense in C.

It is generally the same reason why C partial implementation exists. Just the benefit here is much smaller.

dg-pb · May 15, 2024, 11:36pm

Ah yes. This should have been part of the thread. I think I just looked at this enough to start thinking that others have seen what I have seen. This tool is mainly for predicate construction in conjunction with partial.

It as a faster convenience than lambda for cases when function arguments are not in convenient order.
E.g.

from operator import contains
from functools import partial
pred = partial(argorder(contains, [1, 0]), 9)
pred2 = lambda d: contains(d, 9)

a = [{i: i} for i in range(10)]
list(filter(pred, a))       # [{9: 9}]
list(filter(pred2, a))      # [{9: 9}]
%timeit list(filter(pred, a))       # 600 ns (30% faster)
%timeit list(filter(pred2, a))      # 860 ns
# -----
b = [{i: i} for i in range(100_000)]
%timeit list(filter(pred, b))       # 4.1 ms (40% faster)
%timeit list(filter(pred2, b))      # 6.8 ms

Also, note that it might be possible to optimize it better. I am far from expert in writing extensions.

Stefan2 · May 15, 2024, 11:52pm

What are the times with lambda d: 9 in d? Or even [d for d in b if 9 in d]?

The “65% faster” doesn’t look right, btw.

dg-pb · May 15, 2024, 11:53pm

Python implementation, although elegant, has no benefits compared to lambda:

pred3 = partial(argorder2(contains, [1, 0]), 9)
a = [{i: i} for i in range(10)]
b = [{i: i} for i in range(100_000)]
%timeit list(filter(pred3, a))      # 2.4 µs
%timeit list(filter(pred3, b))      # 22.5 ms

kknechtel · May 16, 2024, 12:03am

It should say 40% faster. Rather, the slow version is 65% slower. It’s not valid to reverse the numbers directly.

dg-pb · May 16, 2024, 12:06am

Yes, its 40%. Thanks.

Lambda although more efficient, still slower:

pred3 = lambda d: 9 in d
%timeit list(filter(pred3, a))      # 730 ns (20% slower than argorder)
%timeit list(filter(pred3, b))      # 5.3 ms (30% slower than argorder)

List comprehension is much faster, but it is not the same. I.e. list conversion in examples is only made to process the iterable - in reality, what is being tested is filter(pred). Also, it uses expression and not predicate function, which is not always possible. Results:

%timeit [d for d in a if 9 in d]    # 500 ms (20% faster than argorder)
%timeit [d for d in b if 9 in d]    # 2.5 ms (40% faster than argorder)

dg-pb · May 16, 2024, 12:13am

A closer equivalent, although still doesn’t apply to much of what proposal applies to. Expression can not be supplied to functions that accept predicate, also list comprehension functionality is limited compared to functionality of many functions that accept predicates.

%timeit list((d for d in a if 9 in d))    # 700 ms (17% slower than proposed)
%timeit list((d for d in b if 9 in d))    # 3 ms (30% faster than proposed)

blhsing · May 16, 2024, 1:29am

operator.itemgetter unfortunately has the design choice of an inconsistent return type when given a single argument, which is convenient when arguments are fixed but troubling for generalization.

blhsing · May 16, 2024, 2:38am

I’m -1 on this because I find explicit argument indices unreadable and unmaintanable.

A wrapper function with named arguments may be slower but is much friendlier to both code readers and writers.

Using a custom string class with a regex substitution method as an example:

class Str(str):
    def sub(self, pattern, replacement):
        return re.sub(pattern, replacement, self)

versus:

class Str(str):
    sub = argorder(re.sub, [1, 2, 0])

the former is much clearer and maintainable IMHO.

dg-pb · May 16, 2024, 2:59am

I generally agree. However this is mostly intended to be used in conjunction with partial, although could be used as utility on its own.

Having that said, I myself am at 0 at this point, maybe even slightly negative. Although it does offer some improvements, overall they aren’t big enough to get where I wanted.

Good findings, but unless someone can provide more support for this, I believe this is a no-go.

dg-pb · May 16, 2024, 3:08am

Also, just to add. I explored alternative ways to achieve this small benefit without explicit addition in C. No way to get to the same performance with cython.

One last idea is that maybe lambda could compile to optimized version in case where only argument passing forward is done.

(These are for 2-arg case)
Calling partial is 10-20 ns overhead on top of function call.
Calling argorder is ~20 ns overhead.
Wrapping in lambda is ~40 ns overhead.

Although I highly doubt it.

kknechtel · May 16, 2024, 3:39am

I would rather just see a more powerful third-party replacement for partial, honestly.

dg-pb · May 16, 2024, 4:48am

What do you have in mind by saying “powerful”?