Add `index: bool` keyword to `min`, `max`, `sorted`

dg-pb · August 10, 2024, 5:51pm

When you say you are not convinced of something, just put it into perspective. E.g.:
“I am not convinced that this is special enough, because sum had 5M use cases and it was still a hard choice to make a special case for it, while this hardly reaches 1M.”

Any tangible backing up of statements or doubts would be beneficial. I would have a chance to learn and improve from your feedback then.

I am open to ideas.

The point of interest to me could be summarised as follows:

“Intermediate iterator technique which at C level could be used to not unnecessarily convert C types to Python types”

I think integers would benefit the most to start with. After all, they are the most commonly used type and are quite expensive, given that they are not plain 64-bit values but more complex objects.
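
To put a rough number on "more complex objects", here is a small check, assuming CPython (where `sys.getsizeof` reports the size of the int object itself). The 8-byte baseline is simply the size of a raw 64-bit value; exact object sizes vary by version and platform, so none are quoted here.

```python
import sys

RAW_INT64_BYTES = 8  # size of a plain C 64-bit integer

# Both values fit in a signed 64-bit integer.
values = (1, 2**62)

# Size of the Python int object holding each value (CPython-specific).
sizes = {n: sys.getsizeof(n) for n in values}

# Extra bytes paid per integer compared with a raw 64-bit value.
overhead = {n: size - RAW_INT64_BYTES for n, size in sizes.items()}

for n in values:
    print(f"{n}: {sizes[n]} bytes as an object, {overhead[n]} bytes of overhead")
```

On top of the memory, every such object has to be allocated and later freed, which is the per-iteration cost being discussed.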

Note that this is very beneficial, as some large-number algorithms are fast enough when written in pure Python, so external libraries such as gmpy2 are overkill. E.g. my factorization algorithm is pure Python, simply because it is fast enough (for problems up to a certain size).

A lot of optimization has already happened here. E.g. sum, itertools.count and range do C addition up to sys.maxsize.

However, although internal operations are done in C, they still inevitably generate Python objects and need to do PyLong_FromSsize_t on every iteration.

If there were a signalling protocol that could be used to signal the callee at C level, this could be avoided.

The one that I presented is just an initial idea. It is quite primitive in the sense that it “hacks” the object from outside. Furthermore, managing the state from outside is not a very good idea.

Instead some sort of intermediate protocol could be devised. E.g.:

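void do_something_with_iterable(PyObject *iterable) {
    /* Py_HasCIter, Py_CIter_Type and Py_CIter_Next are hypothetical;
       they do not exist in the C API. */
    PyObject *it = PyObject_GetIter(iterable);
    int has_ip = Py_HasCIter(it);
    int return_type = Py_CIter_Type(it);
    PyObject *item;
    Py_ssize_t citem;
    for (;;) {
        if (has_ip && return_type == 1) {
            /* C path: no Python object is created
               (how exhaustion is signalled is left open here) */
            citem = Py_CIter_Next(it);
        }
        else {
            /* regular path: one new Python object per item */
            item = PyIter_Next(it);
            if (item == NULL) {
                break;
            }
        }
    }
}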
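
To make the dispatch easier to play with, here is a pure-Python model of the same idea. It is only a sketch: `RawRange`, `raw_type` and `raw_next` are names made up for this illustration, and returning an `(ok, value)` pair to signal exhaustion is my assumption, not part of any existing protocol. In Python everything is an object anyway, so this shows the control flow, not the speedup.

```python
class RawRange:
    """Toy iterator offering the normal protocol plus a hypothetical 'raw' one."""

    raw_type = 1  # hypothetical tag, mirrors `return_type == 1` above

    def __init__(self, stop):
        self._i = 0
        self._stop = stop

    def __iter__(self):
        return self

    def __next__(self):
        # Normal protocol: in C this is where PyLong_FromSsize_t would be paid.
        ok, value = self.raw_next()
        if not ok:
            raise StopIteration
        return value

    def raw_next(self):
        # Stand-in for the hypothetical Py_CIter_Next. Returns (ok, value);
        # ok is False once the iterator is exhausted (assumption of this sketch).
        if self._i >= self._stop:
            return False, 0
        value = self._i
        self._i += 1
        return True, value


def total(iterable):
    """Sum an iterable, using the raw path when the iterator advertises it."""
    it = iter(iterable)
    if getattr(it, "raw_type", None) == 1:
        acc = 0
        while True:
            ok, value = it.raw_next()
            if not ok:
                return acc
            acc += value
    # Fallback: ordinary iteration over Python objects.
    return sum(it)


print(total(RawRange(5)))   # takes the raw path
print(total([1, 2, 3]))     # takes the fallback path
```

The point is that the consumer asks the iterator once, up front, and the iterator keeps its own state; nothing is poked into it from outside.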

It is. It has both C and Python implementations, which is a bit of an issue. The operator module is potentially a good collection of predicates to be parsed internally, but there would be quite a few problems to solve.

Something generic could look like:

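PyObject *func(PyObject *item, PyObject *pred) {
    /* UnaryCanUnwrap and UnaryOp are hypothetical helpers */
    int op = UnaryCanUnwrap(pred);   /* -1: pred is not a known operation */
    if (op != -1) {
        return UnaryOp(op, item);    /* do the operation directly in C */
    }
    else {
        return PyObject_CallOneArg(pred, item);   /* generic Python call */
    }
}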
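
The same shape can be modelled in Python, again only as a sketch. `unary_can_unwrap`, `unary_op`, `apply` and the `OP_*` codes are invented for this example; only the `operator` functions are real. It assumes predicates are recognised by identity against a fixed table.

```python
import operator

# Hypothetical opcodes for the operations the fast path knows about.
OP_NEG, OP_ABS, OP_NOT = range(3)

_UNWRAPPABLE = {
    operator.neg: OP_NEG,
    operator.abs: OP_ABS,
    operator.not_: OP_NOT,
}


def unary_can_unwrap(pred):
    """Return the opcode for a recognised predicate, or -1 otherwise."""
    try:
        return _UNWRAPPABLE.get(pred, -1)
    except TypeError:
        # Unhashable callables cannot be looked up, so they are never unwrapped.
        return -1


def unary_op(op, item):
    """Perform the recognised operation directly, without calling pred."""
    if op == OP_NEG:
        return -item
    if op == OP_ABS:
        return abs(item)
    return not item


def apply(item, pred):
    op = unary_can_unwrap(pred)
    if op != -1:
        return unary_op(op, item)
    # Unknown predicate: fall back to a generic call.
    return pred(item)


print(apply(3, operator.neg))
print(apply(3, lambda x: x + 1))
```

One of the problems mentioned above shows up immediately: matching by identity only works if the caller passes the exact function object the table knows about.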

I don’t know yet. These are just very initial ideas. It may turn out not to be worth it, but there is also a possibility that this could cover a large variety of cases: non-numeric operations where integers are involved, as well as numeric operations.

E.g. math.prod(range(20)) would run at C speed without producing any Python objects (apart from final result of course).
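
One thing any such C path has to deal with is overflow. A small check, assuming a signed 64-bit C integer as the raw type (the `INT64_MAX` constant below is just that limit written out):

```python
import math

INT64_MAX = 2**63 - 1  # largest value a signed 64-bit C integer can hold

# range(20) starts at 0, so its product is trivially 0.
zero_product = math.prod(range(20))

# Starting at 1 gives factorials, which grow quickly.
fact_20 = math.prod(range(1, 21))
fact_21 = math.prod(range(1, 22))

fits_20 = fact_20 <= INT64_MAX
fits_21 = fact_21 <= INT64_MAX

print(zero_product, fits_20, fits_21)
```

So a raw-integer fast path would cover the product up to 20! and would need to hand over to arbitrary-precision Python ints from 21! onwards.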