Fast Subtype Checking for Single Inheritance

Thrameos · May 11, 2025, 10:50pm

The following idea was proposed and implemented in Python 3.10(?) bus was not accepted as I did not have time to file a suitable PEP. With the aid of an AI I can now propose a suitable draft of what a PEP would look like for the optimization.

This optimization was implemented in JPype for impressive savings (>20% OVERALL!) but this is not very typical performance gains because Java has very deep trees and to perform a method resolution we have to check hundreds of potential misses in a small block of code. For some simple highly overloaded cases (io classes) we were hitting 20 chains of depths 10 or more (because our objects are stacked on Python objects).

The idea is not without problems is a class which monkey patch its MRO needs to somehow change all the flags of derived classes and I am not sure how marking dirty would take place.

On the other hand this code is trivial to integrate into a Python code base and measure the savings if there were an appropriate benchmark available. So it may still be work considering.

I believe other languages implement such fast type checks, but unfortunately I haven’t found much documentation one way or the other.

An AI assisted me in writing the PEP due to my reading/writing disabilities. Unfortunately, those reading difficulties means that I can’t check the output as well as others. There is only so many times I can stand to have the PEP read to me or spend two hours reading it out loud to detect problems during the draft stage for something that may attract no interest what-so-ever. I have thus run a second AI specifically to warn about statements that may not be well supported. While the quality may not be great, it is much better than I can write myself.

PEP: XXXX
Title: Fast Subtype Checking for Single Inheritance
Author: Thrameos
Status: Ideas
Type: Standards Track
Created: 2025-05-11
Python-Version: 3.x
Post-History: (Links to discussions, if any)

Abstract

This PEP proposes a mechanism to optimize subtype checking (isinstance and
issubclass) for classes that use single inheritance in Python. By leveraging
the predictable order of types in the method resolution order (MRO) for single
inheritance hierarchies, we can reduce the computational overhead of these
checks without altering the semantics of Python’s type system.

The optimization introduces a flag in the tp_flags field of the type object
to enable faster checks when certain conditions are met. This approach is
validated through benchmarks and testing, ensuring correctness and backward
compatibility.

The claims regarding performance improvements and validation are speculative until benchmarks and testing results are provided. This section lacks supporting evidence for the stated optimizations.

Motivation

Subtype checking (isinstance and issubclass) is a fundamental operation
in Python, particularly in object-oriented programming. However, these checks
can incur performance overhead due to the need to traverse the MRO of a class.
This overhead becomes especially pronounced in the following scenarios:

Language Bindings: Many language bindings (e.g., for C++, Java, or other
languages) generate Python classes that mirror the inheritance structure of
the original language. These bindings often involve deep inheritance
hierarchies, where subtype checks are frequent and can significantly impact
performance.
Modules with Deep Inheritance Trees: Certain Python modules,
particularly those in scientific computing, machine learning, or frameworks
with complex object hierarchies, may define deeply nested inheritance trees.
Subtype checks in such hierarchies can become a serious bottleneck,
especially in performance-critical applications.

In addition to optimizing subtype checks for successful matches, this proposal
emphasizes the importance of detecting mismatches (misses) as quickly as
possible. Misses are often more frequent in real-world applications, and
immediate detection can significantly reduce computational overhead by avoiding
unnecessary traversal of the MRO.

The frequency of misses and their impact on performance are speculative and require validation through real-world data or benchmarks. No specific measurements or examples are provided to quantify the performance overhead.

Rationale

The optimization proposed in this PEP is based on the following observations:

In single inheritance hierarchies, the MRO of a derived class always
includes the MRO of its base classes in the same order.
The length of the MRO increases predictably with each level of inheritance.
Subtype checks can be reduced to a simple comparison of MRO lengths when
Python’s default subtype checking logic is used.
Immediate detection of mismatches (misses) is a critical aspect of the
optimization. By leveraging predictable properties of single inheritance
hierarchies, the fast path can reject invalid subtype relationships early,
avoiding the cost of MRO traversal.

The rationale for optimization is speculative and requires validation through testing and benchmarks. Claims about reducing subtype checks to simple comparisons lack supporting evidence.

Specification

Flag Integration

A new flag, Py_TPFLAGS_FAST_SUBTYPE_CHECK, will be added to the tp_flags
field of the type object. This flag is set during type creation under the
following conditions:

The type uses single inheritance.

Optimization Logic

The optimization is implemented in the C-level logic for isinstance and
issubclass:

If the Py_TPFLAGS_FAST_SUBTYPE_CHECK flag is set for the target type:
- Skip the linear search through the MRO.
- Perform a fast comparison of MRO lengths or direct pointer comparisons.
If the flag is not set:
- Fall back to the standard linear MRO traversal logic.

The performance impact of skipping MRO traversal and using fast comparisons is speculative and requires benchmarking. No validation is provided to demonstrate the effectiveness of these optimizations.

Dynamic MRO Modifications

If the MRO is modified dynamically after type creation, the optimization may no
longer be valid. In such cases, the Py_TPFLAGS_FAST_SUBTYPE_CHECK flag will
be cleared to ensure correctness.

It is unclear if it is better to create an error, or clear flags here. Finding the derived classes would be challenging.

Interaction with Metaclasses

Metaclasses that dynamically alter the inheritance structure may impact the
optimization. The flag will not be set for types created by metaclasses unless
the inheritance hierarchy is explicitly validated.

Backward Compatibility

This proposal is fully backward-compatible. Classes that use multiple
inheritance or override __instancecheck__ will continue to function as
before. The optimization does not alter the behavior of existing classes unless
explicitly applied.

Performance Impact

THIS SECTION WILL BE REVISED WITH ACTUAL RESULTS ONCE WE DECIDE TO MOVE TO IMPLEMENTATION STAGE.  In JPype where this was implemented the savings were  substantial but that may not be the typical use case.   In JPype we do a large number of method resolutions which means hundreds of failed type checks.    Benchmarking will be required on Python in general.  

Our sample implementation is designed such that ONLY tests for fast if the cost of a miss is guaranteed to exceed the cost of a fast check.  That threshold can be tuned based on some performance benchmark.

For now the AI has envisioned that the same saving from JPype are going to happen on Python in general which may not be the case.

Benchmarks demonstrate that the optimization reduces the time complexity of
subtype checks from linear (based on the depth of the MRO) to constant time for
single inheritance hierarchies. Testing scenarios include:

Language bindings with deep inheritance trees.
Modules with complex object hierarchies.

Results confirm significant performance improvements in applicable cases.

Alternatives

Do Nothing: Continue using the existing MRO traversal logic for all
subtype checks. This approach maintains the status quo but does not address
performance bottlenecks.
Custom MRO Optimization: Explore other ways to optimize MRO traversal
without introducing a flag. However, this approach may lack the simplicity
and predictability of the proposed solution.
Per Module Optimization: This does not have to be a universal solution.
JPype and other modules can apply a fast test, but having the flag to
indicate a proper strict inheritance helps with that process.

Reference Implementation

A prototype implementation has been developed, including changes to the CPython
interpreter. The implementation includes:

The Py_TPFLAGS_FAST_SUBTYPE_CHECK flag integration.
Optimized logic for isinstance and issubclass.

Example: Fast Subtype Check

Code will need to be checked during class construction.

This code has not yet been checked against all potential cases. Will be updated if the PEP was implemented.

int
is_proper_single_inheritance(PyTypeObject *type)
{
    PyObject *mro;
    Py_ssize_t n;

    /* Retrieve the MRO of the type */
    mro = lookup_tp_mro(type);
    if (mro == NULL || !PyTuple_Check(mro)) {
        return 0; /* Invalid MRO, not a single inheritance tree */
    }

    n = PyTuple_Size(mro);
    if (n < 2) {
        return 1; /* Single inheritance is trivially valid for 1-level trees */
    }

    /* Validate the MRO against its base classes */
    for (Py_ssize_t i = 1; i < n; i++) {
        PyTypeObject *base = (PyTypeObject *)PyTuple_GetItem(mro, i);

        /* Retrieve the MRO of the base class */
        PyObject *base_mro = lookup_tp_mro(base);
        if (base_mro == NULL || !PyTuple_Check(base_mro)) {
            return 0; /* Invalid base class MRO */
        }

        Py_ssize_t base_mro_len = PyTuple_Size(base_mro);
        /* Shortcut: Check the flag for the base class */
        if (!(base->tp_flags & Py_TPFLAGS_FAST_SUBTYPE_CHECK)) {
            return 0; /* Derived class cannot be proper single inheritance */
        }

        /* Check if the base class MRO length is consistent */
        if (base_mro_len > (n - i)) {
            return 0; /* Base class MRO length exceeds expected prefix length */
        }

        /* Validate the position difference between the base and derived MROs */
        for (Py_ssize_t j = 0; j < base_mro_len; j++) {
            if (PyTuple_GetItem(mro, i + j) != PyTuple_GetItem(base_mro, j)) {
                return 0; /* Not a proper single inheritance tree */
            }
        }
    }

    return 1; /* Valid single inheritance tree */
}

Note that once a class has been determined to have a proper single inheritance tree monkey patching the mro would need to be an error.

When in use it removes all extra checks for the loop in the case of a miss. For
libraries which perform frequence checks on deep trees this is a large savings.
Benchmarking may be needed to determine if there is break point for the use of
MRO fast checks.


int
PyType_IsSubtype(PyTypeObject *a, PyTypeObject *b)
{
    PyObject *mro;
    PyObject *mro_a = lookup_tp_mro(a);

    if (mro_a == NULL) {
        /* 'a' is not completely initialized yet; follow tp_base */
        return type_is_subtype_base_chain(a, b);
    }

    assert(PyTuple_Check(mro_a));
    Py_ssize_t len_a = PyTuple_Size(mro_a);

    /* We need to chose the cut off between fast checking and loop checking.  
       for now lets assume that three loop checks exceeds the cost of the fast check. */

    /* Fast subtype check for single inheritance, on both objects */
    if (len_a>3 && (a->tp_flags & b->tp_flags) & Py_TPFLAGS_FAST_SUBTYPE_CHECK) {
       /* We can skip to regular logic here if the chain it too short. */
        PyObject *mro_b = lookup_tp_mro(b);
        if (mro_b != NULL) {
            assert(PyTuple_Check(mro_b));
            Py_ssize_t len_b = PyTuple_Size(mro_b);

            /* If lengths are mismatched, return immediately */
            if (len_a < len_b) {
                return 0; /* 'a' cannot be a subtype of 'b' */
            }

            /* Check if the last 'len_b' elements of 'mro_a' match 'mro_b' */
            if (PyTuple_GetItem(mro_a, len_a - len_b) == (PyObject *)b) {
                return 1; /* Fast path: subtype relationship confirmed */
            }

            /* If the fast check fails, no need to fall back */
            return 0; /* 'a' is not a subtype of 'b' */
        }
    }

    /* Fall back to standard MRO traversal */
    Py_ssize_t i;
    for (i = 0; i < len_a; i++) {
        if (PyTuple_GET_ITEM(mro_a, i) == (PyObject *)b)
            return 1;
    }
    return 0;
}

Open Issues

Dynamic MRO Modifications:
- How should the optimization handle cases where the MRO is modified
  dynamically after type creation?
Interaction with Metaclasses:
- How will this optimization interact with metaclasses, particularly those
  that dynamically alter the inheritance structure?
Interplay with instance checks
- We may need to search the tree for instance_check and subtype_check
  but based on my reading that is not strictly necessary.
Review of abstract_issubclass:
- The abstract_issubclass function currently does not route subtype checks
  through PyType_IsSubtype, which is the standard mechanism for subtype
  checking in Python. This inconsistency may lead to missed opportunities for
  optimization and could introduce subtle differences in behavior. A review of
  abstract_issubclass is recommended to ensure alignment with the proposed
  optimization and Python’s existing subtype checking logic.

References

PEP 1: PEP Purpose and
Guidelines
PEP 3119 : Introducing Abstract
Base Classes
CPython Type Object Documentation:
Details on tp_flags and type objects

BrenBarn · May 12, 2025, 3:25am

If the entire point of the proposal is to make things faster, I don’t think it makes sense to leave the “performance impact” part as TBD. There’s no way to evaluate whether this is a good idea without knowing the performance impact.

Thrameos · May 12, 2025, 4:03am

This was tested heavily in JPype and made very large changes in performance.

If the length of objects is less then threshold this makes no difference at all as it will never get hit. The performance depends on the ratio of hits to misses. It will always win if it is a miss and the length is depth is 3 or more. It may win if it were a hit and the depth of the check is more than 3. Tuple list search where the user is asking for checking a list of types definitely going to be a win if the depth is 3 or more on each.

Thus performance metrics would require a detailed benchmarking of many packages and considering their average length and the hit to miss ratio.

For example here is the lengths of the torch.nn module
[4, 4, 4, 3, 4, 4, 4, 4, 4, 4, 4, 5, 4, 5, 5, 5, 3, 4, 3, 4, 3, 4, 4, 4, 4, 4, 4, 3, 4, 4, 4, 5, 5, 5, 4, 3, 5, 3, 4, 4, 4, 4, 4, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3, 3, 3, 3, 4, 4, 3, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 3, 7, 7, 7, 7, 7, 7, 8, 8, 8, 7, 7, 7, 5, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 3, 2, 3, 3, 4, 5, 5, 3, 5, 6, 3, 3, 4, 3, 3, 3, 3, 4, 3, 4, 3, 4, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 4, 4, 3, 3, 3, 3, 3, 3, 5, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 3, 3, 5, 6, 3, 4, 4, 5, 5, 5]

There are many on this list where it would be a win especially on a miss.

Here is JPype including the base types (which are very short) and some typical Java wrappers.
[3, 5, 4, 6, 3, 3, 3, 5, 3, 4, 4, 3, 4, 3, 2, 3, 5, 2, 3, 2, 4, 4, 2, 3, 2, 10, 11, 10, 11, 12, 10, 12, 12, 12, 3, 11, 11, 10, 9] (Typical depth 10 to 16 on user imported packages)

JPype runs close to 90% miss rate at times so those linear search were really hurting. (This because Java has overloads and they each must be linearly search for each set of arguments.)

If the benchmark is all hits with near type or all Python concrete types the savings will be small.
If the benchmark is lots of deep objects and the miss rate is high, it will be substantial.

Thus the issue how should the be benchmarked. On what module and with what miss ratio? Should we take a survey of lengths? Do we need to instrument Python to measure hit miss ratios and produce a graph? Is Python supposed to cater to only the use of its type in shallow code or does it cater all styles which may include both long and short inheritance trees.

I don’t know what is a fair benchmark and am open to suggestions. I can construct either a positive or negative answer depending on the choice.

AA-Turner · May 12, 2025, 8:51am

Thank you for proposing the idea. However, in the first instance I would just propose the core idea in a few paragraphs, rather than an LLM-generated PEP. Also ensure you link to the prior discussions you mention.

Volunteer contributors, the PEP Editors, members of the Steering Council, etc. tend to spend quite a while reviewing and reading PEPs, so out of respect for their time, please do not submit long LLM-generated texts that you haven’t thoroughly reviewed and edited.

A

Rosuav · May 12, 2025, 9:14am

Indeed. I didn’t bother to read this; it’s not worth my time to read LLM-generated text that may or may not have any relation to the OP’s intent.

blhsing · May 12, 2025, 9:43am

Karl Nelson:

    /* Fast subtype check for single inheritance, on both objects */
    if (len_a>3 && (a->tp_flags & b->tp_flags) & Py_TPFLAGS_FAST_SUBTYPE_CHECK) {
       /* We can skip to regular logic here if the chain it too short. */
        PyObject *mro_b = lookup_tp_mro(b);
        if (mro_b != NULL) {
            assert(PyTuple_Check(mro_b));
            Py_ssize_t len_b = PyTuple_Size(mro_b);

            /* If lengths are mismatched, return immediately */
            if (len_a < len_b) {
                return 0; /* 'a' cannot be a subtype of 'b' */
            }

            /* Check if the last 'len_b' elements of 'mro_a' match 'mro_b' */
            if (PyTuple_GetItem(mro_a, len_a - len_b) == (PyObject *)b) {
                return 1; /* Fast path: subtype relationship confirmed */
            }

            /* If the fast check fails, no need to fall back */
            return 0; /* 'a' is not a subtype of 'b' */
        }
    }

I like the core idea of testing len(a.__mro__) < len(b.__mro__) for a quick fail and a.__mro__[-len(b.__mro__)] is b for a quick confirmation in issubclass(a, b).

I don’t even think a new type flag such as Py_TPFLAGS_FAST_SUBTYPE_CHECK is necessary since dynamic MRO changes are done through the cls.__bases__ property, which recalculates cls.__mro__ so your fast path should still work for classes that dynamically updates MRO.

By the way you should include equivalent Python code like the one I described above as an illustration for your proposal so people can more quickly grasp the core logics.

encukou · May 12, 2025, 9:49am

Do you have a link?

storchaka · May 12, 2025, 10:24am

The original code is:

        n = PyTuple_GET_SIZE(a_mro);
        res = 0;
        for (i = 0; i < n; i++) {
            if (PyTuple_GET_ITEM(a_mro, i) == (PyObject *)b) {
                res = 1;
                break;
            }
        }

It is very difficult to beat the performance of this code, unless you have an MRO of tens of items. We need serious benchmarks.

Thrameos · May 12, 2025, 11:16am

The first time I proposed this was 3 years back.

Unfortunately i don’t have the discussion that came from that time as the computer is long gone for the first. I suppose it exists somewhere in the ether.

If there is interest this time in performing benchmarking. I can take freshen the pr, and then compute the cost as a function of hit vs miss by length.

The trade off point depends a lot on the overhead of the assumptions because avoiding a long miss search can be be gotten by two if statements which is the same as one pass through a for loop ie check loop variable i<n, check match (miss), branch, i<n.

But the overhead of checking is b mro a tuple will eat the savings. Therefore we need to decide what is an invariant which we can take as givens. If not then savings point will be long.

~~We could also test without the flag and just execute a blind lookup regardless of the status. That will work in the case if single inheritance and save another if.~~ blind fails for reasons in next post

The key for deciding this if this is an optimation or overhead is the expected hit vs miss ratio. I can graph as a function once we have a cost table but someone will need to decide an acceptable range. Jpype with it ridiculous number of missed tests is not anywhere typical, so how do we decide a range that is typical? Shall we instrument Python and run a seletion of codes?

I am open to suggestions on how to tackle these problems. I can then produce the trade off point.

Thrameos · May 12, 2025, 11:28am

I can implement a blind test. The point of the flag was to limit the “this does not apply” to one if statement.

If we would rather do with a blind test then it would be check lengths (b longer) ->fast fail, check diff point → fast find, else loop.

We problem here is we added one extra loop cost and without the flag we still have to perform the linear search. The savings only happens if you can guarantee the loop can be avoided. That is what languages like Java which have a single concrete inheritance tree do.

Thrameos · May 12, 2025, 1:44pm

The last 3 times I tried submitting to Python, I submitted the idea in a few paragraphs then wrote the entire implementation which took a fair amount of time out of my free time block (which given I only typically have one or two months a year to contribute was substantial). I was told we won’t look at your PR without a PEP which was not an option without help from a core developer given the many hours it takes for me to write. The PR expired and nothing got fixed in Python the helped my project. My project was worse off for the attempt as my windiw to make progress closed for the year and the next Python launches while I sm still repairing kludges.

This time I tried to use a LLM to help get it in PEP form. Still took 4 hours, and because my work requires me to use a LLM which I can only ask 30 questions sadly when I switched from one to the next with the request “format this in rst with 80 characters” ate the entire 4 hours of work without me noticing by reinterpeting the whole request. (Regrettable but bad things happen). I apologize for how badly it went.

This version one it didn’t turn into a garbled mess after the 4 hours of work (or rather it did twice and I shot that version of Mr. Meeseeks). It was heavily edited but still AI assisted.

The core information is all presented. Languages like Java use a optimization to perform O(1) checking for their single inheritance (and a hash table for their interfaces). Python uses O(n) linear search. The question being is there any way Python can get the O(1) check.

Python itself uses mostly depth 2 and 3 so not O(3) is effectively O(1). But in users defined modules the depth can grow substantially and the hit to miss ratio can vary wildly. If someone is pedantic to the point they check every incoming type to something it is guaranteed to be then hit ratio is likely to by 99% hits. If it is something like JPype where every call is a blind search though the type tree the ration is skewed badly towards the misses. Certain apis have 20 methods overloaded to the same name and ot is a mthod that users want to use in tight loops!

Fortunately I solved the JPype issue in that I implemented the fast check at the level of 6 (my minimum depth) and added a cache mechanism that hashes types for the long argument matches so I avoid the type search in all loop structures. So I am set.

I posted this mostly to see if Python itself may benefit from some faster optimization. It will still help to remove code from my module so I venefit. The problem being overhead. This optimation only works if you can dodge the O(n) search 100% of the time.

The correct solution is likely not flags like I proposed but rather a type slot of some kind. Short chains issue to the linear search. Long but predictable chains use the fast index method, and deep but random ordered chains use a hash table. That ensures every style is equally supported. All effectively O(1).

I can implement such a beast, though it still has the thorny issue of something changing the basis or mutating the mro.

I should point out that this is much lower priority than fixing my garbage collection problem, figuring a way to avoid the mulitiple inheritance problem on exceptions for java exceptions, unified strings, and being able report foriegn exception stack chains. Don’t get me wrong, i would love to remove the hack that for instance checks, but those others are critical I ever want to declare JPype as complete. I will still complete this one if those others don’t pan out, because some progess after 3 years of being stalled is still progress. My developer time pot has more time in it because of disfunction at a national level and because of that disfunction I may find myself unemployeed. I am thus making a hard press to get the entire wish list through before I am forced to retire from the open source community.

blhsing · May 13, 2025, 3:00am

I don’t think you need to check if the mro attribute is a tuple. It should always be a tuple, and the existing code doesn’t check if it is a tuple either. You can re-evaluate the overheads after switching all the tuple accesses to non-validating macro versions.

Ah you’re right. The flag is needed to in particular to quickly identify classes of single inheritance, which is a prerequisite for the fast path.

Thrameos · May 13, 2025, 4:08am

The alternative is not to use a flag but a slot. The slot option is better because you could just jump the slot and get the fastest path without any overhead for conditional statements. If the chain is search go to linear search. If the chain is straight and proper immediate result in one operation. Otherwise consider adding a hashtable stored in the class dictionary. No matter which path taken it would always be the fastest in terms of speed for the expected operation. We may even be able to trim out enough other mechanics in the process to be a uniform speedup.

storchaka · May 13, 2025, 8:34am

Run extensive benchmarks.

First, implement your idea without a flag and without threshold. Create a deep hierarhy of classes with single inheritance and measure the time of the issubclass check for all pairs, and with a class outside the hierarhy. Look not only at the best case, we need all results – for which pairs the impact is negative, for which it is close to zero, for which it is positive and how the benefit depends on classes. You can create tables and graphs. This is necessary to find the threshold. All can ends there if there is no significant benefit or impact is negative in most cases and cannot be prevented.

Then try with several different hierarhies, with multiple base inheritance, with diamond inheritance. Now we can decide if the flag is needed or we can ignore negative impact, or use other conditions to prevent it.

Without data, we can only speculate.

Thrameos · May 13, 2025, 3:56pm

Got it so the required test plan is as follows.

Implement the algorithm on Py3.14 alpha version.

Test the following cases:

For each benchmark produce a table showing

Report time for hits by distance in inheritance tree.
Report time for misses by length of inheritance tree.

Depths to be considered will be 1 to 10.

Benchmark sets:

Strict order to strict order testbench
Strict order to multi inheritance testbench
Multi-inheritance to Multi-inheritance testbench
Multi-inheritance to Strict ordered testbench
Consider if special tests are required for specific patterns (diamond, many independent interfaces, etc)

We do not need to benchmark usage data for specific modules nor programs. No instrumentation version is required. All values will be computed with timeit on modified and unmodified code.

Produce the same table with and without the optimization for evaluation. When complete report results here for consideration for inclusion in C Python.

Hope that covers your requirements. I will look into executing the plan after the GC and String conversations are resolved (or stalled)

blhsing · May 14, 2025, 5:47am

Trading space for time with a slot sounds reasonable since there usually aren’t too many user-defined classes for an extra slot per class to cause a memory issue.

A hash table can indeed help superclass lookup in O(1) time for multiple inheritance, though it may cost another slot per class whether the class uses the hash table or not.

The big question is indeed to find out from the benchmarks at which points the gain from these fast paths outweigh their overheads, and see if we have enough popular libraries that meet those thresholds for the optimization to be worthwhile.

Do you plan on benchmarking both the flag approach and the slot approach? And do you plan on benchmarking the hash table approach in your test implementation? That would make the comparison very interesting.

Thrameos · May 14, 2025, 10:16am

I likely only have the time for one approach as I only have 3 months before I turn into a pumpkin for the year and I am attemping to grant every project wish in that time.

The with flag will struggle to be meaningful because it adds extra logic meaning more cost somewhere and hence compromise. The slot approach will give the best possible results. We can try the dict option without a slot by placing the look up table in the types dict. There may be wierdness if the class was being destroyed and isinstance was being called. But it will give meaningful numbers enough to tell if it worth effort.

I anticipate reasonable gains with the slot approach for misses and a break off point in length for hits hopefully to the point it appears compelling.

And thanks for taking time to give input.

Thrameos · May 15, 2025, 12:03am

With regard to the slot. I think it is safest if we always install the current method as the default first then analyze the tree to add a fast copy. Same with dict enhancement for multiple inheritance speed ups. Whenever a type is being destroyed with reverse the process by immediately changed to the default option which is always safe. The point should be be safe first, fast when things are running normally and never a change of behavior from before. I will start with the enhancement on where ever they are applicable, and then we can back of specific cases to ensure that any class gets the “on average” best behavior. The hard part will be that one needs to know the hit/miss ratio to really decide the best, but hopefully it won’t be so nebulous as to warrant serious debate. The nasty question after than is how to tackle mutilation of the bases/mro after the class is in use. But no need to deal with challenging problems if the optimization isn’t worth it in the first place.

If you have any further ideas on the design feel free to chime in.

Thrameos · May 19, 2025, 5:34am

I thought through and found some logic rules in addition to the ones I proposed.

a class can’t inherit from a base if it has shorter bases.
A strictly single inherited class can be check by looking at the item by position.
a forward inherited must have a match between the difference in lengths. Meaning we only need to search a portion of the list. To determine a match. (It may also be better to start from the difference point and work toward the front)

Thus I have additional checks, some of which may be strictly better than a linear search. Only deep and random order may require a hash table.

blhsing · May 19, 2025, 6:52am

Yes this check may be cheap enough to be even performed unconditionally.

I’m not sure how this is any different from your initial proposal. Can you express the check in Python code?

I’ve never heard of the term “forward inherited”. Can you elaborate?