PEP 687: Isolating modules in the standard library

From recent discussions around “what should have a PEP”, it’s clear that this should have been a PEP long ago. Better late than never, I guess!

We submit this PEP to explain the changes, seek consensus on whether they are good, propose the remaining changes, and set best practices for new modules.

Please head to the PEP page to read it. The initial version is below, so you can quote it, and so that if the PEP is changed future readers can make sense of the discussion.

Text of the PEP:

PEP: 687
Title: Isolating modules in the standard library
Author: Erlend Egeberg Aasland erlend.aasland@protonmail.com, Petr Viktorin encukou@gmail.com
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Requires: 489, 573, 630
Created: 04-Apr-2022
Python-Version: 3.11

Abstract

Extensions in the standard library will be converted to multi-phase
initialization (:pep:489) and where possible, all state will be
stored on module objects rather than in process-global variables.

Note on Backdating

Much of this proposal has already been implemented.
We submit this PEP to explain the changes, seek consensus on
whether they are good, propose the remaining changes,
and set best practices for new modules.

Motivation & Rationale

The informational :pep:630 describes the background, motivation, rationale,
implications and implementation notes of the proposed changes as they apply
generally to any extension module (not just the standard library).

It is an integral part of this proposal. Read it first.

This PEP discusses specifics of the standard library.

Specification

The body of :pep:630 will be converted to a HOWTO in the Python
documentation, and that PEP will be retired (marked Final).

All extension modules in the standard library will be converted to multi-phase
initialization introduced in :pep:484.

All stdlib extension modules will be isolated. That is:

  • Types, functions and other objects defined by the module will either be
    immutable, or not shared with other module instances.

  • State specific to the module will not be shared with other module instances,
    unless it represents global state.

    For example, _csv.field_size_limit will get/set a module-specific
    number. On the other hand, functions like readline.get_history_item or
    os.getpid will continue to work with state that is process-global
    (external to the module, and possibly shared across other libraries, including
    non-Python ones).

Conversion to heap types

Types whose methods need access to their module instance will be converted
to heap types following :pep:630, with the following considerations:

  • All standard library types that used to be static types should remain
    immutable. Heap types must be defined with the Py_TPFLAGS_IMMUTABLE_TYPE
    flag to retain immutability.
    See bpo-43908 <https://bugs.python.org/issue43908>__.

    Tests should ensure TypeError is raised when trying to create a new
    attribute of an immutable type.

  • A static type with tp_new = NULL does not have a public constructor, but
    heap types inherit the constructor from the base class. Make sure types that
    previously were impossible to instantiate retain that feature; use
    Py_TPFLAGS_DISALLOW_INSTANTIATION. Add tests using
    test.support.check_disallow_instantiation(). See
    bpo-43916 <https://bugs.python.org/issue43916>__.

  • Converted heap types may unintentionally become serializable
    (pickle-able). Test that calling pickle.dumps has the same result
    before and after conversion, and if the test fails, add a __reduce__
    method that raises TypeError. See PR-21002 <https://github.com/python/cpython/pull/21002/files>__
    for an example.

These issues will be added to the Devguide to help any future conversions.

If another kind of issue is found, the module in question should be unchanged
until a solution is found and added to the Devguide, and already
converted modules are checked and fixed.

Static types that do not need module state access, and have no other reason to
be converted, should stay static.

Process

The following process should be added to the Devguide, and remain until
all modules are converted.
Any new findings should be documented there or in the general HOWTO.

Part 1: Preparation

  1. Open a discussion, either on the bug tracker or on Discourse. Involve the
    module maintainer and/or code owner. Explain the reason and rationale for
    the changes.
  2. Identify global state performance bottlenecks.
    Create a proof-of-concept implementation, and measure the performance impact.
    pyperf is a good tool for benchmarking.
  3. Create an implementation plan. For small modules with few types, a single PR
    may do the job. For larger modules with lots of types, and possibly also
    external library callbacks, multiple PR’s will be needed.

Part 2: Implementation

Note: this is a suggested implementation plan for a complex module, based on
lessons learned with other modules. Feel free to simplify it for
smaller modules.

  1. Add Argument Clinic where possible; it enables you to easily use the
    defining class to fetch module state from type methods.
  2. Prepare for module state; establish a module state struct, add an instance
    as a static global variable, and create helper stubs for fetching the module
    state.
  3. Add relevant global variables to the module state struct, and modify code
    that accesses the global state to use the module state helpers instead. This
    step may be broken into several PR’s.
  4. Where necessary, convert heap types to static types.
  5. Convert the global module state struct to true module state.
  6. Implement multi-phase initialisation.

Steps 4 through 6 should preferably land in a single alpha development phase.

Backwards Compatibility

Extension modules in the standard library will now be loadable more than once.
For example, deleting such a module from sys.modules and re-importing it
will result in a fresh module instance, isolated from any previously loaded
instances.

This may affect code that expected the previous behavior: globals of
extension modules were shallowly copied from the first loaded module.

Security Implications

None known.

How to Teach This

A large part of this proposal is a HOWTO aimed at experienced users,
which will be moved to the documentation.

Beginners should not be affected.

Reference Implementation

Most of the changes are now in the main branch, as commits for these issues:

  • bpo-40077, Convert static types to heap types: use PyType_FromSpec() <https://bugs.python.org/issue40077>_
  • bpo-46417, Clear static types in Py_Finalize() for embedded Python <https://bugs.python.org/issue46417>_
  • bpo-1635741, Py_Finalize() doesn't clear all Python objects at exit <https://bugs.python.org/issue1635741>_

As an example, changes and fix-ups done in the _csv module are:

  • GH-23224, Remove static state from the _csv module <https://github.com/python/cpython/pull/23224>_
  • GH-26008, Allow subclassing of csv.Error <https://github.com/python/cpython/pull/26008>_
  • GH-26074, Add GC support to _csv heap types <https://github.com/python/cpython/pull/26074>_
  • GH-26351, Make heap types converted during 3.10 alpha immutable <https://github.com/python/cpython/pull/26351>_

Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.

4 Likes

+1 from me.

This looks a typo in the conversion guide (heap & static the wrong way around):

“Where necessary, convert heap types to static types.”

1 Like

Identify global state performance bottlenecks. Create a proof-of-concept implementation, and measure the performance impact. pyperf is a good tool for benchmarking.

I’m only aware of one change related to PEP 687 (heap types and module states) and performance: the _functools extension store some objects in instances to avoid instance -> type -> module -> module state lookups: commit IMO when performance matters, it’s a reasonable trade-off.

2 Likes