I wrote a little thing that builds wheels for each top level name or directory in cpython’s Lib/ . It is simple but could be augmented with dependency declarations and more sophisticated splitting. If developed further a special installer might build your custom Python standard library with desired batteries. It would be possible to add metadata about which are pure python and which would need a different implementation for an alternative interpreter. https://github.com/dholth/nonstdlib
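The splitting rule ("one wheel per top-level name or directory in Lib/") could be sketched roughly like this. The function name and details are mine, not nonstdlib's actual code:

```python
import pathlib

def top_level_names(lib_dir):
    """Return one distribution name per top-level module or package
    in a Lib/-style directory tree. Illustrative sketch only; the real
    splitting logic in nonstdlib may differ."""
    lib = pathlib.Path(lib_dir)
    names = set()
    for entry in lib.iterdir():
        if entry.is_dir() and (entry / "__init__.py").exists():
            names.add(entry.name)   # a package, e.g. json/
        elif entry.suffix == ".py":
            names.add(entry.stem)   # a single-file module, e.g. csv.py
    return sorted(names)
```

Each resulting name would then get its own wheel, with dependency declarations layered on top later.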
@dholth I really like this approach
Thanks. Suppose we get rid of the standard library entirely.
python --nonstd gets you whatever Python requires to boot. Brutally minimal.
Now distribute CPython with a default, named (virtual?) environment that includes the standard library. When you are running the Python interpreter as an application, rather than as an application runtime, you get that. Useful libraries are an import statement away.
Include individually wrapped (packaged) standard library modules. Wheels, long sys.path-style folder-per-module, importlib hooks so that they are not importable by default, whatever. Since the individually wrapped modules are distributed with CPython, they can be added to a new Python environment instantly without any of that pesky internet access.
Start moving individually wrapped modules from the “importable by default” set to the “must be declared explicitly to be importable” set.
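The "must be declared explicitly to be importable" idea could be prototyped today with a meta-path hook. This is a sketch under my own assumptions, not an actual CPython mechanism:

```python
import sys

class DeclaredOnlyFinder:
    """Meta-path finder that blocks a set of gated module names unless
    the application has explicitly declared them. Hypothetical names
    and policy; only the finder protocol itself is real."""

    def __init__(self, gated, declared):
        self.gated = set(gated)        # modules that require a declaration
        self.declared = set(declared)  # modules the app has opted into

    def find_spec(self, fullname, path=None, target=None):
        top = fullname.partition(".")[0]
        if top in self.gated and top not in self.declared:
            raise ModuleNotFoundError(
                f"{top!r} is individually wrapped; declare it to import it")
        return None  # fall through to the normal finders
```

Installed first on `sys.meta_path`, it vetoes gated imports before the regular finders ever see them; declaring a module simply adds it to the allowed set.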
As a bonus, could it become possible to remove virtualenv, because there would be no big, system-specific default environment to overcome?
In this way a library that is a former or soon-to-be member of the standard library can be special without being importable by default.
I think this should be moved/split off to another thread, but I don’t know how to do that in Discourse. (Does it need an administrator?)
Getting the dependency declarations right – and maintaining them – won’t be easy.
For starters, it turns out we don’t currently have a canonical list of modules in the standard library. (Building Python and looking doesn’t count; you’d need to do it on all platforms/configurations. Docs don’t count either; there are plenty of undocumented internal modules like ….)
If you’d like to help with creating such a list, let me know. It would help with PEP 534, which aims to split the stdlib into “mandatory” and “optional” parts. (That could also be a first step toward a more modular stdlib.)
I’m not sure what permissions are needed, but I split it off anyway. (If you do have permissions, then you do it by clicking the wrench icon on the right side of the thread view, then clicking “select posts”.)
Thanks, I requested a split-off by flagging the first post yesterday. I don’t have permission to split off the thread myself.
So far the split is based on the names in the un-built Lib/ directory.
The next step will be to generate dependencies between modules based on static analysis. https://github.com/dhellmann/python-stdlib-dependencies is promising. This kind of dependency metadata would be rebuilt automatically with SCons whenever Lib/ changed. I also considered subtracting sys.modules (before import x) from sys.modules (after import x). For the use case of subsetting Python, the goal would be to generate the list of required dependencies instead of an exhaustive list.
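The sys.modules-subtraction idea could look something like the sketch below. The helper is mine, not actual tooling; note that static analysis would still be needed to catch lazy imports that never happen at import time, and vice versa:

```python
import json
import subprocess
import sys

def import_footprint(module_name):
    """Return the set of modules newly loaded by `import module_name`.

    Runs in a fresh interpreter so modules already imported by the
    current process don't pollute the measurement."""
    code = (
        "import sys, json\n"
        "before = set(sys.modules)\n"
        f"import {module_name}\n"
        "print(json.dumps(sorted(set(sys.modules) - before)))\n"
    )
    out = subprocess.run([sys.executable, "-c", code],
                         capture_output=True, text=True, check=True)
    return set(json.loads(out.stdout))
```

The result over-approximates (it includes transitive imports) but is a cheap cross-check against the statically derived graph.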
After that you’d want to keep a folder with diffs or additions to be combined with the automatically discovered metadata. For example a rule to package a module together with its internal underscore-prefixed implementation. Or a link to the docs.
Longer term Lib/ itself could be a build product created from packages. If you wanted it to look as it does today, have an installer that puts the *.dist-info in a separate metadata folder to keep track of what’s installed so far without adding clutter.
We found a similar effort in another discussion. They split the stdlib, but not into wheels. A pull request bringing this against nonstdlib is needed. https://git.openembedded.org/openembedded-core/tree/meta/recipes-devtools/python/python3/python3-manifest.json
It’s a little embarrassing, but we did get everyone to stop using the distutils features that don’t package well in wheels; I guess it was easier for them than contributing to packaging. egg pulled off a similar feat years earlier. So I think it is doable, for better or worse. A game developer will want to redistribute your library; they will submit a pull request saying “please include this bit of extra metadata”, and so it goes. The payoff would be better support for Python applications, as opposed to using the Python interpreter as an application.
I think most of the extra metadata would be the responsibility of the application developer instead of the library developers. Imagine something like py2exe, but applied during the entire development process instead of at the end. When you run your application, the subset of Python you’re not planning to depend upon would be hidden. Then you would be continuously ready to redistribute however much of Python you need when it is time to share the program.
The imagined tool would be optional, like virtualenv.
Since you might already get a subset of the standard library if you are using a Linux distribution, it could be surprisingly helpful to make it official.
I’m sorry, I find it hard to follow what you are writing, jumping from things that happened in the past (I don’t recall the details and you don’t provide them) to hypotheticals.
But I realized I don’t mind, as long as you require the extra work only for apps that want to use your accelerated Python implementation.
Who will produce a version of pip that can operate in an environment that doesn’t contain the stdlib? It could add the stdlib to sys.path to run, but only search for dist-info on the original paths.
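The "search for dist-info only on the original paths" part is already expressible with importlib.metadata, which lets you restrict distribution discovery to an explicit path list regardless of what got added to sys.path so pip itself could run. The helper name here is mine; only the path-restricted search is a real API:

```python
import importlib.metadata

def distributions_on(paths):
    """Names of installed distributions whose dist-info lives on `paths`,
    ignoring everything else on sys.path (e.g. a stdlib prepended just so
    pip can run). Hypothetical sketch of the proposed pip behaviour."""
    dists = importlib.metadata.distributions(path=list(paths))
    return sorted({dist.metadata["Name"] for dist in dists})
```

So a stdlib-less pip could boot with the stdlib wheels on sys.path, yet do its installed-package bookkeeping against only the target environment's paths.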
I like the energy and goals here, but I think a more incremental approach (like Ruby did) is more likely to be acceptable to other devs and users, and therefore be more effective in the end.
I’m interested in the extreme case. The necessary artifacts will be useful for regular users too. A no-standard-lib pip is also a not-reinstalled-in-each-venv pip.