I’m thinking of making an alternative to pip, firstly as a personal project to understand packaging better and secondly, if it’s successful, to provide an API so it can be used as a library by other tools. I have some further soft objectives in mind, such as standard library only, a pluggable architecture, minimum functionality, etc.
What PEPs or documentation should I be reading? I’d prefer to start working on supporting newer forms of package installation and work back to older forms.
Sorry if this is obvious but I haven’t seen a cohesive place for all packaging PEPs so I’m not sure if my current list of PEPs is comprehensive or not.
Which parts of pip would you like to replace? Pip has been incrementally replacing itself for years now by moving various functionality out to external dependencies. As the Sagan quote goes, “If you wish to make an apple pie from scratch, you must first invent the Universe.”
It may not be obvious at first glance, but a lot of the functionality people associate with pip is actually implemented in other externally-developed libraries which pip vendors into its codebase. Have a look at https://github.com/pypa/pip/tree/main/src/pip/_vendor for a current list. Are you out to replace the libraries pip depends on as well, or just the ways pip glues them together?
If it were me, I’d start with designing the API you want your library to present, and work backward from there in order to determine what minimally you need to write in order to make it a reality. You can reuse a lot of the same libraries pip does if you want, or you can make your own implementations of those as well. If you reuse existing libraries, you may not need to worry too much about the various PEPs they implement, so that’s likely to be a determining factor in which ones you’ll need to familiarize yourself with.
At a high level I think: get packages, resolve dependencies, install packages.
One of my soft objectives is to only rely on the standard library by default and not vendor anything.
To start off with if some functionality can’t be achieved without vendoring I would like to just not provide that functionality.
Yes, I ultimately want the stable part of the library to be the API so designing that first seems like the most sensible choice. I then want the code that provides each API to be replaceable in some way by the user so I can provide minimal functionality by default and if a user has some specific requirement they can code that themselves.
For getting packages, you need something to interact over the network (e.g. requests) and to parse a PEP 503 index page, to get the available files for a package. mousebender has a parser for such pages.
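To make the PEP 503 parsing step concrete, here is a minimal standard-library sketch. It is not how mousebender or pip do it; the class and function names are made up for illustration, and it ignores the optional anchor attributes (such as `data-requires-python`) that a real tool would want to read.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SimpleIndexParser(HTMLParser):
    """Collect (filename, url) pairs from the anchors of a PEP 503 project page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.files = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an anchor with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            filename = "".join(self._text).strip()
            # hrefs may be relative, so resolve them against the page URL.
            self.files.append((filename, urljoin(self.base_url, self._href)))
            self._href = None


def parse_project_page(html, base_url):
    """Hypothetical helper: return the files listed on one project page."""
    parser = SimpleIndexParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.files


# Example page with one relative and one absolute link (made-up URLs).
PAGE = (
    "<html><body>"
    '<a href="../../packages/demo-1.0-py3-none-any.whl#sha256=abc">'
    "demo-1.0-py3-none-any.whl</a><br/>"
    '<a href="https://files.example/demo-1.0.tar.gz">demo-1.0.tar.gz</a>'
    "</body></html>"
)
files = parse_project_page(PAGE, "https://index.example/simple/demo/")
```

Because the parser only takes a string and a base URL, it does not care which HTTP client fetched the page.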
Resolving dependencies is the hairy bit IMO. resolvelib has the core resolver that pip uses, and mixology has the core resolver that Poetry uses. Most of the complexity of the resolver lives in those packages though, and you’ll have to deal with that.
This logic is also closely coupled with being able to get dependency metadata, which for wheels is reading a file and for sdists involves generating a wheel (see below).
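For the wheel case, "reading a file" can be sketched with only the standard library: a wheel is a zip archive, and its `.dist-info/METADATA` file uses email-style headers. The function name below is hypothetical, and the in-memory wheel exists only to make the example self-contained.

```python
import io
import zipfile
from email.parser import HeaderParser


def wheel_requires_dist(wheel_file):
    """Return the Requires-Dist entries from a wheel's METADATA file."""
    with zipfile.ZipFile(wheel_file) as zf:
        # The metadata lives at <name>-<version>.dist-info/METADATA.
        names = [
            n for n in zf.namelist()
            if n.count("/") == 1 and n.endswith(".dist-info/METADATA")
        ]
        if len(names) != 1:
            raise ValueError("expected exactly one .dist-info/METADATA file")
        text = zf.read(names[0]).decode("utf-8")
    message = HeaderParser().parsestr(text)
    return message.get_all("Requires-Dist") or []


# Build a tiny fake wheel in memory (made-up project) and read it back.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr(
        "demo-1.0.dist-info/METADATA",
        "Metadata-Version: 2.1\n"
        "Name: demo\n"
        "Version: 1.0\n"
        "Requires-Dist: attrs>=22\n"
        "Requires-Dist: idna; python_version < '3.12'\n"
        "\n"
        "Long description goes here.\n",
    )
buffer.seek(0)
requirements = wheel_requires_dist(buffer)
```

The returned strings are raw requirement specifiers; parsing them (versions, markers, extras) is a separate job.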
For installing packages, the complexity depends on whether you allow sdists. installer provides an API for installing from wheels.
If you need to build a wheel from an sdist, PEP 517 + PEP 518 + PEP 660 define the interface for building packages. See “Build System Interface” in the pip documentation (v23.3.2) for the relevant details w.r.t. how pip does things.
This is actually the part I’m most interested in coding. Compared to resolvelib, I would like the resolution engine to be far more opinionated about what it’s resolving, but, compared to pip, I would like it to be far easier to replace, so that users could swap it out wholesale if they wanted.
Indeed, I feel like a resolution engine can make some interesting choices compared to classical algorithms if it knows that some dependencies can only be discovered as it searches the dependency tree.
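As a toy illustration of dependencies being discovered during the search, here is a naive depth-first backtracking sketch. It is not what resolvelib or mixology do; the function names, the integer versions, and the "set of allowed versions" requirement format are all simplifying assumptions.

```python
def resolve(requirements, available_versions, get_dependencies, pinned=None):
    """Depth-first backtracking over (name, allowed_versions) requirements.

    Returns a {name: version} mapping, or None if nothing satisfies them.
    """
    pinned = dict(pinned or {})
    if not requirements:
        return pinned
    (name, allowed), rest = requirements[0], list(requirements[1:])
    if name in pinned:
        # Already chosen: either compatible (move on) or a conflict (backtrack).
        if pinned[name] in allowed:
            return resolve(rest, available_versions, get_dependencies, pinned)
        return None
    for version in sorted(available_versions(name), reverse=True):
        if version not in allowed:
            continue
        # Dependencies are only looked up once a candidate is actually tried.
        deps = list(get_dependencies(name, version))
        result = resolve(
            rest + deps, available_versions, get_dependencies,
            {**pinned, name: version},
        )
        if result is not None:
            return result
    return None


# Toy index (hypothetical data): versions are plain integers for brevity.
INDEX = {
    "app": {1: [("lib", {1, 2})]},
    "lib": {1: [], 2: [("base", {2})]},
    "base": {1: []},
}
lookups = []


def available_versions(name):
    return INDEX.get(name, {}).keys()


def get_dependencies(name, version):
    lookups.append((name, version))  # record which metadata was fetched
    return INDEX[name][version]


solution = resolve([("app", {1})], available_versions, get_dependencies)
```

Here the newest `lib` is tried first, fails because no `base` version 2 exists, and the search falls back to the older `lib`. The `lookups` list shows that metadata is only fetched for candidates that were actually tried, which is the cost a smarter engine would try to minimise.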
One of my soft objectives is to only rely on the standard library by default and not vendor anything.
To start off with if some functionality can’t be achieved without vendoring I would like to just not provide that functionality.
Personally, I’d very much recommend using the appropriate libraries, e.g. requests/httpx for web requests, tqdm/rich for progress bars, toml/tomli for reading pyproject.toml. Many of these libraries cover a lot of edge cases that are very hard to get right without a large userbase.
I’d prefer to start working on supporting newer forms of package installation and work back to older forms.
IMHO, you only need to support sdist and wheel; apart from .egg-info editable installs, I haven’t seen anything else for quite some time.
I feel that tools too often end up coupling themselves too tightly to requests. I’d rather have a higher-level “get packages” API that could be implemented using any of requests, httpx, boto3, git, etc. for the specific need. So my idea is for the default class that implements “get packages” for simple indexes to use urllib.request, while making it easy to replace.
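One possible shape for such a replaceable “get packages” layer, purely as a sketch: a small protocol plus a default implementation on top of urllib.request. All class and method names here are hypothetical, and the opener is injectable so the example runs without touching the network.

```python
import io
import urllib.request
from typing import Protocol


class PackageSource(Protocol):
    """Hypothetical interface: anything that can return a project's index page."""

    def fetch_project_page(self, project: str) -> str: ...


class UrllibSimpleIndex:
    """Default source for simple indexes, using only the standard library."""

    def __init__(self, index_url="https://pypi.org/simple/",
                 opener=urllib.request.urlopen):
        self.index_url = index_url.rstrip("/") + "/"
        self._open = opener  # injectable so callers and tests can swap it

    def fetch_project_page(self, project: str) -> str:
        request = urllib.request.Request(
            f"{self.index_url}{project}/", headers={"Accept": "text/html"}
        )
        with self._open(request) as response:
            return response.read().decode("utf-8")


class InMemorySource:
    """Alternative source, e.g. for tests or a pre-fetched cache."""

    def __init__(self, pages):
        self.pages = pages

    def fetch_project_page(self, project: str) -> str:
        return self.pages[project]


# Usage without the network: a fake opener records the URL it was asked for.
requested = []


def fake_open(request):
    requested.append(request.full_url)
    return io.BytesIO(b"<html>ok</html>")


source = UrllibSimpleIndex("https://index.example/simple", opener=fake_open)
page = source.fetch_project_page("demo")
cached_page = InMemorySource({"demo": "<html>cached</html>"}).fetch_project_page("demo")
```

A requests-, httpx- or boto3-backed class would only need to provide the same method to be a drop-in replacement.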
Since I think of this tool first as a library to interact with through an API and second as a CLI, I don’t plan to implement any progress bars by default.
I’m thinking of starting the tool as Python 3.11+ so that I can hopefully use tomllib from the standard library.
Thanks! Although one fantasy idea I have is that the tool would be extensible enough to drop in conda package support.
See shadwell (disclaimer: I’m the author, and it’s not seen much real-world use so its API hasn’t been battle-tested). This uses a “sans IO” approach to package finding, where the caller provides a “source” which is a generator that yields package objects. The caller can therefore use whatever network machinery they want.
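To illustrate the general “sans IO” idea described above (this is not shadwell’s actual API; every name below is invented for the example), the finder only consumes iterables of package objects, and the caller decides how those objects are produced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical package object yielded by a source."""
    name: str
    version: tuple
    filename: str


def find_candidates(name, sources):
    """Yield matching candidates from every source, skipping duplicate files."""
    seen = set()
    for source in sources:
        for candidate in source(name):
            if candidate.name == name and candidate.filename not in seen:
                seen.add(candidate.filename)
                yield candidate


def best_candidate(name, sources):
    """Pick the highest version, or None when nothing matches."""
    return max(find_candidates(name, sources), key=lambda c: c.version, default=None)


# Two caller-provided sources; real ones could wrap urllib, requests, a cache...
def local_source(name):
    yield Candidate("demo", (1, 0), "demo-1.0-py3-none-any.whl")
    yield Candidate("other", (9, 0), "other-9.0.tar.gz")


def mirror_source(name):
    yield Candidate("demo", (1, 0), "demo-1.0-py3-none-any.whl")  # duplicate file
    yield Candidate("demo", (1, 2), "demo-1.2.tar.gz")


found = list(find_candidates("demo", [local_source, mirror_source]))
best = best_candidate("demo", [local_source, mirror_source])
```

The finder never opens a socket or a file itself, so it can be tested with plain generators like the two above.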
Just FYI, this is the nominal goal of @vsajip’s PyPA distlib package, essentially to provide an API that can be used by other tools to perform core packaging functions. It is fairly mature, though it hasn’t seen much adoption or many new features, as the ecosystem has favored more modular but limited-scope packages like packaging.
Pradyun, why is that? I’ve been peeking inside sdists, both freshly built ones (of flit’s making) and ones on PyPI, and the PKG-INFO file contains Requires-Dist: entries with all the dependencies, much like the METADATA file in wheels. Or is my sample tainted by projects that conform with PEP 517/518/660?
I’m not @pradyunsg and cannot claim his depth of expertise on the practical implementation of Python packaging standards (though in truth, few people can), but basically, PKG-INFO is de-facto an unstandardized, nominally-Setuptools-specific implementation detail of sdists. All that’s really standardized about them is the rough naming format and that they are a tar.gz of the source contents containing a pyproject.toml in the root (previously, in practice, a setup.py), so at least per the standards, the presence of PKG-INFO inside .egg-info cannot be explicitly relied upon.
Furthermore, unless and until PEP 643 is fully implemented by all sdist-producing build backends, and the Requires-Dist is explicitly specified as being static (i.e. not dynamic) for all the distribution archives you are considering per the specification in that PEP, it cannot be relied upon not to be different for wheels, and for different platforms, Python versions and anything else dynamically considered by any non-declarative install code (e.g. setup.py) and the build backend itself.
This is the key point when it comes to sdists: their metadata was not standardized until then, so if PKG-INFO doesn’t declare a core metadata version of 2.2 (or newer, once newer versions exist), you have to assume the data is just advisory and not definitive.
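The rule above can be sketched as a small check on an sdist’s PKG-INFO text: only trust Requires-Dist when the metadata version is at least 2.2 and the field is not listed as Dynamic (per PEP 643). The function name is hypothetical, and returning `None` for “cannot be trusted” is just one possible convention.

```python
from email.parser import HeaderParser


def static_requires_dist(pkg_info_text):
    """Return Requires-Dist entries if they are reliable, otherwise None."""
    message = HeaderParser().parsestr(pkg_info_text)
    version = message.get("Metadata-Version", "")
    try:
        major, minor = (int(part) for part in version.strip().split(".")[:2])
    except ValueError:
        return None  # missing or malformed version: treat as advisory only
    if (major, minor) < (2, 2):
        return None  # pre-PEP 643 metadata gives no guarantees for sdists
    dynamic = {value.strip().lower() for value in message.get_all("Dynamic") or []}
    if "requires-dist" in dynamic:
        return None  # the backend says dependencies may differ in the wheel
    return message.get_all("Requires-Dist") or []


# Made-up PKG-INFO samples covering the three cases.
OLD = "Metadata-Version: 2.1\nName: demo\nVersion: 1.0\nRequires-Dist: attrs\n"
DYNAMIC = (
    "Metadata-Version: 2.2\nName: demo\nVersion: 1.0\n"
    "Dynamic: Requires-Dist\nRequires-Dist: attrs\n"
)
STATIC = "Metadata-Version: 2.2\nName: demo\nVersion: 1.0\nRequires-Dist: attrs\n"

results = [static_requires_dist(text) for text in (OLD, DYNAMIC, STATIC)]
```

When the result is `None`, the only dependable option is the one described earlier in the thread: build a wheel and read its METADATA.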