For making an alternative to pip, what PEPs or documentation should one read?

I’m thinking of making an alternative to pip, firstly as a personal project to understand packaging better, and secondly, if it’s successful, to provide an API so it can be used as a library by other tools. I have some further soft objectives in mind, such as standard library only, pluggable architecture, minimum functionality, etc.

What PEPs or documentation should I be reading? I’d prefer to start working on supporting newer forms of package installation and work back to older forms.

Sorry if this is obvious but I haven’t seen a cohesive place for all packaging PEPs so I’m not sure if my current list of PEPs is comprehensive or not.

Probably this is a good place to start: PyPA specifications. Each child page seems to point to the relevant PEPs.

[…]

Which parts of pip would you like to replace? Pip has been
incrementally replacing itself for years now by moving various
functionality out to external dependencies. As the Sagan quote goes,
“If you wish to make an apple pie from scratch, you must first
invent the Universe.”

It may not be obvious at first glance, but a lot of the
functionality people associate with pip is actually implemented in
other externally-developed libraries which pip vendors into its
codebase. Have a look at
https://github.com/pypa/pip/tree/main/src/pip/_vendor for a current
list. Are you out to replace the libraries pip depends on as well,
or just the ways pip glues them together?

If it were me, I’d start with designing the API you want your
library to present, and work backward from there in order to
determine what minimally you need to write in order to make it a
reality. You can reuse a lot of the same libraries pip does if you
want, or you can make your own implementations of those as well. If
you reuse existing libraries, you may not need to worry too much
about the various PEPs they implement, so that’s likely to be a
determining factor in which ones you’ll need to familiarize yourself
with.

At a high level I think: get packages, resolve dependencies, install packages.

One of my soft objectives is to only rely on the standard library by default and not vendor anything.

To start off with if some functionality can’t be achieved without vendoring I would like to just not provide that functionality.

Yes, I ultimately want the stable part of the library to be the API so designing that first seems like the most sensible choice. I then want the code that provides each API to be replaceable in some way by the user so I can provide minimal functionality by default and if a user has some specific requirement they can code that themselves.

This needs something to interact over the network (eg: requests) and parse a PEP 503 index page, to get available files for a package. mousebender has a parser for such pages.
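
For a standard-library-only take on this step, here is a minimal sketch of parsing a PEP 503 project page with `html.parser`. The names `SimpleIndexParser` and `parse_project_page` are made up for illustration, and it ignores attributes such as `data-requires-python` and `data-yanked` that a real tool would want to read.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class SimpleIndexParser(HTMLParser):
    """Collect one entry per <a> element on a PEP 503 project page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.files = []      # finished entries
        self._href = None    # href of the anchor currently open, if any
        self._text = []      # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # hrefs may be relative; the URL fragment carries the hash.
            url, fragment = urldefrag(urljoin(self.base_url, self._href))
            self.files.append({
                "filename": "".join(self._text).strip(),
                "url": url,
                "hash": fragment or None,  # e.g. "sha256=..."
            })
            self._href = None


def parse_project_page(html, base_url):
    """Return the list of file entries found in the page text."""
    parser = SimpleIndexParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.files
```

The page text itself could come from `urllib.request.urlopen` or any other transport; the parser only needs the HTML and the URL it was fetched from.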

This is the hairy bit IMO – resolvelib has the core resolver that pip uses, mixology has the core resolver that Poetry uses. Most of the complexity of the resolver lives in those packages though, and you’ll have to deal with that. :slight_smile:

This logic is also closely coupled with being able to get dependency metadata, which for wheels is reading a file and for sdists involves generating a wheel (see below).
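
To illustrate the "reading a file" half for wheels, here is a rough sketch using only `zipfile` and `email.parser` (core metadata uses an email-header-like format). The function name `wheel_requirements` is hypothetical, and the sketch does no validation beyond locating the `METADATA` file.

```python
import zipfile
from email.parser import HeaderParser


def wheel_requirements(wheel_file):
    """Return (name, version, requires_dist) read from a wheel's METADATA."""
    with zipfile.ZipFile(wheel_file) as zf:
        # METADATA lives in the single top-level *.dist-info directory.
        candidates = [
            entry for entry in zf.namelist()
            if entry.count("/") == 1 and entry.endswith(".dist-info/METADATA")
        ]
        if len(candidates) != 1:
            raise ValueError("expected exactly one .dist-info/METADATA entry")
        raw = zf.read(candidates[0]).decode("utf-8")
    msg = HeaderParser().parsestr(raw)
    # Requires-Dist may appear many times, so collect every occurrence.
    return msg["Name"], msg["Version"], msg.get_all("Requires-Dist") or []
```

`wheel_file` can be a path or any seekable binary file object. Turning the returned requirement strings into something a resolver can evaluate (specifiers, markers, extras) is a separate job, and is what `packaging` does for pip.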

The complexity of this depends on whether you allow sdists – installer provides an API for installing from wheels.

If you need to build a wheel from an sdist, PEP 517, PEP 518 and PEP 660 define the interface for building packages. See Build System Interface - pip documentation v23.3.2 for the relevant details w.r.t. how pip does things.

This is actually the part I’m most interested in coding. Compared to resolvelib, I would like the resolution engine to be far more opinionated about what it’s resolving. Compared to pip, I would like the engine to be far easier to swap out, so users could replace it wholesale if they wanted.

Indeed, I feel like a resolution engine can make some interesting choices compared to classical algorithms if it knows that some dependencies can only be discovered as it searches the dependency tree.
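
To make "discovered as it searches" concrete, here is a deliberately naive backtracking sketch. It is not the algorithm used by resolvelib, pip or mixology. Versions are plain integer tuples and a requirement is a `(name, predicate)` pair, both simplifications chosen for illustration. The point is only that `dependencies(name, version)` is called when a candidate is actually tried, never up front.

```python
from typing import Callable, Dict, List, Optional, Tuple

Version = Tuple[int, ...]
# (project name, "is this version acceptable?") -- a stand-in for specifiers
Requirement = Tuple[str, Callable[[Version], bool]]


def resolve(
    requirements: List[Requirement],
    versions: Callable[[str], List[Version]],
    dependencies: Callable[[str, Version], List[Requirement]],
    pinned: Optional[Dict[str, Version]] = None,
) -> Optional[Dict[str, Version]]:
    """Return one consistent {name: version} mapping, or None if none exists."""
    pinned = pinned or {}
    if not requirements:
        return pinned
    (name, allowed), rest = requirements[0], requirements[1:]
    if name in pinned:
        # Already chosen earlier: the existing pin must satisfy this too.
        if not allowed(pinned[name]):
            return None
        return resolve(rest, versions, dependencies, pinned)
    for version in versions(name):  # caller decides the preference order
        if not allowed(version):
            continue
        # Dependencies only become known once this candidate is tried.
        discovered = dependencies(name, version)
        result = resolve(
            rest + discovered, versions, dependencies, {**pinned, name: version}
        )
        if result is not None:
            return result
    return None  # every candidate failed, so the caller backtracks
```

Because `versions` and `dependencies` are plain callables, the engine knows nothing about indexes, wheels or sdists, which is one way to keep it replaceable.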

Thanks for all the info!

That’s going to involve reinventing a lot of wheels. (Sorry, unintended pun!)

Off the top of my head, the following bits of pip are available as libraries:

  • packaging - requirements parsing and checking, versions, markers, wheel tag checking, etc.
  • installer - installing a wheel
  • build - converting a sdist into a wheel
  • resolvelib - the core resolver algorithm
  • shadwell - the package finder logic
  • mousebender - parsing simple indexes

I certainly wouldn’t want to try to write a monolithic no-dependencies package that did what pip does without leveraging all that pre-existing work.

Bits of pip that aren’t really packages anywhere else are:

  • network logic - we use requests, but there’s a bunch of stuff like proxy handling, authentication, caching etc, that we layer on top of requests.
  • VCS support
  • Error handling and the UI in general
  • Supporting non-standard stuff, like legacy setup.py based projects, non-PEP440 versions, old-style metadata, etc.

See also https://github.com/brettcannon/mousebender#the-steps-to-installing-a-package

These are the ones I read for installing wheels:

For getting wheels from pypi, the official warehouse documentation is really good: JSON API - Warehouse documentation
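
As a rough idea of what consuming that API looks like, the sketch below picks the wheel files out of a response body from `https://pypi.org/pypi/<project>/json`. The helper name `wheel_files` is made up, and only the `urls`, `packagetype`, `filename`, `url`, `digests` and `yanked` keys are used; check the Warehouse documentation for the full schema.

```python
import json


def wheel_files(response_text):
    """List the non-yanked wheel files described in a JSON API response."""
    data = json.loads(response_text)
    return [
        {
            "filename": entry["filename"],
            "url": entry["url"],
            "sha256": entry["digests"]["sha256"],
        }
        for entry in data["urls"]  # files of the latest release
        if entry["packagetype"] == "bdist_wheel"
        and not entry.get("yanked", False)
    ]
```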

One of my soft objectives is to only rely on the standard library by default and not vendor anything.

To start off with if some functionality can’t be achieved without vendoring I would like to just not provide that functionality.

Personally, I’d very much recommend using the appropriate libraries, e.g. requests/httpx for web requests, tqdm/rich for progress bars, toml/tomli for reading pyproject.toml. Many of these libraries cover a lot of edge cases that are very hard to get right without a large userbase.

I’d prefer to start working on supporting newer forms of package installation and work back to older forms.

imho you only need to support sdist and wheel, apart from .egg-info editable installs I haven’t seen anything else for quite some time.

Thanks for all the info!

I feel that tools too often end up coupling themselves too tightly to requests. I’d rather have a higher-level “get packages” API that could be implemented using any of requests, httpx, boto3, git, etc. for the specific need. So my idea is that the default class implementing “get packages” for simple indexes would use urllib.request, but that it could easily be replaced.
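
One possible shape for such an API, sketched with `typing.Protocol`. All names here (`PackageSource`, `PackageFile`, `DirectorySource`, `download_all`) are hypothetical. A `urllib.request`-based class for simple indexes would implement the same two methods; a local-directory source is shown instead only because it fits in a few lines.

```python
import re
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Iterable, Protocol


def normalize(name: str) -> str:
    """PEP 503 project name normalization."""
    return re.sub(r"[-_.]+", "-", name).lower()


@dataclass(frozen=True)
class PackageFile:
    project: str
    filename: str
    location: str  # URL, path, bucket key...: opaque to callers


class PackageSource(Protocol):
    """What the rest of the tool needs from any "get packages" backend."""

    def list_files(self, project: str) -> Iterable[PackageFile]: ...

    def read(self, file: PackageFile) -> bytes: ...


class DirectorySource:
    """Serves wheels and sdists from a flat local directory."""

    def __init__(self, root):
        self.root = Path(root)

    def list_files(self, project):
        wanted = normalize(project)
        for path in sorted(self.root.iterdir()):
            if path.suffix == ".whl" or path.name.endswith(".tar.gz"):
                # Assumes the name part has no hyphen; true for wheels,
                # but legacy sdist filenames can violate this.
                dist_name = path.name.split("-", 1)[0]
                if normalize(dist_name) == wanted:
                    yield PackageFile(wanted, path.name, str(path))

    def read(self, file):
        return Path(file.location).read_bytes()


def download_all(source: PackageSource, project: str) -> Dict[str, bytes]:
    """Caller code depends only on the protocol, never on a transport."""
    return {f.filename: source.read(f) for f in source.list_files(project)}
```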

As I’m thinking of this tool as first a library to interact through an API and second a CLI I don’t plan to implement any progress bars by default.

I’m thinking of starting the tool as Python 3.11+ so that I can hopefully use tomllib from the standard library.

Thanks! Although a fantasy idea I have is the tool would be extensible enough to drop in conda package support.

See shadwell (disclaimer: I’m the author, and it’s not seen much real-world use so its API hasn’t been battle-tested). This uses a “sans IO” approach to package finding, where the caller provides a “source” which is a generator that yields package objects. The caller can therefore use whatever network machinery they want.

Just FYI, this is the nominal goal of @vsajip 's PyPA distlib package, essentially to provide an API that can be used by other tools to perform core packaging functions. It is fairly mature, though it hasn’t seen much adoption or many new features, as more modular but limited-scope packages like packaging have been favored instead.

Pradyun, why is that? I’ve been peeking inside sdists, both freshly built ones (of flit’s making) and ones on PyPI, and the PKG-INFO file contains Requires-Dist: entries with all the dependencies, much like the METADATA file in wheels. Or is my sample tainted by projects that conform with PEP 517/518/660?

I’m not @pradyunsg and cannot claim his depth of expertise on the practical implementation of Python packaging standards (though in truth, few people can), but basically, PKG-INFO is de-facto an unstandardized, nominally-Setuptools-specific implementation detail of sdists. All that’s really standardized about them is the rough naming format and that they are a tar.gz of the source contents containing a pyproject.toml in the root (previously, in practice, a setup.py), so at least per the standards, the presence of PKG-INFO inside .egg-info cannot be explicitly relied upon.

Furthermore, unless and until PEP 643 is fully implemented by all sdist-producing build backends, and the Requires-Dist is explicitly specified as being static (i.e. not dynamic) for all the distribution archives you are considering per the specification in that PEP, it cannot be relied upon not to be different for wheels, and for different platforms, Python versions and anything else dynamically considered by any non-declarative install code (e.g. setup.py) and the build backend itself.

This is the key point when it comes to sdists: their metadata was not standardized until PEP 643, so if PKG-INFO doesn’t declare a core metadata version of 2.2 (or newer, when there is a newer one), you have to assume the data is just advisory and not definitive.
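
In code, that check could look something like the sketch below (standard library only; the function name `static_requires_dist` is made up). It returns the `Requires-Dist` entries only when the metadata version is at least 2.2 and the field is not listed under `Dynamic`. In every other case it returns `None`, meaning "build the project to find out".

```python
from email.parser import HeaderParser


def static_requires_dist(pkg_info_text):
    """Return Requires-Dist entries if PEP 643 lets us trust them, else None."""
    msg = HeaderParser().parsestr(pkg_info_text)
    version = msg.get("Metadata-Version", "")
    try:
        major, minor = (int(part) for part in version.split(".")[:2])
    except ValueError:
        return None  # missing or malformed version: trust nothing
    if (major, minor) < (2, 2):
        return None  # pre-PEP 643 metadata is advisory only
    # Dynamic may be repeated; field names are compared case-insensitively.
    dynamic = {value.strip().lower() for value in msg.get_all("Dynamic") or []}
    if "requires-dist" in dynamic:
        return None  # the backend may compute dependencies at build time
    return msg.get_all("Requires-Dist") or []
```

Note that an empty list and `None` mean different things here: the former is a trustworthy "no dependencies", the latter is "unknown".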
