Creating a package to _just_ install a wheel

Full disclosure, I am too.

Seriously though, pip (and Conda for that matter) is handicapped by the low level at which it operates, and it’s extremely difficult for it to handle various operations correctly, such as partially upgrading an environment or auto-removing dependencies when uninstalling a package.

Thinking about this the other way around, maybe the best approach a competitor can take is to not care about dependencies at all, and advertise itself as a fast tool to populate a package into an environment. Tools like Pipenv and Poetry can use it instead of pip (since they already have dependency resolution anyway), which makes it easier to gain a user base and identify gaps.

If such a tool existed, we would probably use it in Nixpkgs to install wheels instead of pip. No need to handle deps or download anything. Just install into a designated directory.

This would be a really good gap to fill. It would also help clarify what an “environment” is (which has some subtle edge cases).

I’ve actually implemented this inside https://github.com/pypa/virtualenv/blob/master/src/virtualenv/seed/via_app_data/pip_install/base.py. It installs a given wheel into a folder exactly as pip would (so pip can uninstall it later). For now it is used to install the seed package wheels for virtualenv.

By exactly the same, does it also replicate some of pip’s suboptimal behaviour? :stuck_out_tongue_closed_eyes: e.g. pip does not actually check whether the wheel content matches RECORD as the spec mandates, IIRC.

Indeed I don’t perform this check :smile: though it would be easy to do so.

This checks, as does the wheel unpack command. https://github.com/pypa/wheel/blob/master/src/wheel/wheelfile.py

Nowadays the hashes could be computed by an io module stream wrapper hooked into ZipFile.open().
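
A minimal sketch of that stream-wrapper idea, under stated assumptions: the names `HashCheckingReader`, `encode_digest`, and `open_checked` are hypothetical and come from neither pip nor wheel, and the expected value is assumed to be the unpadded urlsafe-base64 digest as it appears in RECORD after `sha256=`.

```python
import base64
import hashlib
import zipfile


def encode_digest(hasher):
    """Encode a digest the way RECORD does: urlsafe base64 without '=' padding."""
    return base64.urlsafe_b64encode(hasher.digest()).rstrip(b"=").decode("ascii")


class HashCheckingReader:
    """Hypothetical wrapper: hashes everything read and verifies on close()."""

    def __init__(self, raw, expected, algorithm="sha256"):
        self._raw = raw
        self._expected = expected
        self._hasher = hashlib.new(algorithm)
        self._closed = False

    def read(self, size=-1):
        data = self._raw.read(size)
        self._hasher.update(data)
        return data

    def close(self):
        if self._closed:
            return
        # Drain unread bytes so the digest always covers the whole member.
        while self.read(65536):
            pass
        self._closed = True
        self._raw.close()
        actual = encode_digest(self._hasher)
        if actual != self._expected:
            raise ValueError(
                f"hash mismatch: expected {self._expected}, got {actual}"
            )

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False


def open_checked(zf, name, expected):
    """Open an archive member through the hash-checking wrapper."""
    return HashCheckingReader(zf.open(name), expected)
```

Because the check runs in `close()`, a caller that streams the member to disk only learns about a mismatch after the bytes are written, so it would still need to clean up afterwards.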

It could be helpful to rip it out as a separate library; this seems more general-purpose than a setuptools extension, which is what I think of the wheel package as.

A general purpose library to go from wheel -> installed package would be very useful and we have the name installer on PyPI, which would be pretty perfect for this library IMO.

I’m happy to do the work of moving this code / pip’s code into there, wrapping it into a proper reusable library, and making pip start using that. Any pointers beyond the ones above to help me get started on this?

Wheel used to have an installer. Maybe not entirely general-purpose? https://github.com/pypa/wheel/commit/353217fb496d61b3c5ce287b9c61a229e2ed27fe

Add me as interested to contribute/review, virtualenv would adopt.

I wonder if it’s useful to make the API sans-IO. Probably not since most IO things are filesystem calls, which work synchronously anyway, but there could be some design tricks to make the API easier to work with in an async context.

A project called installer sounds to me like it will be able to install almost anything, not just wheels. This is definitely out of scope now, but it’d be a good idea to properly scope the project before deciding on a name.

Definitely count me in.

The only other format is PEP 517 sdists as far as standards are concerned and the sdist->wheel transition would definitely not fit in, or at least be enough complexity to defeat the “a fast tool to populate a package into an environment” goal here. :man_shrugging:

re: IMO, sans-I/O would be appropriate, with a common-case I/O utility provided on top of it. :slight_smile:

Seems like we definitely have different ideas about the project. I was thinking it would fit. The building part can be its own project (with installer depending on it), but the project name sounds to me like it should have such an API. But I can be convinced on this :stuck_out_tongue:

I mean, it does fit in… but I don’t want us to start with that scope/goal.

I do think “start with a smaller scope and grow” would work better for us… plus we can solve the shared install logic implementation problem much more easily than the shared common build logic implementation problem.

What I’m saying is that there’s no reason for installer to not be able to depend on packagebuilder once they both exist; but I’d like to be cautious till then. :slight_smile:

I also agree it should only extract a wheel to a folder. To build a wheel we already have the pep517 package.

A sans-I/O approach would also make it possible to install both from a .whl file and from a wheel exploded on disk.
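
A rough sketch of what that separation could look like, assuming a core that only ever sees `(path, stream)` pairs. The function names `iter_zip`, `iter_directory`, and `summarize` are hypothetical, and this is not strictly sans-I/O in the protocol-library sense: it only keeps the core from knowing where the entries come from.

```python
import os
import zipfile


def iter_zip(path_or_file):
    """Yield (archive path, readable stream) for each file in a .whl archive."""
    with zipfile.ZipFile(path_or_file) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            with zf.open(info) as stream:
                yield info.filename, stream


def iter_directory(root):
    """Yield the same kind of pairs for a wheel already exploded on disk."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Normalise to the forward-slash paths used inside archives.
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as stream:
                yield rel, stream


def summarize(entries):
    """Stand-in for the installer core: it never learns the source type."""
    return {path: len(stream.read()) for path, stream in entries}
```

Both sources feed the same core, e.g. `summarize(iter_zip("some.whl"))` and `summarize(iter_directory("unpacked/"))`, where the two paths are placeholders.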

And I’m obviously up for helping out. :slight_smile:

So, what are the next steps? Create a repository, agree on initial scope, and figure out the API? If so, should the repo be created in the PyPA org on GitHub, given who is volunteering to help out? :wink:

All these sound reasonable to me. I think everyone agrees with the initial scope (to install a wheel). My concern was more about the name of the package, but that can be discussed until there is actually something to release.

PEP 427 already outlines the rough steps to install a wheel. Metadata readers are already mostly implemented by importlib-metadata, so what’s left, from what I can tell, is a WHEEL parser, a RECORD writer, a script writer, and a nice interface to streamline the usage (that last part sounds difficult already).
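
For the WHEEL-parsing piece, a minimal sketch: the file uses email-header-style `Key: value` lines, so the standard library parser can read it. The function name `parse_wheel_metadata`, the returned dict layout, and the `SUPPORTED_MAJOR` constant are assumptions made for illustration.

```python
from email.parser import Parser

SUPPORTED_MAJOR = 1  # assumption: this sketch targets Wheel-Version 1.x


def parse_wheel_metadata(text):
    """Parse the contents of a .dist-info/WHEEL file into a plain dict."""
    msg = Parser().parsestr(text)
    version = tuple(int(part) for part in msg["Wheel-Version"].split("."))
    # PEP 427 says an installer must fail on a greater major version.
    if version[0] > SUPPORTED_MAJOR:
        raise ValueError(f"unsupported Wheel-Version: {msg['Wheel-Version']}")
    return {
        "wheel_version": version,
        "generator": msg.get("Generator"),
        "root_is_purelib": msg.get("Root-Is-Purelib", "").lower() == "true",
        # Tag may appear several times, once per compatibility tag.
        "tags": msg.get_all("Tag", []),
    }
```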

Another thing came to mind while reading the installation steps. The last step specifically talks about the uninstaller, which is not mentioned anywhere else in the document. Is this something we should take a look at, and potentially include in this tool as well? This “smart enough” part seems quite vague and under-discussed.

This way makes sense, to me, today:

  1. use random access to read WHEEL, RECORD and prepare for hash checking

  2. generate series of (paths inside the wheel, and filelike to get the data) - include the ZipInfo for necessary metadata like +x bits and “isdir”

  3. automatically check wheel integrity/consistency here (hook on the readable stream for each archive member, raise error if .close() and hash mismatch)

  4. split paths between {package}.data/category/ /rest/of/path or ‘root of archive’ /rest/of/path for files not in the data directory

  5. map from {package}.data/category or ‘’, to category name one of PURELIB, PLATLIB, SCRIPTS, … at this stage we can no longer tell the difference between files at ‘’ or {package}.data/purelib if Root-Is-Purelib

  6. map from category name to installation target directory

  7. join target directory with /rest/of/path

  8. stream file contents to disk

  9. rewrite legacy scripts etc.

  10. write RECORD

  11. build pyc’s? the ‘smart enough to uninstall’ step just means any files you generate as a result of installing the wheel, also go into RECORD
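
Steps 4 to 7 above can be sketched as a single pure function. This is only an illustration: `destination_for` and its parameters are hypothetical names, and the `scheme` dict is assumed to map category names to target directories (for example something like `sysconfig.get_paths()`, which notably has no `headers` key, so a real tool would need to fill that in).

```python
import os


def destination_for(archive_path, data_dir, root_is_purelib, scheme):
    """Map a path inside the wheel to a target path on disk.

    data_dir is the name of the wheel's .data directory inside the archive.
    scheme maps category names (purelib, platlib, scripts, ...) to directories.
    """
    prefix = data_dir + "/"
    if archive_path.startswith(prefix):
        # Steps 4-5: {package}.data/<category>/<rest/of/path>
        category, _, rest = archive_path[len(prefix):].partition("/")
    else:
        # Files at the root of the archive go to purelib or platlib.
        category = "purelib" if root_is_purelib else "platlib"
        rest = archive_path
    if category not in scheme:
        raise ValueError(f"unknown install category: {category!r}")
    # Steps 6-7: category -> target directory, then join the rest of the path.
    return os.path.join(scheme[category], *rest.split("/"))
```

Keeping this step free of filesystem access makes it easy to test, and lets the streaming and RECORD-writing steps be swapped out independently.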

If steps can be combined or optimized away then that should happen. If it is streaming the installer should be prepared to roll back after an error, say, if the last file doesn’t match its hash.
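
One way the rollback could look, as a minimal sketch: a context manager (the name `rollback_on_error` is hypothetical) that removes every file recorded so far if anything fails. It deliberately ignores harder cases such as directories it created or pre-existing files that were overwritten.

```python
import os
from contextlib import contextmanager


@contextmanager
def rollback_on_error():
    """Yield a list; paths appended to it are deleted if the block raises."""
    written = []
    try:
        yield written
    except BaseException:
        # Undo in reverse order, then re-raise the original error.
        for path in reversed(written):
            try:
                os.remove(path)
            except FileNotFoundError:
                pass
        raise
```

The installer would append each target path right after writing it, so a late hash mismatch on the last file removes everything installed before it.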

We want to change step #2 to improve compression so it would be helpful for that to be independent.

Since we’re defining scope here, we should enable signature validation as well, probably just as a hook inside Daniel’s step 1 while reading RECORD (because once we know that file is trusted, we can trust the hashes included in it). Give it the whole metadata directory and let it fail if it doesn’t like something.

It needs to be a hook though, with access to the rest of the wheel contents, as different platforms/users will have different needs here. I expect PyPI to require wheels be unsigned for now.