What is Install-Paths-To in WHEEL file?

Hello.

I’ve posted this to the Python mailing list, but was advised to ask here. The edited but mostly the same question is reproduced below.

I’m trying to understand the contents of Wheel files. I was reading
PEP 491 (The Wheel Binary Package Format 1.9, on peps.python.org), specifically the paragraph that
states:

Install-Paths-To is a location relative to the archive that will be
overwritten with the install-time paths of each category in the
install scheme. See the install paths section. May appear 0 or more
times.

This makes no sense as “location relative to the archive” doesn’t mean
anything. The archive’s location (did you mean filesystem path?) may not
exist (e.g. the archive is read from a stream, perhaps being downloaded
over the network), but even if it is a file in a filesystem, it
can be absolutely anywhere… If this paragraph is interpreted
literally then, say, a command such as

pip install /tmp/distribution-*.whl

for a wheel that has Install-Paths-To set to “../bin” and contains the file
“distribution-1.0/data/bash” would write this file as “/bin/bash”.
That cannot be right, or can it?

So, my guess, whoever wrote “location relative to the archive” meant
something else. But what? What was this feature trying to accomplish?
The whole passage makes no sense… Why would anyone want to overwrite
paths such as platlib or purelib by installing some package? This
sounds like it would just break the whole Python installation…

And then the PEP continues, but it doesn’t
make anything better. Here’s what this PEP has to add (text in square
brackets are my questions):

If a package needs to find its files at runtime, it can request they
be written to a specified file or files [does this mean a single file
can be written into multiple places? how does this work with
“standard” unzip program?] by the installer and included in those same
files [what files? same as what?] inside the archive itself [so are we
modifying the zip archive? really? do we also need to update the
RECORD file with the hashes etc?], relative to their location within
the archive [a file is written relative to its location in archive…
where? where is it written? relative to what?] (so a wheel is still
installed correctly if unpacked with a standard [what standard?] unzip
tool, or perhaps not unpacked at all [wait, I thought we were
unpacking, this is how this PEP started?]).

If the WHEEL metadata contains these fields:

Install-Paths-To: wheel/_paths.py [is the wheel/ part necessary? what
role does it play? is this precisely how the files should be called?
can it be sponge/_bob.py?]
Install-Paths-To: wheel/_paths.json

Then the wheel installer, when it is about to unpack wheel/_paths.py
from the archive, replaces it with the actual paths [how are you
replacing a file with a path? what’s the end result?] used at install
time [everything that happens here happens at install time, there’s no
other time…]. The paths may be absolute or relative to the generated
file [oh, so we are generating something, this is the first time you
mentioned it… what are we generating? based on what? how do I tell
where the file is being generated to know what the path is?].

If the filename ends with .py then a Python script is written [where?
what’s written into that script?]. The script MUST be executed [can I
rm -rf --no-preserve-root /?] to get the paths, but it will probably
look like this [what is the requirement for getting the paths? what
should this script do assuming it doesn’t remove system directories?]:

data='../wheel-0.26.0.dev1.data/data'
headers='../wheel-0.26.0.dev1.data/headers'
platlib='../wheel-0.26.0.dev1.data/platlib'
purelib='../wheel-0.26.0.dev1.data/purelib'
scripts='../wheel-0.26.0.dev1.data/scripts'

If the filename ends with .json then a JSON document is written
[similarly, written where? how is the contents of this file
determined?]:

{ "data": "../wheel-0.26.0.dev1.data/data", ... }

The Internet has, basically, a single mention of this feature: in some Java build system, one of those started by big names, Buck or Bazel, not sure. And there it says that they aren’t going to implement this feature.

So… what is it? I’ve searched through hundreds of popular wheels but am yet to see it used in the real world. What was the author trying to accomplish? Are they still around? Perhaps I could send them an email?

This is where the problem starts.

You are reading a PEP that was never accepted nor implemented. So it’s not surprising that you have not found usage or other traces of the feature you’re referring to. Instead, you want to read the up-to-date specification at Binary distribution format - Python Packaging User Guide.

I’ve read the document you linked too, but I had no idea that it has any authority over PEPs. Things changed… since when? Also, the document you linked for the most part copies the contents of the PEP I linked. I just assumed that they had a watered down but non-normative version of the same thing… well, guess I was wrong.

I read the linked PEP because it was literally the only one describing the format (both documents are very bad at their job, but the one you linked is at least shorter).

Since the feature from the deferred PEP did actually make it into production systems out there in the world… do you happen to know the background story? How did it work before the Python Packaging Guide came about? Did everyone just… do whatever they wanted? Was PEP 491 ever NOT deferred? There must’ve been maybe a 5-10 year span between when wheels first appeared and when the Python Packaging Guide was written. What about all the wheels produced during that time? Were they expected to follow PEP 491? Were the tools actually ready to handle every aspect of that document?

~2017, although we hadn’t moved all the PEPs over until recently and it’s still an ongoing effort.

The Python Packaging Guide has existed since 2014.

They have been meant to be like that for a long time, but unfortunately there has been a lack of effort on the packaging user guide (like all PyPA projects, it’s maintained by volunteers) so many PEPs got used as specification documents. Things have been changing a lot in the past month. However, the wheel spec has been on packaging.python.org for almost 3 years already.

The format was not introduced by the deferred PEP 491, but by PEP 427, which was accepted. You’ll see that Binary distribution format - Python Packaging User Guide is mostly the content of PEP 427. And the purpose of PEP 491 was to introduce a new feature to the wheel standard, namely that feature with Install-Paths-To and stuff that you were trying to understand, but that PEP was deferred so this specific feature never existed in the wheel standard.

Here’s where this feature made it into the production world (at least in the form of a comment and a swear to never implement it): MovePythonWhlDataStep (Buck)

So, I guess, I’m not alone in thinking that it was used or at least intended to be used.

I had a lot of tabs open and must’ve been confused, because the contents of these documents are so repetitive.

Anyways, now that I’ve learned that the Python Packaging Guide is the ultimate authority on the format, my goal is to understand whether the actual world honored this, or whether people got confused (like myself, and whoever that person from Facebook was who wrote MovePythonWhlDataStep) and actually implemented it?

Every now and then I run into various deviations on PyPI from what I believe to be the letter of these documents. For example, there are instances of wheels with names not normalized according to the described normalization rules (and plenty of those). But the evidence is that those were successfully used in practice.
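For readers who want to check filenames themselves, here is a minimal sketch of the normalization I understand the current documents to describe: the PEP 503 rule for project names, followed by replacing "-" with "_" for the wheel filename. The helper names are mine, and older revisions of the wheel spec used a looser escaping rule, so a "non-normalized" name may simply predate the current wording.

```python
import re


def normalize_name(name: str) -> str:
    """Project name normalization as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def wheel_filename_component(name: str) -> str:
    """Distribution part of a wheel filename: normalized, with '-' replaced by '_'."""
    return normalize_name(name).replace("-", "_")


def has_normalized_name(wheel_filename: str) -> bool:
    """Check whether the first '-'-separated component is already in normalized form."""
    distribution = wheel_filename.split("-", 1)[0]
    return distribution == wheel_filename_component(distribution)


if __name__ == "__main__":
    # Hypothetical filenames, used only for illustration.
    for filename in ("foo_bar-1.0-py3-none-any.whl", "Foo.Bar-1.0-py3-none-any.whl"):
        print(filename, has_normalized_name(filename))
```

This only inspects the filename; it says nothing about whether installers accept the file.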

So, for example, the deferred PEP also mentions a bunch of directories it calls “install paths”, which allegedly were taken from autotools. The other PEPs don’t seem to have an exhaustive list, but have wording similar to:

Each subdirectory of distribution-1.0.data/ is a key into a dict of destination directories, such as distribution-1.0.data/(purelib|platlib|headers|scripts|data) .

I’m concerned by the “such as” here: it doesn’t instill confidence in this being all the possible options; it reads as giving examples rather than giving the whole list.

Am I to conclude that autotools-based directory list never materialized? Am I to believe that the list given in this passage constitutes an exhaustive list of all possible destinations? How historically accurate is it?
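One way to answer the "which keys actually occur" question empirically is to list the first-level subdirectories of the .data/ directory in a wheel. The sketch below uses only the standard zipfile module; the function names and the demo archive contents are hypothetical, made up so the example runs without downloading anything.

```python
import io
import zipfile


def data_dir_keys(wheel_file) -> set:
    """Return the names of the first-level subdirectories of any *.data/ directory."""
    keys = set()
    with zipfile.ZipFile(wheel_file) as zf:
        for name in zf.namelist():
            parts = name.split("/")
            # Expect archive paths shaped like "distribution-1.0.data/<key>/...".
            if len(parts) >= 3 and parts[0].endswith(".data"):
                keys.add(parts[1])
    return keys


def demo_wheel() -> io.BytesIO:
    """Build a tiny in-memory archive with made-up contents for the demonstration."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("distribution/__init__.py", "")
        zf.writestr("distribution-1.0.data/scripts/tool", "#!python\n")
        zf.writestr("distribution-1.0.data/headers/dist.h", "")
        zf.writestr("distribution-1.0.dist-info/WHEEL", "Wheel-Version: 1.0\n")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(sorted(data_dir_keys(demo_wheel())))
```

Passing a path to a real .whl file instead of the in-memory buffer works the same way, so the function can be run over a local mirror of wheels to see which keys appear in practice.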

It’s not very surprising to see implementations of non-accepted standards in the wild, since one usually needs to have at least a prototype implementation for a standard to be accepted (in order to avoid standardizing things that are found during implementation to be impractical or less ideal than expected). Furthermore, it can take time before a standard is accepted, which can lead to people implementing the PEP before it gets blessed as a standard.

This is not very different from how, e.g., C++ standards gain features that have already been implemented by major compilers as non-standard language extensions.

This specific part of the standard has indeed been a problem because of some misunderstandings during the PEP approval and implementation process. See the long thread “Change in PyPI upload behavior. Intentional, accidental, pebkac?” (post #78 by barry).

I’m not aware of other known deviations from the wheel standard observed in practice.

Yeah, the spec says “The initially supported paths are taken from distutils.command.install.”, which should be updated now because distutils has been removed. I think the authoritative list is that of sysconfig schemes, can an expert confirm?
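To see what sysconfig offers on a given interpreter, the following sketch prints its schemes and path names and compares them with the keys from the spec's example. WHEEL_DATA_KEYS and the helper function are names I made up. As far as I know, sysconfig calls the header directory "include" rather than "headers", so the two lists should not be expected to match exactly.

```python
import sysconfig

# Keys named in the wheel spec's example of *.data/ subdirectories.
WHEEL_DATA_KEYS = ("purelib", "platlib", "headers", "scripts", "data")


def keys_known_to_sysconfig() -> dict:
    """For each wheel key, report whether sysconfig defines a path of that name."""
    names = set(sysconfig.get_path_names())
    return {key: key in names for key in WHEEL_DATA_KEYS}


if __name__ == "__main__":
    print("schemes:", sysconfig.get_scheme_names())
    print("path names:", sysconfig.get_path_names())
    print("wheel keys present in sysconfig:", keys_known_to_sysconfig())
    # Concrete directories for the default scheme of the running interpreter.
    for key, path in sysconfig.get_paths().items():
        print(f"{key:12} {path}")
```

The output differs between platforms, Python versions, and virtual environments, which is part of why tying a spec to it needs care.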

Tying up some loose ends, perhaps @dholth would consider updating PEP 491 from Deferred (“postponed pending further research or updates”) to Withdrawn?

The statement is still accurate. If anything, we probably want to add “as of <Python version active at the time the PEP was accepted>”.

sysconfig has changed and will continue to evolve. Referring someone to a contemporary version for information about a historical action doesn’t make sense.

A “this was the list at the time the spec was initially accepted” statement is technically true, but it’s not what I expect when reading a spec on packaging.python.org (as opposed to a PEP). The spec should describe the current state, not its evolution (that’s the role of the PEPs, and the history section at the end of the spec).

That’s a fair point, but it mostly reflects the fact that transferring a PEP verbatim isn’t always the right thing to do. Unfortunately, we haven’t had the resources to do much more than that with many of the older PEPs, so it’s what we have - and “as of a particular Python version” is a quick and uncontroversial fix.

Improving the specification is certainly possible and would be a good thing to do, but we’d have to take care not to accidentally change the specification in doing so (the way that referencing sysconfig could, for example). So it would take extra effort, and potentially a wider discussion than a simple PR to the specifications document would trigger. We’ve had problems in the past with spec changes that seemed to be simple clarifications, but which broke things - so we should be cautious.

For reference this is our current process for making fixes and minor updates to existing specs. It’s open to some interpretation itself - for example, I’m not clear where to find the list of who the “PyPA core reviewers that are also PEP editors” are, who can accept a change as textual fixes, let alone whether we actually follow that rule - but it’s the process that we should be using for something like this.

OK, but… is it that complicated to define what the keys under the .data/ directory are? I (naively, perhaps) thought there was some common understanding of it that everybody knew but which the specification had just not been updated with. If sysconfig is not the canonical source for install locations, what is it? What does pip do? Can it depend on the installer?(!)

I don’t know, to be honest. (I could look at the pip code, but I don’t know without checking). Basically, I think the reality is that pip simply takes what’s in the wheel and finds a location for it - probably by looking at sysconfig, yes[1]. Pip doesn’t have to enforce what’s supported, that’s more for wheel builders to deal with, I think. And yes, other installers can do different things, as long as they conform to the spec. Installer looks like it lets the user supply a scheme:path mapping.
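As a rough illustration of "takes what’s in the wheel and finds a location for it", and explicitly not pip’s or Installer’s actual code, here is a sketch that maps an archive path to a destination given a scheme-to-path mapping supplied by the caller. The function name, the root_is_purelib parameter (standing in for the Root-Is-Purelib field of the WHEEL file), and the example paths are all assumptions of this sketch.

```python
import os


def destination_for(archive_path: str, scheme: dict, root_is_purelib: bool = True) -> str:
    """Map 'dist-1.0.data/<key>/<rest>' to scheme[<key>]/<rest>.

    Anything outside a *.data/ directory goes under purelib or platlib.
    """
    parts = archive_path.split("/")
    if len(parts) >= 3 and parts[0].endswith(".data"):
        key, rest = parts[1], parts[2:]
        if key not in scheme:
            # The spec does not say what to do with unknown keys; this sketch refuses.
            raise KeyError(f"no destination configured for {key!r}")
        return os.path.join(scheme[key], *rest)
    root = scheme["purelib" if root_is_purelib else "platlib"]
    return os.path.join(root, *parts)


if __name__ == "__main__":
    # Hypothetical scheme; a real installer might start from sysconfig.get_paths().
    scheme = {"purelib": "/env/lib/site-packages", "scripts": "/env/bin"}
    print(destination_for("distribution-1.0.data/scripts/tool", scheme))
    print(destination_for("distribution/__init__.py", scheme))
```

Raising on an unknown key is one possible policy; another installer could just as well ignore or warn, which is exactly the implementation-defined territory discussed here.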

The problem here is that there’s a difference between what the spec requires and what the tools do, and anything outside the spec is “implementation defined behaviour”. We don’t prohibit or control implementation-defined behaviour, so “what existing implementations do” isn’t a 100% sufficient guide to what the spec should say (unless of course you survey every existing implementation, in which case you can say that a change won’t affect anyone, but that’s not practical).

Basically, as I was trying to say, judgement and caution are needed…

Personally, I’d expect something like a fixed list of allowed values, or something tying the spec to sysconfig, to be fine - but I’d want to give tool authors and users a chance to confirm that, so I’d expect at a minimum, a dedicated discussion thread, and possibly even a PEP, for such a change. And I’d want input from parties likely to be affected, such as Python distributors, build backend and installer maintainers, etc. But that’s my personal approach if I were making such a proposal myself, it’s not any sort of pronouncement on the required process.


[1] although I wouldn’t be at all surprised to find a bunch of legacy exceptions to that :)

IIRC, there wasn’t a clean story for how the paths should be computed after distutils died, and pip’s implementation is basically the canonical definition at this point in time.

installer doesn’t quite do something that’s the same, and I plan to align the two when I find some time to do so in the coming months.

The problem with sysconfig is that it only defines where CPython’s files are, and doesn’t in any way pretend to provide the paths where additional packages should be installed.

Defining the concrete list is a great idea, but it should be based on the current reality (either pip or setuptools, depending on exactly where it gets defined - I assume some stuff ends up in .data with subdirectories?) and not on a technically unrelated module.

Historically that’s probably the case, and it may well still be true in principle. But packaging tools have been moving towards more use of sysconfig, particularly since the alternative, distutils, was removed from the stdlib. We need something that defines the core interpreter policy for how packages are laid out.

As far as I know, @FFY00 (the current lead maintainer for sysconfig) views packaging uses of sysconfig as a reasonable use case, so I don’t think it’s unreasonable to go in this direction.

None of which alters the fact that the existing standard says

This version of the wheel specification is based on the distutils install schemes and does not define how to install files to other locations.

which is both annoyingly vague (what precisely are “the distutils install schemes”?) and at the same time explicitly tied to distutils. So any change, whether it’s to be more explicit about what are the valid schemes, or to link the specification to something other than distutils, would almost certainly need a PEP[1].


[1] i.e. it would in my view, and I’m the nearest thing to an authority on this that we have

The core interpreter has a policy - packages go on sys.path. That doesn’t help with where data files, headers, import libraries, etc. go, but the core interpreter doesn’t care about those. So the policy is independent from the runtime, and can be defined independently.

The core interpreter does have a responsibility to tell users where its own native libraries and headers are located. So if anything, the example is that “linkable” packages should provide a sysconfig equivalent (more likely some well-known entry points) and then install everything wherever they want (hopefully inside their own install directory).

Personally (as someone who maintains a build backend that would be interested in finding all the directories to pass to the extension module compiler), I’d be quite happy with entry point metadata that points into the default install directory. I know other people live on systems where everything goes into a single/few shared location(s), but in that case I’m not sure how we expect sysconfig to know where those are better than the OS (without enhancing sysconfig to provide platform information, as opposed to CPython information).

Sorry, maybe “core interpreter” was too restrictive, I meant to include the stdlib. The layout of virtual environments is clearly defined in the stdlib, both via the context object in the venv module’s API, and in the venv scheme in sysconfig. You can claim these are not intended as defining the locations where wheel contents should be installed, but I’m pretty sure it would be incredibly disruptive if tools (or the packaging standards) chose to not conform to those locations.

I get that in general the core developers prefer not to get too involved with packaging, and my impression is that you’re a strong supporter of that separation, but I think there needs to be at least some level of common understanding. Otherwise, we’re going to continue to have frustrating discrepancies and Python’s packaging story will keep looking like a bit of a mess to the average user (which isn’t good for either the packaging ecosystem or the core Python distribution). So personally, I’d rather acknowledge that sysconfig acts as the “bare minimum” point of common understanding (and I say “acknowledge” rather than “make” because that’s basically what we’re doing right now, like it or not).

If I’ve misunderstood your position, I apologise - your alternative idea is rather general and I may have read it in a way that you didn’t intend. For example you talk about “entry point metadata that points into the default install directory” - I can’t see how that would work as entry points are per package, not per install location, and so I don’t know what entry point (or who would provide that entry point) you’d be expecting an installer like pip to look at when trying to install a wheel for a new package. If that makes a difference to what I said above, please feel free to clarify (or better still, give an explicit example of how you see this working).

Only as far as the Lib\site-packages subdirectory and Scripts/bin. That doesn’t help with any of the other paths people want Install-Paths-To to define.

But maybe I’ve misunderstood, and people really do only care about the normal install path and the location to create entry points? In that case, sure, venv covers it. But it doesn’t cover anything else like data files, headers, import libraries, etc. And there’s nothing in the stdlib that ever needs to look them up (now that distutils is gone), so the stdlib is entirely unopinionated about where they should be.

I don’t think pip would look, I expect a build backend to look.

For example, let’s say a package “A” includes a native extension module (e.g. numpy) that other packages “B” are allowed to link their extension modules against (e.g. scipy), so it has to expose a set of header files and import libraries for those other modules to compile with.

People seem to want the wheel format to have an include or headers directory, and a libs directory, so that “A” can install these extra files to the “correct” place.

For “B”, its build backend/configuration is going to have to run the compiler with /INCLUDE<that path> and /LIB<that path> (or whatever the options really are). One possibility is for it to “just know” that it should /INCLUDE:C:\IncludeFiles and /LIB:C:\LibFiles, and another is to /INCLUDE:{sysconfig.get_path(...)}.

But another possibility is that when the configuration says that “A” is a build-time dependency, the backend can look up the "include" entry point for all those dependencies, call their where() that returns a path, and add them as /INCLUDE: options. Or it can enumerate all packages and just add them all in. This is totally independent from the core runtime and stdlib, can be implemented and backported today to all versions, and doesn’t require any changes to CPython itself. It’s trivially versioned, and the returned path can be inside the normal package files.
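A minimal sketch of that lookup from the build backend’s side, using importlib.metadata from the standard library. It assumes that "include" is an entry point group name and that each entry point resolves to a zero-argument callable returning a path; neither is an existing convention, and the function names are hypothetical. The "/INCLUDE:" spelling is just the placeholder used above.

```python
from importlib.metadata import entry_points


def collect_include_dirs(group: str = "include") -> list:
    """Call every entry point registered in `group` and collect the returned paths."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Newer Python versions expose a select() API.
        selected = eps.select(group=group)
    else:
        # Older versions return a plain dict keyed by group name.
        selected = eps.get(group, [])
    dirs = []
    for ep in selected:
        where = ep.load()  # hypothetical convention: a callable returning a directory
        dirs.append(str(where()))
    return dirs


def compiler_flags(dirs) -> list:
    """Turn directories into compiler options, using the placeholder flag spelling."""
    return [f"/INCLUDE:{d}" for d in dirs]


if __name__ == "__main__":
    print(compiler_flags(collect_include_dirs()))
```

Package “A” would declare such an entry point in its own metadata, so nothing in CPython or sysconfig has to know where the headers live. A real backend would probably restrict the lookup to declared build-time dependencies rather than everything installed.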

But my main point is that this can be totally defined outside of the core runtime as long as we don’t say that sysconfig has the answer. It can live in packaging, or it can live in a document that build backends agree to use (which they all will, since people just want to be told what to do here), but it doesn’t need anything in core to change.
