Refine wheel spec to designate directory entries

In this comment I learned that Python has a limitation: PEP 420 namespace packages (packages without an __init__.py file) aren’t recognized by the zip importer. Although that issue might be addressed in Python, support for these packages in zip files would only be available in newer Pythons.

Investigating further, I found that the zipimporter will recognize a PEP 420 namespace package if the zip file includes directory entries (zero-length files named with a trailing /), a convention that seems to be common among the most popular zip file implementations (including python -m zipfile, Info-ZIP, and Windows Explorer).
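A minimal sketch of that observation, using only the standard library. It builds two throwaway zip files, each containing a namespace package with no __init__.py: one has a zero-length `pkg/` directory entry and one does not. It then tries importing from each archive via sys.path. The package names (`nsdemo_*`) and both helper functions are hypothetical. Because the limitation may be lifted in newer Pythons, the result for the archive without directory entries can vary by version, so only the directory-entry case is expected to succeed.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path


def build_zip(path: Path, pkg: str, with_dir_entries: bool) -> None:
    """Write a zip holding namespace package `pkg` (no __init__.py) with one module."""
    with zipfile.ZipFile(path, "w") as zf:
        if with_dir_entries:
            zf.writestr(f"{pkg}/", b"")  # zero-length member named with a trailing "/"
        zf.writestr(f"{pkg}/mod.py", "VALUE = 42\n")


def can_import(zip_path: Path, pkg: str) -> bool:
    """Try `import pkg.mod` with the zip file temporarily placed on sys.path."""
    sys.path.insert(0, str(zip_path))
    importlib.invalidate_caches()
    try:
        return importlib.import_module(f"{pkg}.mod").VALUE == 42
    except ImportError:
        return False
    finally:
        sys.path.remove(str(zip_path))


results = {}
with tempfile.TemporaryDirectory() as tmp:
    for label, with_dirs in (("with_dir_entries", True), ("without_dir_entries", False)):
        pkg = f"nsdemo_{label}"  # distinct names so one import cannot mask the other
        zpath = Path(tmp) / f"{pkg}.zip"
        build_zip(zpath, pkg, with_dirs)
        results[label] = can_import(zpath, pkg)

# "with_dir_entries" should be True; the other value depends on the Python version.
print(results)
```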

Wheels, however, are built without directory entries; they include only the files. So I filed pypa/wheel#287 to explore having the wheel format include these directory entries, bringing namespace-package compatibility to wheels that happen to be on sys.path, while acknowledging that such use is discouraged.
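A sketch of what including these directory entries can look like when writing an archive: a hypothetical helper emits a zero-length `dir/` entry for every parent directory before the first file inside it. This is illustrative only, not the wheel project's implementation, and it omits the .dist-info metadata a real wheel needs.

```python
from __future__ import annotations

import io
import zipfile
from pathlib import PurePosixPath


def parent_dirs(name: str) -> list[str]:
    """Directory entries implied by a member name, outermost first.

    'ns/pkg/mod.py' -> ['ns/', 'ns/pkg/']
    """
    parents = PurePosixPath(name).parents  # ns/pkg, ns, .
    return [f"{p}/" for p in reversed(parents) if str(p) != "."]


def write_with_dir_entries(target, files: dict[str, bytes]) -> None:
    """Write `files` into a zip at `target` (a path or binary file object),
    emitting a zero-length 'dir/' entry before the first file in each directory."""
    written: set[str] = set()
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            for d in parent_dirs(name):
                if d not in written:
                    zf.writestr(d, b"")  # the trailing "/" marks a directory entry
                    written.add(d)
            zf.writestr(name, data)


buf = io.BytesIO()
write_with_dir_entries(buf, {"ns/pkg/mod.py": b"X = 1\n", "ns/pkg/util.py": b""})
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
print(names)  # ['ns/', 'ns/pkg/', 'ns/pkg/mod.py', 'ns/pkg/util.py']
```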

This functionality was released in wheel 0.33.2, but it caused unexpected problems in distlib, which expected directory entries to be omitted and broke when it encountered them (see also distlib#122).
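One way a consumer can tolerate such entries is to skip them when comparing archive members against RECORD. The sketch below is hypothetical (it is not distlib’s code) and relies on `ZipInfo.is_dir()`, which treats any member whose name ends in `/` as a directory.

```python
from __future__ import annotations

import csv
import io
import zipfile


def unrecorded_members(wheel: zipfile.ZipFile, record_name: str) -> list[str]:
    """Archive members not listed in RECORD, skipping 'dir/' entries."""
    text = wheel.read(record_name).decode("utf-8")
    recorded = {row[0] for row in csv.reader(text.splitlines()) if row}
    return [
        info.filename
        for info in wheel.infolist()
        # Tolerate directory entries instead of treating them as unexpected files.
        if not info.is_dir() and info.filename not in recorded
    ]


# A small wheel-like archive, built in memory, that includes a directory entry.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("demo/", b"")
    zf.writestr("demo/mod.py", b"X = 1\n")
    # Only the path column is checked here, so hash and size are left blank.
    zf.writestr("demo-1.0.dist-info/RECORD", "demo/mod.py,,\ndemo-1.0.dist-info/RECORD,,\n")

with zipfile.ZipFile(buf) as zf:
    missing = unrecorded_members(zf, "demo-1.0.dist-info/RECORD")
print(missing)  # []
```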

At @pfmoore’s direction, I’ve begun work on amending the wheel spec to clarify that the presence of these directory entries is preferred.

Feel free to discuss, but note that the consequence of not accepting this change (or a similar one) might be unraveling all of the aforementioned work and returning to the drawing board as far as whether eggs should be enhanced and preferred for tools that need plugins.

I have two main concerns here:

  1. By only preferring the directory entries to exist, you haven’t made any difference to what tools will need to do. Wheels without the directory entries are still valid, and therefore tools will have to support them. So how is this better than the current situation?
  2. You need this, as you say, to allow you to put wheels on sys.path, using them as runtime containers in the same way that eggs worked in the past. But this is an explicit non-goal of the wheel format, and the discussion at seems to me to be generally pretty negative towards using wheels on sys.path, as opposed to, for example, designing an “importable container” format similar to wheels but with a different purpose - a suggestion that I don’t think you’re particularly interested in pursuing yourself (which is, of course, fine, but which doesn’t mean the case against wheels on sys.path is any different).

I don’t have any objection to the wheel spec saying that it would be better for tools to include the directory entries (I think it’s pointless, see (1) above, but I don’t object). I would be unhappy if such a statement resulted in pressure for tools to add those directories just to support wheels on sys.path.

Given that the “aforementioned work” was never a supported use case for wheels, I don’t think this is a fair argument. I think we should “go back to the drawing board” as you put it, and design a dedicated “importable container” format. That discussion is, it seems to me, ongoing in, and I’m almost tempted to suggest that we reject this change precisely so that it doesn’t derail that work.

To be 100% clear: what I said elsewhere is that a change like the one proposed here “should be uncontroversial”. But that was before making the change in the wheel project resulted in incompatibilities with distlib, and I was also under the (apparently mistaken) impression that the need for this was unrelated to the ongoing “wheels on sys.path” debate. I’m no longer so sure the change is as innocuous as I believed then.

Edit: Also, note that we’re only discussing this (as I understand it) to work around a Python bug - zipimport not recognising PEP 420 namespaces when they are stored in zipfiles like this. So we’re talking about a spec change to work around a bug triggered by unsupported usage.

It makes sense to me that if an empty directory can be imported, then it is Python code in a way we did not anticipate.

Has it been six years since we said “please don’t import wheels”, even though it will work a lot of the time?

I think we should do the following:

Consume a version number: make this wheel format version 1.1.

A wheel may contain empty directories, encoded in the customary zipfile way as a zero-length entry whose name ends in /.

Empty directories must appear in RECORD. (No opinion on whether they appear after installation.)

Most controversial: zero-length files may have an empty string as their hash, or else a standard sha256=… hash of the empty string.
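To make the two options concrete: RECORD hashes are written as `sha256=` followed by the urlsafe base64 digest with trailing `=` padding removed. The sketch below (helper names are hypothetical) writes a RECORD row for an empty directory entry under each option. Putting 0 in the size column is an assumption, since the proposal does not say.

```python
import base64
import csv
import hashlib
import io


def record_hash(data: bytes) -> str:
    """RECORD hash field: 'sha256=' + urlsafe base64 digest with '=' padding stripped."""
    digest = hashlib.sha256(data).digest()
    return "sha256=" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def record_row(name: str, data: bytes, allow_empty_hash: bool) -> list:
    """RECORD row for a member; a zero-length member gets a blank hash
    when `allow_empty_hash` is set (the first proposed option)."""
    if not data and allow_empty_hash:
        hash_field = ""
    else:
        hash_field = record_hash(data)
    return [name, hash_field, str(len(data))]


out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
# The same empty directory entry under each of the two proposed options.
writer.writerow(record_row("demo/empty_dir/", b"", allow_empty_hash=True))
writer.writerow(record_row("demo/empty_dir/", b"", allow_empty_hash=False))
print(out.getvalue(), end="")
# demo/empty_dir/,,0
# demo/empty_dir/,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
```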

Recommend that implied directories (directories that already contain other entries) are not included as separate entries, simply because doing so is wasteful.
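A sketch of telling implied directory entries apart from genuinely empty ones, assuming member names use `/` separators and directory entries end in `/`. The function name is hypothetical. Under the recommendation above, only the entries it does not flag would be kept.

```python
from __future__ import annotations


def redundant_dir_entries(names: list[str]) -> list[str]:
    """Directory entries ('dir/') that some other member already implies.

    A directory is implied when any other entry's name starts with its path.
    """
    return [
        d
        for d in names
        if d.endswith("/") and any(n != d and n.startswith(d) for n in names)
    ]


names = ["pkg/", "pkg/mod.py", "pkg/empty/", "data/"]
implied = redundant_dir_entries(names)
kept = [n for n in names if n not in implied]
print(implied)  # ['pkg/']
print(kept)     # ['pkg/mod.py', 'pkg/empty/', 'data/']
```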

These rules preserve the purpose of RECORD and allow enscons-style RECORD generation, which makes RECORD by iterating over a zipfile without caring about the type of each entry.
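A minimal sketch of that style of RECORD generation, not enscons’ actual code. Every member is read and hashed the same way, so a `dir/` entry simply reads as zero bytes and gets the hash of the empty string (one of the two options proposed above). The function name and demo archive are hypothetical.

```python
import base64
import csv
import hashlib
import io
import zipfile


def record_from_zip(zf: zipfile.ZipFile, record_name: str) -> str:
    """Build RECORD text by hashing every member, files and 'dir/' entries alike."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for info in zf.infolist():
        if info.filename == record_name:
            continue  # RECORD is listed last, with empty hash and size
        data = zf.read(info)  # a directory entry reads as b""
        digest = base64.urlsafe_b64encode(hashlib.sha256(data).digest()).rstrip(b"=")
        writer.writerow([info.filename, "sha256=" + digest.decode("ascii"), len(data)])
    writer.writerow([record_name, "", ""])
    return out.getvalue()


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("demo/", b"")
    zf.writestr("demo/mod.py", b"X = 1\n")

with zipfile.ZipFile(buf) as zf:
    record = record_from_zip(zf, "demo-1.0.dist-info/RECORD")
print(record, end="")
```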

I’m not sure about the empty string rule, and am tempted to require the hash of the empty string, as we already do for an empty file. It will compress well.