Announcement: distlib 0.3.4 released on PyPI

Instead of namespace packages, you could make distlib a proxy package, splitting out the parts into separate distributions and have distlib import and expose the functions from those new packages in the same structure as before. Those components would still be able to depend on each other.
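A minimal sketch of the proxy idea, in a lazy variant that defers each backend import until first use (the laziness is an addition here, not part of the suggestion above). The helper `make_proxy`, the name `distlib_proxy_demo`, and the mapping are all hypothetical; standard-library modules stand in for the split-out distributions, which do not exist.

```python
import importlib
import types


def make_proxy(name, backends):
    """Build a module whose attributes are imported from other modules on first use."""
    proxy = types.ModuleType(name)

    def lazy_getattr(attr):
        # Called only when normal attribute lookup on the module fails (PEP 562).
        try:
            target = backends[attr]
        except KeyError:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}") from None
        module = importlib.import_module(target)
        setattr(proxy, attr, module)  # cache so later lookups skip this hook
        return module

    proxy.__getattr__ = lazy_getattr
    return proxy


# Assumption: stdlib modules stand in for the hypothetical split-out packages.
demo = make_proxy("distlib_proxy_demo", {"version": "json", "markers": "fnmatch"})
print(demo.version.__name__)  # the backend is imported only at this point
```

In a real package the same hook would live in the package's `__init__.py`; note that this sketch only covers attribute access, not `import distlib.version` style imports.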

What I’ve found is that the main issues are import time (important for a command-line app) and developer mental capacity (it’s easier to remember a smaller list of functions).


The current use of distlib may not necessitate any changes: it depends on its goals.

One thing I will strongly recommend is releasing a v1.0: it allows for a better indication of the separation of breaking changes, features and fixes. There’s nothing wrong with having v1.0 identical to v0.3.4.

Sure, but it’s not really clear whether there is actually a need to do this - as in, whether there is a real problem to be solved / what would be gained from doing that (given that it would make maintenance more onerous than it is currently).

I take it you mean in general, rather than distlib specifically? Just to see, I did a quick unscientific test to see what sort of figures we’d be talking about:

$ python -m timeit "import distlib"
2000000 loops, best of 5: 133 nsec per loop
$ python -m timeit "import packaging"
2000000 loops, best of 5: 137 nsec per loop
$ python -m timeit "import distlib.version"
1000000 loops, best of 5: 300 nsec per loop
$ python -m timeit "import packaging.version"
1000000 loops, best of 5: 308 nsec per loop
$ python -m timeit "import distlib.markers"
1000000 loops, best of 5: 295 nsec per loop
$ python -m timeit "import packaging.markers"
1000000 loops, best of 5: 301 nsec per loop

which would appear to show that the time for importing various distlib modules doesn’t seem much different to, say, packaging modules which cover the same areas.

In terms of developer mental capacity, I’d think the amount/quality of documentation is probably more important than just the length of the list of functions, in terms of assessing the cognitive load on potential users. I generally use search to find my way round an unfamiliar API (the search terms being chosen around what I want to achieve).

You may be right about that. Generally I’m conservative with version numbering, giving plenty of headroom for feedback and changes before an API is stable enough that it likely won’t see breaking changes any time soon. But since there has been so little feedback on distlib over the years, the API hasn’t really changed much (I always try hard to maintain backward compatibility, even with low point versions). So it might as well be version 1.0.0 as 0.3.4, I suppose, even though that seems more about the optics - somewhat akin to pricing something at $9.99 rather than $10.00 :wink:


This is not a good way of measuring import time. In general timeit should not be used for things that are cached. To time imports you need to run in a completely new process:

$ cat t.py
import numpy
$ time python t.py

real	0m0.238s
user	0m0.268s
sys	0m0.168s
$ time python t.py

real	0m0.229s
user	0m0.232s
sys	0m0.192s
$ python -m timeit 'import numpy'
2000000 loops, best of 5: 127 nsec per loop

$ ipython
Python 3.8.9 (default, Apr  3 2021, 01:02:10) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.29.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: %time import numpy
CPU times: user 192 ms, sys: 272 ms, total: 464 ms
Wall time: 138 ms

In [2]: %time import numpy
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 18.6 µs

The problem here is that timeit runs your statement repeatedly in a loop and then reports the minimum time observed. Every import after the first just retrieves the module from sys.modules. Even running repeatedly in a fresh process is tricky, given that the OS probably caches the files that import looks at, so running after a full reboot of your whole computer might show a significantly slower time if there are many files to be loaded.
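As a rough sketch of the fresh-process approach, the following times a statement in a new interpreter on every run, so sys.modules starts empty each time. The helper name `time_fresh` is made up for illustration, and the OS file-cache caveat still applies: only the first run after a reboot would be a truly cold measurement.

```python
import subprocess
import sys
import time


def time_fresh(statement, repeats=5):
    """Run `statement` in a brand-new interpreter `repeats` times; return wall times in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # A new process per run means the module is never already in sys.modules.
        subprocess.run([sys.executable, "-c", statement], check=True)
        timings.append(time.perf_counter() - start)
    return timings


# Interpreter startup dominates small imports, so measure it as a baseline.
baseline = min(time_fresh("pass"))
with_import = min(time_fresh("import json"))
print(f"startup: {baseline:.3f}s, startup + import json: {with_import:.3f}s")
```

Replacing `"import json"` with `"import distlib"` or `"import packaging"` gives a like-for-like comparison. For a per-module breakdown, `python -X importtime -c "import json"` prints the time spent in each import to stderr.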

In any case though the import time still needs to be considered in the context of the likely time needed for whatever operation happens after the import.


Ah no, you’re right - I was lazy about that, and covered it by sticking in “unscientific” as a caveat. But I don’t know that distlib is particularly problematic in terms of import time, as that doesn’t appear to have been an issue when used in the wild. And, as you say, the import time has to be taken in the context of the entire operation being carried out, rather than in isolation. Sorry for the noise!

I don’t know if there’s much point in worrying about the history at this point, it seems unlikely that people are going to switch from packaging if it’s doing the job. But on the assumption that feedback is still useful anyway, and there’s value in having an alternative for new users that might better suit their needs, the following points are what I personally find relevant.

It’s not immediately obvious how well distlib tracks active standards. As a check, I took a quick look at the distlib.metadata API, and it claims to support metadata 2.0 (PEP 426), which was withdrawn. If I were writing a tool that read metadata, I’d be concerned about using that API. Less so if it was an ad hoc tool, very much so if it was something like pip, which needs to be careful about what standards it supports. Overall, distlib feels like it hasn’t kept up with recent standards, and in addition still has “experimental” stuff that never actually got standardised.

There are some surprising failures, which suggest that what’s going on in distlib isn’t what you’d expect. For example:

>>> from distlib.locators import locate
>>> osn = locate('oldest-supported-numpy')
>>> osn is None
True

oldest-supported-numpy is a real package on PyPI so it’s not at all obvious why the distlib API doesn’t pick it up. Presumably there’s some assumption that distlib is making that this package doesn’t conform to, but I’d certainly expect a standards-conforming library to be able to locate it.

There was (I don’t know if it still exists) a dependency on a red-dove website for package metadata. I can’t find anything in the documentation which mentions this, so I’m left with a nagging concern that I don’t know if what I’m using is standalone, or depends on that or another external resource. It may be, for example, that the above oldest-supported-numpy example fails because the red-dove database doesn’t contain that package. But how would I know?

On the subject of splitting up distlib, the one piece I would love to see separated is the script API. It’s standalone, well-tested, and many tools would benefit from being able to access it, while having no need for other parts of distlib (whether that’s because they use packaging, or simply because they don’t need any other packaging features).

I also personally find the packaging documentation nicer to read than distlib’s. But that’s a matter of opinion/preference, and not something I think you can avoid. Some people will always prefer one style over another.

Psst, installer.scripts - installer

installer uses @vsajip’s excellent simple_launchers repository (which is also what distlib uses IIUC) to generate the scripts, and makes them available via installer.scripts, with a different API compared to distlib.

The tracking of standards by me has not been particularly active of late - when I had the time, I spent a fair bit of it tracking PEP 426, which was actively being discussed at the time. But the amount of time I personally had to do this has decreased, and that’s why (as I said earlier) I tend to be more reactive than proactive - it’s been based on the level of engagement by others. I did make updates to support Metadata 2.1 (PEP 566), but perhaps haven’t kept the documentation up to date and removed references to withdrawn PEPs. I would expect to look at PEPs when they’re finalised rather than when they are in draft (that’s the case with e.g. PEP 639).

But even PEPs that are final don’t always seem to be conformed to by other packaging software. For example, the current Metadata spec 2.1 (PEP 566) says about the Description field:

In addition to the Description header field, the distribution’s description may instead be provided in the message body (i.e., after a completely blank line following the headers, with no indentation or other special formatting necessary).

However, if you upload a wheel to PyPI which has the description in the Description: header rather than in the message body, PyPI doesn’t pick it up and wrongly displays “The author of this package has not provided a project description”. It doesn’t say in PEP 566 that the Description: header is deprecated, just that an in-the-body description is allowed as an alternative. Is there something in the spec or some addendum I’ve missed about that header being deprecated, and only in-the-body descriptions being accepted now?
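To make the two allowed placements concrete, here is a minimal sketch that reads the description from either the Description: header or the message body, using only the standard library’s email parser. The helper `read_description` is hypothetical, and preferring the header when both are present is an assumption of this sketch, not something the spec text quoted above prescribes.

```python
from email import message_from_string


def read_description(metadata_text):
    """Return the description from a core-metadata document, or None if absent."""
    msg = message_from_string(metadata_text)
    header = msg["Description"]  # None when the header is missing
    if header is not None and header.strip():
        # Assumption: the header wins if both header and body are present.
        return header.strip()
    body = msg.get_payload()  # text after the first completely blank line
    return body.strip() or None


header_style = (
    "Metadata-Version: 1.1\n"
    "Name: distlib\n"
    "Version: 0.3.1\n"
    "Description: Low-level components\n"
)
body_style = (
    "Metadata-Version: 2.1\n"
    "Name: demo\n"
    "Version: 1.0\n"
    "\n"
    "Long description in the body.\n"
)
print(read_description(header_style))
print(read_description(body_style))
```

A reader that only looks at the body would return nothing for the first document, which is the kind of mismatch described above.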

I’ll certainly look into it! Thanks for letting me know.

Minor point, 2.2 is the latest version and Core metadata specifications - Python Packaging User Guide is the canonical location of the spec these days.

Metadata is an area where there isn’t yet a library tracking the standards, so everyone writes their own implementation, which is unfortunate, as it means that everyone has their own set of bugs :slightly_frowning_face:

I think what people want most is implementation of agreed standards, so tracking PEP 426 while it was in development may have been a mistake, in hindsight. I didn’t check the code itself; I basically just did what I would expect to do if I were looking for a metadata parsing library: I read the docs and took a view from there, which was “doesn’t look like it’s up to date if it’s accepting 2.0”. But I completely understand the point about lack of time/motivation to be proactive - I don’t intend my comment to be a complaint, merely a suggestion about why people might not be using distlib.

Of course there will be the inevitable cries of “it should go in packaging because that’s where everything else is”. I can understand that (why depend on two libraries with such a significant amount of overlap?), but I can also completely understand why you’d feel less motivation to maintain that API in distlib if that’s the response you get…

This sounds like a bug in either the tool that built the wheel or the tool that uploaded the wheel to PyPI. PyPI does zero introspection of the file contents and relies on the uploader to provide the right form fields when uploading the distribution file, and there’s only one description field, regardless of where it came from.

But PEP 639 (Metadata 2.2) has a status of Draft, whereas PEP 566 (Metadata 2.1) has a status of Final. This is why I cited 2.1 as “current”. In terms of the Description field, there’s (from my reading) no change between the 2.1 and 2.2 specifications. I expect I will look at PEP 639 support once it reaches Final status. As you said, tracking Draft PEPs might be a mistake.

I wouldn’t be too quick to kick this particular can down the road. Either the metadata uploaded to PyPI conforms to the spec, or it doesn’t. If it doesn’t, it’s fair enough to blame the generating tool. (The uploading tool was twine, I think.) An example release which exhibits the problem is distlib 0.3.1. There are two uploaded files for this - a .zip and a wheel. Here’s the METADATA from the wheel (elided most other headers for brevity):

Metadata-Version: 1.1
Name: distlib
Version: 0.3.1
Summary: Distribution utilities
Description: Low-level components of distutils2/packaging, augmented with higher-level APIs for making packaging easier.
Home-page: https://bitbucket.org/pypa/distlib

Here’s the PKG-INFO from the .zip, similarly elided:

Metadata-Version: 1.1
Name: distlib
Version: 0.3.1
Summary: Distribution utilities
Description: Low-level components of distutils2/packaging, augmented with higher-level APIs for making packaging easier.

In both cases, the Description: header contains what looks to me like a valid description. According to the spec (at the time), this was valid metadata, so it should be reflected in PyPI’s UI for the release. I reported this as an issue to Warehouse in June 2020 - getting on for 18 months ago, and from what I can tell it’s still broken on PyPI, at least as of a couple of days ago. (Unless you think the above metadata is invalid, and if that’s so, how?) You [Dustin] quickly identified the issue as being in pkginfo, which is presumably a library that Warehouse uses internally. You closed the Warehouse issue right away, so I raised it as an issue with pkginfo the same day. They accepted it as a bug the next day, but it’s taken until 18 November 2021 for the fix to be released. I don’t blame you for forgetting, Dustin, it seems like a lifetime ago :wink:
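For anyone wanting to check what an archive itself says, independently of what the upload tool sent, this is a small sketch that pulls the METADATA file out of a wheel with the standard library alone. The helper `wheel_metadata` is hypothetical, and the in-memory archive below is a stand-in built for the demo, not a real wheel.

```python
import io
import zipfile
from email import message_from_bytes


def wheel_metadata(wheel):
    """Return the parsed METADATA file from a wheel (path or binary file object)."""
    with zipfile.ZipFile(wheel) as zf:
        # A wheel keeps its metadata in <name>-<version>.dist-info/METADATA.
        candidates = [
            name for name in zf.namelist()
            if name.count("/") == 1 and name.endswith(".dist-info/METADATA")
        ]
        if len(candidates) != 1:
            raise ValueError(f"expected one METADATA file, found {len(candidates)}")
        return message_from_bytes(zf.read(candidates[0]))


# Assumption: a tiny in-memory zip stands in for a real wheel file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "demo-1.0.dist-info/METADATA",
        "Metadata-Version: 1.1\nName: demo\nVersion: 1.0\nDescription: A demo.\n",
    )
buf.seek(0)

meta = wheel_metadata(buf)
print(meta["Name"], meta["Version"], meta["Description"])
```

Passing the path of a downloaded wheel instead of `buf` shows whether the Description: header is present in the file, regardless of what the project page displays.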

I know. Bugs happen, these are volunteer projects, things take their time because people have other priorities. If it seems like I’m belabouring this, it’s because Paul made a comment about a bug in distlib and from that seemed to suggest that distlib was somehow suspect (he didn’t use that word) because it wasn’t tracking PEP specs. Isn’t this a case of Warehouse doing the same thing, for an even more basic bit of functionality? It’s surely secondary if this was happening in a dependency library rather than Warehouse code itself, as it’s PyPI’s UI that is misleading.

Would that it were so simple. I think part of the reason why distlib hasn’t seen much adoption is because these things are determined more (than some might think) by social capital than other things such as technical reasons. In other words, the preferences of those who are most closely associated with key tools like Warehouse, pip and so on play a large part in determining what other tool developers do. If one of the people with higher social capital than me starts a project which overlaps distlib functionality and promotes it, what are most people going to go for? They’re not going to spend time evaluating anything. In this very thread, which was started to announce a distlib release, Pradyun is pointing people to his own alternative to some distlib functionality - a part of distlib which you acknowledged was useful. (Of course he is free to do that, but I can’t say I find that to be in the height of taste.)

I maintain distlib for its users, even if they are few compared to packaging. But it is somewhat disheartening when the rhetoric of a standards-based approach to packaging is belied by the reality of how it actually seems to be, at least some of the time. Even that minor comment by you about a personal preference for packaging’s documentation over distlib’s will send a signal to some about which they should prefer (without them needing to look into either package in any depth), even though I’m pretty sure you didn’t intend that.

I thought this sounded familiar. :slightly_smiling_face:

No, this is a library that twine uses to extract the metadata from the distribution file prior to upload. There’s nothing PyPI/Warehouse can do to fix this, hence why I closed the issue.

PEP 639 isn’t Metadata 2.2. PEP 566 made packaging.python.org the canonical location, and the spec there notes 2.2 is valid, but it was added by PEP 643[1], not PEP 639 (which needs updating to say that it introduces metadata version 2.3).

There is no change in Description between 2.1 and 2.2, so you’re right on that point.

My point here was simply that distlib’s documentation states the specs that it follows, and at least one of those is not an accepted standard (worse, it’s withdrawn). If I read the docs, distlib doesn’t meet my needs (I want something that follows accepted/final standards). If I disregard the docs, I have no means of knowing if distlib does what I want.

I don’t want to labour this point, either. I completely understand that keeping docs up to date isn’t an easy or rewarding task. All I’m trying to do is explain why I, personally, would be cautious about choosing distlib if I wanted to parse metadata. And I’m only doing that because you seemed interested in feedback on why people don’t find distlib appealing.


  1. Which is marked as “Accepted”, not “Final”. My bad, I’ll submit a PR to make it final. The distinction is unclear for packaging standards…

Oh, I see - sorry. I hadn’t realised I should have logged the issue with twine as well, as the comments on the Warehouse issue weren’t clear to me.

You can’t fix the bug in twine, sure, but you (technically) could update the PyPI description with the values in metadata in uploaded archives. I guess you don’t mean technically, and more as a matter of policy - you choose not to look at metadata in uploaded archives.

OK, but it does say it is 2.2 in the title, and also says that it replaces PEP 566, which is why I drew the conclusion I did. Whereas the Core metadata specification does say

Fields defined in the following specification should be considered valid, complete and not subject to change.

I couldn’t tell for sure that it was definitively the latest version. At least the PEPs can indicate if they’re superseded by or superseding another one.

Right, but it seems a bit of a pot-and-kettle situation if the PEPs themselves (well, 639) are misleading as to what they are actually documenting, and surely it’s more important to get those right than some reference to a withdrawn PEP in a little-used library? Aren’t you applying different standards to the two cases? I have some difficulty believing you would give distlib serious consideration for parsing metadata, even if I were to overhaul the documentation to remove references to PEP 426 and ensure that 2.2 metadata was complied with :slightly_smiling_face:

I think we’re getting to the point of diminishing returns here (if we haven’t already gone way beyond that :wink:). I agree that the state of things regarding standards is a bit of a mess, at least to people who haven’t been closely involved with the relevant standards.

Someone really should tidy everything up so that we properly conform to the documented process. That would likely also mean publicising that there is a documented process, and ensuring that PEPs are written in a way that reflects that process. I’m not going to be that “someone”, though, so there’s not much more I can say here.

Maybe. It’s quite possible that I’m too close to the standards docs to see their flaws. If so, then I apologise.

As far as using distlib is concerned, I honestly don’t know at this point, but you’re probably right. I was an enthusiastic early adopter of distlib, but nowadays I tend to look at the newer libraries. I’m frustrated that there’s no metadata parser in packaging, but would that be enough to add distlib as a dependency for my projects? In all honesty, probably not.

I guess I’d have to agree with you, that it’s not clear where distlib fits in the current packaging landscape (except for the fact that as an alternative implementation of the standards, it demonstrates the important point that it is feasible to have multiple implementations - something I’m very keen on).

But if almost no-one uses it, it’s just an alternative in theory only, which smacks of tokenism. The “blessed” tools and libraries just keep on getting used, and even if they don’t conform to the standards, no one will be really interested in any alternatives. So what price the standards then? We’d be back to the particular-implementations-as-de-facto-standards, which is where we started.

But hey ho, it’s not a technical problem - it’s a social problem. As you say, diminishing returns, so let’s leave it there. Thanks for the feedback.


Sorry I was absent; had to finish up a big NASA project by a tight deadline. Seems like I stirred a bit of a hornet’s nest, sorry, but it appears this discussion has come somewhat full circle. I did want to note that in response to:

For what it’s worth, that someone actually happens to be me, thanks to @pf_moore’s guidance; the complete update, revamp and addition of a lot of documentation ended up stretching over many months, but I’ve had the PR up for a few weeks now and it has attracted considerable feedback and interest, and I’m now officially a co-author and de facto champion of it since the original author has been very busy with other projects. As a packaging tool maintainer, your feedback is always welcome over there. Thanks!
