Why isn't source distribution metadata trustworthy? Can we make it so?

One thing I have no feel for at all is how significant a proportion of projects fall into this category. Are we talking about 50% of downloads from PyPI? Or 10%? Or 1%? Are download counts important, or would some other measure better capture “importance” here?

For me, this is a classic 80-20 style of problem: if we could benefit 80% of the cases, I’d be happy with that. The added wrinkle, though, is that we have no real idea where to draw the line between the 80 and the 20. So we too often end up paralysed, unable to make progress because we can’t judge the importance of the use cases we’re considering.

Not entirely true: there are people who install with --no-binary, for example, as well as people on platforms where wheels aren’t available (am I right that Docker images that use musl don’t have wheels?). Again, a better understanding of use cases would help here.

1 Like

If we’re going this far, might as well start doing “static evaluation” of setup.py to check if there’s anything dynamic happening in setup.py – @techalchemy had something for this if I remember correctly.

I do indeed. It’s very poorly implemented, though. It’s probably imperfect, but it is definitely possible to traverse the AST for this information. It gets tricky because sometimes people import setup under an alias, e.g. from setuptools import setup as do_stuff (I’ve seen approximations of this), and on one occasion I even saw someone rely on a directory-local import of their own code which in turn imported setuptools.setup (from .mymodule import my_version_of_setup, which then called setuptools.setup). I do not believe my code handles that case :slight_smile:
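
For the simple cases the traversal is straightforward. A minimal sketch of the idea (it only handles setup imported directly from setuptools, possibly under an alias; the local-reimport case above is deliberately out of scope):

import ast

def literal_install_requires(source):
    """Return install_requires if it is passed to setup() as a plain
    literal, or None if it is dynamic or no setup() call is found."""
    tree = ast.parse(source)
    # Collect the names that `setup` has been imported under, aliases included.
    setup_names = {
        alias.asname or alias.name
        for node in ast.walk(tree)
        if isinstance(node, ast.ImportFrom) and node.module == "setuptools"
        for alias in node.names
        if alias.name == "setup"
    }
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in setup_names):
            for kw in node.keywords:
                if kw.arg == "install_requires":
                    try:
                        return ast.literal_eval(kw.value)
                    except ValueError:
                        return None  # an expression, not a literal
    return None

Anything that survives ast.literal_eval here is static by definition; everything else falls back to “we don’t know”.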

This is an interesting conversation, and one piece of information that may be relevant: I was recently at a packaging summit hosted by Microsoft with folks from npm, Go, Java (Maven/Gradle), OCI, NuGet and a few others, and the overarching theme was enforcement – putting tools in front of the upload process, whether they strictly enforce or merely encourage the desired behaviours (someone from GitHub suggested that they could fail a check if a project’s wheels lacked metadata after a build). Rather than trusting that the user supplied good metadata, there was a lot of interest in actually validating, or where possible generating, the metadata at the index.

This is obviously super nuanced and I’m hand-waving away tons of complexity, but I think we are relatively smart and can probably get a basic solution working. It’s in keeping with what we discussed at PyCon last year, and it ultimately all comes down to metadata.

As of last month I accepted a partly sponsored role with Canonical, and I’ll be spending a chunk of my time on packaging-related work, so I’ll be glad to catch up on these. I believe @pradyunsg and I were supposed to draft a PEP related to extras based on some of the work @njs had done as an outcome of PyCon last year, but due to a lot of factors I hadn’t had any time for open source work. Now that I have time I’d be glad to pick that back up (I’m sure it’s discussed on Discourse somewhere).

To the original question, I’d suggest caution around making adjustments to metadata PEPs ahead of the resolver work in pip unless we are prepared to tackle the full extent of the issue surrounding our current metadata representations (see the previous paragraph about extras). If that’s something we are willing to tackle head on, I think it does make sense to do that first, however.

Sorry for the many words but hopefully that was mostly on-topic and clear.

I’m going to reiterate a point I’ve made elsewhere on this, though. How far should we go to support such usages? What requirements drive the use of such unusual approaches for the projects using them, and are those requirements sufficiently compelling to justify the significant amount of extra work required of the packaging tool community to cater for those usages?

I strongly believe we should avoid getting trapped in a mindset that says that we have to support absolutely every usage of setuptools imaginable, across all packaging tools. If the requirement for a particular project is strong enough, “use an older version of pip” is an option - and if the cost to the project of doing that is too high, then maybe the cost of supporting that usage in the packaging tools should also be considered too high.

We have to be cautious here - breaking backward compatibility should never be something we do lightly - but we should have the option available when it’s needed.

1 Like

Yea. I’m going to say that a system that covers the basic case – a literal defined in setup.py – should be considered canonical.

Anything else we can tackle if we see the need to.

Completely agreed

Without getting too far into the weeds: we are basically on the same page. Ultimately (and I realize this position may still be controversial) I think we need to move as far away from executable package manifests as possible. I.e. define metadata in one place and, if needed, build extensions in another. As long as we are stuck asking the question “do we need to write an AST parser for reading install_requires information, or should I run python setup.py egg_info and parse the resultant metadata?” we are going to keep building these overly complicated workarounds just to get basic metadata.
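
For reference, the “run it and parse the result” fallback looks roughly like this (a sketch, assuming a setuptools project in the current directory and that the .egg-info directory lands in the source root):

import glob
import subprocess
import sys

# Execute setup.py so setuptools writes its metadata files to disk...
subprocess.run([sys.executable, "setup.py", "egg_info"], check=True)

# ...then parse whatever came out (PKG-INFO, requires.txt, etc.).
[egg_info_dir] = glob.glob("*.egg-info")
with open(f"{egg_info_dir}/PKG-INFO") as f:
    print(f.read())

Everything here depends on executing arbitrary project code, which is exactly the problem being described.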

So what would we need to do in order to make that happen? What are the major barriers?

I’m saying setuptools should do these shenanigans to determine whether the metadata from setup.py is “stable”. A field added to the metadata specification for declaring whether an sdist’s dependency data is “stable” would be good to have too.

1 Like

I personally would not ban dynamic metadata outright… but would introduce a field into our metadata where tools/people can declare whether a given piece of metadata is dynamic or not. When dynamic metadata is needed we can call prepare_metadata_for_build_wheel. I would impose, though, that prepare_metadata_for_build_wheel must be stable across subsequent calls… that is, calling it on the same machine twice, one after another, should give the same metadata.
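
A rough sketch of what such a stability check could look like, using the pep517 helper library (the API here is an assumption based on its documentation, and “stable” is reduced to “two METADATA files compare equal”):

import filecmp
import os
import tempfile

from pep517.wrappers import Pep517HookCaller

def metadata_is_stable(source_dir, backend="setuptools.build_meta"):
    hook = Pep517HookCaller(source_dir, build_backend=backend)
    with tempfile.TemporaryDirectory() as d1, tempfile.TemporaryDirectory() as d2:
        # Each call returns the name of the generated .dist-info directory.
        name1 = hook.prepare_metadata_for_build_wheel(d1)
        name2 = hook.prepare_metadata_for_build_wheel(d2)
        return filecmp.cmp(os.path.join(d1, name1, "METADATA"),
                           os.path.join(d2, name2, "METADATA"),
                           shallow=False)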

Agreed, it’s sometimes necessary. But probably only rarely. So, coming round full circle on this, what’s wrong with the following proposal:

  1. Tools that create sdists (setuptools, flit, etc.) work out how they can tell if a given metadata item is “static”. That doesn’t need any standardisation; it’s just a question of their UI. For example, flit can probably say “everything is”, and setuptools can maybe say “everything from setup.cfg is, as long as it’s not then modified via setup.py”. Worst case, tools could ask the user to say.
  2. We add a way for that information to be recorded in the sdist metadata. @chrahunt’s original suggestion of Metadata-Covers: X, Y, Z seems reasonable, and we can bikeshed as much as we want (or can endure :wink:).
  3. Tools like pip start to rely on the data that’s marked as reliable.

Step (2) is the only one that needs standardising. But all the benefits come from steps (1) and (3), so if the only blocker on achieving this is to agree on (2), then let’s focus on that. Is there anything wrong with the suggestion of Metadata-Covers: <list of items, or "all">? Does anyone have any better name than Metadata-Covers?
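
To make this concrete, an sdist’s PKG-INFO might then look something like this (Metadata-Covers is of course still a hypothetical field, and the project details are invented):

Metadata-Version: 2.1
Name: example
Version: 1.0
Requires-Python: >=3.6
Requires-Dist: Dep1
Requires-Dist: importlib-metadata; python_version < "3.7"
Metadata-Covers: Requires-Python, Requires-Dist

A consumer like pip could then trust exactly the fields listed, and fall back to calling the build backend for anything else.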

There is of course the other matter that the sdist format and the existence of PKG-INFO are not yet standardised. So there’s no standard for the decision in (2) to amend at the moment. We could either make standardising sdists a prerequisite for this discussion, or we could get something agreed, put it in place as implementation-defined behaviour for now, and tackle standardising sdists separately. I’m inclined to do the latter (says the interop standards BDFL-delegate :slightly_smiling_face:), because it allows us to make faster progress.

Thoughts? Is this too simplistic?

1 Like

I’m definitely on the side of someone first implementing a POC before making it a proposal. Then we just iterate on that POC to cover edge cases, and standardise that.

Can you clarify in what circumstances it might be necessary? Are there certain fields where it’s going to be necessary?

I suspect that most cases of dynamic metadata are people who don’t know better - they have environment-specific dependencies or something like that and don’t know about environment markers.

The only situation I can think of where someone might legitimately want dynamic metadata would be if there’s a very specific environment that we don’t have any environment marker for and someone needs to do a workaround. Even in that case, we could almost certainly use a “conditional dependencies” mechanism that says, “Here are all the dependencies that definitely will be installed, here are some that depend on install-time conditions.” If that is necessary, it seems better than “metadata can be anything”.

Sorry, terminology may be getting confused here. And I shouldn’t have casually made my comment on the back of a comment about “dynamic metadata”, I should have been clearer what I meant.

I’m referring specifically to “if you take a sdist and look at its metadata, and then build a wheel from that sdist, you get different metadata”. That’s not (necessarily) because the metadata is calculated dynamically, but it’s effectively the same to pip - we can’t rely on the sdist metadata. This apparently does happen - see the initial post from @chrahunt, which mentions that it’s a problem with requests.

I don’t actually care whether we support dynamic metadata (even assuming there are use cases that need it - I don’t know if @bernatgabor had anything specific in mind). What I care about is sdist metadata being reliably[1] the same as what we’d get by building a wheel, so that when we have a sdist in pip, we can use that data in the early (resolver) stages, and skip a call to the build backend just to get metadata that in theory we already have.

My impression was that the reason we couldn’t have that was because the metadata is calculated dynamically, and we can’t be sure it will still be the same at wheel-build time. But now that I check, I don’t see anything in the requests code that explains why the sdist doesn’t include Requires-Dist - is that actually just a bug (in setuptools or distutils or somewhere) that could be fixed?

Maybe step 1 needs to be for someone to understand and clarify why the metadata in sdists is incomplete/unreliable?

[1] And by “reliably” I mean “we have a means to verify which values it’s OK to rely on”, not that it always has to be 100% accurate.

This is because the input isn’t reliably deterministic. Consider the extreme example from Dustin’s blog post on this:

from setuptools import setup
import random

setup(
  name="paradox",
  version="0.0.1",
  description="A non-deterministic package",
  install_requires=[random.choice(["Dep1", "Dep2"])]
)

The much more common scenario is one where the dependencies are generated based on the platform that’s building from the sdist; this use case has been superseded by environment markers (which most people don’t know about):

import sys

from setuptools import setup

install_requires = ["Dep1"]
if sys.version_info < (3, 7):
    install_requires.append("importlib-metadata")

setup(
    ...,  # name, version, etc. elided
    install_requires=install_requires,
)

By the time it gets to setuptools, it’s just a list, and we don’t know whether it was generated dynamically or not. If the dependencies are specified in setup.cfg, we know they are reliable, and there’s an open issue to fix this. As others in the thread have mentioned, we can almost certainly parse setup.py with an AST and, in many basic cases, determine whether the dependencies are deterministic.
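
For comparison, the environment-marker equivalent keeps the declared list itself static (a minimal sketch; names as in the snippet above):

from setuptools import setup

setup(
    ...,  # name, version, etc. elided
    install_requires=[
        "Dep1",
        # The marker is evaluated by the installer at install time,
        # so the requirement list written to the metadata never changes.
        'importlib-metadata; python_version < "3.7"',
    ],
)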

Most of the options for “banning dynamic metadata” are not great and have the potential to break stuff that would probably already just work in most scenarios, but if we decided the cost was worth paying, I’m curious to know if we would be stymied because there are legitimate use cases that we won’t be able to support in deterministic metadata implementations in a realistic time frame.

I’m also curious to know whether this is just install_requires or whether there are other places where the metadata is being set “dynamically”. The one use case I know of / have for that is that dateutil does a search-and-replace in README.rst during the build, because PyPI doesn’t support .. doctest::. It’s still deterministic, but it would be difficult to detect that it’s deterministic through heuristics.
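
Paraphrasing, that build step amounts to something like this (illustrative only, not dateutil’s actual code; the replacement directive is invented):

# Rewrite directives PyPI can't render before using the README as the
# long description. Deterministic, but hard for a heuristic to prove so.
with open("README.rst") as f:
    long_description = f.read().replace(".. doctest::", ".. code-block:: python3")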

To tack on to the nomenclature confusion, when I hear static vs dynamic my brain keeps trying to put the version issue into there thanks to e.g. setuptools_scm calculating the version “dynamically” when setup() runs (same goes for people who use open() to paste in their long description).

But I think the key thing being asked about is “static” versus “dynamically environment-dependent”, to differentiate from the “statically environment-dependent” case that markers support.

And the only other things I can think of along these lines are file inclusion and maybe entry points (and this is a guess; I have no real-world examples to back it up).

Well, in the case of requests, it actually is deterministic (in the sense of “does not depend on any external factor”), but I take your point that it’s not possible to verify that if the data is generated via setup.py.

I’ve certainly been guilty of using the terms “static” and “dynamic” sloppily. For me, the key point is “if a metadata value is specified in the sdist, and I build a wheel from that sdist, can I be sure that the metadata value in the wheel will be the same as the one from the sdist?” I don’t have a good term for that property, to be honest.

I did some experiments to verify what’s going on here, building a sdist and a wheel for requests. It looks like setuptools simply doesn’t include all the metadata in the sdist. I assume, based on what you’re saying, that this is actually a deliberate decision by setuptools: if it can’t be sure the data is going to be the same as the wheel’s, it omits it? Although I’m not clear in that case why you feel comfortable including the Requires-Python metadata, which can surely differ between the sdist and the wheel for exactly the same reason?

For pip’s use case, which is what triggered @chrahunt’s original post here, it seems like we need three things:

  1. The implementation of this feature request that you mentioned above.
  2. Some way for pip to know whether the lack of Requires-Dist (and Requires-Python, and maybe others) in the sdist metadata means “there are no dependencies” or “you need to call the build backend to get this data”. At the moment, both of these are signalled by the metadata not being present in the sdist.
  3. An assurance that any metadata values that are present in the sdist will be the same in the wheel built from that sdist. That assurance could (at least as far as I’m concerned) simply be in the form of a statement that “consumers are allowed to assume that if a metadata item is in the sdist, then it will be the same in the wheel”, making projects that violate this rule unsupported. Then the problem boils down to how the user and the build tool agree what can be included in the sdist.

Also, other tools that generate sdists need to follow the same rules, so they need to be written up as an interop standard - but that’s a bit of bureaucracy that can be done once we have a consensus.


setuptools actually stores sdists’ Requires-Dist metadata in $PROJECT.egg-info/requires.txt. I suspect this is due to historical reasons: Requires-Dist was only added to the metadata standard by PEP 345 (corresponding to Python 2.5), which (I believe) postdates setuptools’ support for install_requires. Between those two points in time, setuptools couldn’t store requirements in PKG-INFO as the field wasn’t supported there, so it used its own metadata file, and apparently nobody ever bothered to change it afterwards. Support for Requires-Python, on the other hand, was (if I remember correctly) added after the corresponding PEP came out.
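
For illustration, requires.txt looks roughly like this, with extras and environment markers encoded as section headers (entries invented):

Dep1
Dep2>=2.0

[security]
pyOpenSSL>=0.14

[:sys_platform == "win32"]
colorama

None of this ends up in PKG-INFO, which is why the sdist’s standard metadata appears to have no Requires-Dist at all.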

1 Like

TBH it’s probably true of everything in the metadata file; it’s just that I’ve never heard of anyone setting platform-dependent values for anything other than requirements, so from a practical point of view it’s just something we don’t have to worry about.

I think @jwodder is likely correct as to why Requires-Dist is treated differently, though that may be just a stroke of good fortune since it would be fairly common for the Requires-Dist information in an sdist to be inaccurate for a given platform.

I think this is one of the options for banning this “dynamic” metadata (I’ll keep using this term until we come up with something better, I guess), but it’s not really going to prevent people from continuing to generate “broken” metadata in this way. People will open tickets in pip or whatever project saying, “Such and such project has the wrong dependencies according to X command”, and then you’ll close the ticket with, “X should be doing the right thing”, and maybe X will hear about it and complain, “How the hell was I supposed to know this?” I doubt it’ll move the needle on the status quo.

I think we can come up with a transition plan to move people away from “bad metadata” and on to “good metadata”, but I think it’ll take a decent number of developer-hours and might have to encompass more than just the Requires-Dist part. Maybe we can say, “OK, we’ll drop support for the legacy system even before we get our act together and start moving people away from it, since the things we’re dropping support for are all new features blocked on this anyway”, but there are already more than a few things in packaging where the old way is deprecated and the new way is not ready yet :frowning:. It doesn’t help our reputation to add another one of those things.

1 Like

I think the required PKG-INFO statement in PEP 517 may be sufficient:

A .tar.gz source distribution (sdist) contains a single top-level directory called {name}-{version} (e.g. foo-1.0), containing the source files of the package. This directory must also contain the pyproject.toml from the build directory, and a PKG-INFO file containing metadata in the format described in PEP 345. Although historically zip files have also been used as sdists, this hook should produce a gzipped tarball. This is already the more common format for sdists, and having a consistent format makes for simpler tooling.

1 Like

“Non-deterministic” is what I’ve used for things like that.


Overall, I think adding a field to indicate this is an approach that makes a lot of sense.

Works for me as that can encompass externally-influenced-at-build-time.

Probably not, as long as build tools support executable code for gathering metadata, which can’t really be controlled for.

Could pip do a comparison after a wheel build and raise a warning stating that the metadata differs and that the user should contact the maintainers to make the metadata consistent?
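
Something along these lines inside the build path, perhaps (the helper and the parsed-metadata inputs are hypothetical, not actual pip internals):

import logging

logger = logging.getLogger(__name__)

def warn_on_metadata_drift(project, sdist_meta, wheel_meta,
                           fields=("Requires-Dist", "Requires-Python")):
    """`sdist_meta`/`wheel_meta` are parsed metadata, e.g.
    email.message.Message objects from PKG-INFO and METADATA."""
    drifted = [f for f in fields
               if sdist_meta.get_all(f) != wheel_meta.get_all(f)]
    if drifted:
        logger.warning(
            "Metadata for %s differs between the sdist and the built wheel "
            "(%s); consider reporting this to the project's maintainers.",
            project, ", ".join(drifted))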

I agree that having the end-to-end solution in place, before tools start raising flags to say something is out of compliance, is a good thing.