Why isn't source distribution metadata trustworthy? Can we make it so?

I would assume not, as there’s no already-completed “build” process for such requirements. So we have to assume such data is untrusted, as there’s been no chance to validate it. (Yes, we can say that projects should create a file xxx that contains metadata in this format, but without knowing that the file has been validated, tools can’t rely on it).

So the issue is actually that setuptools generates different metadata for wheels and for sdists. Without having looked into it in much depth, I would argue that this is a bug in the setuptools build backend. Indeed, it is a known issue: https://github.com/pypa/setuptools/issues/1716.

+1 on treating this as “just” a backend bug. The problem is that once lost, trust is hard to regain - how can pip (or any other front end) detect that the backend is trustworthy in this respect?

Requiring backends to add a metadata field that (in effect) says “I don’t have a bug” seems a bit silly (and worse still, it would have to be added to the metadata standard as a required field, to be of any use!) but I can’t think of a better alternative.

There will also be cases where the sdist simply doesn’t know all the metadata for the final wheel, because it varies depending on what happens during the build. So we could think of this proposed field as “I can promise that my wheel metadata is not dynamic and will match the sdist”, rather than just “I don’t have a bug”.

Also if we did this, I think the trick would have to be that if you set this flag, then pip and other build tools need to actually enforce it, by comparing the sdist and wheel metadata and erroring out if they don’t match. That’s the only way to make it actually trustworthy.
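To make the enforcement idea concrete, here is a minimal sketch of the comparison a front end could run after building a wheel. It assumes both PKG-INFO and the wheel's METADATA file have already been read into strings; the function name and the default list of fields are made up for illustration and are not anything pip does today.

```python
from email.parser import Parser


def metadata_mismatches(sdist_pkg_info, wheel_metadata,
                        fields=("Name", "Version", "Requires-Dist", "Requires-Python")):
    """Return {field: (sdist_values, wheel_values)} for every field that differs.

    Both arguments are the text of RFC 822 style metadata files. The choice of
    fields to compare is an assumption made for this sketch.
    """
    parser = Parser()
    sdist = parser.parsestr(sdist_pkg_info)
    wheel = parser.parsestr(wheel_metadata)
    mismatches = {}
    for field in fields:
        # Multi-use fields such as Requires-Dist may appear many times,
        # so compare the sorted list of all values.
        sdist_values = sorted(sdist.get_all(field, []))
        wheel_values = sorted(wheel.get_all(field, []))
        if sdist_values != wheel_values:
            mismatches[field] = (sdist_values, wheel_values)
    return mismatches
```

A front end that honoured the flag would error out whenever this returns a non-empty dict for an sdist that claimed its metadata was reliable.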

But setuptools will never be able to set this flag automatically, because setuptools has no idea whether any given setup.py has tricky dynamicity in it. Which means that this flag would have to be something that individual projects have to opt-in to. Which is fine for projects that have active and diligent maintainers. … But those projects mostly distribute wheels already, so this flag is unnecessary. The projects that need it are the ones that only distribute sdists. Some of those projects do have active maintainers that could potentially be convinced to add this flag. But I think to make a real dent in the missing-metadata problem, you’ll need to find something that works for the inactive-but-still-used projects, and an opt-in flag won’t help with those.

Yes, exactly.

I’m also on board for enforcing metadata consistency. That is similar to an issue I filed here.

There are other use cases that would make this worthwhile, specifically:

  • Users that pass --no-binary :all:, where we won’t consider remote wheels
  • Users on platforms that don’t have pre-built wheels

The primary focus for me is coming to a conclusion on whether we can make anything about this PKG-INFO useful, or rule it out entirely. An opt-in flag is the only way I see forward on that front. I agree it will not help inactive-but-used projects.

This is not entirely true. There is an open issue to populate Requires-Dist for sdists if-and-only-if install_requires is specified in setup.cfg and not in setup.py.

Of course, one can make the argument that a setup.py could do this:

...

if some_condition:
  kwargs['install_requires'] = ["something"]

setup(**kwargs)

Even if install_requires is specified in setup.cfg. Even if we’re forced to consider this a possibility and not auto-set the flag, we have other options of decreasing value, e.g. only set the flag if the setup.py is generated by setuptools itself, or use the ast module to parse setup.py and only set the flag if setup() is called with enumerated options and without install_requires.
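As a rough illustration of the ast option, the sketch below only answers "is there exactly one setup() call, with every option spelled out as a keyword, and without install_requires?". The function names are hypothetical, and it deliberately errs on the side of returning False (it does not follow aliases or helper modules), in the spirit of a zero-false-positive heuristic.

```python
import ast


def _is_setup(func):
    """True for a bare `setup(...)` or an attribute call like `setuptools.setup(...)`."""
    if isinstance(func, ast.Name):
        return func.id == "setup"
    if isinstance(func, ast.Attribute):
        return func.attr == "setup"
    return False


def setup_call_is_enumerated(source):
    """Conservative check: True only when we are sure install_requires is not set in setup.py."""
    tree = ast.parse(source)
    calls = [
        node for node in ast.walk(tree)
        if isinstance(node, ast.Call) and _is_setup(node.func)
    ]
    if len(calls) != 1:
        return False  # no call, or several: refuse to guess
    call = calls[0]
    if call.args:
        return False  # positional or *args arguments: not enumerated
    for keyword in call.keywords:
        if keyword.arg is None:
            return False  # **kwargs could hide install_requires
        if keyword.arg == "install_requires":
            return False
    return True
```

With this check, the `setup(**kwargs)` example above is rejected, while a plain `setup(name="x", version="1")` is accepted.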

In any case, it’s definitely true that it’s somewhat tricky, but if we combine some zero-false-positive heuristics with education and documentation about the use of declarative metadata, we may get to a world where for the most part, even source distributions have reliable dependency metadata in setuptools.

One thing that I have no feel at all for, is how significant a proportion of projects fall into this category. Are we talking about 50% of downloads from PyPI? Or 10%? Or 1%? Are download counts important, or would some other measure better capture “importance” here?

For me, this is a classic 80-20 style of problem: if we could benefit 80% of the cases, I’d be happy with that. The added wrinkle, though, is that we have no real idea where to draw the line between the 80 and the 20. So we too often end up paralysed, unable to make progress because we can’t judge the importance of the use cases we’re considering.

Not entirely true, there are people who install with --no-binary, for example. As well as people on platforms where wheels aren’t available (am I right that Docker images that use musl don’t have wheels, for example?). Again, a better understanding of use cases would help here.

If we’re going this far, might as well start doing “static evaluation” of setup.py to check if there’s anything dynamic happening in setup.py – @techalchemy had something for this if I remember correctly.

I do indeed. It’s very poorly implemented, though, and probably imperfect, but it is definitely possible to traverse the AST for this information. It gets tricky because sometimes people import setup under an alias, e.g. from setuptools import setup as do_stuff (I’ve seen approximations of this), and on one occasion I’ve even seen people rely on directory-local imports of their own code which imports setuptools.setup (from .mymodule import my_version_of_setup, which in turn called setuptools.setup). I do not believe my code handles that case :slight_smile:
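This is not the code mentioned above, just a small sketch of the aliasing part of the problem: collecting the local names that an import statement binds to setup. The function name is made up, and treating distutils.core as a second source of setup is an assumption of the sketch.

```python
import ast


def setup_aliases(source):
    """Return the local names bound to `setup` by `from setuptools import ...` statements."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        # Relative imports have node.module like "mymodule" (with level > 0),
        # so they never match here and are simply not resolved.
        if isinstance(node, ast.ImportFrom) and node.module in ("setuptools", "distutils.core"):
            for alias in node.names:
                if alias.name == "setup":
                    names.add(alias.asname or alias.name)
    return names
```

The alias case resolves to {"do_stuff"}, but the directory-local wrapper case returns an empty set, so a tool built on this would have to treat such a setup.py as "unknown" rather than "static".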

This is an interesting conversation and one bit of information I would like to add that may be relevant is that I was at a packaging summit hosted by Microsoft recently with folks from npm, go, Java (maven/gradle), OCI, NuGet and a few others and the overarching theme seemed to be enforcement – putting tools in front of the upload process, whether they are strict enforcement or simply encourage the desired behaviors (someone from github suggested that they could fail a check if their wheels lacked metadata after a build). Rather than trusting that the user supplied good metadata, there was a lot of interest in actually validating or if possible generating the metadata at the index.

This is obviously super nuanced and I’m hand waving tons of complexity but I think we are relatively smart and we can probably get a basic solution working. It’s in keeping with what we discussed at PyCon last year and it ultimately all comes down to metadata.

As of last month I accepted a partly sponsored role with Canonical and I’ll be spending a chunk of my time on packaging related work so I’ll be glad to catch up on these. I believe @pradyunsg and I were supposed to draft a PEP related to extras based on some of the work @njs had done as an outcome of PyCon last year, but due to a lot of factors I hadn’t had any time to do anything open source related. Now that I have time I’d be glad to pick that back up (I’m sure it’s discussed on discourse somewhere).

To the original question, I’d suggest caution around making adjustments to metadata PEPs ahead of the resolver work in pip unless we are prepared to tackle the full extent of the issue surrounding our current metadata representations (see the previous paragraph about extras). If that’s something we are willing to tackle head on, I think it does make sense to do that first, however.

Sorry for the many words but hopefully that was mostly on-topic and clear.

I’m going to reiterate a point I’ve made elsewhere on this, though. How far should we go to support such usages? What requirements drive the use of such unusual approaches for the projects using them, and are those requirements sufficiently compelling to justify the significant amount of extra work required of the packaging tool community to cater for those usages?

I strongly believe we should avoid getting trapped in a mindset that says that we have to support absolutely every usage of setuptools imaginable, across all packaging tools. If the requirement for a particular project is strong enough, “use an older version of pip” is an option - and if the cost to the project of doing that is too high, then maybe the cost of supporting that usage in the packaging tools should also be considered too high.

We have to be cautious here - breaking backward compatibility should never be something we do lightly - but we should have the option available when it’s needed.

Yea. I’m going to say that a system that covers the basic case is enough: a literal defined in setup.py should be considered canonical.

Anything else, we can tackle that if we see the need to.

Completely agreed

Without getting too much in the weeds anymore we are basically on the same page. Ultimately (and I realize this position may still be controversial) I think we need to move as far away from executable package manifests as possible. I.e. define metadata in one place, and, if needed, build extensions in another. As long as we are stuck asking the question “do we need to write an AST parser for reading install_requires information or should I run python setup.py egg_info and parse the resultant metadata?” we are going to be building these overly complicated workarounds just to get basic metadata.

So what would we need to do in order to make that happen? What are the major barriers?

I’m saying, setuptools should do these shenanigans, to determine if the metadata from setup.py is “stable”. A field added to the metadata specification for declaring how sdists can have “stable” dependency data would be good to have too.

I personally would not ban dynamic metadata… but would introduce a field into our metadata where tools/people can declare whether some metadata is dynamic or not. When dynamic metadata is needed we can use prepare_metadata_for_build_wheel. I would require, though, that prepare_metadata_for_build_wheel be stable across subsequent calls: calling it on the same machine twice, one after another, should give the same metadata.
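The stability requirement could be checked roughly as follows. This is a simplified sketch: it takes any callable with the PEP 517 hook's shape (it receives a metadata directory and returns the basename of the .dist-info directory it created there) and calls it in-process, whereas a real front end would run the hook in a subprocess inside the source tree. The function name is hypothetical.

```python
import tempfile
from pathlib import Path


def metadata_is_stable(prepare_hook):
    """Call the hook twice, each time into a fresh directory, and compare METADATA."""
    contents = []
    for _ in range(2):
        with tempfile.TemporaryDirectory() as tmp:
            # The hook writes <name>.dist-info/METADATA and returns "<name>.dist-info".
            dist_info = prepare_hook(tmp)
            contents.append((Path(tmp) / dist_info / "METADATA").read_bytes())
    return contents[0] == contents[1]
```

Note that passing this check only shows the hook is repeatable on one machine; it says nothing about whether the result matches what another platform would produce.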

Agreed, it’s sometimes necessary. But probably only rarely. So, coming round full circle on this, what’s wrong with the following proposal:

  1. Tools that create sdists (setuptools, flit, etc) work out how they can tell if a given metadata item is “static”. That doesn’t need any standardisation, it’s just a question of their UI. For example, flit can probably say “everything is”, and setuptools can maybe say "everything from setup.cfg is as long as it’s not then modified via setup.py". Worst case, tools could ask the user to say.
  2. We add a way for that information to be recorded in the sdist metadata. @chrahunt’s original suggestion of Metadata-Covers: X, Y, Z seems reasonable, and we can bikeshed as much as we want (or can endure :wink:).
  3. Tools like pip start to rely on the data that’s marked as reliable.

Step (2) is the only one that needs standardising. But all the benefits come from steps 1 and 3, so if the only blocker on achieving this is to agree on (2), then let’s focus on that. Is there anything wrong with the suggestion of Metadata-Covers: <list of items, or "all">? Does anyone have any better name than Metadata-Covers?
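For step 3, a consumer might read the proposed field along these lines. Metadata-Covers is only the suggestion under discussion here, not a standardised field, and the comma-separated syntax and the helper name are assumptions taken from the wording above.

```python
from email.parser import Parser


def can_trust(pkg_info_text, field):
    """True if the sdist's (hypothetical) Metadata-Covers header vouches for `field`."""
    covers = Parser().parsestr(pkg_info_text).get("Metadata-Covers")
    if covers is None:
        return False  # no promise made: fall back to asking the build backend
    items = {item.strip().lower() for item in covers.split(",")}
    return "all" in items or field.lower() in items
```

For an sdist whose PKG-INFO contains "Metadata-Covers: Requires-Dist, Requires-Python", can_trust(text, "Requires-Dist") is true and can_trust(text, "Summary") is false. An sdist without the header is never trusted, which keeps today's behaviour for existing projects.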

There is of course the other matter that the sdist format and the existence of PKG-INFO are not yet standardised. So there’s no standard for the decision in (2) to update at the moment. We could either make standardising sdists a pre-requisite for this discussion, or we could get something agreed, put it in place as an implementation-defined behaviour for now, and tackle standardising sdists separately. I’m inclined to do the latter (says the interop standards BDFL-delegate :slightly_smiling_face:), because it allows us to make faster progress.

Thoughts? Is this too simplistic?

I’m definitely on the side of someone first implementing a POC before making it a proposal. Then we just iterate on that POC to cover edge cases, and standardize that.

Can you clarify in what circumstances it might be necessary? Are there certain fields where it’s going to be necessary?

I suspect that most cases of dynamic metadata are people who don’t know better - they have environment-specific dependencies or something like that and don’t know about environment markers.

The only situation I can think of where someone might legitimately want dynamic metadata would be if there’s a very specific environment that we don’t have any environment marker for and someone needs to do a workaround. Even in that case, we could almost certainly use a “conditional dependencies” mechanism that says, “Here are all the dependencies that definitely will be installed, here are some that depend on install-time conditions.” If that is necessary, it seems better than “metadata can be anything”.

Sorry, terminology may be getting confused here. And I shouldn’t have casually made my comment on the back of a comment about “dynamic metadata”, I should have been clearer what I meant.

I’m referring specifically to “if you take a sdist and look at its metadata, and then build a wheel from that sdist, you get different metadata”. That’s not (necessarily) because the metadata is calculated dynamically, but it’s effectively the same to pip - we can’t rely on the sdist metadata. This apparently does happen - see the initial post from @chrahunt, which mentions that it’s a problem with requests.

I don’t actually care whether we support dynamic metadata (even assuming there are use cases that need it - I don’t know if @bernatgabor had anything specific in mind). What I care about is sdist metadata being reliably[1] the same as what we’d get by building a wheel, so that when we have a sdist in pip, we can use that data in the early (resolver) stages, and skip a call to the build backend just to get metadata that in theory we already have.

My impression was that the reason we couldn’t have that was because the metadata is calculated dynamically, and we can’t be sure it will still be the same at wheel-build time. But now that I check, I don’t see anything in the requests code that explains why the sdist doesn’t include Requires-Dist - is that actually just a bug (in setuptools or distutils or somewhere) that could be fixed?

Maybe step 1 needs to be for someone to understand and clarify why the metadata in sdists is incomplete/unreliable?

[1] And by “reliably” I mean “we have a means to verify which values it’s OK to rely on”, not that it always has to be 100% accurate.

This is because the input isn’t reliably deterministic. Consider the extreme example from Dustin’s blog post on this:

from setuptools import setup
import random

setup(
  name="paradox",
  version="0.0.1",
  description="A non-deterministic package",
  install_requires=[random.choice(["Dep1", "Dep2"])]
)

The much more common scenario is one where the dependencies are generated based on the platform that’s building from sdist, and this use case has been replaced with environment markers (that most people don’t know about):

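import sys
from setuptools import setup

install_requires = ["Dep1"]
# Evaluated when setup.py runs, so the result depends on the building interpreter
if sys.version_info < (3, 7):
  install_requires.append("importlib-metadata")

setup(...,
    install_requires=install_requires
)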

By the time it gets to setuptools, it’s just a list, and we don’t know if it was generated dynamically or not. If the dependencies are specified in setup.cfg, we know they are reliable and there’s an open issue to fix this. As others in the thread have mentioned, we can almost certainly parse setup.py with an AST and in many basic cases determine whether the dependencies are deterministic or not.
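For comparison, here is the same version-dependent requirement written declaratively in setup.cfg with an environment marker, plus a small sketch of reading it back without executing any project code. The reader function is hypothetical and only handles the one-requirement-per-line form; it is not how setuptools itself parses the file.

```python
import configparser

SETUP_CFG = """\
[metadata]
name = example
version = 0.0.1

[options]
install_requires =
    Dep1
    importlib-metadata; python_version < "3.7"
"""


def static_install_requires(setup_cfg_text):
    """Read install_requires from setup.cfg text, one requirement per line."""
    # Interpolation is disabled so a literal "%" in a requirement is left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(setup_cfg_text)
    raw = parser.get("options", "install_requires", fallback="")
    return [line.strip() for line in raw.splitlines() if line.strip()]
```

Because the marker travels with the requirement string, the list is identical no matter which interpreter reads it, and the decision is deferred to install time.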

Most of the options for “banning dynamic metadata” are not great and have the potential to break stuff that would probably already just work in most scenarios, but if we decided the cost was worth paying, I’m curious to know if we would be stymied because there are legitimate use cases that we won’t be able to support in deterministic metadata implementations in a realistic time frame.

I’m also curious to know if this is just install_requires or if there are other places where the metadata is being set “dynamically”. The one use case I know of / have for that is that dateutil does a search-and-replace in README.rst during the build, because PyPI doesn’t support .. doctest::. It’s still deterministic, but it would be difficult to detect that it’s deterministic through heuristics.

To tack on to the nomenclature confusion, when I hear static vs dynamic my brain keeps trying to put the version issue into there thanks to e.g. setuptools_scm calculating the version “dynamically” when setup() runs (same goes for people who use open() to paste in their long description).

But I think the key thing that’s being asked is static versus “dynamically environment-dependent” (to differentiate from the “statically environment-dependent” that markers support).

And the only other thing I can think of along these lines is file inclusion, and maybe entry points (and this is a guess; I have no real-world examples to back this up).