PEP 643: Metadata for Package Source Distributions

The point is that it acts like __length_hint__. It’s for situations where having a good guess now is better than having the exact right answer later. The canonical use case I mentioned would be if you are using a distributed workflow to do some sort of dependency resolution. For example, some pseudocode:

# Pseudocode: get_sdist, get_sdist_metadata, get_wheel_metadata, Node and
# thread_pool are placeholders for whatever the resolver actually uses.
def dependency_graph(pkg, _node_cache={}) -> Node:
    # The mutable default is deliberate: it acts as a process-wide memo cache.
    if pkg in _node_cache:
        return _node_cache[pkg]

    # To make it more compact I'm assuming we need to use sdists,
    # though in practice we'd only hit this branch if no wheel is
    # available.
    sdist = get_sdist(pkg)  # Blocking, expensive
    metadata = get_sdist_metadata(sdist)  # Reads PKG-INFO; no build needed

    # BLOCK 1: Requires-Dist is Dynamic, so any value in PKG-INFO is only a hint
    if metadata.needs_build("Requires-Dist"):
        for dep in metadata.requires_dist:
            # Assume this is non-blocking; for brevity's sake
            # I have skipped any cancellation or de-duplication logic
            thread_pool.add(dependency_graph, dep)

        metadata = get_wheel_metadata(sdist)  # Blocking, expensive

    node = Node(metadata.name, metadata.version)
    # BLOCK 2: recurse over the canonical dependencies
    for dep in metadata.requires_dist:
        node.add_dependency(dependency_graph(dep))

    return _node_cache.setdefault(pkg, node)

Obviously I’ve left out a lot of details there, but you can see the idea: in Block 1, you use the “hint” to warm the cache in another thread or process while you wait for the build backend to give you the canonical answer to “what are the dependencies?”. In most cases the hint will be very close to accurate, so when you get to Block 2, the recursive dependency_graph calls will hit the cache.

You could also imagine this sort of thing being used in a situation where you care more about not doing builds than you care about getting the right answer. “I don’t want to execute arbitrary Python code, but I want an estimate of what this dependency graph looks like”. The choices there are to completely ignore any nodes in the dependency graph that don’t supply wheels or to use the hint and get something that is probably pretty close to what you’d expect.
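For the “no builds” case, a consumer could read the sdist’s PKG-INFO directly. This is a sketch under stated assumptions, not an existing tool’s API: it relies on core metadata using email-header syntax (so the standard library’s email.parser can read it), and treats Requires-Dist as exact only for Metadata-Version 2.2+ files that do not list it under Dynamic. Otherwise the recorded values, if any, are just the best available estimate. estimate_requirements is a hypothetical name.

```python
from __future__ import annotations

from email.parser import Parser


def estimate_requirements(pkg_info: str) -> tuple[list[str], bool]:
    """Return (requirements, exact) from an sdist's PKG-INFO without building it.

    `exact` is True only when the file uses Metadata-Version 2.2 or later and
    Requires-Dist is not marked Dynamic; otherwise the values are a hint.
    """
    msg = Parser().parsestr(pkg_info)
    version = tuple(int(part) for part in msg.get("Metadata-Version", "1.0").split("."))
    # Header lookups are case-insensitive; normalise the Dynamic values to match.
    dynamic = {value.strip().lower() for value in msg.get_all("Dynamic", [])}
    requirements = msg.get_all("Requires-Dist", [])
    exact = version >= (2, 2) and "requires-dist" not in dynamic
    return requirements, exact


SAMPLE = """\
Metadata-Version: 2.2
Name: example
Version: 1.0
Dynamic: Requires-Dist
Requires-Dist: requests>=2
"""

print(estimate_requirements(SAMPLE))  # (['requests>=2'], False)
```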

“A field is marked Dynamic in a wheel if it would be marked Dynamic when building an sdist.”

TBH I’m not entirely sure what’s confusing about it. What situations can you imagine where this would lead to ambiguity? What actions would you take or not take based on it being marked Dynamic in a wheel?

I see, thanks for explaining. My expectation was that setuptools could choose not to declare that the PKG-INFO it emits conforms to PEP 643 if it cannot determine that all of the required-to-be-static fields are static (i.e. they are not defined in setup.cfg, or are defined/overridden by a setup() argument). No builds would break, because only the configs that setuptools can be absolutely sure conform to this PEP would use the new metadata version; others would continue to be built as they are right now, as if this new sdist metadata format did not exist.

I can see this may not be desirable, however, since the current PKG-INFO is not standardised (and useless), and the vast majority of users won’t be able to take advantage of this sdist metadata because package maintainers haven’t seemed keen to switch away from setup() arguments.

My understanding from the PEP is that this is not an option:

The current wording indicates that we can’t fall back to earlier versions and we must report an error.

It means we do our best not to change things, but if legit reasons come up we will change it. Typically it means a new node type or an important simplification.

There are packages which help smooth the differences out like astroid and typed_ast.

Or in more fluffy language: “When read from an sdist/PKG-INFO, a field marked as Dynamic can have its value provided later in the METADATA file in a wheel. When Dynamic is in a METADATA file for a wheel, it is a marker for the provenance of the value as being generated by the wheel-building process and not directly from the PKG-INFO file from an sdist.”
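One way to make the sdist-side meaning concrete is a check a tool might run after building a wheel: every field the sdist does not mark as Dynamic should come through to the wheel unchanged (including staying absent), while Dynamic fields are free to differ. This is a hypothetical helper; it assumes both files parse as email headers, compares multiple-use fields as unordered lists, and ignores any Description carried in the message body.

```python
from __future__ import annotations

from email.parser import Parser

# Fields that describe the file itself rather than the project.
_SKIP = {"metadata-version", "dynamic"}


def static_field_mismatches(pkg_info: str, wheel_metadata: str) -> list[str]:
    """Return (lower-cased) names of non-Dynamic sdist fields the wheel changed."""
    sdist = Parser().parsestr(pkg_info)
    wheel = Parser().parsestr(wheel_metadata)
    dynamic = {value.strip().lower() for value in sdist.get_all("Dynamic", [])}
    fields = {key.lower() for key in sdist.keys()} | {key.lower() for key in wheel.keys()}
    mismatches = []
    for field in sorted(fields - _SKIP - dynamic):
        # get_all is case-insensitive; a field missing on one side gives [].
        if sorted(sdist.get_all(field, [])) != sorted(wheel.get_all(field, [])):
            mismatches.append(field)
    return mismatches
```

A wheel that adds a Requires-Dist entry passes when the sdist lists Requires-Dist under Dynamic, but a changed Version, or a field such as Requires-Python that appears only in the wheel, would be reported.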

I think naming the field something like Dynamic-In-Sdist would make it more true to its meaning

I think we can put off the question of naming until after we nail down the general details. I personally don’t think it matters very much. There’s not much else that Dynamic could mean in the context of built wheel metadata. I’m not sure what we’d be guarding against by trying to telegraph that in the name.

The discussion appears to have died down, so I’d like to summarise where we are on these points.

  1. On Static vs Dynamic, I’m going to stick with Dynamic. You gave that a +1 here, so I assume you’re OK with that.
  2. You seem to have moved from just being “not crazy about the idea” of a whitelist to being fairly strongly against it. I still want to prohibit gratuitous use of Dynamic, so how about this as a compromise?
    • The fields Name and Version MUST NOT be marked as Dynamic.
    • Backends MUST NOT mark a field as Dynamic if they can determine that it was generated from data that will not change at build time. (This is intentionally a bit vague, to allow backends flexibility to decide how hard they try to determine if the data is static - I expect setuptools to initially just consider setup.cfg and pyproject.toml to be static, but maybe to add checks for setup() later, if they feel it’s useful - and I want the spec to allow that).
    • (the existing point) Backends SHOULD encourage projects to specify metadata statically, preferring to use environment markers on static values to adapt to details of the install location.
  3. I’m not going to fight for disallowing values for fields marked Dynamic, or for disallowing Dynamic in non-sdists, so how about:
    • Backends MAY record the value they calculated for a field they mark as Dynamic in a sdist. Consumers, however, MUST NOT treat this value as canonical, but MAY use it as a hint about what the final value in a wheel could be.
    • In any context other than a sdist, if a field is marked as Dynamic, that indicates that the value was generated at wheel build time and may not match the value in the sdist (or in other builds of this project). Backends are not required to record this information, though, and consumers MUST NOT assume that the lack of a Dynamic marking has any significance, except in a sdist.
  4. This has already been covered, but the location of the metadata remains as specified in PEP 517, and this is in the packaging.python.org spec update.
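To make the proposed rules concrete, here is a hypothetical backend-side helper that writes a Metadata-Version 2.2 PKG-INFO header block. It rejects Name or Version in Dynamic, requires both to be present (they are mandatory core metadata fields), keeps static and Dynamic fields disjoint, and only lets a backend record values for a field it has marked Dynamic (the optional hints of point 3). render_pkg_info and its arguments are illustrative names, not an existing setuptools or packaging API.

```python
from __future__ import annotations

NEVER_DYNAMIC = {"name", "version"}


def render_pkg_info(
    static: dict[str, list[str]],
    dynamic: set[str],
    hints: dict[str, list[str]] | None = None,
) -> str:
    """Render PKG-INFO headers; `hints` are optional values for Dynamic fields."""
    hints = hints or {}
    static_names = {field.lower() for field in static}
    dynamic_names = {field.lower() for field in dynamic}
    if forbidden := NEVER_DYNAMIC & dynamic_names:
        raise ValueError(f"fields that must not be Dynamic: {sorted(forbidden)}")
    if missing := NEVER_DYNAMIC - static_names:
        raise ValueError(f"required static fields missing: {sorted(missing)}")
    if both := static_names & dynamic_names:
        raise ValueError(f"fields cannot be both static and Dynamic: {sorted(both)}")
    if unmarked := {field.lower() for field in hints} - dynamic_names:
        raise ValueError(f"hint values given for non-Dynamic fields: {sorted(unmarked)}")

    lines = ["Metadata-Version: 2.2"]
    for field, values in static.items():
        lines.extend(f"{field}: {value}" for value in values)
    lines.extend(f"Dynamic: {field}" for field in sorted(dynamic))
    # Recorded for convenience only; consumers must not treat these as canonical.
    for field, values in hints.items():
        lines.extend(f"{field}: {value}" for value in values)
    return "\n".join(lines) + "\n"
```

A consumer reading the output of render_pkg_info({"Name": ["example"], "Version": ["1.0"]}, {"Requires-Dist"}, {"Requires-Dist": ["requests>=2"]}) would see Requires-Dist: requests>=2, but because Requires-Dist is listed under Dynamic it could only use that value as a guess about the wheel.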

I’ll rewrite the rationale section of the PEP to take into account your previous comments as well - I agree the current rationale is weak, and unnecessarily tied to PEP 621. And I’ll sort out the other points you mentioned at the same time.

I think that’s all of the outstanding points on the PEP. Did I miss anything?

I’ve weakened that to “source distributions SHOULD use the latest version of the core metadata specification that was available when they were created”. As it stands, if we create a new version of the metadata spec, we instantly invalidate all existing sdists, which is silly and wasn’t the intention.

Does the name include information about the Provenance of the data item? Which agent generated the value? When? (At sdist build time.) Did they sign it?

Any such metadata can be more efficiently modeled with a schema that describes each data item.

FWIW, in terms of normative language with regard to schema:

RDFS+SHACL and/or JSON Schema are two ways to model (meta)data schemas with enough information to choose a widget and also do client-side validation.

W3C PROV in Python:

# Excerpt: datetime, prov, document and e2 are imported/defined earlier in the full example
a1 = document.activity('a1', datetime.datetime.now(), None, {prov.PROV_TYPE: "edit"})
# References can be qnames or ProvRecord objects themselves
document.wasGeneratedBy(e2, a1, None, {'ex:fct': "save"})
document.wasAssociatedWith('a1', 'ag2', None, None, {prov.PROV_ROLE: "author"})
document.agent('ag2', {prov.PROV_TYPE: 'prov:Person', 'ex:name': "Bob"})

^^ That generates triples and/or JSON-LD.

More complete examples: https://github.com/trungdong/prov/blob/master/src/prov/tests/examples.py

The spec describes how Agents’ Activities generated, derived, or used which Entities.

It’s probably pretty easy to generate PROV JSON-LD without the (convenient) python prov library, or indeed any understanding beyond that the attribute names start with prov: and the schema is in a separate file.

  • “Dynamic” means “Computed at sdist build time”

  • Which Agent ran and signed that (downstream re-) build/compile Activity which involved the package Entity?

Presumably, the Dynamic value of the Entity metadata attribute is set by the Agent doing an Activity.

Presumably, the Dynamic value of the package metadata attribute is set by the Agent (sometime?)

Is this correct?

[During/Before/After?] the build/compile Activity, a Dynamic attribute is set to its currently static value.

Please take this to a separate discussion thread. I’m not even going to consider this for PEP 643.

OK, I’ve made all of the outstanding updates to PEP 643, and I’ve updated the PR for the packaging spec. @pganssle is there anything else you feel is needed before this is ready for a decision?

@pf_moore This is great, thank you so much for your work on this.

I am very happy with the current state of the PEP. The only thing that’s a little weird to me is that "Allow Requires-Python to be Dynamic" is in “Rejected Ideas”, but it hasn’t actually been rejected (we rejected the idea of Requires-Python being special, but then also made it so it didn’t need to be special in that way anyway). But that’s acknowledged there anyway, and probably more information is better than less.

I’m tempted to accept this as Provisional (and finalize it once we have a working prototype in setuptools that we’ve run against some of the top PyPI packages), but I think that Provisional acceptance is not really appropriate for something that bumps the Core Metadata spec, so we may just need to be bold here.

That said, setuptools is likely going to be the most important implementation of this in the short run, and I am not going to have time to work on anything setuptools-related for at least a month, and I assume that I’m going to be the one to do that implementation (though I’m very happy to not be), so I don’t see a huge rush to actually flip the bit on this. So my proposal is this: let’s ping other backend authors: @jaraco @takluyver @ofek @sdispater, and give it 2 weeks for additional comments. If everyone chimes in and says LGTM before then, I’ll formally accept the PEP. If there are no open objections within 2 weeks (December 1), then I’ll formally accept the PEP. Sounds good?

That works for me.

I agree, the “Allow Requires-Python to be Dynamic” item is worded clumsily. How about I reword it as follows:

Special handling of Requires-Python. Early drafts of the PEP needed special discussion of Requires-Python, because the lack of environment markers for this field meant that it might be difficult to require it to be static. The final form of the PEP no longer needs this, as the idea of a whitelist of fields allowed to be dynamic was dropped.

Yeah, that’s much clearer.

It has been two weeks and there have been no further comments, so I hereby officially give this my seal of approval.

PEP 643 is now accepted! Congratulations, Paul! And congratulations Python ecosystem, because hopefully things are soon going to be that much better!

lol

Many thanks @pganssle both for the approval and for the comments and feedback you provided, which significantly improved the final PEP (and to everyone else for their contributions, too!).

PR to mark as accepted here: https://github.com/python/peps/pull/1724

FYI I have started to think about how to manage PEP 621, PEP 643, and core metadata in packaging via some metadata API. The introduction of dynamic makes managing that part of the metadata a bit more complicated/interesting. Once I have an API pulled together that I’m ready to discuss (or someone comes up with their own idea), I will post it on packaging's issue tracker.

Wherever that issue is, I again think it would be helpful to consider interoperability with an ecosystem of citeable resources when designing a metadata API and schema, and when specifying when certain attributes are expected to be calculated.

https://github.com/codemeta/codemeta/tree/master/crosswalks links to:

( …

If you give a mouse a cookie,
you might as well support at least these attributes please:
https://github.com/codemeta/codemeta/blob/master/codemeta.jsonld

Do I understand correctly?

“Dynamic” = Build-time-(re-)calculated attributes; but we don’t store the datetime each attribute was calculated or any other information about the property value relation.

“Dynamic” is an attribute of the properties which is defined in the schema, which is defined in a standard format for interoperability.

I don’t think that will come into play in the API I’m thinking of, but others could obviously create a subclass that allows for this.