Draft PEP: Recording the origin of distributions installed from direct URL references

(Chris Jerdonek) #21

Okay, here is my more detailed feedback on the PEP after going through the draft in more detail. Thanks for your patience, @sbidoul. Overall I like the PEP (so my comments are mostly minor). I think it will be very useful and fills an important gap. I also offered to @sbidoul to be the sponsor, and he agreed.

My comments:

This proposal defines additional metadata, to be added to the installed
distribution by the installation front end, which records the direct
reference for use by consumers which introspect the database of installed
packages (see PEP 376).

Going along with the idea of making sure that each packaging PEP has good human-friendly names for each concept being introduced (e.g. so people don’t have to use the PEP number, cf. the comment / thread here), I think it would be good to choose explicit names for perhaps (1) the metadata as a whole being defined, and (2) the types of origins that the metadata can include. This way people can talk more easily about this metadata. If this is done, in addition to including them in the abstract, it might also be good to include one of these names in the title.

url MUST be stripped of any sensitive authentication information,
for security reasons. The user:password section of the URL MAY however
be composed of environment variables, matching the following regular
expression::

On the authentication information, would this mean the environment variable names have to match the names that were used in the original invocation? Also, if authentication info was present in the URL but environment variables weren’t used, should there still be some indicator that authentication info was used / needed? (There is also the separate possibility that auth info was required and used, but not via the URL, e.g. from a user prompt. I’m not sure if that should also be reflected somehow.)

A vcs field MUST be present, containing the name of the VCS (i.e. one
of git, hg, bzr, svn). Other VCS SHOULD be registered by amending this PEP.

I think it would be good to include a subsection under the “Specification” section enumerating the registered VCS’s. The subsection can be called something like “Registered version control systems.” For each VCS, I think it should include the full name of the VCS (e.g. “Mercurial” as opposed to hg), a link to the VCS’s website, the VCS’s command name, and the string key that should be used for the vcs field. These sections can also include any additional info or fields specific to that particular VCS.

I would also clarify explicitly whether it’s okay to use VCS’s that aren’t registered, as well as whether it’s okay to use fields that aren’t named in the PEP (for registered VCS’s or unregistered VCS’s). Like, maybe only for unregistered VCS’s, it would be okay to use new fields in case the VCS requires concepts not contemplated by the PEP.

A resolved_commit_id field MUST be present, containing the exact
commit/revision number that was installed.

It might be worth clarifying the type of this value. Should it always be a string, or should it be an integer in the case where the commit ID is a number?

This may require a little extra work / research, but I think it would be worth including a very brief informational subsection on each VCS. The purpose could be to help explain how the fields in the PEP map to the terminology and behavior for that VCS. For example, for Mercurial it could say that resolved_commit_id should be the Mercurial changeset ID rather than a Mercurial revision number. For Subversion, it could say that commit hashes aren’t available, and that the revisions are integers, etc. This would help because not everyone knows about all VCS’s. It would also help for documenting the extent to which each VCS supports requesting individual revisions, and how it does so.

One suggestion I made before that I’ll elaborate on here is that I think a Git-specific resolved-ref field should also be defined (as “MAY” be present). This would be a string that records the Git ref that the given revision string matches, if any (or some value like the empty string if the requested revision was confirmed not to correspond to a ref). This is useful because it would let one know if the requested revision is one of a branch, a tag, or a ref, and which one (based on whether the ref starts with refs/heads/, refs/tags/, or something else, respectively). pip currently implements this detection logic here, but currently it only uses it to know if a branch was requested.

One standards-driven reason for including this information is that it would let a front-end use the @<tag>#<commit-hash> form described in PEP 440 as opposed to just the @<commit-hash> form:

To handle version control systems that do not support including commit or tag references directly in the URL, that information may be appended to the end of the URL using the @<commit-hash> or the @<tag>#<commit-hash> notation.

This is useful because one can’t tell from a commit hash alone what is being installed, but a tag (like a version number) does give this information (and a tag is semantically immutable).

Commands that generate a direct_url.json:

* pip install ./app
* pip instal file:///home/user/app

I haven’t thought through what I think the right answer should be here, but I think the behavior should be clarified for installing from directories that also happen to be VCS repositories (but aren’t invoked as such). For example, in this case should the VCS info be recorded? Also, what if the commit ID can be detected, but the working tree is “dirty” or contains extra files? My instinct is that VCS origin info should only be included in the metadata if the directory is being installed via a genuine VCS url as opposed simply to a path, so that e.g. things like dirty working trees wouldn’t affect the install.

(Stéphane Bidoul) #22

@cjerdonek thanks a lot for sponsoring this!

I’m not sure what to do with that part. Do you have concrete suggestions?
Shall I add a sentence in the abstract such as “This PEP and the metadata it specifies should be referred to informally as Direct URL Origin”.

Yes. I’ll clarify that.

I don’t think that’s necessary. More precisely, I don’t see how tools would use it. I’d say we can introduce it later if/when a use case pops up.

Ok, I’ll do that.

Good point, I’ll clarify it must be a string.

I still have doubts about this one, mainly because I don’t visualize the use cases.
Producing a PEP 440 url is feasible by combining the requested revision with the resolved_commit_id as @<revision>#<resolved_commit_id>. It’s true PEP 440 mentions <tag> yet I don’t think it meant to exclude branches or any other kind of revision specifier. If revision was itself provided as a commit id, resolving it to a ref is possibly ambiguous.

In which case would a tool use resolved-ref over revision?

I agree. I’ll see to make that explicit in the spec.

(Stéphane Bidoul) #23

New version at 5546deec, and original post updated.