PEP 665, take 2 -- A file format to list Python dependencies for reproducibility of an application

I have done an initial review of the PEP. The following points seem unclear to me on first reading. Can someone please point me to discussion of these points (if such a discussion exists)? If the points haven’t been covered yet, they will need to be addressed somehow before acceptance (probably as a list of conditions on the PEP being accepted).

  1. The filename field is not completely specified. I believe the intention is that the field must be a bare name, with no directory part (excuse the clumsy terminology, I don’t believe there’s a commonly accepted term for such a name, “basename” implies that suffixes get stripped, doesn’t it?) This should be made explicit, because it’s too easy to read this as being a path to the wheel.
  2. The handling of extras is unclear to me. An example that includes extras would be really useful to clarify how the spec works here.
  3. There is no way to specify a location for a wheel file relative to the lock file. As a result, I don’t see any way for a project to ship a self-contained distribution bundle unless all requirements and their dependencies are available as wheels on publicly visible[1] servers. Is the consensus that it’s OK not to support cases where this isn’t so? If so, it should be explicitly noted. If not, the PEP doesn’t say how these cases will be covered.
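
To make point 1 concrete, here is a minimal sketch of the kind of check an installer could apply to the filename field. The function name is hypothetical, and the rule it enforces (no directory part on either POSIX or Windows) is an assumption about what "bare name" should mean, not wording taken from the PEP.

```python
from pathlib import PurePosixPath, PureWindowsPath


def is_bare_filename(filename: str) -> bool:
    """Return True if `filename` has no directory part (hypothetical helper)."""
    if not filename or filename in {".", ".."}:
        return False
    # Check both path flavours so "wheels\\x.whl" is rejected on every platform.
    return (
        PurePosixPath(filename).name == filename
        and PureWindowsPath(filename).name == filename
    )
```

With this sketch, "Django-4.0-py3-none-any.whl" is accepted, while "wheels/Django-4.0-py3-none-any.whl" is rejected because it could be read as a path to the wheel.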

As the PEP has been submitted for approval, I’m not looking to start a discussion on these points. Rather I’m looking for someone to confirm that consensus was reached and point me to the relevant part of the discussion, or statement in the PEP. If there is no such consensus, then that’s fine, I just need to know that (points 1 and 2 will likely just be something I’ll ask to be clarified in the final version of the PEP, point 3 might be more problematic and could affect acceptance).


  [1] More precisely, visible to all parties that might want to install the project.

Also, following on from my posting in this topic, I would like to discuss in a bit more detail the transition plan here. I don’t consider this a mandatory part of the PEP (as I posted that note after this PEP was developed) but I think it would be useful as a supporting discussion. Points I think it would be useful to cover include:

  • What would be key milestones in the process of moving from the current situation (tool-specific lockfiles, based mostly on the requirements format) to widespread adoption of PEP 665 lockfiles? Which tools are the most critical when it comes to adopting PEP 665?
  • How would individual projects move to PEP 665 lockfiles once their tools support that format? And of course, how do they check whether PEP 665 lockfiles fit their workflow in the first place?
  • How can the packaging community work to get to a position where cloud providers like Heroku and Azure support PEP 665 as a primary deployment format?

In future PEPs, as I noted, I would like to make it an explicit requirement to cover this subject, so we could consider this as a “practice run” of the process :slightly_smiling_face:

Correct. I’ll clarify the PEP.

Sure. I’ll try to come up with something and add it (probably coverage[toml] just because it’s the package I know with an extra off the top of my head that’s easy to write by hand).

Actually it does. The section on url says “The installer MAY support any schemes it wants for URLs (e.g. file: as well as https:)” which is explicitly under-specified for maximum flexibility. So you could use a relative file path in url if you so desired. I can update the PEP to explicitly say, “no restrictions are placed on the format of the string and installers MAY choose which formats they support.”

I’ll see what I can come up with.

I also see what you might be doing here: get a transition plan listing tools needed for success and then have it in writing they sign off on the PEP and plan before accepting the PEP, which nicely deals with the “sdist question” and whether people will even use this without it. Clever. :wink:

Lucky me. :wink:

Thanks. I did wonder if that was the intent, although I’m a little bothered that by making this tool-dependent, we’ll end up with people shipping distribution filesets that still need a particular tool to install them. But as my point is to ensure that the PEP clearly states its position, not to reopen discussion, I’m fine if you just clarify the PEP with an example of a relative file: URL.

I’m not being quite that devious :slightly_smiling_face: I mostly just want to start to establish the principle that it’s a PEP’s responsibility to present a picture of how we move from where we currently are, to a situation where the PEP is in general use. I think that in the past we’ve been guilty of creating standards and then just waiting to see what happens, with no real plan.

I also think that PEPs can sometimes end up being a bit “dry” and formal, and I want to encourage PEP authors to share their vision of how the proposal will improve things a bit more. Precision is fine, but inspiration is good too.

I only pick on you because I know you care :slightly_smiling_face:

The end of the discussion can be found at PEP 665, take 2 -- A file format to list Python dependencies for reproducibility of an application - #52 by brettcannon, so you can work backwards from there if you want to revisit the thinking behind it.

But since url is more of a hint than anything thanks to having the file hash, I assume installers will let users specify other indexes and directories to search for the files. I can also make that more explicit in the PEP.

:grinning_face_with_smiling_eyes: I also have the time this week as I don’t go on vacation until Thursday, so it’s all good. I can at least outline an ideal scenario based on who has previously expressed support for this PEP and see how it all looks.

The commit “PEP 665: address review feedback” (python/peps@79cddd5) covers the url and filename changes. I did tweak the section for url by:

  • Saying any url with no scheme MUST be considered a local file path (both relative and absolute)
  • Installers MUST support local file paths and HTTPS (the magic of hashes is that you don’t have to worry about attacks from hitting the network for a file). I think we all know installers will do url.startswith("https://"), and if that fails do pathlib.Path(url), and then see if they can find the file in either place.
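
The heuristic described in the second bullet can be sketched roughly as follows. The function name and the returned labels are hypothetical, chosen only for illustration; a real installer would go on to fetch or open the file and validate its hash.

```python
from pathlib import Path
from typing import Tuple, Union


def classify_url(url: str) -> Tuple[str, Union[str, Path]]:
    """Split a lock file `url` value into an HTTPS URL or a local path."""
    if url.startswith("https://"):
        return ("https", url)
    # Anything without the https scheme is treated as a local file path,
    # which may be relative or absolute.
    return ("path", Path(url))
```

For example, classify_url("wheels/a.whl") yields a "path" entry, and it stays up to the caller to decide what a relative path is relative to.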

@pradyunsg @uranusjr let us know if you disagree with that change (EAFP :wink:).

The transition plan is coming next.

And here is the commit with a potential transition plan: “PEP 665: outline a potential transition plan” (python/peps@991cc7a).

Since we’re updating the url rules, can we add a note recommending string replacement support? This is possible under the current framework, where url is fundamentally only informational and the installer is always free to interpret it however it likes, but a note that shows treating \${\w+} regex matches as placeholders and performing install-time string replacement, as an example of such interpretation, would be useful (and something I want to nudge installers to pre-emptively implement).
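
As a rough illustration of the kind of interpretation meant here, the sketch below expands ${NAME} placeholders in a url value at install time. The function name, the WHEEL_DIR variable in the usage note, and the choice to raise on an unknown placeholder are all assumptions for the example; nothing in the PEP defines them.

```python
import re

# Matches placeholders of the form ${NAME}, where NAME is one or more word characters.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def expand_placeholders(url: str, variables: dict) -> str:
    """Replace each ${NAME} in `url` with variables[NAME] (hypothetical helper)."""

    def replace(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value supplied for placeholder {name!r}")
        return variables[name]

    return PLACEHOLDER.sub(replace, url)
```

An installer that was given WHEEL_DIR=/srv/wheels could then turn "file://${WHEEL_DIR}/a.whl" into "file:///srv/wheels/a.whl".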

If you’re supporting relative paths natively, why do we need string replacement? I was under the impression string replacement was proposed to work around file URIs not being able to express relative paths.

Is there any reason to be relying on a heuristic? Why not make url polymorphic like PEP 621’s readme and license? _version_.url = "..." is a URL; _version_.url.path = "..." is a path.
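
A minimal sketch of how an installer might consume such a polymorphic field, assuming the lock file has already been parsed so that url is either a string or a table (a dict in Python) with a single path key. The function name and the returned labels are hypothetical.

```python
from typing import Tuple, Union


def interpret_url_field(value: Union[str, dict]) -> Tuple[str, str]:
    """Classify a parsed `url` value as ("url", ...) or ("path", ...)."""
    if isinstance(value, str):
        # url = "..." : a plain string is always a URL.
        return ("url", value)
    if (
        isinstance(value, dict)
        and set(value) == {"path"}
        and isinstance(value["path"], str)
    ):
        # url.path = "..." : a table with a single `path` key is a file path.
        return ("path", value["path"])
    raise ValueError(f"unsupported url value: {value!r}")
```

The point of the design is that the type of the value decides the interpretation, so no guessing based on the string's contents is needed.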

Please don’t. As @layday’s response shows, this will trigger new debate, and the PEP has been submitted for approval at this point. If it becomes necessary to add semantics for URLs, or to recommend particular features, this should be a follow-up PEP or document.

In fact, this request suggests to me that this will be another area of potential non-portability for lockfiles, with “requires an installer that supports string replacement” being something a lockfile could need. Also, given that lockfiles are supposed to be tool-generated, why would a tool generate a replacement field? How would it know what variables are being provided by the installer?

In particular, I’d strongly argue when pip comes to implement lockfiles that we don’t support placeholders, precisely because it would make the lockfiles pip-specific.

… Right, I’ll refrain from trying to be helpful.

No, I think your comment was very helpful, in clarifying that @uranusjr’s suggestion was more than just a wording tweak, and as such constituted a PEP change. You’re actually reiterating some of the reservations I’ve already expressed about leaving URL syntax to the installer to define. The reason I don’t want to re-open that debate while I’m working on a decision is precisely because I’m currently thinking about whether I consider this to be a sufficiently significant issue to affect my approval, and if it’s under active discussion, I have to stop and wait for that new discussion to die down and then review the situation again.

If the PEP authors want to make or debate that change, then that’s fine, but they need to be clear that the PEP is going through another revision, and is not ready for pronouncement.

Sorry, I read more into your response than I should have. I understand that it would be inconvenient for you and I do not want to rekindle the debate either. But a substantive change has already been made to the PEP - the url field did not previously support file paths. Just as I don’t want to interrupt your review, I don’t want you to pronounce on the PEP and then have to return to it because a late change wasn’t noticed or discussed (at least briefly). If there’s an important use case that is not covered by file paths that would be covered by string replacement in file URIs, we should know about it now.

Actually, you’re right. The PEP previously said that the URL field was completely unspecified (interpretation was entirely the installer’s responsibility). Now it says that bare file paths must be accepted. If nothing else, that seems like it might be a security risk for an installer that (say) runs on a service provider like Heroku. I don’t see an actual exploit here, but I can imagine not implementing file paths would be easier than taking the risk in practice, and we’ve just disallowed that option.

Thank you for persisting over this, I’d missed that implication.

@brettcannon please either revert the change to the interpretation of the URL spec, or we’ll have to go back to the “PEP discussion” stage.

My personal view (both as an individual and as PEP delegate) is that the PEP should go back to discussion stage. What I would like to see clearly specified is:

  1. Installers MUST have a means of specifying, external to the lockfile, where they will look for files matching the requirements in the lockfile. (This is non-normative, but makes it explicit that it’s not the lockfile that says what index gets used, etc. A minimal such means could be “just look at PyPI” with no configurability).
  2. If the URL field is not specified, the installer is responsible for locating the right file based on the sources noted above.
  3. If the URL is specified, the installer SHOULD respect it, but MAY still get the file from anywhere it would normally use, as long as the hash and filename match. (I fully expect this to be a source of complaints, as people are in my experience very prone to arguing “but I specified an explicit location, you must use it” - hopefully the fact that hashes are required and therefore people can’t use this as a way of forcing a particular build to be used will defuse such arguments, though).
  4. The minimum set of URL formats allowed must be specified in the PEP. I suggest only https is made a requirement, with a note that if they want to emit portable standard lockfiles, lockers MUST NOT use any format other than https URLs.
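
To show how points 1 to 3 could fit together, here is a small sketch that lists the places an installer might try, in order. The function name is hypothetical, and treating each source as a directory or flat URL prefix that the filename can be appended to is a simplifying assumption (a real index would need a lookup step). Whatever is fetched would still have to match the recorded hash and filename.

```python
from typing import Iterator, Optional, Sequence


def candidate_locations(
    filename: str, url: Optional[str], sources: Sequence[str]
) -> Iterator[str]:
    """Yield places to look for `filename`, preferring the lock file's url."""
    # Point 3: respect the url when present (https only, per point 4).
    if url is not None and url.startswith("https://"):
        yield url
    # Points 1 and 2: fall back to sources configured outside the lock file.
    for source in sources:
        yield source.rstrip("/") + "/" + filename
```

When url is absent, only the externally configured sources are consulted, which is the behaviour point 2 describes.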

This would leave the following open questions to be addressed:

  • How to handle “I want to distribute this lockfile and a couple of wheels that it needs which aren’t available via a public URL”. If this means the URL field adds support for lockfile-relative paths, the security implications must be discussed in the PEP. An alternative might be to simply say that such supplied wheels must be made available to the installer via the installer’s mechanism for specifying where to find wheels (for example, pip could use the --find-links parameter for this).
  • Is there even a need for the URL field? In my experience, the main use for it in pip is for pointing at source distributions (git URLs, source tree archives, etc) and it’s not really needed for wheels.

Full disclosure - if the PEP authors choose not to re-open discussion on the interpretation of the url field, I’m leaning towards considering the difficulties in handling it to be sufficient to reject the PEP. I’m not completely decided yet - the second open question above suggests that the field might merely be useless, rather than actively harmful - but it’s a distinct possibility.

typo in the example: shouldn’t that be “requires-python”?

I fixed it in the PEP itself but forgot to copy the newest version here. :sweat_smile: I’ll update it after I finish this response.

It isn’t a risk from the security posture the PEP is taking. This can only be a problem if:

  1. Your file permissions are wrong.
  2. The file you are trying to get at magically matches the hash of the file you specified in the lock file or the installer has a bug and doesn’t validate the hash.
  3. The attempted install of the malicious code succeeds.

That’s a lot of layers of security to fail before what this PEP allows becomes an issue. And since the PEP explicitly says installers must validate file hashes then that suggests installing malicious code posing as something you locked against isn’t possible.
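
The hash check that this argument leans on can be sketched as below. The function name is hypothetical, and representing the expected hash as an algorithm name plus a hex digest is an assumption for the example rather than the PEP's exact field layout.

```python
import hashlib
import hmac


def hash_matches(data: bytes, algorithm: str, expected_hex: str) -> bool:
    """Return True if `data` hashes to `expected_hex` under `algorithm`."""
    digest = hashlib.new(algorithm, data).hexdigest()
    # compare_digest avoids leaking where the two digests differ.
    return hmac.compare_digest(digest, expected_hex.lower())
```

An installer that refuses to install any file for which this returns False is not affected by where the bytes came from, whether a local path or the network.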

I think all of that means we are re-opening the discussion around url. :wink:

I think there are a few options (but not what is in the PEP currently based on Paul’s comment).

The options

Option 1: no url

Let’s talk out the ramifications of dropping url for a v1 and experience can inform whether we need to bring it back.

Without url, I suspect direct would also go as you don’t have the URL to record in direct_url.json. Now that doesn’t mean there couldn’t be some mechanism that an installer provided to specify this, but at least from the PEP’s perspective, the concept of direct URL installs is out of scope.

It would take two HTTPS requests to get a file from a simple repository server:

  1. Get the archive links page (e.g. for Django)
  2. Download the file (e.g. Django-4.0-py3-none-any.whl)
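
The work that sits between those two requests can be sketched offline: given the HTML of an archive links page, find the link whose file name matches the locked filename. The function and class names are hypothetical, and the index host in the usage note is a placeholder; the actual downloads are left out.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin, urlsplit


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on the page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def find_file_url(page_url: str, page_html: str, filename: str) -> Optional[str]:
    """Return the absolute link for `filename` on an archive links page, if any."""
    collector = _LinkCollector()
    collector.feed(page_html)
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)
        # Compare only the last path segment, ignoring any "#sha256=..." fragment.
        if urlsplit(absolute).path.rsplit("/", 1)[-1] == filename:
            return absolute
    return None
```

Request 1 would fetch something like https://index.example/simple/django/ and pass its body to find_file_url; request 2 would download the URL it returns.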

I know @pradyunsg had brought up that he wanted to avoid that extra request/indirection caused by having to query an archive links page if possible. Once again, v1 can make it out-of-scope for the PEP and installers can record their own URLs to use to avoid the archive link request if they want as an optimization.

In other words, this proposal says lock files help determine what files to install, but it’s entirely up to installers to figure out where to get those files. This is the simplest and most portable option, but it could very likely lead to installers storing information out-of-band (e.g. in [tool]).

Option 2: HTTPS-only for url

The other option is to tighten the definition:

  1. People MAY specify url
  2. url MUST be a syntactically valid HTTPS URL
  3. Installers SHOULD provide a way for users to specify where to look for files on the local system
  4. Installers SHOULD use url if the file cannot be found locally on the system
  5. Installers SHOULD provide a way to specify index servers implementing the simple repository API
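
Rule 2 could be checked with something like the sketch below. The function name is hypothetical, and "syntactically valid" is interpreted here as "parses with an https scheme and a host", which is an assumption about how strict the check should be.

```python
from urllib.parse import urlsplit


def is_valid_https_url(url: str) -> bool:
    """Return True if `url` parses as an HTTPS URL with a host."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit rejects some malformed inputs, e.g. unbalanced IPv6 brackets.
        return False
    return parts.scheme == "https" and bool(parts.hostname)
```

Under this check a file: URL or a bare relative path such as "wheels/a.whl" is rejected, so anything other than HTTPS has to be supplied to the installer out-of-band.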

This would mean specifying alternative ways to get a file is out-of-scope and up to the installer. I would expect most installers would let people specify directories, look in pip’s cache, etc. They could also specify alternative ways like IPFS or whatever out-of-band (e.g. [tool]).

In other words this makes url suggest where to find a file via HTTPS as that’s assumed to be the common case, but otherwise it’s up to you and the installer to work it out. This is probably the most portable while still offering a potential installation performance benefit for the (seemingly) common case.

Option 3: Require HTTPS, but other values allowed

I believe @pf_moore is suggesting (with * denoting what’s different from option 2):

  1. People MAY specify url
  2. url SHOULD be a syntactically valid HTTPS URL *
  3. Installers MUST provide a way for users to specify where to look for files *
  4. Installers MUST at least support HTTPS URLs

In other words, this proposal is the most flexible because it allows anything in url while still setting a floor of HTTPS. It could lead to less-portable lock files if people (ab)use the flexibility.

Option 4: you tell us!

It’s possible someone will have a good suggestion we have not thought of. Please either leave a comment or show your support for a follow-up comment that you prefer over the options outlined above.

The poll

So, which one? To help speed this along over the holidays I am going to preemptively create a poll.

  • Option 1: no url
  • Option 2: url is HTTPS-only
  • Option 3: url can be at least HTTPS, but allowed to be anything
  • Option 4: Either leave a comment or you’re voting for a follow-up comment

That’s not what I was suggesting. My suggestion was much closer to option 2 - I want the url field to be tightly specified, but installers to have runtime options (not part of the lock file) to specify the search order for packages which don’t have a url field.

I’m voting option 4 in the poll, not because I have a different suggestion, but because, as PEP delegate I don’t think I should vote, but I can’t see the poll results without voting. So please disregard my vote…

Also, I’d much prefer it if people added comments on why they voted the way they did, rather than just voting. I’d like this to be a discussion rather than just a numbers counting exercise.

Sorry about that, it was just my interpretation of the following:

If what I wrote isn’t what you intended then I’m not sure how to interpret that sentence.

As for how I voted, I went with option 2 in the poll. It seems the most pragmatic in that it doesn’t potentially back the url field into a weird corner that doesn’t require a version bump on the file format.

I’ll also mention that I’m about to go on my annual open source detox, so I very likely won’t be responding to comments from later today until January 4th. I will obviously read and reply to anything when I get back, although I’m also sure @pradyunsg and @uranusjr can reply in my absence if they happen to be available.

A massive thanks goes out to @frostming, who has written a proof-of-concept locker at frostming/pep665_poc (a POC implementation of PEP 665)!

I have gone ahead and updated the PEP to link to it.

It’s not worth worrying about. I’m happy enough with your proposed options (and it’s your PEP not mine, anyway!)

I actually prefer requiring url to be https, I just didn’t think you would be OK with that (as you seemed fairly insistent on leaving the details of what urls were valid as tool-defined, previously).

I didn’t intend to say that installers must be user configurable. My example of an installer that only looks on PyPI and isn’t configurable was deliberate, but I messed up the wording in that point, and made it say that users had to be able to configure the installer. My apologies, that was my error. In reality, I expect most installers will be configurable (pip certainly is).

This bothers me, as it implies that the expectation is that lockers will fill in the URL most of the time. I’d personally assumed that the URL would only be used when an index search wasn’t possible (the direct URL case). Always having the URL won’t interact well with environments where packages should be installed from a curated local repository - a lockfile created in that environment will put internal URLs in the lockfile. And conversely, a lockfile created externally will have PyPI URLs in the lockfile.

In the first case, using the lockfile externally will result in a URL lookup failure followed by an index search for every package. And in the second case, using the lockfile internally will either have the same fail and lookup behaviour (if PyPI is blocked) or will fetch packages from PyPI (if it’s not).

There’s no security issue here (hashes avoid that) but fetching from PyPI might break policy, and in any case the failed download will incur a (possibly high, depending on things like timeouts) performance penalty.

I don’t intend to pursue this point, as I don’t work in such an environment, but someone should explore this. Otherwise I can see people raising issues with pip insisting that pip needs to “fix” this, because the behaviour is broken. At the very least, I’d like to have a paragraph in the PEP I can point to in that situation, to say that it’s not pip’s fault and the standard needs fixing if it’s a problem.