PEP 665, take 2 -- A file format to list Python dependencies for reproducibility of an application

Since we’re updating the url rules, can we add a recommendation note about string replacement support? This is possible under the current framework, where url is fundamentally only informational and the installer is always free to interpret it however it likes. Still, a note giving an example of such interpretation, namely treating \${\w+} regex matches as placeholders and performing install-time string replacement, would be useful (and is something I want to nudge installers to pre-emptively implement).
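To make the idea concrete, here is a minimal sketch of install-time string replacement. It is not part of the PEP: the function name `expand_url`, the `variables` mapping, and the choice to fail on unknown placeholders are all assumptions made for illustration.

```python
import re
from typing import Mapping

# Matches placeholders of the form ${NAME}, following the \${\w+} pattern above.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def expand_url(url: str, variables: Mapping[str, str]) -> str:
    """Replace each ${NAME} in url with variables[NAME] (hypothetical helper)."""

    def substitute(match):
        name = match.group(1)
        if name not in variables:
            # Assumption: an unknown placeholder is an error, not left as-is.
            raise KeyError(f"no value provided for placeholder {name!r}")
        return variables[name]

    return PLACEHOLDER.sub(substitute, url)
```

For example, `expand_url("file://${PROJECT_ROOT}/wheels/x.whl", {"PROJECT_ROOT": "/srv/app"})` would give `file:///srv/app/wheels/x.whl`. Which variables exist is exactly the installer-specific part.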

If you’re supporting relative paths natively, why do we need string replacement? I was under the impression string replacement was proposed to work around file URIs not being able to express relative paths.

Is there any reason to be relying on a heuristic? Why not make url polymorphic like PEP 621’s readme and license? _version_.url = "..." is a URL; _version_.url.path = "..." is a path.
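A rough sketch of what that polymorphism could look like on the installer side, working on the already-parsed TOML value. The function name `interpret_url` and the `("url", ...)` / `("path", ...)` return shape are assumptions for illustration, not anything the PEP defines.

```python
from typing import Tuple, Union


def interpret_url(value: Union[str, dict]) -> Tuple[str, str]:
    """Classify a parsed `url` value without guessing from its contents.

    A plain string is a URL; a table with a single `path` key is a file path.
    """
    if isinstance(value, str):
        return ("url", value)
    if (
        isinstance(value, dict)
        and set(value) == {"path"}
        and isinstance(value["path"], str)
    ):
        return ("path", value["path"])
    raise ValueError(f"unsupported url value: {value!r}")
```

The point is that the kind of value is decided by the TOML structure, so no heuristic over the string is needed.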

Please don’t. As @layday’s response shows, this will trigger new debate, and the PEP has been submitted for approval at this point. If it becomes necessary to add semantics for URLs, or to recommend particular features, this should be a follow-up PEP or document.

In fact, this request suggests to me that this will be another area of potential non-portability for lockfiles, with “requires an installer that supports string replacement” being something a lockfile could need. Also, given that lockfiles are supposed to be tool-generated, why would a tool generate a replacement field? How would it know what variables are being provided by the installer?

In particular, I’d strongly argue when pip comes to implement lockfiles that we don’t support placeholders, precisely because it would make the lockfiles pip-specific.

… Right, I’ll refrain from trying to be helpful.

No, I think your comment was very helpful, in clarifying that @uranusjr’s suggestion was more than just a wording tweak, and as such constituted a PEP change. You’re actually reiterating some of the reservations I’ve already expressed about leaving URL syntax to the installer to define. The reason I don’t want to re-open that debate while I’m working on a decision is precisely because I’m currently thinking about whether I consider this to be a sufficiently significant issue to affect my approval, and if it’s under active discussion, I have to stop and wait for that new discussion to die down and then review the situation again.

If the PEP authors want to make or debate that change, then that’s fine, but they need to be clear that the PEP is going through another revision, and is not ready for pronouncement.

Sorry, I read more into your response than I should have. I understand that it would be inconvenient for you and I do not want to rekindle the debate either. But a substantive change has already been made to the PEP - the url field did not previously support file paths. Just as I don’t want to interrupt your review, I don’t want you to pronounce on the PEP and then have to return to it because a late change wasn’t noticed or discussed (at least briefly). If there’s an important use case that is not covered by file paths that would be covered by string replacement in file URIs, we should know about it now.

Actually, you’re right. The PEP previously said that the URL field was completely unspecified (interpretation was entirely the installer’s responsibility). Now it says that bare file paths must be accepted. If nothing else, that seems like it might be a security risk for an installer that (say) runs on a service provider like Heroku. I don’t see an actual exploit here, but I can imagine not implementing file paths would be easier than taking the risk in practice, and we’ve just disallowed that option.

Thank you for persisting over this, I’d missed that implication.

@brettcannon please either revert the change to the interpretation of the URL spec, or we’ll have to go back to the “PEP discussion” stage.

My personal view (both as an individual and as PEP delegate) is that the PEP should go back to discussion stage. What I would like to see clearly specified is:

  1. Installers MUST have a means of specifying, external to the lockfile, where they will look for files matching the requirements in the lockfile. (This is non-normative, but makes it explicit that it’s not the lockfile that says what index gets used, etc. A minimal such means could be “just look at PyPI” with no configurability).
  2. If the URL field is not specified, the installer is responsible for locating the right file based on the sources noted above.
  3. If the URL is specified, the installer SHOULD respect it, but MAY still get the file from anywhere it would normally use, as long as the hash and filename match. (I fully expect this to be a source of complaints, as people are in my experience very prone to arguing “but I specified an explicit location, you must use it” - hopefully the fact that hashes are required and therefore people can’t use this as a way of forcing a particular build to be used will defuse such arguments, though).
  4. The minimum set of URL formats allowed must be specified in the PEP. I suggest only https is made a requirement, with a note that if they want to emit portable standard lockfiles, lockers MUST NOT use any format other than https URLs.
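As a sketch of how points 2 to 4 might translate into code: the names `fetch_plan` and `is_portable`, and the idea of representing configured sources as a list of strings, are assumptions for illustration only.

```python
from typing import List, Optional
from urllib.parse import urlsplit


def fetch_plan(url: Optional[str], configured_sources: List[str]) -> List[str]:
    """Order the locations an installer could try for one locked file."""
    plan = []
    if url is not None:
        # The installer SHOULD respect a recorded URL, so try it first...
        plan.append(url)
    # ...but it MAY still use its normal sources; the hash and filename
    # must match whichever location the file ends up coming from.
    plan.extend(source for source in configured_sources if source != url)
    return plan


def is_portable(urls: List[Optional[str]]) -> bool:
    """Locker-side check: every recorded URL, if any, uses the https scheme."""
    return all(u is None or urlsplit(u).scheme == "https" for u in urls)
```

With no url recorded, `fetch_plan(None, ["https://pypi.org/simple/"])` just returns the configured sources, which is point 2.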

This would leave the following open questions to be addressed:

  • How to handle “I want to distribute this lockfile and a couple of wheels that it needs which aren’t available via a public URL”. If this means the URL field adds support for lockfile-relative paths, the security implications must be discussed in the PEP. An alternative might be to simply say that such supplied wheels must be made available to the installer via the installer’s mechanism for specifying where to find wheels (for example, pip could use the --find-links parameter for this).
  • Is there even a need for the URL field? In my experience, the main use for it in pip is for pointing at source distributions (git URLs, source tree archives, etc) and it’s not really needed for wheels.
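If lockfile-relative paths were added, one obvious security concern is a path that climbs out of the lockfile’s directory. A minimal sketch of confining such paths follows; the function name `resolve_relative` and the policy of rejecting anything outside the lockfile directory are assumptions, not something the PEP specifies.

```python
import os


def resolve_relative(lockfile_path: str, relative: str) -> str:
    """Resolve `relative` against the lockfile's directory, refusing escapes."""
    base = os.path.realpath(os.path.dirname(os.path.abspath(lockfile_path)))
    # realpath also follows symlinks, so a link pointing outside is caught too.
    target = os.path.realpath(os.path.join(base, relative))
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"{relative!r} escapes the lockfile directory")
    return target
```

An absolute path or something like `../secret.whl` raises `ValueError` here, while `wheels/a.whl` resolves normally.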

Full disclosure - if the PEP authors choose not to re-open discussion on the interpretation of the url field, I’m leaning towards considering the difficulties in handling it to be sufficient to reject the PEP. I’m not completely decided yet - the second open question above suggests that the field might merely be useless, rather than actively harmful - but it’s a distinct possibility.

Typo in the example: shouldn’t that be “requires-python”?

I fixed it in the PEP itself but forgot to copy the newest version here. :sweat_smile: I’ll update it after I finish this response.

It isn’t a risk from the security posture the PEP is taking. This can only be a problem if:

  1. Your file permissions are wrong.
  2. The file you are trying to get at magically matches the hash of the file you specified in the lock file or the installer has a bug and doesn’t validate the hash.
  3. The attempted install of the malicious code succeeds.

That’s a lot of layers of security to fail before what this PEP allows becomes an issue. And since the PEP explicitly says installers must validate file hashes then that suggests installing malicious code posing as something you locked against isn’t possible.
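The hash check that this argument leans on is small. A minimal sketch, assuming the lock file records an algorithm name and a hex digest (the function name `hash_matches` and its parameters are made up for illustration):

```python
import hashlib
import hmac


def hash_matches(data: bytes, algorithm: str, expected_hex: str) -> bool:
    """Return True only if `data` hashes to the digest recorded in the lock file."""
    digest = hashlib.new(algorithm, data).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(digest, expected_hex.lower())
```

An installer would refuse to install any file, from any location, for which this returns `False`.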

I think all of that means we are re-opening the discussion around url. :wink:

I think there are a few options (but not what is in the PEP currently based on Paul’s comment).

The options

Option 1: no url

Let’s talk out the ramifications of dropping url for a v1 and experience can inform whether we need to bring it back.

Without url, I suspect direct would also go, as you don’t have the URL to record in direct_url.json. Now that doesn’t mean there couldn’t be some mechanism that an installer provides to specify this, but at least from the PEP’s perspective, the concept of direct URL installs is out of scope.

It would take two HTTPS requests to get a file from a simple repository server:

  1. Get the archive links page (e.g. for Django)
  2. Download the file (e.g. Django-4.0-py3-none-any.whl)
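A sketch of the lookup between those two requests, using only the standard library. It takes the already-fetched HTML of the archive links page as input, so no network access happens here; the names `_LinkCollector` and `find_file_url` are hypothetical, and it relies on the simple repository API convention that each anchor’s text is the file name.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collect {anchor text: href} for every <a> element on the page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: Dict[str, str] = {}
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links["".join(self._text).strip()] = self._href
            self._href = None


def find_file_url(page_url: str, page_html: str, filename: str) -> Optional[str]:
    """Step 1 result in, step 2 URL out: locate `filename` on the links page."""
    parser = _LinkCollector()
    parser.feed(page_html)
    parser.close()
    href = parser.links.get(filename)
    # hrefs may be relative to the page, so resolve them against its URL.
    return None if href is None else urljoin(page_url, href)
```

Recording a url in the lock file is what would let an installer skip this lookup entirely.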

I know @pradyunsg had brought up that he wanted to avoid that extra request/indirection caused by having to query an archive links page if possible. Once again, v1 can make it out-of-scope for the PEP and installers can record their own URLs to use to avoid the archive link request if they want as an optimization.

In other words, this proposal says lock files help determine what files to install, but it’s entirely up to installers to figure out where to get those files. This is the simplest and most portable option, but could very likely lead to installers storing information out-of-band (e.g. in [tool]).

Option 2: HTTPS-only for url

The other option is to tighten the definition:

  1. People MAY specify url
  2. url MUST be a syntactically valid HTTPS URL
  3. Installers SHOULD provide a way for users to specify where to look for files on the local system
  4. Installers SHOULD use url if the file cannot be found locally on the system
  5. Installers SHOULD provide a way to specify index servers implementing the simple repository API
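A rough sketch of how an installer might apply rules 2 to 4. The names `is_https_url` and `locate`, and the representation of user-specified local locations as a list of directories, are assumptions for illustration.

```python
import os
from typing import List, Optional
from urllib.parse import urlsplit


def is_https_url(url: str) -> bool:
    """Syntactic check only: https scheme and a non-empty host part."""
    parts = urlsplit(url)
    return parts.scheme == "https" and bool(parts.netloc)


def locate(filename: str, url: Optional[str], local_dirs: List[str]) -> Optional[str]:
    """Prefer a local copy; fall back to the recorded HTTPS URL, if any."""
    if url is not None and not is_https_url(url):
        raise ValueError(f"url must be an HTTPS URL, got {url!r}")
    for directory in local_dirs:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    # None here means the installer has to search its configured indexes.
    return url
```

Whatever location is chosen, the file would still have to pass the hash check before being installed.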

This would mean specifying alternative ways to get a file is out-of-scope and up to the installer. I would expect most installers would let people specify directories, look in pip’s cache, etc. They could also specify alternative ways like IPFS or whatever out-of-band (e.g. [tool]).

In other words this makes url suggest where to find a file via HTTPS as that’s assumed to be the common case, but otherwise it’s up to you and the installer to work it out. This is probably the most portable while still offering a potential installation performance benefit for the (seemingly) common case.

Option 3: Require HTTPS, but other values allowed

I believe @pf_moore is suggesting (with * denoting what’s different from option 2):

  1. People MAY specify url
  2. url SHOULD be a syntactically valid HTTPS URL *
  3. Installers MUST provide a way for users to specify where to look for files *
  4. Installers MUST at least support HTTPS URLs

In other words this proposal is the most flexible by allowing anything in url, but it still sets the floor of HTTPS. This could lead to the potential of less-portable lock files if people (ab)use the flexibility.

Option 4: you tell us!

It’s possible someone will have a good suggestion we have not thought of. Please either leave a comment or show your support for a follow-up comment that you prefer over the options outlined above.

The poll

So, which one? To help speed this along over the holidays I am going to preemptively create a poll.

  • Option 1: no url
  • Option 2: url is HTTPS-only
  • Option 3: url can be at least HTTPS, but allowed to be anything
  • Option 4: Either leave a comment or you’re voting for a follow-up comment

That’s not what I was suggesting. My suggestion was much closer to option 2 - I want the url field to be tightly specified, but installers to have runtime options (not part of the lock file) to specify the search order for packages which don’t have a url field.

I’m voting option 4 in the poll, not because I have a different suggestion, but because, as PEP delegate I don’t think I should vote, but I can’t see the poll results without voting. So please disregard my vote…

Also, I’d much prefer it if people added comments on why they voted the way they did, rather than just voting. I’d like this to be a discussion rather than just a numbers counting exercise.

Sorry about that, it was just my interpretation of the following:

If what I wrote isn’t what you intended then I’m not sure how to interpret that sentence.

As for how I voted, I went with option 2 in the poll. It seems the most pragmatic, in that it doesn’t potentially back the url field into a weird corner that doesn’t require a version bump on the file format.

I’ll also mention that I’m about to go on my annual open source detox, so I very likely won’t be responding to comments past later today until January 4th. I will obviously read and reply to anything when I get back, although I’m also sure @pradyunsg and @uranusjr can reply as well in my absence if they happen to be available.

A massive thanks goes out to @frostming, who has written a proof-of-concept locker: frostming/pep665_poc on GitHub (“A POC implementation of PEP 665”)!

I have gone ahead and updated the PEP to link to it.

It’s not worth worrying about. I’m happy enough with your proposed options (and it’s your PEP not mine, anyway!)

I actually prefer requiring url to be https, I just didn’t think you would be OK with that (as you seemed fairly insistent on leaving the details of what urls were valid as tool-defined, previously).

I didn’t intend to say that installers must be user configurable. My example of an installer that only looks on PyPI and isn’t configurable was deliberate, but I messed up the wording in that point, and made it say that users had to be able to configure the installer. My apologies, that was my error. In reality, I expect most installers will be configurable (pip certainly is).

This bothers me, as it implies that the expectation is that lockers will fill in the URL most of the time. I’d personally assumed that the URL would only be used when an index search wasn’t possible (the direct URL case). Always having the URL won’t interact well with environments where packages should be installed from a curated local repository - a lockfile created in that environment will put internal URLs in the lockfile. And conversely, a lockfile created externally will have PyPI URLs in the lockfile.

In the first case, using the lockfile externally will result in a URL lookup failure followed by an index search for every package. And in the second case, using the lockfile internally will either have the same fail and lookup behaviour (if PyPI is blocked) or will fetch packages from PyPI (if it’s not).

There’s no security issue here (hashes avoid that) but fetching from PyPI might break policy, and in any case the failed download will incur a (possibly high, depending on things like timeouts) performance penalty.

I don’t intend to pursue this point, as I don’t work in such an environment, but someone should explore this. Otherwise I can see people raising issues with pip insisting that pip needs to “fix” this, because the behaviour is broken. At the very least, I’d like to have a paragraph in the PEP I can point to in that situation, to say that it’s not pip’s fault and the standard needs fixing if it’s a problem.

Are you saying you view url only for the direct case?

Right, but what’s wrong with that? As long as that lock file doesn’t leave the control of the environment then there shouldn’t be an issue. This implies the lock file is being shipped with closed-source code, so the potential of the lock file getting out with private URLs is going to be the least worrisome issue if the entire code base leaked.

In my work environment I don’t think having an internal URL for internal code is problematic.

I don’t see why installers can’t be instructed to prefer hitting indexes instead of URLs and making the URL the fallback.

But this actually sounds more like a locker policy/option to not insert URLs unless absolutely necessary. And so I would assume it would be controlled at that point and not at the installer level.

I assume you mean by having someone speak up? Otherwise I’m not quite sure how to explore this specifically beyond asking Twitter for feedback from a wider audience.

Hello! I have implemented something that looks a lot like this format in a proposed pip resolve command (pypa/pip Pull Request #10748, “metadata-only resolve with new `pip resolve` command!”, by cosmicexplorer). @pf_moore noted that the JSON output I was hacking together felt a lot like this PEP. I’m getting more familiar with the specification at the moment, but I will almost certainly modify the output of the pip resolve command to produce this file format instead. It also may not be called pip resolve; pip install --dry-run --report was proposed as an alternative, which I think I like more.

I wanted to announce that the PEP has been rejected (@pf_moore messaged me privately about it). The reason for the rejection is that the lack of sdist support was seemingly enough to cause a lukewarm reception to the PEP overall. I am now going to go think about what an opinionated lock file from me would look like and consider implementing it as part of my mousebender project.

Thanks to everyone who provided feedback! Thanks to @frostming @sdispater for providing early feedback. Extra thanks to @frostming for writing a proof-of-concept locker. And a special thanks to my co-authors, @pradyunsg and @uranusjr .

My thanks as well to everyone for their participation in the discussion here.

As Brett said, we had a discussion offline as to how to proceed with the PEP, and the conclusion was that it should be withdrawn/rejected.

This discussion has persuaded me that we definitely need a lockfile format that’s better than the current “pinned requirements file” approach[1], but I think we need a better understanding as a community of what we actually want in terms of functionality. The question of how and if we handle sdist support is the biggest issue, but a clearer understanding of use cases in general (many of which are “hidden” behind closed-source corporate environments) is also needed.

Thanks to the PEP authors, @brettcannon @uranusjr and @pradyunsg for their work on this, and I’m sorry we didn’t get the result you hoped for.

[1] Recent examples of supply chain issues with npm and similar demonstrate that locking is becoming far more critical.

Hi, thanks for the update. What I’m not sure about is where to go next. If someone were to pick this up, what would they do to get the ball rolling? You alluded to the sdist problem, but what about the rest? (I’m not sure how much of an overview of corporate environments we can get here, from the community’s POV.)

PS. I’m personally sad that the PEP failed due to not supporting sdist. I was hoping we were taking an iterative approach as was the case with PEP-517 and then PEP-660. Start with wheel support, and let a future PEP extend/handle sdists. If this path is not feasible perhaps the rejection motivation should clearly state why not, so a future iteration can address it.

Honestly, I think the key thing is for any new PEP to start from a very clear and well-defined list of scenarios that it is designed to support/address. This was where I felt PEP 665 fell down: it focused on reproducibility, rather than on specific use cases. And as a result, it was extremely hard to determine whether it intended to handle particular scenarios.

For example, I never really got a clear answer on whether the PEP was intended to support deployment of an app to a service like Heroku. In a broad sense, “obviously” it would, but when you get into details, suppose I had a dependency that’s not on PyPI as a wheel, how do I ship the combination of “lockfile + wheel” to Heroku in a way that works regardless of what lockfile installer Heroku chooses to use? That was the root of the “should we allow relative paths in the URL field” question, which was still open at the point where we decided to reject the PEP.

I think the reality is that someone is going to have to pick (or maybe even invent) some use cases that they believe are important, and write a PEP based on those, and be prepared to defend the PEP on the basis of those use cases. That would at least mean that readers could quickly see whether the PEP helps them in their particular situation (and then, if necessary, propose additional use cases that the PEP should add).

It wasn’t solely sdists. As I say, it was mostly a lack of clear definition of what use cases the PEP was trying to support, and “not all of our dependencies are distributed as wheels” is part of that.

The PEP was still (in my mind) perfectly viable when the scope was reduced to wheel only. In fact, on a personal note, I prefer not requiring all installers to be able to build from source. What finally tipped the scales, though, was the question of how to distribute a lockfile with a set of wheels, which is important for the use case where the locker hits some sdist-only packages, and wants to build wheels for them and provide those wheels with the lockfile. That problem is solvable in a number of ways, but we got stuck on the question of whether that is a problem the PEP even wanted to solve.

I’m still perfectly happy with that approach, as long as the initial (wheel support only) phase is clearly viable in its own right, and not simply a placeholder that no-one will use until sdist support gets added.

I’m not certain everyone saw @cosmicexplorer’s post above, but it seems clear to me that whatever comes next will surely be based on/utilize the functionality introduced in pypa/pip Pull Request #10748, “metadata-only resolve with `pip download --dry-run --report`!”, by cosmicexplorer.