PEP 665: Specifying Installation Requirements for Python Projects

You didn’t say what you were trying to prove, so I can’t evaluate whether you succeeded :slight_smile:

Formulating the properties you want the PEP 665 representation to satisfy would go a long way, I think. You can see me trying to do that above, e.g. here, but that ended up demonstrating that the PEP 665 lockfile representation doesn’t necessarily do the main thing you’d naively expect from a lockfile format, i.e. actually specify a unique set of packages.

The resolver in the installer is a trade-off, “something that just works”. I also prefer the neat solution you mentioned, but I failed to make it robust.

When you record “markers” in the package table and try to make the installer entirely “blind” to the dependency tree, the markers have to be propagated to the sub-dependencies and merged with the markers already there. And when you merge, you have to assume that both the incoming and the existing markers are arbitrarily complex, like:

incoming: platform_system == 'Darwin'
existing: sys_platform not in 'win32, linux' and os_name == 'posix'

Try combining them with ‘and’ and ‘or’: the locker has to be smart enough to deduplicate the result so that it doesn’t end up too long.
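To make the problem concrete, here is a minimal sketch of the naive approach, treating markers as opaque strings. `merge_markers` is a hypothetical helper, not something from the PEP or any existing tool, and it only drops textually identical markers; simplifying logically equivalent ones is the hard part described above.

```python
def merge_markers(incoming: str, existing: str, op: str = "and") -> str:
    """Combine two marker strings, dropping an exact textual duplicate.

    This does not simplify logically equivalent markers; a real locker
    would have to parse and normalise them first.
    """
    if op not in ("and", "or"):
        raise ValueError(f"unsupported operator: {op!r}")
    incoming, existing = incoming.strip(), existing.strip()
    if not existing:
        return incoming
    if not incoming or incoming == existing:
        return existing
    # Parenthesise both sides so operator precedence cannot change the meaning.
    return f"({existing}) {op} ({incoming})"


merged = merge_markers(
    "platform_system == 'Darwin'",
    "sys_platform not in 'win32, linux' and os_name == 'posix'",
)
print(merged)
```

Every propagation step adds another level of parentheses, which is why the result grows quickly unless the locker can reason about the markers themselves.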

To conclude, my key concern for the “blind installer” is marker merging. Yeah, the limitation might come from the existing implementations and poor resolution algorithms. I am just restating it to clarify the situation we are facing.

However, there’s nothing in the PEP that requires use of that field (if there is and I missed it, could you please point out where?), so the examples @njs and I gave are still valid, and the installer still has a problem because it has to choose which version to install.

If the point here is simply that the “dumb installer” I proposed is not powerful enough to support PEP 665, then I don’t see why it’s so hard to just say so.

Of course there’s a follow-up question, which is precisely what minimum level of capability an installer must have to support PEP 665. But I’m not asking that question yet. I just want the PEP authors to admit explicitly that the installer I proposed cannot consume the full range of PEP 665 lockfiles (or explain what I’ve missed). Then we can move on to the follow-up.

Honestly, it feels like at the moment there’s a major disconnect because either the PEP authors don’t understand the questions people are asking, or the questioners don’t understand the answers being given. That’s why I’m trying to focus on a stupid-simple case, to avoid the possibility of misunderstanding. But it doesn’t seem to be working :frowning:

Because it’s an Open Question :disappointed: I feel it’s more that we have a disconnect about what state the PEP is in, and what kind of feedback the discussion is trying to field. To me, this is currently a draft with multiple open questions, so the current goal should be to gather feedback so the authors can work out what a PEP 665 lock file is supposed to be. Most of the feedback we’ve got so far is along the lines of “PEP 665 lock file does not work because X”; yes, that does not work, but the statement needs to be more accurately qualified because “a PEP 665 lock file” is still not fully defined due to the Open Questions.

To put it another way, I feel the conversation is going like this:

Feedback: Does PEP 665 lock file not handle my situation here?
Me: Do you think this Open Question may be the solution?
Feedback: But that’s an Open Question. So does PEP 665 lock file handle my situation or not?
Me: (I honestly don’t know how to continue)

Ah, OK. I get what you’re saying now. If the question hasn’t been resolved yet, then that’s a fair answer (for now) to my question. I do have some follow-ups, but I need to think some more about how to word them clearly.

OK, so thinking further about the “installers need a resolver” question, I tried to tackle it from the other direction.

As far as I can see, there’s nothing stopping someone from creating a “lockfile” that, for a set of packages, contains an entry for every version on PyPI, plus for each package/version a “needs” entry that’s simply a copy of the dependency data from the package metadata¹. So no actual locking, just copying existing data.

Clearly that’s not something we’d expect anyone to do in practice, but this isn’t intended to be a use case, it’s intended to demonstrate a point :slight_smile:

Given that an installer can be presented with a lockfile like this, I don’t see any way that we can avoid requiring installers to either (a) implement a full backtracking/SAT/whatever dependency resolver, or (b) error out if a lockfile is “too hard” (in some unspecified, implementation dependent, sense). And I’m not even convinced that “detecting a set of requirements needs backtracking” is something that can be determined any more easily than by actually doing the resolve - so option (b) essentially means that there’s no way to determine in advance if any given lockfile is going to be usable.
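As a rough illustration of why, here is a sketch of the smallest resolver that can cope with such a file. The `PACKAGES` table is a made-up in-memory stand-in for the package entries of a lock file, not the PEP’s actual schema, and versions are compared as plain strings purely to keep the example short.

```python
# Hypothetical data: package -> version -> {dependency: allowed versions}.
PACKAGES = {
    "app": {"1.0": {"a": {"1", "2"}, "b": {"1", "2"}}},
    "a": {"2": {"c": {"2"}}, "1": {"c": {"1"}}},
    "b": {"2": {"c": {"1"}}, "1": {"c": {"1"}}},
    "c": {"2": {}, "1": {}},
}


def resolve(requirements, chosen=None):
    """Return {name: version} satisfying every requirement, or None.

    `requirements` is a list of (name, allowed_versions) pairs.
    """
    chosen = dict(chosen or {})
    if not requirements:
        return chosen
    (name, allowed), rest = requirements[0], requirements[1:]
    if name in chosen:
        # Already pinned: the existing pin must satisfy this requirement too.
        return resolve(rest, chosen) if chosen[name] in allowed else None
    # Try the newest version first (string sort, good enough for this sketch).
    for version in sorted(PACKAGES[name], reverse=True):
        if version not in allowed:
            continue
        needs = list(PACKAGES[name][version].items())
        result = resolve(rest + needs, {**chosen, name: version})
        if result is not None:
            return result
        # Otherwise backtrack and try the next candidate version.
    return None


print(resolve([("app", {"1.0"})]))
```

Picking the newest version of everything fails here, because the newest `a` and the newest `b` disagree about `c`; the only way to find that out is to try it and back up, which is exactly the capability a “dumb” installer lacks.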

Let me ask a related question. If pip (for whatever reason) were to refuse to add support for PEP 665 lockfiles, would the PEP be viable? What would be involved in someone writing a “lockfile installer that isn’t pip” which would be usable in real world scenarios? I feel like in contrast to the “build backend” situation, where we’ve worked hard to put all build backends on an equal footing, we’re going in the opposite direction with installers, making it harder and harder for anyone to implement alternative frontends to compete with pip.

This is why I feel that it’s important to consider whether a minimal installer is viable. If a lockfile is suitable as a communication medium between a standalone “dependency resolver” and such a minimal installer, that’s potentially a practical approach to breaking up the monolithic pip architecture.

¹ With the exception that (as far as I can tell) extras aren’t allowed, so at least one pain point is excluded :wink:

Marker merging sounds like a convenience to keep the lock file small. os_name == "nt" and os_name == "nt" is still a valid marker, albeit less readable. One benefit of merging, however, is that you could catch impossible markers in the locker, e.g. os_name == "nt" and os_name == "posix".
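A sketch of that check, restricted on purpose to the tiny subset used in the examples above (`name == "value"` clauses joined by `and`). Both function names are hypothetical, and a real locker would need a proper marker parser rather than a regular expression.

```python
import re

# One clause of the form: name == "value" (single or double quotes).
_CLAUSE = re.compile(r"""^\s*(\w+)\s*==\s*(["'])(.*?)\2\s*$""")


def simplify_conjunction(marker):
    """Deduplicate equality clauses joined by `and`.

    Raises ValueError if one variable must equal two different values,
    and NotImplementedError for anything outside this small subset.
    """
    required = {}
    for clause in marker.split(" and "):
        match = _CLAUSE.match(clause)
        if match is None:
            raise NotImplementedError(f"clause not supported by this sketch: {clause!r}")
        name, _quote, value = match.groups()
        if required.setdefault(name, value) != value:
            raise ValueError(f"impossible marker: {name} == {required[name]!r} and {value!r}")
    return " and ".join(f'{name} == "{value}"' for name, value in required.items())


def is_impossible(marker):
    """Return True if the conjunction can never be satisfied."""
    try:
        simplify_conjunction(marker)
    except ValueError:
        return True
    return False


print(simplify_conjunction('os_name == "nt" and os_name == "nt"'))
print(is_impossible('os_name == "nt" and os_name == "posix"'))
```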

Looks good!

This is the least of my concerns based on what people are currently objecting to, so I’m not going to start debating this specific point until much later.

:+1:

Correct.

That depends on whether people are ready for the idea of having an installer other than pip in the world.

Implement the example installer flow.


I want to add some colour to the rejected idea entry on having a flat list of versions as this seems to be the contentious thing right now. To keep this grounded, I am going to start from the pip freeze use-case and how we got to where we are with the PEP, but you can also think of it as the pip-tools scenario as it leads to the same output.

Initially the plan was to do the traditional “lock file” thing and only list a flat list of package versions and installers would literally iterate through the list and install what was there. Nice and simple, just like pip freeze and I think what everyone expected when they walked into this topic/PEP.

But then Python isn’t “nice and simple” when it comes to packaging. :wink: What do you do for multiple OSs (since we all know plenty of people need/want to support more than one OS)? pip freeze is inherently tied to the exact setup you had when you ran the command, so it isn’t portable. That’s when the pyproject-lock.d idea came in along with tags and marker; create separate lock files for each setup that record the assumptions that were made when generating the files and you’re set (think the “I develop on macOS but deploy to Linux” scenario).

But then what about environment markers you didn’t consider when you generated your lock files? There are 12 different environment markers, so the possible combinations are very large (and that’s assuming you know all the potential values for each marker ahead of time to even generate an exhaustive list of combinations). Do you guess ahead of time what potential environment marker values you want to support and then let people generate new lock files for their unique set later which will quite possibly deviate from the other lock files unless they too are regenerated to match the newest versions of packages? If you’re developing a cross-platform, multi-Python-version package you end up wanting this sort of thing.

And this is how we ended up where we are now: listing all relevant information needed to “resolve” which package and version listed in the “lock file” to support any potential environment marker combination (and those words are in quotes as I am not going to try and officially define them as I think people’s view of them differ and that’s part of the confusion). @frostming/PDM and @sdispater/Poetry seemingly ended up at the same conclusion ahead of us and thus why their feedback was to change the PEP to be what it is today where you don’t make so many assumptions in what setup you lock for. And so we came up with a format that can be flexible enough to cover the vast amount of possibilities of what a machine may need to know to install from a lock file, but which could be restricted enough to give people the simple pip freeze experience (you can always leave out details if you’re aiming for a very restricted lock file; remember, marker and tags exist for a reason).

So this is how we ended up with this “lock file” PEP. I realize this is bumping up against some typical views of lock files and wanting simple installers, but there’s logic to this “madness” as to how we ended up with this version of the PEP.

So my question to the pip freeze folks who are objecting to the current state of the PEP is how do you want to address your desires for a simple installer and lock file with those of the PDM/Poetry folks where that doesn’t meet their needs, but you’re somewhat a subset? Do you want more permission to have installers error out if they can’t/won’t resolve things (which is in the PEP already but we can make it more pronounced)? Do you want a key in the lock file to say “if metadata.marker and/or metadata.tags works, just install everything blindly”?

I had been trying to stay out of this, but a few people have recently come and asked about it, and that’s caused me to start diving into this more.

I think there are a few problems here, as I see it after reading the PEP and the discussion here, so I’m going to call them out.

Issues

1. PEP 665 does not standardize “Lock” Files

The body of the PEP regularly refers to lock files, but I don’t actually think that this PEP standardizes a lock file. It standardizes a format that, if the emitter is careful, could represent a lock file, but it could also be something completely different. As @pf_moore mentioned, it’s perfectly legal within this PEP to simply copy the entirety of PyPI into the packages data, and that certainly doesn’t fit with anybody’s definition of what a “lock” file is. I also think that this adds to the confusion around whether or not an installer consuming one of these files needs to have a dependency solver.

I think you’re getting a weird amalgam of features and mixed messaging because the terminology doesn’t match what you’re actually specifying, and some of the features don’t make sense in that context.

If we take a step back, I think what we actually want here is a successor to requirements.txt, which can actually slot in and be usable for a number of use cases, one of which is as a lock file format, but it’s also a much more general format overall.

My suggestion would be to rewrite this to remove most or all of the references to lock files (except as an example of where you might use this) and call this a file to specify the creation/dependencies of a python environment. You might use this in a Python project, you might use it completely independent of that, but ultimately what you’re describing is how to create an environment that you need. That environment might specify wide open version specifiers, or it might specify locked down specifiers, but either way you’re defining what needs to be installed into an environment.

2. Do not require no network installation support

I think that it’s a fine goal to support no network installations that minimize the risk of a repository changing causing breakages. However I think that it is a mistake to mandate this. There are a lot of cases where you don’t care or don’t want that, but you still want to create an environment (this also feeds back into the first item).

Thus I would make the listing of items in the package field optional, and also add a field for listing sources that specific packages could be searched for in.

If you do that, you could maybe specify that if a particular dependency is in the package array you need to use the data in that, but I would say that it should rather be strongly recommended, rather than required to enable some possible interesting edge cases.

3. No network installation support should have untouched data to resolve with.

This could also be roughly renamed to package.<name>.needs is the wrong abstraction.

I think a fair amount of confusion comes into place with the treatment of this field. It’s been suggested that this should be unmodified from the package itself, it’s also been suggested that the tools that emit this file should also munge this data to turn it into == dependencies (or maybe something in between?).

I think that to support the full range of things, you’re going to, at a minimum, want to reproduce the exact dependency information from the artifact itself. You might also want to specify additional constraints, generated by the tool that emits this, but I personally think that should be in its own section somewhere. Maybe under metadata, though I think this is a separate concern from metadata.needs, which to me represents the human intent, so maybe metadata.constraints, which are intended to further constrain any resolution that occurs, without causing something to be installed otherwise.

I think this also helps with a use case that the PEP currently fails on. It suggests implementing locking largely by limiting the files that it lists in packages, but that makes a fairly large assumption that the URLs from the repository are static, which is not something that any of our PEPs require of a repository (and I am aware of some internal PyPIs at some companies that use file URLs on their index pages which have temporary tokens to enable authentication embedded in them). Moving locking to a “constraints” field means that you divorce locking and no network installs, which means situations like that still continue to work.

4. Ignoring the Inputs / Human Write-able side means you’re not actually solving for dependabot et al

As @njs mentioned, tools like Dependabot don’t just care about the outputs of a lock, but also on the inputs. I think you should either remove them from justification OR you should say that this file is also human writeable, and define a naming convention for the locking process.

For instance, you could imagine the specification saying that the package field is not intended for human writing, but that the rest of the fields are. Then you could have a work flow where you have rtd.INPUTPLACEHOLDER.toml that is human editable, and generally will only contain metadata.needs, but maybe metadata.marker or metadata.tags or metadata.constraints, and we define that *.INPUTPLACEHOLDER.toml is the input to something like *.OUTPUTPLACEHOLDER.toml.

The biggest problem I see with this, which probably makes it a non-starter IMO, is that focusing only on the output lets you simplify the features you have to support. If you also focus on the input, you have to start defining how a particular input gets compiled into an output. Then you either lose the ability to differentiate between tools, because all features have to be “baked” into the spec (e.g. if I want to include multiple files to mix and match things, the spec would have to support that), or you end up where using the wrong tool gives you broken output, possibly subtly so. Imagine the OpenStack case where I have some default constraints that are externally managed, so I want some form of an include: if we let that feature be tool specific and you used the wrong tool, you’d just get no user supplied constraints.

The other problem is that if you don’t deal with inputs, you basically can’t modify this file at all in any sort of automatic or agnostic fashion; you can only consume the information it presents. For instance, say VS Code grows support for this hypothetical file and we don’t define the inputs. A reasonable feature VS Code might want to add is the ability to bump the version of something (either in constraints or in needs). But without modifying the input to this file, it’s likely that the next user who runs the tool that originally emitted the file, which doesn’t know about this change, ends up blowing away the change VS Code (or Dependabot, or whatever) made.

So we’re going to have to decide a trade off between all the problems that come along with dictating the input to this file OR stating that anything that needs to modify this file in a tool agnostic way is simply not supported, and you can only read this file.

Whatever trade off is decided on, the PEP should be updated to remove the justifications that don’t actually work (e.g. if it’s decided to only allow reading, then mentioning VS Code generating lock files needs to be removed).

5. Needlessly ties implementation to pyproject.toml, and a particular directory structure.

There are lots of reasons why you might want to create a Python environment (locked or not), and this PEP makes the assumption (at least in the Rationale) that the input is going to be a pyproject.toml, but that feels wholly wrong to me. Environments might coexist with a Python project, but they might also not have anything to do with a Python project (e.g. a static blog made using Hugo, which is written in Go, using Fabric to upload to a remote server. I want to have a Python environment, but I do not have a Python project).

I suggest removing any references to pyproject.toml, and I would go one step further and suggest that we should discourage using pyproject.toml as the input file completely. It creates the same kind of confusion that people have with setup.py and requirements.txt. Defining an environment and defining a project are two different tasks, and should not share an input/source format.

This would also mean that the pyproject-lock.d directory needs to change, and honestly I would just get rid of this concept completely. It feels completely unneeded, and largely like specifying something for the sake of specifying it (plus its relation with pyproject.toml, which as I said is wrong IMO). To enable discovery I would just define an extension, which is the most common way to handle file discovery, and to enable out of the box syntax highlighting I would make it a two part extension, like .lock.toml or .env.toml or something.
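For what it’s worth, discovery by extension needs very little from a tool. A sketch, assuming the `.lock.toml` suffix floated above (nothing has been agreed, and `find_lock_files` is a hypothetical name):

```python
from pathlib import Path


def find_lock_files(root, suffix=".lock.toml"):
    """Return every file under `root` whose name ends with `suffix`, sorted."""
    return sorted(p for p in Path(root).rglob(f"*{suffix}") if p.is_file())


for path in find_lock_files("."):
    print(path)
```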

6. Level of abstraction for package table is wrong

This PEP makes the assumption that different artifacts of the same version will have the same metadata. This is an invalid assumption with Python’s metadata as it stands today. This data needs to be broken down per file or it is fundamentally incompatible with the entire vast bulk of software out there. As @dustin mentioned, PyPI made this mistake and it’s been a todo list item for a long time to finish it. PyPI mostly gets away with it because that data that is wrong isn’t being used anywhere “important” (it’s the JSON API and the Web UI, neither of which get consumed by installers), but I suspect it would be a much larger problem if we were feeding it into the resolution algo.

You can argue that those people are “doing it wrong”, but the fact of the matter is that it’s a pretty simple structural change to fix it (and afaict most installers already treat this data on a per file basis anyway, so they’d just end up synthesizing it) and reduce a lot of potential frustration.

In the PEP you mention:

Luckily, specifying different dependencies in this way is very rare and frowned upon and so it was deemed not worth supporting.

However you don’t provide any data to back up that claim. I would guess that it is not rare, given that was traditionally the correct way of doing so, and in most cases it never stopped working for people. Most people, in my experience, don’t regularly go around updating their packaging until something breaks so I suspect that there are a lot of projects out there doing just that, simply because that used to be the way to do it and it never broke for them before. Just randomly picking names from the top 100 downloaded projects from PyPI it took me 5 tries to find one that does it (psutil, which actually uses a conditional on PyPy or not to add additional extras which themselves use marker syntax). I’m sure if I spent more than 5 minutes looking I would find more.

Overall, it seems like a bad hill to die on to me. I tend to view the entire ecosystem as having a limited “breakage budget”, and this doesn’t seem like something worth spending against that budget for.

7. Some data for resolution is missing

The list of files doesn’t need to contain python-requires, but it needs to, it’s a layer of data that needs to be considered during resolution. This feeds back into 6 above (and on PyPI this data is properly file specific).

8. Hashes only supports one kind of hash

This is somewhat nitpicky, but it would be really nice if hashes was a table instead of two individual keys. That will make possible future migrations to new hashes much easier as we can just include a new key in the table alongside the old key.
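A sketch of how an installer might consume such a table, shown here as a plain dict. The preference order and the function name are my own assumptions, not anything from the PEP; only `hashlib` is real.

```python
import hashlib

# Hypothetical preference order, strongest first.
PREFERRED = ("sha512", "sha256")


def verify(data, hashes):
    """Check `data` against the most preferred algorithm the table offers.

    Returns the algorithm used. Raises ValueError on a mismatch or when
    none of the listed algorithms is one this installer supports.
    """
    for algorithm in PREFERRED:
        if algorithm in hashes:
            actual = hashlib.new(algorithm, data).hexdigest()
            if actual != hashes[algorithm]:
                raise ValueError(f"{algorithm} mismatch")
            return algorithm
    raise ValueError("no supported hash algorithm listed")


payload = b"example wheel contents"
table = {"sha256": hashlib.sha256(payload).hexdigest()}
print(verify(payload, table))
```

Adding a newer algorithm later is then just another key in the table plus another entry in `PREFERRED`; older installers keep using the key they know.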

9. Items installed through this should not be direct URLs unless they were, in fact, actually direct URLs.

This PEP currently says that anything installed here should be marked as a direct URL, but that feels wrong to me. Just because you’ve precomputed some parts of the resolution, doesn’t mean that those files were not originally from a particular repository and they’re now direct URLs.

In my opinion, only things which were originally specified as a direct URL, should be marked as a direct URL.

10. More explicitly state that it’s ok for installers to support a subset of features available here

The PEP alludes to this by saying:

Installers MUST error out if they encounter something they are unable to handle (e.g. lack of environment marker support).

But I think that it would be better if it was explicitly called out that installers are free to support a limited subset of features here to enable installers that can enforce certain constraints (e.g. an installer that does no “real” resolution, and anytime it traverses the dependency tree it just blindly accepts whatever is listed in constraints or errors if an item isn’t explicitly listed in constraints or constraints contains a non exact pin).
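Roughly what I have in mind for that kind of installer, as a sketch. The `dependencies` and `constraints` mappings are hypothetical stand-ins for whatever the file ends up containing, and the pin pattern is deliberately simplistic.

```python
import re

# Matches an exact pin such as "==3.2"; wildcards like "==3.*" do not match.
_EXACT_PIN = re.compile(r"^==\s*([A-Za-z0-9._+!-]+)$")


def plan_install(roots, dependencies, constraints):
    """Walk the dependency tree and return {name: pinned_version}.

    `dependencies` maps a name to the names it depends on; `constraints`
    maps a name to a specifier string. No versions are ever compared or
    chosen: a missing entry or a non-exact pin is an error.
    """
    plan = {}
    pending = list(roots)
    while pending:
        name = pending.pop()
        if name in plan:
            continue
        if name not in constraints:
            raise LookupError(f"{name} is not listed in constraints")
        match = _EXACT_PIN.match(constraints[name].strip())
        if match is None:
            raise ValueError(f"{name} is not pinned exactly: {constraints[name]!r}")
        plan[name] = match.group(1)
        pending.extend(dependencies.get(name, ()))
    return plan


print(plan_install(
    ["requests"],
    {"requests": ["idna"]},
    {"requests": "==2.26.0", "idna": "==3.2"},
))
```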

11. Versioning is too strict and/or an integer doesn’t contain enough information

A monotonically increasing integer for version means that every change has to either be considered backwards incompatible OR every change needs to be considered backwards compatible OR you only rev the version on backwards incompatible changes and do nothing for backwards compatible changes.

My experience with packaging suggests that backwards compatible changes are far more likely and common, but that backwards incompatible changes are not unheard of. Thus it is extremely useful to be able to have a signal for both. For instance, adding a new key to the file is a very likely future update: if we don’t solve 8 above now, we would possibly need to add support for what I suggested in 8 in the future. We could do that in a backwards compatible way easily, but the versioning scheme in use here doesn’t afford that capability unless we just add it and ignore it.

Generally I would recommend not exactly semver, but a two part version, major.minor only. Tooling should error out if they get a major version they do not understand, but they should only generate a warning if they get a minor version they do not understand.
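In code, the rule I’m suggesting is about this simple. A sketch only: `SUPPORTED` is a made-up value for an imaginary installer, and the function name is hypothetical.

```python
import warnings

# Hypothetical: the newest format version this imaginary installer understands.
SUPPORTED = (1, 2)


def check_format_version(version):
    """Return True if fully understood, False after warning on a newer minor.

    Raises ValueError on a major version this installer does not know.
    """
    major_text, _, minor_text = version.partition(".")
    major, minor = int(major_text), int(minor_text)
    if major != SUPPORTED[0]:
        raise ValueError(f"unsupported major version: {version}")
    if minor > SUPPORTED[1]:
        warnings.warn(f"minor version {minor} is newer than {SUPPORTED[1]}; unknown keys may be ignored")
        return False
    return True


print(check_format_version("1.0"))
```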

12. Needlessly breaking out the triplet on wheels?

Under the “code” tag, is there a reason for breaking out the platform triplet from the filename into dedicated keys? That just seems like you’re inviting bugs where the broken out values and the filename don’t agree since you’re putting that data in the same place twice. It doesn’t even save the installer from implementing the code to extract that information from the wheel filename since those tags are optional and the PEP mandates that the installer MUST be able to fall back to extracting from the filename itself… so it seems like it actually just complicates reading these files?
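For reference, the fallback the installer has to implement anyway is small. A sketch that pulls the three tags out of the last path component of a URL, following the wheel filename convention (name-version[-build]-python-abi-platform.whl); the function name and the example URL are made up.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def wheel_tags_from_url(url):
    """Return (python_tag, abi_tag, platform_tag) parsed from a wheel URL."""
    # Query strings and fragments (e.g. access tokens) are not part of the path.
    filename = PurePosixPath(urlsplit(url).path).name
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # name, version, optional build tag, then the three compatibility tags.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {filename!r}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


print(wheel_tags_from_url(
    "https://example.invalid/files/mousebender-1.0.0-py3-none-any.whl?token=abc"
))
```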

13. Bikesheds

I’m not a fan of the new terminology of “needs”. Like others have said, we already have the “requires” terminology, and changing it seems like churn for no reason.

I’m also not a fan of “code”, it should probably be file or artifact or distributions. Code is ambiguous in that some could take it to mean the repository the code lives in (e.g. why would you use “code” for a compiled C extension) and not all artifacts contain any code at all.

Summary

Overall I think there is something here that could be a viable replacement for at least part of requirements.txt. As it stands, though, it feels like it’s sitting in a really weird place: it is trying to be a lock file, but some tools implement “lock files” that aren’t actually lock files (and I have serious doubts that those tools are actually producing correct multi platform lock files, but that’s neither here nor there), so you started adding additional features. You’ve ended up with a weird frankenstein that is neither a traditional lock file nor a particularly good replacement for requirements.txt.

I think if you make the changes I outlined above, but more specifically 1-5, you’ll end up with a much more consistent, and flexible result that can both be used for generated lock files AND for other more interesting use cases.

I also think this flexibility more accurately reflects the intent of “common basis, but not a ceiling for functionality”, as breaking apart some of these intermingled features so you can mix and match them affords tooling a lot more ability to create interesting new combinations of features.

Sorry for the wall of text!

[…]

So this is how we ended up with this “lock file” PEP. I realize
this is bumping up against some typical views of lock files and
wanting simple installers, but there’s logic to this “madness” as
to how we ended up with this version of the PEP.

So my question to the pip freeze folks who are objecting to the
current state of the PEP is how do you want to address your
desires for a simple installer and lock file with those of the
PDM/Poetry folks where that doesn’t meet their needs, but you’re
somewhat a subset?
[…]

So to restate how I interpreted the background (thank you by the way
for the summary, it does make the reasons for those choices much
clearer): what started out as a specification for something people
would generally consider a “lockfile” in other package ecosystems
gradually evolved toward having the same features as a typical
“requirements file” (like you would pass to pip install -r ...)
just with a novel syntax. Is that pretty much it? If so, why not
simply reuse the existing syntax pip already supports? If not, what
part did I misunderstand?

As a very rough summary.

Because that’s only roughly correct. :wink: A key thing to understand about a requirements file is that it’s very much formatted for pip; you can essentially prepend pip install to each non-continued line and get an appropriate result. But what does -r mean to any other tool? Plus I don’t want to try and define a parsing spec just for a requirements file (I speak from experience in doing this in a very coarse way; see vscode-python/pip-requirements.tmLanguage.json at main in microsoft/vscode-python on GitHub) and then also have to create the subsequent parser. The format also lacks some details that the PEP currently includes (e.g. do you know which file a hash from pip-compile --generate-hashes corresponds to?).

Not as typically defined, no. We could call them “dependency files” or something, but new terminology is always asking for bikeshedding (see “needs”). Plus I think I clarified this misunderstanding/disconnect while you were typing. :wink: Plus I don’t think trying to take over “requirements file” would have been any clearer in discussions.

So you want the requirement to stop at naming packages and versions and let an external search mechanism be allowed, much like saying mousebender==1.0.0 in a requirements file tells you what to install but not where to install from? Isn’t that a constraints file? I think your point 3 suggests making the requirements lax enough that the file can act more like a constraints file at the minimum which is what this would do.

Depends on who you ask and how important tracking this info is, hence why it’s an open issue.

I would argue having a tool hide that level of detail such as what’s in another file and not putting into the “lock” file is asking for trouble.

… in a generic fashion, yes. That’s what PEP 650 (Specifying Installer Requirements for Python Projects) tried to solve and people never seemed to really get on board with it.

And when you say “you”, you mean treating the “lock” file as immutable and if you want to update it you will have to regenerate it?

I’m okay with removing the reference if it makes sense, but I’m not okay with discouraging. The hope is that eventually most dependencies will be written down à la PEP 621, so discouraging its use as input to a locker to make the “lock” file is going too far.

It’s for discoverability which I specifically need from a tooling perspective.

Sure, we can bikeshed on this, but the key point is the directory is just a different way of defining discoverability.

I will also say I don’t know if a file extension like .lock.toml or .env.toml is too generic and may clash with other communities.

Not “will”, just “most likely”.

From my understanding, PDM and Poetry don’t break it down like this and it hasn’t been a major issue. This also isn’t supported by requirements files since they can’t support per-file requirements that way while supporting other files for the same requirement. So supporting this would be a novel thing based on the current installer tools that write down what to install.

Quantitative data? No. But anecdotal evidence from tools not supporting this suggests to me it’s not a horrible assumption either (else I wouldn’t have written that sentence in the PEP :wink:).

I think you meant to leave off “need to”? But what specifically is missing from either markers or wheel tags that necessitates this? I would assume the locker would have dealt with this as appropriate and recorded any assumptions in the metadata table.

Trick with that is not making diffs horrible to read and disconnecting the hashes from the files such that auditing a diff is hard when trying to tie a hash back to the file.

But I would argue from the installers perspective they are direct URLs. I guess the question is what are direct URLs meant to record; how something was found or how something was installed?

Another instance where I think you were typing while I was as I asked about this in my post just before this one. :grinning_face_with_smiling_eyes:

I’m not specifically tied to the current approach, it just seemed simpler. Plus if you’re assuming files will get regenerated as necessary then simpler seems fine.

You’re assuming the file name will contain the wheel tags. Much as you said you have seen folks do stuff like embed temporary tokens into their simple index, there’s nothing that says we must require a URL to end in a file name that contains a wheel tag when that makes sense.

I wanted to require the field, but I got push-back.


I want easily accessible information in the lock file that allows me to check whether it uses capabilities that my installer does not implement, before I start trying to do the installation. Ideally, I want a way for the user to tell the locker what capabilities the target installer has, so that the user can be sure at the time of locking that the resulting file will work with the target installer. “Permission to error out” happens far too late - the user has already done a bunch of work only to find that the result isn’t usable.
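As a rough sketch of the kind of up-front check being asked for here: the `required-capabilities` key and the capability names below are hypothetical and appear nowhere in PEP 665. The only point is that an installer could refuse a lock file before doing any work.

```python
# Hypothetical sketch: neither the key name nor the capability names
# come from PEP 665; they are assumptions made for illustration.
def missing_capabilities(lock, supported):
    """Return the capabilities the lock file needs that this installer lacks."""
    required = set(lock.get("metadata", {}).get("required-capabilities", []))
    return required - set(supported)


lock = {
    "version": 1,
    "metadata": {"required-capabilities": ["wheel", "sdist", "resolver"]},
}

# A wheel-only installer with no resolver can error out before installing anything.
missing = missing_capabilities(lock, supported={"wheel"})
print(sorted(missing))
```

The same list could in principle be passed to the locker as a target, so that an incompatible file is never produced in the first place.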

Wait, what? Of course people are ready for the idea of having another installer. The installer project is one. In many cases, people can install wheels by unzipping, or (even though it’s unsupported) just by putting them on sys.path.

What I think you mean is are people ready for the idea of “a replacement pip”. I sure hope not, we don’t need another pip. But conversely, I really don’t want to continue to dump the burden of being “the only installer” on pip for any longer than we have to. Being able to say “lock your requirements and then you can feed the lockfile to any installer you like” means that pip can stop having to support every use case under the sun. Linux distros can write a tool that integrates with their package manager and installs from a lockfile. Conda can do the same. Heroku can have a dedicated installer that exploits specific features of their environment. All without having to try to get pip to implement support for their special requirements. And users with simple requirements can use something that has no resolver and only handles wheels. Why not? Less moving parts to go wrong.

But if a PEP 665 installer has to implement complex, fragile, and (sadly far more often than we would have liked) badly performing resolver logic, no-one’s ever going to do that, they’ll just turn PEP 665 into a dedicated pip input format, and dump all of their weird edge cases on the pip maintainers to solve.

(Hmm, another example - not an “installer” as such, but it would be extremely useful to have a debugging tool that read a lockfile and reported out precisely what would get installed. We’re getting into the question of how reproducible the environment generated by a lockfile is at this point, but even so, it’s a good example of an alternative “consumer” of lockfiles.)

If I’m reading that correctly, it’s pip’s legacy (broken!) resolver behaviour - first set of requirements encountered wins. Do we really want to suggest that is a valid way to resolve a set of requirements? At a minimum, I think we need to be very clear that we’ve eliminated all of the failure cases that made pip’s legacy resolver broken (and it’s not obvious to me that we have done that…)
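To make the failure mode concrete, here is a toy model of “first set of requirements encountered wins”. It is an assumption-laden illustration, not pip’s actual code: versions are plain integers, specifiers are inclusive `(min, max)` ranges, and the package data is made up.

```python
# Toy model only: versions are integers and specifiers are (min, max) ranges.
def first_wins(requirements):
    """Keep the first range seen per package and ignore any later ones."""
    chosen = {}
    for name, allowed in requirements:
        chosen.setdefault(name, allowed)
    # Pick the newest version allowed by the range that was kept.
    return {name: allowed[1] for name, allowed in chosen.items()}


def violations(requirements, installed):
    """List the requirements that the installed versions do not satisfy."""
    return [
        (name, allowed)
        for name, allowed in requirements
        if not allowed[0] <= installed[name] <= allowed[1]
    ]


# Hypothetical input: one dependent allows attrs 1..5, a later one needs 1..3.
reqs = [("attrs", (1, 5)), ("attrs", (1, 3))]
installed = first_wins(reqs)
print(installed, violations(reqs, installed))
```

The second requirement is silently ignored, so the resulting environment violates it even though a version satisfying both ranges exists.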

Thanks for this background. I see how you ended up with what you did. But to be honest, I think that in doing so, you’ve ended up defining something that tries to solve too many problems at once.


This is part of why I think it would make sense to focus on the environment aspect, of which creating a lock file is just one particular use case for them.

No, that’s not a constraints file. In pip terminology a constraints file is a file that will add additional constraints, but will not otherwise cause something to be installed. So if you have mousebender==1.0.0 in your constraints file, and you do pip install -c constraints.txt requests, you’ll end up with an environment without any mousebender installed. If you do pip install -c constraints.txt mousebender, you’ll get mousebender 1.0.0 (or an error if that creates an unresolvable dependency tree) no matter what other version specifiers for mousebender exist.
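The distinction can be sketched with a toy model. This is not how pip is implemented: it handles exact pins only, ignores transitive dependencies, and uses `"latest"` as a placeholder for “whatever would otherwise be chosen”.

```python
# Toy model: exact pins only, no dependency resolution.
def plan_install(requested, constraints):
    """Constraints limit versions but never add a package to the install set."""
    return {name: constraints.get(name, "latest") for name in requested}


constraints = {"mousebender": "1.0.0"}

# mousebender is constrained but not requested, so it is not installed.
print(plan_install(["requests"], constraints))

# mousebender is requested, so the constraint pins it to 1.0.0.
print(plan_install(["mousebender"], constraints))
```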

This would be replacing the typical output of pip-compile etc., which is a requirements file with the full version set “locked” to a specific version, but not otherwise mandating where it comes from. That’s actually more generally useful with how Python’s packaging is typically set up because it means I can install from PyPI when I’m at home, and from the company mirror when I’m at work without having to recompile the lock file.

Even my solution is missing the ability to specify hashes without specifying where the files come from, which is also super useful.

I personally have no real interest in a lock file that doesn’t let me continue to use mirrors as normal.

It doesn’t really matter though if you think it’s asking for trouble or not. If you don’t specify the input format and how it’s compiled into this format, then people can do things like that. If you don’t want them to do that, then you need to specify the input format (but then you’re removing the ability for tooling to experiment).

Just to be clear, in my example the “lock file” would have that information fully embedded in it, but the “locker” would have an input file that allows specifying an external file for some data (as an example feature).

So at a minimum the PEP needs to remove the idea that this PEP means that a tool like VS Code could generate lock files without implementing tooling-specific code. Unless I misunderstand, at least.

Yes, because you don’t know what the “locker” tool uses for its source of truth. You have an output that the locker produced, but you have no way to know that changes to that file will persist the next time the locker is run.

That sounds very wrong to me, and falls into the same kind of confusion that I first wrote about in setup.py vs requirements.txt · caremad. pyproject.toml is for abstract dependencies. You cannot conceptually use it as the input for a lock file (the only way you can do that is to generate an ephemeral “input” that contains one entry, the source tree that contains the pyproject.toml).

How does a pyproject.toml dependency specifier indicate that it needs to add my internal company PyPI? It can’t and never should be able to, otherwise you end up with keys in pyproject.toml that only sometimes matter, which is the exact kind of spooky action at a distance that confuses people.

I understand Poetry does this, but it is, IMO, the wrong choice to make, and we should not be perpetuating that.

If that’s the goal of this PEP, then I would be a hard -1 on it (which of course I’m not the PEP delegate so that doesn’t block it).

I don’t think it’s purely bikeshedding, I have repositories with several environments that end up being created, centralizing all of those lock files to a single directory just ends up making things way more confusing IMO. Now I have to worry about namespace collisions, unless we end up littering the directory tree with pyproject-lock.d directories.

As an example, I have a project that has two docs.txt requirements files in different directories (API docs, user docs). If I have to colocate the locked output of those into the same directory, I end up having to munge names around.

You could take the .pyc approach and put directories colocated next to those files, but that needs to be decided at least and spelled out.

I mean, I have anecdotal evidence of it causing problems outside of a resolution context within Warehouse. I would guess that, for the tooling that does this, the people who ran into problems with it just stopped using those tools. If you want I can do more digging and come up with more popular projects where it would be a problem.

Overall I don’t understand why we’re choosing this hill to die on. Nobody has suggested that there aren’t projects it will break on, just that it’s “rare”. OK, fine, let’s accept that at face value: why are we choosing to break those projects when we know in advance we might? What is the benefit to us? Slightly shorter lock files?

Yea the first need to was wrong.

Why would we assume this? This feels like a common thread in this discussion, the spec gives us the power to not have the locker deal with it. We need to either remove that power and mandate that the locker has dealt with it, or we need to provide the tools to deal with it ourselves.

This is a common pattern for projects that want to produce a universal Python 3 wheel, but don’t want to continue to support old versions of Python. py3 technically works on 3.x, including 3.0, so projects will typically include python-requires to further filter it. A marker doesn’t solve it because the marker is for the entire file, not specific files (and this spec doesn’t require the file only be for a specific environment, so the locker can’t always “handle” it in advance anyways).
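A minimal sketch of why the tag alone is not enough, assuming a hypothetical per-file `requires-python-min` value recorded next to the wheel tag. That key is an invention for this example, and Python versions are simplified to `(major, minor)` tuples.

```python
# Hypothetical per-file metadata; "requires-python-min" is an assumed key,
# not a field of the current proposal.
def usable(file, python_version):
    """A file is usable if its tag matches and the interpreter is new enough."""
    major = python_version[0]
    tag_ok = f"py{major}" in file["interpreter-tag"].split(".")
    minimum = file.get("requires-python-min", (0, 0))
    return tag_ok and python_version >= minimum


tag_only = {"interpreter-tag": "py3"}
with_minimum = {"interpreter-tag": "py3", "requires-python-min": (3, 6)}

# With only the tag, even Python 3.0 looks like a valid target.
print(usable(tag_only, (3, 0)))

# With the per-file minimum, 3.5 is rejected while 3.9 is accepted.
print(usable(with_minimum, (3, 5)), usable(with_minimum, (3, 9)))
```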

Can’t we just suggest or mandate inline tables for hashes if that’s the only problem? Designing a new format around a decision that we can already tell will cause us pain in the future, if something we’ve already had to do at least once happens again, seems like a bad call.

I personally would be very annoyed if using lock files turned all of my version references into URL references. Maybe I’m the weird one, but I can foresee that causing frustration and confusion, I think.

That’s making assumptions that the installer and locker get upgraded in lockstep, no? Otherwise all the problems I mentioned still exist.

I don’t think pip has ever supported a wheel url that didn’t end in a filename that was well formed. I’d have to test it to make sure, but I’m pretty sure this problem doesn’t exist for anyone using pip. Do other installers implement it differently?


This is sort of what I feel, except I think it either doesn’t solve enough problems at once, or it solves too many problems at once :smiley:. I think we need to either trim it down or we need to make it more general. The middle ground it’s in feels like it has too much power for “lock file” case, but not enough power for the other cases.


You can be if you want to be :slightly_smiling_face:


One quick note. Both PEP 503 and PEP 508 direct URL currently only allow exposing one hash algorithm/value pair per URL, so even if we make this a table, it would only contain one single element unless the lock file is built from non-standard sources. I’m OK changing that in a new revision of the file format in the future, but it is unnecessary to do it at this time IMO.


PEP 503 only supports a single hash because it wasn’t really designed and it developed organically, or rather the simple API did, and PEP 503 attempted to just document the status quo rather than make any drastic changes. When we did the migration from MD5 to SHA256, we did it by making any client that couldn’t understand the SHA256 hash simply not use any hash at all. That was “OK” at the time since, if my memory serves, all of those clients weren’t using TLS at all anyways.

If we were designing it from scratch, it would be silly not to include a mechanism for gracefully migrating hashes (and certainly any sort of new repository API would have that, and if we ever do have to migrate hashes again, we will shoehorn that into PEP 503).

Here’s the example from the PEP, rewritten to use what I would propose.

version = 1

[tool]
# Tool-specific table ala PEP 518's `[tool]` table.

[metadata]
marker = "python_version>='3.6'"

needs = ["mousebender"]

[[package.attrs]]
version = "21.2.0"
needed-by = ["mousebender"]

[[package.attrs.code]]
type = "wheel"
url = "https://files.pythonhosted.org/packages/20/a9/ba6f1cd1a1517ff022b35acd6a7e4246371dfab08b8e42b829b6d07913cc/attrs-21.2.0-py2.py3-none-any.whl"
hashes = {sha256 = "149e90d6d8ac20db7a955ad60cf0e6881a3f20d37096140088356da6c716b0b1"}


[[package.mousebender]]
version = "2.0.0"
needs = ["attrs>=19.3", "packaging>=20.3"]

[[package.mousebender.code]]
type = "sdist"
url = "https://files.pythonhosted.org/packages/35/bc/db77f8ca1ccf85f5c3324e4f62fc74bf6f6c098da11d7c30ef6d0f43e859/mousebender-2.0.0.tar.gz"
hashes = {sha256 = "c5953026378e5dcc7090596dfcbf73aa5a9786842357273b1df974ebd79bd760"}



[[package.mousebender.code]]
type = "wheel"
url = "https://files.pythonhosted.org/packages/f4/b3/f6fdbff6395e9b77b5619160180489410fb2f42f41272994353e7ecf5bdf/mousebender-2.0.0-py3-none-any.whl"
hashes = {sha256 = "a6f9adfbd17bfb0e6bb5de9a27083e01dfb86ed9c3861e04143d9fd6db373f7c"}

[[package.packaging]]
version = "20.9"
needs = ["pyparsing>=2.0.2"]
needed-by = ["mousebender"]

[[package.packaging.code]]
type = "git"
url = "https://github.com/pypa/packaging.git"
commit = "53fd698b1620aca027324001bf53c8ffda0c17d1"

[[package.pyparsing]]
version = "2.4.7"
needed-by = ["packaging"]

[[package.pyparsing.code]]
type = "wheel"
url = "https://files.pythonhosted.org/packages/8a/bb/488841f56197b13700afd5658fc279a2025a39e22449b7cf29864669b15d/pyparsing-2.4.7-py2.py3-none-any.whl"
hashes = {sha256 = "ef9d7589ef3c200abe66653d3f1ab1033c3c419ae9b9bdb1240a85b024efc88b"}
interpreter-tag = "py2.py3"
abi-tag = "none"
platform-tag = "any"

It seems the PEP is taking the stance that we’re not going to need to do that ever. So if we never need to use this functionality, it is almost exactly equal to the original example in terms of readability. The only real difference is it moves everything onto one line and adds a set of curly braces.
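For what a graceful migration could look like on the consuming side, here is a minimal sketch of an installer checking a `hashes` table. The preference order and the `verify` helper are assumptions for illustration, not something the PEP specifies, and the payload is a stand-in for a downloaded file.

```python
import hashlib

# Assumed preference order, strongest first; not taken from the PEP.
PREFERRED = ["sha512", "sha256"]


def verify(data, hashes):
    """Check data against the strongest listed algorithm this client supports."""
    for algorithm in PREFERRED:
        if algorithm in hashes:
            digest = hashlib.new(algorithm, data).hexdigest()
            if digest != hashes[algorithm]:
                raise ValueError(f"{algorithm} mismatch")
            return algorithm
    raise ValueError("no supported hash algorithm listed")


payload = b"example wheel bytes"
hashes = {
    "sha256": hashlib.sha256(payload).hexdigest(),
    "sha512": hashlib.sha512(payload).hexdigest(),
}

# A lock file listing both algorithms lets a newer client use the stronger one.
print(verify(payload, hashes))

# A lock file that only lists sha256 still verifies with the same code.
print(verify(payload, {"sha256": hashes["sha256"]}))
```

With a single `hash` value per file, the equivalent migration would again require clients that do not know the new algorithm to skip verification entirely.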


But what are those capabilities? That’s an undefined thing here since I don’t know what your installer does (not) support. It seems you’re asking for the PEP to define what potential capabilities an installer would need to have and then list those required capabilities somehow. You don’t have this with requirement files today either (but maybe you wish you had this?).

That’s fine and between you and your locker.

Sure, but users are not generally doing any of that day-to-day.

Right, so I still don’t know how to take that. :sweat_smile: Is another tool encroaching on pip’s territory and gaining traction a good or bad thing?

I wouldn’t read into the resolution behaviour too much; assume a proper resolver.

That’s between you and your locker just like it’s between you and pip today.

I’m not specifically, but it also requires innovating even more to support it as I don’t know any tooling that directly supports it short of creating a “lock” file that is very much tied to your platform and the exact files you installed (which is fine and possible with this PEP).

Not sure, but I don’t see any reason not to support this use-case just because pip doesn’t support it. In the end it’s bits off the wire, so providing out-of-band info so those bits can be interpreted appropriately doesn’t seem like a bad thing.


OK, I’m going to flat-out ask: what do people want here?

There’s the “give me requirements files” which still requires running a resolver and doesn’t really “lock” in a traditional sense, but it does restrict what is considered at installation time and allows for the potential cross-platform dependencies files that PDM and Poetry have found successful.

Then there’s the “give me requirements files, but w/o needing a resolver” which basically means a traditional lock file which doesn’t require a resolver (at most marker resolution), but which inherently means the lock file is platform-specific; what pip freeze/pip-tools found successful.

We tried to come up with something that services both needs, as you can view the more flexible PDM/Poetry solution as having a stricter subset that covers the pip freeze/pip-tools solution. To me, it seems to have failed based on the reaction we are getting (at least in its current form).

I have only one full rewrite left in me on this PEP (unless @pradyunsg and @uranusjr have more :grinning_face_with_smiling_eyes:), so I am now asking all of you to vote on what you want. I can then have a think on the topic and make a file format proposal we can iterate on and then based on the agreed-upon format, update the PEP.

  • Traditional lock file (i.e. no resolver necessary; pip freeze/pip-tools)
  • Basically pip requirements file (i.e. resolver required; PDM/Poetry)

