Seems easy to rectify by splitting out whatever pip does for parsing into a separate lib (or put it in packaging).
Of course you can do it this way but my point was, with TOML, you can have the information directly without introducing another dependency and complexity.
To be fair, what pip does isn’t standardised, so do we want to make pip’s behaviour a standard here?
But this begs the question, if we do go with an exploded format, which has extensions like this that make it more expressive than PEP 508, what happens to PEP 508? If the “exploded format” can’t be translated to a semantically identical PEP 508 format, how do we store what people specify in pyproject.toml in the package metadata (which mandates PEP 508 format here)?
So I think that we should be very cautious about suggesting that the “expanded format” would express anything that couldn’t also be expressed as a PEP 508 string. That’s a much wider question, and not one I want to open here, if I have the choice.
Hmm, there’s a thought. Can someone please clearly state what precisely is the “expanded format” that is being proposed here? If I end up being PEP delegate, I’d expect the PEP to be very explicit about the format being proposed (I assume anyone else doing the job would too).
PEP 508 format:
[project]
dependencies = [
"foo",
"bar >= 2.0",
'requests [security,tests] >= 2.8.1, == 2.8.* ; python_version < "2.7"',
"pip @ https://github.com/pypa/pip/archive/1.3.1.zip#sha1=da9234ee9982d4bbb3c72346a6de940a148ea686"
]
Exploded format:
What?
I’d take the view that if there’s even a question to be answered here, someone needs to decide pretty quickly exactly what is being proposed as the “expanded format”, and write it up in the same format as the rest of the PEP. (And how to map between PEP 508 and the expanded format should be clearly specified, too).
The example in the PEP should also be updated to include dependencies, in whichever format is going to be proposed.
Not really. You’d want a central algorithm for parsing, even if it’s as simple as if 'git' in data: dep_type = GitDep... Otherwise we all have to implement the same thing for every tool.
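A minimal sketch of the kind of central dispatch meant here, so that each tool does not have to reimplement it. The key names (git, url, path, version), the kind labels, and the `classify_dependency` helper are all hypothetical; no exploded format has been specified in this thread.

```python
from typing import Any, Mapping

# Hypothetical mapping from a table key to a dependency kind, checked in order.
_KIND_BY_KEY = (
    ("git", "vcs"),
    ("url", "direct-url"),
    ("path", "local-path"),
    ("version", "index"),
)


def classify_dependency(data: Mapping[str, Any]) -> str:
    """Return the kind of dependency described by one exploded table."""
    for key, kind in _KIND_BY_KEY:
        if key in data:
            return kind
    # Assumption: a table without any source key means "any version from the index".
    return "index"


print(classify_dependency({"git": "https://example.com/repo.git"}))
print(classify_dependency({"version": ">= 2.0"}))
```

Keeping the lookup order in one shared place is the point: two tools that check the keys in a different order could disagree about a table that carries both a path and a version.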
To expand @sdispater’s list:
- Easily extensible: We might come to a point where we need more keys to describe a dependency, e.g. what type of GPU must be present. Users just need to learn what this new key is called. That is much easier for them than learning how it is integrated into an endless string.
That’s the strongest argument for the table imo.
Probably not. I know I at least purposefully never specified it as that’s a separate bikeshed to paint. Pipenv and Poetry have differing names and such, so there isn’t a unified format to go with. If you want to start with what Poetry has, you can look at the “Dependency specification” page of the Poetry documentation (although it doesn’t appear to document extras, which is a field which takes an array of strings).
If you want a bunch of alternatives, https://github.com/brettcannon/pyproject-metadata/issues/17 has a lot from when we initially discussed this topic (that repo is private to PEP co-authors).
If you want to see other languages and how they do it, https://github.com/brettcannon/pyproject-metadata/issues/17#issuecomment-624624249 covers Rust, Dart, and Ruby.
But it still needs to encode to PEP 508 in order to work in a wheel file. So that doesn’t buy us anything unless you can show that some future addition cannot be nicely expressed using some addition to PEP 508 while it can be nicely expressed in TOML (and this is purposefully ignoring how we would then handle it for expressing in wheels as that’s a separate discussion if this were to occur).
And I think this is the current crux of the problem. There may be a time where PEP 508 is demonstrably worse than some exploded format. But that day has not come, and planning for a possibility instead of a guaranteed eventuality is quite pricey when we would be asking the Python community to understand two formats for who knows how long (e.g. tox or PEP 518 may never ditch PEP 508 and so that will help keep that encoding around “forever”).
I will say I don’t think supporting both in a transition would have an ambiguity problem. This is not meant to support or argue against the point that there is a definite cost if we went with an exploded format due to this, just that we could transition if we had the stomach for it. I would expect we would extend “packaging” with a function that takes a dict from a TOML parse and returns the appropriate representation for the build system regardless of how it was specified (which we maybe should do anyway unless pep517 has something like that already).
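As a rough illustration of such a function, here is a sketch that accepts either shape and always hands back PEP 508 strings. Nothing here exists in packaging: `normalize_dependencies`, `table_entry_to_pep508`, and the table keys (version, extras, url, python) are assumptions made up for this example, loosely following the exploded-table examples discussed in this thread.

```python
import re
from typing import Any, Dict, List, Sequence, Union

# Splits a constraint such as "<3" or ">= 3.6" into operator and version.
_PY_CONSTRAINT = re.compile(r"\s*(<=|>=|==|!=|~=|<|>)\s*(\S+)\s*")


def table_entry_to_pep508(name: str, spec: Union[str, Dict[str, Any]]) -> str:
    """Render one entry of a hypothetical exploded table as a PEP 508 string."""
    if isinstance(spec, str):
        # Shorthand form: the value is just the version specifier.
        spec = {"version": spec}
    req = name
    extras = spec.get("extras")
    if extras:
        req += "[" + ",".join(extras) + "]"
    if "url" in spec:
        req += " @ " + spec["url"]
    elif "version" in spec:
        req += " " + spec["version"]
    if "python" in spec:
        match = _PY_CONSTRAINT.fullmatch(spec["python"])
        if match is None:
            raise ValueError(f"unsupported python constraint: {spec['python']!r}")
        op, version = match.groups()
        # The space before ';' keeps the marker separate from a preceding URL.
        req += f' ; python_version {op} "{version}"'
    return req


def normalize_dependencies(
    deps: Union[Sequence[str], Dict[str, Any]]
) -> List[str]:
    """Return PEP 508 strings whether deps is a list of strings or a table."""
    if isinstance(deps, dict):
        return [table_entry_to_pep508(name, spec) for name, spec in deps.items()]
    return list(deps)


# Dict literals stand in for what a TOML parser would return.
print(normalize_dependencies(["foo", "bar >= 2.0"]))
print(normalize_dependencies({"bar": ">= 2.0", "baz": {"version": ">= 0.3", "python": "<3"}}))
```

Because the shape of the parsed value (list versus table) already tells the two forms apart, a transition period would not need any extra marker to disambiguate them.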
At the end of the day, PEP 621 allows for dynamic fields, including dependencies. Tools could have a feature which allows the user to specify the dependencies in exploded table format in another key of pyproject.toml (e.g. project-dependencies-table).
Perhaps this feature would become popular enough that tools have a fast path to handle this key for dependencies (yay for ecosystem-defined standards; at least it would be borne out of popularity)
Brett, the GitHub links you gave return a 404 error. I don’t know if your repository is private?
Yep, it’s private to the co-authors of the PEP; sorry for forgetting to mention that (I also never got permission to open it up, hence why I probably will leave it private). Paul has access, though, and the examples aren’t that interesting; they are just variations on the exploded table format which you can only get so creative with.
Thanks. I won’t look at these again now (I do recall the discussion) as ultimately what matters is what’s in the PEP that gets submitted for approval. If it’s going to have an “exploded” format, the colour of that bikeshed needs to have been decided by that point - that’s really all I was trying to point out here.
With my personal / co-author hat on, I don’t really care, and I’m explicitly trying not to develop an opinion as I feel that I may end up being PEP delegate (no-one’s objected yet!) and I want to remain somewhat impartial for that. But the “exploded format” proponents do need to get their act together and come up with an agreed definition, or they stand every chance of losing the debate by default (I definitely won’t accept a PEP with “details to be decided later”).
I’ll note that I think one of the reasons this has not been done is that the first part of the discussion we’ve been having is whether or not to do an exploded format. I think we have a general sense what such an exploded format would look like, and what its advantages are, and the question of whether or not to go with one has not come down to the fine details.
Personally, I think the fact that we don’t have to design a second way to specify dependencies is one of the perks of going with PEP 508: those decisions are already made and we don’t have to spend any further time debating them. I probably wouldn’t weigh in on design decisions of an “exploded TOML” format until after it was clear that we were abandoning PEP 508.
One more argument for exploded tables:
- Custom properties: If we allow this (and in my opinion we should), build-backends could use the exploded table to store custom properties they need for their work, maybe prefixed with something like x- to say that they are custom. For example, poetry has a flag develop to indicate whether a dependency should be installed in editable mode. More are imaginable.
I think pretty much anything that involves expansions will require a corresponding change in PEP 508, as was mentioned in the somewhat related argument above.
Even for situations where it’s just information for the specific backend (and I would think that would probably be stuff that goes in the backend-specific configuration if it’s not actually standardized, TBH), we should be able to allow explicit flags in even the PEP 508 version in a potentially backwards-compatible way by accepting either a string or a list (or some other structure) for any given entry, so some variation on this:
dependencies = [
"requests",
{dependency = "something @ git+https://somewhere", x-custom = "value"},
]
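A small sketch of how a tool might consume entries of that shape, where each entry is either a plain PEP 508 string or a table carrying the string plus tool-specific flags. The `split_entry` helper is hypothetical, and the dependency key and x- prefix are taken from the example above rather than from any specification.

```python
from typing import Any, Dict, List, Tuple, Union

Entry = Union[str, Dict[str, Any]]


def split_entry(entry: Entry) -> Tuple[str, Dict[str, Any]]:
    """Return (PEP 508 string, custom flags) for one dependency entry."""
    if isinstance(entry, str):
        # Plain strings stay valid, which keeps the change backwards-compatible.
        return entry, {}
    requirement = entry["dependency"]
    unknown = [k for k in entry if k != "dependency" and not k.startswith("x-")]
    if unknown:
        # Assumption: anything that is not standard must carry the x- prefix.
        raise ValueError(f"unexpected keys: {unknown}")
    custom = {k: v for k, v in entry.items() if k.startswith("x-")}
    return requirement, custom


# A list literal stands in for what a TOML parser would return.
entries: List[Entry] = [
    "requests",
    {"dependency": "something @ git+https://somewhere", "x-custom": "value"},
]
for entry in entries:
    print(split_entry(entry))
```

A tool that only cares about resolution can take the first element of each pair and ignore the flags entirely.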
It depends. If it’s something related to how the dependency is described, yes. If it’s additional information for a specific tool (in my example, poetry should install this dependency in editable mode), no.
This leads us to the question, already asked somewhere earlier in this thread, who is the audience of PEP 621 and the dependency specification. I see three groups:
- Human beings
For me the most important audience. These can be people who just want to look up information about the package, or those who want to add or modify the metadata. Their knowledge can differ quite a lot: they can be beginners or experts, and they may or may not have experience in other language ecosystems.
In my opinion, the best way we have to satisfy most people is the exploded table, because it is self-explanatory when read and is familiar to those coming from some other languages.
- Tools using the metadata (except build-backends)
This is mainly already described in the Motivation part of PEP 621. Those tools can be package managers that try to resolve dependencies, but they can also be tools that compute statistics on the metadata. With an exploded table for dependency definitions, they have direct access to the keys and their values, which makes things a lot easier.
Another use case is shown by poetry, which uses the dependency definition for building the package and managing the virtualenv. For the latter, additional keys can be added to the dependency to provide more functionality. Without that possibility, users would have to store those flags somewhere else, leading to more or less redundant data, as they would have to define dependencies (at least the name) a second time.
- Build-Backends
Of course the build-backend needs the information as well. They need to create a PEP 508 compliant string for wheels. Concatenating the several parts from the exploded table is not a big thing.
The order in which I listed these groups corresponds to the importance I see. PEP 621 should be an interface, and so it should be made as easy as possible for people and tools to use.
If it’s for a specific tool, it should be in the tool-specific section of pyproject.toml, not in one of the standard-defined sections.
As you say, this is a matter of opinion.
Tools analysing metadata should be using the final METADATA file available in the wheel - or whatever ends up being the formal metadata in the sdist, once sdists are standardised. PEP 621 is not intended as a way to let tools avoid using the official source of metadata in those cases - and the official metadata spec mandates PEP 508 format. Tools will have to use PEP 508 for those situations, so using PEP 508 for pyproject.toml is easier because you then only have one thing you need to implement.
Again, it’s mostly a matter of opinion which of those two arguments you find compelling.
Not having to do so is even less of a big thing…
Maybe it’s time to run the survey. I propose these options:
I think we should use PEP 508
I volunteer to develop, test, validate, and specify a new PEP so I can vote for that one
(FTR, I’m voting for the first.)
That’s a matter of definition. If we define: “Here are the dependencies for this package, the following keys describe the dependency […] and there might be additional information for different use-cases other than dependency resolution”, it’s absolutely fine to store them here, because all this information belongs to the dependency.
So what’s the goal then for PEP 621? I’m really interested in the answer! Because at the moment the PEP seems to contradict itself:
Once a build back-end has produced an artifact, then the metadata contained in the artifact that the build back-end produced should be considered canonical and overriding what this PEP specifies. In the eyes of this PEP, a source distribution is considered a build artifact, thus people should not read the metadata specified in this PEP as the canonical metadata in a source distribution.
But later:
When metadata is specified using this PEP then it is considered canonical […]
Sure. But throwing away all other advantages, just to make a quite easy thing even easier?
With an exploded table for dependency definitions, they have direct access to the keys and their values, which makes things a lot easier.
Not really imo: PEP 621: how to specify dependencies? - #144 by ofek
You’d want a central algorithm for parsing, even if it’s as simple as if 'git' in data: dep_type = GitDep... Otherwise we all have to implement the same thing for every tool.
In my opinion, the best way we have to satisfy most people is the exploded table, because it is self-explanatory when read and is familiar to those coming from some other languages.
I think one of the major gaps here is that to some of us it’s obvious that the exploded table is more readable, but to others that’s not obvious at all :-). How is one of these more self-explanatory to the naive reader?
# exploded table
[dependencies]
some-package = ">= 1.2"
another-package = {version = ">= 0.3", python = "<3"}
# PEP 508
dependencies = [
"some-package >= 1.2",
"another-package; python_version < '3'"
]
I just don’t see how slightly different quoting, or the choice of python versus python_version, is going to make one of them more self-explanatory for naive users.
I can see that you and @sdispater and others are passionate about the exploded form being better, so I know I must be missing something. Can you help us bridge that gap? What is it that you’re fighting for, that makes this decision so important to you?