I don’t think this is an issue becuase even in your specification validating e.g. the version strings are valid requires an extra step. So given that you already need an extra step, that extra step might actually do this operation too automatically.
AFAIK the only part of PEP-598 that doesn’t specify the key explicitly is the extras. You get just as explicit with platform/python version keys. So in this sense the table format helps a bit, but just for extras.
The validation the TOML offers is at best a light validation. You can specify a lot of entries that’s valid TOML, however incorrect python specification. Package names, version and python requirement specifiers come immediately to mind (but could go onto what happens when someone passes a list instead of dict as value in the table, or uses integer as key instead of string, etc). Considering for good UX you’d want to validate all these you are already looking at an extra dependency.
We should improve things, but we need to balance benefits against the price we as a community have to pay to get there. And in this case from what I seen until now we’re talking about marginal benefits with significant resource investment to get there.
We need to work with what we have, not what we wish we had. Going down the TOML path will put strain on the entire ecosystem, not just 1-2 tools in Python, sadly.
I agree that the TOML approach has advantages when you’re utilizing multiple features of PEP508 at once. I think it’s either slightly worse or about the same in the simple cases.
Like I said. I don’t think this is a situation where either solution is just better across the board for the end users experience. I think you can construct real world situations where either one “wins” depending on which aspects you personally want to optimize for.
If one solution was just better in every situation I think you’d see a lot more enthusiasm for standardizing on that one solution. When there is no clear winner, status quo is typically the winner.
I would be very interested in hearing from people like @rhettinger who teach or have taught Python professionally. Do any of us here have regular interactions with newcomers outside of bug reports?
I think we have acknowledged that everyone in this conversation is bringing biases based on the tool(s) they maintain and what that tool currently supports. That pretty much guarantees a clean answer will not happen among ourselves.
So, how do we want to settle this? A poll here that we promote as widely as possible? A bunch of individual polls where we then come back with the results? We reach out to trainers and teachers and ask them to talk to their current classes to see if beginners have a true preference?
I think that the unstated assumption here is that we don’t realize what people find difficult about using our tools because we don’t find it difficult, but I think it’s actually really hard to get the information we want here.
In some ways, people already involved in Python packaging are the perfect people to ask about it, because we’re the ones dealing with a diverse group of users, dealing with bug reports, etc. We also tend to do the most complicated things with packaging and know the right way to do things. Also, many of us were motivated to get involved in packaging to fix problems we had ourselves.
I am not saying we should ignore our users (quite the contrary), but I also think that we need to acknowledge that often if you ask beginners whether they like X or Y, they’re often making that choice without a deeper context, and in the end we would get a worse experience by taking them at their word. (I say this as someone who has made UX suggestions that were accepted in beta tests and come to regret them enough times to feel hesitant about it.)
I don’t think we should be polling people for preferences. I think if we can agree that the question of which way to specify dependencies comes down to (or would be significantly be informed by) a disputed factual question, we should come up with a strategy to determine the truth.
That said, I’m not convinced that our differences really come down to factual questions. People don’t usually complain or have problems with PEP 508 and they don’t really complain about poetry’s way to specify dependencies. That suggests to me that both are good enough and that people won’t be actively confused by using either one. Given that PEP 508 is already standardized, already in wide use, we cannot deprecate it in favor of a TOML-based system (which won’t work with all config systems) and people will need to learn it anyway makes me say that a tie should go to PEP 508. Maybe the result would be different if we were designing this from scratch in a vacuum, but I find it much more plausible that people will lament the proliferation of ways to declare dependencies than they will lament the fact that we’re using a compact DSL.
I haven’t formally polled this, but my gut-feel (based on many conversations, for example, I’ve just spent an entire day at EuroPython chatting with attendees about all kinds of stuff) is that the most popular approach would be “just pick one and tell me exactly what to do”.
And I agree with Paul (all of the Pauls ): existing standard wins over writing a new standard.
Both pipenv and Poetry changed the metadata format and I didn’t see any backlash for that. So I think people are more than willing to follow any new standard.
And since we are specifying a brand new standard anyway I don’t see why we couldn’t go all in.
Note that last I heard, the long-term plan is to deprecate the concept of “extras” and make them into regular packages with [brackets] in the name. (So e.g. we’ll have a requests[security]-2.24.0-py2.py3-none-any.whl, which is a regular package that contains no code, and depends on the appropriate versions of requests + pyOpenSSL.)
So it doesn’t make a lot of sense for a new format today to hardcode the string extras, or to split up the extra from the package name.
It is indeed off-topic, but what is relevant here is that the concept of extras is in discussion (more precisely, there have been some comments, but no-one has really had the energy yet to make anything of it). We may end up looking at something more like how Unix distributions do things, with “recommends”, for example.
The term “extra” is pretty well-known in the Python packaging community, so even if we decide to, changing it won’t be easy - but promoting “extra” to a named element of a dependency specification (rather than just a specific bit of syntax the way PEP 508 has it) will make it even harder to “rebrand” the idea.
I don’t have a particular opinion on the matter here (other than saying that I find the terminology around extras pretty confusing, personally) but I think that’s the key thing to take away from @njs’ comment.
[mod hat on] Please don’t make this kind of content-free judgement about other folk’s work without even knowing the details; it’s not helpful for having good technical discussions. [mod hat off]
FWIW, my personal feeling is that most of this discussion is a red herring / classic bikeshedding. IMO the two genuinely complicated parts are learning the operators for version comparisons, and mini-language for markers. These still exist in the “exploded” version, they just have a TOML-shaped frame around them instead of a PEP 508-shaped frame.
I appreciate the comparison here, but to me it’s pretty underwhelming once you fix the spacing in the PEP 508 version to make it readable. There are a few edge cases that might favor one or the other, but seriously, who cares about "cachy ~= 0.3.0" versus cachy = "^0.3.0".
I think the only reason other languages use TOML/JSON is because they started out using those formats for all their metadata, so it wasn’t worth the bother of specifying a complete DSL for dependencies. Python OTOH doesn’t have that history and already has the DSL specified, so we might as well use it.
Anyway, that’s my 2 cents. IMO the most productive thing would be to pick one and move on
That said, I guess there is a substantive question here, which is how to encode extensions to PEP 508. AFAIK every pinning format has a mechanism for doing things like requesting in-place installs, allow dev-dependencies, specifying hashes, etc., and it would be good to have a place to put those. Maybe that should be a separate thread though, since this one is so deep in the weeds?
I definitely could have been clearer about explaining it, but this is the key benefit pipenv/Pipfile get from moving the package names out as TOML table keys: it allows the value to be a string for a plain dependency, or an inline table for something more complicated.
That said, it would be reasonable to say that these standard fields are intended for the kind of dependency metadata that can go into a built package, and anything like editable installs will remain in tool specific formats for now.
Interestingly, many tools using the exploded form for specification from the beginning still invent a DSL at some point anyway. Cargo, for example, uses exploded TOML, but dependencies in Cargo.lock (which is also in TOML!) uses a string form of the same specification. Bundler uses a Ruby-based DSL for specification, but the DSL is compiled into a string form in Gemfile.lock, despite the rest of the file is in a YAML-ish custom syntax that clearly supports structured data.
I don’t know their reasoning behind it, but it does not seem that strange to me in practice to have different dependency specification formats for humans and machines.
Thus far the discussion on specifying dependencies is mainly on the format. I think another topic related to specifying deps is important to have at this point, and that is whether the chosen format should be “ready” in case we want to support specifying native dependencies or even consider cross-compilation.
One thing I would really like to see in the future is a a field where one can list the run-time executables it expects, as well as run-time libraries that are dlopen’ed. In Nixpkgs we commonly need to patch such references, which is quite a pain. With conda that should also be done, though I don’t know how much effort is put into it there. I imagine for other distro’s it also becomes easier this way to compute what native packages should be added as dependencies.
I imagine starting off with two sets:
dependencies-runtime-executables is a list of executables
dependencies-runtime-libraries is a list of libraries
Then, in the code, instead of directly using say subprocess.run with the executable name or find_library with the library name, one uses something like dependencies.executables("bash") and dependencies.executables("archive"). The dependencies are stored, like we do now with entry points, in a file in the dist-utils folder. That way downstreams can patch the references in that one file.
Why did I not include build-time dependencies here? Because those are the responsibility of the back-end. One could argue that’s the case for the pure-Python dependencies as well, but those are also solely run-time dependencies.
That’s an entirely different topic, and is likely to spark a fairly extensive debate (as far as I know, general dependency management is a significant and not-well-solved problem that goes far beyond Python).
I don’t think there’s much value in trying to make this spec “ready” for such a hypothetical future extension. Rather, we should initially focus on the question of how we’d persist such data in package metadata, and only when we have an agreed specification for that (and ideally, a working implementation so that we have an assurance that the spec is usable in practice) should we worry about how we handle the data entry side of that.
If you want to start that discussion, feel free, but please create a new topic for it.
Here are my 2 cents. I vote strongly for the exploded table format. And this is why:
The draft of PEP 621 already uses TOML syntax. Anyone who will use it, has to get familiar with that syntax. Once you get the quite simple concept of key-value, the user experience gets confused, if one suddenly find a (more or less) long string that contains multiple information.
For me consistency is important for a good user experience. TOML was chosen for good reasons for PEP518. It is the favorite format for the suggested PEP621. So for the sake of consistency we should follow this road consequently.
It’s easy to extend. If it’s necessary, new keys can be added or obsolete keys can be removed from the table. No hacky regex needs to be adopted, to be able to read the necessary information.
It’s imaginable to allow tools to add additional information (e.g. with a key like x-some-key=) for there work without breaking something.
In my opinion the argument, that one can copy&paste a PEP508 string to pip doesn’t count, because this should be out of scope of this discussion. CLI should define there own way, how the arguments should look like, as this is a question of user experience and here it is absolutely ok to have different tastes.
Is the PEP supposed to define a minimum spec, where extra fields are ignored (and tools can’t rely on their existence), or a complete spec, where an error should be raised if extra fields are found?
If tools need further information about a project dependency, it would be nice to have it all in one place, rather than in an external file or in another section of the pyproject.toml. Adding this information in a string-style (PEP-508) specifier would be unwieldy.