Wait, does flit use the expanded table? I thought it used PEP 508.
I definitely meant PEP 508 style. Hopefully Paul is just bringing in some additional concerns, and not disagreeing with me while also agreeing with me
Apologies. I'm not familiar with flit so I misread "the DSL" as meaning table-style.
Apparently I was agreeing with you while confusing the issue. Sorry.
Thanks for bringing this up. I made a similar suggestion in the pre-PEP private conversation, but it never managed to catch on among the authors. Let's give it another shot since I still believe this is an acceptable middle ground.
The only difference in my proposal was to use this form in the simplest example:
simple = "" # Instead of "*"
The main reason I prefer this is that it's easier to create a PEP 508 string from the table:
from packaging.requirements import Requirement
requirements = [
Requirement(f"{key}{val}")
for key, val in pyproject["dependencies"].items()
]
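To make the difference concrete, here is a minimal sketch that uses plain string formatting only (no packaging import) on a made-up dependencies table. It assumes the table maps project names to PEP 440 specifier strings; to_pep508 and to_pep508_star are hypothetical helpers, not part of any tool.

```python
# Hypothetical dependency table, as it might look after parsing pyproject.toml.
dependencies = {
    "simple": "",          # empty string: any version
    "pinned": "==1.2.3",
    "ranged": ">=1.0,<2",
}


def to_pep508(name: str, spec: str) -> str:
    """Join a name and a specifier string into a PEP 508 requirement string."""
    return f"{name}{spec}"


def to_pep508_star(name: str, spec: str) -> str:
    """Same idea when "*" means "any version": it needs a special case,
    because plain concatenation would give "simple*", which is not PEP 508."""
    return name if spec == "*" else f"{name}{spec}"


requirements = [to_pep508(name, spec) for name, spec in dependencies.items()]
print(requirements)
```

With the empty string, the table value can be appended to the name unchanged; with the star, every consumer has to remember the extra branch.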
Yeah, while I was in favour of the star for Pipfile, I now agree it would have been better to stay fully PEP 440 compliant and use the empty string for "any version".
If PEP 621 opts to use the empty string, I expect Pipfile will follow suit, making the explicit star purely a backwards compatibility feature.
As the author of Poetry some might say that I am biased but there is a reason why I did not choose to go with the PEP 508 specification: User experience/friendliness.
We are all seasoned Python developers here and we somewhat know how to read a PEP 508 string but I assure you that for newcomers and people coming from other languages this is far from the case. And reading a complete specification to even remotely understand what it does is proof of user unfriendliness.
What we are talking about here is user-facing metadata, which is completely different from the metadata that ends up in the distributions, and it should be readable and intuitive.
This becomes more apparent in cases like this:
name [fred,bar] @ http://foo.com ; python_version=='2.7'
Compare that to (Poetry example):
name = { url = "http://foo.com", extras = ["fred", "bar"], python = "2.7.*" }
To me the second one is much more explicit and readable. And there is no awkward parsing to be done, the parsing already happened at the TOML level.
And like others mentioned, if only the version range needs to be specified, it's possible to reduce it to a single string. That's what Poetry does; see "Dependency specification" in the Poetry documentation.
And I'll repost something I said elsewhere:
To me it seems Python tries really hard to be an exception in the overall programming languages ecosystem. Almost every language that I know of represents its dependencies as a dict-like structure (which helps tremendously to know if a dependency is declared by a given package instead of going through a complete list of requirements) and some of them support an exploded representation for complex cases (Cargo, Dart and Ruby come to mind).
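The lookup argument can be illustrated with a small sketch. The data and helper names below are made up, and the regular expression only extracts the leading project name from a requirement string; it is an assumption for illustration, not a PEP 508 parser, and it ignores name normalisation.

```python
import re

# Made-up data: the same dependencies in list form and in table form.
as_list = ["requests>=2.20", "attrs", "coverage[toml]>=5.0"]
as_table = {"requests": ">=2.20", "attrs": "", "coverage": ">=5.0"}

# Leading project name only; the rest of the string is ignored.
_NAME = re.compile(r"\s*([A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?)")


def declared_in_list(name: str, requirements: list[str]) -> bool:
    """Scan every requirement string and compare the extracted names."""
    for req in requirements:
        match = _NAME.match(req)
        if match and match.group(1).lower() == name.lower():
            return True
    return False


def declared_in_table(name: str, table: dict[str, str]) -> bool:
    """A table answers the same question with a single key lookup."""
    return name in table


print(declared_in_list("coverage", as_list), declared_in_table("coverage", as_table))
```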
Hey, I'd said I like it.
I'll also repost something I'd looked up and mentioned earlier (albeit not on discuss.python.org):
One thing to note about these forms is that PEP 508's is the only representation that doesn't mention "groups"/"extras" (or equivalent), at least amongst the dependency specification formats I could look up easily.
The only difference is additional punctuation, a (likely redundant) "extras", and a definitely redundant "url".
PEP 508 syntax also supports "or"/"and" expressions in environment markers. That won't work in the example you gave from Poetry without using an alternate table (which I assume exists, but means you have redundant syntax in there).
This is an interesting idea: what if environment markers had their own tables? That would allow reuse of platform detection expressions without copy/paste, but doesn't complicate the simple version constraint cases.
This is basically what setuptools used to do when environment markers are keys in extras_require, but I thought people decided it's not a good idea and chose the PEP 508 approach instead.
Yeah, I'm fairly neutral on that idea, but I don't see anything else that compelling from any other ecosystem.
The biggest advantage for new users of the expanded table format is that they're more likely to guess that there's another key they could use without having seen it before. They'll still have to look up a reference to find it, but if they've only ever seen simple version constraints then they won't realise that URL references or environment markers even exist (same applies to any table format with a "simplified" mode).
Some of the comparisons ought to be done on libraries with a lot of dependencies. There's only limited difference between the two formats with a single specification, but once you multiply it by ten or twenty the PEP 508 syntax is much more efficient and no harder to read.
(Also, it's much easier to include an existing specification by reference than to create a second specification. We also won't have to spend the rest of time explaining why it's arbitrarily different from what we've previously agreed on.)
I think you might be missing the point being made here: this redundancy and use of structures that are from the syntax of the language being used (TOML here, YAML for Dart, Ruby code for Gem) is exactly what would make this more user friendly, compared to a DSL (PEP 508) that works well to compactly represent the information.
I'll reiterate that the audience for this includes not-seasoned Python users, who haven't heard of PEP 508 or know what that even looks like.
But they are already fluent in TOML?
And they've managed to get to this point in packaging without looking anything up?
This is not a place where we need to optimise for untrained users. Readability-in-context is the most important criterion here, along with ease of finding the reference material (which hopefully will be well served by searching for the filename and section heading).
What are all of the criteria? What are their levels of importance?
I appreciate the intent of making the list, but I'm not going to indulge it, sorry.
I've been a professional tools developer for nearly a decade, and at this point I'm convinced the three most important things to have are strong gut feel (that you can back up with logic, as I did above), a broad range of anecdotes (to push back against gut feel from directions you may not be used to) and a commitment to whatever you decide (or else it's doomed to failure).
Trying to turn it all into a ranked list is tempting, but has never been significantly helpful in my experience. (Sometimes items on the list come with helpful anecdotes, though. And occasionally the list is a good tool to persuade someone else that you've done more work than they are willing to do to argue with you, but that's politics, not design.)
We definitely have all three of those things here. The only one I worry about is commitment to the final solution, which includes following through with docs, tutorials, talks, advocacy, etc., and not immediately creating an alternative project just to prove a point. If we can do that part, either approach will work fine.
They probably didn't get there without looking anything up, but in my experience they most likely got to this point by looking up only the bare minimum, to the point that experienced eyes may not understand how that bare minimum was enough to get them there. I can't say exactly what percentage of users are in this category, but pypa/packaging-problems has many examples of this. I tend to agree that untrained users are not the top priority here, but they are also definitely not a problem that can be hand-waved away.
This also reminds me of pypa/pip#8285, which describes how pip fails to pick up the extras from 'setuptools_scm >= 3.5[toml]', reported by @bernatgabor. It turned out to not be a bug, but I failed to diagnose the problem immediately. Maybe you can, but the point is that PEP 508 is evidently not easy to get right, even for experienced Python developers with years of context under their belt.
I agree to a point, but nor should we optimise for non-experts, when those non-experts are likely to have many other things that they'll struggle to deal with.
Better documentation and tutorials, and more accessible examples of "how to set your project/workflow up", would be of far more use, I suspect, to the average "non-expert" than simplified syntax for one fairly niche area of packaging. The problem is that no-one is writing such documentation, no-one is (to my knowledge) sharing problems and solutions, and people are picking stuff up cargo cult fashion.
That's maybe a really good area for funded work - developing newcomer-friendly guides and tutorials for packaging. We could get specialist technical writers and trainers to produce such things, rather than (as we currently do) relying on packaging specialists who are too close to the problem to know how to present things.
Honestly, that probably mainly demonstrates that tools like pip should validate their inputs more eagerly and bail out on errors. (It may be that "compatibility with legacy forms" is the justification for not doing so, but maybe we should just bite the bullet and start rejecting invalid syntax).
I agree that PEP 508 syntax isn't simple to get right (but nor is regex syntax, for example!) but tools not telling you when you made a mistake is not going to help. Once you know you made a mistake, you can look up the syntax. It's when you think you got it right, but the tool veered into "undefined behaviour" territory without warning, that you have problems...
Never suggested ignoring them completely, though I think we can ignore untrainable users. But untrained users will need the resources to become trained, and enough context to locate those resources.
As one example, imagine we could use backticks in Python to perform "is None" checks. A user seeing that for the first time can't google for backticks, and may not even know what name to use. Whereas "is None" can be searched for. This makes self-training possible.
In this case, we should have a section header ("dependencies") and a file name ("pyproject.toml") that can be searched for, and then we make sure something useful shows up in the results. This covers not-yet-trained users, and makes the actual syntax less relevant to how we address that particular audience.
In contrast, if every section header was the dependency name, then you couldn't search for it, because you'd find the package itself rather than the guides. That would be one way for us to get this wrong, and again, the actual syntax of the dependencies is irrelevant.
To me, I think the most important point here is that we already have PEP 508 syntax, which will be used in requirements.txt, tox and tons of other places where dependencies are specified in something other than a TOML file. If PEP 508 were super complicated, then maybe it would be justified to add the additional mental load of a second specification, but by and large it's not that complicated.
The parts that are complicated tend to be complicated on the "write" side rather than the "read" side. So:
name [fred, bar] @ https://foo.com; python_version=='2.7'
I don't think I would intuit that syntax or the ordering or anything, but I think that if you see something like this it's obvious what it means in most cases. name @ https://... is pretty obviously "pull the package name from the url https://...". The ; python_version=='2.7' is obviously some sort of filter on the Python version, and I think you can easily infer that it means "this rule only applies when python_version is 2.7". The only possibly confusing thing is the [fred, bar], but the names people choose for their extras and the fact that [] is often used to delimit optional things almost always make it clear that these are feature flags of some sort, so coverage[toml] is very obviously "coverage with toml support".
I think the case for making people learn two ways to write down their dependencies is very weak, particularly since I know that I for one would have to look up the names of the keys every time I wrote anything remotely complicated anyway (as I frequently do when writing Rust).
And that's invalid PEP 508 right here:
>>> from packaging.requirements import Requirement
>>> r = Requirement("name [fred, bar] @ https://foo.com; python_version=='2.7'")
Traceback (most recent call last):
File "packaging/requirements.py", line 98, in __init__
req = REQUIREMENT.parseString(requirement_string)
File "pyparsing.py", line 1955, in parseString
raise exc
File "pyparsing.py", line 3814, in parseImpl
raise ParseException(instring, loc, self.errmsg, self)
pyparsing.ParseException: Expected stringEnd, found 'p' (at char 36), (line:1, col:37)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "packaging/requirements.py", line 100, in __init__
raise InvalidRequirement(
packaging.requirements.InvalidRequirement: Parse error at "'python_v'": Expected stringEnd
You need a space after the URL, because a URL can contain semicolons:
>>> r = Requirement("name [fred, bar] @ https://foo.com ; python_version=='2.7'")
>>> r.url
'https://foo.com'
>>> r.marker
<Marker('python_version == "2.7"')>
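Since the separator is easy to get wrong by hand, one option is to assemble the string programmatically. The following is a rough sketch with a hypothetical helper, format_url_requirement, that always puts whitespace before the semicolon; it does no validation of its inputs and is not part of packaging or any other library.

```python
from typing import Optional, Sequence


def format_url_requirement(
    name: str,
    url: str,
    extras: Sequence[str] = (),
    marker: Optional[str] = None,
) -> str:
    """Build a 'name[extras] @ url ; marker' string (hypothetical helper)."""
    req = name
    if extras:
        req += "[" + ",".join(extras) + "]"
    req += f" @ {url}"
    if marker:
        # The space before ';' matters: a URL may itself contain semicolons,
        # so without it the marker would be read as part of the URL.
        req += f" ; {marker}"
    return req


example = format_url_requirement(
    "name",
    "https://foo.com",
    extras=["fred", "bar"],
    marker="python_version=='2.7'",
)
print(example)
```

The resulting string can then be handed to a real parser such as packaging's Requirement for validation.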
I don't know... maybe PEP 508 is super complicated? I get the point that it's on the write side, and tools can catch the error for you before it's read by anyone else. But I just can't help but see the irony here. And I also get that you only need to get that wrong once, but at some point that one write can be more frustrating than all the readability difficulties combined.
IMO, PEP 508 is relatively simple¹ for simple cases, but it gets complex and messy for complicated situations. Conversely, the table form scales up better to complicated cases, but (1) it's a second format for people to learn, (2) it's more complex than PEP 508 in simple cases, and (3) it's not immediately obvious to me that TOML syntax doesn't add its own layer of complexity here for people to learn (the string syntax is sufficiently close-but-not-the-same to Python's that I can see that causing issues).
Realistically, this is the same old question - are we trying to optimise for "simple" and "common" cases, or for worst-case situations? Can you find an example as complex as that name [fred, bar] @ ... case that has actually occurred in a real-life requirement specification?
Personally, I'm mildly in favour of PEP 508 syntax because it's "one less thing to learn" (and specifically, I don't need to learn a new way to spell pip >= 20.0 or foo[extra]²). But I'm strongly against deciding based on complicated examples that will almost never occur in real life.
A survey of how the two forms would look if used in a number of common projects on PyPI would IMO be far more useful than spending a load of time debating edge cases.
¹ I appreciate I am biased here, but I'd argue that the effect is mainly to adjust the precise point where things get "complicated" rather than fundamentally affecting my argument.
² I went hunting, but couldn't find a good example of a project with extras that you'd put in a dependency - maybe I didn't have enough imagination, or maybe even extras are on the "unusually complicated" side of things???
Regardless of my comments above, yes it is ironic, and I apologise for taking your example over-seriously
I mean, that's not "super complicated", it's a syntax rule. By the same token you could call TOML "super complicated" because it forbids multiline inline tables, so I'd have to rely on the tool to know that:
name = {
url = "https://foo",
extras = [fred, bar],
python = "2.7"
}
is invalid.
Presumably there would be other gotchas, like maybe that python = "2.7" can't be python = 2.7 or something of that nature.
I think in the end the fact that I had a typo in there that could have instantly been caught by the tool doesn't invalidate my point at all. Especially since my point was that it is acceptable for a DSL like this to be a little hard to write.
TBH, you acknowledge that my point was, "Yeah it might be a bit hard to write but it's easy to read." It's not like a big "gotcha" if there's a minor failure when writing basically the most complicated format PEP 508 has, considering it's still very simple to read it.