PEP 621: how to specify dependencies?

pganssle · August 11, 2020, 3:39pm

I think we probably have as much information as we’re going to get on this. From all the polling at various levels of formality I’ve seen, the general public has very weak opinions on this, and the strongest opinion I’ve seen expressed is “pick a way to do it and tell us what that way is”.

I don’t think a general poll is a way to decide between two things with reasonable and compelling points on both sides — some of which are second-order effects. A more fruitful approach would be longer-term UX research where we sit with people and watch them work, but even that I think will be of only marginal value; it might solve some basic factual questions about where there are points of confusion, but it’s very unlikely that we’ll be dramatically surprised here.

The feeling I get from the community is that they want strong and decisive statements (not including dependency specification in the PEP would be a bigger problem than either individual choice), and that they want consistency. I also think that people would like it if we minimized churn.

The fact that we cannot even identify a mechanism to decide this question is probably more harmful than making either choice, to be honest. As I’ve said before, the consequences here aren’t that high, and it makes sense to default to PEP 508.

brettcannon · August 11, 2020, 9:01pm

I will say that I personally plan on figuring out some decision by the end of the month so this PEP can move forward to be accepted or rejected (I’m a bit too swamped ATM to make a decision and try to drive this forward again for acceptance). At that point I will ask that we figure out how we will manage acceptance/rejection of this PEP (maybe a vote from pypa-committers since Paul can’t do it? Or Paul figures out who to delegate to?).

To be upfront, if I was asked to make a snap decision right now I would go with PEP 508 because it has the widest uptake in the packaging ecosystem, there’s no clear preference in this massive thread between the two proposals or in my informal discussions with people in other places, and in a relative tie like this I prefer to go with the status quo.

But I am in no way tied to this line of thinking. If the TOML/exploded-table supporters can figure out some way to show wider community support or some clinching argument between now and September I’m still personally open to push for the exploded table approach in the PEP (and I’m being purposefully vague in how to prove this so people have flexibility and because I just don’t know what demonstrable community support even looks like). But I believe the onus is on the exploded table approach to prove its superiority to PEP 508 at this point.

Or the whole PEP could end up getting rejected. In the end it’s just a PEP and an idea, but if normalizing data like this just isn’t what people want then so be it; I have had enough PEPs rejected over the years that one more won’t make me quit or anything.

pf_moore · August 11, 2020, 9:42pm

If anyone who isn’t a PEP author wants to volunteer to be PEP delegate, please speak up. @dstufft is the obvious possibility, as he’s a standing PEP delegate for packaging anyway. But I’m not going to nominate someone who doesn’t volunteer, as I don’t think that’s fair or reasonable.

I’m a little concerned about the idea of a vote. For something that’s been this inconclusive, I suspect a vote will also be close, and I’d be concerned about how we’d handle a split vote over something like this. Maybe that’s just my experience of Brexit speaking…

I’ve said it “might be inappropriate” for me to be PEP delegate as well as one of the authors. I honestly don’t know if it would (this is only a PEP, after all, why would I bother rigging the result?) But I am willing to do it if necessary, as long as there is no-one who objects. I think the main points of contention have been the ones raised here, and the PEP authors haven’t had a unified voice on them anyway, so I doubt that me being a PEP author is any more of a problem in this specific case than me participating in a PEP discussion would be in the normal course of events. But I repeat - if even one person objects, I’ll step down. I’m not looking to push my preferred option here, and I don’t want to give anyone the impression that I am.

ofek · August 11, 2020, 11:00pm

Just FYI I’m really close to finishing the Hatch rewrite (v1.0.0rc1) and it implements this PEP, including dependencies / optional-dependencies as PEP 508 specifiers (for now). Thank you all for a standard way to define metadata.

finswimmer · August 12, 2020, 4:59am

Let me reiterate the most common arguments for using PEP508 and my opinion about them.

PEP 508 already defines how dependencies should be defined.

That’s correct. But PEP 508 was written with use cases in mind where no flexible data structure can be used, like a CLI argument or a text-only file. There it makes perfect sense. Now we are using TOML, which gives us flexibility. No one needs to take care of the order of arguments and separators. Furthermore, if PEP 508 is chosen to describe the dependency, we are mixing two different syntaxes, which is bad design and will confuse new users especially.

PEP 508 is easy to read, write and validate.

As others have pointed out, whether it’s easy to read and write is subjective. I would say it’s acceptable to read, but hard to write for beginners. IMO we should choose a way that’s also easy to write for beginners, and if you have ever used a dictionary in Python, then this is most likely the exploded table format.

People who say that a PEP 508 DSL can be validated easily are mixing up “easy” and “short”. Yes, one can write a regex and validate the dependency description in one go. But regexes are always error-prone and you need a lot of practice to read and write them properly, whereas checking whether a specific key is present and extracting its value can be done by anyone.
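To make the contrast concrete, here is a minimal sketch of both validation styles. The regex covers only a simplified subset (a name plus version clauses), not the full PEP 508 grammar, and the key set for the exploded entry is an assumption modelled on the Poetry-style examples in this thread, not anything defined by PEP 621.

```python
import re

# One version clause such as ">=2.0" or "==1.*" (simplified).
CLAUSE = r"(?:~=|==|!=|<=|>=|<|>)\s*[A-Za-z0-9.*+!]+"
# A project name: letters, digits, ".", "_" and "-", alphanumeric at both ends.
NAME = r"[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?"

# Deliberately simplified: no extras, environment markers, or URLs.
SIMPLE_REQ = re.compile(
    r"^\s*" + NAME + r"\s*(?:" + CLAUSE + r"(?:\s*,\s*" + CLAUSE + r")*)?\s*$"
)


def validate_string(req: str) -> bool:
    """Check a requirement string against the simplified pattern."""
    return SIMPLE_REQ.match(req) is not None


# Hypothetical set of keys allowed in an exploded table entry.
ALLOWED_KEYS = {"version", "path", "file", "git", "rev", "url"}


def validate_table(entry: dict) -> bool:
    """Check that an entry is non-empty and uses only known keys with string values."""
    return (
        bool(entry)
        and set(entry) <= ALLOWED_KEYS
        and all(isinstance(value, str) for value in entry.values())
    )
```

Neither function says anything about which format is better; the sketch only shows what each kind of check looks like in practice.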

I don’t want to learn yet another language.

I find this argument a bit funny. If one decides to describe the metadata according to PEP 621, they already have to use TOML.

uranusjr · August 12, 2020, 6:54am

I just thought of a potential issue with the exploded TOML table format. We are already using PEP 508 for build-system.requires, so if we invent an exploded TOML format for dependencies et al., we’ll need to allow that format in build-system as well to avoid user confusion. Which means that we either have to allow both formats, or deal with a very long deprecation tail, since there’s already a significant amount of PEP 517 documentation out there.
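A rough sketch of the situation described here: one file expressing the same concept in two syntaxes. The dictionary stands in for what a TOML parser might return; the exploded layout under `project.dependencies` and the `to_pep508` helper are hypothetical, made up for illustration and not taken from any PEP.

```python
# Parsed pyproject.toml contents, written out as a plain dict.
pyproject = {
    # PEP 518 already stores build requirements as PEP 508 strings.
    "build-system": {"requires": ["setuptools>=40.8.0", "wheel"]},
    # Hypothetical exploded form for runtime dependencies.
    "project": {
        "dependencies": {
            "requests": {"version": ">=2.0"},
            "foo": {"url": "https://example.com/foo.whl"},
        }
    },
}


def to_pep508(name: str, entry: dict) -> str:
    """Render a hypothetical exploded entry as a PEP 508 string."""
    if "url" in entry:
        return f"{name} @ {entry['url']}"
    return f"{name}{entry.get('version', '')}"


build_requires = pyproject["build-system"]["requires"]
runtime_requires = [
    to_pep508(name, entry)
    for name, entry in pyproject["project"]["dependencies"].items()
]

print(build_requires)
print(runtime_requires)
```

Both lists end up holding the same kind of data, which is the consistency concern: a reader of the file would meet two notations for one idea.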

Personally, I still think exploded TOML is better than PEP 508 in a pure technical sense. But the more practical approach here would be to use PEP 508 (or allow both formats), given the existing usage in pyproject.toml. I find it sad (I really really dislike writing PEP 508 strings in TOML, especially in an array), but that’s life.

sdispater · August 12, 2020, 8:53am

So, after the PyPA somewhat endorsed Pipenv, and the Pipfile format, which uses the exploded TOML table format, and now that Poetry, which uses it too, is gaining traction, and after having taken into account that most languages out there use an exploded form for their dependencies – making it a logical choice for a new standard – we are still ready to tell users that we are backpedaling on this because the status quo wins. It does not feel like a strong argument in favor of PEP 508 to be honest.

bernatgabor · August 12, 2020, 9:30am

This is a false perception. Neither the pipenv project nor the Pipfile format was endorsed at any point or level. There’s no “somewhat endorsed” status, only endorsed and not endorsed. Both of those were experiments carried out with the help of some PyPA members to see if it could work and how it would work. The fact that it did not translate into an accepted PEP and that pipenv struggles to attract a full-time maintainer showcases the difficulty of that experiment. And again, same as with poetry, I don’t think users pick up pipenv for its exploded dependency format. Or at least we should collect data on this before we make such a statement.

sdispater · August 12, 2020, 9:39am

It does not feel like an experiment to me, and it is the recommended tool for managing dependencies here: Managing Application Dependencies - Python Packaging User Guide, so it feels like an endorsement.

And I’ll reiterate, where do we want to fit in in the overall programming landscape? Do we want to keep doing our own thing or do we want to build something that people would be familiar with?

steve.dower · August 12, 2020, 1:38pm

The very easy rebuttal to this point is that that page describes application dependencies and PEP 621 is for library dependencies.

This does seem like a “game over” kind of point. Thanks for realising (we probably should have noticed it earlier…)

I’m withdrawing my proposal of deferring a decision here and saying firmly that we should go with PEP 508 style.

I’m also totally okay with Paul declaring on this proposal despite being a contributor. I’d offer to do it myself, but that would (a) take longer (I’ve only really engaged with the controversial parts) and (b) would probably be less popular, given how strongly I state my positions during discussions when I don’t feel at risk of being called upon to decide.

sdispater · August 12, 2020, 1:50pm

I’m sorry but PEP 621 describes a project metadata and I don’t see any mention of “library”.

pf_moore · August 12, 2020, 3:10pm

As I’m sure you’re aware, packaging terminology around “project” vs “distribution” vs “library” vs “application” is fairly fuzzy, almost entirely for historical reasons combined with a lack of commonly-available words to express nuances.

If you want to insist, I’d be fine with PEP 621 clarifying that it’s focused on pyproject.toml, and as such is not intended to be used for “applications” in the sense of that page (I don’t know how to word such a clarification, as I don’t really understand why you’d think otherwise, but that’s a minor detail). To be blunt, though, I think you’re stretching pretty hard here, and you’d make your point better if you dropped this insistence that having pipenv mentioned in the packaging guide implies that the exploded format is somehow therefore endorsed or superior.

I don’t think you’re going to sway people by arguing that poetry uses expanded format, poetry is popular, therefore expanded format is good. My feeling is that poetry is gaining popularity in spite of “doing its own thing” over configuration formats. People accept its approaches because it fulfils an important need that isn’t otherwise well satisfied, but IMO that says very little about the (relatively superficial) matter of its configuration format.

I think we should let the matter of “users prefer format X” drop. There’s no practically useful evidence on either side, and I’d much rather we focus on information that will help inform the decision that will ultimately have to be made here.

dstufft · August 12, 2020, 3:14pm

I think it’s fine if you do it yourself, but I’m also happy to take it on if you or others would feel more comfortable that way.

uranusjr · August 12, 2020, 3:34pm

To be clear, IMO the game over choice is to only support exploded TOML, not the exploded format entirely.

Now is literally the only chance we can “ask the people” about their preference. Once one of the formats is picked exclusively, there is no turning back. And who knows, maybe people actually think it is a good idea to use different formats in build-system.requires and dependencies. Maybe they won’t even notice they are expressing the same concept and just accept the different syntaxes!

My personal preference would be to provisionally accept both formats, and see which one catches on.

sdispater · August 12, 2020, 3:47pm

So, we will specify a new standard that will only apply to a subset of Python projects, confusing users even more about which files they need for their project? I must admit that I am confused here.

I am not stretching but merely pointing out a contradiction about dependency specification format that can lead to confusion.

And yet, there are a few reasons why the expanded format was chosen for Poetry, that I mentioned before, and that have helped its development, features and extensibility.

And yet, this is the heart of the matter. People are used to an expanded format which is present in most programming languages, and particularly recent ones (Rust/Cargo, Ruby/Bundler or gel, Dart/pub, Elixir/mix, Java/maven, Crystal/shards), and there must be a reason for that, maybe it’s user-friendliness, readability or programmatic manipulation. By sticking to a custom DSL we’re just going against the tide without a real reason other than: “it’s the historic way”. I find that disappointing from an end user standpoint.

pf_moore · August 12, 2020, 4:47pm

If not all projects adopt the new standard, then yes. That’s unfortunate, but possible. I’d like it if projects that participated in this discussion ultimately accepted that the consensus decision was fair, and adopted the new standard, regardless of their personal stance, but we can’t make that happen. And we can’t choose a new standard that avoids this issue, because as you have repeatedly pointed out, we already don’t have a common format. Some projects, such as setuptools, use PEP 508, some, like Poetry, use an exploded format.

OK. Consider it noted. I think it’s probably been pointed out enough by now.

And they have been noted. What you can’t do, is assume that those reasons will immediately close down the argument in favour of the expanded format. The nature of a consensus discussion is that people should hear the arguments put forward, and make a decision knowing those arguments. Not that they can’t have different opinions.

If you’re concerned that the reasons the expanded format was chosen for poetry have been lost in the debate, I invite you to post them again, as a simple list of points, so that they are recorded explicitly and clearly. They shouldn’t need re-debating, but recording them explicitly seems fair. If I end up being PEP delegate, I assure you they will be considered (that would be the case even if I had to re-read the full thread, which I’d expect to do, but I do understand the fear that your arguments may have got “lost in the noise”).

But it’s not actionable. We have (I believe adequately) demonstrated that nobody knows what users prefer. We all have opinions, and they differ. We can go round the same cycle endlessly, but it won’t bring us any closer to making a decision.

I’m not saying that we should ignore the question. Just that we should not continue to rehash points that have already been made.

If you genuinely think that, then I politely suggest that you should re-read this thread. Many, many reasons have been given for the PEP 508 format being used. You don’t agree with them - I understand that. But please don’t dismiss the points others have made as “no real reason”, particularly when you’re expressing concern that your arguments haven’t been heard…

Your point is good, but I’m afraid that we could just be leaving ourselves with the same decision to be made later if we do this, and making the PEP provisional will deter people from adopting it as they “prefer to wait until it’s final”. So if we want to avoid this, how do you (or anyone else) propose that we collect data during the “provisional” phase to decide which one “caught on”? User feedback? Notoriously difficult to get, or to ensure it’s representative. What tools adopt? Which tools would you propose? How do we cater for questions of developer resources (e.g., if pipenv didn’t adopt PEP 508 format because they have so little resource that they never got round to it)? How would this be any different than picking a set of tools and allowing their maintainers to vote?

I’d be far more willing to accept the idea of provisionally allowing both if I knew what the proposal was for establishing how to finalise things.

steve.dower · August 12, 2020, 4:51pm

Modifying PEP 518 is not in scope here, I don’t think. Which means we can either choose to have consistent requirements formats within the same file, or contradictory formats. (By “game over” I meant to imply that there’s no further point in discussing it, because the answer is obviously consistency.)

pganssle · August 12, 2020, 5:17pm

As I’ve detailed elsewhere in the thread, while I wish a compromise like this could work, I think this is worse than either option. People will be confused by what is appropriate when and there’s a very good chance that what will “catch on” is whatever is most prominently documented in the tool or template users use. That would hardly be a good referendum on which is preferable, especially since — to the extent that it isn’t just a question of which documentation is more popular — this is also biased towards what is easier to write than what is easier to read, and we haven’t even really come to a consensus as to whether we’re optimizing for reading or writing, beginners or advanced users, edge cases or common cases.

I think there’s a good chance that the opinions people have about their preferences with regards to the exact format are very weak compared to dozens of other factors in play. I think we have a lot of evidence that to the extent that “the users” prefer one or the other, it is not a major decision-making axis.

sdispater · August 12, 2020, 6:09pm

Ok, I’ll write them down again with some examples:

Explicitness/readability: in the expanded form, at least the one supported by Poetry, every element is clearly named so it’s easier to make the distinction between a directory, a file, a git or a url dependency.

foo @ bar/foo
foo @ bar/foo.whl
foo @ git+https://example.com/foo.git@sha1
foo @ https://example.com/foo.whl

in TOML you could have

foo = {path = "bar/foo"}
foo = {file = "bar/foo.whl"}
foo = {git = "https://example.com/foo.git", rev="sha1"}
foo = {url = "https://example.com/foo.whl"}

As a readability bonus you get syntax highlighting which you don’t have with PEP-508.

Programmatic manipulation and reading: with the existing tooling there is no way to know immediately if a dependency is a directory, a file, a git repository or a url.

>>> from packaging.requirements import Requirement

>>> req = Requirement("foo @ git+https://example.com/foo.git@sha1")
# You just have a url attribute that does not tell you anything
# unless you parse it
>>> req.url
'git+https://example.com/foo.git@sha1'

With a TOML format you get

>>> import toml

>>> req = toml.loads('foo = {git = "https://example.com/foo.git", rev="sha1"}')
# req is a standard dict with all the information needed
>>> req
{'foo': {'git': 'https://example.com/foo.git', 'rev': 'sha1'}}
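For comparison, the “unless you parse it” step for the PEP 508 form could look roughly like the sketch below. `classify_reference` is a hypothetical helper, and its rules are heuristics derived only from the four example references earlier in this post; real tools may well decide differently.

```python
from urllib.parse import urlparse


def classify_reference(url: str) -> str:
    """Guess what kind of dependency a direct reference points at (heuristic)."""
    parsed = urlparse(url)
    if parsed.scheme.startswith("git+"):
        return "git"
    if parsed.scheme in ("http", "https"):
        return "url"
    # Anything else is treated as local: archives are files,
    # everything else is assumed to be a directory.
    if parsed.path.endswith((".whl", ".tar.gz", ".zip")):
        return "file"
    return "directory"


for ref in (
    "bar/foo",
    "bar/foo.whl",
    "git+https://example.com/foo.git@sha1",
    "https://example.com/foo.whl",
):
    print(ref, "->", classify_reference(ref))
```

With the exploded form this classification is unnecessary, because the key name (`path`, `file`, `git`, `url`) already carries the answer.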

Consistency with what exists in other popular languages: this one has been missed a lot (even by you in your response) while I think it’s one that should be analyzed in more detail. There must be a valid reason, and not a simple coincidence, why so many languages, tools and communities have settled on an exploded format. I would like to think it’s due to some of the points I already made but there may be other reasons that I am not aware of. But in the end, we, and the users, would gain a lot by incorporating this “standard” into Python.

pf_moore · August 12, 2020, 6:18pm

Thank you. All of those points are noted.

I will respond to this one specifically. I haven’t missed it, I just don’t know what I’m expected to do with it. Yes, some other languages use an exploded format (I’m not familiar with enough language ecosystems to know how high a proportion, but I’ll concede that it’s a non-trivial number). But you say “should be analyzed in more detail” - fine, but no-one is doing that analysis, and I definitely don’t think that blindly copying what other people do “just because they do it” is any better as a justification than many of the other arguments you have objected to. So, in the same way that I challenged the “support both” proposal, I’ll challenge this - who is going to do that analysis, and when can we expect the answers?