Split tags into a separate package

uranusjr · August 19, 2020, 7:31am

I feel wheel is a special case here. It having dependencies wouldn’t be an issue at all in a perfect world where every package uses PEP 517 and is built in isolated environments, but alas, we’re stuck with people requiring it in their runtime environments.

We need to be careful not to fall into the XY problem. The focus should be on reducing the dependencies of wheel so it works better in the imperfect world, not on reducing the dependencies of packaging because wheel depends on it. I am personally supportive of dropping the pyparsing dependency from packaging because I never liked it (as I said earlier), but that should not be treated as the solution, since it is neither the actual solution to the problem nor achievable short-term.

bernatgabor · August 19, 2020, 7:32am

While I agree that packaging may be more lightweight, are there any reasons why we do not want tags to be in its own package? It feels to me it would be a very low maintenance overhead, because it implements a PEP and needs little extension unless a new PEP is submitted. My reasons for splitting it out are speed and resources. Even if someone drops pyparsing as a dependency, wheel would probably need to pull in at least 5X more resources (disk space + network traffic) than needed (and this space is spent for every virtual environment created going forward). And for those arguing we’re going the way of left-pad, I think that’s not the case; we are far from that. Tags is a few hundred lines, and implements a well-defined standard.

pradyunsg · August 19, 2020, 7:33am

Request: Let’s move the virtualenv discussion over to “What makes bootstrapping virtualenvs difficult?”.

bernatgabor · August 19, 2020, 7:37am

My comment above has nothing to do with virtualenv. It’s actually aiming to bring the discussion back to the initial point: tag parsing is simple/complicated enough to live on its own, and we should split it out from packaging. Along the way we identified that packaging can drop pyparsing/six, but that’s IMHO off-topic here.
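For a sense of scale, here is a minimal sketch of the string-handling half of the problem: expanding a compressed tag set as described in PEP 425. It uses only the standard library and is not the packaging.tags implementation; `expand_tag` and the `Tag` alias are hypothetical names chosen for illustration.

```python
from itertools import product
from typing import FrozenSet, Tuple

# Hypothetical alias: (interpreter, abi, platform)
Tag = Tuple[str, str, str]


def expand_tag(tag: str) -> FrozenSet[Tag]:
    """Expand a PEP 425 compressed tag set into individual triples.

    Raises ValueError if the string does not have exactly three
    dash-separated components.
    """
    interpreters, abis, platforms = tag.split("-")
    return frozenset(
        product(interpreters.split("."), abis.split("."), platforms.split("."))
    )


print(sorted(expand_tag("py2.py3-none-any")))
# [('py2', 'none', 'any'), ('py3', 'none', 'any')]
```

This covers only splitting a tag string; working out which tags a given interpreter supports is a separate and larger question.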

pf_moore · August 19, 2020, 8:17am

The same could be said of specifiers, versions or markers. They are all things that implement a PEP, they are all stable, they are all too fiddly to do “by hand”. The argument for splitting out packaging.tags, when divorced from the question about wheel, is simply the debate about how granular packages should be.

Personally, I think Python gets this mostly right. I don’t want to end up in a situation where library APIs are so tiny that virtually every function is a separate dependency to manage. And in particular, I think packaging is a very reasonable size at the moment (if we ignore the point about wheel).

pradyunsg · August 19, 2020, 8:54am

The argument for splitting out packaging.tags, when divorced from the question about wheel, is simply the debate about how granular packages should be.

And, in case it isn’t clear yet, I don’t think packaging should be broken up into smaller packages.

brettcannon · August 19, 2020, 7:03pm

As the creator of packaging.tags, and thus as someone very likely to get pulled into helping maintain it as a separate project, the reason is overhead. We all know that keeping a project running has costs (cutting releases, etc.), so it isn’t free no matter how minuscule it is. And as of right now the packaging team is big enough to make keeping packaging.tags up-to-date not too bad. But break it out on its own and I don’t know if that “ease” will continue.

The interface is in a PEP, but actually inferring what those tags should be is very much not specified. It took me months to figure it all out and to make sure it made sense (hence why I found so many tags that e.g. PyPy was getting left out of).
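To illustrate the gap between the specified format and the unspecified inference, here is a deliberately naive sketch that composes one tag for the running interpreter using only the standard library. The function names are hypothetical, and the assumptions (a CPython-style `cp` prefix, an ABI tag equal to the interpreter tag, no manylinux/macOS/abi3 handling, no ordered list of compatible tags) are exactly the parts a real implementation cannot assume.

```python
import sys
import sysconfig


def naive_interpreter_tag() -> str:
    # Assumption: CPython. Other implementations use other prefixes.
    return "cp{}{}".format(*sys.version_info[:2])


def naive_platform_tag() -> str:
    # Normalize the platform string so it contains no "-" or ".".
    return sysconfig.get_platform().replace("-", "_").replace(".", "_")


def naive_tag() -> str:
    interp = naive_interpreter_tag()
    # Assumption: the ABI tag matches the interpreter tag, which is
    # not true in general (e.g. abi3 or "none").
    return "-".join((interp, interp, naive_platform_tag()))


print(naive_tag())
```

The output is a single best guess for one platform, whereas installers need the full prioritized set of tags an environment can accept.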

And looking at all of the packaging.tags issues shows me that the issue rate is not zero and has been pretty consistent since it started being used by other tools.

As a microcosm, sure. But what about the total dependencies pulled in by the typical (or even small) project? Is this still going to be the biggest contributor to disk space and network usage? I honestly think putting effort into “Improving wheel compression by nesting data as a second .zip” and “Making the wheel format more flexible (for better compression/speed)” would lead to greater gains than what we are discussing here (and I’m saying that as the person who sparked those discussions precisely to try and help with disk and network utilization).

I would be supportive of such a PR in this specific case.

dstufft · August 19, 2020, 7:08pm

As the person who wrote the PEP 508 parser, I don’t think it matters if we use pyparsing or not. I used it because I felt it made implementing and maintaining the parser easier, but that came with the trade-off of having a dependency. I still think the readability of the parser is important (and honestly, I’ve wanted to switch the version regex to a parser before too), but if folks feel the trade-offs aren’t worth it, then that’s OK.

bernatgabor · August 20, 2020, 10:10am

My main worry is my microcosm: as maintainer of virtualenv, I look at this from the POV of virtualenv. And for virtualenv, as detailed above, pulling in pyparsing+packaging for every run is non-negligible. But given people don’t feel that bad about virtualenv getting slower, I guess we’ll leave things as they are for now.

pf_moore · August 20, 2020, 11:55am

I’m not sure that’s a correct characterisation. People do dislike the idea of virtualenv getting slower (I do, for a start!). The problem is that there are trade-offs here, and people have different priorities. Personally, I’m less convinced that wheel should be installed in every virtualenv, because I tend to not care so much about building from source. But I understand why others feel that’s important.

If you’re saying that you don’t plan on worrying about, or trying to reduce, the performance cost of the extra dependencies, then that’s fine - that’s your call as virtualenv maintainer. But I don’t think it’s fair to imply that virtualenv creation got slower because the packaging maintainers weren’t willing to split tags out into an independent package.

bernatgabor · August 20, 2020, 1:31pm

Compared to the status quo, the only performance regression expected is wheel taking on extra dependencies via packaging (much of which, space-wise, it does not need), so I’m not sure what’s fair or not. There’s no implying here, this is fact. That being said, wheel has the prerogative to do so, and virtualenv has no choice but to pay those performance hits as a consequence. The idea of both the wheel and virtualenv maintainers was to split tags out of packaging to ease this impact. The maintainers of packaging voted against it (which, again, they have the right to do). Not sure who is to be blamed here, but it is where we stand.

brettcannon · August 20, 2020, 9:17pm

I don’t think “blame” is a fair term to be using here as it makes it out like someone is purposefully trying to make this a bad situation. It’s all just an unfortunate collision between legitimate reasons: virtualenv’s semantics, wheel trying to minimize how much code they maintain, and the ‘packaging’ team not wanting to set up another project to have to maintain.

bernatgabor · August 20, 2020, 10:36pm

I agree, blame was a bad word choice. The regression for end users in performance behaviour still remains though.

steve.dower · August 20, 2020, 11:55pm

Maybe add an extra that omits (via an environment marker) the pyparsing requirement?

Or include the pyparsing requirement through a default extra that wheel can then exclude when it adds its dependency?