Twitter thread re: Big Picture

I’m a bit lost: what are we supposed to discuss?

The idea or hope was for people participating in the Twitter discussion (at least some of them) to continue their discussion here. It’s very hard to follow the discussion on Twitter, and important ideas IMO were being expressed. But if they don’t do so, then I’m not sure the discussion would be worth having. (I do encourage you to try to read the thread, though, if you can track it all down.)

(And yes, I did tell the people on Twitter about this post.)

1 Like

Some discussion started on Structured, Exchangeable lock file format (requirements.txt 2.0?)

Discussions are difficult to move when people keep replying in the original place, and I actually quite enjoyed the rapid pace of Twitter for this one (though nothing being discussed was new to me).

It’s probably best to pick the topic that’s worth following up on and start that thread. For packaging, I’m not sure what that is right now. Maybe a vision doc and definition of target audience for the things that PyPA does, and an honest evaluation of which tools currently satisfy that audience (in two clear steps; one thing that came up was that we need a lot of people’s input on who the target audience is).

This may also not make much sense until the same thing is done for Python itself. It sounded like at least one council member (Carol) wants to do this, so it may happen.

That’s because you were not automatically included in every reply. :stuck_out_tongue_closed_eyes:

2 Likes

Really? There was more that I didn’t see? I’m sure I had 99+ notifications (certainly saw 30+ more than three times)

For those (like me) trying to keep up with that Twitter thread, I’d suggest https://github.com/paulgb/Treeverse which helps make the heavily threaded nature of the discussion easier to visualize and read.

FWIW, I don’t want to shift the discussion here anymore. I think it’s a good idea to link to that thread from here (visibility, discoverability etc) and vice versa. We should probably note down any actionable items from the Twitter discussion into a single place and this thread can serve for that.

1 Like

One of the things I learned from that thread is that there are people and groups of people (companies, etc) that should be talking to one another but aren’t – to the larger detriment of the Python community and ecosystem. For example, here is one comment Nathaniel made last night:

this libstdc++ thing sounds way worse than I realized. Are they literally vendoring their own libstdc++? And then also doing something to break all the encapsulation stuff we do? I thought they were still running auditwheel to get isolation, despite using the wrong toolchain

which led to this Warehouse issue (#5420) being filed.

Where should those discussions that need to be happening take place? Twitter? Among individuals rather than in a more open forum? I know we can’t force people to talk about things on Discourse, but I thought this was the forum we set up for that purpose?

Independent of that, can someone with a good understanding of the issue give a bird’s eye summary of what the larger issue is?

Like, here is one comment from Peter Wang (Co-Founder & CTO of Anaconda) that drew my eye:

I don’t have any silver bullets to suggest. I’m merely cautioning against “good enough incrementalism” because I don’t think it’s enough to get us across this adaptive valley, at the speed that our users need.
Maybe it’s a time for a long, hard look at Python’s module/pkg system.

Incidentally, Peter was also a candidate for the Steering Council.

His comment was both a bit concerning to me and mysterious, because I didn’t know what he was getting at in the last sentence.

1 Like

Amusing, at a time when people are promoting the supposed improvement of linear discussions (Discourse-like) over traditional threaded discussions.

@pzwang You’re being mentioned here :wink:

1 Like
Click to see a tree visualization of the twitter thread, from about an hour ago.

Yea, I had a similar take-away from this. This is a conversation to be had – how to better communicate with folks external to PyPA about Python Packaging. I remember there were some discussions regarding this elsewhere but I’ll avoid digression here.

This category was set up as an equivalent to distutils-sig, so the scope of it is the same as that; though I don’t think we have as many people who have adopted this vs distutils-sig. FWIW, I think we’ve definitely had much more discussion here than on the mailing list since we added this category.

I prefer linear discussion formats too but that’s a discussion for another category. :stuck_out_tongue:

I think “engage” might be a better word. The issue seems a bit broader than improving how PyPA communicates because it seems like the communication isn’t even happening in the first place, because of silos, different priorities, etc. For example, @njs mentioned on Twitter that some promising conversations had started at PyCon, but stopped after that. Or if there is communication, it’s just one way from PyPA outwards (through its PEPs, release notes, and deprecation warnings, etc) rather than bidirectional. I agree this would be worth spinning out its own topic: how to engage the wider community and stakeholders and the proper forum for that, etc.

Edit: Actually, there are some places where sustained bidirectional communication is happening with a broad group. The various issue trackers are one example. But I’m not sure to what extent major stakeholders participate there.

1 Like

When we communicate with many Python users, we need to follow hundreds of messages.
But it is difficult to follow hundreds of messages on Twitter or a mailing list.

Personally speaking, I like the Reddit UI. For example, there is a thread about PEP 582.

We can fold a subthread when we are not interested in it.
We can upvote messages that include something important.
We can downvote messages that are based on a misunderstanding.
And when there are many downvotes, the subthread is folded by default.

Another example, which has 100+ comments.

The nested comment UI of Reddit, HackerNews, etc. is tailored for threads that the majority of the participants don’t read in whole. Folding hides irrelevant messages (whether decided by the user or the system), and votes prioritise “good” messages so people can choose to read only top ones.

Discourse works more like mailing lists, and is designed more for users who tend to read every message in a thread. In that case, folding is not that useful, since the user reads the conversation as it happens (instead of all at once a while after it took place), and neither is ordering by relevance, since everything is going to be read anyway (and ordering by time at least gives you extra context between comments).

Both designs are useful for their intended uses, as evidenced by their comparable popularity, but for the specific goal of this forum, Discourse is the correct choice IMO.

1 Like

Thanks for posting the Twitter thread here. I, for one, am still not used to the discuss.python.org site and don’t frequent it enough. I suspect Peter is in the same position, though I don’t work with Peter directly anymore at Anaconda, and so am not entirely certain.

I do know that we have interacted with many users of Python over several decades and have struggled through many packaging difficulties before developing conda as a general-purpose packaging solution (that we then used to package Python and R).

My main recommendation is that the PyPA should stop trying to make pip into a general package manager, which it must ultimately be if it is to support all uses of Python. The PyPA should re-emphasize a more limited scope for pip than what is sometimes implied by users. The limitations of pip seemed to disseminate better in the past. I recall Nick Coghlan telling me that a user of pip is a “self-integrator”. When that term is explained, it is a reasonable description of the roles and responsibilities that someone using pip is taking on.

The problem I see is that most users of pip don’t realize that they are taking on the role of self-integrator and what the real consequences of that role are. They are also not provided guidance as to when they might want to use a more general (not language-specific) package manager to install their packages. In addition, people who build wheels often do things like vendor non-Python data into the package for initial convenience (which often causes later difficulties).

There are a few concrete things the PyPA could do:

  • strongly discourage “vendoring” of other non-Python libraries — with mechanisms for pip to detect when needed libraries are not installed (thus encouraging the installation of those libraries as outside the scope of pip).
  • provide an interface for general purpose packaging managers to integrate with pip meta-data.
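
As a rough illustration of the first bullet, here is a minimal sketch of what such a detection step might look like. This is not something pip does; the `declared` list is a hypothetical stand-in for metadata that a package would somehow have to provide, and the check only uses the standard library's `ctypes.util.find_library`.

```python
import ctypes.util


def missing_native_libraries(names):
    """Return the library names that cannot be located on this system.

    `names` are short library names such as "m" or "stdc++" (no "lib"
    prefix, no file extension), as expected by ctypes.util.find_library.
    """
    return [name for name in names if ctypes.util.find_library(name) is None]


# Hypothetical declaration, used here only for illustration.
declared = ["m", "no_such_library_qwerty_12345"]
for name in missing_native_libraries(declared):
    print(f"native library {name!r} not found; "
          "install it with your system package manager")
```

A check like this only reports whether a library can be found at all. It says nothing about versions or ABI compatibility, which is where much of the real difficulty lies.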

There may be others as well. In general, users would benefit from a more limited scope for “pip install” and the subsequent interfaces and solutions and recommendations that would emerge.

The main problem I see is that because the PyPA has such visibility in the world, its recommendations define what the majority of people do, often to their own detriment, as in the case of TensorFlow telling people to pip install wheels that “vendor” and install very particularly built binary libraries which cause problems with other packages the user then tries to install.

The conda-forge community can provide many more examples of challenges that ultimately cannot be solved unless pip takes on the role of a general-purpose packaging solution rather than a language-specific package manager.

7 Likes

I wonder about that too. I agree that conda is good but language-specific packaging tools are not going away. It is very convenient to pretend that Python is the operating system for a while. You have some function, it depends on some other packages. Then maybe you add a command line interface, or a Unix server process, and before you know it your little library is partly a library and partly a useful application. At this point the workflow breaks down. There is about a 0% chance you will make the leap from just publishing to pypi to also start publishing RPMs for the subset who care about the application bits of your package. The trouble is that it is both an application and a library, without ever having been declared one or the other, so it doesn’t fit.
It would be nice if it was easier to express the library and application natures of your repository and have pip deal with one and conda with the other.

This part to me says we need to find ways to reduce the friction in (a) recognising that you’ve hit this situation, and (b) actually producing the packages that are more easily distributed. And there’s no reason why (b) has to be complicated, it should just require per-distro tools that can convert a set of requirements and entry points into a distro-specific package (see BeeWare’s briefcase project, for example).
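
To make (b) a little more concrete, here is a toy sketch of the kind of conversion I mean. It is not a real tool and does not produce a valid spec file: the function name, the `python3-` naming scheme, and the comment lines for scripts are all assumptions made for illustration.

```python
def render_spec_stub(name, version, requires, console_scripts):
    """Render a simplified, RPM-spec-like stub from wheel-style metadata.

    requires: iterable of bare distribution names (no version specifiers).
    console_scripts: mapping of command name -> "module:function" target.
    """
    lines = [f"Name: python3-{name}", f"Version: {version}"]
    # Assumed naming scheme: each Python requirement maps to python3-<name>.
    lines += [f"Requires: python3-{req}" for req in sorted(requires)]
    # Entry points are only recorded as comments in this sketch.
    lines += [f"# console script: {cmd} = {target}"
              for cmd, target in sorted(console_scripts.items())]
    return "\n".join(lines)


print(render_spec_stub("demo", "1.0", ["requests", "attrs"],
                       {"demo": "demo.cli:main"}))
```

In practice the inputs could be read from an installed distribution or a built wheel, and version specifiers and environment markers would need real handling.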

1 Like

Since PyPI/wheel, conda, and RPM solve different problems, there are different things you need to know about to be a good citizen in their ecosystems.
If all you need is the common thing, getting the bits on users’ disks, then conversion tools work pretty well!
But there’s more to a good RPM package. Requirements are the easy part. Then there are build requirements, which are a bit harder (at least before PEP 517’s “build-backend”). Then there are things like how to run tests (requirements & invocation), or license information – for those there’s a place in package metadata, but not a good standard for the contents.

How much easier can this be if we assume someone already has a valid/tested wheel/sdist? Could we say “tested externally” (for those that aren’t being published by Red Hat)? That’s no worse than PyPI, and still better in other ways.

And to be clear, I’m assuming the tools need writing or improving before we can seriously recommend them for this case. Having to restructure a project completely for each target is obviously a non-starter, but most projects seem to be starting with wheels in mind these days.

Not just the tools, the packaging metadata can improve too.
The extensible pyproject.toml is actually quite exciting: if you have a valid project for wheel/sdist, it should get you an RPM that puts the bits on disk, plus there’s a way to add the details missing for other purposes.
Then, if some of those details are useful more widely, they can be blessed by PyPA and distro packagers told to offer them upstream rather than keep them in distro-specific configuration.

Overall I think we agree that the future is bright, just need some engineering to make it happen.
On my list of priorities for Fedora, these tools are right after rebuilding everything for Python 3.8 (to get the beta tested this time), and removing Python 2 :)

2 Likes