We have received a better than expected response to the Python Packaging Survey. The responses and key takeaways from the survey are summarized in this document.
Once the community has reviewed the feedback, I believe there is value in discussing the feedback and deciding how we want to proceed with it. Depending on the level of engagement, these discussions can lead to setting the long-term strategy for Python Packaging.
These discussions can be conducted via a virtual meeting or on Discourse.
Please indicate if you are interested in participating in these discussions and which forum you prefer. These discussions are open to maintainers and contributors of any packaging tool, PyPA or not. While they are intended only for packaging maintainers and contributors, there is no way to enforce that on Discourse.
Once we have a clear indication of the preferred forum, we will proceed with deciding the agenda.
I prefer to participate in the strategy discussions over
- One 4-hour virtual meeting
- Two to three 75–90 minute virtual meetings
- Discourse posts
- Do not want to participate
Please feel free to share your thoughts and questions below.
You can download the individual responses to the survey here. All open-ended responses have been removed to anonymise the data.
@smm Thanks for the post! And thanks especially for including the "n=8774" on page 4 and the fact that overwhelmingly people found this survey through the banner on PyPI.
It's so revealing that only about half of the respondents perceived themselves as using PyPI, even though probably a much greater proportion do use it. If the user very infrequently types the thing's name or thinks about it, they don't think of themselves as using it!
I would like a little more understanding of the key takeaway "There is a strong case for … Improving user documentation and making it more beginner friendly." Are you deriving that from the strong showing for "Helping users to better understand Python packaging" in the "What should the PSF and PyPA focus on" question? I am not sure that improving the user documentation that we directly control and making it more beginner-friendly is the most strategic way to help users better understand Python packaging; we might instead want to invest in further user experience research and design work to make our tools easier to understand and use, and outreach to the many obsolete blog posts/Stack Overflow answers/Reddit threads/etc. that people find when they search for information. If we do invest in our documentation, I think we need to think broadly about what "documentation" means, including making web-based sandboxes for people to try tools in simulation, making videos, and of course in-application error messages.
I request that the key takeaways/summary document also include the dates that the survey was open for filling out, the names of the people who performed the analysis, and the date the summary was written, since I imagine we'll be referring to this document again in years to come. And of course I hope it will be archived somewhere on wiki.python.org or www.python.org rather than only living in Google Docs.
One interesting thing I noticed while reading the report: people seem to generally disagree with the statement "Python packaging deals well with edge cases and/or unique project requirements", but also feel that "Supporting a wider range of use cases" is the least impactful thing for the PSF and PyPA to focus on. I sense there's some interesting thinking going on but can't quite identify what exactly.
My reading of that was that people want us to continue focusing on the majority cases, or at least that handling edge cases isn't a good use of our time. To me, that fits with the desire for an "official" workflow and associated tool(s) - the edge cases would be outside the "official" workflow, and hence not a priority.
Of course, that doesn't tie in with the evidence we see (on pip, at least) where people really don't take kindly to being told their use case is unusual or unsupported. But maybe this implies support for us just saying "no" in those cases?
I agree with @pf_moore; that was my reading as well: that we generally cater too much to edge cases, which leads to both the majority cases and edge cases being poorly supported.
Have we ever done the exercise of outlining what the "majority cases" are? Would it be anything we have a tutorial or guide for on packaging.python.org (assuming that page is up to date)?
Actually, maybe figuring out what we consider the majority cases to be and then making sure they are covered on packaging.python.org solves both the documentation issue and helps us focus (and communicate to the community) on what we consider core workflows that we will make sure work.
I've always been under the impression that the "flexibility" side of the "complexity + flexibility" double-edged sword of Python packaging has been one of the reasons that Python has been able to rise as far as it has, popularity-wise.
You can package darn near anything in Python, even though it may take figuring out a complicated three-step-and-a-hop process to get there… and I suspect that this has been part of what's enabled Python to grow into its "second best programming language for every task" aphorism.
I understand the urge to move toward simplicity, and I completely agree that examining the packaging ecosystem for sharp edges and pain points that can be resolved is an important endeavor. But, I do have some concern about possibly going too far, and accidentally removing an important and hard-to-specify … "meta-feature", I suppose … of Python packaging.
Related to my prior comment, I gave this a low rating because it seems to me that Python packaging already supports a huge range of use cases. (It may support them in a way that requires a lot of work from the people trying to do some unconventional packaging, but it supports them.)
Thanks. Those are both extremely useful comments that I hadn't considered (I can't speak for others, obviously).
I still think it's useful that we focus on making the "common cases" as straightforward and well supported as possible, but it's important that in doing so we don't prohibit the edge cases. To steal a principle from Perl, "Make the easy things easy, and the hard things possible". I'd argue that this might well be a useful "Vision Statement"[1] for Python packaging.
I think this is implicit in your remarks, but probably worth stating explicitly: I would imagine this slice of respondents think of themselves as using pip, not PyPI. Some may not even know that PyPI is what is backing most or all of their pip installs.
This is almost certainly not true for the respondents of this survey at large: over 90% of the survey responses came from users navigating via the PyPI banner.
Which is really kind of odd, when you think about it… how often do you actually go to the PyPI website in normal usage? Now and then you may look for something there, sure. I'm there and saw the banner because I answer a fair number of beginner questions and like to give them a link to look at when something doesn't work because they tried to install a package just after a new Python dropped and that package doesn't have wheels yet for that Python version (though I'd gotten notice of the survey elsewhere, so I'm not one of the 90%). But once you're working normally you have your set of packages in use, and you use pip to update them, or maybe you get them from a Linux distro packaging update, or you rebuild a venv from a requirements file using pip, or you read something that describes an interesting package, by name, so you try it out via pip… etc.
Does this imply the survey may be skewed by who it reached?
That's a great point to keep in mind: a survey is only as accurate about a given population of interest as its sample is representative of that population. If the overwhelming majority of people saw it via PyPI, the population is effectively limited to "people who browse PyPI", which may happen to be representative of the average Python user, or may be widely skewed (if, say, only packaging experts, or only non-scientific/ML Python users, normally browse it).
Furthermore, there's another layer: self-selection. Presumably, only a small fraction of users who browse PyPI are likely to click the survey, and self-selection is notorious for biasing a sample toward people with certain characteristics (most likely, those with substantial interest and expertise in packaging itself, as opposed to packaging as a means to an end).
Which raises the question: what is the intended population the survey was targeting?
There's also at least some measure of geographical bias in the respondent population. It's hard to infer accurate proportions from a map visualization, but certainly the U.S. is the country with the largest absolute respondent count, by far.
How does the distribution of survey respondents correlate with data on geographical distribution of overall Python users?
Do modes of packaging use correlate with geographical location?
I donât know if itâs worth the time and effort to try to answer these questions, but theyâre something to be aware of.
Hi @smm, I've found something concerning and disappointing in this document. It shows Crimea as part of r*ssia, effectively spreading the terrorist view of the world. This is insulting to me as well as to any other Ukrainian who might see it.
I'm sure it's not intentional, and given that some map providers choose to ignore internationally recognized borders, it might be easy to overlook such details.
I only hope that you can edit it to show no border between the peninsula and the rest of the country, now that it's been noticed.
FWIW, even at maximum zoom, I can't tell for sure whether there's an actual line drawn between Crimea and Ukraine or it's just the complex geography of the region combined with the low resolution of the map, which lacks enough pixels to reliably determine exactly what it intends to show.
This isn't helped by the Mercator projection it uses (which is considered the cartographic equivalent of Comic Sans). It shrinks countries closer to the equator (particularly in the global south) while grossly inflating those toward the poles (including Russia). Beyond just geography, because this map is a choropleth, this substantially distorts the data being displayed: it skews the size of the shaded areas, and thus the significance perceived by the viewer. If it is possible to change the map, a more appropriate equal-area projection should be used, such as Equal Earth or Mollweide, to correct this issue.
I think the biggest issue with the map is that the scale is linear: there are no countries lying in the two bands between 773 and 1287 responses, and only three countries fall in the two preceding bands. I assume there are a lot of countries with single-digit responses (my own included), which you are not able to distinguish from countries with hundreds of responses.
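To illustrate the point with made-up numbers (none of these counts come from the actual survey data), here is a minimal sketch of how a linear colour scale lumps small countries into the same band while a logarithmic scale separates them:

```python
import math

# Hypothetical per-country response counts, for illustration only.
counts = {"US": 2574, "DE": 640, "FR": 310, "NL": 95, "PL": 8, "EE": 3}

MAX_COUNT = 2574  # largest count in the hypothetical data
BANDS = 10        # number of colour bands on the map

def linear_band(n, max_count=MAX_COUNT, bands=BANDS):
    """Assign a count to one of `bands` equal-width bins (linear scale)."""
    width = max_count / bands
    return min(int(n // width), bands - 1)

def log_band(n, max_count=MAX_COUNT, bands=BANDS):
    """Assign a count to one of `bands` bins on a logarithmic scale."""
    if n <= 0:
        return 0
    return min(int(math.log10(n) / math.log10(max_count) * bands), bands - 1)

for country, n in counts.items():
    print(f"{country}: {n:4d} linear band {linear_band(n)}, log band {log_band(n)}")
```

On the linear scale, Estonia (3), Poland (8), and the Netherlands (95) all land in band 0 despite a 30x spread, while the log scale spreads them across distinct bands - which is the distinction the current map loses.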
A few other oddities with the map:
The antimeridian forms a border in Russiaâs far east.
Greenland is not shaded despite being part of Denmark - I don't know if that's because Greenland was a separate option in the questionnaire or if it's been left unshaded by mistake.
The majority of island nations in the Caribbean and the Pacific are not visible on the map - these are typically represented by a ~10k sqm circle.
Has nobody from Taiwan completed the survey? I find that surprising.
I don't know how Mercator distorts the data, unless by (flawed) inference: "How can a country so big have so few respondents?" But Russia's pretty sizeable, whichever way you slice (or warp) the globe…