We have received a better than expected response to the Python Packaging Survey. The responses and key takeaways from the survey are summarized in this document.
Once the community has reviewed the feedback, I believe there is value in discussing the feedback and deciding how we want to proceed with it. Depending on the level of engagement, these discussions can lead to setting the long-term strategy for Python Packaging.
These discussions can be conducted via a virtual meeting or on Discourse.
Please indicate if you are interested in participating in these discussions and which forum you prefer. These discussions are open to maintainers and contributors of any packaging tool, PyPA or not. While they are intended only for packaging maintainers and contributors, there is no way to enforce that on Discourse.
Once we have a clear indication of the preferred forum, we will proceed with deciding the agenda.
I prefer to participate in the strategy discussions over
- One 4-hour virtual meeting
- Two to three 75–90 minute virtual meetings
- Discourse posts
- Do not want to participate
Please feel free to share your thoughts and questions below.
You can download the individual responses to the survey here. All open-ended responses have been removed to anonymise the data.
@smm Thanks for the post! And thanks especially for including the "n=8774" on page 4 and the fact that overwhelmingly people found this survey through the banner on PyPI.
It's so revealing that only about half of the respondents perceived themselves as using PyPI, even though probably a much greater proportion do use it. If the user very infrequently types the thing's name or thinks about it, they don't think of themselves as using it!
I would like a little more understanding of the key takeaway "There is a strong case for … Improving user documentation and making it more beginner friendly." Are you deriving that from the strong showing for "Helping users to better understand Python packaging" in the "What should the PSF and PyPA focus on" question? I am not sure that improving the user documentation that we directly control and making it more beginner-friendly is the most strategic way to help users better understand Python packaging; we might instead want to invest in further user experience research and design work to make our tools easier to understand and use, and outreach to the many obsolete blog posts/Stack Overflow answers/Reddit threads/etc. that people find when they search for information. If we do invest in our documentation, I think we need to think broadly about what "documentation" means, including making web-based sandboxes for people to try tools in simulation, making videos, and of course in-application error messages.
I request that the key takeaways/summary document also include the dates that the survey was open for filling out, the names of the people who performed the analysis, and the date the summary was written, since I imagine we'll be referring to this document again in years to come. And of course I hope it will be archived somewhere on wiki.python.org or www.python.org rather than only living in Google Docs.
One interesting thing I noticed while reading the report: people seem to generally disagree with the statement "Python packaging deals well with edge cases and/or unique project requirements", but also feel that "Supporting a wider range of use cases" is the least impactful thing for the PSF and PyPA to focus on. I sense there's some interesting thinking going on but can't quite identify what exactly.
My reading of that was that people want us to continue focusing on the majority cases, or at least that handling edge cases isn't a good use of our time. To me, that fits with the desire for an "official" workflow and associated tool(s) - the edge cases would be outside the "official" workflow, and hence not a priority.
Of course, that doesn't tie in with the evidence we see (on pip, at least) where people really don't take kindly to being told their use case is unusual or unsupported. But maybe this implies support for us just saying "no" in those cases?
I agree with @pf_moore; that was my reading as well: that we generally cater too much to edge cases, which leads to both the majority cases and edge cases being poorly supported.
Have we ever done the exercise of outlining what the "majority cases" are? Would it be anything we have a tutorial or guide for on packaging.python.org (assuming that page is up to date)?
Actually, maybe figuring out what we consider the majority cases to be and then making sure they are covered on packaging.python.org solves both the documentation issue and helps us focus (and communicate to the community) on what we consider core workflows that we will make sure work.
I've always been under the impression that the "flexibility" side of the "complexity + flexibility" double-edged sword of Python packaging has been one of the reasons that Python has been able to rise as far as it has, popularity-wise.
You can package darn near anything in Python, even though it may take figuring out a complicated three-step-and-a-hop process to get there… and I suspect that this has been part of what's enabled Python to grow into its "second best programming language for every task" aphorism.
I understand the urge to move toward simplicity, and I completely agree that examining the packaging ecosystem for sharp edges and pain points that can be resolved is an important endeavor. But, I do have some concern about possibly going too far, and accidentally removing an important and hard-to-specify … "meta-feature", I suppose … of Python packaging.
Related to my prior comment, I gave this a low rating because it seems to me that Python packaging already supports a huge range of use cases. (It may support them in a way that requires a lot of work from the people trying to do some unconventional packaging, but it supports them.)
Thanks. Those are both extremely useful comments that I hadn't considered (I can't speak for others, obviously).
I still think it's useful that we focus on making the "common cases" as straightforward and well supported as possible, but it's important that in doing so we don't prohibit the edge cases. To steal a principle from Perl, "Make the easy things easy, and the hard things possible". I'd argue that this might well be a useful "Vision Statement"[1] for Python packaging.
I think this is implicit in your remarks, but probably worth stating explicitly: I would imagine this slice of respondents think of themselves as using pip, not PyPI. Some may not even know that PyPI is what is backing most or all of their pip installs.
This is almost certainly not true for the respondents of this survey at large: over 90% of the survey responses came from users navigating via the PyPI banner.
Which is really kind of odd, when you think about it… how often do you actually go to the PyPI website in normal usage? Now and then you may look for something there, sure. I'm there and saw the banner because I answer a fair number of beginner questions and like to give them a link to look at when something doesn't work because they tried to install a package just after a new Python dropped and that package doesn't have wheels yet for that Python version (though I'd gotten notice of the survey elsewhere, so I'm not one of the 90%). But once you're working normally you have your set of packages in use, and you use pip to update them, or maybe you get them from a Linux distro packaging update, or you rebuild a venv from a requirements file using pip, or you read something that describes an interesting package, by name, so you try it out via pip… etc.
Does this imply the survey may be skewed by who it reached?
That's a great point to keep in mind: a survey is only as accurate about a given population of interest as its sample is representative of that population. If the overwhelming majority of people saw it via PyPI, the population is effectively limited to "people who browse PyPI", which may happen to be representative of the average Python user, or may be widely skewed (if, say, only packaging experts, or only non-scientific/ML Python users, normally browse it).
Furthermore, there's another layer: self-selection. Presumably, only a small fraction of users who browse PyPI are likely to click the survey, and self-selection is notorious for biasing a sample toward people with certain characteristics (most likely, those with substantial interest and expertise in packaging itself, as opposed to packaging as a means to an end).
Which raises the question: what is the intended population the survey was targeting?
There's also at least some measure of geographical bias in the respondent population. It's hard to infer accurate proportions from a map visualization, but certainly the U.S. is the country with the largest absolute respondent count, by far.
How does the distribution of survey respondents correlate with data on geographical distribution of overall Python users?
Do modes of packaging use correlate with geographical location?
I donât know if itâs worth the time and effort to try to answer these questions, but theyâre something to be aware of.
Hi @smm, I've found something concerning and disappointing in this document. It shows Crimea as part of r*ssia, effectively spreading the terrorist view of the world. This is insulting to me as well as to any other Ukrainian who might see it.
I'm sure it's not intentional, and given that some map providers choose to ignore internationally recognized borders, it might be easy to overlook such details.
I only hope that you can edit it to show no border between the peninsula and the rest of the country, now that it's been noticed.
FWIW, even at maximum zoom, I can't tell for sure whether there's an actual line drawn between Crimea and Ukraine or it's just the complex geography of the region combined with the low resolution of the map, which lacks enough pixels to reliably determine exactly what it intends to show.
This isn't helped by the Mercator projection it uses (which is considered the cartographic equivalent of Comic Sans). It shrinks countries closer to the equator (particularly in the global south) while grossly inflating those toward the poles (including Russia). Beyond just geography, because this map is a choropleth, this substantially distorts the data being displayed: it skews the size of the shaded areas, and thus the significance perceived by the viewer. If it is possible to change the map, a more appropriate equal-area projection should be used, such as Equal Earth or Mollweide, to correct this issue.
I think the biggest issue with the map is that the scale is linear: there are no countries lying in the two bands between 773 and 1287 responses, and only three countries fall in the two preceding bands. I assume there are a lot of countries with single-digit responses (my own included), which you are not able to distinguish from countries with hundreds of responses.
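To illustrate the point with made-up numbers (none of these counts come from the actual survey data), here is a minimal sketch of how a linear colour scale lumps small countries into the same band while a logarithmic scale separates them:

```python
import math

# Hypothetical per-country response counts, for illustration only.
counts = {"US": 2574, "DE": 640, "FR": 310, "NL": 95, "PL": 8, "EE": 3}

MAX_COUNT = 2574  # largest count in the hypothetical data
BANDS = 10        # number of colour bands on the map

def linear_band(n, max_count=MAX_COUNT, bands=BANDS):
    """Assign a count to one of `bands` equal-width bins (linear scale)."""
    width = max_count / bands
    return min(int(n // width), bands - 1)

def log_band(n, max_count=MAX_COUNT, bands=BANDS):
    """Assign a count to one of `bands` bins on a logarithmic scale."""
    if n <= 0:
        return 0
    return min(int(math.log10(n) / math.log10(max_count) * bands), bands - 1)

for country, n in counts.items():
    print(f"{country}: {n:4d} linear band {linear_band(n)}, log band {log_band(n)}")
```

On the linear scale, Estonia (3), Poland (8), and the Netherlands (95) all land in band 0 despite a 30x spread, while the log scale spreads them across distinct bands - which is the distinction the current map loses.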
A few other oddities with the map:
The antimeridian forms a border in Russiaâs far east.
Greenland is not shaded despite being part of Denmark - I don't know if that's because Greenland was a separate option in the questionnaire or if it's been left unshaded by mistake.
The majority of island nations in the Caribbean and the Pacific are not visible on the map - these are typically represented by a ~10k sqm circle.
Has nobody from Taiwan completed the survey? I find that surprising.
I don't know how Mercator distorts the data, unless by (flawed) inference: "How can a country so big have so few respondents?" But Russia's pretty sizeable, whichever way you slice (or warp) the globe…