Packaging Vision and Strategy - Next Steps

One interesting thing I noticed while reading the report: people seem to generally disagree with the statement “Python packaging deals well with edge cases and/or unique project requirements”, but also feel that “supporting a wider range of use cases” is the least impactful thing for the PSF and PyPA to focus on. I sense there’s some interesting thinking going on here, but I can’t quite identify what exactly.

1 Like

People want their problem solved, but don’t care about others and their problems.

1 Like

My reading of that was that people want us to continue focusing on the majority cases, or at least that handling edge cases isn’t a good use of our time. To me, that fits with the desire for an “official” workflow and associated tool(s) - the edge cases would be outside the “official” workflow, and hence not a priority.

Of course, that doesn’t tie in with the evidence we see (on pip, at least) where people really don’t take kindly to being told their use case is unusual or unsupported. But maybe this implies support for us just saying “no” in those cases? :slightly_smiling_face:

2 Likes

I agree with @pf_moore, that was my reading as well: that we generally cater too much to edge cases, which leads to both the majority cases and edge cases being poorly supported.

1 Like

Have we ever done the exercise of outlining what the “majority cases” are? Would it be anything we have a tutorial or guide for on packaging.python.org (assuming that page is up-to-date :wink:)?

Actually, maybe figuring out what we consider the majority cases to be and then making sure they are covered on packaging.python.org solves both the documentation issue and helps us focus (and communicate to the community) on what we consider core workflows that we will make sure work.

5 Likes

I’ve always been under the impression that the ‘flexibility’ side of the ‘complexity+flexibility’ double-edged sword of Python packaging has been one of the reasons that Python has been able to rise as far as it has, popularity-wise.

You can package darn near anything in Python, even though it may take figuring out a complicated three-step-and-a-hop process to get there… and I suspect that this has been part of what’s enabled Python to grow into its “second best programming language for every task” aphorism.

I understand the urge to move toward simplicity, and I completely agree that examining the packaging ecosystem for sharp edges and pain points that can be resolved is an important endeavor. But, I do have some concern about possibly going too far, and accidentally removing an important and hard-to-specify … “meta-feature”, I suppose … of Python packaging.

2 Likes

Related to my prior comment, I gave this a low rating because it seems to me that Python packaging already supports a huge range of use cases. (It may support them in a way that requires a lot of work from the people trying to do some unconventional packaging, but it supports them.)

1 Like

Thanks. Those are both extremely useful comments that I hadn’t considered (I can’t speak for others, obviously).

I still think it’s useful that we focus on making the “common cases” as straightforward and well supported as possible, but it’s important that in doing so we don’t prohibit the edge cases. To steal a principle from Perl, “Make the easy things easy, and the hard things possible”. I’d argue that this might well be a useful “Vision Statement”[1] for Python packaging.


  1. Sorry for the “business speak” :wink: ↩︎

6 Likes

I think this is implicit in your remarks, but probably worth stating explicitly: I would imagine this slice of respondents think of themselves as using pip, not PyPI. Some may not even know that PyPI is what is backing most or all of their pip installs.

1 Like

This is almost certainly not true for the respondents to this survey at large – over 90% of the survey responses came from users navigating via the PyPI banner.

Which is really kind of odd, when you think about it… how often do you actually go to the PyPI website in normal usage? Now and then you may look for something there, sure. I was there and saw the banner because I answer a fair number of beginner questions and like to give people a link to look at when something doesn’t work because they tried to install a package just after a new Python dropped and that package doesn’t have wheels for that Python version yet (though I’d gotten notice of the survey elsewhere, so I’m not one of the 90%). But once you’re working normally, you have your set of packages in use: you use pip to update them, or you get them from a Linux distro packaging update, or you rebuild a venv from a requirements file using pip, or you read about an interesting package by name and try it out via pip… and so on.

Does this imply the survey may be skewed by who it reached?

Excellent point. Now I’m quite curious what the distribution is of respondents’ interpretations of “using PyPI”.

That’s a great point to keep in mind: a survey is only as accurate about a given population of interest as its sample is representative of that population. If the overwhelming majority of people saw it via PyPI, the population is effectively limited to “people who browse PyPI”, which may happen to be representative of the average Python user, or may be wildly skewed (if, say, only packaging experts, or only non-scientific/ML Python users, normally browse it).

Furthermore, there’s another layer: self-selection. Presumably only a small fraction of the users who browse PyPI actually click through to the survey, and self-selection is notorious for biasing a sample toward people with certain characteristics (most likely, those with substantial interest and expertise in packaging itself, as opposed to packaging as a means to an end).
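To make that concrete, here’s a toy simulation with entirely invented numbers, purely to illustrate the mechanism: if the people most enthusiastic about packaging are even modestly more likely to respond, the measured opinion shifts noticeably away from the true population value.

```python
import random

random.seed(0)

# Invented numbers: 5% of users are packaging "enthusiasts", who both view
# packaging more favourably and are more likely to answer the survey.
population = [{"enthusiast": random.random() < 0.05} for _ in range(100_000)]
for person in population:
    person["positive"] = random.random() < (0.7 if person["enthusiast"] else 0.3)

def responds(person):
    # Self-selection: enthusiasts respond at 25%, everyone else at 5%.
    return random.random() < (0.25 if person["enthusiast"] else 0.05)

respondents = [p for p in population if responds(p)]
true_rate = sum(p["positive"] for p in population) / len(population)
sample_rate = sum(p["positive"] for p in respondents) / len(respondents)
print(f"whole population positive: {true_rate:.1%}")      # roughly 32%
print(f"survey respondents positive: {sample_rate:.1%}")  # roughly 38%
```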

Which raises the question: what is the intended population the survey was targeting?

There’s also at least some measure of geographical bias to the respondent population. It’s hard to infer accurate proportions from a map visualization, but certainly the U.S. is the country with the largest absolute respondent count, by far.

How does the distribution of survey respondents correlate with data on geographical distribution of overall Python users?

Do modes of packaging use correlate with geographical location?

I don’t know if it’s worth the time and effort to try to answer these questions, but they’re something to be aware of.

1 Like

Hi @smm, I’ve found something concerning and disappointing in this document. It shows Crimea as a part of r*ssia, effectively spreading the terrorist view of the world. This is insulting to me, as well as to any other Ukrainian who might see it.
I’m sure it’s not intentional, and given that some map providers choose to ignore internationally recognized borders, it might be easy to overlook such details.
I only hope that you can edit it to show no border between the peninsula and the rest of the country, now that it’s been noticed.

I invite you to check out https://stand-with-ukraine.pp.ua for more pointers to reputable sources.

Stand With Ukraine

2 Likes

FWIW, even at maximum zoom, I can’t tell for sure whether there’s an actual line drawn between Crimea and the rest of Ukraine, or it’s just the complex geography of the region combined with the low resolution of the map; there simply aren’t enough pixels to reliably determine exactly what it intends to show.

This isn’t helped by the Mercator projection it uses (which is considered the cartographic equivalent of Comic Sans). Mercator shrinks countries closer to the equator (particularly in the global south) while grossly inflating those toward the poles (including Russia). And because this map is a choropleth, the distortion goes beyond geography: it skews the size of each shaded area, and thus the significance the viewer perceives. If it is possible to change the map, a more appropriate equal-area projection, such as Equal Earth or Mollweide, would correct this issue.
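If the map does get regenerated, switching projections is cheap. A rough GeoPandas sketch, where “countries.geojson” and the “responses” column are placeholders for whatever the report’s underlying data actually looks like:

```python
import geopandas as gpd
import matplotlib.pyplot as plt

# Placeholder data: a country-polygon file with a per-country response count.
world = gpd.read_file("countries.geojson")

# EPSG:8857 is the Equal Earth projection. Unlike Web Mercator (EPSG:3857),
# it preserves relative areas, so shaded regions aren't visually inflated
# toward the poles.
world.to_crs(epsg=8857).plot(column="responses", legend=True)
plt.show()
```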

1 Like

I think the biggest issue with the map is that the scale is linear - there are no countries lying in the two bands between 773 and 1287 responses, and only three countries fall in the two preceding bands. I assume there are a lot of countries with single-digit responses (my own included), which you can’t distinguish from countries with hundreds of responses.
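A non-linear colour scale would fix that. A rough sketch, with the same placeholder file and column names as the GeoPandas example earlier in the thread:

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

# A logarithmic colour scale spreads the many low-count countries across
# distinguishable bands instead of collapsing them all into the bottom
# band of a linear scale.
world = gpd.read_file("countries.geojson")
world.plot(column="responses",
           norm=LogNorm(vmin=1, vmax=world["responses"].max()),
           legend=True)
plt.show()
```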

A few other oddities with the map:

  • The antimeridian forms a border in Russia’s far east.
  • Greenland is not shaded despite being part of Denmark - I don’t know if that’s because Greenland was a separate option in the questionnaire or if it’s been left unshaded by mistake.
  • The majority of island nations in the Caribbean and the Pacific are not visible on the map - these are typically represented by a ~10k sqm circle.
  • Has nobody from Taiwan completed the survey? I find that surprising.

I don’t know how Mercator distorts the data, unless by (flawed) inference: “How can a country so big have so few respondents?” But Russia’s pretty sizeable, whichever way you slice (or warp) the globe…

1 Like

Also, people may think that “edge cases and/or unique project requirements” doesn’t refer to their own cases :slight_smile:

2 Likes

It may seem so to you, but it certainly doesn’t seem so to a lot of people that have to package native libraries or CUDA-enabled libraries, for example.

3 Likes

Yes, I considered this before writing the post; a few other Ukrainian FOSS maintainers pointed this out to me too. There are indeed bits of the “big land” and the peninsula that are close, even though there’s a body of water in between. But I compared it with other map features, as well as with how some online maps show internal administrative borders, and it seems consistent with those. Other places showing the land/water division don’t seem to use such bold, dark lines. That’s why I felt it was important to flag this. It might also suggest some data-sourcing flaws (speculation on my part).

I feel like an interactive dataviz on a static website would allow presenting the data more adequately. It could also be beneficial to publish the structured stats, for a better understanding of what was collected and how it’s clustered.

This would be explained by it being populated much less densely than, for example, the US.