Wanting a singular packaging tool/vision

Historical tangent: I was checking if I had ever written such a thing (spoiler: I kind of did) and noticed that we’re about to hit the 10 year anniversary of my writing Incremental Plans to Improve Python Packaging — Nick Coghlan's Python Notes 1.0 documentation

We’ve come a long way from that initial incremental goal of “‘./setup.py install’ must die” :slight_smile:

The “kind of” use case enumerations that I put together were:

Folks won’t find new answers in any of those, but the framing may still be useful (especially for folks just starting to explore the problem space)

(Edited to add the 3rd article reference - I initially forgot about the use cases mentioned at the end of each section in the “27 languages…” article)

2 Likes

Thanks Nick, those are great posts!

By coincidence, on a minor “ideas” thread, I just made a comment very similar to this quote of yours:

"""
By the very nature of things, the folks that tend to be most actively involved in open source development are those folks working on open source applications and shared abstraction layers.

The folks writing ad hoc scripts or designing educational exercises for their students often won’t even think of themselves as software developers - they’re teachers, system administrators, data analysts, quants, epidemiologists, physicists, biologists, business analysts, market researchers, animators, graphical designers, etc.
"""
[mine was far less articulate]

We do need to keep that in mind – a huge portion of Python users do not have the same needs and experiences as most of the folks involved in these sorts of discussions.

Well, as one of those that have been around since before PyPI, or pip, or setuptools, or wheel, or conda, or …

Package management is pretty darn easy now :slight_smile:

Go Team!

2 Likes

My suggestion then is to go start a new topic where you outline what is on/off-topic for planning out your utopian future and go have that discussion. This topic has never said, “don’t consider the steps required to reach the outcome because we don’t want to limit our thinking,” so if that’s what you want then you should state that upfront in the new topic to keep us “old hands” out of it and from saying “no”. :wink:

And to be clear, I totally support those of you up for having that greenfield discussion to go for it! But also understand that you may reach conclusions that those of us who have been doing this for a while will find issues with, and that may make implementing those ideas difficult or impossible. It’s just one of those things where either you get to dream big and potentially have those dreams shot down, or you get input upfront at the potential cost of ideas being shut down prematurely.

Unfortunately, get used to it. Even back when things were as simple as pure Python and just zip files you downloaded from a website, people complained; folks are generally not good at giving compliments or understanding the historical context that explains why things are the way they are. Or we can all just give up on the “glue” part and only ship pure Python code. :wink:

2 Likes

In this vein, I think it’s worth mentioning some tools and projects that started as independent things long before they graduated to the central positions they occupy today:

  • the modern incarnation of PyPI started as Donald Stufft’s independent personal project. That illustration of what was possible eventually spawned the Warehouse project and the multi-year migration process to adopt it as the production PyPI implementation (as well as Donald taking over from Richard Jones as the lead PyPI architect)
  • the wheel format was originally created to solve some specific problems that Daniel Holth had in the packaging space. Fortunately for the rest of us, Daniel was willing and able to invest the time and energy needed to standardise the format and provide one of the most critical building blocks of the modern Python packaging ecosystem
  • pip existed as an independent project long before the creation of “ensurepip” blessed it as the default Python package installation tool shipped with CPython
  • the Python Packaging Authority’s name was originally intended purely as a joke. It wasn’t until much later that it became a broader group with genuine practical authority through the collective influence of its individual contributors
  • conda came about because participants in the scientific computing ecosystem needed solutions to hard platform dependency problems, given their heavy reliance on external C/C++/Fortran libraries that in turn have significant dependencies on specific CPU and GPU capabilities, and because they were able to make particular simplifying assumptions that made the problem space a little more tractable

None of this was fast, none of it was easy, and there was definitely a lot of political community wrangling involved alongside the technical aspects. The point of mentioning it is to remind folks that even the “status quo” didn’t start out as a dictated default: it’s an emergent outcome from different individuals and groups solving their specific software distribution problems in a collaborative environment.

9 Likes

I can certainly do that, sure. But although this thread never said “don’t consider the steps needed to reach the goal”, it also never said to do that, and, at least to me, looking at the first post in this thread and the post it linked to as an example, it sure seems like it was more targeted at “let’s imagine what we want the workflow to be” rather than “let’s think about what incremental steps we can take right now”.

Personally I’m not even super averse to getting the ideas shut down if that comes in the form of concrete feedback (e.g., “this cannot be done because of this reason”). Of course, as @pf_moore noted, the people who are able to provide that may simply not have the time or inclination to do so, and that’s their prerogative.

Anyway, just to take a stab at something that’s been on my mind with recent posts: I agree with what @PythonCHB said a few days ago, that the main obstacles to people putting things on conda-forge are social and not technical/administrative. Personally I’d go further and suggest that a lot of the obstacles that exist for Python packaging are just due to the difficulty of navigating the landscape of tools. What would people think about no change in tooling, and simply having more text on python.org that acknowledges that other distribution/install channels besides the pip+pypi combo already exist and are widely used? And perhaps suggests that people try them out, or offers some kind of summary of the options?

My hunch is that people will say that’s “out of scope” somehow. :slight_smile: But we already have multiple build tools listed in the Python packaging docs with no explanation of how or why someone might choose between them. Why not expand the perspective that the official docs give?

I am used to it, that’s why I use conda, and why whenever anyone expresses their frustration with this whole situation I suggest they use conda as well.

1 Like

61 posts were split to a new topic: Pip/conda compatibility

(Getting into the conversation a bit late here, been traveling last week)

But as it’s commercial, the question is going to be “would it pay for itself”? I don’t think open source volunteers can answer that. Ultimately, someone interested in creating such an offering would need to go to companies that use Python and say “would you be willing to pay us for something like this?”

This is literally one of the major products that Anaconda sells. And we’re not the only ones. There’s RedHat and ActiveState, and even JFrog and Sonatype do this to some extent.

What I will say is that the evidence I’ve seen is that very few companies are willing to pay when they can get something for free. Sorry if that is a rather cynical view, but it’s my experience

My experience over the last 20 years is that companies are absolutely willing to pay for open source. RedHat sold to IBM for $30B+, on the basis of selling RHEL to everybody on the planet. You just have to actually do all the standard functions of actual product development, packaging, marketing, selling, and support.

The thing to realize is that the software is not the bulk of the value. And that’s why simply being able to “get the software for free” is not nearly as big of a deal as coders might think. Commercial customers and users want a LOT more than just a tarball.

I am happy to share my experiences and perspectives on this. I’ve (somewhat unwillingly) been involved in the Python packaging world and OSS commercialization since 2004 - nearly two decades now.
The fact that Anaconda exists (and that Enthought and ActiveState also exist, and that RedHat exists…) is a set of data points showing that companies ABSOLUTELY pay for open source.


This leads to a deeper point, which is this: I believe that it’s entirely possible that the “volunteer-driven Python development” mode is fundamentally insufficient to cross the “adaptive valley” which I believe is now facing the Python ecosystem - and that has faced us for at least 4-5 years now. We need to make a revolutionary shift in the Python “packaging” problem space, and not merely an evolutionary one.

I am deeply moved by a combination of emotions when I read Paul’s comment:

The problem is one of resources, though, but maybe not in the way that it seems at first. The people involved in packaging are typically overwhelmed with “stuff that people want us to do”.
But I can’t respond to everything, so I have to prioritise. In doing so, I’m certain I come across as “shooting down” ideas. I hope I don’t say precisely “we don’t have the resources to do X”, but I definitely say “that’s hard, by all means look into it and come back if you can get it to work”.

I know that the volunteer community is absolutely beyond capacity. I have all the love and respect for everyone involved in this stuff, knowing full well how thankless it is. I know people have burned out on this stuff. :frowning:

Which is why I am forced to step back and ask: do we have the meta-capacity to tackle this problem? Does the PyPA believe that the scope of its remit is something that can actually be handled by a volunteer community? Or has the explosive growth of Python’s popularity over the last decade possibly created demand that outstrips what can be supported on a volunteer basis?

10 Likes

Precisely. Those are the things companies can’t get for free. So they are willing to pay for them. But not necessarily in a way that supports the underpinning of all that work, namely the development of the actual software.

Again, agreed. But the “bulk” is meaningless without the software, so this is simply a variation on being willing to pay for the nice things that you want, but not invest in the maintenance that keeps everything going.

The following is 100% a personal viewpoint. It’s not a “PyPA view”, or a “PEP-delegate pronouncement”, or related to any of the other hats I may wear. But I personally believe that the PyPA has the volunteer resources to deliver a great packaging experience to users working in the same, or similar, contexts. So, casual developers, people using Python as “glue”, people who are using Python because they want to, not because they are paid to do so. I believe people using Python “for their job” can gain a lot of benefit from that scope, but may not get everything they need (for example, no SLAs, and they might have to change workflows in ways that will need them to persuade their employers to allow).

I do not believe we have the volunteer resource to serve commercial needs. I’ve never believed that, but nowadays I think we need to make that clearer, because too much volunteer time is spent trying to find compromises that help commercial users - or simply saying “no”.

Specifically,

  • We can’t provide things like SLAs, long term support versions, training, or support for untrained users.
  • We can’t support infrastructure that isn’t commonly available to volunteer developers (private package indexes, secure proxy setups, for example).
  • We can’t support all possible workflows, and in particular, workflows peculiar to individual organisations’ needs.
  • We can’t support compliance with regulatory demands.

Again, on a purely personal basis, I don’t want to support commercial users. My love for open source stems from the fact that I enjoy helping individuals. If someone came along with huge enthusiasm for Software Bill of Materials, singing the praises of what amazing things we could do with it if only pip had feature X related to SBOM, I’d be swept along. But having someone say “I need to fill in my government paperwork and pip doesn’t let me get an SBOM because it’s missing feature X”, I’d probably tell them no - because no one cares, they just need something to get their job done, and they want me to do it for them.

“How do we support commercial users of Python packaging?” is an absolutely legitimate question to ask. And it’s one the PyPA should be looking at. But I think it’s a perfectly reasonable answer to say that such work must be handled as a funded project (and we can look at how to organise such work in the context of open source, volunteer projects, including questions like how to ensure new features are sustainable without being a drain on volunteer effort). In other words, companies should be willing to pay for things they can’t get for free :slightly_smiling_face:

Maybe we can’t have a “singular packaging tool” that delivers everything that both commercial users and individuals need, while still being sustainable with volunteer resource supplemented by funded projects. I’m OK with that - if someone wants to develop a “commercial Python packaging offering” then that’s fine by me. That’s what conda (more accurately, Anaconda) is, after all. And such a commercial offering can choose how closely they want to work with the PyPA to support interoperability, based on the needs of its customers. All of this is fine. It might mean that the PyPA should be doing more to tell commercial users “go and use Anaconda if you don’t feel that pip/PyPI and so on deliver what you want for free”. Maybe if we did that, we’d find that my cynical view of what companies are willing to pay for is more accurate than you believe :wink: Or maybe you’d prove me wrong :slightly_smiling_face:

10 Likes

Luckily, many of them do pay for it in a way that flows almost directly into NumFOCUS :wink: Many of us pay into NumFOCUS directly and indirectly, as well as the PSF, which means there are grants available for projects that need support to do the development.

I think what Peter’s getting at here is that it’s much easier to get a company to pay for things that they recognise as costing money/providing value (because generally the people with the money have no idea what it’s being used for).

Scarcity is one of the easiest ways (“you can’t have this at all unless you pay” pairs nicely with “our engineers say they need this”), but as soon as the software is no longer scarce, the “bulk of the value”[1] goes away.

The next easiest thing to sell is access. I would bet there are companies who sponsor NumFOCUS solely because it gets them into the NumFOCUS Summit or a private session with project maintainers. It’s not necessarily about gaining influence, but it’s a valuable thing that the cheque-signers recognise.

Wonder why there’s so much money for security issues right now? Because the people with money have been convinced of the risk and believe that funding work now is better than paying the cost later.

But with packaging generally, people can and do solve it (or make do) by themselves. If nothing else, the wild proliferation of build_ext subclasses proves this. So it’s much harder to frame it as an existential crisis that can be solved by writing a cheque. (Yes, I’ve had this argument with people at work many times over the years, and I assume other organisations have less sympathetic leadership than I’ve got.)

Acknowledged, though it would be great to find a “PyPA consensus” on some of the suggestions you have here. Particularly from the core dev POV, since we all look to PyPA consensus to guide our recommendations (e.g. if PyPA were to draw a line between commercial and non-commercial users, we’d probably have to strongly consider whether to include pip by default in the distributions we control, given the differences in audience - possibly even consider having separate distributions of our own for each audience!).


  1. To the people who sign cheques/checks.

1 Like

So, please don’t take this as confrontational, but what is your perspective on the intersection of what you say here and what you said in another thread (what I put in brackets is summarizing the context):

That’s a case where the issue is two and a half years old, and there’s a pull request that’s more than a year old, as you say the decision has already been made, but it’s apparently stuck with no one who has the time to review it. And as far as I can tell it’s not anything corporate-driven or anything like that.

Do you think that’s an okay situation for Python packaging to be in going forward? Or do you see some way for PyPI to avoid such situations going forward while still operating on an all-volunteer basis?

Obviously this is only a single case, but to me that kind of thing is worrisome, especially seeing the minimal-response brush-off from the PSF about why the work can’t be funded even when someone is willing to do the legwork to raise the money themselves.

I don’t think the current situation is ideal, no. But I don’t think it’s so bad that we have to abandon the open source / volunteer model in the way that was being suggested in the post I responded to in my first quote.

Funded resource would help, but funded resource working in a volunteer project is very different from a commercial project.

CPython core is a much bigger project, and they work just fine on a volunteer basis (with some contributors funded).

2 Likes

The thing is, the full level of the demand isn’t being supported on a volunteer basis, and essentially never has been. I personally started using Python professionally on the ActiveState distribution back in 2002 or so, and I believe they were founding sponsors back when the PSF first formed. Python has been shipping with commercial Linux distros since before even RHEL existed (it was still RHL back then). And Peter knows the somewhat turbulent history of the commercialisation of the Scientific Python stack better than I do, so I won’t even try to recap that :slight_smile:

Despite the impression we might sometimes get from the plaintive complaints of eager students running into brick walls as they try to pursue do-it-yourself environment setup across various Windows, Mac OS X, and even Linux environments, the practical fact is that the vast majority of current and prospective Python users are using (or going to use) a Python environment provided and maintained by someone else that is being paid to provide and maintain that environment. Maybe they’re in a Windows shop and using ActiveState or Anaconda. Maybe they’re using a commercial Linux distro. Maybe they’re using an academic high performance computing environment. Maybe they’re using a cloud-hosted development workspace instead of doing things locally. Maybe they’re using a bespoke solution put together by their corporate IT department or their academic institution. Regardless of the details, they’re not on their own, and they have local folks to ask for help before they have to reach out to the wider internet, just as most folks running (or otherwise using) Linux aren’t hitting up kernel.org as their first resource when they run into problems.

Do we want the online docs published on the various python.org subdomains to set people up as best we can to be successful with a completely free (both gratis and libre) set of tools? Absolutely. But we don’t need those docs to be all things to all people.

What they do need to cover is at least 4 main audiences:

  • technical info for the upstream contributors actively working on the various collectively maintained tools and libraries, as well as any of their own individual projects
  • technical info for the redistributors trying to get different parts of the ecosystem to play nicely together on behalf of their users (hence my recently suggested tweaks to the way we cross-reference some of the specifications from the Python Packaging User Guide)
  • educational info for folks using the community documentation as inspiration to produce their own more tailored introductory sessions (whether that’s other community maintained tutorials like Django Girls or Software Carpentry, or more commercial training programs)
  • introductory info for folks that genuinely want to pursue their own “do it yourself” journey on their own machine, rather than using one of the many pre-integrated options that are out there

And honestly? I put those audiences in that order not because that’s the overall relative significance of that audience in general. I put them in that order because that’s the relative degree to which PyPA is the only community that can address the needs of that audience.

Folks that just want to use Python can find hundreds of books and websites looking to help them out (and some of them don’t even charge for the privilege). Even folks that want to write updated guides for new versions of tools can usually get a long way with the documentation of specific tools covering already released changes, rather than looking to the overall meta-documentation for the ecosystem as a whole or trying to understand where the ecosystem is headed (more forward looking educators will want to know the latter though, hence this audience making my list).

The folks who can’t get the info and guidance they need anywhere else, though? It’s the folks actively working on tools and platforms that they’re trying to get to play nice together, even though almost all the pieces are being developed independently. When PyPUG fails those audiences, PyPA is the only group that can improve the situation.

For the rest? The Python ecosystem as a whole does not live or die based on the state of its upstream community documentation. Folks definitely get value out of it, and being able to write comprehensible tutorials is a key validation of the current state of the toolset’s overall user experience, but building a personal software development environment from a box of interoperable parts is always going to be harder than obtaining a pre-defined environment from someone else (the pay-off being that the environment ends up working exactly the way you want it to work, rather than being a take-it-or-leave-it set of design decisions that may or may not be to your taste).

And at the technical level, I see three main drivers for ongoing improvements:

  • developers of individual tools and libraries looking to develop systemic solutions to problems that end users may not even know they have (the recent PEP aimed at the dependency confusion problem comes to mind, but there are a lot of PEPs and other past improvements that fall into this category)
  • folks getting frustrated with UX problems that have their roots in specific underlying technical limitations, and setting out to resolve the latter, so the former problems at least become theoretically resolvable (even if the technical interoperability fix doesn’t solve the UX problem on its own)
  • redistributors looking to collaborate more effectively on the undifferentiated heavy lifting parts of their jobs (like repackaging software for a different build system) so they get to spend less time on that in the future and instead work on other things that are more directly related to the needs of their particular customers

While the folks working on those improvements may formally be volunteers in terms of their interactions with PyPA, that doesn’t necessarily mean they’re pursuing them for free on their own time - there have definitely been non-trivial investments of commercial time in the past decade of Python packaging ecosystem improvements, and that’s unlikely to change in the future.

4 Likes

Interesting, do you mind sharing the statistics for that? I have been wondering about the distribution of user environments for a while, and pypistats etc. are too heavily skewed by automated deployments.

For example, coming from the scientific sector, I have never seen an official Python distribution in my career, nor did I know that e.g. py exists. So I am curious what the median Python user has on their machine, particularly now with ML/AI users having such a major influence.

1 Like

I’m curious as well – I work for the US federal government, and my small group of developers gets zero/nada/zilch by way of support from our IT folks for Python development – we’re lucky they let us do it at all, and their policies do get very much in the way sometimes [*]

You haven’t seen Anaconda? or EPD before that?

A note on that from the government side: at the federal government, we are not allowed to make donations. We can give grants, but THAT involves a huge level of bureaucracy that we’re not going to do for a few tens (or hundreds) of thousands of dollars or less. So it’s very hard for a small office to contribute to the OSS efforts that we’d like to see supported.

It’s actually helpful when there is a “product” that we can buy to support OSS:

Years ago, I bought a few copies of Travis Oliphant’s “numpy book” with gov’t funds.

I had a colleague in another government office buy EPD for his group, mostly as a way to support the effort.

Ways that small groups with small budgets can support without it being a donation are helpful.

[*] The blocker right now: On our Windows workstations, access to regedit is completely blocked – and the standard setuptools build command uses it to set up and call the compiler (actually, I’m pretty sure that it’s an MS utility that actually calls it). But the result is that we can’t easily build extensions on Windows – going to IT, they just shrug and say they have no idea what the solution is, but they won’t turn regedit access on – even though this is using the MS compiler that they want us to use!

2 Likes

If you’ve got Visual Studio installed, setuptools should only be calling its vswhere.exe tool (in the VS installer directory) to locate where your best install is. If this doesn’t work in your situation, use the “Report feedback” buttons in Visual Studio to file a bug - Microsoft can fix that one, or provide better guidance for your admins to follow.

Regedit itself shouldn’t be necessary. Read access to the system registry is essential for anything to function normally. You might be stuck on an old setuptools (or maybe they’ve reverted my fix again…) if it needs you to manually modify your registry.
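For anyone who wants to check what that lookup finds on a locked-down machine, here is a rough sketch of querying vswhere from Python. It is not setuptools' actual code: the helper names are made up for illustration, and the default path under the Visual Studio Installer directory is an assumption that may not hold on every install. The flags used (`-latest`, `-products`, `-requires`, `-property`) are standard vswhere options.

```python
import os
import subprocess
from pathlib import Path
from typing import List, Optional


def vswhere_path() -> Path:
    # Assumed default location of vswhere.exe (the VS installer directory).
    root = (
        os.environ.get("ProgramFiles(x86)")
        or os.environ.get("ProgramFiles")
        or r"C:\Program Files (x86)"
    )
    return Path(root) / "Microsoft Visual Studio" / "Installer" / "vswhere.exe"


def vswhere_command(exe) -> List[str]:
    # Ask for the newest install, of any product, that has the x86/x64
    # C++ build tools component, and print only its installation path.
    return [
        str(exe),
        "-latest",
        "-products", "*",
        "-requires", "Microsoft.VisualStudio.Component.VC.Tools.x86.x64",
        "-property", "installationPath",
    ]


def find_vs_install() -> Optional[str]:
    # Returns the install path, or None if vswhere is missing or finds nothing.
    exe = vswhere_path()
    if not exe.is_file():
        return None
    try:
        result = subprocess.run(
            vswhere_command(exe), capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    print(find_vs_install())
```

If this prints a path, the compiler can be located without regedit; if it prints `None` on a machine that does have Visual Studio, that is useful detail to include in the feedback report.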

1 Like

Thanks @steve.dower ! – I’ll take this to the appropriate fora now.

-CHB

1 Like

No statistics (I checked the 2021 Python developer survey results, and they don’t seem to include any questions along these lines), just a core developer’s perspective (which has its own set of biases):

  • the number of folks we interact with in the community isn’t even close to what would be required to hit the astronomical user numbers reported for the Python ecosystem as a whole (the interpreter downloads from python.org are also nowhere near high enough to account for the number of distinct clients that pypi.org sees). So while there are absolutely a lot of folks that only have access to community resources and are having to figure things out on their own, the bulk of the user base has to be operating in a different way (but the very nature of those differences means they’ll never show up as part of community interactions)
  • at a core development and PyPA level, quite a few of the folks actively involved are employed as Python infrastructure experts for large organisations, often enabling thousands of Python devs with resources that mean those devs aren’t building environments from scratch for themselves from community guidelines, they’re using the ones provided by their employers (I think it’s interesting that one of the best examples of conda-is-not-just-for-data-science comes from that kind of context, as PayPal Engineering published an article about how they went from trying to do it all themselves to realising that conda solved most of the problems they needed to solve and just started deploying that instead of continuing to do their own thing)
  • as the founder of the Education Seminar at PyCon Australia, I spent quite a bit of time talking to folks involved in various education initiatives (from Australia and the UK’s digital curriculum, through Software Carpentry, and more). Those discussions often revolved around getting the educators the information they needed to build tailored courses that specifically met the needs of their audiences, rather than expecting the Python community resources to be sufficient on their own (Software Carpentry is particularly interesting on that front, as they found that once you get beyond folks that are interested in programming for its own sake, the specific examples used in introductory material have to be domain-relevant: a librarian or nuclear physicist is going to tune out of a course tailored for biologists, and vice-versa. SC is also fascinating in other ways, in that part of what they do is help institutions better support their members when the available time and research budgets don’t stretch to full-fledged formal training courses in software development, but software skills are becoming increasingly essential to perform effective science in many fields)
  • while working for a commercial redistributor myself, a recurring industry-wide theme revolved around getting to a point of organisations being able to provide pre-built environments to staff via their web browsers rather than folks having to build and maintain their own environments. Jupyter Notebooks are absolute gold for that purpose (hence services like Google Colab and Azure Notebooks being a big deal), but things like GitHub Workspaces are pushing that potential even for more traditional “text editor & command line” and IDE-based development.

Thus my assumption is that a lot of the recent growth is arising from either folks getting handed Python environments at work as cloud services, or school-age users getting introduced to things via formal classes and online learning tools.

None of which directly helps the folks that are feeling like they’re getting things done despite their employer and past education experiences rather than because of them, it just highlights that any sense we might have of there being such a thing as the “typical Python developer” or even “the typical Python developer experience” is an illusion - the scope of the different use cases and backgrounds out there is as astonishing as the sheer size of the community in the first place :slight_smile:

7 Likes

2 posts were merged into an existing topic: PIP improvements for python 3.13

Please open a new topic about a specific actionable idea, taking into account existing discussions and work, rather than resurrecting a topic that concluded months ago.