PEP 594: Removing dead batteries from the standard library

I guess I’ll write up a blog post describing the approach I use, and see how far it gets.

At the very least, this is a scenario that’s only really supported or explained by Docker, but it’s easy to think that the freeze tools are the best alternative (they’re not) or that pip-running-on-the-target is the only okay approach (it’s not).

Maybe I’ll get some time during the sprints next week… :thinking:

4 Likes

That would probably be good. I definitely don’t consider myself a beginner (:slightly_smiling_face:) and I’m not entirely clear what you’re suggesting here…

1 Like

In this case we’re dealing with buildroot (well, a whole lash-up of stuff that uses buildroot for the Linux half of its life). If the stars align, adding a Debian-style package is easy. If they don’t, it’s a voyage of discovery :frowning:

The short version is a CI setup that looks like:

  • git clone (my project)
  • (depending on project, extract Python distro into .\python)
  • pip install -r requirements.txt --target .
  • tests
  • zip .\**\*
  • copy zipped package to target machine and extract it

And obviously there are 100 things to watch out for here, primarily in your dependencies, but provided you do look out for them this is totally reliable. I’ve done it with Linux web apps and Windows GUI apps (where the app was in a self-extracting ZIP that then launched itself).
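For concreteness, here is a minimal sketch of the vendor-and-zip steps above as a single deploy script. The file and directory names (build, app.zip) are illustrative, not from my actual setup:

# deploy.py - vendor dependencies next to the app, then zip the result
import subprocess
import sys
import zipfile
from pathlib import Path

build = Path("build")
build.mkdir(exist_ok=True)

# Equivalent of "pip install -r requirements.txt --target .": the
# dependencies land next to the application code instead of being
# installed on the target machine. (Application code omitted here.)
subprocess.run(
    [sys.executable, "-m", "pip", "install",
     "-r", "requirements.txt", "--target", str(build)],
    check=True,
)

# The zip archive is the deployable artifact; copy it to the target
# machine and extract it there.
with zipfile.ZipFile("app.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for path in build.rglob("*"):
        if path.is_file():
            zf.write(path, str(path.relative_to(build)))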

An even simpler view of what Steve is advocating: if you can deploy your own code, then you can deploy 3rd-party code as if it were your own. Basically, 3rd-party code is deployed differently from your own code only if you choose to treat it differently. Otherwise, treat it the same and deploy it as part of your code.

As such, there are only three scenarios where not having something in the stdlib is a true blocker.

  1. Issues with getting legal approval (I know Paul is familiar with this :wink:)
  2. It’s an extension module and you don’t have access to the compiler toolchain for some reason
  3. You can’t deploy a single line of code and you only have access to the REPL

After those it seems to me it’s a matter of convenience for certain users whether something is included in the stdlib or not. (And I would argue that even the above scenarios are still about convenience as Python can’t solve everyone’s deployment and legal problems.)

2 Likes

That’s a very optimistic way of looking at it.

People who have pip working for them consistently underestimate the difficulty many people have in getting pip working.

A data point: I just tried to run ensurepip from Python 3.8a and got a series of warnings followed by an exception:

SyntaxWarning: ‘str’ object is not callable; perhaps you missed a comma?
SyntaxWarning: ‘str’ object is not callable; perhaps you missed a comma?
SyntaxWarning: invalid escape sequence \w
SyntaxWarning: invalid escape sequence \w

ModuleNotFoundError: No module named ‘_ctypes’

I do have a successfully installed pip running under 3.5, so I just tried pip3.5 install numpy and got:

No matching distribution found for numpy

I’m not looking for advice on how to solve this problem. I’m just demonstrating that pip is not a magic bullet. Getting pip working is not a trivial step; for many people it will be a significant barrier to entry.

I understand that people in marketing and customer service expect that for every complaint made directly to them, there could be anywhere from 10 to 100 unhappy customers who simply went away unsatisfied and won’t be back. For every person who asks on Stack Overflow “why isn’t pip working?”, we should expect that there are probably a hundred who silently experienced the same problems but didn’t complain anywhere we can see.

Some proportion of those will have solved their problems by just giving up and doing without. We rarely hear from them, like the customers who receive bad service but don’t say anything and never come back. As a result, we suffer from survivorship bias.

Because we are surrounded by those who have pip working and can use it, but don’t hear from those who can’t, we’re biased to think that pip “Just Works” and that installation of third-party libraries is trivial.

“Boom, everything magically works” – except when it doesn’t, but we don’t see those cases. Out of sight, out of mind.

3 Likes

Ah, sorry. I was thinking about the problem from a different angle, specifically this post from @njs

When developing a new project, I would expect the initial “write a script” stage to start with using just stdlib features, but to transition fairly rapidly to using 3rd-party packages. Whether the user uses a virtualenv or just the system Python is a matter of preference/experience, but at this point pip install stuff is the easy and obvious way to go [1].

The problem comes at the transition to the “sharing with others” stage. At that point, package install is a post-deploy step, in that you say to the recipient, “here’s my script - you’ll need to install Python and a few dependencies; here’s a requirements.txt file you can use to do that; give me a shout if you have problems”. It’s still a bit fiddly and manual, but you have a direct interaction with the people you’re sharing your code with, and “oh yes, I forgot I have foo installed” is just troubleshooting, not a “failed deployment”.

It’s only when you get to step 3 of @njs’s description (deploy a webapp/distribute a standalone app) that you need to consider “deployment” as a formal thing. And at this point you quite possibly already have a routine of “ship the code, install dependencies in the target environment”, so switching to shipping dependencies with the code is a big change - not only do you need to include the dependencies, you also need to add code to your application to fix up sys.path, and you need to test all this as it’s an architectural change to your code. Not hard, maybe, but it’s a fairly big shift in how you think about what you’re doing (“sharing a script” vs “deploying an application”).
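The sys.path fix-up itself is small. A minimal sketch, assuming the dependencies were bundled into a “vendor” directory next to the application code (the directory name is my own choice, not a convention):

# At application startup, before any 3rd-party imports: make the
# bundled dependencies importable.
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent / "vendor"))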

There’s very little in the Python packaging documentation that really helps with this final step. Not many descriptions of how to do it, little or no “best practice” recommendations, and essentially no help for people trying to get things working. So people stick with “what they know works”, the stage 2 “dependency installation as post-deployment step” approach, because it’s a known problem - maybe not easy, but at least familiar.

Having said all of the above, to bring the focus back to “Removing dead batteries from the standard library”: it’s very rare in my experience that the problem is a fundamental, insurmountable one where you can’t use external modules; rather, it’s a succession of annoying roadblocks that add up to the point where it’s just not worth using Python for the task at hand [2]. Bundling dependencies is pretty much just as annoying a stumbling block in that case as having to get a 3rd-party tool added to the target environment, so I’m not sure it makes much difference there.

[1] And yes, pip install stuff isn’t always as easy as we’d like it to be, but there are lots of people and resources that will help you solve issues, whereas there are very few places you can go for help on bundling stuff…

[2] My experience here is with Python as a support tool, not the core business language - if your business is based on Python and you haven’t formulated a workable policy on PyPI modules, let’s just say I’m surprised…

2 Likes

What is the current state of this PEP?

Is it still realistic that the dead batteries will be deprecated with 3.8 and the removal will take place in 3.10?

cgi.FieldStorage is used by Zope and thus Plone, and there is new discussion about whether to vendor it or migrate to https://pypi.org/project/multipart/

Thank you!

No, because Python 3.8 is already out and Python 3.9 is in beta. The PEP will need to be updated to target Python 3.10, with targeted removals in 3.12 (assuming someone drives the PEP to being finished).

2 Likes

But wouldn’t making chunk “private” make it more difficult to add a drop-in replacement to PyPI? Would the module in the stdlib be renamed to _chunk? Or would the code be incorporated into wave?

Removing chunk entirely makes no sense if wave is to be kept. WAV files are RIFF files, and chunk is the basis of parsing RIFF files. BTW, RIFF-based file formats are still used everywhere, especially on Windows. I have used chunk many times to implement parsing of various proprietary binary formats.
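For illustration, walking the sub-chunks of a RIFF container with the stdlib module looks roughly like this (the file name is hypothetical; RIFF is little-endian, hence bigendian=False):

import chunk

with open("example.wav", "rb") as f:
    riff = chunk.Chunk(f, bigendian=False)   # outer RIFF chunk
    assert riff.getname() == b"RIFF"
    assert riff.read(4) == b"WAVE"           # the RIFF form type
    while True:
        try:
            sub = chunk.Chunk(riff, bigendian=False)
        except EOFError:
            break                            # no more sub-chunks
        print(sub.getname(), sub.getsize())  # e.g. b"fmt ", b"data"
        sub.skip()                           # advance to the next chunk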

Depends on the situation. The key point is it would no longer have a public API.

It does from a maintenance-cost perspective. Public APIs have costs: from docs to bug fixes to enhancement requests, it all takes up someone’s time, even if it is simply reading an issue and then immediately closing it.

While you may have used the module many times, that doesn’t mean the vast majority of Python developers have or that the use of the module in the Python community warrants asking the core development team to continue to maintain it for everyone’s benefit. If the code is stable you can totally just copy the code to your projects as necessary or even start your own project for the code. But otherwise you’re in a way saying that you think any issue related to chunk is potentially more important than any other module in the stdlib that has an open issue against it.

1 Like

But otherwise you’re in a way saying that you think any issue related to chunk is potentially more important than any other module in the stdlib that has an open issue against it.

That is definitely not what I’m saying or even implying. Maybe I’m implying that it is just as important as other modules.

But I think you misunderstood me. By “removing entirely” I meant removing its code. But since the wave module can’t function without it, even if you incorporated it into the latter, you would essentially be re-implementing it.

I think we are starting to argue semantics which aren’t really relevant to the discussion. The point of this PEP is that these modules will no longer be considered publicly accessible and thus no longer directly supported as standalone modules in the stdlib. Whether we rename them, copy them around, or flat-out delete them is an implementation detail of the stdlib.

1 Like

IMO, anything that moves packages out of the standard library should be embraced. As long as CPython can be built and used, nothing extra is needed beyond the basics, i.e. I/O- and OS-interaction-oriented packages.

So, I personally couldn’t embrace this PEP more.

While I heartily agree on reducing the maintenance burden for core developers, the alternative of having a module on PyPI creates problems when installing on machines that don’t have unfettered internet access.

Thank you Steven for this observation. The issue of “pip not working” is also relevant when installing on machines that have restricted internet access. The alternative of a local repository, or separately downloaded wheels (and all of their dependencies) is too much for beginners.
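For reference, the usual workaround is two pip invocations (the directory name is illustrative). On a machine with internet access:

pip download -r requirements.txt -d wheels/

then, after copying the wheels/ directory across, on the restricted machine:

pip install --no-index --find-links wheels/ -r requirements.txt

Which rather makes the point: it works, but it is a lot to ask of a beginner.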

No one is claiming that the proposal of moving modules to PyPI is a 1:1 replacement compared to shipping something with CPython. There are core developers who live in locked-down environments where legal and/or IT barriers make it extremely difficult to bring in new code, so this is a represented viewpoint (and also why this list is extremely conservative).

But there’s also an issue here of your workplace’s restrictions placing a burden on us, by asking us to ease your overhead through taking on more of the maintenance cost. At some point we have to draw a line and say that, for our own sanity and time, we can’t expend more energy trying to support every development process.

1 Like

I think there’s a midway between having core devs take on maintenance and putting things on PyPI:

We could have a curated set of libraries and packages we ship as part of the standard install, but take from PyPI whenever a release is cut.

Similar to the way ensurepip works, but for a larger set of “batteries”.

Maintenance could then be distributed to the people maintaining the code on PyPI, while removing the need to download it from PyPI for execution.
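A rough sketch of what such an ensurepip-style bootstrap could look like; the _bundled directory, its location, and the wheel handling are all assumptions for illustration, not an actual CPython layout:

# ensure_batteries.py - install the curated wheels shipped with the
# interpreter, without touching the network.
import subprocess
import sys
from pathlib import Path

BUNDLED = Path(sys.prefix) / "_bundled"   # wheels shipped with the installer

def ensure_batteries():
    wheels = sorted(BUNDLED.glob("*.whl"))
    # The project name is the first dash-separated field of a wheel file name.
    names = [w.name.split("-")[0] for w in wheels]
    subprocess.run(
        [sys.executable, "-m", "pip", "install",
         "--no-index", "--find-links", str(BUNDLED)] + names,
        check=True,
    )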

2 Likes

I like this idea, but if we were to do this, we would have to be very careful of what we permitted redistributors to do when providing something that they called “Python”.

It’s already a significant issue for developers that distributions like Debian and Ubuntu ship a system package called “python” that omits chunks of the standard library. If the standard Python distribution starts including a bunch of libraries sourced from PyPI, the likelihood is that such distributions would “unbundle” those libraries from their version of Python, and we’d end up in a situation where people trying to write portable code would not be able to use bundled libraries without making life hard for their users on those distributions.

I’d suggest that if we take this approach, we’d need to formally state (and be prepared to enforce) that in order to be called “python” (as opposed to, say, “python-base”, or “python-minimal” or “python-system”) a distribution must make available a specific set of libraries.

4 Likes

Something that could be of use today is to allow downstream patchers to include a list of modules known to be missing. Then, if you were to implement the “curated set of included PyPI packages” idea, you could include the libraries which were previously in the standard library in that list.

An example implementation:

# importlib/__init__.py

# Hints supplied by the downstream distributor for modules they have
# unbundled from their Python package.
known_sources = {
    "wave": "install `wave` from PyPI",
    "distutils.util": "run `sudo apt install python3-dev`",
}

def import_module(name, package=None):
    # _original_import_module stands in for the existing import_module
    # implementation, which this wrapper delegates to.
    try:
        return _original_import_module(name, package)
    except ModuleNotFoundError as e:
        if e.name in known_sources:
            # Append the distributor's hint to the error message.
            e.msg += "\n\nTo fix, " + known_sources[e.name] + "\n"
        raise
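With this in place, importing an unbundled module still raises ModuleNotFoundError, but the message ends with the distributor’s hint (e.g. “To fix, run `sudo apt install python3-dev`”), so users learn how to get the module rather than just that it’s missing.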

Could you define “Python” to be whatever comes out of the default configuration of the build? PyPI libraries included in the Windows installer could be part of the “Python extended library” (or “supplemental” or “full”). These PyPI libraries should be held in enough respect to be the default place for newcomers to submit PRs and for distributors to include in an “extended” distribution.

1 Like