[announce] Pybi and Posy

I’ve never declared it DOA, I said it looked interesting and remarked there were a few things I wasn’t particularly enthused about.

Oh wait, are you thinking about this discussion, and the conclusion that pip’s SpecifierSet wasn’t in a form that worked well with pubgrub? New Resolver: technical choices · Issue #7406 · pypa/pip · GitHub

I implemented specifier evaluation by converting them into a union-of-half-open-ranges representation, and the code is pretty gnarly but AFAICT it does work: posy/src/vocab/specifier.rs at 9c8aee67de7df824d19f46cd1b9c19c7a3cd139e · njsmith/posy · GitHub
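For readers unfamiliar with the idea, here is a minimal sketch of a union-of-half-open-ranges representation. It is not posy's code. It assumes a toy model where a version is a plain integer, so that "the next version after v" is simply v + 1; real PEP 440 ordering (pre-, post- and dev-releases, wildcards, `~=`) is much messier. The names `clause_to_ranges`, `intersect`, `specifier_to_ranges` and `contains` are made up for this illustration.

```python
from typing import List, Optional, Tuple

# Toy assumption: a "version" is a non-negative int, so the successor of v is v + 1.
# A range is half-open, [lo, hi); hi=None means "no upper bound".
Range = Tuple[int, Optional[int]]
MIN_VERSION = 0


def clause_to_ranges(op: str, v: int) -> List[Range]:
    """Translate one specifier clause into a union of half-open ranges."""
    if op == ">=":
        return [(v, None)]
    if op == ">":
        return [(v + 1, None)]
    if op == "<":
        return [(MIN_VERSION, v)]
    if op == "<=":
        return [(MIN_VERSION, v + 1)]
    if op == "==":
        return [(v, v + 1)]
    if op == "!=":
        # Everything below v, plus everything above v.
        return [(MIN_VERSION, v), (v + 1, None)]
    raise ValueError(f"unsupported operator: {op}")


def intersect(a: List[Range], b: List[Range]) -> List[Range]:
    """Intersect two unions of ranges, dropping empty results."""
    out: List[Range] = []
    for lo1, hi1 in a:
        for lo2, hi2 in b:
            lo = max(lo1, lo2)
            if hi1 is None:
                hi = hi2
            elif hi2 is None:
                hi = hi1
            else:
                hi = min(hi1, hi2)
            if hi is None or lo < hi:
                out.append((lo, hi))
    return sorted(out, key=lambda r: r[0])


def specifier_to_ranges(clauses: List[Tuple[str, int]]) -> List[Range]:
    """A comma-separated specifier is the intersection of its clauses."""
    ranges: List[Range] = [(MIN_VERSION, None)]
    for op, v in clauses:
        ranges = intersect(ranges, clause_to_ranges(op, v))
    return ranges


def contains(ranges: List[Range], v: int) -> bool:
    return any(lo <= v and (hi is None or v < hi) for lo, hi in ranges)


if __name__ == "__main__":
    # Toy equivalent of ">=2, <10, !=5"
    print(specifier_to_ranges([(">=", 2), ("<", 10), ("!=", 5)]))
```

The appeal for a PubGrub-style resolver is that intersection, union and emptiness checks all become simple operations on sorted lists of ranges.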

OK actually I just tested this, and AFAICT it actually works perfectly. sitecustomize.py lives in the interpreter’s built-in site-packages/ directory. So with a regular venv, sitecustomize.py gets ignored, the venv hard-codes the location of the interpreter inside the EnvForest, and you get a regular functioning venv. If you do venv --system-site-packages, then the sitecustomize.py does get pulled in, and does the magic environment setup stuff when you run the venv’s python (though then it does mean you need to be running the venv within the right posy environment).

micromamba already has (experimental) support for better error messages, inspired by PubGrub:

micromamba is a single binary executable written in C++ which implements the conda packaging spec without requiring a Python installation.

It’s quick to start, quick to solve, can install any Python version you want into isolated environments, is cross-platform and the spec has been battle tested (iterated on, evolved) for over a decade with the toughest packaging problems there are.

It already solves the problems posy is trying to solve (and more). If you want to use pip:

micromamba create -n py311 python=3.11
micromamba activate py311
pip install -r requirements.txt

It seems that it would be easier to just adopt the conda packaging spec and bring both communities together.

What is stopping the PSF declaring conda packages to be an official alternative binary format, hosting and serving them on PyPI alongside wheels? Both communities could then move forward together without either one having to reinvent the… eh… wheel.


You nerdsniped me. A quick test suggests that startup time of a program with N entries on sys.path, that imports a single empty package from one of the N directories, has runtime linear in N, with startup times varying between 0.05 sec (100 entries) and 0.68 sec (2200 entries[1]). So it’s not totally ignorable, but hardly crippling.
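A rough sketch of how a test like this could be reproduced follows. It is not the script used for the numbers above. It assumes the N directories are empty except for the last one, which holds the empty package (a worst-case placement chosen for this sketch), and the names `time_startup` and `emptypkg` are invented here.

```python
import os
import subprocess
import sys
import tempfile
import time


def time_startup(n_entries: int) -> float:
    """Time a fresh interpreter that imports one empty package,
    with n_entries directories placed on sys.path via PYTHONPATH."""
    with tempfile.TemporaryDirectory() as root:
        dirs = []
        for i in range(n_entries):
            d = os.path.join(root, f"d{i}")
            os.mkdir(d)
            dirs.append(d)

        # Assumption of this sketch: the package lives in the last directory.
        pkg = os.path.join(dirs[-1], "emptypkg")
        os.mkdir(pkg)
        open(os.path.join(pkg, "__init__.py"), "w").close()

        env = dict(os.environ, PYTHONPATH=os.pathsep.join(dirs))
        start = time.perf_counter()
        subprocess.run(
            [sys.executable, "-c", "import emptypkg"], env=env, check=True
        )
        return time.perf_counter() - start


if __name__ == "__main__":
    for n in (100, 400):
        print(n, round(time_startup(n), 3))
```

The measured time includes interpreter startup itself, so results are only meaningful when compared across different values of N on the same machine.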

I suspect the idea that “too many entries on sys.path causes startup slowdowns” comes from back when easy_install added lots of entries. I think the import system was missing a number of optimisations that meant the time was quadratic back then.


  1. I could only test up to 2200 path entries before my quick hack blew up because PYTHONPATH was too long… ↩︎

Pretty sure it also multiplies by the number of files in each path, as they all get cached to improve later import times (a feature requested from people who put network shares in sys.path). I experimented a while back with trying to speed this up further (doing a stat on a lot of files on Windows is sloooooow) but nothing was that impactful (other than zipping the files and putting that on sys.path instead).
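As a small illustration of the zip approach, Python can import modules directly from a zip archive placed on sys.path. The sketch below builds a throwaway archive and imports from it; the helper name `import_from_zip` and the file name `bundle.zip` are made up for the example, and it says nothing about how much faster this is in practice.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def import_from_zip(module_name: str, source: str):
    """Write a one-module zip archive, put it on sys.path, and import from it."""
    tmp = tempfile.mkdtemp()
    archive = os.path.join(tmp, "bundle.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr(f"{module_name}.py", source)

    sys.path.insert(0, archive)
    importlib.invalidate_caches()
    try:
        return importlib.import_module(module_name)
    finally:
        sys.path.remove(archive)


if __name__ == "__main__":
    mod = import_from_zip("zipped_demo_mod", "VALUE = 42\n")
    print(mod.VALUE, mod.__file__)
```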

Very excited to see this be properly announced. It’s exactly the kind of workflow that I’m coming around to thinking is our best way forward, so I’m glad you’re a few steps ahead of the rest of us!

Feel free to ping me if/when you hit thorny Windows issues. Maybe this will be the project that finally convinces me to learn some Rust…


Right, thanks. If that’s the case, wouldn’t distributing a frozen Python app do the trick? (Eating your own dog food and all that…) Given that posy is written in Rust, it’s hard to tell how much of the Python stdlib such a PyPosy might use (in addition to its own packages), but I suspect it wouldn’t require massive numbers of external packages.


It would work yes; which is a part of @dstufft’s point. There are benefits to doing it in a compiled language – as there would be benefits to doing it in Python itself. It’s a choice of tradeoffs and @njs has picked the ones that he likes (which I imagine is a big part of why this effort even got so far – that he was motivated to work on this).

So… unless someone is suggesting that we go ahead and reimplement all of that in Python and build yet another alternative, I say we drop the discussion of Python / Rust.

I’m gonna guess it is related to this:

:wink:


I would suggest it TBH.

Edit:

To be clear why, I don’t think the “yet another alternative” thing is a real problem wrt posy… because posy isn’t even an alternative right now. AFAICT the only way you can actually use posy right now is to install Rust, check out a git repo, edit Rust code, and then cargo run that edited Rust code.

So posy itself isn’t actually an alternative to anything right now. It’s a collection of ideas and a library that you could potentially use to implement an alternative, but it’s not like there’s any danger of people starting to use posy as it currently is and that causing confusion.


This is all very interesting and wonderful to see; the thought and innovation that the python world needs to move forward.

The comments about Rust limiting the platforms that can be targeted may be an issue at the moment.
With GCC Rust on its way and the pressure that will build on platforms to get a working Rust for their OS, so that tools written in Rust can be used, I see that as a problem that will go away in time.

Clearly the git-clone-then-compile approach is only for the early stages; getting prebuilt binaries is hardly difficult once posy reaches that point in its development.

Or maybe this work inspires something else that builds on its shoulders.

I might have missed something (blame Discuss for making long discussions essentially unreadable if you haven’t followed them from the start), but I struggle to understand how that would be a replacement for conda, unless it also solves all the thorny library loading issues (and potentially other hurdles) that conda has learnt to solve over the years.

That said, I suppose it’s a nice frontend to install a Python executable from the command line.


As mostly a passive observer of these discussions for the last few years, it feels like the scope and scale of this problem makes designing a solution by consensus to be quite a challenge. As such, as commenters have already pointed out, actually delivering something that we can point at (and say, “Yes, this is what Python needs to look like”, or “No, this is not what Python needs to evolve into”) is probably worth a fair amount.

Personally, I see the main benefit of this line of attack to be the UX that it establishes, not the actual implementation. I’d almost prefer for us to pretend that we don’t know whether this is written in Rust, or an “oxidised” Python binary; we just accept that there is a binary for the systems that we need it for.

If it transpires that we can’t use Rust (I doubt it, but I’m not an expert), then we have the challenge of re-implementing the existing UX, not designing one from scratch. Given that @njs is clearly motivated by working on Rusty Posy, I feel we’d be better off supporting that momentum and seeing where it leads. At the end of the day, someone can come along in a year’s time and re-write it from scratch if they see fit.


The vision is that Posy is a single-file binary. I’m guessing here (my only knowledge is the GitHub repo), but I imagine that this will be available via apt, cargo, etc. For fun, you probably could release this via PyPI as well … but don’t think about what that would look like!

It would definitely not be expected that Python users need Cargo in order to build Posy.


Definitely a novel tool and way to reframe the concept of distributing a base language. I can definitely see something like this being handy for hack projects and for learners. Outside of that would be a bit trickier. I run some “niche” OSes (FreeBSD and FreeBSD-based proprietary OSes), so I have an inkling that pybi wouldn’t suit that need very well. Also, someone trying to hack on a less common architecture, e.g., MIPS or RISC-V, would need to do some work in terms of building and managing binaries for their specific architecture.
The work you’re doing seems like it would really fit in well with a frozen binary builder like py-installer. I’d personally like frozen binaries with python to be a more standard packaging workflow for distributing self-contained python middleware, like Meta has had with their .xar format for some time.
Some questions:

  • What’s the disk space requirement for PyBI’s?
  • Who do you envision providing the builders?
  • Who do you envision providing the package hosting?
  • How do you envision security updates being done? What if a single package in a set
  • How do you envision the signing/verification process for PyBI’s? The package manifest? Nation states like China and corporations who MitM HTTPS connections are a couple of points of concern I can come up with off the top of my head.

I’m not sure if there’s any specific collaboration here, but I would assume that if posy became the way to do things then the Launcher would somehow try to support it.

Only so much as shipping a lock file and posy itself might bring us closer.

Yeah, I wouldn’t want to try and support the world now either since flexibility isn’t always a good thing. Just because you can doesn’t mean you should. :wink:

So for the PyBI concept, I think there’s the pre-compiled CPython binary part and the metadata around it part. I have already talked to the release managers about producing pre-compiled binaries that are not behind an installer and they are up for experimenting with Linux and Steve already has it ready via nuget (macOS is an open question). The trick has been getting people interested enough to want to help out to make it happen and be a part of the CPython release process.

As long as their tooling doesn’t understand how to read whatever way posy is calculating what to hook up for Python to run, you’re right. But some tools actually ask Python for where to look, so it’s also surmountable.

I personally think that’s fine if everything posy uses to determine what to use is backed by a standard. For instance, if posy is effectively using a lock file (either on disk or in memory), then as long as your production environment can either reconstitute those packages or you can pull that all together for deployment from said lock file, then you’re covered.

Lots of things are now cached. The cost is a directory listing for each entry on sys.path (that gets cached) along with stat calls on the directory to know if the cache has been invalidated. You can start reading at cpython/Lib/importlib/_bootstrap_external.py at b724ac2fe7fbb5a7a33d639cad8e748f17b325e0 · python/cpython · GitHub for the key bit for searching for something to import.
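The caching scheme described here can be sketched roughly as follows. This is a simplified stand-in and not the importlib code: the class name `CachedDirFinder`, its methods and the `refreshes` counter are invented for illustration, and it only checks for a package directory or a `.py` file.

```python
import os


class CachedDirFinder:
    """Toy per-directory finder: one stat per lookup, and a directory
    listing only when the directory's mtime has changed."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._mtime = None        # mtime seen when the cache was last filled
        self._entries = set()     # cached directory listing
        self.refreshes = 0        # how many times the directory was listed

    def _refresh_if_stale(self) -> None:
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self._mtime:
            self._entries = set(os.listdir(self.path))
            self._mtime = mtime
            self.refreshes += 1

    def might_contain(self, name: str) -> bool:
        """Cheap check: is there a package dir or .py file with this name?"""
        self._refresh_if_stale()
        return name in self._entries or f"{name}.py" in self._entries


if __name__ == "__main__":
    finder = CachedDirFinder(os.path.dirname(os.__file__))
    print(finder.might_contain("json"), finder.might_contain("no_such_module"))
    print("directory listed", finder.refreshes, "time(s)")
```

With one such finder per sys.path entry, a failed lookup costs a stat call per entry rather than a fresh listing, which fits the roughly linear scaling measured earlier in the thread.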


For me, there are two key questions around this work:

  1. Is this a workflow people like and want to get behind?
  2. What would need to be standardized to make this sort of workflow work (e.g. so tooling can interact with it)? And can those standards work with other ways of doing things so it can be piecemeal while still making forward progress?

Share bits of reusable Rust code, if any.

And because, in my mind initially, posy would be the tool that I would use to run/launch any Python code (scripts, zipapps, etc.) before I even have Python installed. So in this scenario it made sense to me to merge posy and the launcher, for a unified experience and less proliferation of tools. But it has later become clearer to me that posy, as it is now, is more geared towards the developer workflow (which again, in my mind, is a different use case from the tool I had in mind). Anyway, I think it would help a lot if we had some kind of launcher that would install a Python interpreter if none can be found.

  • A tool able to install Python interpreters on its own when needed sounds good.
  • An installer that is able to prevent N times the disk footprint when a package is needed in N environments sounds good (this is what posy does, right? I might have misinterpreted).
  • Python interpreters distribution format (extend wheel into wheel 2.0 to include Python interpreters or something separate?)
  • Lock file format (not sure it is strictly needed for posy, but why not)
  • Whatever the thing that it uses instead of virtual environments
  • The installation process so that other tools/libraries can also install/uninstall/query (does importlib.metadata work?)

Bootstrapping a stand-alone pip could be done by building a stripped down python interpreter with a couple of additional extension modules reconfigured as builtins and then appending a zip with the python code. It would be a bit bigger than the rust version, but not an unreasonable download size.

Rust is a great language and all, but I don’t see bootstrapping as a sufficient justification for using a different language to implement a fundamental tool of the Python ecosystem.

Love the concepts of installable interpreter wheels and declarative venvs.


There’s probably not much to share since their goals are different. Posy is essentially a Python package manager written in Rust. The Python Launcher is a tool that finds Python installations written in Rust. Since Posy directly controls the Python interpreters it installs it doesn’t need any help finding a Python installation like the Launcher does.

And that’s why I have brought up trying to get pre-built binaries for CPython as part of the release process.

I believe it’s a lock file and dynamic calculation of what sys.path should be.

I don’t see why it wouldn’t since Posy is just munging stuff on to sys.path and not using a magical importer or anything that might break importlib.metadata.
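A quick way to convince yourself of this: importlib.metadata discovers distributions by scanning the directories on sys.path for `*.dist-info` folders, so anything that ends up on sys.path is visible to it. The sketch below fakes an installed package in a temporary directory; the package name `demo_posy_pkg` and the helper `version_via_sys_path` are made up, and this makes no claim about how posy lays out its directories.

```python
import importlib
import importlib.metadata
import os
import sys
import tempfile


def version_via_sys_path(name: str, version: str) -> str:
    """Create a bare .dist-info directory, add its parent to sys.path,
    and ask importlib.metadata for the distribution's version."""
    with tempfile.TemporaryDirectory() as root:
        dist_info = os.path.join(root, f"{name}-{version}.dist-info")
        os.makedirs(dist_info)
        with open(os.path.join(dist_info, "METADATA"), "w", encoding="utf-8") as f:
            f.write(f"Metadata-Version: 2.1\nName: {name}\nVersion: {version}\n")

        sys.path.append(root)
        importlib.invalidate_caches()
        try:
            return importlib.metadata.version(name)
        finally:
            sys.path.remove(root)


if __name__ == "__main__":
    print(version_via_sys_path("demo_posy_pkg", "1.0"))
```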


I think this is a workflow that a lot of people like and have already gotten behind because it is the conda workflow. The question is more whether PyPI/PyPA/whoever wants to get behind that workflow. (Note I’m not talking about adopting conda or its repositories, but shifting the workflow to a “manager-centric” model rather than having an installed Python interpreter as the top-level thing that manages environments.)
