Wanting a singular packaging tool/vision

ofek · November 17, 2022, 5:38pm

By the way, I just embedded on a Rust team for a month to assist with a project and I totally understand the OP’s perspective. It was quite lovely using (mostly) just Cargo for everything.

Hatch is well positioned to provide this unified experience since it is already pretty much there (and supports plugins like Cargo) except for 2 things:

lock files: this is out of my control but I’m confident @brettcannon will save us
distribution: I tried really, really hard to package it as a single binary with PyOxidizer and Nuitka but virtualenv / venv hard requires a filesystem and PyInstaller also couldn’t. Basically, it needs to be distributed as Python but with Hatch pre-installed

barry · November 17, 2022, 6:26pm

I’m not up to speed on the history of lock file format standardization, but I hope you’re right. I can say that lock file support is going to be increasingly important for the work I’m currently doing.

johnthagen · November 17, 2022, 7:03pm

Here’s an attempt at a user story to give some concrete examples of the kind of unified/rallied tooling vision I’m imagining for a future new Python developer.

Disclaimer: This is obviously very inspired by Rust, which I think does a fantastic job in this area, but some of the minor details may need to be tweaked for Python.

The new developer finds out about Python and wants to try it by creating a Python project. They visit the official python.org website and are presented with a Getting Started page that provides them with a cross-platform interpreter installer/manager that has the same UI/CLI on every major supported platform. This could be similar to pyenv, but with Windows as a first-class citizen and with pre-compiled installations rather than ones that need to be built.

We’ll call it pyup to sound similar to rustup.

Idea: Remove the multiplicity of ways new developers are told to install Python: HomeBrew, stand-alone installer, deadsnakes PPA, build from source, Linux system package, etc.

Upon installing pyup, the latest stable release of Python is then available to them with a common, cross-platform name, e.g. py.

Idea: Remove the confusion regarding launching Python as python3 on *Nix and py on Windows (unless it’s the Windows store and then it’s python).

pyup can be used to install and manage multiple versions of Python. The developer decides they need to test their application on an older version of Python:

pyup install 3.10

Or change their default Python to 3.10:

pyup default 3.10

Python 3.11.1 is released, and the user can easily update to a new patch release:

pyup update 3.11

Idea: Make it simple and consistent across platforms how to install, update, and manage multiple Python interpreters.

Included with each interpreter installation that pyup installs is a package manager that looks something like (Poetry or hatch-with-lockfile) + pipx. We’ll call it pyrgo for now to keep with our silly naming conventions.

A coworker mentions that httpie would be a great tool to try out an API they will be interfacing with. They are able to easily install a tool globally on their system from PyPI (essentially vendoring a pipx-like experience):

pyrgo global-install httpie

Idea: Make bootstrapping installing Python tools something that is included out of the box. Rust has this with cargo install, Node has this with npx bundled, etc.

The developer is now ready to create their first Python project, so they run:

pyrgo new

This is similar to poetry new and hatch new in that it creates a new, standard project structure for them, including a populated pyproject.toml

demo
├── pyproject.toml
├── README.md
├── demo
│   └── __init__.py
└── tests
    └── __init__.py
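As a rough illustration of what a `pyrgo new`-style command might do, here is a minimal standard-library sketch that creates the layout above. `pyrgo` is the hypothetical tool from this post; the function name `scaffold_project` and the contents written to `pyproject.toml` are assumptions for illustration, not the actual output of Poetry or Hatch.

```python
from pathlib import Path


def scaffold_project(root: Path, name: str) -> Path:
    """Create the layout shown above under root/name and return the project directory."""
    project = root / name
    # The package directory and the tests directory each get an empty __init__.py.
    for package_dir in (project / name, project / "tests"):
        package_dir.mkdir(parents=True, exist_ok=True)
        (package_dir / "__init__.py").touch()
    (project / "README.md").write_text(f"# {name}\n", encoding="utf-8")
    # Minimal [project] metadata only (assumed content); a real tool would
    # also write a [build-system] table and more fields.
    (project / "pyproject.toml").write_text(
        f'[project]\nname = "{name}"\nversion = "0.1.0"\n',
        encoding="utf-8",
    )
    return project
```

Calling `scaffold_project(Path.cwd(), "demo")` would produce the `demo` tree shown above in the current directory.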

The developer learns they will be using FastAPI for their application, so they add it as a dependency and automatically update pyproject.toml:

pyrgo add fastapi

They can then lock their requirements into a cross-platform lockfile, similar to Poetry, for reproducibility:

pyrgo lock

And install those into a virtual environment (which is managed for them, again similar to Poetry and hatch):

pyrgo install

Idea: Avoid the common papercut of how venvs are activated differently on Windows vs *Nix.
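To make the papercut concrete: environments created by the standard `venv` module put the interpreter under `Scripts` on Windows and under `bin` elsewhere, and the activation scripts differ too. A managing tool can sidestep activation by calling the environment's interpreter directly. The sketch below assumes that approach; `env_python` and `run_in_env` are hypothetical helper names, not part of any existing tool.

```python
import os
import subprocess
from pathlib import Path


def env_python(env_dir: Path) -> Path:
    """Return the interpreter path inside a virtual environment created by venv."""
    if os.name == "nt":
        # Windows layout: <env>\Scripts\python.exe
        return env_dir / "Scripts" / "python.exe"
    # POSIX layout: <env>/bin/python
    return env_dir / "bin" / "python"


def run_in_env(env_dir: Path, *args: str) -> int:
    """Run the environment's interpreter with args; no activation step is needed."""
    return subprocess.run([str(env_python(env_dir)), *args]).returncode
```

With this, `run_in_env(Path(".venv"), "script.py")` is the same call on every platform, which is roughly the experience a `run` subcommand gives the user.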

The developer would like to automatically format their code to community standards. They find that pyfmt (e.g. black/isort) comes preinstalled with pyup.

pyfmt

And one could imagine extending this to a linter or type checker (e.g. mypy) depending on community consensus.

The developer is ready to publish a new open source project to PyPI. They build the sdist and wheel using a single build command similar to Poetry/hatch build:

pyrgo build

They then upload the sdist and wheel to PyPI using a single publish command similar to Poetry/hatch publish:

pyrgo publish

Idea: Avoid the developer having to discover the existence of the build and twine PyPI packages, find and read their separate user guides, install them into a virtual environment, and invoke them using different commands.

Conclusion

The entire tooling workflow started with Python.org. This avoids a bootstrapping problem and many platform-specific steps that trip up new and seasoned developers alike. The community rallies around these central tools and reduces duplicated effort by pooling ideas/resources.

Importantly, it creates a cohesive ecosystem where a developer is much more likely to be able to drop into a new Python project and already know the workflow/tools.

Since all of these tools are standard across platforms, IDEs and editors have an easier time integrating and keeping up with updates and changes.

pf_moore · November 17, 2022, 7:24pm

I like this (not least because it’s like Rust, which I like). One immediate question springs to mind, though: how would “scripting” (writing single-file runnable Python utilities) fit in with this? This is where the analogy with Rust breaks down for me, because Rust doesn’t have the idea of scripts.

Many of my work colleagues wrote small scripts like this. It’s (in my experience) a very common use case for Python, and one that’s not well served by the “build a project” style of workflow.

ofek · November 17, 2022, 7:30pm

A single project can define multiple binaries

ofek · November 17, 2022, 7:31pm

Hatch always uses an env so this would work fine

johnthagen · November 17, 2022, 7:35pm

For standalone scripts with no dependencies, I would think you could use the py executable that pyup provides directly and skip the pyrgo new/lock/install steps I listed. Scripting is one of Python’s strengths, and one that we can continue to support.

brettcannon · November 17, 2022, 8:28pm

I was going to say some things regarding why conda environments as-is are a no-go for me based on work experience, but I’m not diving into it as I think it’s a bit distracting.

I will say I’m interested in seeing what having Jupyter standardize on Hatch does for this.

No pressure. (Still slowly working towards it, BTW).

For other people’s benefit, this last came up in Creating a standalone CPython distribution. This is something else I would like to see solved upstream, but I have to clear some space in my schedule to start tackling it.

Short version:

PEP 665 – A file format to list Python dependencies for reproducibility of an application | peps.python.org got rejected (search around here for the threads on the topic and PEP).
I’m working towards making GitHub - brettcannon/mousebender: Create reproducible installations for a virtual environment from a lock file be an opinionated PoC for wheels-only lock/pinned dependency files
Need to review PoC: Metadata implementation by dstufft · Pull Request #574 · pypa/packaging · GitHub for parsing metadata
Need to do a new release of mousebender for PEP 691 (and probably PEP 700 soon)

Long version: you know where to find me.

https://pyup.io beat you to the name.

I’m hoping that once we have pre-built binaries for CPython releases I can get something like this going with the Python Launcher for Unix.

pf_moore · November 17, 2022, 9:23pm

I was going to write a long description of the “writing a script (with dependencies)” use case, but it occurred to me that @njs described this a lot better, back in 2018, with this post.

I think Python tooling currently focuses mostly on the “Reusable Library” section of this, with some help for the rest of section 3. But we tend to ignore sections 1 (“Beginner”) and 2 (“Sharing with others”). Those two early phases are where people have “a bunch of scripts in a Work directory” and where a formal project directory is more overhead than they want. And as a sidenote, I think that referring to stage 1 as “beginner” is misleading - I’ve been using Python for years and I still find that most of my work is in this category.

It’s standalone scripts with dependencies that I care about. They are the case that is badly supported right now IMO. These are often all dumped into a single directory, so PEP 582 doesn’t fully address this. The pip-run tool helps, but it reinstalls dependencies for every run, which can be painful.

I’d like a “unified vision” to cover this use case as well, and not just the “create a Python project” workflow.

ofek · November 17, 2022, 9:43pm

Hatch fixes this. For example, Hatch itself uses:

[envs.backend]
detached = true
dependencies = [
  "httpx",
]
[envs.backend.scripts]
update-data = [
  "update-classifiers",
  "update-licenses",
]
update-licenses = "python backend/scripts/update_licenses.py"
update-classifiers = [
  "pip install --upgrade trove-classifiers",
  "python backend/scripts/update_classifiers.py",
]
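In the config above, `update-data` is a list of other script names rather than shell commands, so running it means running the commands of `update-classifiers` and then `update-licenses`. The sketch below shows that expansion idea on a plain dictionary. It is a simplification for illustration, not Hatch's implementation: the function name `expand_script` is hypothetical, and it only expands entries that exactly match another script's name.

```python
# The scripts table from the config above, written as a plain dict.
SCRIPTS = {
    "update-data": ["update-classifiers", "update-licenses"],
    "update-licenses": "python backend/scripts/update_licenses.py",
    "update-classifiers": [
        "pip install --upgrade trove-classifiers",
        "python backend/scripts/update_classifiers.py",
    ],
}


def expand_script(name, scripts, _seen=()):
    """Flatten a script into the ordered list of shell commands it stands for."""
    if name in _seen:
        raise ValueError(f"circular script reference: {name}")
    commands = scripts[name]
    if isinstance(commands, str):
        commands = [commands]
    expanded = []
    for command in commands:
        if command in scripts:
            # The entry names another script: splice in its commands.
            expanded.extend(expand_script(command, scripts, _seen + (name,)))
        else:
            expanded.append(command)
    return expanded
```

Here `expand_script("update-data", SCRIPTS)` yields the two classifier commands followed by the licenses command.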

pf_moore · November 17, 2022, 10:03pm

I like hatch a lot, although I haven’t used environments much yet. But I think you’re missing my point (or maybe I’m missing yours).

I have a directory on my PC, C:\Work\Scratch, where I keep all sorts of junk - snippets of code in C, Python, and all sorts of other languages, directories with temporary work, etc. There’s no structure and barely any organisation. The other day, I wanted a script that would take a number and display factors of that number and other numbers “close” to it. I opened a new file, and started coding. I needed sympy in the script as it has factorisation routines. How would hatch environments have helped me there? My scratch directory isn’t in a hatch-based project, and the code I was writing wasn’t worth making into a project.

At the moment, I use pew to make a temporary virtualenv, install sympy, and run my script. But I have to remember (or read the code to check) what dependencies my script has when I run it, and build a throwaway virtualenv each time.

This use case is why I often push back against people saying “using packages from PyPI is easy”. It is, but package management is a big enough overhead on a throwaway script that sticking to the stdlib can end up being preferable.
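For a sense of what "sticking to the stdlib" could look like for the factoring script described above, here is a minimal sketch that uses plain trial division instead of sympy. The function names and the `radius` parameter are assumptions about what such a throwaway script might contain; sympy's factorisation routines would be the better choice for large numbers.

```python
def prime_factors(n: int) -> list[int]:
    """Return the prime factors of n (n >= 2) in ascending order, by trial division."""
    factors = []
    divisor = 2
    while divisor * divisor <= n:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:
        # Whatever remains after dividing out the small primes is itself prime.
        factors.append(n)
    return factors


def factors_near(n: int, radius: int = 2) -> dict[int, list[int]]:
    """Factor every integer within radius of n, skipping values below 2."""
    return {m: prime_factors(m) for m in range(max(2, n - radius), n + radius + 1)}
```

This runs anywhere a bare interpreter is available, which is exactly the property that scripts with third-party dependencies currently lose.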

ofek · November 17, 2022, 10:17pm

No need for a pyproject.toml, enter C:\Work\Scratch then touch hatch.toml then hatch shell

wevertonms · November 17, 2022, 10:20pm

The solution for this use case used by pyflow is very nice IMHO GitHub - David-OConnor/pyflow: An installation and dependency system for Python

pf_moore · November 17, 2022, 10:35pm

Seriously? Wow! Hatch just keeps getting better

konstin · November 18, 2022, 9:00pm

I just want to throw my prototype into the mix: monotrail

It’s one single static binary, it will download the correct python version (feat. pyoxy’s standalone cpython distribution), and it manages dependencies and environments (using poetry internally for locking and lockfiles). E.g. you can do monotrail run -p 3.9 command pytest which will run pytest on python 3.9 with dependencies from pyproject.toml and (if present) poetry.lock; I’ve put a lot more examples into the readme. While it’s a very opinionated take on environments, I believe it nicely showcases many of the features a single tool would need.

h-vetinari · November 19, 2022, 6:46am

I think this is a management and messaging problem first and foremost. If the python packaging authority doesn’t mention conda anywhere, a lot of people will never even discover it. And even people who are aware are doubtful - I see the confusion all the time (in my dayjob and online) about which way is “the right way” to do python packaging and dependency management.

I firmly believe that the vast majority of users would adapt to any paradigm that solves their problems and doesn’t get in their way too much. I think the strongest resistance actually comes from those people knee-deep in packaging entrails, and the significance of that group is that many of them are the movers and shakers of the (non-conda) packaging ecosystem.

I think that willingness for change would be there, as long as there are no major functional regressions in terms of problems already solved today in conda-land. And that’s not just me saying it personally, but echoes the statements by the anaconda CEO in the twitter thread I’ve already linked twice (also I believe most of conda-forge/core has a very solution-oriented approach to this as well).

End users really don’t benefit from a zoo of different solutions, the confusion, and the straight-up incompatibilities between those two worlds (in both ways).

I see the users of conda as part of the exact same larger ecosystem (not some parallel world), only that they have been so thoroughly underserved by the “native” capabilities that they found(ed) a new home^[1]. So I disagree quite fundamentally with this:

Conda is full-stack because that’s – unfortunately – what’s necessary to deal with the inherent complexity of the problem space. But it’s not a beneficial state of affairs for anyone IMO; that divergence is an issue that affects a huge amount of python deployments (e.g. having to decide how to prioritize the benefits of pyproject.toml / poetry’s UX, etc. vs. the advantages of conda) – it’s possible to claim that it’s too much work to reconcile, but fundamentally, that schism shouldn’t have to exist.

It’d be nice if PyPA and CPython folks didn’t treat conda as such a world apart, because for one that soft form of “othering” is not helpful in finding a common way forward, and secondly because it is a large part of the wider python ecosystem and deserves some usecase-empathy (beyond “of course we care about the data science persona!”).

Just to clarify my point – in the big picture, the whole discussion is about UX, including e.g. the avoidance of very frustrating crashes and insanely hard to debug situations. What I was referring to above was slightly more narrowly-scoped UX (as experienced through CLI/config etc.); providing a solid enough foundation is IMO the much harder thing to pull off in terms of technology / standardisation; shaping things into a nice-to-use CLI/config is important but by itself cannot solve the underlying problems.

[1] Or, for the sake of the metaphor, let’s say they’re living in the garage.

pf_moore · November 19, 2022, 10:01am

I for one would love to see conda participating in packaging discussions. We’ve asked a number of times but it’s never really happened^[1]. Maybe we aren’t reaching the right people?

[1] Excluding the occasional non-productive “just abandon all the existing tools and use conda” comment.

johnthagen · November 19, 2022, 2:15pm

As Ofek mentioned, both Poetry and hatch have a shell command that handles activating a managed virtual environment in a cross platform way. In the pyrgo example I was giving:

Developer writes a script, script.py that depends on scipy

pyrgo add scipy

This adds the dependency to pyproject.toml

They install the dependency:

pyrgo install

They activate the virtual environment with these dependencies:

pyrgo shell

And run the script within the activated environment that includes scipy:

(venv) $ python script.py

Both Poetry and hatch also support a run command to short-circuit the need to activate the virtual environment. So rather than running the shell command and then invoking the script in a second command, they can run:

pyrgo run python script.py

Of course, this virtual environment could also be used to install/uninstall arbitrary packages with pip based on how things currently work, if the developer wanted to fall back to the additional flexibility/unstructure that currently exists with pip and venv.

pf_moore · November 19, 2022, 3:00pm

Sigh. I think you’re still missing my point. One further try.

I don’t have (or want) a pyproject.toml for this use case. That’s exactly my point. I’m not even in a Python project, or writing one. If it helps, assume I’m writing my script in /tmp.

… and I don’t want to manage a virtual environment associated with my script. I want the system to do that for me.

Which is great, but they still need me to manage the environment in the sense that I have to pick a name for it, delete it when I’m done, and remember that that environment is associated with my (probably throwaway, but I’m keeping it “just in case”) script.

My ideal here is for scripts which depend on 3rd party libraries to be just as easy to write and use as scripts that only rely on the stdlib. And crucially, for all situations that pure-stdlib scripts can be used in.

The nearest I’ve found is pip-run, which lets you say __requires__ = ['requests'] in your script, and it will then install requests in a temporary environment when you run your script. Its main disadvantages are that it re-creates the environment every run (slow if you have complex dependencies) and that it has a somewhat clumsy command line syntax. But integrate that functionality with something like hatch run and you have pretty close to what I’m talking about.
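One way to picture the combination described here: read `__requires__` from the script without executing it, then derive a stable key from the requirements so that an environment can be reused across runs instead of rebuilt each time. The sketch below is only that idea under stated assumptions. It is not how pip-run or Hatch work internally, and `read_requires` and `env_cache_key` are hypothetical names.

```python
import ast
import hashlib


def read_requires(source: str) -> list[str]:
    """Return the list assigned to a top-level __requires__, or [] if absent."""
    # Parse only; the script is never executed, so its imports need not resolve.
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "__requires__":
                    return list(ast.literal_eval(node.value))
    return []


def env_cache_key(requirements: list[str]) -> str:
    """Stable key for a set of requirements, usable as a cached environment's directory name."""
    # Crude normalisation (assumption): ignore order, case and surrounding whitespace.
    canonical = "\n".join(sorted(r.strip().lower() for r in requirements))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
```

A runner built on this could look for a directory named after `env_cache_key(read_requires(source))` in a cache location of its choosing, create and populate it only when it is missing, and otherwise start the script immediately.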

sinoroc · November 19, 2022, 3:12pm

Seems like pyflow is close enough (I have not tried). It seems to have support for __pypackages__ and __requires__.