PEP 723: Embedding pyproject.toml in single-file scripts

I’m not sure how you see this side-thread as helping @ofek get PEP 723 ready for approval - it seems like it will be as much a distraction here as it was in the PEP 722 thread. But having said that, I feel that if we’re talking about the “10 years from now” view, I should explain how I would like things to look in 10 years, because the model you describe sounds pretty terrible. It’s barely changed from what we have now, where Bob needs extra tools just to run a homework script, and has to learn “configuration formats” rather than just learning Python (which, so he’s been told, is awesome because it has loads of easily-available libraries, just ready to use).

In my view of “where we should be going”, a significant proportion of Python users will have no involvement with, or interest in, packaging. They will install Python, and get a command that starts the Python REPL[1] or runs scripts, just as they do today. They won’t get some sort of IDE with workflow-related commands to “create projects” - those are available, but most people won’t need or care about them.

For many, many people, that will be all they need. They write Python code, and run it with the `python` command. They use libraries from the standard library or from PyPI seamlessly, and the only way they know that a library isn’t in the stdlib is that they have to declare the name of the library on PyPI in their script before they can import it. (Ideally, they wouldn’t even need to do that, but I think that’s more likely to be 20 years away, not 10.)

So Bob doesn’t need to Google for anything - he wants to use Numpy, so he adds it to his script and runs the script. He knows about import statements, and part of knowing about imports is knowing how to say “get this from PyPI”. Not because I’m ducking the question of “how to teach this”, but because 10 years from now, knowing how to say that something comes from PyPI is just as fundamental as knowing how to write an import statement. People working at this level don’t need or want to know anything about packaging, they just know that PyPI is available to all of their code.
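To make that concrete (the file name and the homework itself are invented for illustration), here is roughly what Bob’s script looks like using the inline metadata block that PEP 723 proposes:

```python
# /// script
# dependencies = [
#     "numpy",
# ]
# ///

# homework.py - the block above is the only "packaging" Bob ever
# sees: it names the PyPI project that the import below relies on.
import numpy as np

print(np.arange(10).mean())
```

In the future described above, `python homework.py` does the rest; today it takes a PEP 723-aware runner, but the declaration itself is all Bob has to learn.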

On the other hand, Alice is writing an application in Python, which will be shipped to a bunch of customers. Or she’s writing a library which will be made available on PyPI. Either way, she starts her Python project management tool, and says “create project”. She gets asked some questions, including “application or library?” which she answers. And then she starts writing her code. When she’s ready, she runs the “build application” command, which creates a single file that can be shipped to the user, and run on the user’s computer. It doesn’t need the recipient to have Python installed. She has to configure the build so that it knows what dependencies to include, and she has to know about locking dependencies if she’s writing an application, or about leaving dependencies open if she’s writing a library, but the tool helps her with doing that. She could do it “by hand” if she wanted, but mostly knowing she can is sufficient, and she lets the tool add metadata and run lockers, etc.

Alice needs to know a bit more than Bob - she needs to understand ideas to do with application deployment like licensing, support, locking down dependencies to ensure reproducibility, etc. Her workflow tool helps her with that, so all she needs to do is run the appropriate commands. But being a conscientious developer, she doesn’t rely on her tool, she learns what’s going on behind the scenes, so she knows where the data she is entering gets stored. She doesn’t need to do this, but it reassures her to know that there’s no “magic” and she could easily write the data by hand if the tool wasn’t available.
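As a sketch of where that data lives (the project name and versions here are invented for illustration), the application-versus-library distinction Alice deals with shows up in today’s pyproject.toml roughly like this:

```toml
[project]
name = "alices-app"
version = "1.0.0"
# A library would leave the range open for its users:
#     dependencies = ["requests>=2.28"]
# An application pins down (or locks, via a separate lock file)
# exact versions so every customer gets a reproducible build:
dependencies = [
    "requests==2.31.0",
]
```

The point isn’t the syntax - it’s that the tool writes this for her, and there’s nothing in it she couldn’t write by hand.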

Now let’s suppose one of Bob’s scripts is so good that he gets asked to make it into an application for deployment. Cool! He needs to learn how to do that, which is fine - he’s never done “deployment” before, but he’s willing to learn. And it turns out that the standard tools make it easy. There’s a “create application project from script” command that takes his script and puts it into this new “project” format that he needs - the questions it asks are things he knows (or, like licensing, can find out). And it explains what it’s doing (because he asked it for verbose output, as he wants to learn what’s going on, rather than just trusting the “magic”), so he understands why the layout is more complex than his simple scripts.

And at that point, he can carry on learning what’s involved in making an application from his script - understanding deployment scenarios, adding a test suite and coverage tests, updating his code to match corporate policies on formatting and style, etc. For simple jobs like running the tests or style-checking his code, the commands to do this are simple, but if he needs to automate anything, he can do it just like he always has - by writing a Python script and running it with `python reformat_code.py`. There’s no “environment management”, or “script runners”. Running scripts is easy, and Bob’s already proficient at that.

It’s worth noting that the key here is that most Python users (like Bob) have no interaction at all with packaging, and probably don’t even know the term. They don’t think of PyPI and 3rd party libraries as “packages”, just as “more resources I can use”. In locked down environments, things might not be that simple - there could be rules on what 3rd party libraries are approved, meaning that Bob has to know how to configure Python to use the “approved list”. But that’s fine. Anyone who’s worked in a corporate environment or similar has had to deal with this sort of thing - it can be painful (particularly if the use of Python is “unofficial”) but it’s very much “business as usual”.

Also note that I didn’t make a fuss about what tool Alice used. Maybe that’s because there’s only one option. Or maybe (and more likely, in my view) it’s because it doesn’t matter. The workflow is the important thing, and everyone understands the workflow, and uses it the same way. What tool you use isn’t important, in the same way that what editor you use isn’t important (🙂). And that, in turn, is because workflow tools are no longer competing to claim the “top spot” as the “official tool”, but instead have started co-operating and enabling a common workflow, letting users have choice and flexibility. Tools agree on the development process, so that users don’t feel that by choosing a tool, they are committing to a workflow or philosophy that they don’t understand yet, and won’t be able to change later. And users don’t feel pressure to make a choice, so having multiple options isn’t a problem. Just pick the one someone told you about, and change later if you want to - no big deal, no crisis. There will probably always be one tool that’s “trendy” and that most people will use, but that’s just like every area of computing (heck, Python itself is the “trendy choice” out of a vast range of options!).

And the tool landscape looks very different. There are no virtual environments or installers - these are low-level implementation details. There are no “script runners” - you run a script with Python. Most people never use any sort of tool unless they want to. Developing applications and libraries is still a complex task, but there’s a well-understood approach that works, so people won’t be asking “but what about my use case?” And tools exist to help with that approach, not to define or control the workflow. Build backends aren’t a decision the developer makes; they are chosen based on what you’re trying to do. And they are easy to change - if you need to add some Rust code, switch to a backend that supports Rust. Nothing else needs to change.
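Today’s pyproject.toml already points in this direction: the backend lives in one small table, so switching (say, to maturin, a real backend that builds Rust extensions) is an edit to a couple of lines - a sketch, with an illustrative version pin:

```toml
# Previously this table named a pure-Python backend such as
# hatchling; swapping it is the whole of the change - the rest
# of the project's metadata stays exactly as it was.
[build-system]
requires = ["maturin>=1.0,<2.0"]
build-backend = "maturin"
```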

But 10 years isn’t anything like as long a time as people seem to think. There will still be people with massive monorepos, with complicated arrangements of virtual environments, hard-coded dependency installation, custom build backends and all sorts. Heck, there will probably still be people maintaining a private copy of distutils, “because it works for me”. And the packaging community will have to support these people. We can’t wish everyone onto the new perfection. Expecting people to rewrite the infrastructure for a million-line project just because it’s the new “best practice” isn’t justifiable. So there will still be “lots of tools”. The best we can expect is that people who can work with new approaches can just get on with their jobs and basically forget about “packaging” and “workflow” and “what tool is best”.

Unfortunately, there will still be a lot of legacy information on the internet, and thanks to those people who won’t or can’t change their workflow, it will look like it’s “current”. We can’t do anything about that, other than try to make sure that (a) the official documentation is clear enough, and covers enough use cases, that people who read it don’t need internet searches, and (b) as much as possible “just works” before the user needs to go looking for advice on the internet.

On the other hand, in some ways 10 years is a long time. Expecting to know what will be the “best approach” 10 years from now is probably pretty naïve (or maybe arrogant…). And expecting to get there without any false starts, experiments, or abandoned approaches along the way is foolish. So while “fewer confusing alternatives” is fine as a goal, it’s a very bad way to approach the journey. We have to try things out to see if they work. And yes, that might even mean implementing standards that get superseded. That’s how we learn.


  1. because the REPL is awesome!
