PEP 723: Embedding pyproject.toml in single-file scripts

I would be willing to do that as well!

In your examples it reads to me like your main personal use case is specifically about required runtime dependencies, and that all of the other use cases are deemed out of scope, i.e. would not be supported by this potential new standard, out of personal preference. This is not just you, of course, but also some others in support of the dependency-only approach.

This is fine, I think, but I would really appreciate it if you could update 722 to address some of those use cases and say specifically why they should not be supported by standards.

2 Likes

My impression is that the scope is not limited to single-file scripts (in the sense of an executable script) or single-file projects. As far as I understood it, it would be possible to have such an embedded pyproject.toml in an importable Python module anywhere in a Python library (or any larger code base), for the sole purpose of overriding the general linting rules for that particular Python file. In other words, the embedded pyproject block would contain only something like the following:

[tool.SomeLinter]
some_rule_enabled = false
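
For concreteness, I imagine such a module might look roughly like this (I'm using the fenced-comment syntax discussed further down in this thread purely as a placeholder, and the linter name and rule are made up):

# ```pyproject
# [tool.SomeLinter]
# some_rule_enabled = false
# ```
"""Some importable module deep inside a larger library."""

def helper():
    # code that would normally trip the disabled rule
    ...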

Is that right?

I would not know how to name the PEP either.


In the version of the PEP where the embedding happens in comments (instead of in a __pyproject__ variable), do we really want to use the Markdown-like ```toml notation? Python has historically been more in the reStructuredText camp. It is true that literal blocks in reStructuredText are probably not as easy to write in comments.


If this PEP goes towards a Markdown-style code block, we should look at how to handle the info string. Typically ```toml means the code block in question contains TOML-formatted text; as far as I know this is the generally accepted usage. I wonder if something like ```toml pyproject is possible, as I do not know how Markdown parsers handle this.

2 Likes

But you’re missing my point totally. Of the long list I gave, none of them (except the one that includes pip, pipx, …) would I consider to be “a project I write in Python”. In fact, most of them I don’t even consider as programming projects - they are maths, data analysis, reporting, data collection… Not programming, and certainly not Python.

Does nobody else have projects like this? Or do you not consider it to be worth improving the user experience of running the Python code that exists in such projects? Or do you consider them “out of scope” for PEP 723?[1]

Because I genuinely have no clue what those use cases are, so I can’t work out what to say. The only one I’ve seen that I understand is the “python version requirement” one, and I’ve tried to address that, but I suspect I’ve failed to do a good enough job, because I guess there is some aspect that I’m not understanding. For me, the fact that pipx and pip-run have never supported this (and in the case of pip-run can’t, because of the way it manages environments), combined with the fact that no-one has asked for it, means I’m not sure what the (non-theoretical) issue is.

I could throw the question back at you. If you (in PEP 723, for now) could clearly explain the use cases you think aren’t covered by PEP 722, with practical examples, I could have a go at explaining my position. The only ones I can see in the PEP are:

  • “A user facing CLI that is capable of executing scripts”. Yep, that’s the one PEP 722 covers. The experience from pip-run and pipx is that dependency data is all that’s needed.
  • “A script that desires to transition to a directory-type project”. In my experience, step one of this transition is to copy the script into its own directory and write a pyproject.toml. Plus some tests, and maybe a deployment script, etc. The point is that there is no point at which I have ever felt that it was essential to retain the single-file nature of the code, but adding metadata was the compelling next step on the way to “full project” status. Nor have I ever heard anyone else say that. Do you have any evidence that people need this capability? Particularly as it’s easy (by design!) to write a tool that reads script metadata and writes it to a pyproject.toml when you’re ready to move across.
  • “Users that wish to avoid manual dependency management”. Yep, PEP 722 can do that. All tools have to do is find the dependency block and add lines at the end. In fact, I argue in PEP 722 that this is one of the things that TOML (and by implication, an embedded pyproject.toml) is worse for, as layout-preserving edits are notoriously hard to get right. I admit that I haven’t looked in detail at what the available libraries are capable of, but the reasons why PEP 680 argued against adding a write capability in the stdlib seem relevant here. Plus, I have no idea if capable layout-preserving libraries are available in TypeScript, or in Rust, or whatever other languages people might write tools in (PEP 722 is simple enough that you could probably write a tool as a shell script!) A rough sketch of the “add lines at the end” operation follows below.
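
To illustrate just how little that operation involves for PEP 722, here is a rough sketch, assuming the draft’s “Script Dependencies:” comment block and simplifying the rules for where the block ends:

import re

# Header line that starts a PEP 722-style dependency block (simplified).
BLOCK_HEADER = re.compile(r'^#\s*Script Dependencies:\s*$')

def add_requirement(script: str, requirement: str) -> str:
    """Append a requirement to the end of the dependency block."""
    lines = script.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if BLOCK_HEADER.match(line):
            # Walk past the existing "#    requirement" comment lines.
            end = i + 1
            while (end < len(lines)
                   and lines[end].startswith('#')
                   and lines[end].lstrip('#').strip()):
                end += 1
            lines.insert(end, f'#    {requirement}\n')
            return ''.join(lines)
    raise ValueError('no dependency block found')

No TOML parsing and no layout preservation to worry about; everything outside the block is untouched by construction.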

The only other case I can think of is tool-specific config. And that was more a case of asking “pyproject.toml allows it, do you have evidence that tools would use it?” rather than of an existing demand for the capability. I’m not aware of black, ruff, flake8, mypy, etc. having people ask for per-file config of the form that a [tool] section would provide. Yes, when they were asked “if it were added, would you use it”, the response was “maybe”, or even “probably”, but there were reservations about the whole single-file vs directory aspect, which suggests to me that they don’t have actual use cases in mind, just a feeling that “it might be nice”.

If you wanted a compromise proposal, I’d be looking at something more along the lines of the abandoned PEP 722 “metadata blocks” approach. An extensible format, but not tied to pyproject.toml and with only the dependency data specified at this point, and further data to be added in follow-up standards when justified with real-world examples of use cases. I’d even be willing to accept a reserved “tool” namespace of some form, even though I think there’s a mess of complexity there that would make it a lot less useful than it looks.

But the starting point for me would be concrete, real-world use cases that demonstrate that people are having problems right now because of capabilities that are missing - not “potential for future developments”. And at the moment, PEP 722 covers all of the ones I am aware of.


  1. In contrast, they are more or less the entire scope of PEP 722, so if that’s the case, maybe both PEPs could be approved without conflict. But I doubt it. ↩︎

5 Likes

This is a reservation I have as well. There’s never been any suggestion that embedding metadata into individual importable modules inside a package is necessary - and yet, assuming that we don’t want to frame PEP 723 as only applying to single-file scripts, it’s not at all clear to me how to word PEP 723 to prohibit it.

IMO, it’s better to focus firmly on (single-file) scripts. It’s an easy, understandable use case that has clearly-defined boundaries. I don’t know whether PEP 723 can restrict itself to that use case, but that’s for @ofek to say (clearly, I can’t, as I’m biased - I think PEP 722 addresses that use case perfectly well).

1 Like

I didn’t mean “a project I write entirely in python” and I don’t know what distinction you’re making between “project” and “programming project”.

I don’t understand how you can say “a project that involves python scripts” is not “a project you write in python”, at least partly. In any project that involves a non-trivial amount of python code, it seems reasonable to want to configure tools to work with that code, and that configuration would seem naturally at home in a “python project” config file.

On the contrary, nearly all of my projects are like this, but I’m already used to configuring tools for them via pyproject.toml once I have more than a jupyter notebook. If I could add configuration in a lone script I might do it sooner (I don’t know for sure).

3 Likes

I have many of the use cases Paul describes, I just haven’t weighed
in on the PEP 723 discussion because the PEP 722 draft seems to more
closely meet my personal needs.

In my case, part of $DAYJOB involves wrangling a diverse community
of hundreds, even thousands, of contributors to free/libre open
source projects. That means doing all sorts of quick data analysis
around various aspects of contributor patterns and demographics.
Sometimes I just knock something out in the REPL and paste a
transcript into E-mail to “show my work” (open methodology is
important).

If the analysis is particularly involved though I may be generating
summarized structured data from larger data sets, and so stuff the
logic into a script and zip that up along with the input and output
files for archival purposes. This might be something I want to run
again next quarter, or next year, so I embed comments at the top of
the script indicating what libraries need to be installed in a venv
before running the script from it. This is where I see
implementations of either PEP being potentially useful for me.

There are, of course, also times where I know I’m going to need to
run the same analysis frequently, and want to preserve the tooling
to do that more durably so that the task can be easily taken up by
other members of the community. In those cases I certainly make
“proper” Python packages with script entrypoints, include
self-testing mechanisms to ensure it continues to function properly
(with CI jobs, code review, all the typical things a veteran
developer expects from a full project). But I don’t want to put in
that extra level of effort for something only I’m likely to run, or
which may only ever get run once.

2 Likes

The distinction I’m making is that the deliverable isn’t a program. It’s a report, or a data file, or a mathematical understanding. Or something else that is created using code, but the code is a tool not a product.

I have global config that defines how I want VS Code, and black, and ruff, and mypy, etc to treat my Python code by default (admittedly, most of that default config is empty because the tool defaults are fine for me). And I’m fine with applying that default config when I’m using Python for everything where the code isn’t the deliverable - because I’m the only person who will see the code, and my (global) preferences should apply just fine.

If I reach the point of defining project-specific standards or config for my Python code, then yes, that’s a “Python project” in the sense I’m trying to express. But it’s very rare that I need such control over config unless I’m working with other people on the code (or working as if that was a possibility). And that’s far from the use cases I’m talking about.

But maybe we can just agree that “it’s hard to come to a common understanding”? After all, that’s really the only point I was trying to make before we got sucked into this extended digression…

5 Likes

Sure. I hope this digression had some value to the larger discussion, in that we explored some of the differences among users.

I’ve already written too much about this topic (PS the post got a lot longer than anticipated, sorry), but I do find your questions about what a “project” is interesting and I’ll try to explain my point of view.

If we consider the existing packaging space, PEP 621 defines how you can use pyproject.toml for the metadata of “projects”. Here, the “projects” are formal projects in your “pip/pipx” sense. The main purpose of the metadata is currently to end up in sdists and wheels in the right format, so if we think in terms of the implementation, we may see it as a stretch to extend PEP 621 metadata to less formal “projects”, things that we don’t want to distribute, or even non-“projects” like scripts, etc.

But let’s forget the implementation completely for a minute and look at how this may appear to Bob, who understands nothing about Python packaging (Bob is perhaps a beginner, but perhaps a relatively advanced programmer who has never dived into the packaging space).

Bob’s starting point is “I want to use NumPy”. He will Google for “Python how to use external package” or something like that. And he will find lots of things: pip install numpy, stuff about venvs, stuff about requirements.txt, stuff about pyproject.toml, and more. It belabors the point by now to say that packaging[1] is currently too confusing.

Let’s hope that in 10 years, venvs have been eliminated from the top of that Google search (because they’re now always an implementation detail of the tool you’re using, not managed by you directly), and requirements.txt too. So we get a few search results, some about the commands you need to use if you want to fire up a REPL where numpy is installed, and some about how you can put dependency metadata in files where tools pick it up for you. And because Google (hopefully) doesn’t know whether Bob is a beginner running a script, or an advanced-user-packaging-beginner writing a script, or an advanced-user-packaging-beginner working on a formal project, Bob gets interspersed search results about both PEP 722/723 and pyproject.toml.

Suppose PEP 722 has long been accepted. Suppose temporarily that Bob is a beginner and just wants to implement Gaussian elimination as a homework exercise. Having got search results for two different methods, Bob is bound to wonder: which should I use? Both putting the code in a single-file script with a dependency block and putting the code in a directory with a pyproject.toml could work for him (in the first case he could use pipx run foo.py, in the second hatch run foo.py, if the tools are like today). Bob thinks he is missing what’s different between them. Bob googles for “script dependencies vs pyproject.toml”. Bob finds RealPython articles explaining that one is basically a limited form of the other for simple scripting use cases, and is confused as to why they both exist. Oh, and Bob is a novice and a non-native English speaker, so terms like “command”, “shell”, “dependency” are already confusing him. He may not understand what a package is, what a script is, … Or, if Bob is a more advanced user, “why do I have to learn two different formats for when I write scripts or when I work on formal projects?”.

In the PEP 723 scenario, Bob finds that there is a format for configuration that can be put in a file called “pyproject.toml”, but also in a special comment as a convenience if you want to keep everything in a single file. Nearly everyone using a computer, whether non-programmer, beginner programmer or programming aficionado, can understand the difference between a single file and a directory with a bunch of files, and that is how the choice is framed for Bob. He’s not choosing between “run a single-file script” (but what’s a script?) and “write a full project” (but what’s a full project?), but rather between “put everything in one file” and “put several files in a directory”. Mentally, this is as simple as it gets.

Now, the part where this also interacts with how the packaging tools and community are organized: this works even better if the choice “single file vs project” doesn’t have an influence on the choice “pipx vs hatch vs $othertool”. The best from my point of view would be for that choice to just not exist (only one standard tool), but still, if at least one tool like hatch supports both, at least the choice is an easy one for users of hatch.

That is not to say I think the additional use cases PEP 723 enables compared to PEP 722 are worth nothing (e.g., specifying the Python version, etc.). PEP 723 is also more extensible for the future in case we want to define a way for build frontends to do actual builds of single files into wheels.

But, if you ask me to demonstrate that strong “concrete real-world use cases” justify choosing PEP 723 over PEP 722, my reply is that even without these PEP 723 is preferable over PEP 722. There is just no reason to introduce a new metadata format instead of reusing an existing one. From the pure user point of view, metadata is just metadata, whatever the tools want to do with it. We don’t need several ways of writing it.

Bottom line because I really wrote too much again (sorry): if “>” is the order “is better UX than”, then I believe that

choose (single file | directory) [+ use format X + use tool Y]

> choose (single file + use tool X) | (directory + use tool Y) [+ use format X]

≈ choose (single file + format A) | (directory + format B) [+ use tool Y]

> choose (single file + format A + use tool Y) | (directory + format B + use tool Z)


  1. Unlike @kknechtel, I am using “packaging” in a broad sense, i.e., not just preparing distributions but also installing external packages or managing venvs. ↩︎

8 Likes

I think this is a good framing, though I’d approach it from another direction – in that Bob could find that there are two clearly delineated approaches to specifying “stuff”: the ‘lite’ (basic, simple, ‘quick’) approach, and the ‘full’ (advanced, configurable, detailed) approach.

The ‘lite’ approach would specify the bare minimum to “get this code to run”, of course on a Friday afternoon right before jetting off on holiday. The ‘full’ approach, however, would also be able to specify Python requirements, a short description, licensing information, etc – requiring more investment up-front in specifying things, but potentially providing more value to tools.

These could both be managed by the same workflow tool, or support for the ‘lite’ approach could come batteries-included in the <hypothetical-shiny-bundled-Python-installer-first-party-tool> and support for the ‘full’ approach through third-party tools.

Currently, Python packaging caters only for the ‘full’ approach via directories, and pyproject.toml. I see PEP 722 firmly as the ‘lite’ approach – well scoped and seemingly easy to implement for IDEs, the py launcher, pipx, etc. PEP 723, though, I think straddles this divide – it could represent the ‘lite’ approach (through declaring only the [project] dependencies), but is able to expand to the ‘full’ approach, and the current text of the PEP (and my view of the general consensus) is that this ‘full’ approach is the favoured one in PEP 723.


My summary of the challenge is that 722 and 723 are solving (slightly) different problems, perhaps due to a different view of terminology, meaning of a ‘project’, etc – yet we hesitate to accept both due to a (well-founded) fear of introducing further fragmentation into the ecosystem.

Stemming from this, one unresolved question to my mind is if we want to bless the concept of the ‘full’ approach in a single-file-script (as a migratory step to a directory-based Project with pyproject.toml, or otherwise), with project metadata, build-system information, structured configuration, et al—noting that this embedded metadata wouldn’t be usable for Python files within directory-layout Projects.


One of my lingering concerns with the PEP 723 approach for single-file-scripts is complexity for those who use Python irregularly, or as a ‘glue’. I think it’s much easier to comprehend a plaintext list of packages required for a script than to (initially) understand the pyproject metadata (what is and isn’t required, formatting, etc), especially for someone who mainly uses Python scripts/applications and infrequently needs to update or edit them, or write a new one for a different task.

A

7 Likes

Yep, which I think is the fundamental question that leads to the difference in format approaches.

2 Likes

Meta-commentary inside my head: in hindsight it’s a bummer that I have committed so strongly to only doing standard things in Hatch, and now users expect that. So if we go with dependency-only then that is what Hatch users will get, and I won’t even be experimenting with the other approach.

1 Like

:slightly_frowning_face:
If we’re back to considering e.g. “```toml” as a marker, one idea not yet covered in the “Rejected Ideas” section is putting it in the docstring.

This would avoid having to deal with # (or # , or #\s*…), which is another stumbling block for syntax highlighting and maintaining user formatting. The regex-approach would of course still work with that marker.

Though to be honest, I don’t see the problem with backslashes (and staying with __pyproject__). When we’re talking about runtime, we’re talking (presumably) about where we’re executing the regex-pattern. Once we have the output of that regex, it’d be trivial to post-process that (by turning \ into \\[1]) before actually evaluating the string in toml.loads.

Then it would make sense to keep disallowing raw strings (also f-strings, BTW), and it’d be a clear case of saying that __pyproject__ is always treated 1:1 as toml.


  1. with some exceptions (e.g. linebreaks: \n, \r) ↩︎

2 Likes

The more I think about it, the more I wish to stay with the variable assignment approach. However, I did get the implementation working for the comment approach:

import re
import tomlkit

# Matches a "# ```pyproject" ... "# ```" fenced comment block; group 1
# captures the commented-out TOML lines between the fences.
REGEX = r'(?m)^# ```pyproject$\s((^#.*$\s)+)^# ```$'

def parse(script: str) -> re.Match | None:
    matches = list(re.finditer(REGEX, script))
    if len(matches) > 1:
        raise ValueError('Multiple pyproject blocks found')
    elif len(matches) == 1:
        return matches[0]
    else:
        return None

def add(script: str, dependency: str) -> str:
    match = parse(script)
    # Strip the leading "# " (or a bare "#" on blank lines) to recover the TOML.
    content = ''.join(
        line[2:] if line.startswith('# ') else line[1:]
        for line in match.group(1).splitlines(keepends=True)
    )

    config = tomlkit.parse(content)
    config['project']['dependencies'].append(dependency)
    # Re-comment the layout-preserving TOML that tomlkit emits.
    new_content = ''.join(
        f'# {line}' if line.strip() else f'#{line}'
        for line in tomlkit.dumps(config).splitlines(keepends=True)
    )

    # Splice the updated block back over the original group 1 span.
    start, end = match.span(1)
    return script[:start] + new_content + script[end:]
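
For example (the script text here is made up):

SCRIPT = '''\
# ```pyproject
# [project]
# dependencies = ["requests"]
# ```
print("hello")
'''

# Appends "rich" to the embedded dependency list and leaves the rest of
# the script untouched.
print(add(SCRIPT, 'rich'))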

Yeah, it seems like people are using terms in slightly different ways and there is some misalignment. That makes it hard to tell which disagreements are more superficial and which are deeper. But, again, to me that is an indication that it is better not to be hasty in approving either PEP, and maybe it’s not even necessary to focus on specific revisions to the PEPs, but rather to back up a bit and try to boil down the discussion until we can be clear on what we’re considering.

Ah, okay, that makes sense — although (related to your point above) I wouldn’t have understood that based on a terminological distinction between “project” and “programming project”. :upside_down_face: I do think, though, that in many contexts, even with this type of “non-programming project”, there is a desire to make the code part of the deliverable, for greater transparency and reproducibility. Certainly in some academic fields this is becoming close to obligatory.

So your formulation makes me understand your position better, but also maybe explains why I (and perhaps some others) see the use case you describe a bit differently. Basically in my view, the prevalence of “this script is totally internal and no one else need ever see it, they’ll only see the plot/data file/etc. that it creates” as a use case is small and shrinking. That’s not to say there’s anything wrong with it, just to say that covering only that specific use case makes the benefit somewhat smaller.

As has been said a few times on these threads, a benefit of a pyproject.toml-based metadata approach is that it isn’t so locked in to that case. TOML is of course a standardized format, and pyproject.toml is a standard on top of that. That makes it easier for me to see migration paths from “it’s just this one script file that’s not part of the deliverable” to “oops actually we do need to deliver the script file too because people want to see how the data/plot/etc. was generated”.

Well, I’m a bit leery of getting to the precipice of a PEP (or two) possibly being approved if we don’t even have a common understanding of what the PEPs are really about! :slight_smile:

I really like this way of looking at it, and especially the idea of “what will we think about this in 10 years”. What you said here to me is the essence of the reason to prefer a pyproject.toml-derived approach. With this approach there is only one format; with PEP 722 there are two.

That said, I’m still on the fence about whether I’d support the current PEP (723), since even with those advantages it still has its complications. I also think the questions about how exactly to specify the format to avoid awkward edge cases are important and haven’t been really resolved (e.g., the stuff about escapes and concatenated strings and so on).[1]

Another thing I keep thinking about in this is that, for me, the painful thing is not really one file vs two, or TOML vs something else, but rather the overall cumbersomeness of the build-distribute-install process for Python code. I do have many of the same use cases that @pf_moore describes, but I also have use cases where the code involved is distributed across more than one file, and I still don’t want to have to wade through all the packaging rigmarole.

What I really want is something that’s just like “let me take this stuff and dump it somewhere and give me a way to somehow get it up and running without having to build anything”. And “this stuff” could be one file (which as I see it is basically @pf_moore’s use case) or two or three or a directory full of them, but the key point is that I want to distribute the files directly, not some kind of build artifact (like a wheel or even an sdist), and then reconstitute a “live” environment in which I can run that stuff. I see both of these PEPs as addressing subsets of that kind of situation, which is one reason I’m hesitant about both, as I keep wondering if we could actually provide a better workflow for the full set, and in doing so also address the use cases addressed by these PEPs.


  1. I still think the better path forward is a plan whereby we continue to use a separate file, but do not require it to be named pyproject.toml, do not impose a separate-directory requirement, and instead introduce some means of having the script and/or TOML metadata file reference one another. ↩︎

2 Likes

… But this is really just a problem of finding the right terminology for clear communication. The choice is really between “set up everything needed to support the code, so it can run” and “prepare the code to share it with others, so they can use it in their programs”. Our advanced version of Bob can understand that, in the latter case:

  • The other user will be in control of the start point, so it makes sense to explain the setup requirements in a different file instead of choosing one of the Python code files;

  • The sharing tools also have to know which files to share, how to organize them on the other user’s computer, and a name and version number for the code (so the other user can tell the sharing tools which code to use).

I don’t need jargon any more difficult or CS-specific than “file” or “code” to explain this. For example, “dependency” is an unusual word (yes, it clearly means “thing upon which something else depends”, but it’s not commonly used), but “requirement” is a lot easier.

But the choice isn’t “single file vs project”. After all, nothing in PEP 722 prohibits the code from importing other code written by the user. It doesn’t even prohibit that code from having its own requirements block! (Whether those requirements work is up to the script runner, of course.)

While the primary motivation for PEP 722 is the single-file case, the choice being made is really “declaring requirements for an application vs packaging a library (or application that includes a library)”. The point, as I’ve said before, is that the PEP 722 use case is not packaging. It seems that you intend to rehash the “impact on the packaging ecosystem” argument; my rebuttal is the same as before.

From my perspective, PEP 722 doesn’t introduce a new metadata format. The actual metadata here consists of the requirements specifiers themselves. TOML is just a container format, and PEP 621 etc. are protocols for how to put the requirement-specifier metadata (and other metadata) into the container.

Alternately: if we consider that “requirements specifiers listed in a block comment” is different from “requirements specifiers listed in TOML”, then we should also acknowledge that “TOML qua TOML” is different from “TOML embedded within a .py file” - because there needs to be a rule for how to do the embedding. And, as we’ve already seen from this discussion, there is more than one contender for how to do that, with pros and cons, which can create weird corner cases (because TOML and the Python language itself are both designed to allow for deep nesting structures, escape sequences etc.).

2 Likes

I suggest again adding the filename, pyproject.toml, so that with the same structure other tools can add their lock files as well.

Maybe use the code-block directive and add a custom field, filename. Docutils could then be used to retrieve the contents.
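
Something like this is what I have in mind, shown embedded in comments (the :filename: option is hypothetical and would need to be defined):

# .. code-block:: toml
#    :filename: pyproject.toml
#
#    [project]
#    dependencies = ["requests"]

A tool that wanted to embed its lock file could then add a second block with a different :filename: value.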

I find it funny that many people were against PEP 722 because it was “introducing a new way of doing things” even if what PEP 722 proposes is not packaging, and many people insist on re-using the [project] table for something that is tangentially related to the “raison d’être” of PEP 621. But then we might end up with a Markdown-style comment block.

Given Python’s history, tooling and broad adoption, reStructuredText is the status quo and Markdown is the new kid on the block. Adopting Markdown-like comments, at least indirectly and on some level, is a way to show support for new ways of doing things… :joy:

(This is not a comment against either proposal, just me taking a moment to amuse myself with the outcomes of the discussions)

3 Likes

I’m not sure how you see this side-thread as helping @ofek get PEP 723 ready for approval - it seems like it will be as much a distraction here as it was in the PEP 722 thread. But having said that I feel that if we’re talking about the “10 years from now” view, I should explain how I would like things to look in 10 years. Because the model you describe sounds pretty terrible. It’s barely changed from what we have now, where Bob needs extra tools just to run a homework script, and has to learn “configuration formats” rather than just learning Python (which, so he’s been told, is awesome because it has loads of easily-available libraries, just ready to use).

In my view of “where we should be going”, a significant proportion of Python users will have no involvement with, or interest in, packaging. They will install Python, and get a command that starts the Python REPL[1] or runs scripts, just as they do today. They won’t get some sort of IDE with workflow-related commands to “create projects” - those are available but most people won’t need them or care.

For many, many people, that will be all they need. They write Python code, and run it with the python command. They use libraries from the standard library or from PyPI seamlessly, and the only way they know that a library isn’t in the stdlib is because they have to declare the name of the library on PyPI in their script before they can import it. (Ideally, they wouldn’t even need to do that, but I think that’s more likely to be 20 years away, not 10).

So Bob doesn’t need to Google for anything - he wants to use Numpy, so he adds it to his script and runs the script. He knows about import statements, and part of knowing about imports is knowing how to say “get this from PyPI”. Not because I’m ducking the question of “how to teach this”, but because 10 years from now, knowing how to say that something comes from PyPI is just as fundamental as knowing how to write an import statement. People working at this level don’t need or want to know anything about packaging, they just know that PyPI is available to all of their code.

On the other hand, Alice is writing an application in Python, which will be shipped to a bunch of customers. Or she’s writing a library which will be made available on PyPI. Either way, she starts her Python project management tool, and says “create project”. She gets asked some questions, including “application or library?” which she answers. And then she starts writing her code. When she’s ready, she runs the “build application” command, which creates a single file that can be shipped to the user, and run on the user’s computer. It doesn’t need the recipient to have Python installed. She has to configure the build so that it knows what dependencies to include, and she has to know about locking dependencies if she’s writing an application, or about leaving dependencies open if she’s writing a library, but the tool helps her with doing that. She could do it “by hand” if she wanted, but mostly knowing she can is sufficient, and she lets the tool add metadata and run lockers, etc.

Alice needs to know a bit more than Bob - she needs to understand ideas to do with application deployment like licensing, support, locking down dependencies to ensure reproducibility, etc. Her workflow tool helps her with that, so all she needs to do is run the appropriate commands. But being a conscientious developer, she doesn’t rely on her tool, she learns what’s going on behind the scenes, so she knows where the data she is entering gets stored. She doesn’t need to do this, but it reassures her to know that there’s no “magic” and she could easily write the data by hand if the tool wasn’t available.

Now let’s suppose one of Bob’s scripts is so good that he gets asked to make it into an application for deployment. Cool! He needs to learn how to do that, which is fine, he’s never done “deployment” before, but he’s willing to learn. And it turns out that the standard tools make it easy. There’s a “create application project from script” command that takes his script and puts it into this new “project” format that he needs - the questions it asks are things he knows (or, like licensing, can find out). And it explains what it’s doing (because he asked it for verbose output, as he wants to learn what’s going on, rather than just trusting the “magic”), so he understands why the layout is more complex than his simple scripts. And at that point, he can carry on learning what’s involved in making an application from his script - understanding deployment scenarios, adding a test suite and coverage tests, updating his code to match corporate policies on formatting and style, etc. For simple jobs like running the tests or style-checking his code, the commands to do this are simple, but if he needs to automate anything, he can do it just like he always has - by writing a Python script and running it with python reformat_code.py. There’s no “environment management”, or “script runners”. Running scripts is easy, and Bob’s already proficient at that.

It’s worth noting that the key here is that most Python users (like Bob) have no interaction at all with packaging, and probably don’t even know the term. They don’t think of PyPI and 3rd party libraries as “packages”, just as “more resources I can use”. In locked down environments, things might not be that simple - there could be rules on what 3rd party libraries are approved, meaning that Bob has to know how to configure Python to use the “approved list”. But that’s fine. Anyone who’s worked in a corporate environment or similar has had to deal with this sort of thing - it can be painful (particularly if the use of Python is “unofficial”) but it’s very much “business as usual”.

Also note that I didn’t make a fuss of what tool Alice used. Maybe that’s because there’s only one option. Or maybe (and more likely, in my view) it’s because it doesn’t matter. The workflow is the important thing, and everyone understands the workflow, and uses it the same way. What tool you use isn’t important, in the same way that what editor you use isn’t important (:slightly_smiling_face:). And that, in turn, is because workflow tools are no longer competing to try to claim the “top spot” as the “official tool”, but instead have started co-operating and enabling a common workflow, letting users have the choice and flexibility. Tools agree on the development process, so that users don’t feel that by choosing a tool, they are committing to a workflow or philosophy that they don’t understand yet, and won’t be able to change later. And users don’t feel pressure to make a choice, so having multiple options isn’t a problem. Just pick the one someone told you about, and change later if you want to - no big deal, no crisis. There will probably always be one tool that’s “trendy” and most people will use, but that’s just like every area of computing (heck, Python itself is the “trendy choice” out of a vast range of options!)

And the tool landscape looks very different. There’s no virtual environments or installers. These are low-level implementation details. There are no “script runners” - you run a script with Python. Most people never use any sort of tool unless they want to. Developing applications and libraries is still a complex task, but there’s a well-understood approach that works, so people won’t be asking “but what about my use case?” And tools exist to help with that approach, not to define, or control, the workflow. Build backends aren’t a decision the developer makes, they are chosen based on what you are trying to do. And they are easy to change - if you need to add some Rust code, switch to a backend that supports Rust. Nothing else needs to change.

But 10 years isn’t anything like as long a time as people seem to think. There will still be people with massive monorepos, with complicated arrangements of virtual environments, hard-coded dependency installation, custom build backends and all sorts. Heck, there will probably still be people maintaining a private copy of distutils, “because it works for me”. And the packaging community will have to support these people. We can’t wish everyone onto the new perfection. Expecting people to rewrite the infrastructure for a million-line project just because it’s the new “best practice” isn’t justifiable. So there will still be “lots of tools”. The best we can expect is that people who can work with new approaches can just get on with their jobs and basically forget about “packaging” and “workflow” and “what tool is best”. Unfortunately, there will still be a lot of legacy information on the internet, and thanks to those people who won’t or can’t change their workflow, it will look like it’s “current”. We can’t do anything about that, other than try to make sure that (a) the official documentation is clear enough, and covers enough use cases, that people who read it don’t need internet searches, and (b) make as much as possible “just work” before the user needs to go looking for advice on the internet.

On the other hand, in some ways 10 years is a long time. Expecting to know what will be the “best approach” 10 years from now is probably pretty naïve (or maybe arrogant…) And expecting to get there without any false starts, experiments, or abandoned approaches along the way is foolish. So while “fewer confusing alternatives” is fine as a goal, it’s a very bad way to approach the journey. We have to try things out to see if they work. And yes, that might even mean implementing standards that get superseded. That’s how we learn.


  1. because the REPL is awesome! ↩︎

18 Likes

Wherever the debate about strings vs comments goes, just please don’t require it – ideally don’t even allow it – to be in the docstring. That’s already a runtime-visible value with its own utility. I’m using it and I bet other people are too.

I think embedded comments simply introduce fewer unnecessary questions into the spec. Getting the most precision out of the smallest number of requirements is good.


As the agent of chaos who tried to push “project” over “script”, and may have taken things a bit OT…

A significant part of what I was trying to get at was that PEP 722 intends to support building a runtime environment[1] and then invoking the original Python file, either where it is on disk or from a temporary location, without the file itself being built into that environment. By contrast, PEP 723 allows for things like console entry points. Maybe I’m mistaken, but that sounds like it’s almost categorically impossible to make sense of and support unless the file is not executed directly but is instead somehow installed. Maybe it requires some dynamic sdist build – but now it’s not a script, because the invocation pattern won’t be python [options] file.py.

I like and dislike things about both PEPs.
I like how easy PEP 722 is to read and write. It hardly requires any teaching in the typical case. Just learn a special header line and you’re off to the races. I dislike that in order to achieve that it has to introduce a new format.[2]
I like that PEP 723 sticks with the existing standard format. Especially for more experienced and larger teams, this means “fewer divergent things”. Existing linters and tools to work on the toml data can support it with minimal adaptations. But I dislike the ambiguities it introduces, especially in existing packages with a pyproject.toml file, and I’m not sure that it’s a good idea to expose beginners to the format early on.

I wonder how many complaints about PEP 722 would disappear (and how many new ones would come out!) if it led outright with “a new format for simplified metadata, designed to be embeddable in comments”. It seems like the main objection to 722 – for me it’s the only thing I find imperfect – and the main driver for this alternative is the fact that it doesn’t use the “standard” format. But… it’s a draft standard itself, so 722 would make its new format a second standard format.

I come back to naming and describing these things accurately because I think/hope it can help identify what the key differences of opinion are.

There are a lot of nonstandard formats out there: setup.cfg, setup.py, poetry.lock, requirements.txt, tox.ini (it can hold other tool configs, remember), Pipfile, …
These are what Python users see and want consolidated a bit. But are we overfitting on that requirement? Do we need one format, or maybe two?

I’m getting dizzy getting to decide what I think so that I can convince you all that it’s right. :wink:

My last note on this topic for now is that looking at Rust for proof positive that “embedded TOML is the right way” may be a mistake. Rust has a different audience from Python. The selection bias here towards packaging-literate and advanced users is extreme. Remember that a portion of the target users will be seeing and using this data without reading a standard, caring about a standard, wanting standards, or generally being anything like the discussion participants. I think everyone here is aware of and sensitive to this difference, but notice how far that puts the Python user base from the Rust user base.


  1. a virtualenv, but I’ll hop on the bandwagon and agree that for the target users this is an implementation detail ↩︎

  2. pyproject.toml is well specified, but I think we’re kidding ourselves if we think it’s beginner friendly ↩︎

1 Like