Yeah, it seems like people are using terms in slightly different ways and there is some misalignment. That makes it hard to tell which disagreements are more superficial and which are deeper. But, again, to me that is an indication that it is better not to be hasty in approving either PEP, and maybe it’s even not necessary to focus on specific revisions to the PEPs, but rather to back up a bit and try to boil down the discussion until we can be clear on what we’re considering.
Ah, okay, that makes sense — although (related to your point above) I wouldn’t have understood that based on a terminological distinction between “project” and “programming project”. I do think, though, that in many contexts, even with this type of “non-programming project”, there is a desire to make the code part of the deliverable, for greater transparency and reproducibility. Certainly in some academic fields this is becoming close to obligatory.
So your formulation makes me understand your position better, but also maybe explains why I (and perhaps some others) see the use case you describe a bit differently. Basically in my view, the prevalence of “this script is totally internal and no one else need ever see it, they’ll only see the plot/data file/etc. that it creates” as a use case is small and shrinking. That’s not to say there’s anything wrong with it, just to say that covering only that specific use case makes the benefit somewhat smaller.
As has been said a few times on these threads, a benefit of a pyproject.toml-based metadata approach is that it isn’t so locked in to that case. TOML is of course a standardized format, and pyproject.toml is a standard on top of that. That makes it easier for me to see migration paths from “it’s just this one script file that’s not part of the deliverable” to “oops actually we do need to deliver the script file too because people want to see how the data/plot/etc. was generated”.
Well, I’m a bit leery of getting to the precipice of a PEP (or two) possibly being approved if we don’t even have a common understanding of what the PEPs are really about!
I really like this way of looking at it, and especially the idea of “what will we think about this in 10 years”. What you said here to me is the essence of the reason to prefer a pyproject.toml-derived approach. With this approach there is only one format; with PEP 722 there are two.
That said, I’m still on the fence about whether I’d support the current PEP (723), since even with those advantages it still has its complications. I also think the questions about how exactly to specify the format to avoid awkward edge cases are important and haven’t been really resolved (e.g., the stuff about escapes and concatenated strings and so on).
Another thing I keep thinking about in this is that, for me, the painful thing is not really one files vs two or TOML vs something else but rather the overall cumbersomeness of the build-distribute-install process for Python code. I do have many of the same use cases that @pf_moore describes, but I also have use cases where the code involved is distributed across more than one file, and I still don’t want to have to wade through all the packaging rigmarole.
What I really want is something that’s just like “let me take this stuff and dump it somewhere and give me a way to somehow get it up and running without having to build anything”. And “this stuff” could be one file (which as I see it is basically @pf_moore’s use case) or two or three a directory full of them, but they key point is is I want to distribute the files directly, not some kind of build artifact (like a wheel or even an sdist), and then reconstitute a “live” environment in which I can run that stuff. I see both of these PEPs as addressing subsets of that kind of situation, which is one reason I’m hesitant about both, as I keep wondering if we could actually provide a better workflow for the full set, and in doing so also address the use cases addressed by these PEPs.
I still think the better path forward is a plan whereby we to continue to use a separate file, but not require it be named pyproject.toml, and not impose a separate-directory requirement, and instead introduce some means of having the script and/or TOML metadata file reference one another. ↩︎
… But this is really just a problem of finding the right terminology for clear communication. The choice is really between “set up everything needed to support the code, so it can run” and “prepare the code to share it with others, so they can use it in their programs”. Our advanced version of Bob can understand that, in the latter case:
The other user will be in control of the start point, so it makes sense to explain the setup requirements in a different file instead of choosing one of the Python code files;
The sharing tools also have to know which files to share, how to organize them on the other user’s computer, and a name and version number for the code (so the other user can tell the sharing tools which code to use).
I don’t need jargon any more difficult or CS-specific than “file” or “code” to explain this. For example, “dependency” is an unusual word (yes, it clearly means “thing upon which something else depends”, but it’s not commonly used), but “requirement” is a lot easier.
But the choice isn’t “single file vs project”. After all, nothing in PEP 722 prohibits the code from importing other code written by the user. It doesn’t even prohibit that code from having its own requirements block! (Whether those requirements work is up to the script runner, of course.)
While the primary motivation for PEP 722 is the single-file case, the choice being made is really “declaring requirements for an application vs packaging a library (or application that includes a library)”. The point, as I’ve said before, is that the PEP 722 use case is not packaging. It seems that you intend to rehash the “impact on the packaging ecosystem” argument; my rebuttal is the same as before.
From my perspective, PEP 722 doesn’t introduce a new metadata format. The actual metadata here consists of the requirements specifiers themselves. TOML is just a container format, and PEP 517 etc. are protocols for how to put the requirement-specifier metadata (and other metadata) into the container.
Alternately: if we consider that “requirements specifiers listed in a block comment” is different from “requirements specifiers listed in TOML”, then we should also acknowledge that “TOML qua TOML” is different from “TOML embedded within a .py file” - because there needs to be a rule for how to do the embedding. And, as we’ve already seen from this discussion, there is more than one contender for how to do that, with pros and cons, which can create weird corner cases (because TOML and the Python language itself are both designed to allow for deep nesting structures, escape sequences etc.).
I find it funny that many people were against PEP 722 because it was “introducing a new way of doing things” even if what PEP 722 proposes is not packaging, and many people insist in re-using the [project] table for something that is tangentially related to the “reason d’être” of PEP 621. But then we might end up with a Markdown-style comment block.
Given Python history, tooling and broad adoption, RestruturedText is the status quo and Markdown is the new kid in the block. Adopting markdown-like comments, at least indirectly and in some level, is a way to show support for new ways of doing things…
(This is not a comment against either proposal, just me taking a moment to amuse myself with the outcomes of the discussions)
I’m not sure how you see this side-thread as helping @ofek get PEP 723 ready for approval - it seems like it will be as much a distraction here as it was in the PEP 722 thread. But having said that I feel that if we’re talking about the “10 years from now” view, I should explain how I would like things to look in 10 years. Because the model you describe sounds pretty terrible. It’s barely changed from what we have now, where Bob needs extra tools just to run a homework script, and has to learn “configuration formats” rather than just learning Python (which, so he’s been told, is awesome because it has loads of easily-available libraries, just ready to use).
In my view of “where we should be going”, a significant proportion of Python users will have no involvement with, or interest in, packaging. They will install Python, and get a command that starts the Python REPL or runs scripts, just as they do today. They won’t get some sort of IDE with workflow-related commands to “create projects” - those are available but most people won’t need them or care.
For many, many people, that will be all they need. They write Python code, and run it with the python command. They use libraries from the standard library or from PyPI seamlessly, and the only way they know that a library isn’t in the stdlib is because they have to declare the name of the library on PyPI in their script before they can import it. (Ideally, they wouldn’t even need to do that, but I think that’s more likely to be 20 years away, not 10).
So Bob doesn’t need to Google for anything - he wants to use Numpy, so he adds it to his script and runs the script. He knows about import statements, and part of knowing about imports is knowing how to say “get this from PyPI”. Not because I’m ducking the question of “how to teach this”, but because 10 years from now, knowing how to say that something comes from PyPI is just as fundamental as knowing how to write an import statement. People working at this level don’t need or want to know anything about packaging, they just know that PyPI is available to all of their code.
On the other hand, Alice is writing an application in Python, which will be shipped to a bunch of customers. Or she’s writing a library which will be made available on PyPI. Either way, she starts her Python project management tool, and says “create project”. She gets asked some questions, including “application or library?” which she answers. And then she starts writing her code. When she’s ready, she runs the “build application” command, which creates a single file that can be shipped to the user, and run on the user’s computer. It doesn’t need the recipient to have Python installed. She has to configure the build so that it knows what dependencies to include, and she has to know about locking dependencies if she’s writing an application, or about leaving dependencies open if she’s writing a library, but the tool helps her with doing that. She could do it “by hand” if she wanted, but mostly knowing she can is sufficient, and she lets the tool add metadata and run lockers, etc.
Alice needs to know a bit more than Bob - she needs to understand ideas to do with application deployment like licensing, support, locking down dependencies to ensure reproducibility, etc. Her workflow tool helps her with that, so all she needs to do is run the appropriate commands. But being a conscientious developer, she doesn’t rely on her tool, she learns what’s going on behind the scenes, so she knows where the data she is entering gets stored. She doesn’t need to do this, but it reassures her to know that there’s no “magic” and she could easily write the data by hand if the tool wasn’t available.
Now let’s suppose one of Bob’s scripts is so good that he gets asked to make it into an application for deployment. Cool! He needs to learn how to do that, which is fine, he’s never done “deployment” before, but he’s willing to learn. And it turns out that the standard tools make it easy. There’s a “create application project from script” command that takes his script and puts it into this new “project” format that he needs - the questions it asks are things he knows (or, like licensing, can find out). And it explains what it’s doing (because he asked it for verbose output, as he wants to learn what’s going on, rather than just trusting the “magic”), so he understands why the layout is more complex than his simple scripts. And at that point, he can carry on learning what’s involved in making an application from his script - understanding deployment scenarios, adding a test suite and coverage tests, updating his code to match corporate policies on formatting and style, etc. For simple jobs like running the tests or style-checking his code, the commands to do this are simple, but if he needs to automate anything, he can do it just like he always has - by writing a Python script and running it with python reformat_code.py. There’s no “environment management”, or “script runners”. Running scripts is easy, and Bob’s already proficient at that.
It’s worth noting that the key here is that most Python users (like Bob) have no interaction at all with packaging, and probably don’t even know the term. They don’t think of PyPI and 3rd party libraries as “packages”, just as “more resources I can use”. In locked down environments, things might not be that simple - there could be rules on what 3rd party libraries are approved, meaning that Bob has to know how to configure Python to use the “approved list”. But that’s fine. Anyone who’s worked in a corporate environment or similar has had to deal with this sort of thing - it can be painful (particularly if the use of Python is “unofficial”) but it’s very much “business as usual”.
Also note that I didn’t make a fuss of what tool Alice used. Maybe that’s because there’s only one option. Or maybe (and more likely, in my view) it’s because it doesn’t matter. The workflow is the important thing, and everyone understands the workflow, and uses it the same way. What tool you use isn’t important, in the same way that what editor you use isn’t important (). And that, in turn, is because workflow tools are no longer competing to try to claim the “top spot” as the “official tool”, but instead have started co-operating and enabling a common workflow, letting users have the choice and flexibility. Tools agree on the development process, so that users don’t feel that by choosing a tool, they are committing to a workflow or philosophy that they don’t understand yet, and won’t be able to change later. And users don’t feel pressure to make a choice, so having multiple options isn’t a problem. Just pick the one someone told you about, and change later if you want to - no big deal, no crisis. There will probably always be one tool that’s “trendy” and most people will use, but that’s just like every area of computing (heck, Python itself is the “trendy choice” out of a vast range of options!)
And the tool landscape looks very different. There’s no virtual environments or installers. These are low-level implementation details. There are no “script runners” - you run a script with Python. Most people never use any sort of tool unless they want to. Developing applications and libraries is still a complex task, but there’s a well-understood approach that works, so people won’t be asking “but what about my use case?” And tools exist to help with that approach, not to define, or control, the workflow. Build backends aren’t a decision the developer makes, they are chosen based on what you are trying to do. And they are easy to change - if you need to add some Rust code, switch to a backend that supports Rust. Nothing else needs to change.
But 10 years isn’t anything like as long a time as people seem to think. There will still be people with massive monorepos, with complicated arrangements of virtual environments, hard-coded dependency installation, custom build backends and all sorts. Heck, there will probably still be people maintaining a private copy of distutils, “because it works for me”. And the packaging community will have to support these people. We can’t wish everyone onto the new perfection. Expecting people to rewrite the infrastructure for a million-line project just because it’s the new “best practice” isn’t justifiable. So there will still be “lots of tools”. The best we can expect is that people who can work with new approaches can just get on with their jobs and basically forget about “packaging” and “workflow” and “what tool is best”. Unfortunately, there will still be a lot of legacy information on the internet, and thanks to those people who won’t or can’t change their workflow, it will look like it’s “current”. We can’t do anything about that, other than try to make sure that (a) the official documentation is clear enough, and covers enough use cases, that people who read it don’t need internet searches, and (b) make as much as possible “just work” before the user needs to go looking for advice on the internet.
On the other hand, in some ways 10 years is a long time. Expecting to know what will be the “best approach” 10 years from now is probably pretty naïve (or maybe arrogant…) And expecting to get there without any false starts, experiments, or abandoned approaches along the way is foolish. So while “fewer confusing alternatives” is fine as a goal, it’s a very bad way to approach the journey. We have to try things out to see if they work. And yes, that might even mean implementing standards that get superseded. That’s how we learn.
Wherever the debate about strings vs comments goes, just please don’t require it – ideally don’t even allow it – to be in the docstring. That’s already a runtime visible value with it’s own utility. I’m using it and I bet other people are too.
I think embedded comments simply introduce fewer unnecessary questions into the spec. Being able to get the most precision in the least number of requirements is good.
As the agent of chaos who tried to push “project” over “script”, and may have taken things a bit OT…
A significant part of what I was trying to get at was that PEP 722 intends to support building a runtime environment and then invoking the original python file either where it is on disk or in a temporary location which was not built into that environment. By contrast, PEP 723 allows for things like console entry points. Maybe I’m mistaken, but that sounds like it’s almost categorically impossible to make sense of and support unless the file is not executed directly but is instead somehow installed. Maybe it requires some dynamic sdist build – but now it’s not a script because the invocation pattern won’t be python [options] file.py.
I like and dislike things about both PEPs.
I like how easy PEP 722 is to read and write. It hardly requires any teaching in the typical case. Just learn a special header line and you’re off to the races. I dislike that in order to achieve that it has to introduce a new format.
I like that PEP 723 sticks with the existing standard format. Especially for more experienced and larger teams, this means “fewer divergent things”. Existing linters and tools to work on the toml data can support it with minimal adaptations. But I dislike the ambiguities it introduces, especially in existing packages with a pyproject.toml file, and I’m not sure that it’s a good idea to expose beginners to the format early on.
I wonder how many complaints about PEP 722 would disappear (and how many new ones would come out!) if it led outright as “a new format for simplified metadata, designed to be embeddable in comments”. It seems like the main objections to 722 – for me it’s the only thing I find imperfect – and the main driver for this alternative is the fact that it doesn’t use the “standard” format. But… It’s a draft standard itself. So 722 would make a new format a secondary standard format.
I come back to naming and describing these things accurately because I think/hope it can help identify what the key differences of opinion are.
There are a lot of nonstandard formats out there. setup.cfg, setup.py, poetry.lock, requirements.txt, tox.ini (it can hold other tool configs, remember), pipfile, …
These are what python users see and want consolidated a bit. But are we overfitting on that requirement? Do we need one format, or maybe two?
I’m getting dizzy getting to decide what I think so that I can convince you all that it’s right.
My last note on this topic for now is that looking at Rust for proof positive that “embedded toml is the right way” may be a mistake. Rust has a different audience from Python. The selection bias here towards packaging literate and advanced users is extreme. Remember that a portion of the target users will be seeing and using this data without reading a standard, caring about a standard, wanting standards, or generally being anything like the discussion participants. I think everyone here is aware of and sensitive to this difference, but notice how far that puts the python user base from the Rust user base.
a virtualenv, but I’ll hop on the bandwagon and agree that for the target users this is an implementation detail ↩︎
pyproject.toml is well specified, but I think we’re kidding ourselves if we think it’s beginner friendly ↩︎
Not really, just use tools that do what is desired and then based on usage we can come to a consensus on a possible standard. The build backend expansion happened and we made a choice to allow users and tools to experiment because there was basically only one way to do things.
If we are using standards to experiment with things that can already happen then I am an extremely hard -1 on us writing any more standards.
Offload the experimental features into plugins. That is probably what I would do.
As interesting as it is, I can’t help but feel like the discussion “what is a project?” is out of scope. As I mentioned earlier, if I understood PEP 723 correctly then it is possible to embed pyproject.toml into any Python file, even a single importable module in the middle of a library. Meaning the PEP allows embedding in a file that is probably not executable (not a script), a file that is not a full project of its own.
I am a bit worried about this. PEP 621’s [project] table has a specific purpose, is meant to be located in a specific location (a pyproject.toml file), and for specific kinds of projects (say: projects that are meant to be built as a wheel; in other words: packaging). And now we want to reuse [project] nearly as-is in possibly very different contexts without much caution.
PEP 621 says a tool has to take all metadata from [project] and place it in Core Metadata fields. Now it seems like tools such as pipx, pip-run, hatch, and so on will be free to pick whatever fields they want from PEP 723’s [project] table and do whatever they want with them.
Maybe there is no reason to be worried, but I can’t shake the feeling that it does not seem exactly right. Maybe the [project] specification (the one resulting from PEP 621) needs to be amended?
I do not have a solution to offer.
[Off-topic: Many times I have wished docutils was part of Python’s standard library. Too bad…]
Interestingly enough, I was just talking with someone today who asked whether it was desirable for linters or something to help making single-file scripts more isolated. I don’t think the PEPs should prescribe this, but tools could choose to support such a helper feature if they wanted to (e.g. symlink/copy the script to somewhere so that sys.path doesn’t pick up the local directory). But I think that’s a tools question as to whether they want to make the single-script portability an important use case.
Agreed. I’m personally ignoring it as I find it tangential to either PEP’s contents as it doesn’t change what conceptually the PEPs are each proposing (and I’m aware of the differences in scope between them).
It’s all a balancing act. The key point is you have to accept you may get it wrong. You can let tools experiment endlessly, but unless you’re willing to stop and choose something and be willing to get over your fear that it might not be perfect, you will end up with no standards in which case you end up with no interoperability and everyone doing everything differently because it’s all defined by the tools (and I do not want to go back to a convention-based world).
Another possibility is to define a new [run] table ala Projects that aren't meant to generate a wheel and `pyproject.toml` and that’s the only thing allowed in a script (i.e., really lean into the idea of this is replacing requirements.txt for the simple case and then scale up). And to be clear, I’m not trying to guide you or anyone else towards this, but this is an option that is sitting in the back of my head if [project] becomes the stumbling block while embedding TOML is not.
Related to this, an option is to also flat-out forbid [tool] tables and say if you want to go to that level of “production”, then please make a directory with a pyproject.toml. That would do away with the per-tool precedence question and also potentially simplifies explaining what the metadata is for and how it will be treated. I think this ties into the question/concern some people have expressed that folks are going to (ab)use this for way longer than they should before taking the time to create a directory and a separate pyproject.toml.
I would split out the off-topic “what’s a project” discussion, but it’s mixed in enough that I’m not clear how to do it. Given that “all of pyproject” essentially means “a project”, it’s hard to distinguish what’s actually meaningful to the PEP compared to what’s outside the immediate topic. If @ofek wants the conversation to be more focused, message me with how you want to split it.
I’m requesting people stick to the specifics of this PEP in this topic, and create a new topic if they feel they have things to say about projects.
I’d be concerned that we are normalising the idea of reading metadata directly from pyproject.toml, rather than reading it from the core metadata fields in an actual metadata file (PKG-INFO in a sdist, and METADATA in a wheel or installed project). The pyproject.toml file is by definition less reliable than those places, because fields can be dynamic in pyproject.toml and filled in later, for those other locations. I don’t have a specific issue here, just a general feeling that we’re taking a risk, and we should be cautious about assuming everything will be OK.
Even just looking at dependencies, tools can’t reliably get a project’s dependencies without invoking the build backend unless they are willing to reject any project that declares its dependencies as dynamic. And editable installs are explicitly allowed to inject additional dependencies even if the pyproject.toml states that the dependencies are static. How would a PEP 621 spec change address that?
The idea of making name and version optional would be quite problematic, unless it was tightly constrained. Many tools (for example, pip) rely on the idea that a package is uniquely identified by its name and version. If we combine making those fields optional with the idea of tools reading metadata from pyproject.toml, we could end up with tools that can’t tell if two projects are identical or not.
Basically, I think this would be quite a complex and risky PEP to write with sufficient precision to ensure we don’t cause problems because people misinterpret the spec, or read it in different, incompatible ways.
And I’m sorry to go on about this again, but this still seems to be motivated mostly by a sense of “it would be nice if we could…” and not by actual user requirements or use cases. This is one of my biggest reservations with PEP 723, and it sounds like you’re now simply proposing to push that problem a step further back, and apply it to the definition of pyproject.toml as well.
I’m not against amending PEP 621 if we need to. There’s an ongoing discussion in Projects that aren't meant to generate a wheel and `pyproject.toml`, which may well result in a proposal for a change to that spec. But that discussion needs to run its course and get some sort of consensus, and then someone needs to write a PEP proposing the agreed changes to the spec. If PEP 723 relies on modifications to PEP 621, then I don’t see how we can reasonably call PEP 723 ready for approval before that happens. And conversely, if it allows embedding of something that looks like pyproject.toml, but to which different rules apply, it’s both misleading and harmful to claim it’s proposing an “embedded pyproject.toml”
In the sense of further damaging the packaging community’s credibility over “complicated and confusing rules” and “too many similar but different ways of doing things”. ↩︎
This is very close to a TOML-based variant of PEP 722, with run.dependencies as the dependency block data, and all other sub-keys of run as “for future expansion”. I’d support exploring this as a combined version of 722/723, if we could address our other differences of opinion over format.
Is there a real need for the PEP to specify the format in terms of a regex instead of simply saying something to the effect of tomllib.loads(__pyproject__) being equivalent to tomllib.load(open("pyproject.toml"))? It seems unnecessarily strict to ask the PEP to produce an airtight specification that third party tools can read with minimal effort. If a tool cannot deal with e.g. __pyproject__ in a docstring, let that be a limitation of the tool.
I love the simplicity of PEP 722. I love the structured data approach of PEP 723.
Combining both like this would be such a simple thing for us to support both in Pants and PEX.
I don’t have many thoughts on where it goes. The back ticks in a comment approach seems the easiest middle-ground to support for us. I’d hope that the spec wasn’t too prescriptive on if we had to use a regex parser, because we already build on top of a Rust-based tree-sitter parser.
So, I really do think this is the right middle ground and grabs the best of each, while ALSO solving many of the cons in each
The new [run] table approach would preclude, for example, the possibility of any standard for building distributions from single files since any backend defined in [build-system] mostly depends on [project] in order to write core metadata appropriately.
I am okay with that situation if we continue to allow the [tool] table for extra functionality. If we are okay with that then I am comfortable adjusting the PEP or collaborating with Paul for a new one.