Your reply suggests that I didn’t quite get my point across, so let me try an analogy.
Consider Python itself, as a language. If we assess its complexity, it is huge: dozens of statements, complex semantic features like async, hundreds of stdlib modules. Nobody can even know all of it. Hairy stuff, right? Beginners don’t need most of that. An absolute beginner to Python just needs a few simple statements (assignment, if, while), a few simple types, a few basic functions, and perhaps some turtle graphics. So how do so many absolute programming beginners manage to use Python? The answer is that Python does a good job of making simple things simple and hiding complexity from beginners’ eyes.
The packaging tools we have aren’t really like that, famously. They’re more like Java, where you must create a class before you can even print “Hello, World” to the console.
PEP 722 makes an argument, which you repeat above and which is valid, that this is too much complexity for beginners to handle. As a remedy, it proposes adding a second class of tools which would be more like Logo. Only the essentials, only for beginners, no complexity.
That’s a valid proposal, but what I’m saying is that I would rather have the Java-like tools transformed into more Python-like tools that beginners find themselves comfortable with, without creating a split between two types of tools and two metadata minilanguages.
I think it’s horribly misinterpreted by many people. But that’s off-topic for this discussion, so I won’t say more here.
For running scripts in particular, because running a script is a core Python function. And the core devs have explicitly stated that they want packaging to be outside the core. So script running, while it may be something packaging tools offer, is not exclusive to such tools.
Running a script with its dependencies will interact with packaging, but it will still be a core Python function, and therefore will (should) be part of the core feature set. Here, I’m considering the py launcher as core, because I expect that in 10 years it may well be shipped with Python for both Windows and Unix.
Project environments are persistent, and a core and visible part of the environment. They are typically managed explicitly by the developer (in terms of their content and use). They are often stored in the project directory (by choice, not as an implementation detail).
When running a script, an environment[1] may be needed, but it’s hidden, managed transparently, and transient. It may be cached for performance reasons, but if so, management of that cache (expiry of environments, in particular) should be transparent.
Those two sets of requirements are utterly different, and trying to force them to work the same will mean trade-offs to the detriment of both cases.
Not for beginners. For people who write Python scripts as part of a role that is not that of a Python developer. These are not beginners, they are often highly experienced and senior developers or admins, who simply view Python as a tool for automating important tasks within a single environment.
Honestly, I wish we would stop characterising users as “beginners”. Not everyone who uses Python is on a path that goes from being unfamiliar with more than the basics, through familiarity with the language and its ecosystem, to expert level understanding. People can be experts, but simply not consider parts of the Python ecosystem (for example packaging) to be a priority to them.
Would anyone consider Guido to be a “beginner” in Python? And yet, I’m pretty sure he has very little knowledge or interest in the packaging ecosystem. Would it make my argument stronger if I used him as the example of “someone who has no interest in, or patience for, the complexities of pyproject.toml”?
[1] In the broadest sense - it may not need to be a virtualenv, for example.
Not to speak for James or anything, but I think you got the point across fine the first time - this is just a disagreement.
No, I don’t think this is the argument at all.
First off, workflows like this are not necessarily “for beginners”, and thinking that way is problematic - in the same way that thinking about questions about fundamental tasks as “too easy” is problematic for Q&A sites, documentation writers etc.
But more to the point: what is proposed is not a new class of tools (script runners exist already, and this only advocates a format without saying anything about who will actually use that format), and it is also not an alternative to the packaging process. It serves a different need.
Single-file scripts are unquestionably applications rather than libraries in almost every case. People who write this stuff aren’t expecting to share it with friends, so that their friends can import it. The goal, realistically, is to run it.
Making it possible to run the script is not “packaging”, so it doesn’t make sense to judge the tools for doing so by the standards of the existing packaging framework, or consider the impact on the “ecosystem”. When people set up a script for others to run this way, it’s not just that there is no intent to mark files as belonging together, no intent to put the code in a centralized “index” for distribution, no intent to prepare a “wheel” or interface with code in other languages etc. etc.
It’s that there is no intention for others to “install” the script.
A script distributed this way should be able to just sit anywhere it likes - not put any representation of itself into site-packages. Someone who already happens to have a Python environment set up that includes the necessary dependencies should be able to just use /path/to/appropriate/python my_friends_script.py and go. People who don’t want to learn about any tools can read the comments, pip install whatever to whichever environment, and proceed as above. People who want runtime isolation (not build isolation!) can do fancy_scriptrunner my_friends_script.py, and let it parse the comments automatically.
But in no case is the script “installed”, therefore it was not “packaged”. Even when a script runner creates a virtual environment (which is likely temporary anyway) to house dependencies, the script itself is not moved or copied; and no actions are taken for the purpose of “advertising” the script to the OS, other programs etc.
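To make that concrete, here’s a minimal sketch of what such a script might look like (the file name and dependencies are made up, and the “## Script Dependencies:” form is simply the one under discussion below):

# my_friends_script.py - a hypothetical single-file script
#
# Without any special tooling, a reader can follow the comments:
#     python -m pip install click requests
#     python my_friends_script.py
#
## Script Dependencies:
##     click
##     requests

import click
import requests


@click.command()
def main():
    """Report the HTTP status of a well-known site."""
    click.echo(requests.get("https://example.com").status_code)


if __name__ == "__main__":
    main()

Either way the script just runs from wherever it happens to live; nothing about it is installed anywhere.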
I agree that this is not a feature targeted solely at beginners. I’ve been around the block a few times. I have a number of packages on PyPI. I understand packaging. And I would absolutely use the feature that PEP 722 is documenting.
Overall +1 from me for the “## Script dependencies:” version of the proposal (and I agree with the rationale for not trying to embed TOML or require Python syntax parsing, hence I’d be -1 on the 723 alternative).
Some functional suggestions for the PEP:
it should explicitly make the block header lines case insensitive. Otherwise “is the ‘d’ capitalised?” will be a recurring source of entirely pointless embedded metadata parsing failures. Precedent exists in the normalisation rules for dependency specifiers: https://peps.python.org/pep-0440/#case-sensitivity (Tangent: Package name normalization — Python Packaging User Guide should reference PEP 440 as the origin of the formalisation of the normalisation rules)
consider allowing comments and clarifying the spec for “blank” lines in the metadata blocks (i.e. still requiring the “##” prefix to keep the metadata block going, but ignoring “#” characters and anything after them, and then amending the blank line handling to cover discarding all lines with no non-comment text on them) - a sketch covering both suggestions follows this list
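A rough sketch of how a tool might implement both suggestions (purely illustrative - the regex and function are my own assumptions, not anything the PEP specifies):

import re

# Case-insensitive block header, per the first suggestion.
HEADER = re.compile(r"^##\s*script\s+dependencies\s*:", re.IGNORECASE)


def parse_dependency_block(lines):
    """Collect requirements from a '## Script Dependencies:' block,
    ignoring '#' comments and comment-only lines (second suggestion)."""
    deps = []
    in_block = False
    for line in lines:
        if not in_block:
            in_block = bool(HEADER.match(line))
            continue
        if not line.startswith("##"):
            break  # the block ends at the first non-'##' line
        # NOTE: naive comment stripping like this mangles requirements that
        # legitimately contain '#' (e.g. 'name @ url#fragment'), a problem
        # raised further down the thread.
        text = line[2:].split("#", 1)[0].strip()
        if text:  # skip blank and comment-only lines
            deps.append(text)
    return deps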
Editorially, I think the PEP would benefit from including the following:
rationale for preferring a combined “line prefix with block header” over a pure “line prefix” approach (the main benefits I see myself: less visually noisy overall since the line prefix can be much shorter, and it encourages listing all the dependencies in one place rather than scattered throughout the file)
a summary of the benefits of declarative metadata over runtime package installation APIs (i.e. metadata allows more robust environment caching based on either script hashes or file location details, while runtime config typically incurs more environment setup overhead on every run since any caching becomes the responsibility of the defined runtime API rather than being checked once in the script runner) - see the sketch following this list
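To make the caching point concrete, here is a minimal sketch of hash-based environment caching (the cache location and key scheme are assumptions for illustration, not something the PEP specifies):

import hashlib
import subprocess
import sys
import venv
from pathlib import Path

CACHE_DIR = Path.home() / ".cache" / "hypothetical-script-runner"


def cached_env_for(dependencies):
    """Create (or reuse) a virtual environment keyed on the dependency list."""
    key = hashlib.sha256("\n".join(sorted(dependencies)).encode()).hexdigest()[:16]
    env_dir = CACHE_DIR / key
    if not env_dir.exists():
        venv.create(env_dir, with_pip=True)
        bin_dir = "Scripts" if sys.platform == "win32" else "bin"
        python = env_dir / bin_dir / "python"
        subprocess.check_call([str(python), "-m", "pip", "install", *dependencies])
    return env_dir

Because the declared dependencies are known before anything runs, an up-to-date cached environment can be reused with no install step at all.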
Note: I read the first 60 or so replies to this thread as well as the start of the split out PEP 723 thread (and Paul’s reply to that here), so it’s possible I missed folks already making the above comments. There’s a lot of off-topic discussion to skim over though, so I won’t get to that until I re-read the threads on a full computer rather than my phone.
This has already been mentioned a few times, but I think the survey answer preference for “serving common use cases” is an important one here. The survey’s phrasing unsurprisingly restricted itself to specifically talking about “packaging”, but I think that some of the frustration with the current packaging landscape is specifically the use case that is addressed with this PEP: that the currently available tooling forces users to go down the “packaging” route even if they don’t actually want to package anything, which then exposes them to the somewhat complex landscape.
By addressing this use case in a way that avoids the actual packaging part, many of the other survey outcomes become mostly irrelevant because they no longer apply. And this is really true regardless of the experience level of the users.
Here I’m more wary (although perhaps there is some standard policy that makes my objection moot). Because of my position that this PEP is at-most-tangential to packaging concerns, I would prefer not to have PyPA directly involved, or house the document under that domain.
I’m not convinced this is especially necessary, given that the metadata blocks are already embedded in block comments. I would rather see the PEP explicitly allow (and require support for) multiple requirements blocks that are functionally concatenated; then one could simply intersperse regular comments if necessary. (And really: is “reading more of the file than necessary” a non-trivial concern, for a tool that is likely to spend time downloading and installing packages?)
I don’t think you overlooked anything relevant to the points you’re making, but I’m trusting my own memory on that.
## Script Dependencies: # for pip-run
## click
## requests>=2.26.0 # we need brotli support https://github.com/psf/requests/blob/main/HISTORY.md#2260-2021-07-13
## rich
##
## # for the data science part
## numpy
## matplotlib
#
## X-Python-Version: 3.11 # second (non-standard) metadata block starts here
Maybe not strictly needed, but perhaps more convenient than the alternative:
# For pip-run:
# requests: we need brotli support https://github.com/psf/requests/blob/main/HISTORY.md#2260-2021-07-13
# numpy+matplotlib for the data science part
## Script Dependencies:
## click
## requests>=2.26.0
## rich
## numpy
## matplotlib
#
# second (non-standard) metadata block starts here:
## X-Python-Version: 3.11
I was going to say that I didn’t see the need, because dependency lists aren’t likely (in my experience) to be long enough to warrant the extra complexity. But your example is a good one. I’m still not 100% convinced it’s worth the extra complexity, but I’ll consider it. If others would find this useful, please speak up! If there’s sufficient support for comments, I’ll add them, but the default will be to leave the spec as it stands in this regard.
I’m torn on this. Your arguments are reasonable, but it opens up questions like “is a space allowed between the header name and the colon? What about multiple spaces, or tabs, between the two words?” And I don’t want to make things that complex. For me, parsing simplicity is key, and I don’t think there’s much likelihood of confusion (frankly, the “D” is probably the only place where people might get it wrong). One other possibility is to just make the key a single word - “Dependencies:”.
Again, other opinions would be welcome. But (and I wish I didn’t have to say this, but I think I do) can people who feel the need to use this question as a chance to argue that this “proves” that PEP 722 is not as simple as it claims and therefore TOML is better, please refrain from trying to make that point again. It’s been hard enough to get feedback on the technical details of this PEP, and at this point, reiterating arguments that are already addressed in the PEP is feeling awfully like deliberate sabotage of the process.
It’s the normal process to migrate specifications to the packaging guide once they are approved, and I fully intend to do that (thanks for the suggestion of a precise location, it saves me having to think about it). But because of that, I’m not sure why people think it’s necessary to explicitly mention it in the PEP. Is it a case of having to make it clear because the process isn’t followed often enough?
I’ll add a brief comment to the PEP noting that the final form of the spec will reside in the packaging guide, just because it’s easy enough to add and clearly some people would feel better if it were there (you’re not the only person to have made this suggestion).
Will do. Your list of benefits is basically my reasons for preferring this form, so at least the intent was clear, even if I hadn’t added it to the PEP.
I’m not convinced by this one, as I think it gets too deep into implementation details and tool UX. This PEP doesn’t prevent a runtime installation API, and in fact it’s totally neutral on that matter. A suitably crafted runtime API can be perfectly fine (take, for example, the nox session API), but it’s not the approach that this PEP is addressing. Not least, because an API is by definition tool-specific, and so is not appropriate for standardisation.
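For contrast, the kind of runtime installation approach being referred to might look something like this crude sketch (illustrative only; nothing like it is proposed or standardised here):

# The script installs its own dependencies into the current environment
# every time it runs, rather than declaring them for a runner to handle.
import subprocess
import sys

subprocess.check_call(
    [sys.executable, "-m", "pip", "install", "click", "requests>=2.26.0"]
)

import click  # noqa: E402 - imports necessarily come after the install call
import requests  # noqa: E402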
Yeah, I concede that “reading more than necessary” is probably an unnecessary attempt at optimisation. But there are other reasons for not allowing multiple blocks which get merged:
User readability. It would be way too easy to have an extra 1-line dependency block in the middle of a script and not notice it. This could even be considered a security risk (although I’m pretty sceptical of the whole “security” aspect when we’re talking about running some bit of code someone sent you).
Lack of any good use cases for it. No-one’s requested this so far in the discussion, and neither pipx nor (to my knowledge) pip-run supports it.
Again, simplicity of parsing and processing. Maybe I’m going too far on this point, but I really don’t see any reason to make it harder to parse the data just to handle something that “might be useful”.
I can expand on that point in the PEP.
@brettcannon if you want to revise your schedule for when you make a decision based on the fact that these points have come up, I’m fine with that, but FWIW, I will commit to having these comments addressed before Monday 14th.
While it’s probably fine for the dependencies block use case, I’m not sure I like comments within the metadata block in the general case. # can have valid meanings other than indicating a comment, and splitting on it may not always make sense.
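For illustration, here’s a hypothetical requirement where the # is part of the URL rather than a comment (the package name, URL and fragment are all invented):

## Script Dependencies:
##     my-package @ https://example.com/downloads/my_package-1.0.zip#new-modules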
In this case, #new-modules may be taken as a comment and removed, which was not intended. I don’t have a specific use case where I need this, but I thought it was worth bringing up.
Thanks for writing this up, and for addressing a use case that has been mostly ignored in the Python ecosystem.
One thing I think is missing in the PEP is how it may affect, or be affected by, linters and code formatters. The use of ## as a block identifier may get flagged or auto-formatted by known linting tools. If you run flake8, for instance, on the example script in the PEP, it outputs:
$ flake8 script.py
script.py:3:1: E266 too many leading '#' for block comment
script.py:4:1: E266 too many leading '#' for block comment
script.py:5:1: E266 too many leading '#' for block comment
This may cause confusion: a tool could automatically reformat the comment block and insert a space after the ##, and the script would suddenly stop working as intended. This could happen because of a ruff autofix rule, an IDE’s format-on-save, or running a code formatter directly on the script. The good news is that black, currently the most used formatter, doesn’t mess up this case, but I’m not sure whether other formatters/IDEs/autofixing linters do. It might be worth mentioning this in the PEP, with a recommendation for tool authors and users on how to handle this case.
My requirements.txt files for larger projects are full of comments. Most often I’m commenting things out (they’re for testing, or maybe an alternate implementation), but occasionally notes like “# remove the next two dependencies when we finally drop Oracle support”. So I’d definitely like to see comments allowed.
I do think allowing comments would be an improvement.
Doesn’t that have to be specified either way? In fact it seems it already is. The PEP says:
The block type is all of the text on the initial line, up to the colon.
To me that means that any whitespace before the colon would be part of the block type. I can see this coming up in practice, since some people like to put whitespace in unexpected places (e.g., if foo == 2 :), and then some tool processing the script deps may fail because it’s looking for a block type of “Script Dependencies” and not “Script Dependencies ” (with a trailing space). So I think we can either ignore, parse, or disallow (i.e., error on) whitespace there, but leaving it up to individual tools seems a bit dubious to me.
Again, I think that has to be specified, but the good news is that it already is. That is, the PEP says the block type has to be Script Dependencies, and that has one space, so it has to have just one space. Again, if that’s not desired, I think it’d be better to make it explicit. An alternative is to use a hyphen or underscore, where there’s less scope for ambiguity.
I’m not so sure about that, for instance with the whitespace-before-colon case mentioned above. But in any case, “parsing simplicity” can play out in different ways. If this is going to be parsed with a regex, for instance, there’s not much additional complexity either way if the regex has \s* in a certain place or doesn’t.
Also, personally I’d rather think of it in terms of parsing simplicity for the user not the author of the parsing tool. In other words, what is to be avoided is people getting tangled in odd errors because they added a space here or there. One way to achieve that is to explicitly allow looseness in various places (e.g., whitespace before the colon), so that tools must be generous in what they accept. Another way is to explicitly disallow it, so that tools can raise a specific, informative error right away, which usually will still get the user past the problem quickly. But just leaving it up to the tool author to decide what slight variations to accept doesn’t seem best to me.
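As a concrete example of how small the difference is for the tool author, here are two assumed header regexes (not taken from the PEP), one strict and one lenient about whitespace before the colon:

import re

# Strict: exactly "Script Dependencies" immediately followed by the colon.
strict = re.compile(r"^##\s*Script Dependencies:\s*$")

# Lenient: optional whitespace before the colon, and case-insensitive.
lenient = re.compile(r"^##\s*Script\s+Dependencies\s*:\s*$", re.IGNORECASE)

print(bool(strict.match("## Script Dependencies:")))    # True
print(bool(strict.match("## Script Dependencies :")))   # False - rejected outright
print(bool(lenient.match("## script dependencies :")))  # True - accepted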
I agree that this is a good reason for requiring dependencies to be in a single block.
+1 again. I’d probably prefer writing in sentence case with a lowercase “d”, and there are people who prefer everything in lowercase. Let’s avoid the hassle of case sensitivity.
I am also +1 for allowing comments. Knowing why a dependency should be version N and not N+1 or N-1 is useful. Comments often clarify intent, and this applies to dependency files as well as “regular” code.
All of the examples posted so far are ones that I’ve also seen. Another common one is noting a transitive / diamond dependency conflict, where it’s not obvious why two dependencies wouldn’t be compatible.
That’s also a problem for the dependencies case, as name @ url is a valid PEP 508 requirement (see the second example here).
There’s clearly a lot of support for having comments, but it seems they aren’t as simple as they seem at first glance (specifically, @ncoghlan’s proposal as it stands is flawed because it doesn’t support the URL case).
And worse than that, there’s a different error (E265) that reports any comment that doesn’t start with precisely a hash and a space. As far as I can tell, this is motivated by a ridiculously strict reading of PEP 8. Worse still, it looks like ruff has this rule, too - thankfully off by default, but there’s no guarantee that will remain the case.
And while black doesn’t reformat ## comments (or #!), it does add a space in #%, #=, #[ and #] (and I suspect, anything other than space, # or ! after the first hash). So that blocks basically all of the “obvious” alternatives to ##.
There’s a lot to consider here, and it’s not obvious what the best solution is. One possibility is to revert to essentially the original proposal, which follows pipx and pip-run - a single hash, no comments, no internal blank lines:
# Script Dependencies:
# requests
# click
It’s limited, certainly. But it’s free of the flaws identified above, it’s proven in real world use, and it’s simple - which still remains a key goal for me.
There are other options, but I’m not particularly happy with any of them:
Picking delimiters until we get one that seems to work seems both risky (because of that “seems to”) and arbitrary.
Treating the flake8 issue as their problem, while it’s what I’d like to do, feels like it’ll just cause confusion and frustration for users.
There’s even the possibility of using TOML embedded in a specially formatted comment. But that still has all the problems I raise in the PEP about using TOML, and no-one has come up with a satisfying counter-argument to those so far (I’m discounting the people supporting “embedded pyproject.toml”, as they have mostly been simply saying they disagree, and not actually trying to persuade me to change PEP 722).
I need to think some more about this. I’d appreciate any feedback that people can provide, as long as it’s focused on just finding an alternative “embedded comment” syntax. People who prefer an embedded pyproject.toml have PEP 723, and people who want to debate PEP processes have a separate thread to pursue that discussion in. So it would make my life a lot easier if we could focus feedback on just this issue for now, until it’s resolved in a way that people interested in supporting PEP 722 are happy with.
When I’ve formulated my own thoughts, I’ll post them in a new thread (I think the “slow mode” rule of 8 hours between posts will make it too hard to discuss options, and hopefully by starting a new thread, we can keep things on topic without needing moderator intervention).
(Tone: slightly exasperated at these linting tools, none of which I use myself): would # # work?
Is this dependent on the indentation? Or how do the existing tools decide where the dependency list ends?
(I think TOML for anything other than “embedded pyproject.toml” - or whatever restricted variation thereupon - is a non-starter, the worst of all worlds.)