On top of the virtual environment requirement that was pointed out, this also assumes pip is installed in that environment. I could very easily see this being used with a virtual environment that lacks pip, in order to speed up virtual environment creation and save on disk space (you can do the install externally via --target after the environment is created).
That assumes venv has pip available at creation time. I think you definitely should be creating virtual environments as part of this, but I don’t think they need to be made into a single step.
I don’t think it’s that bad (assuming this untested code works):
import tokenize

with tokenize.open(py_file) as file:
    requirements = None
    for line in file:
        # Start collecting once the header comment is seen.
        if requirements is None and line.strip() == "# Requirements:":
            requirements = []
        elif requirements is not None:
            if line.startswith("#"):
                # Keep non-empty comment lines as requirement specifiers.
                if comment := line.removeprefix("#").strip():
                    requirements.append(comment)
                continue
            # The first non-comment line ends the block.
            break
I think that’s a great point! If you’ve muddled your code in that way then you can simply fix it, since it’s a single script that you can look at and that is directly under your control. This isn’t going to be buried in some 3rd-party package that’s accidentally causing you issues with this.
Ah, okay, it wasn’t clear to me that you were only focused on running scripts “in situ”. Although later you say something that seems a bit different, so I’ll respond to that with another way I sometimes handle this stuff.
For what it’s worth, yeah, I do the same kind of thing and I also find it annoying to deal with, so it would be cool to improve the situation.
Yeah, I certainly do that. Although there’s a bit of a catch-22: because doing that is often so cumbersome (for the reasons you’re trying to address with this PEP), I’ve become once burned, twice shy, and more often create directories just to be on the safe side!
However, in your description of how you use these scripts, I see “copying to a different PC” as really a form of distribution, because if you do that you’re transplanting the script into a different environment. Of course, the goal is to insulate the script from that environment by running it in some kind of virtual environment, but intrinsically things are going to be different on a different system (most obviously, it could have a different version of Python). So to me it seems like a possible happy medium to say that there may be some kind of minimal install process, but that process may effectively just be a short way to type a series of venv-creation commands. But crucially that install process would be totally “local”, in the sense that it wouldn’t be installing anything globally or even into a shared venv; it would only be working with the files you tell it to work with and with an environment (created by the install itself) just for that.
So let me describe how I handle a similar situation, without the “copy to another computer” part, and then handwave a bit about how maybe something like that could be extended to a barebones script “installer”.
I use conda. I have an everyday “kitchen sink” conda environment where I have installed a bunch of libraries I frequently use. But that environment isn’t activated systemwide, so if I write a standalone script, or just have some fragments of stuff that aren’t even worthy of being called a script yet, and I want to run it (often from an editor), I still need to somehow specify that it be run in that environment. And sometimes I’m working on something that needs its own environment (e.g., because for whatever reason it needs a different version of something than what I have in the kitchen sink).
So what I did is I wrote a little glue tool that looks for a line that starts with # conda-env: and expects the name of a conda environment after this. And then I can run this tool on my script and it will figure out the env and then use conda run to run the script in that env.
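Stripped down, the glue tool is essentially just this (a rough sketch rather than my exact code, and untested in this form):

import subprocess
import sys

def run_in_conda_env(script_path):
    # Look for a "# conda-env: NAME" line and run the script in that env
    # via "conda run". (Sketch only; my real tool is stricter about where
    # the line may appear.)
    with open(script_path, encoding="utf-8") as f:
        for line in f:
            if line.startswith("# conda-env:"):
                env_name = line.removeprefix("# conda-env:").strip()
                break
        else:
            sys.exit(f"{script_path}: no '# conda-env:' line found")
    subprocess.run(["conda", "run", "-n", env_name, "python", script_path], check=True)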
Now there are some disadvantages to this relative to something along the lines of the proposed PEP:
there’s nothing that actually declares or knows the dependencies, they’re just assumed to be in the specified environment
the script depends on the environment having a particular name
However there are some advantages too:
the script only has to be “embellished” with a single line referring to the environment, rather than a full list of dependencies
because it’s just one line, it’s easier to be strict about where it can occur[1]
Being small and strict drastically reduces the danger of unintended consequences of the type I mentioned in my earlier posts (e.g., with a # inside a string literal). That danger can be reduced further by having a longer and less collision-prone prefix (which is more tolerable here since you only have to type it once).
Now, what I’m thinking is (and this is just pie in the sky dreaming here, I haven’t tried any of this), what if something like this were combined with a “barebones” dependency file similar to requirements.txt or the format proposed in your PEP? So the dependencies would be in a separate file, but they’d still be in a simple-as-possible text format, and then instead of listing the dependencies in the script, we just have the script reference this external dependency file. So it’d be something like this:
# this is script_requirements.txt
great_lib
useful_lib
and
# this is script.py
# script-deps: script_requirements.txt
import great_lib
import useful_lib
def handy_function():
    print("And so on...")
Then you could run some tool like pipx a la pipx --read-env script.py and it would set up the environment and/or run the script in the environment if it was already set up. Copying the script to another computer would then require copying two files, not one, but you could put them anywhere, as long as they remained side by side and didn’t have a name clash with other stuff in the same directory.
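The environment-provisioning side is where all the real work is, but the “link the script to its dependency file” part could be as small as something like this (a rough sketch; the comment marker and behaviour are just what I imagined above):

import pathlib

def read_script_deps(script_path):
    # Find a "# script-deps: <file>" comment and return the requirements
    # listed in that file (one per line, ignoring blanks and comments).
    script = pathlib.Path(script_path)
    for line in script.read_text(encoding="utf-8").splitlines():
        if line.startswith("# script-deps:"):
            deps_file = script.parent / line.removeprefix("# script-deps:").strip()
            return [
                stripped
                for raw in deps_file.read_text(encoding="utf-8").splitlines()
                if (stripped := raw.strip()) and not stripped.startswith("#")
            ]
    return []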
There are some usability questions that opinions might differ on:
How important is it to strictly have to copy just one file (the script file)? For me, this usually isn’t so critical; what’s more critical is not having to carefully maintain some particular directory structure, so “copy two files and keep them together” would probably be okay.
How important is it to be able to run the file directly with python script.py, rather than something like python -m scriptrunner script.py (where “scriptrunner” is a pipx-like tool that handles the environment matters)? It seems like any system like this has to rely on something already being installed, because the dependency info is useless without some tool to provision the environment. So it’s just a matter of having those required tools also be able to link up the script with its dependency file.
At this point, how much is gained by keeping it as a stripped-down text file vs. something like pyproject.toml? With tomllib in the stdlib now, it seems like the main problem with using pyproject.toml is it has to be named pyproject.toml, which means you can’t have more than one in the same directory. But if it could just be an arbitrarily-named TOML file, then it doesn’t seem that much more verbose to use that, and it might be a little more robust (see the tomllib sketch after this list).
There’s still a coupling of two files which could get out of sync (e.g., if you rename the dependency file you have to fix it in the script). My own feeling is that this is sort of unavoidable, as trying to actually pack the dependency info directly into the script file just seems too messy to me. The next simplest thing is to have one file that knows about one other file, but impose no additional structure (e.g., directories).
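To make the tomllib point concrete, reading an arbitrarily-named TOML dependency file really is only a couple of lines (the file name and key here are made up for illustration):

import tomllib

# Hypothetical arbitrarily-named TOML file sitting next to the script.
with open("script_requirements.toml", "rb") as f:
    requirements = tomllib.load(f)["dependencies"]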
Like I say, this is just handwaving. But my goal here is to try to probe a bit at which desiderata are most essential and which can be flexed a bit.
[1] the way I have it set up, it actually requires that the first character of the line be # and that it be preceded (if at all) only by lines whose first character is also #
The PEP example script uses a ‘#!/usr/bin/env python’ shebang. Isn’t this inconsistent with the idea that a script specifying its requirements this way should be invoked with ‘pip-run’ or an equivalent tool?
No good reason, I could change it to that. It may even be what pipx does already, now that I think about it.
Nowadays, the --python option is better than --target for this.
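For concreteness, something along these lines (assumes pip 22.3+ for --python; the interpreter path shown is POSIX-style; requests is just an example package):

import subprocess
import sys
import venv

# Create a pip-less venv (quicker to make, smaller on disk), then install
# into it using the outer interpreter's pip pointed at the new environment.
venv.create(".venv", with_pip=False)
subprocess.run(
    [sys.executable, "-m", "pip", "install",
     "--python", ".venv/bin/python", "requests"],
    check=True,
)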
Cool. So how do you ensure the dependencies of these scripts are available? Do you use a shared environment and dump all the dependencies in there? What if there are conflicts? That’s the problem I’m looking at here.
That’s just an example, which can be fixed. I’ll probably just remove the shebang to avoid confusion.
I’m not trying to solve the “how do I run python scripts” problem here. That has much bigger challenges, especially on Windows, which is what I care about. This PEP is purely about “can we standardise the format for specifying dependencies that pipx and pip-run use?”
I’d prefer if it also allowed the requirements to be listed in the module doc-string.
Separate files and directories are just unrealistic in many cases. The real alternatives in practice are then being restricted to the standard library or mystery dependencies. So the comment / doc-string hopefully is already there anyway.
I hope a tool can be bundled with Python that supports this, or an already bundled tool can add support for this. (If not pip, or py launcher, maybe venv: py -m venv -r script.py?) That can be done later of course.
I think all that does (compared to my approach in pipx) is handle encoding detection. You can make it work using tokenize.generate_tokens, but it’s not entirely trivial (you need to take care to ensure you ignore trailing comments, for example). The PEP probably does need to discuss encodings, though - if only to say that tools should detect the encoding the same way Python does.
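For the record, a rough sketch of the generate_tokens approach, including the trailing-comment wrinkle (untested, and not what pipx actually does):

import io
import tokenize

def whole_line_comments(source: str):
    # Yield comments that stand alone on their line, skipping comments that
    # trail code (anything non-blank before the comment disqualifies it).
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT and not tok.line[: tok.start[1]].strip():
            yield tok.string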
The bigger question for me is whether we want to require tools to implement a full Python tokenizer. One downside (which I mention in the PEP) is that it means that the spec is tied to the Python version. While it’s unlikely, it is possible that Python 3.13 could add new syntax that meant the 3.12 tokenizer wouldn’t be able to process 3.13 source. I’m pretty sure I’d count that as a deal-breaker for pipx, for example, because pipx has the ability to run different Python versions, and it’s perfectly conceivable for someone to run python3.10 -m pipx run --python 3.13 myscript.py…
And suppose Python 3.13 added multi-line comments of the form /*...*/. Would the spec need to change to cover whether these comments could be used for requirements? Or whether a /*...*/ could “comment out” a dependency block?
For all its limitations, I think that ignoring the Python syntax and treating the file as simple text when parsing for a dependency block has advantages.
There’s also a question of who will write and read this data? The PEP assumes that it is written by the developer (a human) and consumed by a utility (i.e., code). This matches the expected usage in pipx. Based on this assumption, it’s more acceptable to take the view that as long as we’re precise in describing how the data will be read and processed, the developer can write the data in a way that doesn’t trigger any of the edge cases we’re discussing here. But this approach needs a lot more care on the part of tools writing the data, as they have to handle all the edge cases intelligently.
It makes sense to me that this gets added to the “Rationale” section of the PEP. But if I’m going to do that I want to be sure that there’s consensus that it’s a reasonable design decision to take. Specifically, that cases like the one @BrenBarn noted, where something that looks like a dependency block inside a multi-line string gets processed, are something we can legitimately address by just saying “don’t do that, then”.
Which reminds me:
Apart from being more verbose (which you mention), the other reason I dislike this is that you have to parse the whole source file - there’s nothing to indicate that you’ve reached the end of the dependencies except EOF. While I don’t expect people to be writing megabytes-long source files and calling them “simple scripts”, it’s not impossible (imagine a chunk of embedded data, like in get-pip.py). Script startup time is often quoted as a problem for Python, so let’s not make it worse than we have to.
This seems like a better solution to me. It’s not that different from what you sometimes find in Jupyter notebooks which come with the requisite %pip install ... magic at the top.
The installation could even be made conditional on some sort of __pipx__ sentinel, say.
Well, you could have the following “magic sentinel”
#!/usr/bin/env pip-run --
# Requirements:
# requests
... your script here
That doesn’t require anything that isn’t available today, it just needs pip-run available on PATH. It relies on the fact that pip-run already implements the syntax defined in this PEP, plus an optional convention to write the first few lines of the script in this way.
I prefer pipx, because it caches environments (and hence improves script startup time), but its interface isn’t as clean (yet). YMMV - the point here is that we don’t need anything new here, just agreement on stuff that already exists.
Perhaps something akin to # Script Requirements: (# Script Requires:, # Embedded Requirements:, etc) could help here – the additional word lowers the likelihood that a rogue comment matches the format, and provides extra context for a google search.
As an illustrative example, there are currently no recorded uses of # Script Requirements: on grep.app, and only about 10,000 search results for the query.
I honestly have no vested interest in any particular header text, so I’ll just wait for consensus here. But there may be a compatibility issue with any change, as Requirements: is currently implemented by pip-run (the pipx support isn’t released, so I don’t count that). So a different header would need some form of transition.
I’ve no idea how much use this feature of pip-run has. Maybe @jaraco has a view?
I like this idea, and given that tools are already implementing some form of it, I think it’s worth standardising.
As for the multi-line strings question, I don’t think it’s a showstopper. I could live with either of two compromises:
Tools have to know enough Python syntax to look only at comments. This makes non-Python implementations harder, but they can run a bit of Python helper code to do it, or their authors can write a Python tokenizer in another language. It’s a limitation, but not the end of the world.
Alternatively, the requirements block must come before any code, so you stop looking for it at the first non-empty, non-comment line. Simpler to implement tools for, more limiting for the user.
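The second compromise in particular is easy to implement; something along these lines would do it (a sketch, not proposed normative wording):

def leading_comments(source: str) -> list[str]:
    # Collect comment lines from the top of the file, stopping at the first
    # line that is neither blank nor a comment; the dependency block would
    # have to live inside this prefix.
    comments = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            comments.append(stripped)
        elif stripped:
            break
    return comments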
PEP 508 does not allow local directories or files as dependency specifiers.
I think it actually might allow local files. It allows URLs (as in pkg @ https://...), and as far as I can see, it doesn’t specify what schemes are allowed, so I think foo @ file:///home/takluyver/foo-0.1-py3-none-any.whl is valid according to the spec. I haven’t checked what implementations actually do with this.
I agree. My initial reaction was a bit too panicky. I actually think we should apply the “consenting adults” principle here. The rules on how the dependency block is identified and parsed must be clear and unambiguous (I believe they are, but we can fix them if not). But they do not have to prevent users from putting them in dumb places (such as in multi-line strings).
If a developer wants to put something that looks like a dependency block (but isn’t) in a script’s docstring, they have to put the actual dependency block before the docstring, so it gets recognised first. And if they don’t have dependencies, but want to use a tool that looks for a dependency block, they can put in an empty dependency block.
If a developer wants to put a real dependency block in a multi-line string, fine. Let them do it. What’s the harm?
Rules don’t stop people doing stupid things[1]. Common sense (and other people) stops people doing stupid things.
I don’t like the PEP, as I said above, but assuming that it is going to be done, why not borrow the same syntax as encoding comments? Namely # -*- requirements: numpy, scipy, pandas, rich -*-.
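Purely to illustrate, such a line could be picked out with a small regex along these lines (a sketch, not a worked-out proposal):

import re

# Illustrative only: match an encoding-comment-style requirements line like
# "# -*- requirements: numpy, scipy -*-" and split it into specifiers.
REQS_RE = re.compile(r"^#\s*-\*-\s*requirements:\s*(?P<reqs>.*?)\s*-\*-\s*$")

def parse_requirements_comment(line):
    m = REQS_RE.match(line.strip())
    return [r.strip() for r in m.group("reqs").split(",")] if m else None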
I was over-simplifying. It may do, but the usability sucks (no relative paths, and don’t get me started on Windows drive letters). My point here was that tools may reasonably want to allow extended forms of dependency specifier, and I don’t want to disallow that, even though it’s not something I want to try to define in the PEP. The obvious extension is “anything that pip can use as a requirement” (and that’s what pipx and, I believe, pip-run allow).
These ideas are addressed in the PEP (“Why not include other metadata?” and “Why not make the dependencies visible at runtime?”). I imagine you’ll disagree with those sections as well, of course. And it’s perfectly fine for you to disagree - this isn’t going to be something everyone will like. But if you have suggestions or questions that aren’t already covered in the PEP, please do ask them - I’ll be updating the PEP based on the feedback here, so I want to make sure I hear everyone’s views.
(The idea of an encoding comment style is one I’ll think about. I don’t like it, but I need to articulate why if I want to represent it fairly in the PEP).