Projects that aren't meant to generate a wheel and `pyproject.toml`

To be fair, I’m making a lot of noise, and most other people are rather quiet. But yes, “lukewarm” probably covers that :slightly_smiling_face:

I’d like to see that, but I am aware it’s a bigger question than the one you originally asked[1].

No, not at all. But Brett’s (fairly reasonably) presenting “what VS Code chooses” as “what beginners will learn in future” so that gives it a bigger influence than just “whatever works for you”.

Something like requirement files, yes. But we keep getting suggestions that they should be standardised (and in some cases people are already treating them as if they are, and we need to deal with that).

Oddly enough, not long ago, I added that to pipx. It’s not in a released version yet, I don’t think.

The choice of how to declare dependencies is similar to (and I believe compatible with) that of pip-run, so could be the basis of a standard.

Agreed. We may have many different concepts of what a “project” is, and what solution works for each (and I deliberately used “project” rather than “app”, because I think “app” has a much stronger association with the sort of project that has a name and version, and is developed in a dedicated directory).

Maybe what we need is simply better terminology, so that VS Code can say “If you’re developing a Python app…” and have it be clear that this doesn’t mean the same as writing a one-file script, or a notebook. But then we have to consider those apps that start life as a one-file script and become something that needs its own directory (i.e., pretty much everything I ever wrote :wink:).

I’m mostly just trying to explore the problem space here and get a better understanding of what the implications are of giving beginners a particular view of “Python development”. I have recent experience of this, as I tried doing Advent of Code 2022 in Rust, and as a relative beginner, I was led into the idea that every program I wrote had to be a directory with a Cargo.toml, etc. That meant having 25 (OK, I didn’t get through them all!) separate directories, with a lot of copying of boilerplate, a lot of frustration, and overall nothing like the feeling I would have had if I could have written 25 programs in one “project”. I later found out that I could have done something like this - but that’s sort of my point here, it’s easy to lead beginners into the wrong workflow for them if you don’t think about all of the use cases.

@brettcannon this is actually a very good use case for your VS Code situation. How would you expect to support a beginner doing Advent of Code? In my experience, the first few puzzles are typically one-file scripts in Python, often just using the stdlib. Later puzzles add the need for dependencies, for multiple modules and more complex program structure, etc. But to the user, the project is “Advent of Code” (with 25 “programs”, or maybe more as sometimes “part 2” is better done as a rewrite than as an addition to “part 1”).


  1. … and it’s a hobby-horse of mine, so I’m wary of hijacking your request too much! ↩︎

3 Likes

I feel like I should post a link to this description of a project lifecycle again. It’s 5 years old now, and as a result somewhat out of date - and I suspect many of the participants in this thread are already aware of it. But we still seem to be looking at the problem from somewhere like “Stage 3” of that explanation, and doing very little to ease the process of moving through stages 1 and 2. And what it sounds like VS Code is looking at is more of a way to quickly get beginners to stage 3, rather than helping them at stages 1 and 2, and possibly even supporting them staying at those stages, if it’s appropriate to their needs.

Maybe there is no “one solution fits all” answer here. But I firmly believe that many of the people complaining “Python packaging is hard” are saying that because they are at stages 1 and 2, and what we currently have is inappropriate for them. Making it easier to work at a stage they aren’t even interested in yet, simply isn’t the solution for them.

Again, this is just an anecdotal data point, but most of the beginner Python users I’ve taught (who are using Python in support of a job that is explicitly not “Python programmer”) will never need to write a reusable library, or a standalone app. They will write adhoc scripts, or notebooks, and their “project directories” will be full of shell scripts, or Excel spreadsheets plus data for analysis, or things like that - emphatically not “Python projects”.

3 Likes

Ah, that is very cool.

Just speaking to the group of people here as a developer who doesn’t typically use wheels. Having one file define my project metadata and the requirements feels like where this needs to go, with a lock file included. What would hold back the eventual merger of requirements.txt and pyproject.toml functionality? It sounds like people don’t like a requirement for name and version; could there just be a default behavior or an exclusion field within the file for this, or potentially a runtime flag? I personally don’t build many wheels, but my Python projects are a layout of scripts that form my application when it runs (like in a Dockerfile). Having to use multiple requirements.txt files (one for dev dependencies, another for the implementation) feels messy. I as a user would love:

  • a lock file
  • one (not multiple) file for this (perhaps a separate lock file)
  • The structure and clarity of the pyproject.toml file with the flexibility to potentially abstain from providing metadata so people who want to run a script with some dependencies can just “send it”, install the dependencies, and they’re off to the races.
1 Like

That’s totally fine! Sharing the blog post was to make sure what it proposed seemed reasonable for today (which my takeaway is people are supportive of it). I then tacked on my “I wish we were using a standardized file to record what has been installed”, and it led to this discussion which gets us to what will be reasonable eventually. You know I’m always supportive of standardizing stuff, so I’m still happy with where this has gone so far!

That would be great! I personally would support standardizing on a way to embed runtime requirements so it isn’t solely a pipx thing and something VS Code could utilize.

And that last :wink: is why we are currently planning to help users get to that stage upfront (at least to start).

Multi-root workspace if you wanted to have all of the problems open at once instead of viewing each problem as an independent project. Otherwise as one big project where you created a package per solution.

Yes, because stage 1 leads you into stage 2 quickly and the leap is small when tooling can help you write down your dependencies. But stage 2 requires a way to write down those dependencies which we currently can’t do in a standardized way, hence us going straight to stage 3 where the baseline use case of being able to share things and not panic if you break your virtual environment is supported upfront. It’s all a question of which frustrations you’re trying to avoid.

I think to make stages 1 and 2 simpler we would need to be able to specify dependencies inside of a script to make that self-contained and take the guesswork of figuring out what to install out of the equation. After that is the transparent creation of virtual environments and the installation of dependencies as an inherent part of execution. That would take it from:

  1. Write code (.py file)
  2. Write down dependencies (requirements.txt)
  3. Create virtual environment (venv)
  4. Install dependencies (pip)
  5. Run code (python)

to:

  1. Write code w/ dependencies (.py file)
  2. Run code (pipx run)

Both allow for redistribution and reproducible results, but the former can lead into more complicated flows naturally while the latter is much simpler and thus has fewer stumbling blocks.
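
To make the second flow a bit more concrete, here is a minimal sketch of how a runner might read dependencies out of a script before creating an environment for it. The comment-block format (a `# Script Dependencies:` header followed by one requirement per comment line) is only loosely modelled on the single-file-script proposal linked later in this thread, and the function name `read_script_dependencies` is made up for illustration; it is not what pipx or any other tool actually does.

```python
import re


def read_script_dependencies(source: str) -> list[str]:
    """Return requirement strings from a 'Script Dependencies:' comment block.

    Assumed format (illustrative only): a comment line reading
    "Script Dependencies:", then one requirement per comment line.
    """
    deps: list[str] = []
    in_block = False
    for line in source.splitlines():
        if not in_block:
            # Look for the header line that opens the block.
            if re.fullmatch(r"#\s*script dependencies:\s*", line, re.IGNORECASE):
                in_block = True
            continue
        # Inside the block, each comment line holds one requirement.
        if not line.startswith("#"):
            break  # the first non-comment line ends the block
        requirement = line[1:].strip()
        if not requirement:
            break  # an empty comment line also ends the block
        deps.append(requirement)
    return deps


EXAMPLE_SCRIPT = """\
# Script Dependencies:
#     requests
#     rich>=13

import requests
"""

print(read_script_dependencies(EXAMPLE_SCRIPT))
```

A runner would then hand that list to an installer inside a throwaway virtual environment, which is the part that turns five steps into two.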

I think the other question with this hypothetical is whether that latter approach is enough to go from “simple, self-contained, and no control” to “requirements file or pyproject.toml”? Or is that too much of a leap from stage 1/2 to stage 3? It’s probably fine as long as we never let the in-file dependency list allow for decisions, so no extras or anything else where the user might need to provide input.

I could imagine a world where if people could specify dependencies in a file, VS Code’s Run button (the green :arrow_forward: in the UI) would inspect the file and do the whole virtual environment creation and installation on the fly much like pipx run would do (heck, if we standardized the naming of the temp directory we could even reuse an existing one if people wanted to). That way beginners wouldn’t even have to think about it. And we could also provide code actions to help write down any packages necessary when an import clearly isn’t from the stdlib (which is where Record the top-level names of a wheel in `METADATA`? comes into play to help with that). We could even warn the user that we don’t think running the file will succeed due to not having written down any dependencies (or because they are running without a virtual environment).

One of the trickiest things we have to balance in VS Code is that golden path where we have to guess versus asking the user to participate in making decisions. In general we lean on the latter because we get yelled at less that way (although as you can tell from the blog post, we are getting asked to be opinionated to help beginners out). :sweat_smile: But if we had a more restrictive flow for the simple case where guessing wasn’t a concern then that makes it easier for everyone: we get to follow a standard/common practice that no one will argue with us over and users get exactly what they were expecting.

3 Likes

I think that’s very true. There’s a related situation that I’ve been thinking about in these packaging discussions recently, which is that (at least for me), it would often be much more convenient if I didn’t even have to think about a project as a “project” per se until fairly late in the development process.

What I mean is that often I’ll be doing work interactively, playing around with data, exploring some library that I have a vague idea of using, etc. And I gradually refine some bits of code that do things I want. And then maybe at some point I want to plug those into some other code that I was working on separately. And through all of this I definitely don’t want to upload anything to pypi, and it all may be files with names like “async_version_experiment.py”. And it’s only after I spend a good deal of time in this scratch-work stage, and get things to a point where I feel they’re sort of stabilizing, that I even want to think about something like “is this a project and if so what is it called”.

As you say, the Python packaging system doesn’t support this kind of workflow well. I think it’s one reason things like Jupyter notebooks have become popular in academic and data science circles, where such workflows are common; the “solution” becomes to just dump a bunch of stuff in notebook cells and run them in inconsistent orders and create chaos but eventually (hopefully :slight_smile: ) boil it down to something more tidy.

In my experience a major stumbling block for people here is Python’s dogged insistence on making users “install” things before they can use the import system the way they expect to be able to. As soon as your not-yet-a-project has more than one file, you’re likely to run into import issues. Often people have different pieces of code in different directories and they want to do a “manual override” and say “import this file just so I can fiddle around and I’ll figure out where to put stuff later”, but this is very painful.
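
For what it’s worth, the “manual override” can be done today with `importlib`, it’s just not something a beginner would stumble on. A minimal sketch, where the helper name `import_from_path` and the scratch file are made up for the example:

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def import_from_path(module_name: str, path: Path):
    """Import a single .py file from an arbitrary location, no install needed."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module  # register so later imports can find it
    spec.loader.exec_module(module)
    return module


# Demo: write a scratch file somewhere unrelated, then import it by path.
with tempfile.TemporaryDirectory() as scratch:
    helper = Path(scratch) / "async_version_experiment.py"
    helper.write_text("def double(x):\n    return 2 * x\n")
    experiment = import_from_path("async_version_experiment", helper)

print(experiment.double(21))
```

That this takes a dozen lines of boilerplate rather than one import statement is rather the point being made above.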

In terms of Nathaniel’s “lifecycle” that you linked to, this entire thing may take place at “beginner” level. That lifecycle is a great overview, but one caveat I’d add is that a lot of the complexity of the “sharing” level can come without any actual sharing, just from people writing multiple pieces of code that they want to interoperate with each other (even if it’s still only for their own personal use). It can be difficult to get people to understand why they need to go through the process of “packaging a library for distribution” when they have no intention of distributing it. Some of this is just nomenclature (e.g., they may want to “copy it to another computer” and we can say that counts as distribution) but some maybe has more substantive aspects.

I’m not really sure what the solution to this is, but it came to mind when you mentioned the “packaging is hard” crowd. In my days of answering lots of StackOverflow questions, a lot of the ones that generated “packaging is hard” complaints arose from situations like “I just have this little piece of code and want to import it from this other little piece of code , why can’t I do just that without engaging the whole packaging machinery”.

It’s natural to want to get the code itself working first, even if the way its components refer to each other is haphazard and even junky, and then only at the end worry about the organizational structure of the “project”. But the way things are now, you have to start with a package-like organizational structure in order to get very far. I suppose from one perspective this could be seen as beneficial in that it forces “good hygiene” on developers, but I think in practice a lot of people see it as a hindrance.

4 Likes

With that in mind, I’ve created PEP 722: Dependency specification for single-file scripts by pfmoore · Pull Request #3210 · python/peps · GitHub

2 Likes

I’m having difficulty understanding this discussion, which has so far been quite abstract. Concretely, what is at stake? What practical disadvantages would “blessing” the use of pyproject.toml for projects that aren’t meant to generate a wheel have? I mean, it’s already being used by tools for configuration, including tools that don’t have anything to do with packaging (like black), and for those tools, whether the project is or is not meant to become a wheel changes nothing.

3 Likes

I believe it is about the [project] section of a pyproject.toml file specifically (not literally the whole pyproject.toml), and whether or not it is okay to use this section of the file for a different purpose than what it is currently used for. Since this particular section follows a standardized specification, it cannot be repurposed or modified without thorough discussion. On the other hand, adding [tool.XXX] sections to the pyproject.toml (like [tool.black] for example) does not require discussion regarding standard and specification beyond sticking to the name that you own on PyPI (if you own foobar on PyPI then the [tool.foobar] section belongs to you).

1 Like

It’s basically a discussion that started from the idea of having a “project” that had a bunch of dependencies, but wasn’t being built into a wheel. I don’t personally know what the actual use case is, but it seems to be about putting scripts into a directory with a pyproject.toml acting as a requirements file. Sort of like the “single file dependencies” proposal, but for stuff which does live in a dedicated directory (this is why I’m vague on the use case - it’s not something I have ever encountered a need for myself).

The reason it needs thorough discussion is that the existing standards require the name and version fields in the [project] section. And [project] itself is only defined in terms of translating to distribution metadata. So this new use case needs an extension to the standard, of some form (it’s not yet clear what, exactly). Extending or changing standards is, like it or not, a laborious process…

2 Likes

Don’t Django projects typically work like that? A Django web app has dependencies, but it does not need further packaging metadata, it never gets truly installed, it never gets packaged into a source distribution or built into a wheel.

3 Likes

That is what slightly confuses me about this thread; every Python application that is not meant to be distributed to other machines works like this. The only difference is that, done this way, one must accept not locking dependencies.

6 Likes

Ah. I think I finally understand why this has been confusing me. It’s simply that I’ve never thought there was a problem here that needed to be solved[1].

For me, if a project is stored in its own dedicated directory, there are only two scenarios. Either it’s being distributed, and we have that scenario covered[2], or it’s not. In the latter case I just create a virtualenv for the project in a .venv subdirectory, and use a requirements file to list what I want installed in that virtualenv. A requirements file is quick, easy to use, and practical, so I’ve never seen a need for an alternative. Yes, there’s been talk of standardising requirements files, and if that happened I’d switch, but until then, I’m fine with the status quo.

So this discussion sounds like a weird mix of using pyproject.toml in a context where it’s not appropriate, and trying to half-standardise requirements files - neither of which seems like a useful goal to me.

If people want to focus this discussion, maybe it would be worth someone explaining what, precisely, is the use case that we’re talking about here? @pradyunsg - as the person who started this thread, do you have any insights? It came from discussion of the proposed VS Code package management workflow - is VS Code unwilling to support the non-standardised requirements.txt format? I find that unlikely, but if that is the case, then surely we need to be looking at standardising requirements files, not just picking off part of the problem and creating a second solution for that without replacing the existing solution?


  1. … beyond the meta-problem of “packaging is too confusing for newcomers” ↩︎

  2. in more than one way, but again that’s a different problem ↩︎

4 Likes

Typeshed is maybe a concrete example of a project that might benefit from being able to specify dependencies in a pyproject.toml file without building a wheel. (Note: I’m speaking for myself here as one typeshed maintainer; I’m not speaking on behalf of the typeshed project or the other typeshed maintainers.) We have a lot of dependencies at typeshed, currently specified in a requirements-tests.txt file. The file is a bit of a grab-bag of linters we use in CI that are convenient to have around locally, dependencies for our various tests and scripts, stubs packages for mypy, etc.

Typeshed isn’t “pip-installable” in the conventional sense. The repo consists of typing stubs for the standard library, and various user-contributed typing stubs for third-party projects. Our stdlib stubs are vendored by all type checkers rather than being distributed via PyPI. Our third-party stubs are pip-installable, meanwhile, but all the packaging and uploading to PyPI is automated via a separate stub-uploader repository. So for our purposes here, we can simply say that typeshed is a project that isn’t meant to generate a wheel.

It’s nice (to reduce cognitive overload) to keep all our dependencies in the same file, but this is also silly, as we don’t need all of them for each test, and installing all of them for each test has an impact on how fast our CI is. There’s already a solution to this – dependency groups in a pyproject.toml file, and we already have a pyproject.toml file (for black/isort/ruff config, etc)! We even already have a non-pip dependency specified in our pyproject.toml file. But we can’t put our pip dependencies in our pyproject.toml file without adding a project section, and if we add a project section, pip install .[lint] (or whatever) will build a wheel, which isn’t necessary.

2 Likes

I might be mixing threads, but one of the things that would be great is a way of encoding this information into a standard location such that a generic tool can do it for you[r contributors who have just cloned your repository]. Just like how python -m build knows how to find the right backend and invoke it, a section to find and invoke the right environment manager and package installer would cover a lot of the concerns here.

Hypothetical new section:

[environment]
backend = "pip_with_venv"  # new package someone puts on PyPI
requires = ['pip_with_venv', 'pip>23.01']

[environment.options]    # arbitrary options for the backend
venv_name = ".venv"
pip_arguments = "-r ./ci/requirements.txt --find-links ./wheelhouse"

This is all the information that would be included in a devguide or readme, but now it’s programmatically accessible by any tool that wants to handle it.

(The backend could of course be added directly to pip, or even the standard library, if someone is so motivated. But the principle is that it may be anywhere, including entirely custom.)

2 Likes

I don’t want to digress but please let’s not attempt to “standardize” environment config and rather, if there is a desire, focus on Brett’s proposal Support a way for other tools to assist in environment/interpreter discovery · brettcannon/python-launcher · Discussion #168 · GitHub

1 Like

My “latter case” was precisely that I’m not distributing this directory. So there’s no need for a standard. As @ofek says, environment management is a much bigger question (and likely very contentious). Let’s not hijack this thread with that discussion.

So this discussion sounds like a weird mix of using pyproject.toml in a context where it’s not appropriate, and trying to half-standardise requirements files - neither of which seems like a useful goal to me.

For projects like that, I sometimes use pipenv. Works great for me, but I don’t like having both a Pipfile and a pyproject.toml. As an alternative, I sometimes put my direct requirements in pyproject.toml (with a lower bound on the version) and then generate a requirements.txt (my lock file) from that using pip-compile. Both files go into git.

When you all write pyproject.toml, do you actually mean its [project] section? I assume that is the case, but I’d rather ask for confirmation.


Is it still the case that if a pyproject.toml contains a [project] section but no [build-system] then setuptools is assumed as the build back-end? Because I kind of feel like that is more or less what we want for this use case: some project metadata (especially dependencies), but no build back-end. Isn’t it? What if we were to have an [open-project] section instead (placeholder name obviously), whose content is compatible with [project] so that migrating to a built project is as easy as renaming the section? How far off would we be?

Yes.

(I’ll respond to the rest of the comments later)