The way I see it: suppose I have some kind of “project”. Whatever that may mean, at a minimum it means I wrote some files that contain Python source code. That code has some import statements. I see the purpose of dependency specification as essentially listing the libraries that I import. That’s it.[1]
So if my code has import numpy, I’m going to list numpy as a dependency. Why would I want to list that in a different way if it’s going to be run than if it’s going to be imported as a library? Either way, numpy needs to be installed before my file that says import numpy gets executed.
Why not? If my code depends on a library and I know that library made a breaking API change in moving from v2.9 to v3.0, so my code can’t run with version 3 of that library, why not specify "<3" in my dependency listing for that library?
As has been noted, this could kind of almost be done automatically by scraping imports, except that we can’t link the import name to the distribution name, and (more important) the import doesn’t list the required versions. ↩︎
If you know, sure. Realistically, you generally don’t know. People often preemptively add this kind of restriction because “omg a new major version will break everything, that’s what they’re for after all”. However, in general this is just forcing things to break when they might not have to, and it messes with dependency solutions for more complex projects that include your package.
Because if it’s being run, you might want to pin a version for reproducibility. If you’re building a library, you want to avoid pinning, so your users aren’t over-constrained. You need to read one of the articles around on application vs library dependencies - this one is quite old, but still very relevant.
Note that the referenced blog post is in the context of overzealous pins in libraries (i.e. something meant to be reused and installed as a Python package), not applications which are the typical target use case that is being discussed here.
Okay, I’m familiar with those arguments, but having a whole separate key just in case you might want to pin in one situation but not in the other seems a bit extreme to me. Is there any other difference envisioned between these two keys other than “one of them might use pinned versions and the other one shouldn’t”? And is “distributing an application (where you might want to pin versions)” the only non-wheel case that is being discussed here?
Also, even in the pinning case, what I usually do (and maybe this is bad?) is still to try to derive the versions to pin from a list of unpinned requirements, i.e., let the resolver find what it thinks is a working set of versions and then just pin what it comes up with (or back off if it can’t). That’s why I was asking about whether the [run] list would be derived from a [project] list.[1]
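To make that concrete (every name and version below is purely illustrative), the abstract list I write down and the pinned set the resolver comes up with might look like this:

```toml
# The abstract dependencies I actually write down:
[project]
dependencies = [
    "numpy>=1.22",
    "requests",
]

# The concrete set a resolver might come up with, which is what I would
# then pin (versions invented for the example; note the transitive
# dependencies that only appear in the resolved set):
#   numpy==1.26.4
#   requests==2.31.0
#   certifi==2024.2.2
#   charset-normalizer==3.3.2
#   idna==3.6
#   urllib3==2.2.1
```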
In either case (although I’m aware some might regard this as broadening the scope too much), I’d say version upper bounds would not be such a problem if PyPI metadata were mutable, as discussed on pypackaging-native. ↩︎
Semantically they mean different things. If you look at all the metadata recorded in the [project] table you will notice it’s very much about the metadata you write down for a wheel (by design). That does not align with what you need to run your application (e.g., do you really need keywords?). That’s the point my blog post was trying to convey.
I view this as being about “running an application”, not about distribution specifically. While this could help tools that build something to let you distribute your app, I don’t view it as the design goal here.
Nope, that’s a totally legitimate thing, but a lock file is a separate thing in this discussion. I personally view this whole [run] table as writing down what is statically known about what an app needs, to give to a resolver to calculate what needs to be installed.
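Roughly, I picture something with this shape (the key names here are only illustrative; nothing about them is standardized):

```toml
# The statically known inputs an application would hand to a resolver.
[run]
requires-python = ">=3.9"
dependencies = [
    "numpy",
    "requests>=2.28",
]
# plus, per the proposal, something like run.dev-dependencies for
# development-only requirements
```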
Using the same key for two fundamentally different purposes seems incredibly dangerous to me. What if there are situations where the two usages overlap?
You seem to be making the “structural similarity vs semantic similarity” mistake that I’ve already commented on in at least one of these threads, in response to basically this same point.
I can’t think of many scenarios where requires-python would diverge between the two, but conceptually they’re different, and I think it would help consistency (e.g. for scripts that have [run] without the [project]).
Perhaps I’m looking at this too broadly (not just in the context of “projects that aren’t meant to generate a wheel”; though one way of looking at this discussion is that pyproject.toml has outgrown the “just the wheel, please” pattern); I was referring to
My point was that a putative [run] table could also be a candidate for projects that do create wheels.
Basically I responded to what I saw as a possible avenue for further unifying the various diverging scenarios. Ideally we could avoid something “like project.optional-dependencies”, and actually provide the same interface regardless of use-case.
I don’t want to turn your proposal into something you did not intend, but I found the [run] idea very appealing, and a new table is also an opportunity to make this API (more) consistent. Also, [library] and [application] already got mentioned, so I figured a bit of a wider angle is fair game.
We currently have at least: build-system.requires, project.requires-python, project.dependencies & project.optional-dependencies, and now potentially run.{...}. What I’m saying is that it might be possible to slice & dice this in a better way[1]. Obviously touching this for existing use-cases would mean migration pain (and we’d have to manage that), but I’m trying to take the long-term view here.
As a complete strawman for unifying the things under discussion:
```toml
[dependencies.build]  # optional; for projects building a wheel; ~= [build-system]
backend = "setuptools"  # PEP 517
requires = [...]  # PEP 518

[dependencies.run]  # PEP 621 & PEP 723
python = ">=3.9"
requires = [...]  # for wheel metadata (if applicable); for applications:
                  # base constraints that can be used to compile lock file

[dependencies.optional]  # PEP 621, resp. your proposal's run.dev-dependencies
tests = ["pytest"]
# your proposed extension for self-references
coverage = [".[tests]", "spam[tests]", "coverage"]
```
I was even tempted to put [dependencies.lock] in there, if only for exposition.[2]
That way, [dependencies.run] could be shared between libraries, applications & scripts, and scripts could reuse [dependencies.optional] where necessary.
It would be a fair criticism to say that the above strawman bites off more than a PEP might reasonably chew, but this is the kind of unification that I think pays off even such an effort in the long term (and that users are asking for).
I re-read PEP 517/518/621 again, and those were solving crucial problems at the time, though that space was still brand new, so it’s unrealistic to expect an eternal API from v1. ↩︎
Scripts will want to be able to lock dependencies too, right? They should presumably still stay single-file, and with TOML we could inject the equivalent of PEP 665’s .pylock.toml very easily. ↩︎
I’m getting a bit confused on what use cases are under discussion here, so I’m going to back up a bit.
I find it useful to think of things like the dependencies in terms of how the mind of the programmer (i.e., the person writing pyproject.toml or similar info) engages with them. Let’s suppose I sit down and write some code that uses, say, numpy and requests. I write some imports and realize “hey, I am using numpy and requests”, and so I know I want to list numpy and requests among my dependencies in some metadata file. At some point I may realize that I need to specify a certain lower bound on the numpy version to ensure I have access to some feature that was added in a certain version, so then I know I need to update my dependency list to reflect that. And at some point (perhaps in a later release of my own code) I may discover that numpy made a breaking change, so I need to add an upper bound as well.
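In pyproject.toml terms, that progression might look something like this (the specific versions are invented for the example):

```toml
# Step 1: I realize I'm importing numpy and requests.
[project]
dependencies = ["numpy", "requests"]

# Step 2: I need a feature added in a particular numpy release, so I add a
# lower bound:
#   dependencies = ["numpy>=1.24", "requests"]

# Step 3: a later numpy release breaks my code, so I add an upper bound too:
#   dependencies = ["numpy>=1.24,<2.0", "requests"]
```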
All of this is the same whether I am writing a library or an application. That’s why I’m uncertain about the need to separate the two cases. Now, it may be that for a library vs. application there is a difference in when or how this information is utilized: for a library I may send along just my dependency list, and the user’s resolver will calculate, at install time, precisely what to install; whereas for an application I may run a resolver beforehand and generate a precalculated set of all versions to ensure that different users won’t get different behavior due to having different environments. But still in both cases all of what happens flows from me as a human thinking “I am using this library”, and possibly some version restrictions, and me deciding to add those to my metadata.
In particular, I don’t think at any time “oh I am writing an application so I need exactly numpy version a.b.c”[1]; if that ever happens at all, it is derived from the more abstract, human decision of “what does my code need”. As far as I can introspect, I don’t at any time think about the dependencies differently depending on whether I’m releasing an app or a library, because I think about the dependencies as all being “what does this code need to run”. (In some sense it seems some of the app-vs-library question is maybe actually about the reverse, namely “what else might need this code in order to run”, and for an app the answer is “nothing because you don’t depend on apps” and maybe that affects the way the metadata is used.)
I’d imagine that what will be useful to users is a system that allows them to specify dependencies, as much as possible, in the way that they think about them in the course of development. Based on my own thinking above, to me this means not having to conceptualize application and wheel dependencies as different at the “root” level (i.e., a top-level key like [project] or [run]), but rather layering on some additional information about when, how, and by whom/what that information is to be used. For instance, some metadata that means “I just told you what my code needs to run, now I’m telling you that it is an app, so resolve that at build time before I distribute anything to anyone”.
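To sketch what I mean (nothing below exists; the extra table and its keys are purely hypothetical):

```toml
# One shared, abstract dependency list...
[project]
dependencies = ["numpy>=1.24", "requests"]

# ...plus a hypothetical extra layer saying how that list is to be used:
# "this is an application, so resolve and pin it before anything gets
# distributed". Neither this table nor its keys exist today.
[delivery]
kind = "application"
resolve = "before-distribution"
```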
That’s certainly possible! So what I’m wondering is, in my thinking above, is there something that is “wrong”? If so, independent of this discussion, it might be something for me to note as a possible future doc improvement that explains how to think about dependencies and writing metadata.
I’m not sure I grasp that. For the user, an application cannot be run without first being distributed somehow, so I feel like we have to at least somewhat peer over the boundaries here to understand how this metadata will be used.
Based on what I said above, I still don’t really see the difference in terms of what needs to be given to a resolver. I do see that there may be a difference in when that is given to a resolver (i.e., at the build/distribution stage vs. at the end-user install stage), but in both cases, to me, the dependency information itself is the same at a conceptual level.
unless maybe due to some odd combination of bugs there is only one version that has the combination of features I need, but that’s unlikely and in any case is really just a limit case of specifying upper and lower bounds ↩︎
Because you’re looking at it from your perspective of an author, not a user of your code (regardless of whether we’re talking about a library or application).
If you’re using a library, then you’re not using pyproject.toml. A pyproject.toml file is of zero interest to users of your library because they won’t have it; the information in pyproject.toml gets transformed into a METADATA file, and that’s what ships in the wheel.
But users of your application, if you send them, e.g., a zip file, will see the pyproject.toml file and potentially use it to see what extras it provides. And all of that stuff in [project] that has nothing to do with running your application (e.g., keywords, summary, description, etc.) is just a distraction, or something that will probably confuse users of your application when they search online for python pyproject.toml project table and don’t understand why all of this wheel-related metadata could be there but isn’t.
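For concreteness, the [project] keys exist to be turned by a build backend into the core-metadata fields that ship inside the wheel, roughly like this (values are just examples):

```toml
# Illustrative values; the comments show the METADATA field each key becomes.
[project]
name = "spam"                       # -> Name: spam
version = "1.2.3"                   # -> Version: 1.2.3
description = "An example package"  # -> Summary: An example package
keywords = ["example", "demo"]      # -> Keywords: example,demo
requires-python = ">=3.9"           # -> Requires-Python: >=3.9
dependencies = ["requests>=2.28"]   # -> Requires-Dist: requests>=2.28
```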
There’s a big difference in my mind between distributing in an official packaging sense and sending someone some code. A wheel is a distribution as it’s a standard binary artifact in the Python packaging ecosystem. Sending someone a zip file of source code is not a distribution in my head.
Well, an archive of source code is a source distribution at any rate (and while it’s less popular these days, “users” do still sometimes build and install applications from source… I mean, I do that and then use the resulting software).
But in my conception, the idea is that the author writes down the metadata needed to make the code work, and then someone (either the author or the user) uses a tool to convert that into what the user needs. What we’re describing here is a sequence of steps that an author will take to record metadata and/or use tools to process that metadata. The user shouldn’t need to know anything whatsoever about any of these internal details; they may know literally nothing except the name of the application/library. So from the user perspective I would say all of what we’re talking about here is immaterial; they’ll mostly only care about whatever tools eventually use the metadata, not the metadata itself.
I tried to be careful to say “metadata” rather than “pyproject.toml”, although I may have slipped up and said the latter (since that’s what we currently use for metadata). I thought the purpose of this discussion was to figure out whether pyproject.toml or something else is the way to express some metadata (like application metadata). If that is the case, I prefer to think conceptually about what information is present at what stage, and then after getting clear on that we (or at least I, if I’m the only one who’s confused on this) can think about whether that is the same for the current library-focused pyproject.toml workflow as for a putative application-oriented system.
The person writing the code (hopefully!) knows what libraries their code needs to run, and they note that in some kind of metadata file. The person using it need not know any of that, neither from pyproject.toml nor from the wheel metadata (i.e., they as a human will likely never look inside those files). A tool which builds and/or installs the code may make use of the metadata the author created to somehow facilitate creating a valid environment on the user’s end to run that code. As I see it, that accurately characterizes the current workflow with pyproject.toml. What I’m saying is that I see that workflow as compatible with distributing an application, perhaps with some changes in which tools are used at different stages, or how they work.
So, again, are we specifically considering only the “send someone a zip file” situation here? It seemed to me in the thread that people were considering application distribution, and I don’t consider “sending someone a zip file” as the only or even the best way to do that.
Even so, most of the stuff you describe is just as unrelated to using a library as it is unrelated to running an application. Keywords are just as unrelated to doing “import foo” as to doing “python foo.py”. And surely an application can have keywords, a description, and so on, just as much as a library can.
Well, for my part, there’s a difference (although maybe not so big) in my mind between “an application” and “some files I zipped up and sent to someone”. It seems maybe there’s some misalignment in terminology.
One key thing is that for me an “application” is still something that is “deliberately” distributed, in the sense that the person distributing it is fine performing some build steps, assigning a version number, etc. So I think almost all of the metadata that’s in [project] is quite relevant for applications as well.
Another is that I interpret “distribution” pretty broadly. To me it just means “I have some code running in one environment and I want to get it successfully running in another environment”. So sending someone a zip file is something I absolutely consider as distribution.
Whether we call that distribution or not, I do think we need to think about it as a step that needs consideration, and that may require for instance the running of tools, even if those tools don’t do quite what pip install currently does. I’ve frequently used “guerrilla distribution” mechanisms (e.g., sending a zip file, cloning a repo) in the past, and in my experience it’s usually overoptimistic to think that literally just getting the code is going to be enough. There’s almost always some kind of configuration or setup required (e.g., “edit this file to point at your data directory”).
So when I think of “not generating a wheel” I really am just thinking of “not generating a wheel”; I’m still considering that the author may still need to take steps to prep the code for being sent to someone[1], and that the user may still need to take steps to make use of the code rather than just unzipping a file and immediately typing “python dostuff.py”.
Which makes me wonder about something I didn’t see in this thread, and I don’t remember seeing it in the thread this was split from either, which is: When someone has a project that they don’t want to generate a wheel for, why don’t they want to generate a wheel? What are the practical use cases that aren’t handled by wheels? I think we all have some notions about this but maybe it would be helpful to lay them out to get clear on what problems we’re trying to solve.[2]
those may be “build” steps like packaging an application, or “maintenance” steps like keeping some metadata file up to date ↩︎
As an example, I have seen people shy away from making a wheel because they don’t want to publicly publish their code, and they associate wheel-building with publishing to PyPI, not realizing they can do the former without the latter. That’s something that could maybe be handled just with documentation improvements. ↩︎
I have definitely written many applications that have never been distributed, and were never intended to be. In both professional and hobby contexts, and in Python and other languages. But maybe our definitions of “distribution” differ.
Well, I wouldn’t consider moving an application from a test environment to production as “distribution”, to give a specific example. If you do, then I guess you’d say a lot of my applications get distributed, but I disagree.
I’d distinguish based on intent. If I send them a zip of my working code, I’d consider that more like “sharing” than “distribution”. For me, if I’m distributing something, I’d expect to send something that includes instructions on how to set it up in the user’s system. I’d expect to commit to supporting the user if they have problems getting the application to run. I’d even assume a certain responsibility for helping if the application doesn’t work the way they expected (although how much responsibility varies case by case, based on my relationship with the recipient). Maybe that’s because I’m used to working in a more “commercial” context, but I do think that it’s important to distinguish between a more formal idea of “distribution” as distinct from a more informal “sharing”.
For me, the biggest reason is usually because I don’t want to require the user to install the code. I’ll often send something that only needs to be run once, or occasionally. Download and run is very definitely the model I’m aiming for. The user I’m sending it to may not understand Python environments, or know about virtualenvs. I might want to avoid having to deal with a user having an incompatible version of one of my dependencies. For a larger project, I might not even want the user to have to know the application is written in Python.
Let’s put this another way. Mercurial is an application, written in Python. They don’t (primarily) ship as a wheel, because their users don’t want them to. That’s a perfectly valid use case, and one I’d like to see supported by the “packaging ecosystem”. It means getting some sort of integration with, and buy-in from, projects like PyInstaller and BeeWare, but that’s the reality of “packaging Python applications”. If we ignore this, it’s very hard to credibly say that we’re listening to Python users - survey or not.
I’ve been talking from the perspective of replacing pip’s requirements files. I personally never got into application distribution à la Briefcase. Worrying about distribution is something of an expansion of what this topic was originally about, stemming, I think, from the analogy to wheels and how this is all meant to be a different use case.
I personally just want a way to write down the requirements my code has in order to run. People objected to reusing the [project] table back in A look into workflow tools - package management in the Python extension for VS Code because it felt like a re-purposing of [project] for something it wasn’t meant for. This discussion got split off to try and come up with something separate from [project] for this use case of replacing pip’s requirements files.
Unfortunately, even after taking the time to write a blog post to try and motivate why I didn’t think reusing [project] was necessarily the right thing, I’m still spending most of my time trying to clarify this point. That tells me that either I’m personally failing horribly at explaining all of this or the concept is just not clear enough on its own to explain in general and thus the idea should just be dropped due to difficulties in teaching it.
I was hoping this could all get resolved before I needed to make a decision about those PEPs, to potentially help inform what PEP 723 would want embedded in a single-file script. But if that’s simply not going to happen, then so be it; I’m not holding up a decision about those PEPs for this. If we aren’t heading towards consensus/decision after 87 posts and a month of very active discussion, we might not ever come to a conclusion here, and thus the status quo wins.
I guess what you call “sharing” is more like what I’d call “distribution” and what you call “distribution” is more like what I’d call “publishing”.[1]
Regardless of the terminology, I agree that the factors you mention are relevant. At least for me, though, even the informal “sharing” situation often has to involve some amount of instruction for how to get things working (even if it’s just “unzip this and run dostuff.py”). I suppose there is a gradient of how much apparatus (instructions, support, etc.) is attached to code that is shared/distributed/published.
When you say “don’t want to require the user to install”, are you referring only to an install via Python mechanisms like pip? I think I know what you mean, but there is a sort of uncanny valley in terms of how things may be installed. At the level of casually sharing code among tech-savvy users, the “simplest thing” may just mean “copy this file and run it”. At the intermediate level of “formally” sharing code with tech-savvy users is where we live in the pip/wheel/pyproject.toml world, which seems to be the world we’re trying to step slightly outside of in this discussion.[2] But often, in the wide world of people who know nothing about Python or programming, installing the software is exactly what the users do want to do, and is viewed by those users as simpler than just opening a zip file — and that’s why tools like PyInstaller and BeeWare exist. As you say, these users don’t want to know or care that they’re installing Python code, but there is definitely an install process, and it may in some ways be even more technically complex than pip-type installs in that it has to integrate smoothly into user expectations across different OSes, etc.
I totally agree that this latter kind of installation (what we might call Python-agnostic installation) is currently separated from the world of Python packaging by an immense chasm, and it would be great to bridge that. One way is integration with tools that package Python apps to be installed as “normal” programs for whatever platform, as you mention. Another idea is some kind of distribution platform for Python apps that is easy to use even for non-technical users; this model has been very successful for games via Steam, and Anaconda Navigator has a similar mechanism for installing applications (as distinct from Python packages).
Getting back to the pyproject.toml matter, though, the question for me is again whether anything really needs to be different in terms of how the author initially specifies the dependencies. That is, if we are aiming at something like PyInstaller or BeeWare, I see it as feasible for such tools to take the same kind of dependency information that’s currently specified for a wheel and use it to package up a Python-agnostic installer.[3]
Publishing, of course, can still mean commercial publishing, in addition to things like PyPI. ↩︎
And, personally, often the reason I don’t want to subject users to that install process is precisely because of the problems that exist with Python packaging. ↩︎
It’s been a while since I’ve looked at those tools, but my recollection is that this is more or less how they currently do things anyway. ↩︎
At the same time, it’s good if project.dependencies constraints remain abstract instead of pinned. And when we do that, it is the same again for applications and libraries.
A trickier part here, I think, is that (as I wrote in an earlier comment) when we’re not building a wheel the application can’t have dynamic dependencies, whereas when building a wheel we can.
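For reference, that’s the PEP 621 dynamic mechanism, which only has meaning when a build backend actually runs. A minimal sketch:

```toml
# When a wheel is built, the build backend may compute the dependency list
# at build time by declaring it dynamic:
[project]
name = "spam"
version = "1.0"
dynamic = ["dependencies"]

# Without a build step there is nothing to fill that in, so an application's
# metadata would have to list its dependencies statically.
```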
A project may contain both a library and an application. They may not have the same dependencies. E.g., the application may need only a subset of the library’s dependencies (or the other way around, if you use packages such as click for your CLI).
Maybe we’re going a bit too deep here, but it does matter. E.g., in Nixpkgs we often split packages, moving the executables elsewhere, because depending on your use case you may not need them, and thus also not need certain dependencies.
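A rough sketch of that kind of split, borrowing the [run] idea from earlier in the thread (names, bounds, and the [run] table itself are illustrative, not standardized):

```toml
# The importable library part of the project:
[project]
name = "spam"
version = "1.0"
dependencies = ["numpy>=1.22"]

# The application/CLI part needs more: it also pulls in click for the
# command-line interface.
[run]
dependencies = ["numpy>=1.22", "click>=8.0"]
```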