Packaging of projects that are both an app and a library

OTOH, I think the black developers are the right folks to pin their dependencies. IMHO, the application developers (in this case, the black devs) are the integrators, and only they know what matrix of dependencies works for them.

They’re the right people to do it, but pip/venv are the wrong tool to do it with. Better for the black package to be unopinionated, so that the integrator using pip to set up a working install is able to constrain the actual versions used.

(Which means Petr’s concerns are valid, but I think the result is that we just have to make clear statements about who users need to “blame” when things don’t work.)

I’m not so sure about that. black is an application, so it should be opinionated about what versions it knows it works with. And it should know whether it works with -L or not. Those are things the black developers should declare, and whatever tools are used to install that application – on whatever platforms are supported – should honor those declarations, or take responsibility for changing or ignoring them.

I agree that the application black should pin all dependencies. The problem is that some projects like ipython treat black as a library, too.

And here it gets messy. Python’s packaging guidelines do not encourage people to split applications and reusable library components into separate packages. In a perfect world we would have a black CLI application with pinned dependencies and a reusable libblack package with relaxed dependencies.
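
In a hypothetical split like that (the package names and versions here are purely illustrative, not real packages), the two pyproject.toml files might look something like:

```toml
# Hypothetical pyproject.toml for the reusable library: relaxed ranges
[project]
name = "libblack"                  # illustrative name, not a real package
dependencies = [
    "click >= 8.0",
    "platformdirs >= 2",
]
```

```toml
# Hypothetical pyproject.toml for the CLI application: fully pinned
[project]
name = "black-cli"                 # illustrative name
dependencies = [
    "libblack == 24.1.0",          # illustrative versions
    "click == 8.1.7",
    "platformdirs == 4.2.0",
]
```

The library stays installable alongside other packages, while the application declares the exact set its developers tested.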

I agree with this. If “pip/venv are the wrong tool to do it with” then sure, use another tool to distribute black. But what would that tool be? Particularly given that there’s a general view (not one that I particularly subscribe to, but it seems to be common) that you should install development tools like black into your project venv.

There’s a very real conflict here, that has existed for some time, but we’ve mostly ignored it. Applications built with Python need to be able to pin their dependencies for all sorts of reasons - it reduces the support matrix drastically as well as allowing for things like lazy imports to be safely used. But if they do that, they absolutely must not be installed into a shared virtualenv, as that is a sure-fire recipe for dependency conflicts.

Python applications need to have their own environment. Whether that’s using something like pipx to manage virtual environments, or something like zipapp/shiv to bundle dependencies along with the application code, doesn’t matter nearly as much as the principle that applications have to control their own environment. Otherwise they become as hard to maintain as libraries, which by their nature have to be able to run in a wide variety of contexts.
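
As a concrete sketch of that principle (both are the documented entry points of the respective tools, shown here only as illustrations):

```shell
# pipx creates and manages a dedicated venv for each application
pipx install black

# shiv bundles an application plus its dependencies into a single zipapp
# (-c names the console script to invoke, -o the output file)
shiv -c black -o black.pyz black
```

Either way, the application’s dependencies never land in a shared environment.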

This is very off-topic, and should probably be split into a discussion of its own, but as long as people consider “pip install a project that exposes an entry point into your project venv” as a viable way of installing tools, it’s going to be hard for applications to give any sort of guarantee of reliability. The best they can do is assume things will probably be OK, and deal with issues on an ad hoc basis. Lazy imports are just another variation on this problem.

I’d go further and say Python’s packaging guidelines have almost nothing useful to say about how to package applications. They focus almost entirely on libraries, with “making a command line entry point” as almost an afterthought.

+1

If you’re just using tools like black (or flake8, mypy, etc.) for development purposes, don’t install them into your dev venv. That’s one of the reasons I organize my tox.ini file into separate environments, so that my static-analysis requirements don’t bleed into my development environments.
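
A sketch of that kind of tox.ini layout (the tool versions are illustrative):

```ini
[testenv]
# the development/test environment only gets runtime and test deps
deps = pytest
commands = pytest {posargs}

[testenv:lint]
# static-analysis tools live in their own pinned environment
skip_install = true
deps =
    black==24.1.0
    flake8==7.0.0
commands =
    black --check src tests
    flake8 src tests
```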

I personally use pipx for that kind of thing, or tox for single repo development, so I don’t run into that problem. But maybe that’s not a widely used strategy? I personally never pip install anything explicitly, unless I’m in a special purpose venv for experimental purposes, or a tox-managed environment.

+1

(Perhaps it’s time to split this into a subthread…)

I think it’s a bit more nuanced. IMO black devs are doing the right thing as application developers, but they might want to take on an additional role of integrators/distributors, where they:

  • pin dependencies
  • test with the known dependencies
  • provide timely updates when the dependencies release security updates (here’s the rub!)
  • can use -L with a clean conscience

But, as a distro packager, I’d be sad if the project didn’t also provide unpinned dependencies as it does now. I think there’s value in alternate distribution channels with alternate stability/security/… needs, and having some leeway in the dependency versions is very helpful here. (Of course, it should go without saying that an integrator needs to do their own integration testing, and be responsible for integration bugs.)

I think I’d turn that around. As an application developer, I would pin my dependencies, but if downstream integrators want to override those, then there should be a mechanism for that, and such integrators would also assume the responsibility of ensuring that those overrides worked in whatever distro environments they support.

This is a topic we discuss a lot in the Nixpkgs project as well. Ignoring the library use case for now: if we have an application, do we want it pinned entirely, even down to patch versions, or not? The application distributor might want to do that and say “this is what we support”, and maybe also distribute things like Flatpaks or Docker images with their tool at that version. As integrators, though, we need a mechanism to override those versions, typically to use the latest patch versions.

But where would the application author do the pinning? In the pyproject.toml? I think it should be done with a lock file, so that pyproject.toml still contains at most valid ranges, just as is done for libraries.
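
In other words (versions illustrative), the application would keep library-style ranges in pyproject.toml:

```toml
[project]
name = "sometool"                      # illustrative
dependencies = ["click >= 8.0, < 9"]   # a valid range, as for a library
```

while a generated lock file (for example the requirements.txt produced by pip-compile from pip-tools) carries the exact pins for the supported, tested install:

```text
# requirements.txt (generated; illustrative pins)
click==8.1.7
```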

Sorry, I have to ask: this recommendation goes against how poetry and pdm provide “dev-dependencies” and how e.g. VSCode expects the project venv to be set up, correct? What should IDEs do that want to access specific, pinned tool versions so that your CI linting matches what you do locally?

It does, yes. (Speaking as someone who broadly agrees with @barry’s comment).

The IDE shouldn’t be requiring pinned versions, IMO. That’s not its job (at least, not in the way I use an IDE). I tend to use pre-commit for my linting, and that installs a pinned version of the tools into a dedicated virtualenv, which ensures that I get the same results locally and on CI. This is similar to the way @barry uses tox.
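
For reference, this is roughly what that looks like in a .pre-commit-config.yaml (the rev shown is illustrative): pre-commit installs each tool at its pinned rev into its own managed virtualenv, so local runs and CI agree:

```yaml
repos:
  - repo: https://github.com/psf/black
    rev: 24.1.0    # illustrative pin; pre-commit builds a dedicated env for it
    hooks:
      - id: black
```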

As far as I know, the Poetry maintainers have acknowledged this as an issue, and want to fix it. They want to offer ways to install (some of) the tools in a different virtual environment than the one containing the project in development.

This suggestion from VS Code will very likely be changing with our new tool-specific extension for performance reasons.

So for clarity, the specific thing I’m wondering about is how to hook up e.g. VSCode’s source actions/live checking functionality to the tools installed in the special dev-tooling (tox|hatch|whatever) venv and not deal with the hassle of specifying the exact venv path for every tool for every git repo/workspace I touch. Because I jump around a lot :confused:

The scenario I have in mind (and that I encountered a couple of times in my team) is that it is very easy to rely on the IDE’s (VSCode in this case) quick actions to format the code or sort imports or whatever, using e.g. a pipx- or venv-installed copy, because it’s so convenient (and VSCode will do it for you when prompted). You then commit the code and oops, the older pinned version of black in the tox lint env puts a comma back in and fails the job :open_mouth: I pin dev tools because sometimes I don’t touch a codebase for a while, then want to make a quick change, and get bogged down fixing random new things pylint found just to make CI pass again.

Interesting, I was wondering what they were for.

Anyway, I’m sorry for the thread hijack. If someone thinks this train of thought has merit, I can open a new thread somewhere else?

I think that the way out of this problem is to decouple the way we think about dependencies from the way we think about releases (and how we publish them to pypi).

One part of this we already have via the extras_require machinery. I have mostly come across it as a way to specify optional dependencies, but if we settled on [APP] or [FULLY_PINNED] as a (socially?) standard key, it could give you a fully pinned set of (transitive) dependencies for application-like behavior. If you try to mix multiple packages in one venv, you’re on your own, but if you stick to a tool per venv then there is a path for maintainers to guarantee there is a set of dependencies that will work as expected. On the other hand, maintainers can also give you a “here is what we know didn’t work” set of dependencies, which can be used by downstream integrators (be they further packagers (conda, Linux distros, Homebrew, …), people making in-house “blessed” environments for their co-workers, other maintainers building their own fully pinned requirements, or people who just like to be their own integrators) as they see fit.
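
A sketch of that convention (the extra name and the pins are hypothetical, not an existing standard):

```toml
[project]
name = "sometool"                      # illustrative
dependencies = ["click >= 8.0"]        # normal relaxed ranges for library use

[project.optional-dependencies]
# "pip install sometool[fully-pinned]" would install the exact set the
# maintainers tested, giving application-like behavior
fully-pinned = [
    "click == 8.1.7",                  # illustrative pin
]
```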

The second thing, which is a much heavier lift, is a way to update the requirement metadata on PyPI without having to do another release of the project. Thus, if a new version of a dependency (or transitive dependency) does break something, you can “just” update the metadata and protect your users from that breakage (without preemptively over-constraining). Alternatively, as bug-fix versions of dependencies come out, maintainers could update the fully pinned metadata (after validation) if they choose. The ability to retroactively adjust dependencies is something that downstream integrators already have (both when they re-package and, in some cases, by mutating the metadata associated with built artifacts). If upstream packages are also expected to do integration, then we should make sure we have all the tools we need!

For small pure-Python packages it is pretty easy to do a release, but for packages with compiled extensions we are uploading over 30 wheels of 7–11 MB each (e.g. matplotlib · PyPI). Even though the wheel building is mostly automated, it is still a fair amount of effort, compute, and disk space.

Do you have a link where I could read about that?
