I’d love to get some advice on dependency management.
General context:
At my workplace, we’re developing a complex set of software packages (in the robotics / motion control field - but that’s not important here)
It’s 90% Python (any non-Python code is embedded in Python packages)
Currently we have 25 distinct Python packages, each with its own pyproject.toml. Some of them (fewer than 10) are required in more or less every use case; others are specific to a single use case
The entire project is research-oriented and private, currently used by a small number of individuals, but it’s a long-term development effort and we want to develop & maintain it efficiently
We use a “rolling release” approach with respect to both internal and external dependencies (i.e. we always use the latest version of everything, unless there is a good reason to use an older one).
All our 25 packages are under VCS in a single git repository
Question:
What would be the most appropriate approach to manage our internal dependencies (i.e. dependencies between our 25 private packages)?
First of all, I want to be able to install the set of packages required for a given use case in a reproducible way, with automatic resolution of internal and external dependencies. Normally, I want to install all our private packages in “editable” mode. I currently don’t need a build system. Version management is not an important aspect either (since we use rolling releases).
We tried the following approaches:
Maintain a shell script to install the required packages as needed for various use cases - basically, execute “pip install --no-deps --editable $PACKAGE_NAME” in a loop (a minimal sketch of this follows below). This works, but it doesn’t seem to be a very smart solution (dependencies must be maintained manually in the shell script).
Use dynamic dependencies with a setup.py which (for every package) points each private dependency at its local path (like “package_name @ file:///path/to/package”). This works and is more or less portable & reproducible, but it doesn’t install the dependencies in editable mode, and again - it’s a workaround, not really an elegant solution.
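For concreteness, here is a minimal sketch of what the first approach looks like in practice - the package names and repo layout are made up, and the dependency order has to be maintained by hand:

```bash
#!/usr/bin/env bash
# Sketch of the install-script approach: editable installs in a hand-maintained
# dependency order. Package names and repo layout are illustrative only.
set -euo pipefail

REPO_ROOT="$(git rev-parse --show-toplevel)"

# One list per use case, maintained manually in dependency order.
PACKAGES="core_utils motion_kernel viz_tools"

for pkg in $PACKAGES; do
    pip install --no-deps --editable "$REPO_ROOT/$pkg"
done
```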
TLDR:
Is there a simple solution for resolving project-internal dependencies, without a private package index server, and with the possibility of all-editable package installations?
I am afraid the most suitable solution for private packages is indeed a private package index (like devpi or pypiserver).
You can use pip’s configuration files to tell your machines how to access the private index without having to pass the URL explicitly on every command.
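For example (assuming a devpi or pypiserver instance reachable at some internal URL - the URL below is only a placeholder), the index can be registered once in pip’s configuration:

```bash
# Store the private index in pip's config file instead of passing the URL
# on every install command. The URL is a placeholder for wherever
# devpi/pypiserver would actually be hosted.
pip config set global.extra-index-url https://pypi.example.internal/simple/

# Show the resulting configuration (equivalent to editing pip.conf by hand).
pip config list
```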
I have considered a private package index, but I think the effort of running extra infrastructure would probably not pay off. I tried to use a local package index without a server (like “--extra-index-url file:///…”), but I failed; I assume it’s not supported for some reason.
Then, there is still the issue of not being able to install the dependencies in editable mode - and even if I do it manually, they are often unintentionally replaced with non-editable installs by pip.
I thought my situation was not super exotic and that there should be a solution in one of the packaging tools (poetry, uv, pipenv …) - but I couldn’t find one.
simpleindex lets you configure a directory as a source of packages and run a local server to provide access to it.
Unfortunately, your situation does seem to be somewhat exotic. Editable installs and monorepos don’t really mix, and certainly aren’t mixed enough by the contributors here for them to be well understood.
What’s not clear to me from your description is whether the monorepo you have is also the one that your developers are working in, and then whether you always match the versions of packages from within the monorepo or if they might vary?
If the packages from within the monorepo always have to match, then you can likely get away with setting PYTHONPATH or creating a .pth file that includes the source directories directly. You also intrinsically need a single consistent list of external dependencies that applies to all the projects within the monorepo, which means you would install all of them and then refer to the various projects within the repo.
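A rough sketch of that idea, assuming a src/ layout and illustrative package names - either export PYTHONPATH per shell, or drop the source directories into a .pth file in the shared environment:

```bash
# Sketch only: make the monorepo sources importable in one shared environment.
# The layout (<repo>/<package>/src) and the package names are assumptions.
REPO_ROOT="$(git rev-parse --show-toplevel)"
SITE_PACKAGES="$(python -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])')"

# Option A: per shell session, via PYTHONPATH
export PYTHONPATH="$REPO_ROOT/core_utils/src:$REPO_ROOT/motion_kernel/src:${PYTHONPATH:-}"

# Option B: persistent, via a .pth file (one absolute path per line)
printf '%s\n' \
    "$REPO_ROOT/core_utils/src" \
    "$REPO_ROOT/motion_kernel/src" \
    > "$SITE_PACKAGES/monorepo.pth"
```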
If either of these conditions is not true - either you want to mix-and-match versions of projects from the monorepo, or you want a different set of dependencies based on which projects from the monorepo you intend to use in a certain environment - you really do have a set of independent packages. The only way you’ll find sanity is for users to use prebuilt packages from a package feed and not touch the monorepo at all, and potentially you’ll want to split up the monorepo so that each project lives on its own.[1] Anything blurring these lines tends to become a problem.
Thereby removing the temptation to break cross-compatibility, or to skip the install of the correct versions. ↩︎
I’m not sure my use case is your use case, but we did something similar at a previous employer. Some of our decisions which seemed to work well[1] were:
We had a namespace package which was the umbrella for all subpackages.
Use exact version pins for packages within that namespace.
Floating versions for external dependencies
Monorepo git
Use hatch with plugins to provide cross-package development and editable installs.
It worked pretty well, in that if you were a developer for one of the subpackages, you could pretty much do your own thing and not worry too much about dependencies. The namespace package maintainer had to maintain exact pins for subpackages and internal extras, and the hatch plugin “futzed” with those dependencies during development and testing.
I think that’s mostly true. You might be able to get it to work “well enough” for your use cases, but if you can’t, it’s gonna be pretty painful. I wonder if @jaraco has more thoughts about this with Coherent.
I left before we had any long-term experience with it ↩︎
Hmm, my work uses monorepos + editables pretty heavily. Our solution boils down to:
requirements.in
-e path_to_lib1
-e path_to_lib2
…
pip compile (or now uv pip compile) that file and use --no-emit-package to exclude those editable dependencies. You get a requirements.txt file with all non-editable dependencies. Install that with --no-deps. Then install requirements.in, also with --no-deps. For your case, each use case should have its own separate requirements.in/requirements.txt files, like requirements-usecase1.in and requirements-usecase1.txt.
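Roughly, the flow looks like this for one use case (shown with uv; the library names are the placeholders from the requirements.in example above):

```bash
# Sketch of the compile-then-install workflow for a single use case.
# lib1/lib2 are placeholder names matching the requirements.in example.
uv pip compile requirements-usecase1.in \
    --no-emit-package lib1 \
    --no-emit-package lib2 \
    -o requirements-usecase1.txt

# 1) Pinned third-party dependencies, installed without resolution.
uv pip install --no-deps -r requirements-usecase1.txt

# 2) The editable first-party packages, also installed without resolution.
uv pip install --no-deps -r requirements-usecase1.in
```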
We do have a wrapper bash script that then handles the actual install commands, just to hide them from the average developer who is less familiar with all the commands needed. That bash script has grown a bit, but it’s still fairly simple and has been used for years. I don’t think you should avoid writing a several-dozen-line script that has flags for different use cases and then chooses the right requirements files to install in one or more commands.
I work in a research setup and we faced a similar problem.
I ended up just building my own lightweight package manager (called oaipkg), because editable installs are great and can be instant. It has scaled not terribly, even as my company 20x-ed in headcount and 70x-ed in number of projects, and as requirements have changed.
The rough description is:
Every project is defined by a pyproject.toml (inside a directory of a matching name)
A pyproject.toml can define a tool.oaipkg section, which has a monorepo-dependencies key
oaipkg discovers projects by directory walking (with some optimisations to make this fast)
To install a target, oaipkg walks the first-party graph to collect first- and third-party dependencies. It shells out to uv or pip for the third-party dependencies, and it implements a fast editable install itself, with no build backend in sight (a rough sketch follows after this list):
Make an .egg-info, .egg-link, write to easy-install.pth, install entrypoints
You could also make a .dist-info and a separate .pth file; this is more modern, but interpreter startup is slower
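Purely as an illustration of that “modern” variant (a minimal .dist-info plus a separate .pth file), not of how oaipkg actually does it - names, paths and metadata below are made up and kept to the bare minimum:

```bash
# Sketch: hand-rolled editable install of a pure-Python project, no build
# backend involved. Package name, version and paths are illustrative only.
PKG=mylib
VERSION=0.0.0
SRC_DIR="/repo/${PKG}/src"
SITE="$(python -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])')"
DIST_INFO="$SITE/${PKG}-${VERSION}.dist-info"

# Minimal metadata so the package shows up as installed.
mkdir -p "$DIST_INFO"
printf 'Metadata-Version: 2.1\nName: %s\nVersion: %s\n' "$PKG" "$VERSION" \
    > "$DIST_INFO/METADATA"
: > "$DIST_INFO/RECORD"   # left empty in this sketch; a real tool would fill it in

# The .pth file is what actually makes the source tree importable.
echo "$SRC_DIR" > "$SITE/__editable__.${PKG}.pth"
```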
There are some twists for extension modules (we build wheels and key them by a tree hash), bootstrap, first party projects in other repos, locking, testing, various interesting integrations. But if you’re predominantly pure Python, I think it’s a good setup. Installs are really fast (e.g. if you don’t need to install a wheel, it’s sub-second).
Yes and yes, everyone is working in the monorepo, and the individual packages must always match for any given repo snapshot. So yes, your solution would work; I think it’s not very different from our solution #1 (install via shell script with pip install --no-deps --editable …). However, the external dependencies are not always the same. We always need some well-maintained packages (numpy, numba, casadi, …), but many others are needed only in specific cases and I don’t want to make everything depend on them. There is also a lot of visualization stuff which I use most of the time (for development / testing), but for any headless / embedded application it is just bloatware and can really complicate things.
Thanks barry, mdrissi, hauntsaninja as well. For now I tend towards a shell install script, and I agree that’s often the most straightforward way to a custom solution if there’s no “out-of-the-box” solution available. I’ll have another look at hatch as well - for me, the Python packaging world is a jungle, and I haven’t really managed to get an overview of everything yet.