Package-specific module search paths

I am trying to create a monorepo where my app is split up into separate packages with their own pyproject.toml/requirements.txt files. For example, I might have service-a and service-b which both depend on my-shared-pkg.

The challenge is that the services may use different versions of the same package.

Each package has its own .venv virtual environment.

So what I want is that when an import occurs in service-a, it looks inside service-a/.venv/lib/.../site-packages, and an import in service-b searches service-b/.venv/lib/.../site-packages.
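So, something like:

```
monorepo/
├── service-a/
│   └── .venv/lib/.../site-packages/   # my-shared-pkg, version X
├── service-b/
│   └── .venv/lib/.../site-packages/   # my-shared-pkg, version Y
└── my-shared-pkg/
```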

Options

  • sys.path is global and would not allow importing different modules depending on which package the caller file is in.
  • __import__ monkey-patching could potentially solve it, but would require patching a lot of the internal import machinery. Also, sys.modules caching would need to be modified to allow import foo to import 2 different versions. IDE support would become difficult too, because IDEs typically only allow customizing PYTHONPATH - although in PyCharm you could perhaps create separate modules in the same project, each with their own PYTHONPATH.
  • sys.meta_path - import foo is a “top-level import”, so the path param of MetaPathFinder.find_spec is always None. There seems to be no way to access the path of the file that contains the import statement from a meta_path hook.
    • I was thinking of perhaps setting a global var of the caller path by monkey-patching __import__.
    • NOTE: _gcd_import should probably be monkey-patched instead because it’s used by importlib.import_module, which some packages may be using.
  • Monkey-patch __import__ and modify sys.path on each import based on the caller’s path, then reset it (see the sketch after this list). Might have to clear the sys.modules cache too.
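A rough sketch of that last option, assuming the layout above (the helper function and the POSIX venv layout are my assumptions, not a tested recipe):

```python
# Hypothetical sketch: wrap __import__ so the caller's nearest .venv
# site-packages is searched first. Ignores the sys.modules caching problem.
import builtins
import sys
from pathlib import Path

_original_import = builtins.__import__


def _nearest_site_packages(start: Path):
    """Walk upwards from the caller's file looking for a .venv directory."""
    for parent in [start, *start.parents]:
        matches = list((parent / ".venv").glob("lib/python*/site-packages"))
        if matches:  # assumes a POSIX layout; Windows uses .venv/Lib/site-packages
            return matches[0]
    return None


def _scoped_import(name, globals=None, locals=None, fromlist=(), level=0):
    caller_file = globals.get("__file__") if globals else None
    site = _nearest_site_packages(Path(caller_file).resolve().parent) if caller_file else None
    if site is None:
        return _original_import(name, globals, locals, fromlist, level)
    sys.path.insert(0, str(site))  # temporarily prefer the caller's venv
    try:
        return _original_import(name, globals, locals, fromlist, level)
    finally:
        sys.path.remove(str(site))


builtins.__import__ = _scoped_import
```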

IMO instead of doing elaborate import lookups at runtime, it would be much simpler to link the modules into virtual environments at install time. I mentioned a while ago on pypa/packaging-problems how you can build a script to share the same module installation between environments, and I’m still looking forward to someone producing a functional proof-of-concept 🙂

That doesn’t solve using different versions of the same third-party package, though. I agree that for first-party packages, symlinking into the venv is best.

Why not? You can only have one version of a given package in an environment, so you link that into it when populating the environment. But you know your workflow best, so feel free to ignore me 🤷

Not sure I follow what you proposed.

Yes, one venv can only have one version.

But I want to have multiple venvs, and pick the correct venv based on the file location of the import caller at runtime. I see no way to do this at install time with the current Python module search algo.

If you want to have multiple versions of the same module, the most robust approach today would be to vendor packages and rewrite imports at install time.

It will be tough to pull this kind of thing off because of the global sys.modules. Suppose service-a and service-b have their own venvs with different versions of something common like sqlalchemy. Even if you could pull off the path thing, service-a would import its version of sqlalchemy; service-b would also try to import sqlalchemy, notice it was already in sys.modules, and skip any path search. Then if you changed the import order, the sqlalchemy version would switch by surprise.
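To illustrate the caching problem with a hypothetical session (the paths are made up, and this assumes sqlalchemy is installed in both venvs):

```python
import sys

sys.path.insert(0, "service-a/.venv/lib/python3.11/site-packages")  # hypothetical path
import sqlalchemy                      # found on the path and cached in sys.modules

sys.path.insert(0, "service-b/.venv/lib/python3.11/site-packages")  # hypothetical path
import sqlalchemy                      # no search happens: sys.modules wins
print(sqlalchemy.__file__)             # still service-a's copy
```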

It would be more usual to make each service a distribution with its own setup.py or pyproject.toml-declared dependencies, and build the composed application in its own .venv even if the pieces are tested in separate environments. I like using editable installs for this: you can have your own code available in one or more environments and only have to reinstall when the dependencies change.

service-a would import its version of sqlalchemy, service-b would also try to import sqlalchemy

These are known as peer dependencies in JS land. They could just be installed in a root venv.

It will be tough to pull this kind of thing off because of the global sys.modules.

To get around this, I’m thinking of monkey-patching sys.modules by replacing it with my own class instance implementing the dict/MutableMapping interface.

For writes, I will try to use the module’s path to create a compound key of `name • closest venv dir`.

One entry in sys.modules looks like the following, which makes me think I can extract the path from it.

 'main.conf': <module 'main.conf' from '~/dev/monorepo/packages/main/main/conf.py'>,

For reads, I will store the path of the module containing the import statement in a global variable, then search upwards to find the closest venv and use that.

I’m sure I will run into problems though.
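A minimal sketch of what I mean, assuming a `_CALLER_PATH` global that a patched `__import__` would maintain (all names here are hypothetical):

```python
# Hypothetical sketch: a sys.modules replacement keyed by (name, closest venv).
import sys
from collections.abc import MutableMapping
from pathlib import Path

_CALLER_PATH = None  # a patched __import__ would set this before each import


def _closest_venv(path: Path) -> str:
    for parent in [path, *path.parents]:
        if (parent / ".venv").is_dir():
            return str(parent)
    return ""  # not under any package: shared/stdlib modules


class ScopedModules(MutableMapping):
    """dict-like wrapper whose entries are keyed by (name, closest venv dir)."""

    def __init__(self, real):
        # Existing entries (stdlib etc.) get the "no venv" shared scope.
        self._real = {(name, ""): mod for name, mod in real.items()}

    def __setitem__(self, name, module):
        file = getattr(module, "__file__", None)
        scope = _closest_venv(Path(file)) if file else ""
        self._real[(name, scope)] = module

    def __getitem__(self, name):
        scope = _closest_venv(Path(_CALLER_PATH)) if _CALLER_PATH else ""
        try:
            return self._real[(name, scope)]
        except KeyError:
            return self._real[(name, "")]  # fall back to the shared scope

    def __delitem__(self, name):
        scope = _closest_venv(Path(_CALLER_PATH)) if _CALLER_PATH else ""
        del self._real[(name, scope)]

    def __iter__(self):
        return iter({name for name, _ in self._real})

    def __len__(self):
        return len(self._real)


sys.modules = ScopedModules(sys.modules)
```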

It would be more usual to make each service a distribution with its own setup.py or pyproject.toml-declared dependencies.

Symlinking dev packages didn’t work for me with Poetry - it just left a foo.egg-link in site-packages.

When developing a package, wouldn’t you run into the same problem of peer deps - where two packages need to share one dep?

I suppose everything is a peer dependency in Python. The installer is supposed to help you make sure there are no conflicting requirements. It would be “interesting” to try to make it work like JavaScript but you’re going against the grain.

The egg link and a .pth file add your editable distribution to that environment’s search path (try printing sys.path). So it’s not necessary to symlink the individual .py files.
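For instance, a quick way to inspect what those .pth files add, assuming a standard venv layout:

```python
import sysconfig
from pathlib import Path

# .pth files in site-packages list extra directories appended to sys.path
site = Path(sysconfig.get_paths()["purelib"])
for pth in site.glob("*.pth"):
    print(pth.name, "->", pth.read_text().strip())
```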

I suppose everything is a peer dependency in Python.

True. I don’t think it’s an optimal situation though - it doesn’t encourage separation of concerns and makes splitting things up really tricky.

It would be “interesting” to try to make it work like JavaScript but you’re going against the grain.

I can’t figure out a way to make monorepos with multiple services work in Python otherwise. But maybe editable dists are the way.

The egg link and a .pth file add your editable distribution to that environment’s search path.

Cool, didn’t know this - thanks!

You may be wondering whether pkg_resources (which finds .egg-info and .dist-info) has, or had, a search path separate from Python’s sys.path. It does! This is the difference between having the SQLAlchemy==x.y distribution installed and merely being able to import sqlalchemy. It might do that for performance reasons, by keeping a shorter path, or so that it can know when you’ve changed the path and might have added distributions.
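A quick way to see that separate snapshot (pkg_resources copies sys.path into its global WorkingSet when it is first imported):

```python
import sys
import pkg_resources

# The WorkingSet keeps its own copy of the search path, taken from
# sys.path at import time.
print(pkg_resources.working_set.entries == sys.path)  # usually True...

sys.path.append("/tmp/extra")                          # ...until sys.path changes
print(pkg_resources.working_set.entries == sys.path)  # now False
```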

It’s fine if .egg-link points to a directory containing .dist-info even though it’s named after the older .egg with its .egg-info.


There’s no guarantee that will work. I wouldn’t be surprised if there’s code somewhere which requires an actual dict instance (or at least a subclass).

My small hack seemed to work fine.

By chance - do you know of a way to access the file that contains the import statement being processed from within a sys.meta_path hook or a similar mechanism?

I’d like module resolution to change depending on the location of the file that made the import call. meta_path looks like the ideal place, except you only get the module name, and the path only if it’s a sub-package…

First off, I want to state that I think you’re playing with fire here: heavily customizing import has a tendency to cause plenty of trouble down the road.

With that warning out of the way, you want to look at sys.path_hooks for the top-level directories as they come in via sys.path; the cached finders in sys.path_importer_cache will then be used for modules and subpackages. See https://docs.python.org/3/library/importlib.html#setting-up-an-importer for some details; otherwise the language reference or the source for importlib will have all the details.
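A minimal sketch of a path hook along those lines; the `.venv` substring check is just a placeholder for whatever per-service logic you’d want:

```python
import sys
from importlib.machinery import FileFinder, SourceFileLoader


def venv_scoped_hook(path):
    """Claim sys.path entries that live under a .venv (placeholder test)."""
    if ".venv" not in path:
        raise ImportError  # decline: let the default hooks handle this entry
    # The returned finder is cached in sys.path_importer_cache and reused
    # for every module and subpackage found under this directory.
    return FileFinder(path, (SourceFileLoader, [".py"]))


sys.path_hooks.insert(0, venv_scoped_hook)
sys.path_importer_cache.clear()  # drop finders cached before our hook existed
```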


Are editable dists that depend on other editable dists supposed to work?

Say foo depends on bar which depends on baz…all in the same repo.

poetry new foo
poetry new bar
poetry new baz

cd foo
poetry build
poetry add ../bar

cd ../bar
poetry build
poetry add ../baz

cd ..
poetry run python -m foo

foo finds bar but bar can’t find baz.

I checked the easy-install.pth files and the references don’t exist.

Seems like quite a shortcoming of editable dists. You can really only work on one at a time…so no chance of splitting out multiple packages in a monorepo without rewriting importing.

Or have I missed something?


Also,

cd bar
poetry add cowsay
echo "import cowsay >> __init__.py"

cd ../foo
poetry run python -m foo

# ModuleNotFoundError: No module named 'cowsay' 

So bar can’t find its own dependencies. What is supposed to happen for bar to see its own dependencies? Are they supposed to be installed in foo's venv? I guess the idea is to just have a single venv…

I’m not sure what you’re doing; maybe it’s a Poetry quirk? Usually multiple editable installs in the same venv work: when you install the second one, it notices that its dependency, the first one, is already installed. Works fine.

I was using multiple venvs so that I had my packages isolated from one another. That is, foo can only see its dependencies, and bar can only see its own. But this is not possible in Python, it seems.

I think editable dists and a single venv is the best approach for me.

Correct. The point of different venvs is to isolate code, so it would be very strange if code from one venv depended on code in another.
