Native extensions in zipapps

I was wondering if it would now be possible to support loading native extensions in .zip archives.

zipimport would have to match .so files that end with sysconfig.get_config_var('EXT_SUFFIX') and extract them, and then there would need to be platform-dependent code for how they are opened:
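As a rough illustration of the matching step, here is a minimal sketch that lists the archive members that look like extension modules for the running interpreter. The helper name extension_members is made up for this example, and it uses importlib.machinery.EXTENSION_SUFFIXES (which includes the EXT_SUFFIX value) rather than anything zipimport does today.

```python
import importlib.machinery
import zipfile


def extension_members(archive):
    """Return member names that end with an extension-module suffix.

    `archive` is an open zipfile.ZipFile. The helper name is hypothetical.
    """
    # On Linux this is typically something like
    # ('.cpython-312-x86_64-linux-gnu.so', '.abi3.so', '.so').
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    return [name for name in archive.namelist() if name.endswith(suffixes)]
```

Note that the bare '.so' suffix is usually in that list on Linux, so this would also match bundled shared libraries that are not importable modules.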

On Linux they can be extracted to memory and dlopened, pseudocode:

fd = memfd_create(name, MFD_CLOEXEC)
zipdata.extract(name, fd)
handle = dlopen(f"/proc/self/fd/{fd}")

while on FreeBSD there’s a better libc function:

fd = memfd_create(name, MFD_CLOEXEC)
zipdata.extract(name, fd)
handle = fdlopen(fd)
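For what it's worth, the Linux variant can be sketched in Python without touching C, assuming Python 3.8+ where os.memfd_create is exposed. The function names memfd_from_bytes and dlopen_from_bytes are invented for this sketch, and ctypes.CDLL only gives a plain dlopen handle; it does not initialise a Python extension module.

```python
import ctypes
import os


def memfd_from_bytes(name, data):
    """Copy `data` into an anonymous in-memory file and return its descriptor."""
    fd = os.memfd_create(name, os.MFD_CLOEXEC)  # Linux only
    # closefd=False keeps the descriptor open after the file object is closed.
    with open(fd, "wb", closefd=False) as f:
        f.write(data)
    return fd


def dlopen_from_bytes(name, data):
    """dlopen() a shared object held in memory via its /proc/self/fd path."""
    fd = memfd_from_bytes(name, data)
    try:
        return ctypes.CDLL(f"/proc/self/fd/{fd}")
    finally:
        # The mapping created by dlopen() outlives the descriptor.
        os.close(fd)
```

This relies on /proc being mounted, which is the main fragility compared with an fdlopen-style call that takes the descriptor directly.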

If this was a feature that was only available on some platforms it would still be a nice win, allowing bundling applications with their dependencies into a single .zip. This is achievable right now but only if the application and all dependencies are pure Python, which is rather limiting.

It seems like it might be a nice thing to have (although personally, I don’t care unless it’s supported on Windows). How useful it would be in practice, though, I don’t know. There hasn’t been significant demand for this feature in the past, although that may be because the OS support it needs is new. I’ve never seen that memfd thing before, but that’s not saying a lot as I’m mostly a Windows developer.

Ultimately, I think that as much as anything it’s just that no-one has cared enough to try to implement it. If you wanted to have a go, maybe it would spark enough interest to make progress. Personally, I think the biggest problem is having it only available on some platforms. If you can’t use a zipapp on some of the platforms you support, you’re going to have to develop a non-zipapp distribution method anyway, and then why not use it everywhere to avoid the extra support costs?

Thinking some more about this, you could likely write your own custom import hook that used these APIs to load extensions. This could be written in pure Python using ctypes, so it could be shipped in your zipapp and loaded (with the existing zipimport loader) when your application starts up.
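To make the idea concrete, here is a minimal sketch of such a hook, assuming Linux and Python 3.8+. It uses os.memfd_create and the stdlib ExtensionFileLoader instead of ctypes, so the interpreter's normal extension loading does the actual dlopen. The class names MemfdExtensionFinder and MemfdExtensionLoader are invented for this example, and it has not been exercised against a wide range of real extension modules.

```python
import importlib.abc
import importlib.machinery
import importlib.util
import os
import zipfile


class MemfdExtensionLoader(importlib.abc.Loader):
    """Hypothetical loader: extracts one archive member to a memfd and loads it."""

    def __init__(self, archive, member):
        self.archive = archive
        self.member = member
        self._inner = None

    def create_module(self, spec):
        fd = os.memfd_create(spec.name, os.MFD_CLOEXEC)  # Linux only
        try:
            with open(fd, "wb", closefd=False) as f:
                f.write(self.archive.read(self.member))
            path = f"/proc/self/fd/{fd}"
            # Delegate to the stdlib loader, pointing it at the in-memory file.
            self._inner = importlib.machinery.ExtensionFileLoader(spec.name, path)
            inner_spec = importlib.util.spec_from_file_location(
                spec.name, path, loader=self._inner
            )
            return self._inner.create_module(inner_spec)
        finally:
            os.close(fd)  # the library stays mapped after dlopen()

    def exec_module(self, module):
        self._inner.exec_module(module)


class MemfdExtensionFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder for extension modules stored inside a zip archive."""

    def __init__(self, archive_file):
        self.archive = zipfile.ZipFile(archive_file)
        self.members = set(self.archive.namelist())

    def find_spec(self, fullname, path=None, target=None):
        base = fullname.replace(".", "/")
        for suffix in importlib.machinery.EXTENSION_SUFFIXES:
            member = base + suffix
            if member in self.members:
                loader = MemfdExtensionLoader(self.archive, member)
                return importlib.machinery.ModuleSpec(fullname, loader, origin=member)
        return None
```

A zipapp's startup code would then append an instance to sys.meta_path, passing the path of the archive it is running from, before importing anything that needs an extension. Extensions that locate data files relative to their own path would still need separate handling.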

Writing something that was convenient to use could be a bit messy, but as a prototype it would be fine. And by publishing it, you’d give people the chance to ship zipapps containing extensions. Then, if such zipapps proved popular, that would be a good argument to say “this can be done with a library, but it’s messy - we’ve clearly demonstrated that there’s an audience for this feature, it would make it much simpler to use if it could be added to the stdlib zipimport implementation”.

How common is memfd_create? I know it isn’t in libc, so is it a syscall built into the Linux kernel that’s always available?

Except when you have to explain to folks why that .pyz file won’t work on their OS. You would effectively need to add wheel tags to .pyz files to help prevent that (or have some tooling to make it easy to differentiate between .pyz files).
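As one possible shape for that kind of tagging, here is a small sketch that bakes the interpreter and platform into the file name. The naming scheme and the helper tagged_pyz_name are purely hypothetical; nothing like this is defined for .pyz files, and real wheel tags are more detailed than what is shown here.

```python
import sys
import sysconfig


def tagged_pyz_name(app_name):
    """Build a .pyz file name carrying interpreter and platform information.

    The scheme is made up for illustration and is not an existing convention.
    """
    # cache_tag looks like 'cpython-312'; fall back to the implementation name.
    interpreter = sys.implementation.cache_tag or sys.implementation.name
    # get_platform() looks like 'linux-x86_64'; normalise as wheel tags do.
    platform = sysconfig.get_platform().replace("-", "_").replace(".", "_")
    return f"{app_name}-{interpreter}-{platform}.pyz"
```

Tooling could then refuse to run, or at least warn about, a file whose tags don't match the current interpreter.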

I take it you’re asking for some direct support for this in CPython itself since other tools already exist that do this sort of dynamic unzipping?

I wish I’d known about that when we were building shiv. I think I’ve been talking with @thomas for years about a cross-platform “dlopen-from-memory” API.

I looked into it a bit already, and there are a few arguments for doing this in CPython itself:

  1. zipimport is complex. I thought about forking it to add this but I don’t want to maintain that.
  2. The actual dlopen() is inside dynload_shlib.c, so there isn’t a good API to wrap for this.

Also, I guess I wondered whether the core devs, with their expertise, could come up with solutions for other platforms…

I didn’t know about shiv, I’ll give that a try. Reading the motivations section I see it comes from frustrations about the performance of pex, which is where I started with this. Thanks.

Props to Loren Carvalho, my former colleague who wrote shiv.
