Thanks for working on this @virtuald. It’d be quite useful to document the recommended design pattern here better and make it easier to implement. I know NumPy and SciPy plan to put OpenBLAS in a separate wheel and rely on that (only for their use, not for anyone else’s), and the PyArrow folks are similarly planning to split up their one large wheel into multiple ones (e.g., put libarrow in a separate wheel). It’s still nontrivial to get this right across OSes, as discussed in the posts above.
I do have to caution that this is only a reasonable idea if you control all the separate pieces. That is the case for RobotPy it sounds like, and also for NumPy/SciPy and PyArrow. The OpenSSL story is a bit different. The original proposal here and pynativelib don’t really have a good upgrade path - given how PyPI and wheels work, it is not a healthy idea to go all-in on separate wheels with non-Python C/C++/etc. libraries. For this to work, you need a way to roll out upgrades when you need to make backwards-incompatible changes, switch compilers, etc. Linux distros, Conda, Homebrew & co all have that, and that kind of setup is the right one for broader use of non-vendored libraries. I wrote about this in a fair amount of detail in pypackaging-native, see for example PyPI’s author-led social model and its limitations and https://pypackaging-native.github.io/meta-topics/no_build_farm/.
I think you’re right, but I’m glad to see people exploring solutions like this for cases where it can work. It’s very clear that there are a lot of potential problems that might arise, and the failure modes (e.g., segfaults) are not great. But if we accept that it’s not a perfect solution, IMO a “good enough” answer will be useful for a lot of people.
Wheels themselves were designed as a “good enough” solution, and we got a long way before their limitations started to be a problem. Of course you could argue that we’re now suffering from not having looked closely enough at the risks - but I feel that this is a case where “Now is better than never” applies (and IMO we’ve spent long enough trying to find ways forward that we’ve satisfied the demands of “Although never is often better than right now”).
I may be missing something, and maybe this isn’t directly an answer for this situation, but it seems like this is precisely the case addressed by conda? In conda binary dependencies like that become separate packages and then everything calls them out explicitly.
Note: I wrote this some time ago, but forgot to post it – turns out that this is now a good time
I just want to comment on this:
While conda is widely used by data scientists, there is nothing about it that is specific to that use case. If there are not conda packages available for your types of projects, that’s because no one has bothered to build them – not because conda is somehow not suited to other fields.
And with the advent of conda-forge – it’s actually pretty easy to make packages available for the community to use.
Like any new system, conda had some growing pains – it’s a lot better now. But I would note that those of us that use conda do so because:
“issues they had with using pip, and the few times I had to interact with pip I didn’t have a particularly good experience”
My, and many others’ experience is that conda is massively easier when you have to deal with non-pure-python dependencies. [*] – which is exactly what this thread is about.
Maybe conda is poorly designed or implemented, but I think that the challenges of conda exist because it is trying to solve a very hard problem. If pip+wheel, etc. are expanded to address those same problems, they will have the same difficulties.[**]
If the community decides it wants pip+wheel to solve these issues – great – but they are not easy problems, and you will find that you are reimplementing much of conda. (At the very least, learn from it – most of the issues I see being discussed in this thread have already been solved in conda. Maybe badly, but at least look.)
Don’t forget that conda was developed precisely because the PyPA (well, I don’t think it existed then, but the Python packaging community anyway) specifically said that they were not interested in solving those problems.
It’s come a long way since then, but the challenges are still there, as you can see.
[*] Note: the non-pure python dependency thing is a big deal for users, but an even bigger deal for package developers – for the most part, the pip+wheel solution is to “vendor” all the libs needed for a package
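As a sketch of what that vendoring step looks like in practice (the wheel filenames below are illustrative, not from this thread), the usual tools are auditwheel on Linux and delocate on macOS:

```shell
# Linux: copy external shared libraries into the wheel
# and rewrite RPATHs so the package finds its own copies
auditwheel repair -w wheelhouse/ dist/mypkg-1.0-cp311-cp311-linux_x86_64.whl

# macOS: copy linked dylibs into the wheel's .dylibs/ directory
# and point the extension modules at them via @loader_path
delocate-wheel -w wheelhouse/ dist/mypkg-1.0-cp311-cp311-macosx_11_0_arm64.whl
```

The result is a self-contained wheel, at the cost of every package shipping (and having to rebuild) its own copy of each library.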
[**] Note 2: In my experience, when conda does not work well for someone, it is caused by one of three reasons:
1) The packages they need are not built for conda. This is a much smaller deal than it used to be, because of conda-forge, and because adding a few packages with pip to a conda environment works pretty well. (Unless the package isn’t easily pip-installable anyway, but pip doesn’t solve that either.)
1b) They don’t know about conda-forge.
2) They are a bit confused about what conda is and how it interacts with pip, virtualenv, etc. – so they try to build a virtualenv within a conda env, or use pip to install / upgrade packages that they should install with conda (even pip itself). This is a tough one, but it’s about education – one of the main sources of problems is that tutorials, etc. often start with “make a venv…” without any explanation of why, or whether you need to, or… I have literally had students who thought making a venv was something specifically Flask needed to run.
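To illustrate the workflow that avoids this confusion (the environment and package names here are just examples), the conda-first approach is roughly:

```shell
# Create and activate a conda environment -- no separate virtualenv needed
conda create -n myproj python=3.11
conda activate myproj

# Prefer conda (e.g. the conda-forge channel) for binary-heavy dependencies
conda install -c conda-forge numpy

# Fall back to pip only for packages that have no conda package
pip install some-pypi-only-package
```

The key point is the ordering: install what you can with conda first, and let pip fill in only the pure-Python gaps.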
I believe I addressed this in the OP under “Sidebar: Why not conda?”
Oops! Indeed, I had forgotten about that sidebar by the time I got through the detailed description of all the problems that arise when not using conda for the problems that conda is explicitly designed to solve.
The blog post “Conda: Myths and Misconceptions” (Pythonic Perambulations) has a nice overview of how conda ended up being a separate tool; the short story is that Guido van Rossum quite explicitly considered these types of binary dependency issues to be out of scope for the main Python packaging tools.
I agree with Chris that the complexities of conda are inherent to any system that takes binary dependencies seriously, and I don’t think it’s reasonable to hope that a system not designed to handle such dependencies will somehow be patched up to do a better job than conda.
I agree with this as well. I felt similarly about the “posy” tool mentioned a few weeks ago on this forum.
These ideas are interesting and cool as explorations of alternative ways to do things, but they essentially seem like taking the long way around to do what conda already does. “I had some problems with conda when I tried it” seems to be a fairly common sentiment (which surprises me as it hasn’t been my own experience), but I think those problems are still small relative to the problems that must be solved to re-implement something like conda.
To be fair, though, conda has its own share of warts and oddities that potentially would be easier to address with a new system than with an incremental evolution from current conda. (For instance, the activation process may be more baroquely flexible right now than it really needs to be.) And anyone who’s been reading some of the other threads on here knows I’m all for a clean sweep in general. It’s just that there’s a lot to handle to really get native dependencies working as smoothly as they do with conda.
At the end of the day you will end up following in the footsteps of conda, Fedora, Debian, et al. Pick your poison.
The only thing you have to decide is what trade offs you will accept and what features you cannot do without.
If the only things that you package are pure Python, then you lose.
If you will not package C, C++, Fortran, or Rust, then you cannot win.
My experience is with packaging extensions like pysvn. That needs lots of non-Python dependencies built. I have the skills to do this, and I’m well aware that other people do not have the time to learn these skills. They just want to ship their cool Python package.
But it seems that macOS can be made happy as long as the library shares the same identifier.
So in the case where I have a libtwo package that depends on a library provided by the libone package, the libone package is the only one that must have the @loader_path/.dylibs/libname.so reference set, as delocate does.
The libtwo package can have any fake reference (like LIBONE-DYLIBS/libname.so), as long as that reference is the same as the identifier of the library itself, set with install_name_tool -id.
- @loader_path/.dylibs/libfoo.so (set by delocate)
- libone-dylibs/libfoo.so (its own ID, set by consolidatewheels)
- /DLC/libtwo/.dylibs/libbar.so (its own ID, set by delocate)
- libone-dylibs/libfoo.so (set by consolidatewheels)
- @loader_path/.dylibs/libbar.so (set by delocate)
- @loader_path/../libone/.dylibs/libfoo.so (set by delocate)
That makes it so that the Python packages can be installed anywhere, even in different directories, and they will continue to work.
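The rewriting described above can be sketched with the macOS command-line tools themselves (the file paths are hypothetical, standing in for the actual contents of the unpacked wheels):

```shell
# Give the shared library in libone a location-independent identifier.
# Any consumer that references this exact string will match it at load time.
install_name_tool -id libone-dylibs/libfoo.so libone/.dylibs/libfoo.so

# Rewrite libtwo's dependency reference so it uses that same identifier
# instead of the @loader_path-relative path delocate would have set.
install_name_tool -change @loader_path/.dylibs/libfoo.so \
    libone-dylibs/libfoo.so libtwo/.dylibs/libbar.so

# Inspect the resulting load commands to verify the references match.
otool -L libtwo/.dylibs/libbar.so
```

With the IDs and references matched this way, dyld is satisfied as long as libfoo.so has already been loaded into the process (e.g. by libone’s own extension module), regardless of where either package is installed on disk.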