Vendoring third party libraries in CPython

Just out of curiosity, how do you decide when to vendor a third party library? Bundling a library removes the need for support code, and it gives CPython total control over how to build the third party library. I don’t know if the pros outweigh the cons though.

I ask because, for the sqlite3 module, there’s no straightforward way to detect at compile time how the SQLite library was built [1]. There’s no define that tells me if SQLite was built with R*Tree support, or if it was built without double-quote literal support. For instance, if a user built their SQLite library with SQLITE_OMIT_AUTOINIT defined, Python would segfault at sqlite3.connect(...), and there’s no way for the sqlite3 module to detect if the SQLite library was built with that option. Bundling SQLite would make a lot of things easier.

[1] You can run sqlite3 ":memory:" "pragma compile_options" to fetch the compile options.
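For reference, the same query can be run from Python against whatever SQLite library the sqlite3 module is linked to. This is only a minimal sketch of the runtime check: the helper name sqlite_compile_options is made up for illustration, and it does nothing for the compile-time problem described above. Note that SQLite reports the option names without the SQLITE_ prefix.

```python
import sqlite3


def sqlite_compile_options():
    """Return the compile options reported by the linked SQLite library.

    Hypothetical helper, not part of the sqlite3 module.
    """
    con = sqlite3.connect(":memory:")
    try:
        rows = con.execute("PRAGMA compile_options").fetchall()
    finally:
        con.close()
    # Each row holds one option, e.g. "ENABLE_RTREE" or "THREADSAFE=1".
    return {row[0] for row in rows}


options = sqlite_compile_options()
print(sorted(options))
print("R*Tree available:", "ENABLE_RTREE" in options)
```

If the library was built without compile-option diagnostics, the pragma returns no rows and the set is simply empty, so an empty result should not be read as "no features".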

I’m not aware of any policy we have. Only a few third-party libraries ship with Python, and most of those are small. There’s one exception: OpenSSL ships with the Windows installers, since the OS does not typically provide it out of the box. But this is a special case.

I remember that expat was added to Python, since it required some changes and maintenance had stalled at the time. @fdrake should have more details.

You can find a list of 3rd party libraries included in Python in our license document: History and License — Python 3.9.2 documentation (this is not necessarily complete; the source tree may have more).

Some considerations (in no particular order):

  • the license has to be compatible with the Python license
  • packages should be stable enough not to require frequent updates
  • it should be possible to replace the vendored package with a system-installed one via a compile-time switch, if the code is typically available as an OS library (e.g. as in --with-system-expat)
  • someone needs to feel responsible for keeping the vendor package up to date
  • vendoring packages always introduces the risk of losing access to security fixes applied in OS packages

That last item matters in contexts where users rely on LTS-style OS support, but I guess those users would go with the OS-provided Python version as well, and OS maintainers will obviously make sure that those fixes also get into the OS Python version.

Specifically for SQLite: wouldn’t it be possible to have Python’s setup.py test the OS SQLite library at compile time, and also have the module itself apply some checks during startup to verify a few assumptions?

On Windows, isn’t it just a case that we bundle everything because there’s no package manager that lets us depend on non-system DLLs? Or is this question targeted at the Unix/Mac platforms?

The question is not targeted at a specific platform. The Windows and macOS installers already bundle SQLite. Perhaps my post was a little vague: I’m exploring the possibility of adding the SQLite sources to CPython and always building the sqlite3 module with them, much as the Windows and macOS installers already do.

Thanks for your input, @malemburg.

Thanks for the link.

The SQLite license should not be a problem, as it is public domain. It is, however, frequently updated, though keeping up with landmark releases (and of course bugfix releases) should be sufficient. If we allow overriding the bundled library with a system-installed one, we must keep all the support code. In that case I guess we’re better off with the status quo.

Yes, that might be the best solution, but it could be tricky with multiple SQLite installations.

In Fedora (and RHEL) we try to have only one version of each library on the system. This makes updates (esp. security ones) much easier.
IOW, we’d unbundle SQLite anyway, and I’m sure a lot of other distros would. And since python.org doesn’t ship Linux binaries, relying on the distros instead, I’m afraid the bundling wouldn’t solve much.

Should that check be added to the tests, or to the build system, instead?

While that shouldn’t present any licensing issues, I don’t think that’s a good idea to pursue. In general, we try to avoid such couplings with a few notable exceptions that have proven to be bothersome at times. Most downstream third-party suppliers of Python would not be happy with such a move and would likely just strip it out since they want to minimize the size of their distributions for their users: why have more than one copy of a library? And I don’t see that this would solve any big problem. There would still be dependencies on other key third-party libraries, like zlib and OpenSSL. Why special case SQLite? Better to make it easier for users to install all of these libraries, from third-party distributions or by building them all from source, and have the Python build link with them.

I guess you’re right about that.

Yes. Generating a “pysqlite.h” config file would definitely help! I think I’ve got a patch for setup.py somewhere on my drive. I’ll play a little bit more with it.
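To make the idea concrete, here is a rough sketch of what generating such a header could look like. It is not the patch mentioned above. The function name, the PYSQLITE_* macro names, and the choice of options are all assumptions for illustration, and the set of compile options is assumed to have been collected from the SQLite library found at build time.

```python
def render_config_header(compile_options):
    """Render the text of a generated "pysqlite.h" config header.

    compile_options: set of option names without the SQLITE_ prefix,
    as reported by "pragma compile_options".
    All macro names below are hypothetical.
    """
    # SQLite compile option -> hypothetical macro for the sqlite3 module.
    features = {
        "ENABLE_RTREE": "PYSQLITE_HAVE_RTREE",
        "OMIT_AUTOINIT": "PYSQLITE_OMIT_AUTOINIT",
    }
    lines = [
        "/* Generated at build time; do not edit. */",
        "#ifndef PYSQLITE_H",
        "#define PYSQLITE_H",
        "",
    ]
    for option, macro in sorted(features.items()):
        value = 1 if option in compile_options else 0
        lines.append(f"#define {macro} {value}")
    lines += ["", "#endif /* PYSQLITE_H */", ""]
    return "\n".join(lines)


print(render_config_header({"ENABLE_RTREE", "THREADSAFE=1"}))
```

The caveat about multiple SQLite installations still applies: the header only describes the library that was probed at build time, which is not necessarily the one loaded at runtime.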

Thanks, @nad. You’re right, it’s probably not a good idea to pursue; there are a lot of issues I didn’t think of :) Thanks for all your input!

Regarding your original problem, does libsqlite not offer a function to check if a feature is enabled that a Python binding could be made for?

For most features, that’s unfortunately not possible. One possibility could be to create a set of «submodules» for additional features (like the R*Tree extension). At module init, I could execute a «pragma compile_options» query, and then import submodules for the features the library supports.
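A minimal sketch of that idea, assuming a fixed mapping from compile options to submodules. The submodule names (_sqlite3_rtree, _sqlite3_fts5) and the helper are hypothetical; nothing like them exists in CPython today.

```python
import importlib

# Hypothetical mapping: SQLite compile option -> optional submodule name.
FEATURE_SUBMODULES = {
    "ENABLE_RTREE": "_sqlite3_rtree",
    "ENABLE_FTS5": "_sqlite3_fts5",
}


def load_feature_submodules(compile_options, importer=importlib.import_module):
    """Import only the submodules whose feature the library was built with.

    compile_options: set of names as reported by "pragma compile_options".
    importer: injectable so the selection logic can be tried without the
    (hypothetical) submodules actually existing.
    """
    loaded = {}
    for option, module_name in sorted(FEATURE_SUBMODULES.items()):
        if option in compile_options:
            loaded[option] = importer(module_name)
    return loaded
```

At module init, compile_options would be the set of rows returned by the «pragma compile_options» query on a throwaway in-memory connection. Passing a stub importer (for example, one that just returns the module name) is enough to check which submodules would be selected for a given SQLite build.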