Drawing a line to the scope of Python packaging


(Tzu-ping Chung) #1

Another topic in the Big Picture thread I found interesting :slightly_smiling_face:

This problem is also not specific to data science (and Python) IMO. In web development, Django packages provide front end stuff, and how should they be managed together with npm/yarn and various precompilers? In GUI land we have Qt plugins; PyQt (and PySide IIRC) bundle Qt in the module, but why can’t I link them with my already-installed Qt instead to save hundreds of MB (for each venv), like rust-qt? People in those areas seem to feel satisfied enough with pip, but I feel it is legitimate to draw parallels to the data science world.

In the end, every domain-specific package manager would need to draw a line somewhere to keep itself off the slippery slope of dependencies. The question is, then, where and how should the line be decided?


(Thomas Kluyver) #2

Interesting question. Technically, the examples you mention are in two distinct directions:

Web frontend stuff hasn’t needed any special affordances in packaging, as far as I know. It’s just opaque data files from our perspective, which are valuable for all sorts of purposes. If it ever did, I’d be inclined to say no: JS has various package managers already which solve broadly similar problems to Python tooling. I’m sympathetic to tool fatigue in frontend development, but I don’t think it’s reasonable to ask the Python packaging ecosystem for extra work to avoid the JS ecosystem.
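To illustrate the "opaque data files" point, here is a minimal sketch, assuming Python 3.9+ and a throwaway package invented for the demo (`demo_django_widget` and `static/app.js` are hypothetical names, not taken from any real project). It shows that a bundled JS file is just bytes shipped alongside the Python code: nothing in the tooling parses or resolves it.

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path

JS_SOURCE = b"console.log('hello');\n"
PACKAGE_NAME = "demo_django_widget"  # hypothetical package name


def make_demo_package(root: Path) -> None:
    """Create a throwaway package under `root` that bundles one JS file."""
    static_dir = root / PACKAGE_NAME / "static"
    static_dir.mkdir(parents=True)
    (root / PACKAGE_NAME / "__init__.py").write_bytes(b"")
    (static_dir / "app.js").write_bytes(JS_SOURCE)


def read_asset(package: str, *parts: str) -> bytes:
    """Return a bundled file as raw bytes; Python never interprets the content."""
    resource = importlib.resources.files(package)
    for part in parts:
        resource = resource / part
    return resource.read_bytes()


def demo() -> bytes:
    """Build the package in a temp dir, import it, and read the asset back."""
    with tempfile.TemporaryDirectory() as tmp:
        make_demo_package(Path(tmp))
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            return read_asset(PACKAGE_NAME, "static", "app.js")
        finally:
            # Undo the import side effects so the demo leaves no trace.
            sys.path.remove(tmp)
            sys.modules.pop(PACKAGE_NAME, None)


if __name__ == "__main__":
    print(demo())
```

The JS file's own dependencies (anything it would pull from npm or yarn) are invisible at this level, which is why no special packaging support has been needed so far.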

C and C++ (both for data science and GUIs) are a different story, for two reasons. First, the C API and the ease of using extension modules have always been a strength of CPython (the reference implementation and most widely used Python interpreter), and it’s crucial to be able to effectively distribute extension modules. Second, C/C++ doesn’t have a generally accepted standard package manager of its own. Package managers like apt and homebrew aren’t easy to integrate with, because they’re designed to install packages systemwide rather than for a specific environment.

Conda is the exception here. It looks like the holy grail of packaging: a cross-language, cross-platform package manager which knows about environments. The reluctance I’ve seen to use it comes from two angles: the perception that it’s for data science (somewhat self-reinforcing, as general purpose libraries may not be published for conda) and concerns about its tight connection to Anaconda, Inc. I have enough sympathy with this that I think the main Python packaging ecosystem should continue providing a practical alternative, not just point to conda for anything difficult.

I haven’t worked out exactly where the lines should be, but that rationale explains why I think they should be drawn further out in one direction than in another.


(Steve Dower) #3

Let’s not dive too deep down this hole ourselves. @willingc and @pzwang are both interested/actively looking at this area for all of Python, including packaging, and have experience with the various models used for it (e.g. Personas, customer development, etc.). This is a great opportunity to figure it out for all of the things we do, and having relative outsiders (from packaging) make the start is going to negate a lot of our biases.


(Donald Stufft) #4

Or introduce different biases.


(Steve Dower) #5

True, but since they can’t actually force us to do anything, another point of view won’t hurt :slight_smile:


(Donald Stufft) #6

Sure, different PoVs are fine, as long as we don’t pretend they’re not just differently biased, because if we pretend they’re going to somehow undo our own biases by being neutral, then we’re likely to just end up with a poor result.


(Carol Willing) #7

I am so glad to see @takluyver on this thread. He understands packaging so well from both a pip and conda perspective. He also has the respect of conda and pip maintainers. I hope you keep adding your insights.