WIP Package manager using __pypackages__ and pyproject.toml

For modern distros, you’ll want to get the info from /etc/os-release. For Python, there’s the distro package (which does a bit more).

As one of the maintainers of Python for Fedora, for now I’d advise you to tell users to use/install Python from the system, e.g. sudo dnf install python34. It should also be possible to install automatically and without sudo, but in a distro-specific way.
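
For illustration, the distro-specific path might look something like this (a rough sketch using the distro package mentioned above; the command mapping below is an assumption, not an exhaustive table):

```python
# Sketch: suggest a system-Python install command based on the running distro.
# The distro package reads /etc/os-release (and friends) under the hood.
# The command mapping is illustrative only.
import distro  # pip install distro

INSTALL_COMMANDS = {
    "fedora": "sudo dnf install python3",
    "ubuntu": "sudo apt install python3",
    "debian": "sudo apt install python3",
    "arch": "sudo pacman -S python",
}

def suggest_python_install():
    dist_id = distro.id()  # e.g. "fedora"
    cmd = INSTALL_COMMANDS.get(dist_id)
    if cmd:
        print(f"Detected {distro.name(pretty=True)}; try: {cmd}")
    else:
        print("Unknown distro; please install Python with your system package manager.")

suggest_python_install()
```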

Lock file example:

```toml
[[package]]
id = 3
name = "colorama"
version = "0.4.1"
source = "pypi+https://pypi.org/pypi/colorama/0.4.1/json"
dependencies = []

[[package]]
id = 2
name = "scinot"
version = "0.0.11"
source = "pypi+https://pypi.org/pypi/scinot/0.0.11/json"
dependencies = ["colorama 0.4.1 pypi+https://pypi.org/pypi/colorama/0.4.1/json"]
```

I’m not clear here what you mean by “PyPI’s dep info”.

I’m referring to the Warehouse API. It stores accurate dependency info for many packages, but is missing it for others, and there’s no distinction between missing info, and no dependencies required. I’m using it for finding available versions, digests, and downloading packages, which are all reliable.
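
Concretely, the lookup I mean looks roughly like this (a minimal sketch against that endpoint; the packages are the ones from the lock example above). Note that a null requires_dist is indistinguishable from "no dependencies":

```python
# Sketch: fetch dependency info for a pinned version from the Warehouse JSON API.
# A null requires_dist may mean "no dependencies" or "metadata missing" - there is
# no way to tell the two apart from this endpoint alone.
import json
from urllib.request import urlopen

def fetch_requires_dist(name, version):
    url = f"https://pypi.org/pypi/{name}/{version}/json"
    with urlopen(url) as resp:
        info = json.load(resp)["info"]
    return info.get("requires_dist")  # list of PEP 508 strings, or None

print(fetch_requires_dist("scinot", "0.0.11"))
print(fetch_requires_dist("colorama", "0.4.1"))
```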

I propose that the Warehouse store reliable dependency info.

Sorry, my bad. I’d missed that the JSON API exposes requires_dist. I’d assume that if it’s wrong that’s “just” a bug. Do you have examples of incorrect data? It’s possible that Warehouse extracts the data for new uploads, but is missing it for older files (or maybe for files uploaded manually, rather than via twine, or some such).

Agreed, if the data is being exposed (in a “supported” manner - I’m not sure what the formal status of the JSON API is; the JSON API page in the Warehouse documentation doesn’t cover what specific metadata is exposed, for example) it should be correct.

The problem with saying PyPI has to reliably provide dependency info is that this is literally impossible to do :-(. If there are wheels it’s fine, and IIUC in this case PyPI already does what you want. But if there aren’t wheels, then the only way to find out the dependencies is to download the source, build it, and then look at the wheel – and even then it only tells you the dependencies for when it’s built on your machine, not what anyone else might see. This is super annoying but AFAICT it’s unavoidable :-(.

This is also part of why pip has a “wheel cache”, where it stores any wheels it built locally: it also serves as a local database of the dependency info that’s missing from PyPI!

True. I was assuming (probably incorrectly because I’m too close to the problem :slightly_frowning_face:) that the expectation was that PyPI reliably provides “what metadata it can”, i.e., if there’s a wheel, then the metadata from that wheel is correctly exposed without needing to download the wheel. If there’s a case where there’s a wheel with metadata, and what PyPI exposes doesn’t match that, then yes, I’d say let’s have specifics and we can investigate.

But “obviously” (to me!) if there’s no wheel metadata then PyPI shouldn’t provide anything - and tools have to be prepared to deal with that case. Sorry for not being clearer.

Do you have examples of incorrect data?

https://pypi.org/pypi/matplotlib/json

The problem with saying PyPI has to reliably provide dependency info, is that this is literally impossible to do :-(.

We can do better than what it currently does: fix cases like the matplotlib one I posted (which is not an isolated example), and if there are no wheels, indicate explicitly that there’s no way to reliably get dependency info - i.e. @pf_moore’s last post.

Thinking some more about it, given that there may be multiple wheels per version, and each wheel can have different dependency metadata, a JSON endpoint that’s at the package level or the version level is at the wrong level to be reporting dependency metadata anyway.

Of course, most of the time metadata will remain the same for every wheel in a version, but that’s by no means guaranteed. So the only guaranteed-valid place to get metadata from is the wheel. As @njs says, that’s annoying but unavoidable (given the existence of build tools like setuptools that can introspect the target system at build time and generate metadata dynamically based on that).

Static metadata is a goal we’ve talked about for a long while, but it’s a hard problem (it’s simple enough for 90% of projects which have straightforward needs, but that remaining 10% isn’t going to go away :slightly_frowning_face:)

Let’s say we’re at 80% valid dep info on the Warehouse now. I suspect we can get to 95% by picking a wheel at random for each package (or building one using python setup.py bdist_wheel) and inspecting its METADATA. I think we should take that route rather than stagnating because we can’t get to 100%.
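
For reference, pulling Requires-Dist out of a wheel that’s already on disk is straightforward; a minimal sketch (the wheel filename is just an example):

```python
# Sketch: extract Requires-Dist lines from a wheel's METADATA file.
# A wheel is a zip archive; METADATA lives in the <name>-<version>.dist-info
# directory and uses email-style headers.
import zipfile
from email.parser import Parser

def wheel_requires_dist(wheel_path):
    with zipfile.ZipFile(wheel_path) as whl:
        metadata_name = next(
            n for n in whl.namelist()
            if n.endswith(".dist-info/METADATA")
        )
        metadata = Parser().parsestr(whl.read(metadata_name).decode("utf-8"))
    return metadata.get_all("Requires-Dist") or []

# Example wheel filename; substitute whatever you downloaded or built.
print(wheel_requires_dist("scinot-0.0.11-py3-none-any.whl"))
```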

Pipenv also explored using the JSON API a while ago, but eventually came to the conclusion that it’s not worth it. Aside from the various problems Warehouse has inspecting upload artifacts, there’s a fundamental problem in how the API presents the information.

By design, each wheel uploaded (for the same version) can have different sets of dependencies, but the JSON API would only display a random set (the first uploaded? I have no idea) for each version. Sdists are even worse, since the same artifact would produce different dependencies running on different machines (e.g. if setup.py inspects C libraries at build time)! This makes some of our users, uh, unsatisfied.

Now you might say this is an upstream problem; Python offers enough declarative syntax for packagers to declare unified dependencies for artifacts of the same version. But in practice maintainers have different reasons not to do it. Legacy considerations are one common reason (some high-profile projects support pip versions as old as 1.5). Or the maintainers might just not care; I’ve had pull requests rejected because things already work and they see no value in improving the metadata. And honestly, why should they care?

In the end we ditched all JSON API calls for straight download-build-inspect, since it is the only reliable way to get the dependency set the user expected.
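
To make “download-build-inspect” concrete: it can be approximated by shelling out to pip. A rough sketch (not Pipenv’s actual code, and the requirement is just an example); the resulting wheel’s METADATA can then be read as in the earlier sketch:

```python
# Sketch: force the download-build-inspect path with pip itself.
# pip wheel builds (or fetches) a wheel for the requirement without installing it,
# which reflects the dependencies as resolved on *this* machine.
import pathlib
import subprocess
import sys
import tempfile

def build_wheel(requirement):
    tmp = tempfile.mkdtemp()
    subprocess.run(
        [sys.executable, "-m", "pip", "wheel", "--no-deps", "--wheel-dir", tmp, requirement],
        check=True,
    )
    return next(pathlib.Path(tmp).glob("*.whl"))

print(build_wheel("scinot==0.0.11"))  # example requirement
```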

I guess what I’m getting to is that while it might seem like an easy step to fix PyPI’s dependency presentation, it also might not be as worthwhile as you expect :slightly_smiling_face:

Implemented something for administrative scripts, i.e. @pf_moore’s point one. The API is pypackage script myscript.py. Deps are pulled by parsing the file’s imports, then set up in ./python-installs/script-envs. Not robust, but I think quick-and-dirty is all that can be expected here; the point is to make the API as simple as possible and require no config. An open question is how to clean up scripts that are no longer used, and how to differentiate them. For the latter, right now we use the filename as a unique identifier, but this will cause problems if you have multiple files with the same name. We could have an optional unique identifier in a comment in the file.
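
The import scanning is roughly along these lines (a simplified sketch rather than the exact implementation; the filename is just an example):

```python
# Sketch: collect top-level imported module names from a script with the ast module.
# Mapping those names to PyPI project names is the fragile part.
import ast

def imported_names(path):
    with open(path, encoding="utf-8") as f:
        tree = ast.parse(f.read(), filename=path)
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    return sorted(names)

print(imported_names("myscript.py"))
```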

The following exo-build system works very well IME: python-cmake-buildsystem/python-cmake-buildsystem on GitHub (a CMake build system for compiling Python; supports 2.7).

You could also look at the build script in the conda-forge python feedstock.

A slightly less magical suggestion: a __requires__ variable that’s a list of strings containing PEP 508 dependencies, which the tool installs into the environment before running the script.

That would mean running the script with pypackage would ensure these are present unconditionally, which is great and keeps the imports and requirements in sync.

One example that comes to mind where this is useful is msgpack-python, which is imported as msgpack, but msgpack is a different project on PyPI with a different API (or at least it used to be that way, as far as I remember). Anyway, PyPI has no requirement for import names and package names to be in sync, so I imagine being more robust toward that is fairly important.

See pip-run which uses this approach.
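
A tiny example of what that could look like in practice. The script itself would declare something like __requires__ = ["requests>=2.0", "colorama==0.4.1"] near the top; the tool-side extraction shown here is just one possible approach, loosely modelled on the idea rather than on pip-run’s exact mechanics:

```python
# Tool-side sketch: extract a __requires__ list from a script without importing it.
import ast

def read_requires(path):
    with open(path, encoding="utf-8") as f:
        tree = ast.parse(f.read(), filename=path)
    for node in tree.body:
        if isinstance(node, ast.Assign):
            names = [t.id for t in node.targets if isinstance(t, ast.Name)]
            if "__requires__" in names:
                return ast.literal_eval(node.value)  # list of PEP 508 requirement strings
    return []

print(read_requires("myscript.py"))  # example filename
```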

Done.