WIP Package manager using __pypackages__ and pyproject.toml

I’m also extremely interested. But I’m not very clear yet whether Windows support is available, or just planned, and until it’s available I’m not likely to be able to try this out.

Also, I’m struggling to get a clear picture of how it would fit with my workflow. Maybe some “typical usage scenario” descriptions would be helpful here. Some specific ones I’d be interested in:

  1. Lots of little analysis and administrative scripts, each one too small to justify a “project directory”, with some common dependencies but also some unique ones per script. My current process is just to dump such scripts in a “work” directory, and use temporary virtualenvs (reinstalling dependencies each time) to run them. This sucks, but I’ve yet to find a better approach.
  2. Jupyter notebook projects, where I don’t want to reinstall Jupyter for each project, but projects do have additional per-project dependencies. Projects are typically big enough to warrant a directory per project, but I also have a “miscellaneous experiments” category here - just having a “Junk” project for those would do, though. Current process here is to have a single shared Jupyter virtualenv which gets cluttered with per-project extra dependencies as needed. Works OK, but suboptimal if I ever want to share an analysis, as I’d need to pick out the specific dependencies I need.
  3. Less likely to be a realistic need until the project matures significantly, but how would a traditional project like pip, virtualenv or similar use this tool? Or is that not a realistic target (and if not, what’s the constraint)? (Let’s ignore the irony of pip using an alternative package management tool - it’s just an example :wink:)

I think ultimately you’ll want the ability to configure multiple python environments in pyproject.toml with different python versions, similar to how tox works. But this sounds like a great first step.

I agree. Something to work on after this part’s stabilized.

Your tool also installs packages directly, right, without relying on pip? I know pip has a whole system for caching packages it downloads; maybe you haven’t gotten around to implementing that yet? But when you do, it seems like you could reuse pip’s caching approach here.

Correct. Caching’s another todo. If you have multiple environments set up for a project, it currently needs to download the packages for each new one you switch to. Maybe store them in a lib subfolder of ~/python-installs? They’d have to be organized by version.

IMO the simplest way to do this would be to switch from the PEP 582 layout to a more traditional environment layout like tox uses, where you stick built environments into a directory like $PROJECTROOT/.pyenvs.

Could do. __pypackages__ was a somewhat arbitrary choice, based on there already being a PEP for it. Open to suggestions. Executable scripts are currently stored in __pypackages__/3.7/bin.
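
For reference, the on-disk layout being discussed looks roughly like this (lib is the PEP 582 location for installed packages; bin is this tool’s addition, and the file names are just illustrative):

myproject/
    pyproject.toml
    main.py
    __pypackages__/
        3.7/
            lib/    (installed dependencies, per the PEP 582 layout)
            bin/    (executable scripts, e.g. console entry points)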

We have a lot of infrastructure now for building binary packages and figuring out which binary packages will work on which system.

Awesome. That would make things much easier if we could have a binary that works for any Linux distro, for example. Do you have any references or starting points for building Linux wheels that work on multiple distros? I expect the Windows and Mac Python builds should be comparatively straightforward, but I haven’t verified this.

When you say differences, do you just mean differences from how existing tools work, or something else? Also, related to this, will it be compatible with existing PEPs, or does it deliberately depart from the standards in some cases?

Yep. The immediate question people will ask when seeing this project is “What’s the point when we have venv/Pipenv/Poetry etc?” Which existing PEPs are you referring to? I haven’t done my homework on this, so it probably includes some unneeded departures.

Can you say more about what this is a reference to? Is this something that PyPI can fix, or is it not fixable – what do other tools do to cope with this issue?

PyPI lists dependencies on the Warehouse API under requires_dist. For wheels, this is pulled from the METADATA file in dist-info. For some packages (e.g. matplotlib), this will be incorrect: the most common case is that requires_dist is empty (implying no dependencies), while dependencies can in fact be found by installing or building a wheel and checking dist-info. The pydeps API I set up does this and caches the result. I’m not sure how Pipenv and Poetry work, but IIRC one of the reasons they can be slow is that this is done on the user’s computer.

Ideally, this would be handled on PyPI’s end, and I’m not sure why it hasn’t been. The API I set up is a short Python script running on DRF. Related article. I gather there are some packages we’ll never be able to properly locate dependencies for, but there are many where we can and PyPI doesn’t. At minimum, it needs a differentiator between “no dependencies” and “can’t find dependencies”.
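
For context, querying requires_dist from the Warehouse JSON API looks like this (a sketch, not the pydeps code itself; the package name and version are arbitrary examples):

import json
import urllib.request

def requires_dist(name, version):
    # Ask the Warehouse JSON API for one release's metadata.
    url = f"https://pypi.org/pypi/{name}/{version}/json"
    with urllib.request.urlopen(url) as resp:
        info = json.load(resp)["info"]
    # A list of requirement strings, or empty/None. As noted above, an empty
    # value is ambiguous: "no dependencies" vs "metadata not extracted".
    return info["requires_dist"]

print(requires_dist("requests", "2.22.0"))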

But I’m not very clear yet whether Windows support is available, or just planned, and until it’s available I’m not likely to be able to try this out.

It works on Windows, except for the Python version management. I need to build the Windows Python binaries and test with them before that works.

Lots of little analysis and administrative scripts, each one too small to justify a “project directory”,

Open question. The tool’s not currently designed for that, and I’m open to suggestions for how to handle it. It’s a big issue I run into myself, and I agree it needs a solution. I’d considered it out of scope for this project, but perhaps we can figure out a way. No easy or quick solution exists today, but there’s no reason there can’t be one.

Jupyter notebook projects, where I don’t want to reinstall Jupyter for each project, but projects do have additional per-project dependencies.

Again, an open question. Maybe something like sub-project folders or workspaces, each with its own pyproject.toml? What do you think?

Less likely to be a realistic need until the project matures significantly, but how would a traditional project like pip, virtualenv or similar use this tool? Or is that not a realistic target

Not a realistic target. Expect undefined behavior if you use this with pip, virtualenv etc.

Current limiting factor on the Windows binaries: I can build 3.7, but not older versions. I’m unable to find msbuild.exe. I have VS2015 selected in the Visual Studio installation manager, but can’t find the file in the Program Files (x86)/Visual Studio 14.0 folder. If I point the Path at the Visual Studio 2019 edition’s MSBuild/bin path, I get an error about a missing SDK version, one that isn’t listed in the VS Installer.

Edit: Found the 2015 build tools; fighting through other errors now.

This is quite interesting! I really love the doesn’t run on Python and easy to distribute approach :smiley: I also made a Rust-based POC, but had stalled it for a while due to my lack of free time.

Some initial thoughts:

  • Ad-hoc Windows binary builds are quite difficult, especially for 3.6 or earlier. You might find Victor Stinner’s notes useful, but TBH it really is not a good idea to expect users to install e.g. Visual Studio 2015 just for building 3.5 and 3.6. (2.7 and 3.4 are even more pain, but you can just decide not to support them.)
  • I am not really into the ad-hoc build approach as a whole though, personally. It’d be more viable to build a common interface around known-to-be-good package sources. Official distributions on python.org already work out of the box for Windows and macOS, and are easily discoverable via the registry (Windows) and the plist thing (macOS). For Ubuntu, Debian, etc. you can use APT (with deadsnakes), and I believe there are similar repositories for RPM as well. My attempt for Windows.
  • pipenv run has hit quite a few edge cases along the way, and I’ve been re-thinking how it really should work. Maybe it’d be a better idea to parse and run the CLI entry points directly, instead of running binaries/scripts created by pip/setuptools. This might also handle an edge case I didn’t see mentioned: What would happen on pypackage run black if the project has multiple versions of Black installed?
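
As a rough illustration of the “run the entry point directly” idea (a sketch only; the dist-info path, helper name, and example target are made up, it ignores entry-point extras, and it assumes the package’s lib directory is already on sys.path):

import configparser
import importlib
import sys

def run_entry_point(dist_info_dir, script_name, argv):
    # console_scripts entry points live in <pkg>-<ver>.dist-info/entry_points.txt,
    # an INI-style file mapping script names to "module:function" targets.
    cp = configparser.ConfigParser()
    cp.optionxform = str  # keep entry-point names case-sensitive
    cp.read(f"{dist_info_dir}/entry_points.txt")
    raw = cp["console_scripts"][script_name]  # e.g. "black:main", possibly with " [extras]"
    module_name, _, func_name = raw.split(" ")[0].partition(":")
    func = getattr(importlib.import_module(module_name), func_name)
    sys.argv = [script_name] + list(argv)
    sys.exit(func())

# e.g. run_entry_point("__pypackages__/3.7/lib/black-19.3b0.dist-info", "black", ["--check", "."])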

BTW, I strongly object to checking package dependencies with semver. It works extremely well at a small scale, but will break horribly once your tool is widely-adopted—there are just too many Python projects out there not using semver (pytz comes to mind). I recommend sticking to standard version semantics.


Edit: Is there a reference or example of package.lock somewhere? Related topic: Structured, Exchangeable lock file format (requirements.txt 2.0?)

Strongly +1. I’m developing a project (contractor) right now that would benefit greatly from this; at the moment I have some custom scripts to wrap Pipenv calls. More specifically, something like Cargo’s workspace mode would work well (although maybe with a better config design; I’ve always found it confusing that Cargo uses Cargo.toml for both the workspace and the project).

My question was initially a reaction to seeing that it’s implementing PEP 582, since it’s not clear that will actually be adopted (and there are some objections). I also wasn’t sure if you were referring to standards or just current tools when you said “to address shortfalls with existing processes.” It was also just a general question about whether you had standards compliance and playing nice with others as an explicit project goal, or if you just wanted to scratch a personal itch. Your reply here made it seem a little like interoperability wasn’t a priority or goal of yours:

Regarding the “custom API to store and query dependency info,” one reason I asked about that was to start to get an idea of which differences are fundamental versus being things that could be implemented by existing tools but simply haven’t been yet. It sounds like this particular one could be implemented by existing tools, whereas the “Doesn’t run on Python,” of course, can’t.

Found my old notes on this that might help:

Minimal VS workload for building Python 3.5:

  • Microsoft.VisualStudio.Component.VC.140
  • Microsoft.Component.VC.Runtime.UCRTSDK
  • Microsoft.VisualStudio.Component.Windows81SDK
  • Microsoft.VisualStudio.Component.VC.Redist.14.Latest

Need to relaunch the shell if you get error 0xc000007b. Is there a way to work around this?

But this won’t work, because of this issue:
Visual Studio 2017: vcvars for toolset v140

The link describes a bug that occurs when you install VS 2015 and 2017 side by side, caused by VS changing how toolchains are discovered (IIRC).

This is quite interesting! I really love the doesn’t run on Python and easy to distribute approach :smiley: I also made a Rust-based POC, but had stalled it for a while due to my lack of free time.

Awesome!

Some initial thoughts:

I think asking the user’s computer to build Python from source is out of the question; it’s too finicky. Ideally, python.org would host the binaries, as @njs said. Are you referring to the embeddable zip ones listed there for Windows? I haven’t tried them.

What would happen on pypackage run black if the project has multiple versions of Black installed?

Currently, if it’s due to different Python versions, it would run whichever one corresponds to the version listed in pyproject.toml. If it’s due to circular dependencies, it would run the one specified directly.

BTW, I strongly object to checking package dependencies with semver. It works extremely well at a small scale, but will break horribly once your tool is widely-adopted—there are just too many Python projects out there not using semver (pytz comes to mind). I recommend sticking to standard version semantics.

Can you provide an example of standard version semantics? I ran into several issues with semver, some of which are still open questions. (How best to handle a/b/rc? What if 4 digits are specified?) I suspect there are more I haven’t encountered. I’m open to alternative ways of dealing with this.

Is there a reference or example of package.lock somewhere?

I’ll post one later. It’s inspired by Cargo.lock. Ideally, we’d standardize something with Pipenv and Poetry. A notable addition is metadata about renamed deps, for installing multiple versions.

It was also just a general question about whether you had standards compliance and playing nice with others as an explicit project goal, or if you just wanted to scratch a personal itch. Your reply here made it seem a little like interoperability wasn’t a priority or goal of yours:

Interoperability and integrating with the ecosystem are goals. I agree with @njs that the best way to initiate change is with a proof of concept. I’m willing to deviate from standards where doing so will improve things.

Regarding the “custom API to store and query dependency info,” one reason I asked about that was to start to get an idea of which differences are fundamental versus being things that could be implemented by existing tools but simply haven’t been yet. It sounds like this particular one could be implemented by existing tools, whereas the “Doesn’t run on Python,” of course, can’t.

That one should be handled by PyPI. IIRC there’s resistance to fixing it due to being unable to find a solution that works in all cases. (Can someone who knows more comment on this? I.e., what’s the root cause of PyPI’s dependency info being unreliable, when better info is available through wheel METADATA?) Hopefully the custom API is a temporary workaround, as is hosting custom binaries.

Found my old notes on this that might help:

Nice! I’ll keep you posted on whether I can get the 3.5 and 3.6 Windows binaries built. I may hold off on 3.4 if it’s too finicky due to using VS 2008. I can release with a caveat: “If you want to use this with Python 3.4 on Windows, you must install Python 3.4 yourself.”

We can share it via the same mechanism as pip, if you’re interested in cooperation on the caching front.

The standard for Python package versions is PEP 440.
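
For example, the packaging library implements PEP 440 ordering and specifiers, which covers the a/b/rc and four-component questions above:

from packaging.specifiers import SpecifierSet
from packaging.version import Version

print(Version("4.0.0rc1") < Version("4.0.0"))       # True: a/b/rc sort before the final release
print(Version("1.0.0.1") > Version("1.0.0"))        # True: any number of components is allowed
print(Version("2019.1") in SpecifierSet(">=2018"))  # True: calendar-style versions (e.g. pytz) compare fine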

I’m not clear here what you mean by “PyPI’s dep info”. The formal standard for dependency data is the wheel METADATA file, but that’s not available without downloading the wheel. It would be nice to have a means of getting dependency data without downloading and unpacking the wheel, but that needs a couple of things to happen:

  1. Standardisation of a richer API for indexes than the “simple repository API”, PEP 503, quite possibly just standardising the JSON API.
  2. Exposing package metadata via that API.

But if you’re after dependency metadata, downloading the wheel and reading METADATA is the way to get the correct data.
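
Reading it once you have the wheel is straightforward; a minimal sketch (the filename in the usage comment is just an example):

import email
import zipfile

def wheel_dependencies(wheel_path):
    # A wheel is a zip archive; Requires-Dist lines live in the
    # *.dist-info/METADATA member, which uses email-header syntax.
    with zipfile.ZipFile(wheel_path) as whl:
        meta_name = next(n for n in whl.namelist() if n.endswith(".dist-info/METADATA"))
        meta = email.message_from_bytes(whl.read(meta_name))
    return meta.get_all("Requires-Dist") or []

# e.g. wheel_dependencies("requests-2.22.0-py2.py3-none-any.whl")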

For modern distros, you’ll want to get the info from /etc/os-release. For Python, there’s distro (which does a bit more).
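
A quick sketch of both options (the printed values are examples only):

import distro  # third-party: pip install distro

print(distro.id())       # e.g. "ubuntu", "fedora", "debian"
print(distro.version())  # e.g. "18.04"
print(distro.like())     # ID_LIKE family hint, e.g. "rhel fedora"

# Or read /etc/os-release directly; it's simple KEY=value lines.
os_release = {}
with open("/etc/os-release") as f:
    for line in f:
        if "=" in line:
            key, _, value = line.rstrip().partition("=")
            os_release[key] = value.strip('"')
print(os_release["ID"], os_release.get("VERSION_ID"))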

As one of the maintainers of Python for Fedora, for now I’d advise you to tell users to use/install Python from the system, e.g. sudo dnf install python34. It should also be possible to install automatically and without sudo, but in a distro-specific way.

Lock file example:

[[package]]
id = 3
name = "colorama"
version = "0.4.1"
source = "pypi+https://pypi.org/pypi/colorama/0.4.1/json"
dependencies = []

[[package]]
id = 2
name = "scinot"
version = "0.0.11"
source = "pypi+https://pypi.org/pypi/scinot/0.0.11/json"
dependencies = ["colorama 0.4.1 pypi+https://pypi.org/pypi/colorama/0.4.1/json"]

I’m not clear here what you mean by “PyPI’s dep info”.

I’m referring to the Warehouse API. It stores accurate dependency info for many packages, but is missing it for others, and there’s no distinction between missing info and no dependencies required. I’m using it for finding available versions, digests, and downloading packages, all of which are reliable.

I propose that the Warehouse store reliable dependency info.

Sorry, my bad. I’d missed that the JSON API exposes requires_dist. I’d assume that if it’s wrong that’s “just” a bug. Do you have examples of incorrect data? It’s possible that Warehouse extracts the data for new uploads, but is missing it for older files (or maybe for files uploaded manually, rather than via twine, or some such).

Agreed: if the data is being exposed in a “supported” manner, it should be correct. (I’m not sure what the formal status of the JSON API is; https://warehouse.pypa.io/api-reference/json/ doesn’t cover what specific metadata is supported, for example.)

The problem with saying PyPI has to reliably provide dependency info, is that this is literally impossible to do :-(. If there are wheels it’s fine, and IIUC in this case PyPI already does what you want. But if there aren’t wheels, then the only way to find out the dependencies is to download the source, build it, and then look at the wheel – and even then it only tells you the dependencies for when it’s built on your machine, not what anyone else might see. This is super annoying but AFAICT it’s unavoidable :-(.
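
In code, the fallback described here amounts to something like the following sketch (building via pip; the temporary directory is arbitrary, and the dependencies are then read from the built wheel’s METADATA as in the earlier sketch):

import pathlib
import subprocess
import sys
import tempfile

def build_wheel_locally(requirement):
    # Build a wheel from the sdist on *this* machine; another machine may
    # legitimately end up with different dependencies.
    dest = tempfile.mkdtemp(prefix="built-wheels-")
    subprocess.run(
        [sys.executable, "-m", "pip", "wheel", "--no-deps", "-w", dest, requirement],
        check=True,
    )
    return sorted(pathlib.Path(dest).glob("*.whl"))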

This is also part of why pip has a “wheel cache”, where it stores any wheels it built locally: it also serves as a local database of the dependency info that’s missing from PyPI!
