WIP Package manager using __pypackages__ and pyproject.toml

David-OConnor · September 6, 2019, 7:03am

I’m posting this here after a suggestion from VorpalSmith on Reddit. I’m working on a package manager that attempts to address shortfalls with existing processes. The GitHub readme explains my reasoning for a new tool, and I’m looking for critique/spears/holes/design-flaws etc.

Some differences:

Stores packages in __pypackages__ and uses pyproject.toml.
Uses a custom API to store and query dependency info, since the PyPI warehouse info is inconsistent/unreliable.
Installs multiple versions of a dependency via renaming, if otherwise unresolvable. This has some problems, e.g. if relative imports are in binaries, or if another package attempts to use a package that relies on this as a dependency. I think it’s still worth it, since there are many cases where Pipenv and Poetry will fail to resolve, where this approach works.
Doesn’t run on Python, so isn’t sensitive to how it’s installed.
Aims to be as easy-to-use as possible, ie the user should never have to consider environments / terminal state, features considered out-of-scope etc.

In particular: What do you think of the multiple-version approach? Alternative implementations? Traditionally, this has been treated as unfeasible. My thought is that it works in many cases, so allow it, even though it’s not perfect.

njs · September 6, 2019, 7:56am

That’s me. The thread is here if anyone else is curious:

https://www.reddit.com/r/Python/comments/czk4z3/a_new_package_manager/

David-OConnor · September 8, 2019, 5:58am

I’ve mostly implemented the Python version management you proposed. GH Repo with binaries as releases.

API: User specifies a version in pyproject.toml. To change versions, the user changes this value. If none is specified, the program asks which to use, then writes it to the file. Or run pypackage switch 3.7 etc.

What happens: When running (eg pypackage install, pypackage python), it checks for either a python on the path matching the major and minor version specified, or one installed in ~/python-installs. If it finds one, it uses it. If it finds multiple, it asks which to use. If it doesn’t find any, it downloads and extracts to ~/python-installs, then uses that.
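The lookup described above could be sketched roughly like this. It is a Python illustration, not the tool's actual Rust code: the function names, the match-by-filename shortcut, and the `~/python-installs/<major>.<minor>/bin` layout are all assumptions made for the example (a real implementation would more likely run each candidate and check its reported version).

```python
import os
from pathlib import Path


def find_interpreters(major, minor, search_dirs):
    """Return executables named like python3.7 found in the given directories."""
    wanted = {f"python{major}.{minor}", f"python{major}.{minor}.exe"}
    found = []
    for directory in search_dirs:
        directory = Path(directory)
        if not directory.is_dir():
            continue
        for entry in sorted(directory.iterdir()):
            if entry.name in wanted and entry.is_file() and entry not in found:
                found.append(entry)
    return found


def choose_interpreter(candidates, ask):
    """One match is used as-is, several are passed to `ask`, none returns None."""
    if not candidates:
        return None  # the caller would download and extract a build here
    if len(candidates) == 1:
        return candidates[0]
    return ask(candidates)


def default_search_dirs(major, minor):
    """PATH entries plus a hypothetical managed install directory."""
    path_dirs = [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]
    managed = Path.home() / "python-installs" / f"{major}.{minor}" / "bin"
    return path_dirs + [str(managed)]
```

Passing the prompt in as `ask` keeps the selection rule separate from however the tool actually talks to the user.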

What’s broken: I’m unsure how broadly-compatible the binaries are. Ie, will the ones I built on Ubuntu 19.04 work on older versions of Ubuntu? What about Windows - will it work on any Windows 64-bit? Can the code tell what Linux distro a user is on, or do we need to ask? Checking Win vs Linux vs Mac is easy via conditional compiling. The code I pushed is hard-locked to Ubuntu for now, and I’ve only uploaded Ubuntu binaries. (The dependency management etc works for other OSes, but will break if you specify a version that’s not available on the path.)

Storing binaries as Zip currently. .tar.xz appears to produce files half the size, but I can’t figure out how to parse them in Rust. The Zip extraction lib I’m using also appears to break symlinks, and therefore this process. Ie it works if I extract the Py bin manually, but the program will produce a broken version.

njs · September 8, 2019, 7:15am

Sweet!

So to summarize for folks who didn’t read the reddit thread: IIUC, @David-OConnor has a standalone Python package manager built in Rust, designed for managing project dependencies, and apparently it can now bootstrap a Python environment from scratch as well as install packages into it, all configured via pyproject.toml. It seems like something that other folks here would be interested in (CC: @dstufft @techalchemy @bernatgabor @uranusjr and uh… does anyone know the appropriate folks to CC from poetry?)

I think ultimately you’ll want the ability to configure multiple python environments in pyproject.toml with different python versions, similar to how tox works. But this sounds like a great first step.

Your tool also installs packages directly, right, without relying on pip? I know pip has a whole system for caching packages it downloads; maybe you haven’t gotten around to implementing that yet? But when you do it seems like you could reuse it here.

IMO the simplest way to do this would be to switch from the PEP 582 layout to a more traditional environment layout like tox uses, where you stick built environments into a directory like $PROJECTROOT/.pyenvs. This is more compatible with other tools (e.g. I know @brettcannon was saying that a frustrating thing for Visual Studio Code trying to support pipenv is that they want to find the python environment so they can run autocompleters etc. and pipenv putting it in a weird place makes that hard), it gives you a place to put executable scripts, and it makes projects more standalone. Again though, not necessarily the first thing to figure out…

We have a lot of infrastructure now for building binary packages and figuring out which binary packages will work on which system. I think the thing to do is to re-use that – so e.g. you could have a manylinux2010_x86_64 build of CPython, that you build using the manylinux2010_x86_64 docker image, and your tool would install it on the same systems that support manylinux2010_x86_64 python packages. You could probably even convince auditwheel to run on your zip files to make them self-contained. E.g. you don’t want to depend on the system version of openssl, because it’s not compatible across systems; you want to ship a copy of openssl inside the interpreter. Auditwheel has lots of smarts to handle this for wheels, and this is basically the same problem with a slightly different container.
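As a rough illustration of how a tool might decide whether a manylinux-tagged interpreter build can run on the current machine, here is a simplified sketch. The minimum glibc versions come from the manylinux PEPs (513, 571, 599); the function names are made up for the example, and it ignores CPU architecture and the `_manylinux` override module that the PEPs also describe.

```python
import platform

# Minimum glibc version required by each manylinux tag.
MANYLINUX_GLIBC = {
    "manylinux1": (2, 5),
    "manylinux2010": (2, 12),
    "manylinux2014": (2, 17),
}


def parse_glibc_version(text):
    """Turn '2.27' into (2, 27); return None if it does not look like a version."""
    parts = text.split(".")
    try:
        return int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        return None


def supports(tag, libc_name, libc_version):
    """True if a system with this libc should run binaries built for `tag`."""
    if libc_name != "glibc":
        return False  # e.g. musl-based distros
    version = parse_glibc_version(libc_version)
    return version is not None and version >= MANYLINUX_GLIBC[tag]


if __name__ == "__main__":
    # platform.libc_ver() reports the libc the running interpreter is linked against.
    name, version = platform.libc_ver()
    for tag in MANYLINUX_GLIBC:
        print(tag, supports(tag, name, version))
```

A non-Python tool would need its own way to read the glibc version, but the comparison itself would be the same.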

cjerdonek · September 8, 2019, 10:18am

A couple quick questions:

When you say differences, do you just mean differences from how existing tools work, or something else? Also, related to this, will it be compatible with existing PEPs, or does it deliberately depart from the standards in some cases?

Can you say more about what this is a reference to? Is this something that PyPI can fix, or is it not fixable – what do other tools do to cope with this issue?

pf_moore · September 8, 2019, 11:39am

I’m also extremely interested. But I’m not very clear yet whether Windows support is available, or just planned, and until it’s available I’m not likely to be able to try this out.

Also, I’m struggling to get a clear picture of how it would fit with my workflow. Maybe some “typical usage scenario” descriptions would be helpful here. Some specific ones I’d be interested in:

Lots of little analysis and administrative scripts, each one too small to justify a “project directory”, with some common dependencies but also some unique ones per script. My current process is just to dump such scripts in a “work” directory, and use temporary virtualenvs (reinstalling dependencies each time) to run them. This sucks, but I’ve yet to find a better approach.
Jupyter notebook projects, where I don’t want to reinstall Jupyter for each project, but projects do have additional per-project dependencies. Projects are typically big enough to warrant a directory per project, but I also have a “miscellaneous experiments” category here - just having a “Junk” project for those would do, though. Current process here is to have a single shared Jupyter virtualenv which gets cluttered with per-project extra dependencies as needed. Works OK, but suboptimal if I ever want to share an analysis, as I’d need to pick out the specific dependencies I need.
Less likely to be a realistic need until the project matures significantly, but how would a traditional project like pip, virtualenv or similar use this tool? Or is that not a realistic target (and if not, what’s the constraint)? (Let’s ignore the irony of pip using an alternative package management tool - it’s just an example.)

David-OConnor · September 8, 2019, 1:35pm

I think ultimately you’ll want the ability to configure multiple python environments in pyproject.toml with different python versions, similar to how tox works. But this sounds like a great first step.

I agree. Something to work on after this part’s stabilized.

Your tool also installs packages directly, right, without relying on pip? I know pip has a whole system for caching packages it downloads; maybe you haven’t gotten around to implementing that yet? But when you do it seems like you could reuse it here.

Correct. Caching’s another todo. If you have multiple environments set up for a project, it currently needs to download the packages for each new one you switch to. Maybe store them in a lib subfolder of ~/python-installs ? Would have to be organized by version.

IMO the simplest way to do this would be to switch from the PEP 582 layout to a more traditional environment layout like tox uses, where you stick built environments into a directory like $PROJECTROOT/.pyenvs .

Could do. __pypackages__ was a somewhat arbitrary choice, based on there already being a PEP for it. Open to suggestions. Executable scripts are currently stored in __pypackages__/3.7/bin.

We have a lot of infrastructure now for building binary packages and figuring out which binary packages will work on which system.

Awesome. That would make things much easier if we could have a bin that works for any Linux distro, for example. Do you have any references or starting points for building Linux wheels that work on multiple distros? I expect the Windows and Mac Py builds should be comparatively straightforward, but haven’t verified this.

David-OConnor · September 8, 2019, 1:45pm

When you say differences, do you just mean differences from how existing tools work, or something else? Also, related to this, will it be compatible with existing PEPs, or does it deliberately depart from the standards in some cases?

Yep. The immediate question people will ask when seeing this project is “What’s the point when we have venv/Pipenv/Poetry etc?” Which existing PEPs are you referring to? I haven’t done my homework on this, so it probably includes some unneeded departures.

Can you say more about what this is a reference to? Is this something that PyPI can fix, or is it not fixable – what do other tools do to cope with this issue?

PyPI lists dependencies on the Warehouse API under requires_dist. With wheels, this is pulled from the METADATA file in dist-info. For some packages (e.g. matplotlib), this will be incorrect. The most common case is that requires_dist is empty (implying no dependencies), while there are dependencies that can be found by installing or building a wheel and checking dist-info. The pydeps API I set up does this and caches it. I’m not sure how Pipenv and Poetry work, but IIRC one of the reasons they can be slow is that this is done on the user’s computer.
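For reference, reading the dependency list straight out of a wheel looks roughly like this. It is a minimal sketch rather than the pydeps implementation: a wheel is a zip archive, and its `*.dist-info/METADATA` file uses email-style headers, so the standard library is enough. The demo archive and its contents are made up for illustration.

```python
import io
import zipfile
from email.parser import Parser


def requires_dist_from_wheel(wheel_file):
    """Return the Requires-Dist entries from a wheel's *.dist-info/METADATA."""
    with zipfile.ZipFile(wheel_file) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            if (
                len(parts) == 2
                and parts[0].endswith(".dist-info")
                and parts[1] == "METADATA"
            ):
                text = archive.read(name).decode("utf-8")
                # An absent header means the wheel declares no dependencies.
                return Parser().parsestr(text).get_all("Requires-Dist") or []
    raise ValueError("no METADATA file found in wheel")


def build_demo_wheel():
    """Create a tiny in-memory archive with a made-up METADATA file."""
    metadata = (
        "Metadata-Version: 2.1\n"
        "Name: demo\n"
        "Version: 1.0\n"
        "Requires-Dist: numpy (>=1.11)\n"
        "Requires-Dist: cycler (>=0.10)\n"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("demo-1.0.dist-info/METADATA", metadata)
    buffer.seek(0)
    return buffer


print(requires_dist_from_wheel(build_demo_wheel()))
```

Note that raising on a missing METADATA file, instead of returning an empty list, keeps "no dependencies" distinct from "could not find dependencies".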

Ideally, this would be handled on PyPI’s end, and I’m not sure why it hasn’t been. The API I set up is a short Python script running on DRF. Related article. I gather that there are some packages that we’ll never be able to properly locate dependencies for, but there are many where we can, but PyPI doesn’t. At minimum, it needs a way to differentiate between “no dependencies” and “can’t find dependencies”.

David-OConnor · September 8, 2019, 1:54pm

But I’m not very clear yet whether Windows support is available, or just planned, and until it’s available I’m not likely to be able to try this out.

It works on Windows, except for the Python version management. Need to build the Win Py binaries, and test with them before that works.

Lots of little analysis and administrative scripts, each one too small to justify a “project directory”,

Open question. The tool’s not currently designed for that. Open to suggestions for how to handle. That’s a big issue I run into myself. I agree it needs a solution. I’d considered it out of scope of this project, but perhaps we can figure out a way. There’s no easy/quick solution that exists, and no reason there can’t be.

Jupyter notebook projects, where I don’t want to reinstall Jupyter for each project, but projects do have additional per-project dependencies.

Again, an open question. Maybe something like sub-project folders or workspaces, each with a pyproject.toml? What do you think?

Less likely to be a realistic need until the project matures significantly, but how would a traditional project like pip, virtualenv or similar use this tool? Or is that not a realistic target

Not a realistic target. Expect undefined behavior if you use this with pip, virtualenv etc.

David-OConnor · September 9, 2019, 1:30am

Current limiting factor on Windows binaries: I can build 3.7, but not older versions, because msbuild.exe can’t be found. I have VS2015 selected in the Visual Studio installation manager, but can’t find the file in the Program Files (x86)/Visual Studio 14.0 folder. If I point the Path to the Visual Studio (2019 edition)'s Msbuild/bin path, I receive an error about not having a version of the SDK that’s not listed in the VS Installer.

edit: Found 2015 build tools, fighting through other errors now.

uranusjr · September 9, 2019, 7:40am

This is quite interesting! I really love the doesn’t run on Python and easy to distribute approach. I also made a Rust-based POC, but had stalled it for a while due to my lack of free time.

Some initial thoughts:

Ad-hoc Windows binary builds are quite difficult, especially for 3.6 or earlier. You might find Victor Stinner’s notes useful, but TBH it really is not a good idea to expect users to install e.g. Visual Studio 2015 just for building 3.5 and 3.6. (2.7 and 3.4 are even more pain, but you can just decide to not support them.)
I am not really into the ad-hoc build approach as a whole though, personally. It’d be more viable to build a common interface around known-to-be-good package sources. Official distributions on python.org already work out of the box for Windows and macOS, and are easily discoverable via registry (Windows) and the plist thing (macOS). For Ubuntu and Debian etc. you can use APT (with deadsnakes), and I believe there are similar repositories for RPM as well. My attempt for Windows.
pipenv run has hit quite a few edge cases along the way, and I’ve been re-thinking how it really should work. Maybe it’d be a better idea to parse and run the CLI entry points directly, instead of running binaries/scripts created by pip/setuptools. This might also handle an edge case I didn’t see mentioned: What would happen on pypackage run black if the project has multiple versions of Black installed?
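To make the "parse and run the CLI entry points directly" idea concrete, here is a small sketch. It assumes the tool reads the `[console_scripts]` section of a package's `entry_points.txt` (an INI-style file in the dist-info directory) and imports the target itself; the function names are hypothetical. With multiple versions of a package installed, the tool would first have to decide which dist-info directory to read, and that choice is not shown here.

```python
import configparser
import importlib


def parse_console_scripts(entry_points_text):
    """Map script names to 'module:attr' targets from an entry_points.txt body."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep script names case-sensitive
    parser.read_string(entry_points_text)
    if not parser.has_section("console_scripts"):
        return {}
    return dict(parser.items("console_scripts"))


def load_entry_point(target):
    """Import 'package.module:object.attr' and return the referenced object."""
    # Drop an optional trailing '[extra1,extra2]' marker before resolving.
    target = target.split("[")[0].strip()
    module_name, _, attr_path = target.partition(":")
    obj = importlib.import_module(module_name.strip())
    for attr in filter(None, attr_path.strip().split(".")):
        obj = getattr(obj, attr)
    return obj


# Example with a stdlib target standing in for a real console script.
scripts = parse_console_scripts("[console_scripts]\ndump = json:dumps\n")
print(load_entry_point(scripts["dump"])({"ok": True}))
```

A runner built this way would call the loaded function with `sys.argv` set appropriately, instead of executing a generated wrapper script.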

BTW, I strongly object to checking package dependencies with semver. It works extremely well at a small scale, but will break horribly once your tool is widely-adopted—there are just too many Python projects out there not using semver (pytz comes to mind). I recommend sticking to standard version semantics.

Edit: Is there a reference or example of package.lock somewhere? Related topic: Structured, Exchangeable lock file format (requirements.txt 2.0?)

uranusjr · September 9, 2019, 7:44am

Strongly +1. I’m developing a project (contractor) right now that would benefit greatly from this; right now I have some custom scripts to wrap Pipenv calls. More specifically, something like Cargo’s workspace mode would work well (although maybe with better config design; I’ve always found it confusing that they use Cargo.toml for both workspace and project).

cjerdonek · September 9, 2019, 8:01am

My question was initially a reaction to seeing that it’s implementing PEP 582, since it’s not clear that will actually be adopted (and there are some objections). I also wasn’t sure if you were referring to standards or just current tools when you said “to address shortfalls with existing processes.” It was also just a general question about whether you had standards compliance and playing nice with others as an explicit project goal, or if you just wanted to scratch a personal itch. Your reply here made it seem a little like interoperability wasn’t a priority or goal of yours:

Regarding the “custom API to store and query dependency info,” one reason I asked about that was to start to get an idea of which differences are fundamental versus being things that could be implemented by existing tools but simply haven’t been yet. It sounds like this particular one could be implemented by existing tools, whereas the “Doesn’t run on Python,” of course, can’t.

brettcannon · September 10, 2019, 9:44am

4 posts were split to a new topic: Distributing CPython via PyPI

uranusjr · September 9, 2019, 8:20am

Found my old notes on this that might help:

Minimal VS workload for building Python 3.5:

Microsoft.VisualStudio.Component.VC.140

Microsoft.Component.VC.Runtime.UCRTSDK

Microsoft.VisualStudio.Component.Windows81SDK

Microsoft.VisualStudio.Component.VC.Redist.14.Latest

Need to relaunch the shell if you get error 0xc000007b. Is there a way to work around this?

But this won’t work because
https://social.msdn.microsoft.com/Forums/vstudio/en-US/a1b9e6a1-6bd6-43b0-9df5-d620c377d2ff/visual-studio-2017-vcvars-for-toolset-v140

The link describes a bug when you install VS 2015 and 2017 side by side caused by VS changing how toolchains are discovered (IIRC).

David-OConnor · September 9, 2019, 9:00am

This is quite interesting! I really love the doesn’t run on Python and easy to distribute approach. I also made a Rust-based POC, but had stalled it for a while due to my lack of free time.

Awesome!

Some initial thoughts:

I think asking the user’s computer to build Python from source is out of the question; too finicky. Ideally, Python.org would host the binaries, as @njs said. Are you referring to the embeddable zip ones listed there for Win? Haven’t tried them.

What would happen on pypackage run black if the project has multiple versions of Black installed?

Currently, if it’s due to different Py versions, it would run whichever one corresponds to the version listed in pyproject.toml. If it’s due to circular dependencies, it would run the one specified directly.

BTW, I strongly object to checking package dependencies with semver. It works extremely well at a small scale, but will break horribly once your tool is widely-adopted—there are just too many Python projects out there not using semver (pytz comes to mind). I recommend sticking to standard version semantics.

Can you provide an example of standard version semantics? I ran into several issues with semver, some of which are still open questions. (How best to handle a/b/rc? What if 4 digits are specified?) I suspect there are more I haven’t encountered. Open to alternative ways of dealing with this.

Is there a reference or example of package.lock somewhere?

I’ll post one later. It’s inspired by Cargo.lock. Ideally, we’d standardize something with Pipenv and Poetry. A notable addition is metadata about renamed deps, for installing multiple versions.

David-OConnor · September 9, 2019, 9:07am

It was also just a general question about whether you had standards compliance and playing nice with others as an explicit project goal, or if you just wanted to scratch a personal itch. Your reply here made it seem a little like interoperability wasn’t a priority or goal of yours:

Interoperability and integrating with the ecosystem are goals. I agree with @njs that the best way to initiate change is with a proof-of-concept. Willing to deviate from standards where it will improve things.

Regarding the “custom API to store and query dependency info,” one reason I asked about that was to start to get an idea of which differences are fundamental versus being things that could be implemented by existing tools but simply haven’t been yet. It sounds like this particular one could be implemented by existing tools, whereas the “Doesn’t run on Python,” of course, can’t.

That one should be handled by PyPI. IIRC there’s resistance to fixing it due to being unable to find a solution that works in all cases. (Can someone who knows more comment on this? I.e. what’s the root cause of PyPI’s dep info being unreliable, when better info’s available through wheel METADATA?) Hopefully the custom API’s a temporary workaround, as is hosting custom binaries.

David-OConnor · September 9, 2019, 9:10am

Found my old notes on this that might help:

Nice! Will keep you posted on whether I can get the 3.5 and 3.6 Win binaries built. May hold off on 3.4 if it’s too finicky due to using VS 2008. Can release with a caveat “If you want to use with Python 3.4 on Windows, you must install Python 3.4 yourself”.

pradyunsg · September 9, 2019, 9:16am

We can share it via the same mechanism as pip, if you’re interested in cooperation on the caching front.

pf_moore · September 9, 2019, 9:23am

The standard for Python package versions is PEP 440.
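To show how PEP 440 differs from semver on the cases raised earlier (pre-releases, four-component versions, date-style versions like pytz), here is a deliberately simplified parser. It only accepts canonical public versions of the form `[N!]N(.N)*[{a|b|rc}N][.postN][.devN]`; the PEP's normalisation rules, local version labels, and full ordering are left out, and the names are illustrative only.

```python
import re
from typing import NamedTuple, Optional, Tuple

_VERSION = re.compile(
    r"^(?:(?P<epoch>\d+)!)?"
    r"(?P<release>\d+(?:\.\d+)*)"
    r"(?:(?P<pre_kind>a|b|rc)(?P<pre_num>\d+))?"
    r"(?:\.post(?P<post>\d+))?"
    r"(?:\.dev(?P<dev>\d+))?$"
)


class Version(NamedTuple):
    epoch: int
    release: Tuple[int, ...]
    pre: Optional[Tuple[str, int]]
    post: Optional[int]
    dev: Optional[int]


def parse_version(text):
    """Parse a canonical PEP 440 public version string."""
    match = _VERSION.match(text)
    if match is None:
        raise ValueError(f"not a canonical PEP 440 version: {text!r}")
    pre = None
    if match.group("pre_kind"):
        pre = (match.group("pre_kind"), int(match.group("pre_num")))
    post, dev = match.group("post"), match.group("dev")
    return Version(
        epoch=int(match.group("epoch") or 0),
        release=tuple(int(part) for part in match.group("release").split(".")),
        pre=pre,
        post=int(post) if post is not None else None,
        dev=int(dev) if dev is not None else None,
    )


def padded_release(version, length):
    """Pad the release segment with zeros, as PEP 440 does when comparing."""
    return version.release + (0,) * (length - len(version.release))


for text in ["2019.2", "1.2.3.4", "1.0rc1", "1!2.0.post3.dev4"]:
    print(text, parse_version(text))
```

The release segment can have any number of components, so `2019.2` and `1.2.3.4` are both valid, and `1.0` compares equal to `1.0.0` once padded. For real use, the `packaging` library implements the full specification.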

I’m not clear here what you mean by “PyPI’s dep info”. The formal standard for dependency data is the wheel METADATA file, but that’s not available without downloading the wheel. It would be nice to have a means of getting dependency data without downloading and unpacking the wheel, but that needs a couple of things to happen:

Standardisation of a richer API for indexes than the “simple repository API”, PEP 503, quite possibly just standardising the JSON API.
Exposing package metadata via that API.

But if you’re after dependency metadata, downloading the wheel and reading METADATA is the correct way to get it.
