Pip options for controlling use of prebuilt packages

[Broken out from the PEP 517 bootstrapping thread]

We’ve hinted that the need for bootstrapping comes up more often than it should because people use --no-binary :all: when something more specific would meet their needs. They do so because it’s convenient and often has no downside beyond making the installation a bit slower. Both the slowness and the potential for things to go wrong will increase as more projects use PEP 517/518 mechanisms. Can we make it easier to be more specific?

Cases I can imagine include:

  1. build a single package from source, without the awkward duplication of --no-binary pygame pygame.
  2. build all packages to be installed from source but satisfy build dependencies normally
  3. skip any platform specific wheels but allow universal wheels (ensure extension modules are built locally)

This is also tangentially related to the idea that --no-binary could build a wheel locally and install it, excluding only pre-built wheels from consideration.


Is this achievable with a combination of --platform and --implementation options? If so, this could be more a problem of feature visibility.

Is there any reason this can’t simply be an issue on the pip tracker? It would make it far more visible to the users of pip, who are the ones likely to be affected, and it would allow us to link to it from other --only-binary related issues.


@pf_moore sorry, the pip tracker would probably have been a better idea. I was writing other posts here and didn’t consider if it was the right venue. Do you want me to open an issue there and try to redirect discussion, or shall we continue here now that it’s started?

I think you’re right, and I spent a while trying to make the ‘pure Python wheels are source’ argument with Flit, because it didn’t initially make sdists at all, and people complained that it was essential to have a ‘source’ version of the package alongside the wheel.

But I can also see that if you have an infrastructure that needs to build some things from sdists, it might be easier to do that for everything than to exclude some wheels. And the universal wheel tag only means that it’s platform independent, not that there’s no build step - e.g. it might contain minified Javascript. (I know sdists can have built artifacts too, but hopefully we’re moving towards that being less common.)

It is at least partially. The help says --implementation py will ‘force implementation-agnostic wheels’, which would rule out anything with (CPython) extension modules. I think there are some packages which build a C library and then wrap it with CFFI, so they might produce wheels that are platform specific but Python implementation independent. It’s not clear if --platform none will work.

Given that discussion has started here, maybe just open a tracker issue for pip, and note in the description that discussion is occurring over here. I do think it should be visible on the pip tracker in some form.

I think the big issue here is purely and simply that this isn’t in practice a technical matter - even though logically, pure Python wheels are just as much source code as sdists, and even though people wanting everything from source seem happy to exclude building tools like pip from source, logical arguments aren’t going to win in a situation like this. Both sides simply end up being frustrated and no progress gets made.

I think the way you framed the original posting is the way to go here - look at how we can make pip’s command line options align better with specific, well-defined behaviours and leave it to others to build a workflow based on those, rather than trying to aim for vague, somewhat aspirational targets like “build everything from source”.

Done: Easier options for controlling use of prebuilt packages · Issue #6271 · pypa/pip · GitHub

Let’s back up a second. So far, I have not been convinced that you understand these users’ needs better than they understand their own needs. This means that to me at least, your posts here aren’t coming across as clever and incisive, they’re coming across as borderline-CoC-violating rants about how you don’t respect other community members. Personally, I’ve found it’s more productive to assume that when other people seem to be acting like idiots, it’s because I’m missing some crucial information. Have you talked to any of these users about why they prefer sdists? What did they say? Why should I believe your version, as compared to alternatives like – they rationally prefer sdists because they’re easier to patch?

OK, I will delete all my posts on this topic. Sorry to have offended you.

I don’t think this is absolutely true, FWIW. Pure Python does not mean that there wasn’t a build step in there.

Some examples:

  • The 2to3 code that was the original recommended plan for libraries to support Python 2 and 3 in the same code base.
  • Libraries like PyYAML (which actually isn’t a pure Python wheel, but that doesn’t matter for this case) that install different code on Python 2 and Python 3.
  • Packages that build and minify JS at build time (I believe sentry does this?). Maybe this stretches the bounds of “Pure Python”, but as far as the platform tags are concerned, it is Pure Python.

It’s important to remember, I think, that a py3-none-any tag on a wheel does not mean “Pure Python”, or that there were no “real” build steps to go from source to wheel and that it was essentially just a copy into the archive, although that is the most common case. It means that this wheel targets Python 3 with no Python ABI or platform restrictions. It’s perfectly valid to have a single py3-none-any wheel that, say, packages a bunch of binary .so and .dll files and dynamically selects the right one at runtime, with a pure Python fallback.

IOW, the platform triple on a wheel describes the environment in which the wheel is valid, not the contents of said wheel.
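
To make the point about tags concrete, here is a minimal sketch that reads the tag triple out of a wheel filename, assuming the standard `{name}-{version}[-{build}]-{python}-{abi}-{platform}.whl` naming. The function names are hypothetical, chosen for illustration. Note that it can only report where a wheel is valid; it says nothing about whether a build step produced the contents.

```python
def wheel_tags(filename):
    """Split a wheel filename into its (python, abi, platform) tag sets."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # name, version, optional build tag, then the three compatibility tags
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {filename!r}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    # Compressed tag sets such as "py2.py3" are separated by dots.
    return python_tag.split("."), abi_tag.split("."), platform_tag.split(".")


def is_platform_independent(filename):
    """True when the tags place no ABI or platform restriction on the wheel."""
    _, abis, platforms = wheel_tags(filename)
    return abis == ["none"] and platforms == ["any"]
```

A filter like `is_platform_independent` is roughly what “skip platform specific wheels but allow universal wheels” would need, with the caveat above that a none-any wheel may still contain built artifacts.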


Dude, I’m not offended and I don’t want to drive you away or something. I just want you to show a bit more empathy for users, and a bit more humility about the uncertainties and complexities that make packaging so difficult. It’s really not easy to do, I know that, and we all struggle with it. But it’s important.

This works best for e.g. our corporate environment.

A few ideas for possible spellings:

  • Fake extra: pip install foo bar[.sdist] - avoids repeating the package name, makes it easier to see which package is being handled specially. On the other hand, it’s a new bit of special syntax for people to learn and code to handle.
  • pip install --from-source foo bar - this could mean that just foo and bar are built from sdists, or it could include their dependencies (but not build dependencies).
  • pip install --build-locally foo bar - similar, but this hints at the possibility of caching and reusing wheels with some metadata indicating that they are built on this machine, not downloaded from PyPI.
  • Separate command: pip install-from-source foo bar.

I haven’t thought deeply about any of these yet.
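
For comparison, here is a rough sketch of what any of these spellings would have to expand to with the options pip has today. The helper `pip_install_args` is hypothetical, and it assumes the narrowest reading: only the named packages are built from source, not their dependencies. It relies on --no-binary accepting a comma-separated list of project names.

```python
def pip_install_args(from_source, prebuilt_ok=()):
    """Build a pip argument list that builds only `from_source` from sdists.

    Hypothetical helper: it spells out the duplication that a shorter
    option would hide from the user.
    """
    from_source = list(from_source)
    args = ["install"]
    if from_source:
        # --no-binary takes project names separated by commas.
        args += ["--no-binary", ",".join(from_source)]
    # Each source-built package still has to be named again as a requirement.
    return args + from_source + list(prebuilt_ok)
```

For example, `pip_install_args(["foo"], ["bar"])` returns `["install", "--no-binary", "foo", "foo", "bar"]`, which shows the repeated package name the proposals are trying to avoid.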

This one looks cool, and tickles my “ooh a secret handshake” fancy. Which probably means it’s a good reason not to adopt it, unless we make it a PEP somewhere for all package installation front-ends to adopt.

I like this one, but not for both foo and bar. I’d prefer a syntax that says “please install only the next package from source”. That way it behaves like -e/--editable, which installs only the next package in editable mode.

The reason is that I’d like these switches to behave consistently with -e when I embed them in a -r/--requirement (i.e. requirements.txt) file.
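
A small sketch of the “next package only” behaviour, to show how it would differ from a flag that applies to everything after it. Both the --from-source flag and the `split_requirements` function are hypothetical; nothing like this exists in pip.

```python
def split_requirements(argv):
    """Return (from_source, normal), binding --from-source to one argument."""
    from_source, normal = [], []
    args = iter(argv)
    for arg in args:
        if arg == "--from-source":
            # Like -e, the flag consumes exactly one following requirement.
            try:
                from_source.append(next(args))
            except StopIteration:
                raise ValueError("--from-source needs a package name") from None
        else:
            normal.append(arg)
    return from_source, normal
```

With this reading, `split_requirements(["--from-source", "foo", "bar"])` gives `(["foo"], ["bar"])`: only foo is built from source, and the same rule would apply unchanged to a single line of a requirements file.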

The same applies to the --build-locally version, though I’d be perfectly OK with --from-source caching the built wheel. Not caching the wheel should be the responsibility of another switch, as it could apply to wheels as well, especially when they are fetched from somewhere other than PyPI, where we don’t have the immutability guarantee (think of a network shared directory full of wheels).

I’d also like shorthand versions of these parameters, like -s or -l.

Whether --from-source foo bar means both packages from source or only ‘foo’, I suspect it’s easy for someone to mistakenly assume it means the other one. Maybe any such option should only be allowed with a single package name, to avoid the ambiguity?

Even though I started this discussion to try to find more convenient spellings, I’m inclined to add long options first, and let people make the case for single-letter options once they’ve started using them. Apart from a few common conventions like -v for verbose, I think single-letter options really hurt readability, and should be avoided unless it’s something you’re typing very often.

How would that look in a requirements.txt file?

Right now I consistently request a --no-binary=psycopg2 install of psycopg2 due to its warning against installing the wheel version, and I usually add it on the same line in requirements.txt where I add psycopg2 itself. E.g.:

pandas==0.24.2
--no-binary=psycopg2 psycopg2==2.7.7
sqlalchemy==1.3.1
-e src/some-local-dependency-I'm-developing
-e .

Replacing --no-binary=psycopg2 psycopg2 with --from-source psycopg2 would be an improvement, IMO.

If we only allow the switch to apply to a single package, I think the requirements.txt case will either not work or work differently in a requirements.txt context than out of it.

The --editable switch already established the precedent of applying modifiers only to the following package, so I don’t think we need to be too afraid of people interpreting it otherwise.

Sounds reasonable.

I was thinking that the limitation would only apply on the command line. In a requirements.txt file it’s already unambiguous that it only refers to that line. I’ve no idea if it’s easy to implement that distinction in pip, though.

This is a side-note, but FYI: this issue was misdiagnosed, and the docs about why the wheels don’t work are misleading. The problem is really that psycopg2 is buggy on older openssl, like the version they use to build their wheels. All they really need is for someone to update their wheel-building scripts to use a more recent openssl version.

There are other reasons to avoid prebuilt packages so don’t let me derail the thread. But if you’re figuring out how to work around the psycopg2 issue, I figure it can’t hurt to know more about your options :-).
