Can we finally add a minimal API to pip?

ssbarnea · December 11, 2019, 12:38pm

pip has no public API, and a lot of effort has in fact been invested in explaining why it does not, even though people ask for one again and again, and some brave ones even use its internals and pay the cost.

Over the years I have had a few cases where I ended up hacking together an ad-hoc module installation when I got an ImportError, very similar to the py2/py3 import approach. I am fully aware that this will not work in every case, but it still resolves a real-life issue.
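For illustration, the ad-hoc pattern described above might look like the sketch below. This is an assumption about its general shape, not code from this post: `ensure_module`, `pip_install` and the `installer` parameter are hypothetical names, and the install step shells out to `python -m pip` in a subprocess rather than importing pip.

```python
import importlib
import subprocess
import sys


def pip_install(requirement):
    """Install a requirement by running pip as a subprocess of this interpreter."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", requirement])


def ensure_module(module_name, requirement=None, installer=pip_install):
    """Import module_name, installing it on ImportError and retrying once.

    Hypothetical helper: `requirement` defaults to the module name, which is
    only correct when the distribution name and the import name match.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        installer(requirement or module_name)
        # Make sure the import system notices the newly installed files.
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
```

If the module is already importable, the installer is never called. If the install succeeds but the import still fails, the second ImportError propagates to the caller.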

we generally won’t fix issues that are a result of using pip in an unsupported way.

Considering that the current maintainers have even documented this, I see no chance of doing anything towards creating an API, because that is not desired.

I find the documentation on the subject a bit contradictory: it starts by reasoning why this is not a good idea while saying that pip developers are not against it, but a few paragraphs later it says that they close tickets as wontfix, which mainly confirms that they are against it. And that is what really happens, so at least I cannot complain that the documentation is out of sync with real practice.

Adding a minimal API with tests on it should be quite easy but this could happen only if there

PS. A CLI is not an API; let’s not try to fool ourselves. If we go this way, we could consider a web application an API, since you can use Selenium to automate it.

uranusjr · December 11, 2019, 1:13pm

Everything the documentation page you linked still holds true; why would the conclusion be any different?

encukou · December 11, 2019, 1:53pm

Starting this discussion was on my TODO list for a while, so let’s go:

CPython’s ensurepip uses pip’s internal API. It is kept updated of course, but it does cause problems when pip is unbundled, which we do in Fedora.
Unbundling allows updating the pip wheel and CPython independently. IMO, that is a good thing: with CPython+pip bundled, most new venvs will show warnings about out-of-date pip.

Ensurepip needs to set up some state (sys.path) before calling pip. The same effect can’t be had with environment variables: PYTHONPATH entries are not inserted at the very beginning of sys.path.

Note that ensurepip is* also a command-line tool – in sole control of the environment, with no other code to worry about. The reasons against using pip as a library don’t really apply in ensurepip’s case.
* actually: ensurepip should be only a CLI tool, IMO. The note in the docs should be updated to match reality.

Can we instead have an API to call pip with a list of CLI arguments in the current process (and perhaps: then exit, since the process is in an unknown state)?

dustin · December 11, 2019, 2:46pm

Not sure if this is exactly what you’re asking for, but there is this project (which I maintain): pip-api · PyPI

I created this because I wanted an importable API, but also wanted to respect pip’s command-line interface.

pf_moore · December 11, 2019, 2:52pm

Generalised “can we have an API” questions always go nowhere, because there’s nothing concrete to discuss.

To make any progress, someone would need to:

Propose an actual API that they would want exposed.
Explain how it would be maintained and supported, and review and clearly explain how supporting that API would (or would not) constrain our ability to make future internal changes to pip.
Offer to write the documentation and the code for the API, and provide support for a suitable period of time.

The reason we don’t have an API is fundamentally because none of the pip developers are going to do any of the above (for whatever reason - as volunteers, we don’t have to justify what we are willing to do).

Even if someone did provide all that I asked for above, that’s still no guarantee that we’d accept the proposal. There’s also a “what direction do we want pip to go in” question that would need to be addressed.

My personal stance, and I believe that of the other pip developers, is still that people wanting a programmatic API to any packaging type functionality should be writing new libraries based on standards (either existing ones, or ones that they propose for adoption). In all likelihood, pip will then vendor such libraries, which will ensure feature parity between pip and the libraries.

Again on a purely personal basis, I’m open to someone proposing an (extremely limited) API for pip. But I have yet to see any such proposal that has really thought through the issues that would be involved in doing so. An educational exercise for anyone looking at proposing an API for pip would be to go through pip’s issue history looking for ways that even people just using pip.main have hit issues in the past (thread safety, messing with global logging state, changing process-global data like the cwd, assuming exclusive use of stdio streams…) and writing up how they would be handled.

See above, regarding pip.main. This is the nearest to achievable that I can imagine (and it’s still a lot harder than you seem to be imagining). But I’m unclear how that would be detectably different from a subprocess call (an API that is identical to a subprocess call is pointless, because you can just call pip in a subprocess and be done with it!).

encukou · December 11, 2019, 4:29pm

But I’m unclear how that would be detectably different from a subprocess call (an API that is identical to a subprocess call is pointless, because you can just call pip in a subprocess and be done with it!).

The reason I can see is that interpreter-wide settings like sys.path and Python flags like -I, -B, -v aren’t passed to subprocesses.
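A small experiment illustrates the sys.path half of this point. It is only a sketch: `MARKER` is a hypothetical directory standing in for whatever a wrapper such as ensurepip would add, and `child_sys_path` is a helper invented for the demonstration.

```python
import json
import subprocess
import sys

# Hypothetical directory, standing in for whatever a wrapper such as
# ensurepip would put on sys.path before running pip.
MARKER = "/hypothetical/bundled-wheels"


def child_sys_path():
    """Return sys.path as seen by a freshly started child interpreter."""
    out = subprocess.check_output(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"]
    )
    return json.loads(out)


sys.path.insert(0, MARKER)              # in-process change
in_parent = MARKER in sys.path          # the current interpreter sees it
in_child = MARKER in child_sys_path()   # the child builds its own sys.path
print(in_parent, in_child)
```

The child interpreter computes its own sys.path at startup, so the entry added in the parent is not visible there unless it is passed some other way.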

A concrete proposed API for wrapper tools like ensurepip:

The main function takes a list of CLI arguments, runs pip with those arguments in the current process (with the current sys.path and Python flags like -I, -B, -v), and then exits (as with sys.exit()).
After pip’s main has finished, the interpreter is in an unknown state; attempts to ignore the resulting SystemExit are unsupported. The main call is not thread-safe.

pf_moore · December 11, 2019, 4:53pm

Thanks. As I said above, I recommend that you check pip’s tracker history. For example, from what I recall, the calling program cannot use threads. Not just that main isn’t thread-safe, but even having threads in the main program confuses pip. Also, the calling program probably shouldn’t configure logging, or pip’s logging config might not work as expected. I’m pretty sure I recall tickets on both of those points, but I’ve no idea of any of the details.

But if you want to put together a full proposal/PR on the basis of just this, I wouldn’t personally reject it out of hand. I can’t speak for other pip maintainers, of course (see Create a supported "high level" programmatic API for pip · Issue #3121 · pypa/pip · GitHub, which was ultimately rejected, where I proposed something along these lines, but a bit more ambitious, myself).

pradyunsg · December 11, 2019, 5:22pm

I concur. All that @pf_moore has said above is something that’s in line w/ my thoughts as well.

pradyunsg · December 11, 2019, 5:24pm

I’m personally in favor of changing to a runpy call for that call-site – bpo-38488: Upgrade bundled versions of pip & setuptools by xavfernandez · Pull Request #16782 · python/cpython · GitHub.

pf_moore · December 11, 2019, 7:22pm

I’ve not really looked into the details of using runpy in this situation, and honestly I’m too tired right now to think through the implications, but would it be practical to document using runpy as an alternative “supported way of running pip”? I don’t know if it would behave closer to the way @encukou was suggesting (respecting sys.path and Python flags like -I, -B and -v) but if so it might be sufficient to handle his use case.

pradyunsg · December 12, 2019, 6:16am

runpy provides the functionality of replicating python <dir/file> or python -m ... in-process. Essentially, it allows running Python scripts just like invoking the interpreter would.

Think of invoking with runpy as executing pip.__main__ as __main__, in the same interpreter process. Since it’s in-process execution, I’m not comfortable making this a “supported” way for using pip, as part of a bigger application right now.

The main benefit of using it this way, is that we can change how pip.__main__ works (and where our main function resides) – as long as python -m pip works and users of runpy don’t do things like invoke pip twice in the same process or affect logging etc, pip should work fine – there’s no guarantee that everything will be OK, but this is definitely better and more robust than from pip._internal.main import main; main(["install", "six"]).

If someone wants to explore a “pip API”, I’d suggest identifying + documenting what global state pip affects/depends on. Based on this, we can try trimming that list and having very defensive¹ code in pip.__main__ to then enable us supporting “pip-via-runpy” as a supported API.

¹ To allow only “interpreter states” that we know pip works in (allow-filter, not a block-filter) since that’s the best way to keep things simple. As an example, pip doesn’t work if you’ve already configured logging or it’s being run a second time in the same process etc. The investigation should identify other “required” constraints like this.
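As a rough sketch of what such an allow-filter could look like (not pip code): every name below is hypothetical, and the three checks only mirror constraints mentioned in this thread (a second run in the same process, pre-configured logging, other threads). A real list would come out of the investigation suggested above.

```python
import logging
import threading

_already_ran = False  # hypothetical module-level flag: "pip ran here before"


def unsupported_state_reasons():
    """Return reasons the current interpreter state is outside the allow-list.

    An empty list means every check passed.
    """
    reasons = []
    if _already_ran:
        reasons.append("already run once in this process")
    if logging.getLogger().handlers:
        reasons.append("logging is already configured on the root logger")
    if threading.active_count() > 1:
        reasons.append("other threads are running")
    return reasons


def guarded_main(real_main, args):
    """Run real_main(args) only from an allowed state (hypothetical wrapper)."""
    global _already_ran
    reasons = unsupported_state_reasons()
    if reasons:
        raise RuntimeError("unsupported interpreter state: " + "; ".join(reasons))
    _already_ran = True
    return real_main(args)
```

The design choice is that anything not explicitly known to be safe is refused, so new failure modes surface as an error message instead of as subtle misbehaviour.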

pf_moore · December 12, 2019, 8:33am

Thanks for the explanation. I agree that the fact that it’s in-process makes it as problematic as any other “pip API” proposal. But it would be nice to capture the fact that this is the best way to run pip.main for those applications (like ensurepip) that are doing so. Maybe we should add a comment to pip.main itself, saying something like:

Do not run this directly! Running pip in-process is unsupported and unsafe. Also, the location of this function may change. If you have to call this function, do so using runpy as follows:
sys.argv = ["pip", your, args, here]
runpy.run_module("pip", run_name="__main__")
This still has all of the issues with running pip in-process, but ensures that you don’t rely on the (internal) name of the main function.

(Having written this up, I can confirm that it’s tricky to get right, so I would like it documented.) I’ll take the suggestion to the pip tracker.

pradyunsg · December 12, 2019, 9:13am

The only two arguments for maintaining a “pip API” that have been stated here are:

sub-processes have overhead / CLI is not an API
Python flags aren’t passed to subprocesses

I do not see the former as a good reason to justify the costs of maintaining such an API in pip. The latter is a bit more compelling (and tricky) and definitely warrants more discussion (as noted already).

As noted in the OP, we’ve put in a lot of effort to communicate clearly why pip doesn’t have an API. Unless someone solves those issues or provides a robust way to avoid them without significantly increasing maintenance workload, I’m wary of adding an API. I think at this point everyone involved knows this but it doesn’t hurt to reiterate: preparing, implementing and maintaining an API would be a lot of work and we don’t currently have the maintainer availability to deal with that.

In my opinion, even if we have significantly greater maintainer availability (eg. funded maintainer roles), there are more impactful changes to invest effort into, compared to “provide an extremely constrained and limited API for something that’s already possible with a subprocess call (even though the exact invocation may be tricky)” – I’d still prefer that we work on more impactful enhancements and toward reducing maintenance overhead.

pradyunsg · December 12, 2019, 10:14am

For anyone interested in following this, Paul has filed a PR for this: Add a comment showing how to call main using runpy by pfmoore · Pull Request #7471 · pypa/pip · GitHub

bernatgabor · December 12, 2019, 11:54am

Personally I’m fine with only giving CLI level API for pip. E.g. only allowing stuff such as:

from pip import run

return_code = run(["list"])  # one can intercept stdout/stderr here for more details

return_code = run(["install"])
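For comparison, something with the same shape can already be built today on top of a subprocess. The sketch below is hypothetical (pip does not ship a `run` function); it returns the captured output alongside the return code.

```python
import subprocess
import sys


def run(args):
    """Run `python -m pip <args>` in a subprocess of the current interpreter.

    Returns (return_code, stdout, stderr). Hypothetical helper that mimics
    the signature sketched above; it is not part of pip.
    """
    completed = subprocess.run(
        [sys.executable, "-m", "pip", *args],
        capture_output=True,
        text=True,
    )
    return completed.returncode, completed.stdout, completed.stderr
```

This keeps pip isolated in its own process, at the price of one interpreter start-up per call.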

The issue I found troublesome is more that this is inaccessible in-process, which forces one to use a subprocess. This can be expensive (non-negligible on Windows, especially when you have many interactions to perform with pip).

I believe the reason for not having such an in-process interface is mostly that pip is not designed/tested to still work when not starting from scratch (i.e. there are probably a lot of state objects lying around). Is that right?

pf_moore · December 12, 2019, 12:09pm

Correct. If we didn’t both rely on and alter global state, a main() interface would be easy enough to support. But getting to such a state, as @pradyunsg mentioned, is quite a lot of work, and there’s plenty of higher priority tasks we’d prefer to tackle first.

bernatgabor · December 12, 2019, 12:13pm

So the actionable item here for someone wanting an in-process API would be to write some tests; fix the issues found; create the PR?

pf_moore · December 12, 2019, 12:42pm

I guess so. But to repeat what I said above:

Feel free to do as you suggest. But expect very strong pushback (or possibly worse, little or no interest) on any PR. We’ve been round this cycle so many times that we’re pretty burned out on it, and the burden of proof is very much on anyone proposing an API.

bernatgabor · December 12, 2019, 12:58pm

Propose an actual API that they would want exposed.

The API would be 1:1 with what exists now: the CLI. It would just be made available in-process.

Explain how it would be maintained and supported, and review and clearly explain how supporting that API would (or would not) constrain our ability to make future internal changes to pip.

No new constraints apply other than the ones we have now on evolving the CLI API.

Offer to write the documentation and the code for the API, and provide support for a suitable period of time.

This would be part of that PR, I’d figure.

pf_moore · December 12, 2019, 1:31pm

We must be misunderstanding each other, because that to me is simply incorrect. We would, for example, be constrained in future from modifying the CWD arbitrarily in pip’s internal code, from assuming that sys.stdout is a writeable IO stream, or that we are in full control of the logging subsystem, etc, etc.

Those are all constraints that would need to be agreed and documented, so that future pip developers didn’t inadvertently break (or be broken by) user code. And they are constraints that do not apply at the moment.
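To make these constraints concrete: without such guarantees, a caller running pip in-process would have to defend against these changes itself. The sketch below shows one hypothetical caller-side guard (`preserve_process_state` is an invented name). It covers only the cwd, `sys.argv` and the stdio streams; logging configuration and threads would need separate handling.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def preserve_process_state():
    """Restore cwd, sys.argv and the stdio streams after the block runs.

    Covers only part of the process-global state discussed above.
    """
    cwd = os.getcwd()
    argv = list(sys.argv)
    streams = (sys.stdin, sys.stdout, sys.stderr)
    try:
        yield
    finally:
        os.chdir(cwd)
        sys.argv[:] = argv
        sys.stdin, sys.stdout, sys.stderr = streams
```

Code that changes the working directory or replaces `sys.stdout` inside the `with` block leaves the caller's state as it was once the block exits, even if the block ends with SystemExit.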

I feel like I’m simply reiterating comments I’ve already made at this point, so for me, this discussion has reached the point of diminishing returns. That’s fairly typical of this type of proposal, so I’ll bow out now. Either someone will come up with something genuinely new in the way of a proposal, and we’ll finally have a way forward, or the discussion will die down as it has in the past. I hope the former is the case.