Helping dependabot and github detect python dependencies

ssbarnea · July 19, 2020, 7:05am

I wonder if the packaging folks around here can give a few hints to the dependabot team on how to improve their currently ancient approach to detecting the dependencies of a Python repository.

The current status is quite bad: it fails to detect any dependency declared in setup.cfg, though they have a PR to address it: https://github.com/dependabot/dependabot-core/pull/2281/files
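For illustration only, here is a minimal sketch of reading such a declaration statically. It assumes the dependencies sit under `[options] install_requires` in setup.cfg with one requirement per line, and it uses only the standard-library `configparser`. The function name and the sample content are hypothetical, and this is not what the linked PR does.

```python
import configparser

# Hypothetical sample content; a real setup.cfg would be read from disk.
SAMPLE_SETUP_CFG = """\
[metadata]
name = example

[options]
install_requires =
    requests>=2.20
    attrs
"""


def read_install_requires(setup_cfg_text):
    """Return the install_requires entries found in setup.cfg text.

    Only handles the one-requirement-per-line layout; no project code is run.
    """
    # Disable interpolation so a literal "%" in a value is not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(setup_cfg_text)
    raw = parser.get("options", "install_requires", fallback="")
    lines = (line.strip() for line in raw.splitlines())
    # Drop blank lines and full-line comments.
    return [line for line in lines if line and not line.startswith("#")]


print(read_install_requires(SAMPLE_SETUP_CFG))
```

A file without an `[options]` section simply yields an empty list here, which is one way such a tool could fall back to other sources.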

Once that is done, we should start seeing dependencies between Python packages on GitHub; right now this works only for ancient Python projects.

ofek · July 19, 2020, 7:46pm

It’ll be standardized soon: PEP 621: how to specify dependencies?

One thing that would help users is the ability to select the file. My team at Datadog isn’t using it because, back when I investigated, the file name was assumed to be requirements.txt at the root, but ours is https://github.com/DataDog/integrations-core/blob/master/datadog_checks_base/datadog_checks/base/data/agent_requirements.in

ssbarnea · August 17, 2020, 8:34am

I assume “soon” counted as sarcasm. Anyone trying to read the thread will realise that two years from now the PEP 621 issue will still be open to debate.

pganssle · August 17, 2020, 2:58pm

I mean, it’s obviously not sarcasm, but regardless I also am not sure that that solves the problem.

The way to get dependencies for a project is standardized. You can use PEP 517, or you can build a wheel and look at what its dependencies are. There is currently no standardized way to know whether or not those dependencies are reliable and cross-platform. PEP 621 may help with that to the extent that it is adopted, but that won’t always work, and I expect adoption to be gradual even if it were accepted and implemented tomorrow. Still, I believe using the existing strategies is probably better than what dependabot is doing now.
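As a rough sketch of the "build a wheel and look" route: a wheel is a zip archive whose `*.dist-info/METADATA` file lists dependencies as `Requires-Dist` headers, so they can be read with the standard library alone. The function name and the in-memory demo wheel below are hypothetical; building the real wheel (which is where project code runs) is assumed to have happened already.

```python
import io
import zipfile
from email.parser import Parser


def wheel_requires(wheel_file):
    """Return the Requires-Dist entries recorded in a built wheel.

    `wheel_file` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(wheel_file) as wheel:
        # The metadata lives at <name>-<version>.dist-info/METADATA.
        metadata_name = next(
            name for name in wheel.namelist()
            if name.endswith(".dist-info/METADATA")
        )
        text = wheel.read(metadata_name).decode("utf-8")
    # METADATA uses an email-header style format.
    message = Parser().parsestr(text)
    return message.get_all("Requires-Dist") or []


# Demo: a tiny stand-in for a wheel, built in memory (hypothetical content).
demo_metadata = (
    "Metadata-Version: 2.1\n"
    "Name: example\n"
    "Version: 1.0\n"
    "Requires-Dist: requests (>=2.20)\n"
    'Requires-Dist: pywin32 ; sys_platform == "win32"\n'
)
demo_wheel = io.BytesIO()
with zipfile.ZipFile(demo_wheel, "w") as archive:
    archive.writestr("example-1.0.dist-info/METADATA", demo_metadata)
demo_wheel.seek(0)

demo_requires = wheel_requires(demo_wheel)
print(demo_requires)
```

In the demo the Windows-only entry carries an environment marker, so it is visible from any platform. A setup.py that adds dependencies conditionally at build time would not show up this way, which is the reliability gap described above.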

brettcannon · August 17, 2020, 8:55pm

I think the problem there with dependabot is that it simply wants to read static metadata from files to determine what needs updating and then submit a PR to update things. Asking dependabot to run Python code for every Python repo on GitHub is probably asking a bit too much. This is an argument for having a Structured, Exchangeable lock file format (requirements.txt 2.0?), but that’s yet another debate that I don’t think any of us want to attempt to tackle right now until the preexisting ones are settled.

pganssle · August 17, 2020, 9:15pm

Well, the PR apparently is actually doing this, just not in an ideal way.

I am not sure there’s ever going to be a “correct” way to generically parse Python dependencies that doesn’t allow for the possibility of executing Python code. The linked PR is trying to automatically detect a dependency graph, so I don’t think they can count on users adopting any specific form of lock file (otherwise they’d get terribly skewed dependency graphs). That’s made even worse by the fact that most libraries have very little use for a lock file except with regard to testing (which isn’t quite what dependabot wants to measure anyway, I gather).

Considering that the current behavior is that they modify the text of your setup.py and then call exec on it, I think they might not be too concerned about arbitrary code execution.
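To make the general idea concrete, here is a hypothetical sketch of the "run setup.py and capture what it passes to `setup()`" approach. It is not dependabot's actual code: it swaps a stub `setuptools` module into `sys.modules` instead of rewriting the file's text, and every name in it is invented for illustration. Note that it really does execute whatever the file contains.

```python
import sys
import types
from unittest import mock

# Hypothetical setup.py content used for the demo.
SAMPLE_SETUP_PY = '''
from setuptools import setup, find_packages

setup(
    name="example",
    packages=find_packages(),
    install_requires=["requests>=2.20", "attrs"],
)
'''


def capture_install_requires(setup_py_source):
    """Execute setup.py source with a stub setuptools and return install_requires."""
    captured = {}

    def fake_setup(**kwargs):
        # Record the keyword arguments instead of building anything.
        captured.update(kwargs)

    stub = types.ModuleType("setuptools")
    stub.setup = fake_setup
    stub.find_packages = lambda *args, **kwargs: []

    # Temporarily replace setuptools so "from setuptools import setup" gets the stub.
    with mock.patch.dict(sys.modules, {"setuptools": stub}):
        code = compile(setup_py_source, "setup.py", "exec")
        exec(code, {"__name__": "__main__"})  # arbitrary code execution happens here

    return list(captured.get("install_requires", []))


print(capture_install_requires(SAMPLE_SETUP_PY))
```

Anything else the file does at import time (network calls, reading other files, platform checks) also runs, which is why this kind of thing belongs in a sandbox.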

njs · August 17, 2020, 9:23pm

Dependabot (unlike some of their competitors) already does run arbitrary sandboxed code to handle dependency updates, e.g. if you have requirements.in files then they update them by running pip-compile in a sandbox. I think the hard part is figuring out how to map those back to a source patch that will update the dependencies.