Examples of why parsing `requirements.txt` files to feed project metadata is a bad idea

Many projects declare their dependencies in a requirements.txt file and then parse it in setup.py to feed the install_requires metadata key. Here are some interesting examples.

Apart from the fact that setup.py files are losing their privileged position in the ecosystem (with setuptools deprecating direct CLI invocations), and that requirements files should be for concrete dependencies whereas project metadata is for abstract dependencies (this blog post never gets old, does it?), could you help me find some technical justifications for why this is a bad idea, or ways in which it might break?

And, trying to play devil’s advocate: would any of the people doing this care to elaborate on whether putting those requirements in pyproject.toml would suit their use case fully, or whether there’s something missing? Things that come to mind:

  • Helping systems like dependabot parse dependencies - I think this works nowadays?
  • Helping pip-compile freeze dependencies - I’m quite sure this works nowadays (I helped contribute to it)
  • ❓

Maybe some answers here (the whole thread is interesting):

This is irrelevant, since it is possible to use a file in pip’s requirements.txt format as input for the list of dependencies in pyproject.toml and setup.cfg (assuming setuptools is still the build back-end).
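
For reference, a minimal sketch of what that looks like with setuptools’ dynamic metadata in pyproject.toml (the project name here is hypothetical, and the referenced file must contain plain requirement lines only):

[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "my-package"  # hypothetical
version = "0.1.0"
dynamic = ["dependencies"]

[tool.setuptools.dynamic]
# setuptools reads this file at build time; only PEP 508 lines and comments are allowed
dependencies = { file = ["requirements.txt"] }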

Let me ask the question the other way around. What’s the point in having requirements.txt at all? Why not just put the dependencies in pyproject.toml? It’s a standardised format that’s readable by tools, so it’s easier to use than requirements.txt.

Unless requirements.txt contains different data than pyproject.toml (for instance, the full frozen dependency list, including transitive dependencies, for application deployment rather than for package metadata), it has no value.


It’s a gross misstatement to say that “setup.py is going away.” Can you provide any references to back that up?

What I’ve understood is that calling setup.py directly from the command line is no longer supported in recent versions of SetupTools, but it’s still an integral part of lots of packaging, and I’ve seen no plans to get rid of support for setup.py files nor to provide a different way to run arbitrary Python during package building (which is quite necessary for a number of complicated projects). As for populating install_requires from requirements.txt-type files, a widely used SetupTools plugin I help maintain supports this, in part because it needs to work with projects which have been around since long, long before PEP 517 was dreamt of.

Over a decade ago we had (and still have) a very large number of projects which needed some packaging consistency enforced, and we determined that declarative configuration was the way to go about that, so we added a setup.cfg file (back before SetupTools copied that idea) but kept dependencies in requirements.txt files for compatibility with pip install -r, so that we could install them independently of the projects themselves in order to facilitate specific workflows (before pip had support for dependency-only installation). This was basically an evolution of the distutils2 project, and while it’s going strong and supports pyproject.toml files today with project.dependencies instead of requirements.txt, it also continues to support the earlier configuration mechanisms because our community strongly values backward compatibility.

In our projects, we use requirements.txt files to indicate what version ranges of direct dependencies the project supports, and then apply constraints files (pip install -c) to get very specific versions for different test scenarios. It was people doing development work on our projects who introduced constraints support into pip in the first place, precisely to enable this distinction between general and specific version lists, at a time when pip still lacked a coherent dependency solver. It seems like a bit of a pivot that people are now declaring requirements.txt files should be used only for specific dependency versions, when that’s exactly the opposite of how we saw it when pip’s constraints feature was designed (and the opposite of how our ecosystem of many hundreds of projects continues to use requirements and constraints files to this day).
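
To sketch that workflow (all file names and contents here are hypothetical): requirements.txt declares the ranges of direct dependencies the project supports, a constraints file pins exact versions for one scenario, and pip combines the two at install time:

# requirements.txt: direct dependencies, with supported ranges
requests>=2.25,<3
PyYAML>=5.1

# upper-constraints.txt: exact versions for one test scenario
# (constraints files may also pin transitive dependencies)
requests==2.31.0
PyYAML==6.0.1
urllib3==2.0.7

# combined at install time:
# pip install -r requirements.txt -c upper-constraints.txt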

I’ll note that dependabot support means very little to our projects, as we value open source services and so don’t do our development on proprietary platforms like GitHub, but we have CI jobs which serve a similar purpose by automatically proposing and testing version increases for packages in shared central constraints files rather than per-project requirements.txt or pyproject.toml files.

@astrojuanlu I was under the impression that this topic had been discussed back and forth many times already, so let me ask: what has prompted you to bring it up today? Is there some specific context or reason?

By the way, the Sourcegraph link you posted does not seem to show anything relevant to me. It tells me “Unable To Process Query – Search context not found”. Is there something specific under that link you wanted us to see or read? Maybe you could briefly describe here what is “interesting” about the examples I assume it should show.

I maintain a large number of Python packages for Gentoo Linux. A large part of that effort involves diffing between successive package releases to find out whether dependencies have changed (and therefore whether I have to update the list in the Gentoo package).

In this context, I find requirements.txt files inconvenient, if only for the simple reason that their names are unpredictable. Some projects have requirements.txt, others have requirements/*.txt, and still others use .txt for locked versions and .in for the original data.

If not for these files, I would just have to look for pyproject.toml, setup.cfg and setup.py (the list is already getting long…). Now I also have to maintain an ever-growing list of arbitrarily named files to look for, which sometimes trigger false positives.


Hi @astrojuanlu, @fungi is correct here.
setup.py remains a perfectly valid configuration file.
Setuptools is deprecating only its usage as a CLI tool.

The issue pointed out by @sinoroc offers a good compilation of ideas on both sides.


Other than the standard abstract vs. concrete debate and the direct consequences of using concrete dependencies[1], the most common mistake I have seen people make is to assume that the build backend can parse any arbitrary pip flag.

My take is that entangling setup.py/setup.cfg/pyproject.toml with requirements.txt is not for everybody. It is error-prone and should only be done after careful consideration. The person needs to understand that the file should only contain abstract dependencies AND that every non-comment line should be compliant with PEP 508: no pip flags allowed.
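
To make the failure mode concrete, here is a hypothetical requirements.txt mixing both kinds of lines; pip accepts all of them, but a build backend reading the file as metadata input can only cope with the first two:

# plain PEP 508 requirements: fine as metadata input
requests>=2.25
numpy; python_version < "3.12"

# pip-specific syntax: pip understands these, a build backend does not
-r common.txt
-e .
--index-url https://example.com/simple/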


  1. which progressively makes dependency resolution harder and prevents final users from obtaining bug fixes or security patches. ↩︎

Thanks @fungi and @abravalheri, that was careless wording and I’ve edited my post accordingly.

Also thanks everybody for chiming in.

Possibly, but I did not quite find them, perhaps because I was not sure what to search for. And the reason to bring it up today is that I was participating in a GitHub issue related to this topic.

Whoops, sorry - I assumed copy-pasting the URL would work, but I think Discourse is doing some extra mangling. Try this: https://sourcegraph.com/search?q=context%3Aglobal+file%3Asetup.py+requirements.txt&patternType=standard&sm=1&groupBy=repo (editing my post accordingly)


Now I have a bit more clarity: requirements.txt files might contain any syntax that pip accepts, whereas project metadata dependencies need to be processed by the build backend, which is a totally separate piece of software.


@astrojuanlu

Looks like it is a case of “we want our setup.py / setup.cfg / pyproject.toml to get the list of dependencies from a requirements.txt file because other tools in the Python packaging ecosystem like Dependabot and pip-tools do not understand anything but pip’s requirements.txt file format”. Is that right?

If yes, then as far as I can tell it is mostly a thing of the past. Nowadays most tools in the Python packaging ecosystem are able to extract the list of dependencies from pyproject.toml’s [project] section (which is clearly defined by a standard specification, unlike pip’s requirements.txt file format). If they are not able, they should learn.

If for one reason or another your code really needs to parse a requirements.txt file, there is at least one library I know of that is focused on doing just this: pip-requirements-parser (probably better than pkg_resources.parse_requirements()).
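
Something along these lines, going by that library’s documentation (an untested sketch; exact attribute names may differ):

# pip install pip-requirements-parser
from pip_requirements_parser import RequirementsFile

rf = RequirementsFile.from_file("requirements.txt")
for req in rf.requirements:
    # each entry wraps a pip-style requirement with its name and version specifier
    print(req.name, str(req.specifier))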

If your code needs to parse the list of dependencies out of pyproject.toml’s [project] section, then a simple TOML parser should be good enough. There is one in Python’s standard library since version 3.11: tomllib.
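
A minimal sketch, assuming the dependencies are declared statically under [project]:

import tomllib  # in the standard library since Python 3.11

with open("pyproject.toml", "rb") as f:  # tomllib requires binary mode
    pyproject = tomllib.load(f)

# raises KeyError if the project declares its dependencies as dynamic
print(pyproject["project"]["dependencies"])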

If your code needs to obtain the list of dependencies for a project that is [build-system]-compatible but either not [project]-compatible or whose dependencies are marked as dynamic, then I can recommend using build.util.project_wheel_metadata(). Maybe like this:

python -c "import build.util; print(build.util.project_wheel_metadata('.').get_all('Requires-Dist'))"

Is it possible to disagree with this blog post or is it to be taken as gospel?

It’s fine to point out that requirements.txt can be used to designate concrete dependencies. But my experience is that many requirements files are not concrete. They don’t explicitly list an index URL. They don’t explicitly pin versions for most dependencies.

(another common use case I see, though, is to have several requirements files for different use cases; for example, one requirements file with all the optional dependencies for a more thorough test suite)

The packaging discussions would be helped if the PyPA looked at how their products are actually used in the wild, rather than simply reiterating how they should be used in an idealistic worldview of how packaging works.

The blog post quickly moves on from opposing install_requires vs. requirements.txt to opposing abstract vs. concrete dependencies. And in my opinion that is what the reader should really take away from the blog post.

In a Pipenv context, you could probably substitute install_requires with Pipfile and requirements.txt with Pipfile.lock. But these two have very different syntaxes, so it is not possible to copy-paste the content of one file into the other, and no one tries to do that. Whereas the syntax of install_requires and requirements.txt is almost exactly the same, so people want to, and do, use them interchangeably, which can be good or bad depending on how exactly it is done.
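
For illustration, the same hypothetical dependency in each syntax:

# Pipfile (TOML):
[packages]
requests = {version = ">=2.20", extras = ["socks"]}

# requirements.txt and install_requires (PEP 508), near-identical to each other:
requests[socks]>=2.20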

If your project has a requirements.txt file containing only abstract dependencies, then feel free to use it as the source for your install_requires. In the current packaging world this seems less and less needed; more and more tools are offering better ways to do this kind of thing. But at the time the blog post was written, things were very different, and putting abstract dependencies in a file in requirements.txt format was quite a convenient way to work around some other limitations of the Python packaging ecosystem. So people did. People who knew what they were doing (i.e. using it for abstract dependencies only). Then other people saw this. People who did not know what they were doing. And they ended up having concrete dependencies in their library’s packaging metadata. Which is bad.

@pitrou What is your opinion on the following article? Is it better? What should be improved?

https://packaging.python.org/en/latest/discussions/install-requires-vs-requirements/

Personally I am of the opposite opinion. If there is one organization that should focus on publishing and advertising the safe and recommended ways of doing things, it is the PyPA. There are plenty of other individuals and organizations publishing tips and tricks (good and bad) on how to push the tools and workflows beyond their “comfort zone”. I am quite sure PyPA members are well aware of how packaging tools are used in the wild, because they are the ones receiving the bug tickets, the complaints, and so on. And it seems to me that these wild usages do influence, bit by bit, the evolution of the tools and the PEPs written by PyPA members. For example, setuptools now supports easy inclusion of an external file (with limited requirements.txt syntax) as the source for install_requires (in setup.cfg and pyproject.toml). Some are working on PEPs for a lock file format.

Much better indeed, and less dogmatic.

I don’t think we’re disagreeing here.

I like this a lot. Is this meant to become a piece of official PyPA documentation? If so, I do think it could be expanded to cover not only setup.py but abstract dependency specifications in general (pyproject.toml, setup.cfg), with install_requires as an example. The requirements files section could perhaps mention that a common name for such a file is requirements.txt, and give a motivation for why you would want repeatable installs in the first place, e.g. for repeated testing.

Just what comes to mind when I read it, in case it helps. Overall I think it’s already quite nice.

Since you’re asking about dependabot support for pyproject.toml, I thought you might be interested to know it’s not completely there yet, but they’re working on it (their next release should resolve the most immediate bug). I’ve created two issues on their repo, but so far I’ve been the only one with real Python experience chiming in, and I think my experience is rich in some areas but quite lacking in others. So if you or others in this thread care about how dependabot resolves this, it may be worth having a look.


It already is and was. For sure there is room for improvement. Feel free to open a ticket with your suggestions and tag me there (@sinoroc); we could work on it.
