Examples of why parsing `requirements.txt` files to feed project metadata is a bad idea

Many projects declare their dependencies in a requirements.txt file and then parse it in setup.py to feed the install_requires metadata key. Here are some interesting examples.

Apart from the fact that setup.py files are losing their privileged position in the ecosystem (with setuptools deprecating direct CLI invocations), and that requirements files should be for concrete dependencies whereas project metadata is for abstract dependencies (this blog post never gets old, does it?), could you help me find some technical justifications for why this is a bad idea, or ways in which it might break?

And, trying to play devil’s advocate: would any of the people doing this care to elaborate on whether putting those requirements in pyproject.toml would suit their use case fully, or whether there’s something missing? Things that come to mind:

  • Helping systems like dependabot parse dependencies - I think this works nowadays?
  • Helping pip-compile freeze dependencies - I’m quite sure this works nowadays (I helped contribute to it)
  • ❓

Maybe some answers here (the whole thread is interesting):

This is irrelevant, since it is possible to use a file in pip’s requirements.txt format as input for the list of dependencies in pyproject.toml and setup.cfg (assuming setuptools is still the build back-end).
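
For reference, a minimal sketch of what that looks like with setuptools’ dynamic metadata in pyproject.toml (the project name here is hypothetical, and the referenced file must contain plain requirement lines only):

[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "my-package"  # hypothetical
version = "0.1.0"
dynamic = ["dependencies"]

[tool.setuptools.dynamic]
# setuptools reads this file at build time; only PEP 508 lines and comments are allowed
dependencies = { file = ["requirements.txt"] }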

Let me ask the question the other way around. What’s the point in having requirements.txt at all? Why not just put the dependencies in pyproject.toml? It’s a standardised format that’s readable by tools, so it’s easier to use than requirements.txt.

Unless requirements.txt contains different data than pyproject.toml (for instance, the full frozen dependency list, including transitive dependencies, for application deployment rather than for package metadata), it has no value.


It’s a gross misstatement to say that “setup.py is going away.” Can you provide any references to back that up?

What I’ve understood is that calling setup.py directly from the command line is no longer supported in recent versions of SetupTools, but it’s still an integral part of lots of packaging, and I’ve seen no plans to get rid of support for setup.py files nor to provide a different way to run arbitrary Python during package building (which is quite necessary for a number of complicated projects). As for populating install_requires from requirements.txt-type files, a widely used SetupTools plugin I help maintain supports this, in part because it needs to work with projects which have been around since long, long before PEP 517 was dreamt of.

Over a decade ago we had (and still have) a very large number of projects which needed some packaging consistency enforced, and we determined that declarative configuration was the way to go about that, so we added a setup.cfg file (back before SetupTools copied that idea) but kept dependencies in requirements.txt files for compatibility with pip install -r, so that we could install them independently of the projects themselves in order to facilitate specific workflows (before pip had support for dependency-only installation). This was basically an evolution of the distutils2 project, and while it’s going strong and supports pyproject.toml files today with project.dependencies instead of requirements.txt, it also continues to support the earlier configuration mechanisms because our community strongly values backward compatibility.

In our projects, we use requirements.txt files to indicate what version ranges of direct dependencies the project supports, and then apply constraints files (pip install -c) to get very specific versions for different test scenarios. It was people doing development work on our projects who introduced constraints support into pip in the first place, precisely to enable this distinction between general and specific version lists, at a time when pip still lacked a coherent dependency solver. It seems like a bit of a pivot that people are now declaring requirements.txt files should be used only for specific dependency versions, when that’s exactly the opposite of how we saw it when pip’s constraints feature was designed (and the opposite of how our ecosystem of many hundreds of projects continues to use requirements and constraints files to this day).
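
To sketch that workflow (all file names and contents here are hypothetical): requirements.txt declares the ranges of direct dependencies the project supports, a constraints file pins exact versions for one scenario, and pip combines the two at install time:

# requirements.txt: direct dependencies, with supported ranges
requests>=2.25,<3
PyYAML>=5.1

# upper-constraints.txt: exact versions for one test scenario
# (constraints files may also pin transitive dependencies)
requests==2.31.0
PyYAML==6.0.1
urllib3==2.0.7

# combined at install time:
# pip install -r requirements.txt -c upper-constraints.txt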

I’ll note that dependabot support means very little to our projects, as we value open source services and so don’t do our development on proprietary platforms like GitHub, but we have CI jobs which serve a similar purpose by automatically proposing and testing version increases for packages in shared central constraints files rather than per-project requirements.txt or pyproject.toml files.

@astrojuanlu I was under the impression that this topic had been discussed back and forth many times already, so let me ask: what has prompted you to bring it up today? Is there some specific context or reason?

By the way, the Sourcegraph link you posted does not seem to show anything relevant to me. It tells me “Unable To Process Query – Search context not found”. Is there something specific under that link you wanted us to see or read? Maybe you could briefly describe here what is “interesting” about the examples I assume it should show.

I maintain a large number of Python packages for Gentoo Linux. A large part of that effort involves diffing between successive package releases to find out whether dependencies have changed (and therefore whether I have to update the list in the Gentoo package).

In this context, I find requirements.txt files inconvenient, if only for the simple reason that their names are unpredictable. Some projects have requirements.txt, others have requirements/*.txt, and still others use .txt for locked versions and .in for the original data.

If not for these files, I would just have to look for pyproject.toml, setup.cfg and setup.py (the list is already getting long…). Now I also have to maintain an ever-growing list of arbitrarily named files to look for, which sometimes trigger false positives.


Hi @astrojuanlu, @fungi is correct here.
setup.py remains a perfectly valid configuration file.
Setuptools is deprecating only its usage as a CLI tool.

The issue pointed out by @sinoroc offers a good compilation of ideas on both sides.


Other than the standard abstract vs. concrete debate and the direct consequences of using concrete dependencies[1], the most common mistake I have seen people make is to assume that the build backend can parse any arbitrary pip flag.

My take is that entangling setup.py/setup.cfg/pyproject.toml with requirements.txt is not for everybody. It is error-prone and should only be done after careful consideration. The person needs to understand that the file should only contain abstract dependencies AND that every non-comment line should be compliant with PEP 508: no pip flags allowed.
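
To make the failure mode concrete, here is a hypothetical requirements.txt mixing both kinds of lines; pip accepts all of them, but a build backend reading the file as metadata input can only cope with the first two:

# plain PEP 508 requirements: fine as metadata input
requests>=2.25
numpy; python_version < "3.12"

# pip-specific syntax: pip understands these, a build backend does not
-r common.txt
-e .
--index-url https://example.com/simple/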


  1. which progressively makes dependency resolution harder and prevents final users from obtaining bug fixes or security patches. ↩︎

Thanks @fungi and @abravalheri, that was careless wording and I’ve edited my post accordingly.

Also thanks everybody for chiming in.

Possibly, but I did not quite find them, perhaps because I was not sure what to search for. And the reason to bring it up today is that I was participating in a GitHub issue related to this topic.

Whoops, sorry - I assumed copy-pasting the URL would work, but I think Discourse is doing some extra mangling. Try this: https://sourcegraph.com/search?q=context%3Aglobal+file%3Asetup.py+requirements.txt&patternType=standard&sm=1&groupBy=repo (editing my post accordingly)


Now I have a bit more clarity: requirements.txt files might contain any syntax that pip accepts, whereas project metadata dependencies need to be processed by the build backend, which is a totally separate piece of software.


@astrojuanlu

Looks like it is a case of “we want our setup.py / setup.cfg / pyproject.toml to get the list of dependencies from a requirements.txt file because other tools in the Python packaging ecosystem like Dependabot and pip-tools do not understand anything but pip’s requirements.txt file format”. Is that right?

If yes, then as far as I can tell it is mostly a thing of the past. Nowadays most tools in the Python packaging ecosystem are able to extract the list of dependencies from pyproject.toml’s [project] section (which is clearly defined by a standard specification, unlike pip’s requirements.txt file format). If they are not able, they should learn.

If for one reason or another your code really needs to parse a requirements.txt file, there is at least one library I know of that is focused on doing just this: pip-requirements-parser (probably better than pkg_resources.parse_requirements()).
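
Something along these lines, going by that library’s documentation (an untested sketch; exact attribute names may differ):

# pip install pip-requirements-parser
from pip_requirements_parser import RequirementsFile

rf = RequirementsFile.from_file("requirements.txt")
for req in rf.requirements:
    # each entry wraps a pip-style requirement with its name and version specifier
    print(req.name, str(req.specifier))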

If your code needs to parse the list of dependencies out of pyproject.toml’s [project] section, then a simple TOML parser should be good enough. There is one in Python’s standard library since version 3.11: tomllib.
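
A minimal sketch, assuming the dependencies are declared statically under [project]:

import tomllib  # in the standard library since Python 3.11

with open("pyproject.toml", "rb") as f:  # tomllib requires binary mode
    pyproject = tomllib.load(f)

# raises KeyError if the project declares its dependencies as dynamic
print(pyproject["project"]["dependencies"])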

If your code needs to obtain the list of dependencies for a project that is [build-system]-compatible but either not [project]-compatible or whose dependencies are marked as dynamic, then I can recommend using build.util.project_wheel_metadata(). Maybe like this:

python -c "import build.util; print(build.util.project_wheel_metadata('.').get_all('Requires-Dist'))"

Is it possible to disagree with this blog post or is it to be taken as gospel?

It’s fine to point out that requirements.txt can be used to designate concrete dependencies. But my experience is that many requirements files are not concrete. They don’t explicitly list an index URL. They don’t explicitly pin versions for most dependencies.

(another common use case I see, though, is to have several requirements files for different use cases; for example, one requirements file with all the optional dependencies for a more thorough test suite)

The packaging discussions would be helped if the PyPA looked at how their products are actually used in the wild, rather than simply reiterating how they should be used in an idealistic worldview of how packaging works.

The blog post quickly moves on from opposing install_requires vs. requirements.txt to opposing abstract vs. concrete dependencies. And in my opinion that is what the reader should really take away from the blog post.

In a Pipenv context, you could probably substitute install_requires with Pipfile and requirements.txt with Pipfile.lock. But these two have very different syntaxes, so it is not possible to copy-paste the content of one file into the other, and no one tries to do that. Whereas the syntax of install_requires and requirements.txt is almost exactly the same, so people want to, and do, use them interchangeably, which can be good or bad depending on how exactly it is done.
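
For illustration, the same hypothetical dependency in each syntax:

# Pipfile (TOML):
[packages]
requests = {version = ">=2.20", extras = ["socks"]}

# requirements.txt and install_requires (PEP 508), near-identical to each other:
requests[socks]>=2.20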

If your project has a requirements.txt file containing only abstract dependencies, then feel free to use it as the source for your install_requires. In the current packaging world this seems less and less needed; more and more tools are offering better ways to do this kind of thing. But at the time the blog post was written, things were very different, and putting abstract dependencies in a file in requirements.txt format was quite a convenient way to work around some other limitations of the Python packaging ecosystem. So people did. People who knew what they were doing (i.e. using it for abstract dependencies only). Then other people saw this. People who did not know what they were doing. And they ended up having concrete dependencies in their library’s packaging metadata. Which is bad.

@pitrou What is your opinion on the following article? Is it better? What should be improved?

https://packaging.python.org/en/latest/discussions/install-requires-vs-requirements/

Personally I am of the opposite opinion. If there is one organization that should focus on publishing and advertising the safe and recommended ways of doing things, it is the PyPA. There are plenty of other individuals and organizations publishing tips and tricks (good and bad) on how to push the tools and workflows beyond their “comfort zone”. I am quite sure PyPA members are well aware of how packaging tools are used in the wild, because they are the ones receiving the bug tickets, the complaints, and so on. And it seems to me that these wild usages do influence, bit by bit, the evolution of the tools and the PEPs written by PyPA members. For example, setuptools now supports easy inclusion of an external file (with limited requirements.txt syntax) as the source for install_requires (in setup.cfg and pyproject.toml). Some are working on PEPs for a lock file format.

Much better indeed, and less dogmatic.

I don’t think we’re disagreeing here.

I like this a lot. Is this meant to become a piece of official PyPA documentation? If so, I do think it could be expanded to cover not only setup.py but abstract dependency specifications in general (pyproject.toml, setup.cfg), with install_requires as an example. The requirements files section could perhaps mention that a common name for such a file is requirements.txt, and give a motivation for why you would want repeatable installs in the first place, e.g. for repeated testing.

Just what comes to mind when I read it, in case it helps. Overall I think it’s already quite nice.

Since you’re asking about dependabot support for pyproject.toml, I thought you might be interested to know it’s not completely there yet, but they’re working on it (their next release should resolve the most immediate bug). I’ve created two issues on their repo, but so far I’ve been the only one with real Python experience chiming in, and I think my experience is rich in some areas but quite lacking in others. So if you or others in this thread care about how dependabot resolves this, it may be worth having a look.


It already is and was. For sure there is room for improvement. Feel free to open a ticket with your suggestions and tag me there (@sinoroc); we could work on it.
