If Pygments required an extension module, how would it affect pip?

takes off noob packaging contributor hat

puts on Pygments committer hat

Pygments (the syntax highlighting library) uses regular expressions pervasively. There have been discussions [1] about switching from the stdlib re module to the third-party regex module, which is a C extension module.

Pip depends on Pygments, through rich. Pip also has an obvious bootstrapping problem and solves this problem by vendoring all its dependencies.

If Pygments started to depend on an extension module, could pip still use it? Alternatively, I’m guessing pip could patch rich to not require Pygments; a cursory look at the code suggests this would not be difficult, but I’m not totally sure.

  1. These discussions have not been active recently because all the maintainers really lack time, but I’m trying to invest a little more energy in the project right now.


Have there been discussions about moving regex into the stdlib? Isn’t it at this point strictly better than the stdlib version?

Perhaps the Pygments dependency in rich can be made optional (or relegated to an extra)? I don’t think code highlighting should be required for pip to function (or even fancy terminal formatting, for that matter).
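
To illustrate what an optional dependency could look like on the consuming side, here is a minimal sketch. It is not rich's actual code; `render_code` is a hypothetical helper, and the only assumption is that Pygments may or may not be importable.

```python
def render_code(source: str) -> str:
    """Return highlighted source if Pygments is importable, else plain text."""
    try:
        # Imported lazily so that a missing Pygments is not a hard failure.
        from pygments import highlight
        from pygments.formatters import TerminalFormatter
        from pygments.lexers import PythonLexer
    except ImportError:
        # Pygments is absent: degrade to unstyled output instead of failing.
        return source
    return highlight(source, PythonLexer(), TerminalFormatter())


print(render_code('print("hello")\n'))
```

With this shape, highlighting becomes a nicety rather than a requirement, which is roughly what pip would need.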


To answer the direct question here: pip wouldn’t be able to vendor the extension, so it would no longer be able to vendor Pygments. Ideally rich would make Pygments an optional dependency, but if not, we’d have to look at either patching rich or pinning an old version of Pygments, neither of which is ideal.

The immediate practical answer is we wouldn’t upgrade our vendored copy of pygments until we found the resource to address the issue.
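
For anyone wondering how to tell the two kinds of module apart, here is a rough sketch. It is not how pip decides what to vendor; `is_extension_module` is a hypothetical helper that only inspects which loader the import system would use.

```python
import importlib.machinery
import importlib.util


def is_extension_module(name: str) -> bool:
    """Report whether a top-level module would be loaded from a compiled file."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(name)
    # Compiled modules (.so / .pyd) are loaded by ExtensionFileLoader;
    # pure-Python modules use SourceFileLoader or similar.
    return isinstance(spec.loader, importlib.machinery.ExtensionFileLoader)


print(is_extension_module("json"))
```

A compiled module is built per platform and Python version, so it can't simply be copied into a vendored source tree the way a pure-Python package can.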


Could pygments add the new dependency as desired but keep fallback code for the missing import?

Not really. The reasons for switching to regex include its richer regular expression syntax (e.g., with Unicode category character classes and more control over backtracking). If we start using these features in the thousands of regexes that Pygments contains, I don’t see us maintaining parallel lexers that use re regexes.
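
A small sketch of why the import fallback alone doesn't help. The pattern below is made up for illustration (it is not taken from any Pygments lexer), and `stdlib_accepts` is a hypothetical helper; the point is only that a pattern written for regex may not compile under re at all.

```python
import re

try:
    import regex  # third-party C extension; may not be installed
except ImportError:
    regex = None

# Uses a Unicode category class, a regex feature that stdlib re lacks.
PATTERN = r"\p{Lu}\w*"


def stdlib_accepts(pattern: str) -> bool:
    """Return True if stdlib re can compile the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# The fallback import succeeds either way, but that is not enough:
# the pattern itself has to be valid for whichever engine ends up in use.
print("regex available:", regex is not None)
print("stdlib re accepts pattern:", stdlib_accepts(PATTERN))
```

On Python 3.7 and later, re rejects `\p` as a bad escape, so every such pattern would need a hand-written re equivalent.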


I have no idea how hard it is, but we should push (or contribute to) rich to make Pygments an optional dependency.

This has been suggested before in Textualize/rich issue #2277, “[REQUEST] Add a [minimal] version of rich without commonmark and pygments”, but it would break backwards compatibility, since a default install would no longer include them. It would be useful for extras to be able to exclude dependencies, but currently that isn’t possible AFAIK.

I think rich can do that, if we can also figure out how default extras should work. 🙂