What to do with re.LOCALE?

The re.LOCALE flag and the (?L) mode modifier make a bytes pattern use the current locale to decide what counts as a letter and which characters compare equal in case-insensitive mode.

  • It only works with byte strings, not Unicode strings.
  • It only works in 8-bit locales, not UTF-8 or Shift-JIS. Today 8-bit locales are very rare.
  • It is an order of magnitude slower. It does not allow compile-time optimization, and at runtime calling tolower() is much slower than a simple table lookup.
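To illustrate the first point, here is a minimal sketch of how the flag behaves with the two pattern types on Python 3.6 and later. The variable names are only for illustration.

```python
import re

# A bytes pattern accepts re.LOCALE, both as a flag and as the (?L) modifier.
bytes_pat = re.compile(rb"\w+", re.LOCALE)
inline_pat = re.compile(rb"(?L)\w+")

# A str pattern rejects the flag: compilation raises ValueError.
try:
    re.compile(r"\w+", re.LOCALE)
    str_error = None
except ValueError as exc:
    str_error = exc

print(bool(bytes_pat.flags & re.LOCALE), bool(inline_pat.flags & re.LOCALE))
print(type(str_error).__name__)
```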

In the past there were issues with compiling a pattern in one locale and using it in another, and with caching, but they were fixed a long time ago.

In 2014 I implemented support of re.LOCALE with Unicode strings, but it was not supported.

The only visible effect of using re.LOCALE with Unicode strings (besides the slowdown) was different handling of the letters I, İ, ı and i in the Turkish locale. Turkish is unique among languages in this respect. Maybe there were other differences from normal Unicode matching, but I did not notice them.
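For comparison, this is roughly how normal Unicode matching (no locale involved) treats the two Turkish-specific letters in case-insensitive mode, based on the behavior the re documentation describes for [a-z] with IGNORECASE. The constant names are just for illustration.

```python
import re

DOTTED_CAPITAL_I = "\u0130"  # İ
DOTLESS_SMALL_I = "\u0131"   # ı
letters = (DOTTED_CAPITAL_I, DOTLESS_SMALL_I)

# Default (Unicode) case-insensitive matching: [a-z] also matches İ and ı.
unicode_hits = [bool(re.fullmatch("[a-z]", ch, re.IGNORECASE)) for ch in letters]

# With re.ASCII, only the 52 ASCII letters are matched.
ascii_hits = [bool(re.fullmatch("[a-z]", ch, re.IGNORECASE | re.ASCII)) for ch in letters]

print(unicode_hits, ascii_hits)
```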

I still see re.LOCALE used in code in the wild. In all the cases I looked at it was unnecessary.
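As a sketch of one way such code could drop the flag, assuming the input is bytes in a known 8-bit encoding (Latin-1 here, chosen only as an example): decode first and match with a str pattern, so the default Unicode rules apply.

```python
import re

raw = b"caf\xe9"  # "café" encoded as ISO-8859-1 (Latin-1); an assumed example input

# Without re.LOCALE, a bytes pattern treats only ASCII characters as word characters.
ascii_only = re.fullmatch(rb"\w+", raw)

# Decoding first lets the default Unicode rules recognise the accented letter.
decoded = re.fullmatch(r"\w+", raw.decode("latin-1"))

print(ascii_only, decoded.group())
```

Whether this is a drop-in replacement depends on the code knowing the encoding of its input, which a locale-based pattern only assumed implicitly.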

What should we do with re.LOCALE?

  • Deprecate and remove it.
  • Implement Unicode string support.
  • Leave it as it was.

re.LOCALE made sense when we were still using the bytes version of str(). In the Unicode world, locale support is either no longer necessary or you need to get into the really complex ICU world (but that’s outside the scope of what we can support in the stdlib)…


Not much use in the top 5k projects:

$ python3 ~/github/misc/cpython/search_pypi_top.py -q . "re\.LOCALE"
./drf-extensions-0.7.1.tar.gz: drf-extensions-0.7.1/docs/backdoc.py: "l": re.LOCALE,
./sphinx_toolbox-3.4.0.tar.gz: sphinx_toolbox-3.4.0/sphinx_toolbox/more_autodoc/regex.py: if flags & re.LOCALE:
./schema-0.7.5.tar.gz: schema-0.7.5/schema.py: "re.LOCALE",
./behave-1.2.6.tar.gz: behave-1.2.6/behave/configuration.py: # -- NOTE: re.LOCALE is removed in Python 3.6 (deprecated in Python 3.5)
./behave-1.2.6.tar.gz: behave-1.2.6/behave/configuration.py: # flags = (re.UNICODE | re.LOCALE)
./djlint-1.31.1.tar.gz: djlint-1.31.1/src/djlint/lint.py: "re.LOCALE": re.LOCALE,
./pockets-0.9.1.tar.gz: pockets-0.9.1/CHANGES: * Fixes Python 3.6 compatibility by only using re.LOCALE flag on Python 2
./pockets-0.9.1.tar.gz: pockets-0.9.1/pockets/string.py: to (re.LOCALE | re.MULTILINE | re.UNICODE).
./markdown2-2.4.9.tar.gz: markdown2-2.4.9/lib/markdown2.py: "l": re.LOCALE,
./pymongo-4.4.0.tar.gz: pymongo-4.4.0/bson/__init__.py: if flags & re.LOCALE:
./pymongo-4.4.0.tar.gz: pymongo-4.4.0/bson/json_util.py: if obj.flags & re.LOCALE:
./pymongo-4.4.0.tar.gz: pymongo-4.4.0/bson/regex.py: flags |= re.LOCALE
./pyre2-0.3.6.tar.gz: pyre2-0.3.6/src/compile.pxi: return fallback(original_pattern, flags, "re.LOCALE not supported")
./pyre2-0.3.6.tar.gz: pyre2-0.3.6/src/re2.pyx: LOCALE = re.LOCALE

Time: 0:00:18.548066
Found 14 matching lines in 9 projects

You’d also need to look for re.L, (?L), and (?L:), but the inline flags can contain multiple modifiers [1] (I’m not going to attempt to make a pattern to find them!)

A


  [1] (?aiLmsux) and (?aiLmsux-imsx:...); docs
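For what it's worth, a rough heuristic for the inline forms might look like the sketch below. The names INLINE_LOCALE and uses_inline_locale are hypothetical, and the pattern only follows the two syntaxes in the footnote, so it can still produce false positives (for example on text inside comments or unrelated strings).

```python
import re

# Hypothetical heuristic: an inline flag group whose "on" flags include L,
# e.g. (?L), (?iL), (?Li-m:...). Flags after "-" are ones being switched off.
INLINE_LOCALE = re.compile(r"\(\?[aiLmsux]*L[aiLmsux]*(?:-[imsx]+)?[:)]")

def uses_inline_locale(source: str) -> bool:
    """Return True if the text appears to contain an inline L modifier."""
    return INLINE_LOCALE.search(source) is not None

print(uses_inline_locale(r"(?iL)\w+"))      # inline L among other flags
print(uses_inline_locale(r"(?P<Last>\w+)")) # named group, not a flag group
```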

$ python3 ~/github/misc/cpython/search_pypi_top.py -q . "\bre\.L\b"
./apprise-1.4.0.tar.gz: apprise-1.4.0/apprise/utils.py: 'L': re.L,
./djlint-1.31.1.tar.gz: djlint-1.31.1/src/djlint/lint.py: "re.L": re.L,
./parsimonious-0.10.0.tar.gz: parsimonious-0.10.0/parsimonious/expressions.py: (locale and re.L) |
./pockets-0.9.1.tar.gz: pockets-0.9.1/pockets/string.py: RE_FLAGS = re.L | re.M | re.U
./pymongo-4.4.0.tar.gz: pymongo-4.4.0/bson/json_util.py: "l": re.L,
./pymongo-4.4.0.tar.gz: pymongo-4.4.0/test/test_bson.py: regex = re.compile(b"", re.I | re.L | re.M | re.S | re.X)
./pymongo-4.4.0.tar.gz: pymongo-4.4.0/test/test_bson.py: self.assertEqual(re.I | re.L | re.M | re.S | re.X, Regex.from_native(regex).flags)
./pyre2-0.3.6.tar.gz: pyre2-0.3.6/src/re2.pyx: I, M, S, U, X, L = re.I, re.M, re.S, re.U, re.X, re.L

Time: 0:00:41.166755
Found 8 matching lines in 6 projects

I expect the others are a similar order of magnitude.


The majority of votes are in favor of deprecation.


During the deprecation period, if a wave of users asks to keep the feature, we can keep it. More concrete use cases would also help us understand how it is actually used.
