Is Python supposed to work with the plain `C` locale and `US-ASCII` as the system encoding?

When trying to build Python 3.6 (ignore the fact that it is severely outdated; I work for a company that maintains it for enterprise customers) on GitHub, a set of tests in test.test_tarfile fails:

======================================================================
ERROR: setUpClass (test.test_tarfile.NoneInfoExtractTests_Data)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/cpython/cpython/Lib/test/test_tarfile.py", line 2790, in setUpClass
    tar.extractall(cls.control_dir, filter=cls.extraction_filter)
  File "/__w/cpython/cpython/Lib/tarfile.py", line 2231, in extractall
    tarinfo = self._get_extract_tarinfo(member, filter_function, path)
  File "/__w/cpython/cpython/Lib/tarfile.py", line 2283, in _get_extract_tarinfo
    tarinfo = filter_function(tarinfo, path)
  File "/__w/cpython/cpython/Lib/tarfile.py", line 831, in data_filter
    new_attrs = _get_filtered_attrs(member, dest_path, True)
  File "/__w/cpython/cpython/Lib/tarfile.py", line 778, in _get_filtered_attrs
    target_path = os.path.realpath(os.path.join(dest_path, name))
  File "/__w/cpython/cpython/Lib/posixpath.py", line 395, in realpath
    path, ok = _joinrealpath(filename[:0], filename, {})
  File "/__w/cpython/cpython/Lib/posixpath.py", line 429, in _joinrealpath
    if not islink(newpath):
  File "/__w/cpython/cpython/Lib/posixpath.py", line 171, in islink
    st = os.lstat(path)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 98-104: ordinal not in range(128)

======================================================================
ERROR: setUpClass (test.test_tarfile.NoneInfoExtractTests_Default)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/cpython/cpython/Lib/test/test_tarfile.py", line 2790, in setUpClass
    tar.extractall(cls.control_dir, filter=cls.extraction_filter)
  File "/__w/cpython/cpython/Lib/tarfile.py", line 2240, in extractall
    numeric_owner=numeric_owner)
  File "/__w/cpython/cpython/Lib/tarfile.py", line 2304, in _extract_one
    numeric_owner=numeric_owner)
  File "/__w/cpython/cpython/Lib/tarfile.py", line 2384, in _extract_member
    self.makefile(tarinfo, targetpath)
  File "/__w/cpython/cpython/Lib/tarfile.py", line 2429, in makefile
    with bltn_open(targetpath, "wb") as target:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 98-104: ordinal not in range(128)

======================================================================
ERROR: setUpClass (test.test_tarfile.NoneInfoExtractTests_FullyTrusted)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/cpython/cpython/Lib/test/test_tarfile.py", line 2790, in setUpClass
    tar.extractall(cls.control_dir, filter=cls.extraction_filter)
  File "/__w/cpython/cpython/Lib/tarfile.py", line 2240, in extractall
    numeric_owner=numeric_owner)
  File "/__w/cpython/cpython/Lib/tarfile.py", line 2304, in _extract_one
    numeric_owner=numeric_owner)
  File "/__w/cpython/cpython/Lib/tarfile.py", line 2384, in _extract_member
    self.makefile(tarinfo, targetpath)
  File "/__w/cpython/cpython/Lib/tarfile.py", line 2429, in makefile
    with bltn_open(targetpath, "wb") as target:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 98-104: ordinal not in range(128)

======================================================================
ERROR: setUpClass (test.test_tarfile.NoneInfoExtractTests_Tar)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/cpython/cpython/Lib/test/test_tarfile.py", line 2790, in setUpClass
    tar.extractall(cls.control_dir, filter=cls.extraction_filter)
  File "/__w/cpython/cpython/Lib/tarfile.py", line 2231, in extractall
    tarinfo = self._get_extract_tarinfo(member, filter_function, path)
  File "/__w/cpython/cpython/Lib/tarfile.py", line 2283, in _get_extract_tarinfo
    tarinfo = filter_function(tarinfo, path)
  File "/__w/cpython/cpython/Lib/tarfile.py", line 825, in tar_filter
    new_attrs = _get_filtered_attrs(member, dest_path, False)
  File "/__w/cpython/cpython/Lib/tarfile.py", line 778, in _get_filtered_attrs
    target_path = os.path.realpath(os.path.join(dest_path, name))
  File "/__w/cpython/cpython/Lib/posixpath.py", line 395, in realpath
    path, ok = _joinrealpath(filename[:0], filename, {})
  File "/__w/cpython/cpython/Lib/posixpath.py", line 429, in _joinrealpath
    if not islink(newpath):
  File "/__w/cpython/cpython/Lib/posixpath.py", line 171, in islink
    st = os.lstat(path)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 98-104: ordinal not in range(128)

======================================================================
ERROR: test_data_filter (test.test_tarfile.TestExtractionFilters)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/cpython/cpython/Lib/test/test_tarfile.py", line 3438, in test_data_filter
    filtered = tarfile.data_filter(tarinfo, '')
  File "/__w/cpython/cpython/Lib/tarfile.py", line 831, in data_filter
    new_attrs = _get_filtered_attrs(member, dest_path, True)
  File "/__w/cpython/cpython/Lib/tarfile.py", line 778, in _get_filtered_attrs
    target_path = os.path.realpath(os.path.join(dest_path, name))
  File "/__w/cpython/cpython/Lib/posixpath.py", line 395, in realpath
    path, ok = _joinrealpath(filename[:0], filename, {})
  File "/__w/cpython/cpython/Lib/posixpath.py", line 429, in _joinrealpath
    if not islink(newpath):
  File "/__w/cpython/cpython/Lib/posixpath.py", line 171, in islink
    st = os.lstat(path)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 57-63: ordinal not in range(128)

======================================================================
ERROR: test_tar_filter (test.test_tarfile.TestExtractionFilters)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/cpython/cpython/Lib/test/test_tarfile.py", line 3428, in test_tar_filter
    filtered = tarfile.tar_filter(tarinfo, '')
  File "/__w/cpython/cpython/Lib/tarfile.py", line 825, in tar_filter
    new_attrs = _get_filtered_attrs(member, dest_path, False)
  File "/__w/cpython/cpython/Lib/tarfile.py", line 778, in _get_filtered_attrs
    target_path = os.path.realpath(os.path.join(dest_path, name))
  File "/__w/cpython/cpython/Lib/posixpath.py", line 395, in realpath
    path, ok = _joinrealpath(filename[:0], filename, {})
  File "/__w/cpython/cpython/Lib/posixpath.py", line 429, in _joinrealpath
    if not islink(newpath):
  File "/__w/cpython/cpython/Lib/posixpath.py", line 171, in islink
    st = os.lstat(path)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 57-63: ordinal not in range(128)

Of course, I can easily avoid the problem by setting the locale to en_US.utf8 or even C.utf8, but my question is what the expectations on the environment are. Is this supposed to work or not? Should I make a patch?

I’m not an expert on encodings on Unix, but PEP 538 and PEP 540 discuss various issues with locale encodings in Python 3.6 and earlier, which might be relevant for you.

For Python 3.6? No, that version is no longer maintained, so there’s no point in submitting a patch.

Per PEP 11, ASCII-only *nix systems have not been supported since 3.7.
For 3.6, as you can see, I failed to test a tarfile backport on an ASCII-only platform. Sorry for that! It looks like it’s only an issue in the test suite. However, at this point, you’re on your own when it comes to patches.

When you set the environment to the C locale or US-ASCII, Python will assume ASCII for all file names (among other things), so you get errors when trying to work with files that have non-ASCII characters in their names or paths.
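A minimal sketch of the failure mode, assuming the usual POSIX behaviour where Python encodes `str` paths with the filesystem encoding and the `surrogateescape` error handler before calling `lstat()` or `open()`. The helper name `can_encode_path` is hypothetical; it just reproduces that encoding step so you can see why an ASCII filesystem encoding rejects names such as `café`.

```python
import sys

def can_encode_path(path, encoding):
    """Return True if `path` can be converted to bytes under `encoding`.

    Mimics how Python on POSIX turns str paths into bytes for system calls
    such as lstat() and open(): the filesystem encoding combined with the
    'surrogateescape' error handler.
    """
    try:
        path.encode(encoding, "surrogateescape")
    except UnicodeEncodeError:
        return False
    return True

# The encoding this interpreter actually uses for file names; on Linux this
# is typically 'ascii' under the C locale on Python 3.6 and 'utf-8' otherwise.
print(sys.getfilesystemencoding())

# U+00E9 is not an escaped surrogate, so surrogateescape cannot rescue it.
print(can_encode_path("/tmp/caf\u00e9", "ascii"))  # False
print(can_encode_path("/tmp/caf\u00e9", "utf-8"))  # True
```

Only lone surrogates in the range U+DC80..U+DCFF (the ones Python itself produces when decoding undecodable bytes) survive an ASCII round trip. Genuine non-ASCII characters like those in the test file names raise exactly the `UnicodeEncodeError` seen in the tracebacks.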

If you need to continue using Python 3.6, it’s probably best to locally patch the test suite to not use such names.
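One way such a local test-suite patch could look, as a sketch only: skip the affected classes when the filesystem encoding cannot represent non-ASCII names, rather than letting `setUpClass` error out. `NONASCII_NAME`, `fs_can_encode`, `requires_nonascii_fs` and `NonAsciiNameTests` are hypothetical names for illustration; the thread does not show which file names test_tarfile actually uses.

```python
import os
import sys
import unittest

# Hypothetical non-ASCII file name; the real test_tarfile names may differ.
NONASCII_NAME = "d\u00e9j\u00e0-vu.txt"

def fs_can_encode(name):
    """Return True if the current filesystem encoding can represent `name`."""
    try:
        os.fsencode(name)
    except UnicodeEncodeError:
        return False
    return True

# Hypothetical decorator: skip instead of erroring when the locale (e.g. the
# plain C locale on Python 3.6) forces an ASCII filesystem encoding.
requires_nonascii_fs = unittest.skipUnless(
    fs_can_encode(NONASCII_NAME),
    "filesystem encoding %r cannot encode non-ASCII file names"
    % sys.getfilesystemencoding(),
)

@requires_nonascii_fs
class NonAsciiNameTests(unittest.TestCase):
    def test_name_roundtrips(self):
        # os.fsdecode() should undo os.fsencode() for an encodable name.
        self.assertEqual(os.fsdecode(os.fsencode(NONASCII_NAME)), NONASCII_NAME)
```

Applying the skip at class level matters here because `unittest` does not call `setUpClass` for a skipped class, which is where most of the failures above originate.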

The better alternative is to set the locale to e.g. C.utf-8, though, since you’ll likely hit similar issues with other applications as well.
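To compare what an interpreter picks under different locales without changing your shell, a small sketch like the following starts a child interpreter with `LC_ALL` overridden and reports its filesystem encoding. The function name `fs_encoding_under` is hypothetical. On an unpatched 3.6 build on Linux you would expect `'ascii'` for `C`; 3.7+ (and builds carrying a PEP 538 backport) may report UTF-8 instead, so treat the output as something to inspect rather than a fixed result.

```python
import os
import subprocess
import sys

def fs_encoding_under(locale_name, python=sys.executable):
    """Run `python` with LC_ALL=locale_name and return the filesystem
    encoding it reports via sys.getfilesystemencoding()."""
    env = dict(os.environ, LC_ALL=locale_name)
    proc = subprocess.run(
        [python, "-c", "import sys; print(sys.getfilesystemencoding())"],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text=True only exists on 3.7+
        check=True,
    )
    return proc.stdout.strip()

for name in ("C", "C.UTF-8"):
    print(name, "->", fs_encoding_under(name))
```

If the requested locale is not installed, the child silently falls back to the C locale, so a missing `C.UTF-8` shows up as the same result as `C`.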

Unfortunately, somebody (long before I came to SUSE) forgot to think about Python’s long-term development, so Python 3.6 is part of the stable API we promised to support in SLE-15 until the end of its support. Currently that means well into the 2030s… (fortunately, he didn’t forget to give up on Python 2, so at least that one is gone already).

OK, I will take care of that.

Other problems were already fixed in our SLE patches (which is what the commits in 3.6..opensuse_3.6 are). Earlier, I simply lost my nerve and added `export LANG=en_US.UTF-8` to our SPEC file. Now I am reconsidering whether that was a good idea.

Still, thank you for all the help I got with that patch (it was quite painful to port; and yes, we have Python 3.4 and 2.7 in the even older SLE-12, which fortunately reached end of support this fall).

I am painfully aware of that.

I’m not sure what you’re asking us. 3.6 is out of support and there is no way that we would accept a PR for it, even if it were a life-and-death security matter. Maintenance of unsupported versions is supposed to happen by vendors (like your company) who extend that support for a fee.

So the only option I see is to make a local change in your own copy of Python 3.6. What am I missing?

The piece of information Petr Viktorin provided: that Python < 3.7 had been expected to work in ASCII-only environments. Of course, I know that 3.6 is way out of support, and that all changes will be only for us (or for any enterprise distro maintainers, if they care about our patches). I am painfully aware of that, too.

One option is to apply the Fedora/RHEL patch that backports PEP 538 (coercing the LC_CTYPE locale) to Python 3.6: Tree - rpms/python3.6 - src.fedoraproject.org

In short, if Python is started with the LC_CTYPE locale “C”, it switches to the first available of these LC_CTYPE locales: “C.UTF-8”, “C.utf8”, or “UTF-8”. The Python filesystem encoding then becomes UTF-8, and you should be good.
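A simplified Python sketch of that selection logic, based on PEP 538 and the `PYTHONCOERCECLOCALE` documentation. This is an illustration, not the Fedora/RHEL patch: the real coercion is C code that runs during interpreter startup, and the helper names `locale_available` and `choose_coercion_target` are hypothetical. Per the documentation, coercion is also skipped when `LC_ALL` is set or `PYTHONCOERCECLOCALE=0`, and on success the `LC_CTYPE` environment variable is set so child processes inherit the coerced locale.

```python
import locale
import os

# Candidate order documented for PEP 538 locale coercion.
TARGET_LOCALES = ("C.UTF-8", "C.utf8", "UTF-8")

def locale_available(name):
    """Return True if LC_CTYPE can be set to `name`; restores the old value."""
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)

def choose_coercion_target(current_ctype, environ, is_available=locale_available):
    """Return the locale PEP 538-style coercion would switch to, or None."""
    if environ.get("PYTHONCOERCECLOCALE") == "0":
        return None  # coercion explicitly disabled
    if environ.get("LC_ALL"):
        return None  # LC_ALL overrides LC_CTYPE, so coercion is skipped
    if current_ctype not in ("C", "POSIX"):
        return None  # only the legacy ASCII-based locales are coerced
    for candidate in TARGET_LOCALES:
        if is_available(candidate):
            return candidate
    return None

current = locale.setlocale(locale.LC_CTYPE)  # query only, no change
print(current, "->", choose_coercion_target(current, os.environ))
```

Passing the availability check in as a parameter keeps the decision logic testable without depending on which locales happen to be installed on the build host.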

Interesting, I had forgotten about this patch, but we have it in our package as well: coerces locale to be C.UTF-8 irrespective to the system locale · openSUSE-Python/cpython@19a62ee · GitHub

I will keep digging.