Unfortunately, the test suite is run with -bb. That is causing BytesWarning to be raised as an exception when looking into sys.path_importer_cache with a mix of bytes and strings. One solution is to stop using -bb so that this doesn’t cause a failure; it’s a legitimate check and catching the exception doesn’t make the in check work anyway. Another is to temporarily turn off warnings for BytesWarning in importlib in the key places where sys.path_importer_cache is checked, but that doesn’t fix the issue for anyone trying to use sys.path_importer_cache themselves.
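For illustration, the "temporarily turn off warnings" option could look roughly like the sketch below. It is not importlib's actual code: `cache_lookup` and the sample `cache` dict are hypothetical stand-ins for a lookup in sys.path_importer_cache, and the filter only covers the wrapped lookup, which is why it does nothing for third-party code that reads the cache directly.

```python
import warnings


def cache_lookup(cache, key):
    """Look up key in cache with BytesWarning silenced for this lookup only.

    Hypothetical helper; `cache` stands in for sys.path_importer_cache.
    """
    with warnings.catch_warnings():
        # Under -b / -bb, comparing str and bytes keys emits BytesWarning;
        # ignore it here and restore the previous filters on exit.
        warnings.simplefilter("ignore", BytesWarning)
        return cache.get(key)


# A cache with mixed str and bytes keys (made-up values).
cache = {"/usr/lib/python": "finder-a", b"/opt/lib": "finder-b"}

print(cache_lookup(cache, b"/opt/lib"))  # the bytes key is found
print(cache_lookup(cache, "/opt/lib"))   # the str key is not: str never equals bytes
```

The second lookup still misses, which matches the point above: silencing the warning does not make the membership check work across types.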
The last option is to update the docs to say sys.path entries must be strings (or at least that the built-in import system only supports strings). This support broke sometime between Python 3.2 and 3.6, so it hasn’t worked for quite some time, and the breakage wasn’t reported until March 2022. That suggests it isn’t really missed.
I originally wanted to bring back the bytes support, but in writing this topic I realized it’s going to be messy for users to support directly, and so I’m now advocating dropping bytes support from the import system. Does anyone object to that?
Agreed. If it hasn’t worked since 3.6 and we haven’t heard anyone complain, let’s just keep the simpler behavior and update the docs.
I’d only consider re-adding sys.path bytes support if a user reports something really painful during a pre-release, related to our intention to make UTF-8 the default for filesystem and I/O encodings (PEP 686, Make UTF-8 mode default, targeting 3.15).
In Python 3.0, using bytes was the only way to use paths which cannot be decoded from the Python filesystem encoding. Since Python 3.1 and PEP 383 (surrogateescape), using Unicode for paths gives access to all paths, so supporting bytes paths is no longer needed. Unicode is better for portability: on Windows, many paths are not encodable to the ANSI code page. Granted, since Python 3.6, Python uses UTF-8 rather than the ANSI code page for paths on Windows, but Unicode still remains the least surprising and most convenient option.
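As a minimal sketch of how surrogateescape makes this work, assuming a UTF-8 filesystem encoding (typical on Linux): each undecodable byte is mapped to a lone surrogate code point, and encoding with the same error handler restores the original bytes. The byte string below is just an arbitrary invalid UTF-8 sequence chosen for the example.

```python
# Invalid UTF-8: 0xE5 and 0xE6 are lead bytes with no continuation bytes.
raw = b"\xe5\xe6"

# Decode the way PEP 383 describes: each undecodable byte 0xNN becomes
# the lone surrogate U+DCNN instead of raising UnicodeDecodeError.
name = raw.decode("utf-8", "surrogateescape")
print(ascii(name))  # '\udce5\udce6'

# Encoding with the same error handler round-trips to the original bytes.
round_tripped = name.encode("utf-8", "surrogateescape")
print(round_tripped == raw)  # True

# On POSIX, os.fsdecode() / os.fsencode() apply the same idea using the
# interpreter's filesystem encoding and error handler.
```

The resulting str can be passed anywhere a path string is accepted; it just cannot be printed or encoded with the strict error handler.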
See my articles about Unicode and paths in Python:
If we remove support for bytes filenames, whether throughout the stdlib or just in the import system, can we please provide a recipe or FAQ for how to represent a byte path which is unrepresentable in the system encoding?
Consider an invalid UTF-8 path component like b'\xe5\xe6' on a Linux file system. There are many ways such a file or directory could be created. How do I refer to that?