Hello again Inada-san, thanks for bringing this up and I’m looking forward to continuing to see this process move forward.
Oops, I didn’t see the “please don’t start the discussion yet” admonition on the first pass as I was sequentially going through your post—it might be wise to simply not include those parts that you aren’t yet looking for comment on, to avoid any potential for confusion. As such I’ve collapsed my response in a details block, to preserve it for the future while avoiding derailing the discussion now.
This would be a much safer option than immediately switching to UTF-8 by default after having only an optional warning enabled by a special `-X` option, and would give developers a more reasonable amount of time to fix their code. At present, it is likely that very, very few are actually seeing these warnings; no projects I know of have yet turned them on even in CI (though I have done some experimental once-off runs with it to catch outstanding issues). Simply switching to UTF-8 could cause existing projects that depend on the legacy behavior, either explicitly or implicitly (e.g. reading previous output using locale-dependent encoding), to silently break.
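For anyone who hasn’t tried the opt-in yet, here is a minimal sketch of what it currently takes to see the warning at all. It assumes Python 3.10+, where the option is `-X warn_default_encoding` (per PEP 597); `CHILD` and `run_child` are just illustrative names, not anything from a real project.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Child program: opens a text file without passing an explicit encoding.
CHILD = "import sys; open(sys.argv[1]).close()"


def run_child(extra_args):
    """Run CHILD in a fresh interpreter and return whatever it wrote to stderr."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "data.txt"
        target.write_text("hello\n", encoding="utf-8")
        result = subprocess.run(
            [sys.executable, *extra_args, "-c", CHILD, str(target)],
            capture_output=True,
            text=True,
            encoding="utf-8",  # explicit, so this helper does not warn itself
            errors="replace",
        )
    return result.stderr


# Without the option the implicit-encoding open() is normally silent;
# with it (on 3.10+) an EncodingWarning should appear on stderr.
quiet = run_child([])
noisy = run_child(["-X", "warn_default_encoding"])
print("default run stderr:", quiet.strip() or "<empty>")
print("opt-in run stderr: ", noisy.strip() or "<empty>")
```

On versions before 3.10 the unknown `-X` option is simply ignored, so both runs should look the same there.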
Something like the following timeline might work:
- In 3.12 (3.10 + 2), show `EncodingWarning` like `DeprecationWarning`, in `__main__` and like other warnings with `-W default`, `-X dev`, etc. This will allow developers with proper python invocation, pytest or CI testing configs to catch and fix these issues, without either causing extra noise for users or having to enable special bespoke interpreter options.
- In 3.14 (3.10 + 4), display the warning by default to all users.
- In 3.16 (3.10 + 6, i.e. when all supported versions of Python incorporate `encoding="locale"`), make not explicitly specifying `encoding` an error.
- In some future version (4.0?) allow `encoding` to be unspecified again, with `UTF-8` as default.
Overall, we found it quite useful for spotting these issues (which can and do cause many real problems for users like myself who frequently work on non-*nix platforms), and I was able to find several with it. However, a few practical limitations have limited its utility so far.
We found a number of those, both in our own codebases, and in others.
I don’t really deal with too many of those, and we try our best to make our code correct, explicit and cross-platform, so this wasn’t an issue for us.
Forward-compat was the biggest issue for us, but not because of `encoding="locale"`; at least in our various use cases, that wasn’t really needed at the moment (though we could foresee some where it would benefit). The actual problem was that `EncodingWarning` was a brand new warning, and occurred in a number of dependencies outside our immediate control, so we had no way of silencing it in our Pytest config or our Python invocation string in a way that would not either break other warnings, cause our test suite to error out (since we use `-W error` by default to ensure non-silenced warnings are actually seen and dealt with) or be very imprecise and potentially silence other desired warnings:
- The Pytest config is static, so without a hacky script rewriting it for different Python versions, we couldn’t add a warning filter for it there, or else it would result in the test suite erroring out completely on Python versions <3.10.
- We also couldn’t reliably add it via a warnings filter passed via `-W` (which is needed to avoid errors that occur before Pytest has fully initialized and its hooks have fired), since `-W` does not support much of the same syntax as `filterwarnings` that is required for reliable but precise warning silencing.
- Finally, we couldn’t add a manual `filterwarnings` with branches for Python versions in a Pytest hook, because that either fires too late to silence early warnings or gets overridden anyway.
As such, it was useful for a manual pass to catch warnings in our libraries/applications and direct deps, but it is not yet useful to incorporate into routine test runs, which would make it much more broadly applicable.
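To make the third limitation above concrete, this is roughly the kind of version-guarded filter I mean. It is only a sketch: `ignore_encoding_warnings_from` and the `some_dependency` module name are hypothetical, and as noted, when called from a Pytest hook it may run too late or be overridden, so treat it as an illustration of the idea rather than a working fix.

```python
import builtins
import warnings


def ignore_encoding_warnings_from(module_regex):
    """Ignore EncodingWarning raised from modules whose name matches module_regex.

    Returns True if a filter was installed, or False on Python versions
    that do not define EncodingWarning (before 3.10).
    """
    category = getattr(builtins, "EncodingWarning", None)
    if category is None:
        # Older Python: the category does not exist, so there is nothing to filter.
        return False
    warnings.filterwarnings("ignore", category=category, module=module_regex)
    return True


# Hypothetical dependency name, used purely for illustration.
installed = ignore_encoding_warnings_from(r"some_dependency(\..*)?")
print("filter installed:", installed)
```

Looking the category up on `builtins` avoids a `NameError` on older versions, which is the part a static config file cannot express.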
While it doesn’t totally fix this problem, a staged approach to gradually enabling this (as proposed above), potentially combined with making `EncodingWarning` a subclass of another warning (`DeprecationWarning`, `PendingDeprecationWarning` and/or `FutureWarning`), would help to ameliorate these impacts over time.
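As a rough illustration of why the subclass idea would help: warning filters match on `issubclass`, so a filter that names a long-existing parent category also covers its subclasses, and that filter is valid on every Python version. The class below is a hypothetical stand-in, not the real `EncodingWarning` (which, as I understand it, currently derives directly from `Warning`).

```python
import warnings


class HypotheticalEncodingWarning(DeprecationWarning):
    """Stand-in for an EncodingWarning that subclassed DeprecationWarning."""


def count_escaped(category_to_ignore):
    """Return how many warnings get past an 'ignore' filter for the given category."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # Inserted at the front of the filter list, so it is checked first.
        warnings.filterwarnings("ignore", category=category_to_ignore)
        warnings.warn(
            "'encoding' argument not specified", HypotheticalEncodingWarning
        )
    return len(caught)


# A filter on the parent category silences the subclass...
print(count_escaped(DeprecationWarning))
# ...while a filter on an unrelated category does not.
print(count_escaped(FutureWarning))
```

The trade-off, of course, is that such a filter is also less precise, since it would silence ordinary deprecation warnings from the same scope.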
In our view, it is much better to be explicit when this is the case, and this may always change, so we still consider it useful in this case. Also, technically speaking, it isn’t actually guaranteed that the locale encoding is 100% ASCII-compatible, though this is almost certainly the case on essentially every platform with a modern version of Python.
Yes. I suggest something like:
- In 3.12 (3.10 + 2), show `EncodingWarning` like `DeprecationWarning`, in `__main__` and like other warnings with `-W default`, `-X dev`, etc. This will allow developers with proper python invocation, pytest or CI testing configs to catch and fix these issues, without either causing extra noise for users or having to enable special bespoke interpreter options.
- In 3.14 (3.10 + 4), display the warning by default to all users.
- In 3.16 (3.10 + 6, i.e. when all supported versions of Python incorporate `encoding="locale"`), make not explicitly specifying `encoding` an error.
- Then in a later version, change the default encoding.
You could maybe skip 1 year or even 1 step, but this would ensure a smooth deprecation process.
I don’t think so, because like other warnings, in many (though not all) cases it represents a real problem, and certainly code that can and should be improved to be more explicit. At least in our experience, it wasn’t overwhelmingly more common than other types of warnings.
I’m not 100% clear where you want to draw the boundary here between the two, but I’ve interpreted it as only commenting on the latter set of bullets rather than your plans above, and not responding to others whose feedback and views sharply differ from my own experience.