Allow to set process specific defaults for `open(..., newline=, encoding=, ...)`

Hi,

there is a better solution for the previous discussion on New defaults: `newline=”\n”` and ensure_ascii=False

@dataclass
class DefaultsForOpen()
    encoding: str | None # None meaning `locale.getencoding()`
    errors: ... = None 
    newline: str = os.linesep

and there is an instance of this class in module sys or module os

and open fetches the defaults from there.

why / what for?

I create an application that is used on Windows and on Linux by a heterogenous distributed team, some git commit from here, others from there, outputs of the tools.

wouldn’t it be lvoely to tell python: use these values for encoding and newline?

and not rely on the grace of the individual open() line?

furthermore, as pathlib.Path().write_* don’t have newline or similar arguments? too bad.

so, that’s why I’d love to have a central location to configure this.

_open = open
def open(*a, **kw):
    return _open(*a, **kw, encoding="CP437")
3 Likes

When \n is used on Windows \r\n is written to the file.
You can use utf-8 on Windows as well these days.
Its called code page 65535 i think.

If both are working for you there is nothing to fix.

I’d suggest something less drastic, which every project is able to do with existing Python, and that would avoid potential pitfalls:

Just use functools.partial - create your defautl open in some module, and import it from there elsewhere. You gain “explicit is better than implicit” and it won’t happen that my library code trusting the default Python open sudenly behaving differently because someone is using it in a project with different defaults (or any other combination thereof)

utils.py

from functools import partial

our_open = partial(open, newline=”\n”, encoding=”rot13”)

other_file.py

from utils import our_open as open

...

Then, for the case of Pathlib.write_*, if you really want to override it application wide, it should be possible to replace (monkeypatch) the `open` method implementation in `pathlib.Path` itself, or `io.open` which is called by that. (Ideally, it should be possible to subclass `pathlib.Path` and override `open` but I don’t know how that plays)

update

yes, I indeed tested it, and the monkeypatching of pathlib.Path.open works - subclassing pathlib.Path works as well (I briefelly posted here it did not work - but then I found a mistake on my side)

In [1]: import pathlib

In [2]: class MyPath(pathlib.Path):
   ...:     def open(self, *args, **kwargs):
   ...:         kwargs |= {"encoding": "cp1252"}
   ...:         return super().open(*args, **kwargs)
   ...: 

In [3]: MyPath("abc.txt").write_text("maçã")
Out[3]: 4

In [4]: open("abc.txt", "rb").read()
Out[4]: b'ma\xe7\xe3'

(default encoding would be utf-8, and the resulting file would be 6-bytes long otherwise)

3 Likes

I think that there are generally many ways to do this, but updating the builtins might not be the best option, as it is different for each application (it has worked well enought for some time).

Wrapping the open itself might be worht a shot, either in a function / partial, or even a contextmanager. You could also wrap it, and return a custom FileIO object (iirc it is called that).

Also, maybe you could make the encoding, newline, … stuff as some transform function, or similar, and then pass it to a file, writing a header if needed (encoding, …)

What I don’t like about monkey patching the global open() is that anyone who imported it before the patch is applied wil hold a handle to the original. and open() is central.

also I do not suggest to change the default behavior of open, at all.

all I suggest is a way to provide defaults most suitable for my project. in away that is 100% backwards compatible.

btw I realized another way could be function attributes.

so what I’m really asking for, maybe :slight_smile: is

to have CPython support special writable function attributes for builtins, too.

I think that would be the much better solution, btw.

The best way to do that is to have a per-module open function, like I gave as an example. You can import that into other modules. Whatever you do in that function, it’ll apply to your code, but not to other things.

1 Like

To be honest, this feels like a very niche requirement. I’m not aware of any use cases other than yours, and I speak from experience when I say that the existing defaults for open are perfectly fine for normal use on Windows and Linux (encoding of UTF-8 is supported everywhere, and universal newlines make changing newline unnecessary in pretty much every case).

So I think it’s extremely unlikely that the benefits would justify the extra complexity of having a way to alter the defaults globally. After all, many libraries rely on the documented defaults for open - if you change those defaults you could easily break those libraries, so this would still be a significant breaking change.

There are ways of achieving the behaviour you want already. You may not like them that much, but that doesn’t mean they aren’t available. And presumably your application works at the moment, even if it has to explicitly pass the relevant parameters to every call to open. So really, it’s not clear how much benefit this proposal would offer in any case - the possibility of updating to use slightly cleaner code in one known application, and maybe a couple of others that we don’t know about? That’s nowhere near reaching the bar needed for a new language feature.

4 Likes

That only became true in 3.15, and we had a thread just recently that happened because the defaults on Windows didn’t include UTF-8 encoding. But moving forward, yes, I agree, the defaults are almost always correct.

3 Likes

Yep -
At the end of the day, the one thing not that easy would be to have the application-desired defaults to be used in pathlib methods - but then, my experimenting yesterday found that can be done by subclassing `pathlib.Path` and overriding `Path.open` -

It looks like this, along with explicitly importing a pre-parametrized `open` callable, would cover all of the O.P. use-cases, while being more explicit.