New defaults: `newline="\n"` and `ensure_ascii=False`

Idea number 1

Because the Windows operating system has been handling \n as a newline character since practically forever, set newline="\n" to be the new default of the open() built-in function.

Idea number 2

Since the encoding parameter of the open() built-in function will default to "utf-8" instead of the current None (now that PEP 686 has been accepted), we must make sure to change the default value of the ensure_ascii parameter of the json.dump() and json.dumps() functions to False (it is currently True).

The reason is that if you have UTF-8 encoding set (as it will be by default, once PEP 686 is implemented) and ensure_ascii stays True by default (as it is now), you will not get non-ASCII characters (for example, letters like č š ć ž đ, etc.) decoded properly and shown in their intended form at all. This is a problem that we need to take into account when implementing PEP 686.

Windows, and MS-DOS before it, uses \r\n as the line separator.
Only the C runtime on Windows maps between \n and \r\n.
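
For reference, here is a minimal sketch of what the existing newline parameter of open() does on write, so it is clear what changing the default would affect. The helper name written_bytes is made up for this illustration; the only behavior relied on is the documented one: newline=None translates "\n" to os.linesep, while newline="\n" writes the text untranslated.

```python
import os
import tempfile


def written_bytes(text, newline):
    """Write text in text mode with the given newline setting
    and return the raw bytes that ended up on disk.
    (Hypothetical helper, for illustration only.)"""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8", newline=newline) as f:
            f.write(text)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


# newline="\n": no translation, identical bytes on every platform
unix_style = written_bytes("a\nb\n", newline="\n")

# newline="\r\n": every "\n" is written as "\r\n"
windows_style = written_bytes("a\nb\n", newline="\r\n")

# newline=None (the current default): "\n" becomes os.linesep,
# which is "\r\n" on Windows and "\n" on POSIX systems
platform_default = written_bytes("a\nb\n", newline=None)
```

So the proposal amounts to making the first case the default on Windows too, instead of the third.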

ensure_ascii is an encoding-only option that uses Unicode escapes for non-ASCII characters. This doesn’t affect decoding as Unicode escapes are part of the JSON syntax for string type. You’ll just get them encoded as, e.g. {"key": "\u1234"} but it will still get decoded back to the proper Unicode character.
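
A quick sketch to illustrate the round trip, using only json.dumps() and json.loads() from the standard library (the sample data is made up):

```python
import json

data = {"key": "čšćžđ"}

# Current default (ensure_ascii=True): non-ASCII characters are
# written as \uXXXX escapes, so the output is pure ASCII.
escaped = json.dumps(data)

# With ensure_ascii=False the characters are written literally.
literal = json.dumps(data, ensure_ascii=False)

# Both forms decode back to exactly the same Python object.
from_escaped = json.loads(escaped)
from_literal = json.loads(literal)
```

Either way the decoded string is the same; the option only changes how the JSON text looks.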

Well, in any case, Windows understands and properly interprets \n as a newline character. That’s what matters.

I would not consider 2018 “since forever”: see “Introducing extended line endings support in Notepad” on the Windows Command Line blog.

“\uXXXX” is perfectly valid JSON. While not very human-readable, it is machine-readable. It is a bad idea for JSON ending up in files, but it is safer for JSON transmitted over networks, as you never know if the other side respects UTF-8 (and I would argue this is the primary use of JSON).
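
To make the network point concrete, here is a small sketch of a receiver that wrongly assumes Latin-1 instead of UTF-8. The scenario and the sample data are assumptions for illustration; the calls are plain json and str/bytes methods.

```python
import json

data = {"name": "é"}

# Sender side: both payloads are encoded as UTF-8 bytes.
literal_bytes = json.dumps(data, ensure_ascii=False).encode("utf-8")
escaped_bytes = json.dumps(data).encode("utf-8")  # ASCII-only payload

# Receiver side (assumed misconfigured): decodes as Latin-1.
wrong_literal = json.loads(literal_bytes.decode("latin-1"))
wrong_escaped = json.loads(escaped_bytes.decode("latin-1"))

print(wrong_literal)  # the literal "é" arrives as mojibake
print(wrong_escaped)  # the escaped form survives intact
```

The escaped payload survives because ASCII bytes mean the same thing in both encodings.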

This is news to me: is it really true that Notepad actually handles non-CRLF newlines sanely now?

My experience is that “since forever” this was not the case although I haven’t used Windows much in the past few years so I am not up to date with latest developments.

Oh yes, I remember the days when opening almost any text file from the internet using Notepad would render the whole file as one long line. Fortunately, there was also Windows WordPad, which understood UNIX line endings. It also had an advanced feature for saving in the native encoding, which allowed me to write bat files without the Ă© in my username being trashed. :slightly_smiling_face:

Yes, the link does not lie. This was introduced in the Windows 10 October 2018 Update.

Does it in all cases? How does .NET handle text files?
As pointed out, Notepad finally got handling for \n, but is that true of other runtimes and apps?