I’m trying to stream JSON over a web connection, and I want to use JSONL (newline-delimited JSON). This seems straightforward - encode each record using json.dumps, and then write it, with a newline, to the output stream. And indeed, it seems to work fine. But checking the json module documentation, I couldn’t see anything that guaranteed the output of json.dumps wouldn’t contain a newline character.
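For reference, here is a minimal sketch of the approach described above. It assumes a text-mode stream, with io.StringIO standing in for the web connection. `write_jsonl` and `read_jsonl` are hypothetical helper names, not part of the stdlib or any JSONL library:

```python
import io
import json


def write_jsonl(records, stream):
    """Write each record to a text stream as one line of JSON (hypothetical helper)."""
    for record in records:
        # With default arguments, json.dumps produces a single-line document.
        stream.write(json.dumps(record) + "\n")


def read_jsonl(stream):
    """Yield records from a JSONL text stream, one per non-blank line (hypothetical helper)."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


records = [
    {"id": 1, "tags": ["a", "b"]},
    {"id": 2, "note": "line one\nline two"},  # embedded newline inside a string value
]

buf = io.StringIO()
write_jsonl(records, buf)
buf.seek(0)
assert list(read_jsonl(buf)) == records
print(repr(buf.getvalue()))
```

The second record round-trips even though its string contains a newline, because that newline is written as the two-character escape `\n`.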
I’m using all the defaults (notably, ensure_ascii=True, indent=None). I’m only serialising basic types (dictionaries, strings, integers and lists). Have I missed any edge cases, or am I safe?
There seem to be JSONL libraries on PyPI, but I’m trying to minimise my dependencies, so a stdlib solution would be preferable.
I believe you’re safe, but the docs could be clearer. It’s the optional indent= argument that controls this:
indent (int|str|None) – If a positive integer or string, JSON array elements and object members will be pretty-printed with that indent level. A positive integer indents that many spaces per level; a string (such as "\t") is used to indent each level. If zero, negative, or "" (the empty string), only newlines are inserted. If None (the default), the most compact representation is used.
By default, then, the “most compact” representation doesn’t contain semantically meaningless newlines.
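A quick way to see the effect of indent= is to compare the default with a couple of non-default values. This is a small illustrative sketch, and the record is made up:

```python
import json

record = {"name": "example", "values": [1, 2, 3]}

# indent=None (the default): the compact form, all on one line
compact = json.dumps(record)
# indent=0 (or "" or a negative number): no indentation, but newlines are inserted
newlines_only = json.dumps(record, indent=0)
# indent=2: pretty-printed with two spaces per level
pretty = json.dumps(record, indent=2)

for label, text in [("None", compact), ("0", newlines_only), ("2", pretty)]:
    has_newline = "\n" in text
    print(f"indent={label}: contains newline -> {has_newline}")
```

Only the default produces single-line output, which is what JSONL needs.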
It’s not absolutely guaranteed, but there’s also no reason for it to emit a newline anywhere (and any newlines inside strings are escaped). It would be tricky to make the guarantee, on account of custom types, but…
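To illustrate the point about strings, here is a small sketch with a made-up value showing that newlines and other control characters inside strings come out as escape sequences rather than raw characters:

```python
import json

# Newline, carriage return, tab and every other control character below U+0020
# are written as escape sequences, never as raw characters.
text = "first line\nsecond line\r\nthird\tcolumn"
encoded = json.dumps({"text": text})
print(encoded)  # {"text": "first line\nsecond line\r\nthird\tcolumn"}
assert "\n" not in encoded and "\r" not in encoded
assert json.loads(encoded)["text"] == text  # decoding restores the real newlines

# With the default ensure_ascii=True, non-ASCII characters are escaped as well,
# including U+2028 LINE SEPARATOR, which str.splitlines() treats as a line break.
assert json.dumps("\u2028") == '"\\u2028"'
```

If you ever switch to ensure_ascii=False, U+2028 and U+2029 are emitted raw. That is harmless when the reader splits only on "\n" (as iterating over a text file does), but it would matter if the reader used str.splitlines().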
… absent custom types, it should be safe, yes. I have made similar assumptions about JSON output in the past.
I guess the question is: Are you concerned about the lack of guarantee in and of itself?
No, I was mostly worried that I’d missed something obvious.
Reading the docs carefully, I think even custom types are safe. You can override default, but it should return “a JSON encodable version of the object”, i.e. it reduces the object to basic types, which, as we’ve established, are safe.
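As a sketch of that reasoning, here is a `default` hook that reduces a few non-JSON types to basic ones. The function name `to_basic` and the chosen conversions are hypothetical. The encoder then serialises whatever the hook returns with the same rules as before, so the output stays on one line:

```python
import json
from datetime import date
from decimal import Decimal


def to_basic(obj):
    """Hypothetical `default` hook: reduce unsupported objects to basic types."""
    if isinstance(obj, date):
        return obj.isoformat()
    if isinstance(obj, Decimal):
        return str(obj)
    if isinstance(obj, set):
        return sorted(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


record = {"when": date(2024, 1, 2), "amount": Decimal("9.99"), "tags": {"b", "a"}}
line = json.dumps(record, default=to_basic)
print(line)  # {"when": "2024-01-02", "amount": "9.99", "tags": ["a", "b"]}
```

Even if a hook returned a string containing "\n", that string would be escaped like any other.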
So I think it’s safe, but as @tim.one says, the docs could be clearer.
And of course, it’s easy to get newlines in the output by changing other parameters from their defaults, so custom types (which require a non-default value for default or cls) would just be another case of that limitation anyway.
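If you want a cheap safety net against that kind of configuration change, one option is to check each encoded record before writing it. This is only a sketch. `PrettyEncoder` is a hypothetical cls subclass that forces indentation, and `to_jsonl_line` is a hypothetical helper:

```python
import json


class PrettyEncoder(json.JSONEncoder):
    """Hypothetical encoder subclass that always pretty-prints (and so adds newlines)."""

    def __init__(self, **kwargs):
        kwargs["indent"] = 2
        super().__init__(**kwargs)


def to_jsonl_line(record, **dumps_kwargs):
    """Serialise one record for JSONL, refusing output that would span lines."""
    text = json.dumps(record, **dumps_kwargs)
    if "\n" in text:
        raise ValueError("encoded record contains a newline; not valid as one JSONL line")
    return text + "\n"


record = {"id": 1, "values": [1, 2]}
print(repr(to_jsonl_line(record)))  # default settings: accepted

try:
    to_jsonl_line(record, cls=PrettyEncoder)  # custom cls reintroduces newlines
except ValueError as exc:
    print("rejected:", exc)
```

With the defaults the check never fires, so it costs one substring scan per record.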