Vote: New function for reading JSON from path

Fixing a lack of user knowledge by adding new “simpler” functions strikes me as a classic case of trying to solve a people issue with a technical solution.

Surely the correct answer here is to teach people the correct way to use the existing tools, rather than spend developer time and effort on debating, adding, and maintaining wrapper functions for a couple of lines of code that will be obsolete as soon as UTF-8 mode is the default encoding for open()?

… and yet it provides no help at all for users loading JSON from a zipfile, or a URL, or any one of many other situations. Whereas teaching users what the issue is and how to solve it for themselves gives them the tools to handle all of these cases.

IMO, if we keep adding “helpers” like this we’re essentially making the assumption that users aren’t capable of understanding the correct practice here. I don’t think that’s fair or accurate. Sure, people may prefer a quick answer using a helper over the effort needed to understand the problem and learn, but that doesn’t mean they can’t understand, and it doesn’t mean we should design the language on that assumption.

6 Likes

Respectfully, is “choose the right encoding” really the core issue here? I don’t think the objective of the original python-ideas proposal was to paper over the API for newbies (although that’s one side benefit), and I generally don’t think that it should be the objective here, if it comes at the expense of other sensibilities.

As an experienced “production” dev, I would very much like this for my own codebases. I would find it more useful than even the walrus operator. It takes non-composable multi-line blocks of code and turns them into single function calls. Maybe there is something to be said for forcing the user to perform file I/O in visually distracting, non-composable blocks, but IMO that lands on the wrong side of the “practicality vs. purity” spectrum for Python.


As for the encoding issue, my example implementation (the repl.it link) defaults to UTF-8. In my opinion, this is the sole sensible default for:

  • Plain text
  • JSON
  • YAML
  • TOML
  • Column delimited (unless exported from Excel, which is a nontrivial use case)
  • Fixed-width columns

Python itself has very much “blessed” UTF-8 by making it the default encoding for bytes.decode(), str.encode(), and the standard streams. Encodings other than UTF-8 are already more or less nonstandard from the perspective of the Python standard library.
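
For concreteness, here is a minimal sketch of the kind of helper under discussion. The name loadfile and this exact signature are illustrative, not a quote of the repl.it implementation:

import csv
import json

def loadfile(path, loader=json.load, *, encoding="utf-8"):
    """Open path as UTF-8 text by default and parse it with loader."""
    with open(path, encoding=encoding) as f:
        return loader(f)

data = loadfile("config.json")                                # JSON, the default
rows = loadfile("table.csv", lambda f: list(csv.reader(f)))   # any stream consumer works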

If you want to make the binary/text distinction in the API more explicit, you can add a binary: bool = False kwarg, which simply ignores the encoding kwarg when enabled (you’d have to change the type annotations a bit, using Literal instead of dispatching on the type of encoding).
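
A rough sketch of what those annotations might look like, using typing.overload with Literal (again, loadfile and its parameters are hypothetical):

from typing import Any, Callable, IO, Literal, overload

@overload
def loadfile(path: str, loader: Callable[[IO[bytes]], Any], *,
             binary: Literal[True]) -> Any: ...
@overload
def loadfile(path: str, loader: Callable[[IO[str]], Any], *,
             binary: Literal[False] = ..., encoding: str = ...) -> Any: ...

def loadfile(path, loader, *, binary=False, encoding="utf-8"):
    if binary:
        # Binary mode ignores the encoding kwarg entirely.
        with open(path, "rb") as f:
            return loader(f)
    with open(path, encoding=encoding) as f:
        return loader(f)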


I think if we want to add this loadf/dumpf functionality to JSON, rather than through my top-level loadfile proposal, the best way would be through a PEP that specifies a general protocol for “loaders”, i.e. namespaces (modules or classes) that have the following methods:

  • load(stream)
  • dump(obj, stream)
  • loadf(filename or pathlike)
  • dumpf(obj, filename or pathlike)
  • loads(s: str)
  • dumps(obj) -> str
  • loadb(data: bytes)?
  • dumpb(obj) -> bytes?

This would give 3rd party library developers some kind of standard to base their work on. It would be a nice unification of an existing ad-hoc standard anyway.
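
In typing terms the protocol could be written out roughly like this (a sketch; the method set is the one listed above, and the class name is made up):

from os import PathLike
from typing import Any, IO, Protocol, Union

class Loader(Protocol):
    def load(self, stream: IO[Any]) -> Any: ...
    def dump(self, obj: Any, stream: IO[Any]) -> None: ...
    def loadf(self, path: Union[str, PathLike]) -> Any: ...
    def dumpf(self, obj: Any, path: Union[str, PathLike]) -> None: ...
    def loads(self, s: str) -> Any: ...
    def dumps(self, obj: Any) -> str: ...
    # Optionally: loadb(data: bytes) -> Any and dumpb(obj: Any) -> bytes.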

With something like this in place, we could even have a loaderlib.loader that works analogously to contextlib.contextmanager, taking 1 or 2 functions and wrapping them in a spec-compliant loader.
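
To sketch what that factory might look like (everything here is hypothetical, including the loaderlib name):

import io
import json

def loader(load, dump=None):
    """Wrap a stream-based load (and optionally dump) in a spec-compliant loader."""

    class _Loader:
        @staticmethod
        def load(stream):
            return load(stream)

        @staticmethod
        def loads(s):
            return load(io.StringIO(s))

        @staticmethod
        def loadf(path, *, encoding="utf-8"):
            with open(path, encoding=encoding) as f:
                return load(f)

        # dump/dumps/dumpf would be generated the same way when dump is given.

    return _Loader()

json_loader = loader(json.load)
# data = json_loader.loadf("data/foo.json")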

Arguably that’s even more powerful than the top-level function interface I proposed earlier, since then you can have things like

import json
import csv

data1 = json.loadf('data/foo.json')
data2 = csv.DictLoader().loadf('data/bar.csv')

This protocol retains the property of visually separating “action that does i/o” from “other stuff”, but with more composability.

Obviously this is a very sloppy proposal and it would require a bit of thought for use with some APIs like ZipFile. But if I can convince my partner that this is a good use of my time, maybe I can hack together an example implementation of this over the weekend :sweat_smile:

IMO that’s the problem, there is no “core issue”, just a number of nice-to-have but not compelling ideas.

Encodings are something people need to learn about. Until the world moves on and everything is UTF-8 (not just defaults, but legacy files, feeds from old systems, etc) people will still need to know about encodings at some point.

So would I. And…

Your implementation is great, and I plan on using it. But I don’t think it needs to be in the stdlib; I’m happy to copy/paste it into my codebase if needed, or, if I do that a lot, to put it into a utility library and either keep that locally or publish it. Or you could publish it yourself and I’d likely use it. But I might still copy/paste it to avoid a dependency :slightly_smiling_face:

I’m not saying the functionality isn’t useful - not at all. I’m not even saying it doesn’t take a bit of care to get right. All I’m saying is that it’s not useful enough to warrant going into the stdlib, when there are loads of other, equally nice-to-have functions that don’t. It’s the old “not every 3-line function needs to be a builtin” argument.

I much prefer your loadfile function over the json-specific one proposed here. But that doesn’t mean I think it should go into the stdlib either.

Fair enough!

Out of curiosity, what are some of the other nice-to-have functions you had in mind? Part of why I’m specifically in favor of doing something here (but opposed to the JSON-specific proposal) is that I have a hard time coming up with anything that would be as nice-to-have as this.

Personally I think something like this would be more useful than the (very useful) walrus operator and removesuffix/removeprefix methods. And certainly more useful than structural pattern matching.

If not the loadfile function, then certainly a higher-order API that generates loadf/dumpf functions/methods from lower-level load/dump functions/methods that operate on streams.

In short, I disagree that this is just “any” 3-line function:

  1. The “with-open” idiom is ubiquitous, more so than any other I can think of (except maybe if kwarg is not None and if dict.get('a') is not None, which is a different can of worms).
  2. Most uses of the with-open idiom are non-composable boilerplate, which this would reduce to a single composable function call (see the sketch after this list).
  3. The existence and semantics of the with-open idiom are not obvious to beginners, and are difficult to teach. Imagine if beginners could Google “how to load JSON with Python” and get json.loadf('foo.json') instead of the insidious json.load(open('foo.json')).
  4. Avoiding with-open means avoiding the cases where people put too much logic inside the with block, or engage in contorted error handling, or try to open files for reading and writing at the same time, etc. I know it’s not the job of the core language to police all bad code, but if we can offer a simple, low-maintenance improvement in this area, then I say it’s worth it.
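
To make point 2 concrete, here is the before/after as I see it (json.loadf is the hypothetical function, not an existing API):

import json

# Today: a non-composable three-line block per file.
with open("foo.json", encoding="utf-8") as f:
    data = json.load(f)

# Proposed (hypothetical): one composable expression.
# data = json.loadf("foo.json")
# results = [json.loadf(p) for p in paths]   # composes into comprehensions, etc.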

I skipped reading python-ideas, because I ignore that list (and the category here), but since you’re in the python-dev equivalent now…

Why not pathlib.Path.read_json() and pathlib.Path.write_json()? We already have the precedent of convenience functions there, and pathlib is going to do a good job hiding other pitfalls of using paths. Should be pretty uncontroversial to add them there.
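
Something like this, sketched as free functions (the method names are as above; the bodies are just my guess at the obvious implementation):

import json
from pathlib import Path

def read_json(path: Path, *, encoding: str = "utf-8"):
    """Parse the file's contents as JSON."""
    return json.loads(path.read_text(encoding=encoding))

def write_json(path: Path, obj, *, encoding: str = "utf-8") -> int:
    """Serialize obj as JSON and write it to the file, returning characters written."""
    return path.write_text(json.dumps(obj), encoding=encoding)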

(If your answer is to tell me to go read python-ideas, then my response is to summarise the discussion - including rejected ideas - in a PEP before bringing it to python-dev/Core Development. :wink: )

2 Likes

Mostly just the fact that there’s a constant stream of them on python-ideas, not anything in particular.

To be honest, I agree with you: it’s certainly a viable contender for adding to the stdlib. But

  1. I prefer to argue against the read-json function directly, rather than tie my arguments to adding something else to the stdlib instead.
  2. It’s got something of a functional-programming style (passing the loading function as an argument), which I quite like but which a lot of people aren’t as happy with.

Let’s keep the discussion on your loadfile function separate.

2 Likes

Because that would then lead to adding the equivalent methods for all other file formats the stdlib supports (e.g. read_xml()/write_xml() using ElementTree).

I will say I’m leaning towards not wanting this shortcut because I don’t think it buys us enough. The various one-liner/two-liner solutions are straightforward enough and can be easily covered in the json docs as examples of how to work with the module appropriately.

5 Likes

Since many core devs opposed it, I think we should suspend this idea for a while.

There is a separate issue about including a toml library in the stdlib (Issue 40059: Provide a toml module in the standard library - Python tracker).
I don’t know which toml library is the most popular, but at least one of them implements this overloading idea.

So we should reconsider this idea after we choose a toml library.

To be blunt, I think we should simply reject the idea. I don’t see any reason why including TOML in the stdlib makes any difference here. If we adopt an existing library, we take its API as it stands. If we design our own (why would we???) then we should mirror the JSON library for consistency.

I think we should accept that core developer opinion is against this idea and just drop it. Re-opening the question later will just burn people out from endless discussions and favour people with an agenda to pursue (and hence more energy) over people with more nuanced views and less willingness to fight the same battle endlessly.

4 Likes

I would much rather see those functions in the json module. pathlib is definitely not the right place for this; in particular, users who do not use pathlib would not benefit from the functionality.

As much as I want to see this functionality, I’m not trying to propose it at this point. However, I think there are a couple of relevant points that should be considered if it is brought up again, so it made sense to me to tack this onto the end of this thread rather than starting a new one. (There’s already too much of a disconnect between Discourse and python-ideas.)

(looking back at the thread, I notice that @gwerbin made a number of similar points, but this is a higher-level summary in one place)

  1. Why would we design a new API? Because if something is going to be in the stdlib, it makes more sense for it to have an API compatible with other similar functionality in the stdlib than some arbitrary other API. And the toml API is already (at the top level) almost the same as the json one, though under the hood it’s pretty different, which is a mistake if you ask me.

  2. We should keep in mind that the stdlib json API was not designed for JSON – it is a mirror of the pre-existing pickle API, and I’m sure that was done very much on purpose. But the needs of pickle are different from those of JSON, which is probably why it didn’t have a “load from a path” function in the first place.

  3. There are also implementation issues: the toml lib referenced above uses type checking to overload the load function. The json lib doesn’t overload, but it also doesn’t do any type checking – it uses simple duck typing instead. And it certainly could overload via duck typing: check for a read() method, and if accessing it raises an AttributeError, treat the argument as a path and open it. (In fact, I prototyped this a while ago, after the python-ideas thread, but didn’t finish it: cpython/Lib/json/__init__.py at json_file · PythonCHB/cpython · GitHub.)
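
Roughly, that duck-typed overload looks like this (a simplified sketch, not the actual branch linked above):

import json

def load(fp_or_path, **kwargs):
    """Accept either an open file-like object or a path."""
    try:
        read = fp_or_path.read          # file-like objects have a read() method
    except AttributeError:
        # No read() method: treat the argument as a path and open it ourselves.
        with open(fp_or_path, "rb") as f:
            return json.loads(f.read(), **kwargs)
    return json.loads(read(), **kwargs)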

If this does get revived, maybe the way to go is a PEP formalizing a “serialization” API – which would then be used by pickle and json and any other new format (e.g. toml), and optionally by third-party libs. That’s already what’s happening informally, but formalizing it would be good, and that would be the time to make any changes, if any.

Finally, something really struck me in reading the objections to adding the ability to read JSON directly from a path in one call. And that is that most of the objections came from the perspective of “systems programming” as opposed to “scripting” (to probably incorrectly use Ousterhout’s terms). Python is an excellent scripting language. But most of the changes in recent years (except f-strings) have been aimed at making it a better systems language, some at the expense of scripting. In this case:

“Developers need to understand encoding issues anyway” – well, yes, but do non-developers writing simple scripts? I don’t think so.

"We need to educate folks about the “best practice”:

with open(path, "b") as f:
    json.load(f)  # Implies encoding="utf-8".

Do we really think that folks writing scripts should have to learn to do that, when we could offer them:

json.load(path)

“From my experience, in most cases you load or save JSON not from file, but from network, or database, or GzipFile, or ZipFile, or TemporaryFile, etc”

That is VERY much systems programming experience – people writing scripts are most often going to load from files. And most importantly, no one is suggesting limiting in ANY WAY the ability to load from file-like objects rather than paths.

I really like the maxim: “the easy things should be easy, the hard things should be possible”

I know that to core developers the current situation isn’t “hard”, but it’s not so easy for non-software-developers writing scripts. Do we want Python to be an even better scripting language?

5 Likes

Sweet, CHB already said exactly what I’d intended to.

(In fact, I even had an in-progress reply still sitting in the compose buffer for this page, 5 months later. So, 10 points to Discourse. Not that I’m planning to use that very ancient proto-reply, but it’s no skin off the software’s nose to hang on to it for me anyway just in case.)

But, yeah, this just strikes me as pungent with code smell, to be touted as a “best” practice:

with open(path, "b") as f:
    json.load(f)  # Implies encoding="utf-8".

And it would make me supremely nervous for the developer who commits the Pythonic sin of forgetting any of those byzantine details. (Eight-months-from-now me, debugging: “…You have to open a text file in binary mode for it to be decoded correctly, SRSLY?!”) I assume the reward for their achievement would be the Standard Offer: The code runs fine for 5 months and passes all of the tests, only to suddenly start throwing exceptions in production when it’s finally presented with an input file that didn’t adhere to the same, expected-and-therefore-the-only-one-tested format.

This is how languages and APIs and systems and vendors get reputations, that eventually turn into cliches, that eventually devolve into terrible jokes and/or awful song lyrics. (Which can occasionally be extremely funny, admittedly. But that’s more the exception than the norm; we can’t all be Bill Sutton.)

1 Like