Vote: New function for reading JSON from path

We have discussed adding a new function to the json module, but we didn't reach a decision about its naming.

Previous discussion: Mailman 3 A shortcut to load a JSON file into a dict : json.loadf - Python-ideas - python.org

Issue: Issue 43457: Include simple file loading and saving functions in JSON standard library. - Python tracker

Function names for saving/loading JSON from a given path. (Note that load/dump are used for saving/loading JSON from a given file object.)
  • loadf/dumpf
  • load_file/dump_file
  • loadp/dumpp
  • load_path/dump_path
  • other (leave a comment)


1 Like

I meant to vote load_path but misread and clicked other instead.

1 Like

load_file sounds to me like it could be shorthand for load_file_(pointer|descriptor). Same for loadf.

load_path seems the best name for something that takes a path.

1 Like

I am strongly against adding such functions.

  • They are just combinations of two functions. In the best case they would save you one line of code, at the cost of a larger maintenance burden and a harder-to-learn API. And in any case you could not use them until you drop support for 3.9.
  • Since they are combinations of two functions, they should support the union of all arguments of the underlying functions. open() and loads()/dumps() have too many parameters between them.
  • From my experience, in most cases you load or save JSON not from a file, but from the network, or a database, or a GzipFile, ZipFile, TemporaryFile, etc. There would be too little use for these specialized functions.
  • Since there are several stdlib and third-party serialization modules which support a similar interface (load()/loads()/dump()/dumps()), we would need to add the new functions to marshal, pickle and plistlib as well, and add a burden on third-party libraries to follow.
11 Likes

I don’t feel as strongly as @storchaka (and I did in fact vote for a name) but I feel that the case for adding these functions is weak regardless of what they are named.

This post seems to assume that the case for having such functions is decided. Maybe it is, but a link to a clear statement of the consensus would be useful in that case (I’m not going to re-read the whole thread). Also, the proposed implementation

    with open(filename, "r") as fp:
        data = json.load(fp, *args, **kwargs)

ignores any question of encoding (are JSON files required to be UTF-8? because the default encoding for open isn’t necessarily UTF-8).

So if there is a consensus that the functions should be added, and if the handling of encodings is properly defined, then my vote on what to call the functions stands. But IMO, at this point we’re a long way from the point where the name is the biggest outstanding question, here…
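To make the encoding concern concrete, here is a minimal sketch of how the proposed helper could sidestep the default-encoding problem. The name load_path and the binary-mode choice are assumptions for illustration, not a settled API:

```python
import json

def load_path(path, **kwargs):
    # Hypothetical helper: opening in binary mode lets json.load()
    # autodetect UTF-8, UTF-16 and UTF-32, instead of depending on
    # the platform's locale-dependent default text encoding.
    with open(path, "rb") as fp:
        return json.load(fp, **kwargs)
```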

But combining the two functions yourself creates a huge number of bugs:
See Mailman 3 [Python-ideas] Re: A shortcut to load a JSON file into a dict : json.loadf - Python-ideas - python.org

New APIs must always use UTF-8 for saving, and accept UTF-8, UTF-16 and UTF-32, like json.loads and json.load do.
See Mailman 3 [Python-ideas] Re: A shortcut to load a JSON file into a dict : json.loadf - Python-ideas - python.org


Please keep this thread just for the naming vote. If you have topics not covered in the previous discussion, please reply to that thread or create a new one.

1 Like

Did you consider overloading load() to support a pathlib path in addition to a file-like object?

def load(f, *args, **kwargs):
    if isinstance(f, pathlib.PurePath):
        with open(f) as fp:
            return loads(fp.read(), *args, **kwargs)
    elif not hasattr(f, "read"):
        raise TypeError(...)
    else:
        return loads(f.read(), *args, **kwargs)
4 Likes

Of course, I considered it. See Mailman 3 [Python-ideas] Re: A shortcut to load a JSON file into a dict : json.loadf - Python-ideas - python.org

Sorry I didn’t include it in the voting option.

2 Likes

Sorry, and I didn’t follow the link :slight_smile:

1 Like

I’m opposed to adding these functions specifically for JSON, if they aren’t also added for reading files as plain text.

What if it was something like this?

from operator import methodcaller

def readfile(path, encoding='utf8', loader=methodcaller('read')):
    with open(path, 'r', encoding=encoding) as fp:
        return loader(fp)

Then you could write

data = readfile('./data.json', loader=json.load)

It could also optionally support encoding=None to load the file in pure “bytes” mode, which would have to be handled by the chosen loader.
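That bytes mode could look something like this (a sketch extending the hypothetical readfile helper above; readfile itself is not an existing API):

```python
from operator import methodcaller

def readfile(path, encoding="utf-8", loader=methodcaller("read")):
    # encoding=None switches to binary mode; the chosen loader then
    # receives a binary file object and must handle raw bytes itself.
    mode = "r" if encoding is not None else "rb"
    with open(path, mode, encoding=encoding) as fp:
        return loader(fp)
```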

(You’d probably want to use typing.overload to annotate this.)

Edit: this post is probably off-topic with respect to the current thread. I will re-post it to the python-ideas thread.

1 Like

It is off-topic, but there are Path.read_text() and Path.read_bytes() already, so you can already write it in one line (i.e. without with).
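For example (the file here is created just so the snippet is self-contained; the path name is a placeholder):

```python
import json
import pathlib

path = pathlib.Path("data.json")  # placeholder file for the demo
path.write_text('{"answer": 42}', encoding="utf-8")

# The existing pathlib API already gives a one-liner, with an
# explicit encoding and no new json function needed:
data = json.loads(path.read_text(encoding="utf-8"))

path.unlink()  # clean up the demo file
```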

3 Likes

I also concur with Serhiy’s points and am a strong -1 on this. Even though stuff like this keeps getting requested, the use cases are too narrow in real-world applications.

I am so strongly against this because it lowers the bar for many similar proposals: reading the contents of a plain file from a file name, reading the contents of a compressed file from a file name, loading JSON from a compressed file by file name, reading content downloaded from the Web, reading CSV from a compressed file downloaded from the Web, … It is the PHP way.

It is not hard to write two lines of code, and it is more explicit and flexible. You can easily modify the code to load JSON from a compressed file, or from the network, or from a database field, or to load multiple JSONs from the same stream (newline-separated).
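For instance, the compressed-file case is only a small change to the same two-line pattern (the file name is a placeholder, and the file is created here just so the snippet runs standalone):

```python
import gzip
import json

# Create a sample compressed JSON file for the demo.
with gzip.open("data.json.gz", "wt", encoding="utf-8") as f:
    json.dump({"items": [1, 2, 3]}, f)

# The two-line loading pattern adapts by swapping open() for gzip.open():
with gzip.open("data.json.gz", "rt", encoding="utf-8") as f:
    data = json.load(f)
```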

4 Likes

“More flexible” also means “easier to get wrong”. In particular, since the default text encoding is not UTF-8, people will create bugs by omitting the encoding= parameter.

I think all modules whose formats say “UTF-8 should be used” ought to have similar functions (e.g. toml, yaml, xml, …).

And I think modules whose formats say “binary mode should be used” could have similar functions too, although I expect you will be a strong -1 on that as well.

If we don’t add such “easy” functions, I think we must hurry to change the default encoding of open(). It is too easy to make a mistake. See this issue for example.

1 Like

I am sure that in the long term the default encoding of open() will be UTF-8. So the question is not “if”, but “when” and “with what transition period”.

That will eliminate your argument for loadf().

2 Likes

There are other ways to “fix” encoding issues without introducing new functions as well. For example, json.load() could gain a new encoding="utf-8" argument that only works with binary streams, and this then becomes a best-practice issue:

# Don't do this.
with open(path) as f:
    json.load(f)

# Do this instead.
with open(path, "rb") as f:
    json.load(f)  # Implies encoding="utf-8".

That is the best practice already: json.load() supports binary files and reads UTF-8, UTF-16 and UTF-32.
But people don’t know the best practice.
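That behavior can be checked directly: json accepts raw bytes and autodetects the Unicode encoding (via json.detect_encoding), which is why binary-mode files are the safe default:

```python
import json

# json.loads() autodetects UTF-8, UTF-16 and UTF-32 from raw bytes,
# so a file opened in binary mode works regardless of which of the
# three Unicode encodings it was saved with.
doc = {"name": "café"}
for enc in ("utf-8", "utf-16", "utf-32"):
    payload = json.dumps(doc, ensure_ascii=False).encode(enc)
    assert json.loads(payload) == doc
```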

FWIW, I’m with Serhiy on this one.

There is not much point in adding lots of new helpers which save
you one or two lines to many different data format modules, just
so that people don’t forget to specify an encoding in the open()
call which is used for opening the file.

A solution such as the one mentioned by Greg Werbin on the ideas
ML would be a better solution:

https://repl.it/@maximum__/loadfile#main.py (click on “Code” to see
the code)

Alternatively, the format modules could check the file object’s
.encoding attribute and raise a warning if a non-standard encoding
is found, e.g. the json module could check for “utf-8”.

Regardless of what we do in Python to help users with file encodings,
programmers will have to learn about these one way or another, since
the world is not perfect and we’re still not quite where we’d like
to be with text files - although things are already a lot better
than 10 years ago :slight_smile:

E.g. it’s still not uncommon to have CSV files encoded in
Windows code page encodings.

2 Likes

And we first need to figure out why people don’t follow the best practice. Otherwise, even if a new function is introduced, it is very likely people will still ignore it and reach for the wrong solution.

2 Likes

One obvious reason is that the “best practice” is tightly coupled to the implementation; it is almost a leaked implementation detail.

The current json module supports bytes input, so opening the file in binary mode is the best practice. But before json supported binary input, there was no single “best” practice: encoding="utf-8" or encoding="utf-8-sig" might be used.

When users need to use JSON, YAML, TOML, csv, etc., they need to check: “Should I open the file in binary mode, or specify an encoding? If both are OK, which is more efficient?”

If every module supported module.load_path(path), it would provide the most efficient and recommended way, and the module could hide the implementation detail.

load_file() is not a better solution because it doesn’t hide that implementation detail (binary mode vs. an explicit encoding). module.load_path() can choose the most efficient way; load_file() cannot.

2 Likes