Vote: New function for reading JSON from path

We have discussed adding a new function to the json module, but we didn't reach a decision about its naming.

Previous discussion: Mailman 3 A shortcut to load a JSON file into a dict : json.loadf - Python-ideas - python.org

Issue: Issue 43457: Include simple file loading and saving functions in JSON standard library. - Python tracker

Function names for saving/loading JSON from a given path. (Note that load/dump are used for saving/loading JSON from a given file object.)
  • loadf/dumpf
  • load_file/dump_file
  • loadp/dumpp
  • load_path/dump_path
  • other (leave a comment)


1 Like

I meant to vote load_path but misread and clicked other instead.

1 Like

load_file sounds to me like it could be shorthand for load_file_(pointer|descriptor). Same for loadf.

load_path seems the best name for something that takes a path.

1 Like

I am strongly against adding such functions.

  • They are just combinations of two functions. In the best case they would save you one line of code, at the cost of a larger maintenance burden and a harder-to-learn API. And in any case you could not use them until you drop support for 3.9.
  • Since they are combinations of two functions, they should support the union of all arguments of the underlying functions. open() and loads()/dumps() have too many parameters between them.
  • From my experience, in most cases you load or save JSON not from a file, but from the network, or a database, or a GzipFile, ZipFile, TemporaryFile, etc. There would be too little use for these specialized functions.
  • Since there are several stdlib and third-party serialization modules which support a similar interface (load()/loads()/dump()/dumps()), we would need to add the new functions to marshal, pickle and plistlib as well, and add a burden on third-party libraries to follow.
11 Likes

I don’t feel as strongly as @storchaka (and I did in fact vote for a name) but I feel that the case for adding these functions is weak regardless of what they are named.

This post seems to assume that the case for having such functions is decided. Maybe it is, but a link to a clear statement of the consensus would be useful in that case (I’m not going to re-read the whole thread). Also, the proposed implementation

    with open(filename, "r") as fp:
        data = json.load(fp, *args, **kwargs)

ignores any question of encoding (are JSON files required to be UTF-8? because the default encoding for open isn’t necessarily UTF-8).

So if there is a consensus that the functions should be added, and if the handling of encodings is properly defined, then my vote on what to call the functions stands. But IMO, at this point we’re a long way from the point where the name is the biggest outstanding question, here…
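To make the encoding concern concrete, here is a minimal sketch of how the proposed helper could sidestep the default-encoding problem. The name load_path and the binary-mode choice are assumptions for illustration, not a settled API:

```python
import json

def load_path(path, **kwargs):
    # Hypothetical helper: opening in binary mode lets json.load()
    # autodetect UTF-8, UTF-16 and UTF-32, instead of depending on
    # the platform's locale-dependent default text encoding.
    with open(path, "rb") as fp:
        return json.load(fp, **kwargs)
```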

But combining the two functions yourself creates a huge number of bugs:
See Mailman 3 [Python-ideas] Re: A shortcut to load a JSON file into a dict : json.loadf - Python-ideas - python.org

New APIs must always use UTF-8 for saving, and accept UTF-8, UTF-16 and UTF-32, like json.loads and json.load do.
See Mailman 3 [Python-ideas] Re: A shortcut to load a JSON file into a dict : json.loadf - Python-ideas - python.org


Please keep this thread just for the naming vote. If you have topics not covered in the previous discussion, please reply to that thread or create a new one.

1 Like

Did you consider overloading load() to support a pathlib path in addition to a file-like object?

def load(f, *args, **kwargs):
    if isinstance(f, pathlib.PurePath):
        with open(f) as fp:
            return loads(fp.read(), *args, **kwargs)
    elif not hasattr(f, "read"):
        raise TypeError(...)
    else:
        return loads(f.read(), *args, **kwargs)
4 Likes

Of course, I considered it. See Mailman 3 [Python-ideas] Re: A shortcut to load a JSON file into a dict : json.loadf - Python-ideas - python.org

Sorry I didn’t include it in the voting option.

2 Likes

Sorry, and I didn’t follow the link :slight_smile:

1 Like

I’m opposed to adding these functions specifically for JSON, if they aren’t also added for reading files as plain text.

What if it was something like this?

from operator import methodcaller

def readfile(path, encoding='utf8', loader=methodcaller('read')):
    with open(path, 'r', encoding=encoding) as fp:
        return loader(fp)

Then you could write

data = readfile('./data.json', loader=json.load)

It could also optionally support encoding=None to load the file in pure “bytes” mode, which would have to be handled by the chosen loader.
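That bytes mode could look something like this (a sketch extending the hypothetical readfile helper above; readfile itself is not an existing API):

```python
from operator import methodcaller

def readfile(path, encoding="utf-8", loader=methodcaller("read")):
    # encoding=None switches to binary mode; the chosen loader then
    # receives a binary file object and must handle raw bytes itself.
    mode = "r" if encoding is not None else "rb"
    with open(path, mode, encoding=encoding) as fp:
        return loader(fp)
```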

(You’d probably want to use typing.overload to annotate this.)

Edit: this post is probably off-topic with respect to the current thread. I will re-post it to the python-ideas thread.

1 Like

It is off-topic, but there are Path.read_text() and Path.read_bytes() already, so you can already write it in one line (i.e. without with).
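For example (the file here is created just so the snippet is self-contained; the path name is a placeholder):

```python
import json
import pathlib

path = pathlib.Path("data.json")  # placeholder file for the demo
path.write_text('{"answer": 42}', encoding="utf-8")

# The existing pathlib API already gives a one-liner, with an
# explicit encoding and no new json function needed:
data = json.loads(path.read_text(encoding="utf-8"))

path.unlink()  # clean up the demo file
```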

3 Likes

I also concur with Serhiy’s points and am a strong -1 on this. Even though stuff like this keeps getting requested, the use cases are too narrow in real-world applications.

I am so strongly against this because it lowers the bar for many similar proposals: reading the contents of a plain file from a file name, reading the contents of a compressed file from a file name, loading JSON from a compressed file by file name, reading content downloaded from the Web, reading CSV from a compressed file downloaded from the Web, … It is the PHP way.

It is not hard to write two lines of code, and it is more explicit and flexible. You can easily modify the code to load JSON from a compressed file, or from the network, or from a database field, or to load multiple JSONs from the same stream (newline-separated).
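For instance, the compressed-file case is only a small change to the same two-line pattern (the file name is a placeholder, and the file is created here just so the snippet runs standalone):

```python
import gzip
import json

# Create a sample compressed JSON file for the demo.
with gzip.open("data.json.gz", "wt", encoding="utf-8") as f:
    json.dump({"items": [1, 2, 3]}, f)

# The two-line loading pattern adapts by swapping open() for gzip.open():
with gzip.open("data.json.gz", "rt", encoding="utf-8") as f:
    data = json.load(f)
```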

4 Likes

“More flexible” also means “easier to get wrong”. In particular, since the default text encoding is not UTF-8, people will create bugs by omitting the encoding= parameter.

I think all modules whose formats say “UTF-8 should be used” ought to have similar functions (e.g. toml, yaml, xml, …).

And I think modules whose formats say “binary mode should be used” could have similar functions too, although I expect you will be a strong -1 on that as well.

If we don’t add such “easy” functions, I think we must hurry to change the default encoding of open(). It is too easy to make a mistake. See this issue for example.

1 Like

I am sure that in the long term the default encoding of open() will be UTF-8. So the question is not “if”, but “when” and “with what transition period”.

That will eliminate your argument for loadf().

2 Likes

There are other ways to “fix” encoding issues without introducing new functions as well. For example, json.load() could gain a new encoding="utf-8" argument that only works with binary streams, and this then becomes a best-practice issue:

# Don't do this.
with open(path) as f:
    json.load(f)

# Do this instead.
with open(path, "rb") as f:
    json.load(f)  # Implies encoding="utf-8".

That is the best practice already: json.load() supports binary files and reads UTF-8, UTF-16 and UTF-32.
But people don’t know the best practice.
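That behavior can be checked directly: json accepts raw bytes and autodetects the Unicode encoding (via json.detect_encoding), which is why binary-mode files are the safe default:

```python
import json

# json.loads() autodetects UTF-8, UTF-16 and UTF-32 from raw bytes,
# so a file opened in binary mode works regardless of which of the
# three Unicode encodings it was saved with.
doc = {"name": "café"}
for enc in ("utf-8", "utf-16", "utf-32"):
    payload = json.dumps(doc, ensure_ascii=False).encode(enc)
    assert json.loads(payload) == doc
```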

FWIW, I’m with Serhiy on this one.

There is not much point in adding lots of new helpers which save
you one or two lines to many different data format modules, just
so that people don’t forget to specify an encoding in the open()
call which is used for opening the file.

A solution such as the one mentioned by Greg Werbin on the ideas
ML would be a better solution:

https://repl.it/@maximum__/loadfile#main.py (click on “Code” to see
the code)

Alternatively, the format modules could check the file object’s
.encoding attribute and raise a warning if a non-standard encoding
is found, e.g. the json module could check for “utf-8”.

Regardless of what we do in Python to help users with file encodings,
programmers will have to learn about these one way or another, since
the world is not perfect and we’re still not quite where we’d like
to be with text files - although things are already a lot better
than 10 years ago :slight_smile:

E.g. it’s still not uncommon to have CSV files encoded in
Windows code page encodings.

2 Likes

And we first need to figure out why people don’t follow the best practice. Otherwise, even if a new function is introduced, it is very likely people will still ignore it and reach for the wrong solution.

2 Likes

One obvious reason is that the “best practice” is tightly coupled to the implementation; it is almost a leaked implementation detail.

The current json module supports bytes input, so opening the file in binary mode is the best practice. But before json supported binary input, there was no single “best” practice: encoding="utf-8" or encoding="utf-8-sig" might be used.

When users need to use JSON, YAML, TOML, csv, etc., they need to check: “Should I open the file in binary mode, or specify an encoding? If both are OK, which is more efficient?”

If every module supported module.load_path(path), it would provide the most efficient and recommended way, and the module could hide the implementation detail.

load_file() is not a better solution because it doesn’t hide that implementation detail (binary mode vs. an explicit encoding). module.load_path() can choose the most efficient way; load_file() cannot.

2 Likes