PEP 680: "tomllib" Support for parsing TOML in the Standard Library

CAM-Gerlach · January 11, 2022, 9:47pm

Thanks @hauntsaninja, @hukkinj1 and @encukou!

For reference (it might be good to include this in the OP, @hauntsaninja), the PEP was spawned from this previous discussion:

I’ll submit a PR with some formatting and copyediting fixes shortly, and raise any more substantial issues here.

I guess my one real concern, as a number of others have mentioned, is the lack of write support, for several reasons:

As the data the PEP links to shows, use cases requiring write support are at least a third as common as those requiring read support, so we'd be excluding a large fraction of the potential use cases that could benefit from this (perhaps close to 50% more use cases than those that need only read support, assuming most use cases that write TOML also read it).
This also isn't consistent with any (AFAIK) of the other similar formats in the standard library (JSON, CSV, INI, Pickle, XML, etc.), all of which support both reading and writing, and in particular with the idiomatic load/loads and dump/dumps API common to both the standard library and third-party packages.
The fact that the stdlib would have read but not write support (unlike the other formats above) is likely to confuse and frustrate users, and potentially push them toward other legacy or less appropriate formats. Combined with TOML's inclusion in the stdlib and its accelerating use in other contexts leading to widespread adoption, it is also likely to generate a substantial volume of questions, complaints, bug tickets and support requests.
This, in turn, prevents users restricted to the stdlib for various reasons from taking advantage of it, and requires everyone else to find, decide on, install and depend upon a third-party package to do so. Furthermore, given that the most popular current option (like many of the others) is unmaintained, buggy and does not support the current stable version of the standard (which tomli does), there is a serious risk of users either having to spend substantial effort finding a “good” choice or picking a “bad” one, and then either splitting their reads and writes across two different libraries or forgoing the stdlib altogether.
Also, as the PEP mentions, it would make moving to the toml name substantially more viable (though probably still not advisable due to the backward compatibility breakage, at least until usage of the third-party toml package dies out and tomllib can be relied upon).

Addressing the objections in the PEP:

The ability to write TOML is not needed for the use cases that motivate this PEP: for core Python packaging use cases or for tools that need to read configuration.

Use cases that involve editing TOML (as opposed to writing brand new TOML) are better served by a style preserving library.

Both in my experience using toml for a number of different packages and from browsing the first few pages of grep.app hits for toml.load (as linked in the PEP), the great majority of use cases are not editing TOML but creating it. This is particularly common for outputting a default/initial config file for an application, tool, script, etc., along with various other use cases that don't involve a round trip from user-created/edited TOML. Furthermore, the fact that toml continues to see such heavy usage despite not preserving any style, while tomlkit and other alternatives do, is a testament to style preservation not being critical for the great majority of applications.

But even without considering style preservation, there are too many degrees of freedom in how to design a write API. For example, how much control to allow users over output formatting, over serialization of custom types, and over input and output validation. While there are reasonable choices on how to resolve these, the nature of the standard library is such that one only gets one chance to get things right.

Is there a reason why the existing, minimal tomli_w API cannot be adopted for the initial implementation? It doesn't include these features at all and should satisfy most use cases, and support for them could be added later if there is demand. I'm a little confused about how “the nature of the standard library is such that one only gets one chance to get things right” applies if we simply leave those features out of the initial implementation and add them later, which is exactly the rationale presented for deferring write support as a whole.

Currently no CPython core developers have expressed willingness to maintain a write API or sponsor a PEP that includes a write API.

Perhaps the blockers here could be explained further? Tests aside, tomli_w is only 167 sloc in a single module (at least a few dozen lines of which appear similar or identical to code in tomli and could be de-duplicated), versus 658 sloc split over multiple modules for tomli. Is maintaining around 20% more lines of simple, modern, well-tested code really a hard blocker to capturing perhaps as many as 50% more use cases, along with the other benefits mentioned above? Or are there other reasons why maintaining even a minimal, existing write implementation such as tomli_w's is a burden? Is there anything others can do to help?