PEP 680: "tomllib" Support for parsing TOML in the Standard Library

hauntsaninja · January 11, 2022, 1:56pm

Some previous discussion at:

hukkinj1 · January 11, 2022, 2:10pm

Great! @dustin would it make sense to reserve the name tomllib for this PEP now?

dustin · January 11, 2022, 3:15pm

I’ve reserved the name tomllib on PyPI.

vstinner · January 11, 2022, 4:30pm

Since typosquatting is a real threat, I suggest reserving tomlib as well.

eirnym · January 11, 2022, 5:33pm

It’s a great library widely used as an alternative to setup.py format.

I don’t see any other benefits of having this library in the standard library. Pip is a separate package and has its own lifecycle.

barry · January 11, 2022, 6:25pm

I’m in favor of this, and I’ll resist the temptation to bikeshed on the name

One of the problems $we (corporate overlords) have encountered is toolchain issues with moving to a pyproject.toml future. Some tools vendor their TOML library, others make them build requirements with very tricky to support bootstrapping mechanisms. I just want to reiterate that in corporate environments, building from sdists rather than just consuming PyPI wheels is an important use case. In the scientific community, that’s increasingly difficult, but for the majority of packages that need to be internally imported from PyPI, building from sdist is greatly preferred. That means we don’t want to have to have special hacks to bootstrap pyproject.toml support. I think having a TOML library in the stdlib will (eventually, of course) help the packaging ecosystem finally move away from setup.py and friends, and that would be a Good Thing!

Abdur-rahmaanJ · January 11, 2022, 7:05pm

I’ll chip in to note that I fancy naming it toml, just as we do with json.

bernatgabor · January 11, 2022, 7:17pm

That’s sadly not possible, as we’d most likely break all tools/libraries using the widely popular toml package on PyPI. The name chosen was specifically to avoid this quagmire.

Abdur-rahmaanJ · January 11, 2022, 7:21pm

I don’t want to lengthen the discussion on a simple issue. Though I understand, it feels like officially throwing Python’s elegance down the drain for a third-party library. Nobody’s at fault; it’s just the end result.

tiran · January 11, 2022, 7:50pm

$ find Lib -maxdepth 1 -name '*lib' -or -name '*lib.py' | wc -l
18

just saying…

(Disclaimer: I proposed the name tomllib first).

Abdur-rahmaanJ · January 11, 2022, 7:57pm

Yes, but you import csv, json … I mean data formats.

Edit: I am not aware of any data-format modules whose import name is not simply the format name.

EpicWink · January 11, 2022, 8:02pm

The PEP does mention plistlib, even if it’s an outlier.

tiran · January 11, 2022, 8:14pm

We have two file format libraries: plistlib and xdrlib. Sun XDR is a binary data format used by NFS and RPCd.

steve.dower · January 11, 2022, 9:18pm

Has anyone tried asking the current toml project whether they’d give us the name? They might be as disappointed with the json/csv/etc. inconsistency as the rest of us

brettcannon · January 11, 2022, 9:22pm

Yes, and the author is currently MIA. @pradyunsg has even put in a request to take over the project, but there are still potential compatibility issues, as the API of toml is not the same as that of tomli.

steve.dower · January 11, 2022, 9:27pm

This is true whenever we add anything to the stdlib. We might be aware of it this time, but it’s always true. It’s just one of the costs of updating the Python version, and in ten years’ time I’d rather everyone be using toml rather than tomllib. If the existing toml is unresponsive, that’s even more reason to claim the name now, before it becomes typo-bait in a few years’ time.

I haven’t looked at either of the APIs, so I don’t have any idea how they differ, but I could see us starting with already-deprecated shims to help transition known common cases.

CAM-Gerlach · January 11, 2022, 9:32pm

FYI, the PEP has an appendix discussing all the differences and their compat implications. At the very minimum, write support would need to be added (which I favor), but there would still be a problem for libraries/applications, even those using pinned deps, because the stdlib module would shadow the third-party package and make it impossible to use, aside from hacks.
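To make the shadowing concern concrete, here is a minimal sketch of how import resolution behaves when two locations provide a module with the same name. The module name `toml_shadow_demo` and the two temporary directories are hypothetical stand-ins for a stdlib directory and site-packages; the only assumption about real Python is that the first matching entry on `sys.path` wins, and that stdlib directories come before site-packages by default.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def first_match_wins(module_name: str = "toml_shadow_demo") -> str:
    """Import a module that exists in two sys.path entries and report which one won."""
    with tempfile.TemporaryDirectory() as stdlib_like, \
            tempfile.TemporaryDirectory() as site_like:
        # Two same-named modules; only the ORIGIN marker differs.
        Path(stdlib_like, f"{module_name}.py").write_text("ORIGIN = 'stdlib-like'\n")
        Path(site_like, f"{module_name}.py").write_text("ORIGIN = 'site-packages-like'\n")

        saved_path = list(sys.path)
        # The stand-in for the stdlib directory is placed first, as in a real sys.path.
        sys.path[:0] = [stdlib_like, site_like]
        try:
            importlib.invalidate_caches()
            module = importlib.import_module(module_name)
            return module.ORIGIN
        finally:
            # Restore global state so the demo leaves no trace.
            sys.path[:] = saved_path
            sys.modules.pop(module_name, None)


print(first_match_wins())
```

The later directory is never consulted, which is why pinning a third-party toml release would not help if a stdlib module claimed the same name.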

bernatgabor · January 11, 2022, 9:40pm

I think we should start with a read-only API. We can always add write support at a later point in time. The only way we can keep the toml name without introducing potential backwards incompatibility is if we have some future flag to toggle whether toml is imported from the stdlib or from the third-party package. The problem is that toml is very popular, and a lot of tools have adopted it. And while we could track down public usages and force them to change, that would not help with all the breakage we don’t see behind enterprise firewalls. Generally Python 3 remains mostly compatible with previous Python versions; this would be a major change from that. I feel people underestimate the potential impact here, much as the impact of the Python 2 to 3 transition was initially underestimated.
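For readers wondering what a read-only API under a distinct name looks like in practice, here is a rough sketch. It assumes the interpreter either ships `tomllib` (the module this PEP proposes, available from Python 3.11) or has the third-party `tomli` installed, which exposes the same `loads` function; the helper `read_project_name` is hypothetical and exists only for illustration.

```python
try:
    import tomllib  # stdlib module proposed by this PEP (Python 3.11+)
except ModuleNotFoundError:
    try:
        import tomli as tomllib  # third-party package with the same read API
    except ModuleNotFoundError:
        tomllib = None  # no TOML parser available in this environment


def read_project_name(pyproject_text: str) -> str:
    """Return [project].name from the text of a pyproject.toml file."""
    if tomllib is None:
        raise RuntimeError("no TOML parser available")
    return tomllib.loads(pyproject_text)["project"]["name"]
```

Because the module name is new, this pattern coexists with an installed toml package instead of replacing it.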

CAM-Gerlach · January 11, 2022, 9:47pm

Thanks @hauntsaninja , @hukkinj1 and @encukou !

For reference (might be good to include in the OP, @hauntsaninja ), the PEP was spawned from this previous discussion:

I’ll submit a PR with some formatting and copyediting fixes shortly, and raise any more substantial issues here.

I guess my one real concern is, as a number of others have mentioned, the lack of write support, for several reasons.

As the PEP links to, use cases requiring write are at least 1/3 as common as those requiring read, so we’d be excluding a large fraction of the potential use cases that could benefit from this (perhaps close to 50% more use cases than those that just need read support, assuming most use cases that write TOML also read it).
This also isn’t consistent with any (AFAIK) of the other similar formats (JSON, CSV, INI, Pickle, XML, etc.) in the standard library, which all support both read and write, particularly the idiomatic load/loads and dump/dumps API common in both the standard library and third-party packages.
The fact that the stdlib has read but not write support (unlike other formats, as above) is likely to confuse and frustrate users and potentially push them toward legacy or less appropriate formats. Combined with its inclusion in the stdlib and accelerating use in other contexts leading to widespread adoption, it is also likely to generate a substantial volume of questions, complaints, bug tickets and support requests.
This, in turn, prevents users restricted to the stdlib for various reasons from taking advantage of it, and requires others to find, decide on, install and depend upon a third-party package to do so. Furthermore, given that the most popular current option is (and many of the others are) unmaintained, buggy and lacking support for the current stable standard (which tomli supports), there is a serious risk of users either having to spend substantial effort finding a “good” choice, or picking a “bad” option, and either splitting their reads and writes over two different libraries or forgoing the stdlib altogether.
Also, as the PEP mentions, it would make moving to the toml name substantially more viable (though still probably not recommended, at least until usage of toml dies out and tomllib can be relied upon, due to the backward compat breakage).
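For reference, the load/loads and dump/dumps convention mentioned in the points above looks like this with the stdlib json module. The `config` dictionary is an arbitrary example value.

```python
import io
import json

config = {"name": "demo", "ports": [80, 443]}

# String-based pair: dumps serializes to str, loads parses from str.
text = json.dumps(config)
assert json.loads(text) == config

# File-based pair: dump writes to a file-like object, load reads from one.
buffer = io.StringIO()
json.dump(config, buffer)
buffer.seek(0)
assert json.load(buffer) == config
```

A read-only TOML module would offer only the load/loads half of this four-function shape.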

Addressing the objections in the PEP:

The ability to write TOML is not needed for the use cases that motivate this PEP: for core Python packaging use cases or for tools that need to read configuration.

Use cases that involve editing TOML (as opposed to writing brand new TOML) are better served by a style preserving library.

Both in my experience using toml for a number of different packages, as well as browsing the first few pages of grep.app hits for toml.load (as linked in the PEP), the great majority of use cases are not for editing TOML, but rather creating it, particularly for outputting a default/initial config file for an application, tool, script, etc. (quite common), and various other use cases that didn’t involve a round-trip from user-created/edited TOML. Furthermore, the fact that toml continues to see such large usage despite not preserving any style, while tomlkit and other alternatives do, is a testament to style preservation not being critical for the great majority of applications.

But even without considering style preservation, there are too many degrees of freedom in how to design a write API. For example, how much control to allow users over output formatting, over serialization of custom types, and over input and output validation. While there are reasonable choices on how to resolve these, the nature of the standard library is such that one only gets one chance to get things right.

Is there a reason why the existing, minimal tomli_w API for this cannot be adopted for the initial implementation, which doesn’t include these features at all and should satisfy most use cases, with support for them added later if there is demand? I’m a little confused about how “the nature of the standard library is such that one only gets one chance to get things right” applies if we simply don’t include those features at all in an initial implementation, and only add them later, which is the exact same rationale presented for write support as a whole.
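To illustrate what a write API with no formatting options, custom-type hooks or validation knobs might look like, here is a deliberately small sketch. It is not tomli_w’s implementation or API: `dumps_minimal` and its helpers are hypothetical names, and the sketch only handles str, int, float, bool, lists of those, and nested dicts (no datetimes, no arrays of tables), raising TypeError for anything else.

```python
import json
import re

_BARE_KEY = re.compile(r"[A-Za-z0-9_-]+")


def _string(text: str) -> str:
    # json.dumps escapes match TOML basic-string escapes once ensure_ascii is
    # off (no surrogate-pair escapes); DEL must additionally be escaped for TOML.
    return json.dumps(text, ensure_ascii=False).replace("\x7f", "\\u007f")


def _key(key: str) -> str:
    # Bare keys may contain only ASCII letters, digits, "_" and "-"; quote the rest.
    return key if _BARE_KEY.fullmatch(key) else _string(key)


def _value(value) -> str:
    if isinstance(value, bool):  # must precede int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        return _string(value)
    if isinstance(value, list):
        return "[" + ", ".join(_value(item) for item in value) + "]"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def dumps_minimal(data: dict, _path: tuple = ()) -> str:
    """Serialize a nested dict to TOML text with one fixed output style."""
    lines = []
    if _path:
        lines.append("[" + ".".join(_key(part) for part in _path) + "]")
    tables = []
    for key, value in data.items():
        if isinstance(value, dict):
            tables.append((key, value))  # emitted after this table's plain values
        else:
            lines.append(f"{_key(key)} = {_value(value)}")
    text = "\n".join(lines) + "\n" if lines else ""
    for key, value in tables:
        text += ("\n" if text else "") + dumps_minimal(value, _path + (key,))
    return text


print(dumps_minimal({"title": "demo", "debug": True, "ports": [80, 443],
                     "owner": {"name": "Tom"}}))
```

The point of the sketch is only that a single-style writer has very little API surface; options for formatting or custom types would be later, additive keyword arguments.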

Currently no CPython core developers have expressed willingness to maintain a write API or sponsor a PEP that includes a write API.

Perhaps the blockers here could be further explained? Tests aside, tomli_w is only 167 sloc in a single module (at least a few dozen of which appear similar or identical to code in tomli, and could be de-duplicated), vs 658 sloc split over multiple modules for tomli. Is maintaining around 20% more lines of simple, modern, well-tested code really a hard blocker to capturing perhaps as much as 50% more use cases, as well as the other points mentioned above? Or are there other reasons why maintenance of even a minimal, existing write implementation such as tomli_w’s is a burden? Is there anything others can do to help?

steve.dower · January 11, 2022, 9:48pm

Thanks, that’s a helpful summary. It doesn’t look as bad as I feared, though of course the name shadowing is a major concern during a transition period. (I’m nowhere near as pessimistic as Bernát, though.)

If we manage to agree on renaming the existing toml before this ships, there’s no reason we couldn’t warn and recommend updating names to the new version. Python’s 3.x version is also pinned, and so users who change runtime version may also need to update other pinned versions. We don’t have to treat the choice of runtime version as more flexible than the packages running on it.