PEP 680: "tomllib" Support for parsing TOML in the Standard Library

It’s a great library widely used as an alternative to the setup.py format.

I don’t see any other benefits of having this library in the standard library. Pip is a separate package and has its own lifecycle.

I’m in favor of this, and I’ll resist the temptation to bikeshed on the name :smiley:

One of the problems $we (corporate overlords) have encountered is toolchain issues with moving to a pyproject.toml future. Some tools vendor their TOML library, others make them build requirements with very tricky to support bootstrapping mechanisms. I just want to reiterate that in corporate environments, building from sdists rather than just consuming PyPI wheels is an important use case. In the scientific community, that’s increasingly difficult, but for the majority of packages that need to be internally imported from PyPI, building from sdist is greatly preferred. That means we don’t want to have to have special hacks to bootstrap pyproject.toml support. I think having a TOML library in the stdlib will (eventually, of course) help the packaging ecosystem finally move away from setup.py and friends, and that would be a Good Thing!

2 Likes

I’ll chime in to note that I fancy naming it toml, just as we do with json.

2 Likes

That’s sadly not possible, as we’d most likely break all tools/libraries using the widely popular toml package on PyPI. The name was chosen specifically to avoid this quagmire.

2 Likes

I don’t want to lengthen the discussion on a simple issue. Though I understand, it feels like officially throwing Python’s elegance down the drain for a third-party library. Nobody’s at fault; it’s just the end result.

1 Like
$ find Lib -maxdepth 1 -name '*lib' -or -name '*lib.py' | wc -l
18

just saying…

(Disclaimer: I proposed the name tomllib first).

1 Like

Yes, but you import csv, json … I mean data formats

Edit: I am not aware of any data format modules whose import name is not just the format’s name

The PEP does mention plistlib, even if it’s an outlier.

We have two file format libraries: plistlib and xdrlib. Sun XDR is a binary data format used by NFS and RPCd.

Has anyone tried asking the current toml project whether they’d give us the name? They might be as disappointed with the json/csv/etc. inconsistency as the rest of us :wink:

1 Like

Yes, and the author is currently MIA. @pradyunsg has even put in a request to take over the project, but there are still potential compatibility issues, as the API of toml is not the same as that of tomli.

This is true whenever we add anything to the stdlib. We might be aware of it this time, but it’s always true. It’s just one of the costs of updating the Python version, and in ten years’ time I’d rather everyone be using toml than tomllib. If the existing toml is unresponsive, that’s even more reason to claim the name now, before it becomes typo-bait in a few years’ time.

I haven’t looked at either of the APIs, so I don’t have any idea how they differ, but I could see us starting with already-deprecated shims to help transition known common cases.
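To make the "already-deprecated shim" idea concrete, here is a minimal sketch of a wrapper that keeps an old entry point working while warning callers to move. The names `make_deprecated_shim`, `new_loads` and `old_loads` are hypothetical stand-ins for illustration; nothing here comes from the PEP, toml or tomli.

```python
import functools
import warnings


def make_deprecated_shim(new_func, old_name):
    """Wrap new_func so calls through the old name emit a DeprecationWarning."""

    @functools.wraps(new_func)
    def shim(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; call {new_func.__name__} instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not the shim
        )
        return new_func(*args, **kwargs)

    return shim


def new_loads(text):
    # Stand-in for a real parser; it only reports the input length.
    return {"length": len(text)}


old_loads = make_deprecated_shim(new_loads, "old_loads")

# Record warnings so the effect of calling the shim is visible.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_loads("a = 1")

print(result, [str(w.message) for w in caught])
```

A shim like this only helps where the old and new call signatures can be mapped onto each other, which is exactly what would need checking case by case.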

7 Likes

FYI, the PEP has an appendix discussing all the differences and their compat implications. At the very minimum, write support would need to be added (which I favor), but there would still be a problem for libraries/applications, even those using pinned deps, because the stdlib module would shadow the third-party package and make it impossible to use, aside from hacks.
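The shadowing concern can be illustrated with a small sketch. It assumes the usual layout in which third-party directories come after the standard library on `sys.path`; the temporary directory stands in for site-packages, and `colorsys` is used only as an arbitrary stdlib module name.

```python
import importlib.machinery
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as fake_site_packages:
    # Hypothetical third-party module that reuses a stdlib module's name.
    with open(os.path.join(fake_site_packages, "colorsys.py"), "w") as f:
        f.write("ORIGIN = 'third-party'\n")

    # Third-party directories normally sit after the stdlib on sys.path.
    search_path = sys.path + [fake_site_packages]
    spec = importlib.machinery.PathFinder.find_spec("colorsys", search_path)
    found_in_fake_dir = spec.origin.startswith(fake_site_packages)

print(spec.origin)
print("third-party copy wins:", found_in_fake_dir)
```

Because the search stops at the first match, pinning the third-party package does not help: the installed copy is simply never reached.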

1 Like

I think we should start with a read-only API. We can always add write at a later point in time. The only way we keep the toml name without introducing potential backwards incompatibility is if we have some future flag to toggle whether toml is imported from the stdlib or a third party. The problem is that toml is very popular, and a lot of tools have adopted it. And while we could track down public usages and force them to change, that would not help with all the breakage we don’t see behind enterprise firewalls. Generally, Python 3 remains mostly compatible with previous Python versions; this would be a major change from that. I feel people underestimate the potential impact here, much as the impact of the Python 2 to 3 transition was underestimated initially.

2 Likes

Thanks @hauntsaninja , @hukkinj1 and @encukou !

For reference (might be good to include in the OP, @hauntsaninja ), the PEP was spawned from this previous discussion:

I’ll submit a PR with some formatting and copyediting fixes shortly, and raise any more substantial issues here.

I guess my one real concern is, as a number of others have mentioned, the lack of write support, for several reasons.

  • As the PEP links to, use cases requiring write are at least 1/3 as common as those requiring read, so we’d be excluding a large fraction of the potential use cases that could benefit from this (perhaps close to 50% more use cases than those that just need read support, assuming most use cases that write TOML also read it).

  • This also isn’t consistent with any (AFAIK) of the other similar formats (JSON, CSV, INI, Pickle, XML, etc.) in the standard library that all support both read and write, particularly the idiomatic load/loads and dump/dumps API common in both the standard library and third party packages.

  • The fact that the stdlib has read but not write support (unlike other formats, as above) is likely to confuse and frustrate users, and potentially push them toward other legacy or less-appropriate formats. Combined with its inclusion in the stdlib and accelerating use in other contexts leading to widespread adoption, it is also likely to generate a substantial volume of questions, complaints, bug tickets and support requests.

  • This, in turn, prevents users restricted to the stdlib for various reasons from taking advantage of it, and requires others to find, decide on, install and depend upon a third-party package to do so. Furthermore, given that the most popular current option is (and many of the others are) unmaintained, buggy and lacking support for the current stable standard (which tomli supports), there is a serious risk of users either having to spend substantial effort finding a “good” choice, or picking a “bad” option, and either splitting their reads and writes over two different libraries or forgoing the stdlib altogether.

  • Also, as the PEP mentions, it would make moving to the toml name substantially more viable (though still probably not recommended, at least until usage of toml dies out and tomllib can be relied upon, due to the backward compat breakage).
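For readers less familiar with the load/loads and dump/dumps convention mentioned in the list above, this is what it looks like in the stdlib json module. The PEP as discussed here proposes only the read half (load/loads) for tomllib.

```python
import io
import json

config = {"name": "demo", "retries": 3}

# String-based pair: dumps() serializes, loads() parses.
text = json.dumps(config)
assert json.loads(text) == config

# File-based pair: dump() writes to a file object, load() reads from one.
buffer = io.StringIO()
json.dump(config, buffer)
buffer.seek(0)
round_tripped = json.load(buffer)
print(round_tripped == config)
```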

Addressing the objections in the PEP:

The ability to write TOML is not needed for the use cases that motivate this PEP: for core Python packaging use cases or for tools that need to read configuration.

Use cases that involve editing TOML (as opposed to writing brand new TOML) are better served by a style preserving library.

Both in my experience using toml for a number of different packages, as well as browsing the first few pages of grep.app hits for toml.load (as linked in the PEP), the great majority of use cases are not for editing TOML, but rather creating it, particularly for outputting a default/initial config file for an application, tool, script, etc. (quite common), and various other use cases that didn’t involve a round-trip from user created/edited TOML. Furthermore, the fact that toml continues to see such large usage despite it not preserving any style and tomlkit and other alternatives doing so is a testament to it not being critical for the great majority of applications.

But even without considering style preservation, there are too many degrees of freedom in how to design a write API. For example, how much control to allow users over output formatting, over serialization of custom types, and over input and output validation. While there are reasonable choices on how to resolve these, the nature of the standard library is such that one only gets one chance to get things right.

Is there a reason why the existing, minimal tomli_w API for this cannot be adopted for the initial implementation, which doesn’t include these features at all and should satisfy most use cases, with support for them added later if there is demand? I’m a little confused about how “the nature of the standard library is such that one only gets one chance to get things right” applies if we simply don’t include those features at all in an initial implementation, and only add them later, which is the exact same rationale presented for write support as a whole.
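To give a feel for what "minimal, with no formatting options" could mean, here is a rough sketch of a writer that handles only str, int and bool values plus nested tables. It is not tomli_w's implementation; `minimal_dumps` and its helpers are hypothetical names, and string escaping is borrowed from json.dumps, which is close to but not identical to TOML's rules for basic strings.

```python
import json
import re

_BARE_KEY = re.compile(r"[A-Za-z0-9_-]+")


def _format_key(key):
    # Bare keys may only use ASCII letters, digits, "_" and "-"; quote the rest.
    return key if _BARE_KEY.fullmatch(key) else json.dumps(key, ensure_ascii=False)


def _format_value(value):
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        # Assumption: JSON escaping is good enough for this illustration.
        return json.dumps(value, ensure_ascii=False)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def _emit(table, path, lines):
    # Write the table header (if any), then scalars, then nested tables.
    if path:
        if lines:
            lines.append("")
        lines.append("[" + ".".join(path) + "]")
    for key, value in table.items():
        if not isinstance(value, dict):
            lines.append(f"{_format_key(key)} = {_format_value(value)}")
    for key, value in table.items():
        if isinstance(value, dict):
            _emit(value, path + [_format_key(key)], lines)


def minimal_dumps(data):
    """Serialize a dict of str/int/bool values and nested dicts to TOML text."""
    lines = []
    _emit(data, [], lines)
    return "\n".join(lines) + "\n"


output = minimal_dumps({"title": "demo", "debug": False, "server": {"port": 8080}})
print(output)
```

Even this small sketch already fixes choices such as double quotes, blank lines between tables and key ordering, which is the kind of decision the PEP text quoted above is worried about.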

Currently no CPython core developers have expressed willingness to maintain a write API or sponsor a PEP that includes a write API.

Perhaps the blockers here could be further explained? Tests aside, tomli_w is only 167 sloc in a single module (at least a few dozen of which appear similar or identical to code in tomli, and could be de-duplicated), vs 658 sloc split over multiple modules for tomli. Is maintaining around 20% more lines of simple, modern, well-tested code really a hard blocker to capturing perhaps as much as 50% more use cases, as well as the other points mentioned above? Or are there other reasons why maintenance of even a minimal, existing write implementation such as tomli_w's is a burden? Is there anything others can do to help?

1 Like

Thanks, that’s a helpful summary. It doesn’t look as bad as I feared, though of course the name shadowing is a major concern during a transition period. (I’m nowhere near as pessimistic as Bernát, though :slight_smile: )

If we manage to agree on renaming the existing toml before this ships, there’s no reason we couldn’t warn and recommend updating names to the new version. Python’s 3.x version is also pinned, and so users who change runtime version may also need to update other pinned versions. We don’t have to treat the choice of runtime version as more flexible than the packages running on it.

3 Likes

Because when it comes to the stdlib, there is no “initial”, only “final”.

Please understand that any API that lands in the stdlib is extremely hard to change (for instance, simply ditching some attributes on modules is going to take me around 7 years to accomplish). Basic stdlib policy is you start small and grow, not make assumptions and hope for the best.

And when it comes to human-readable output (which JSON, CSV, pickle, and arguably XML are not), everyone has an opinion (see every single style guide and formatter as examples of that). So asking for write support is a big ask on the Python core team to support, deal with feature requests, etc. in order to either appease everyone or constantly reject people’s asks. It’s way easier to push that to the community to support, or to ask people to use f-strings for, e.g., their TOML templates. Plus writing TOML is way easier than reading it.
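As a rough illustration of the f-string template approach, a default config file could be produced as below. The variable names are made up for the example, and json.dumps is used only to quote and escape the string value, on the assumption that JSON escaping is close enough to TOML basic strings for simple inputs.

```python
import json

project_name = "demo"
port = 8080

# json.dumps() quotes and escapes the string; JSON string escaping is close
# to, but not identical to, TOML basic-string escaping.
config_text = f"""\
# Default configuration generated at first run.
name = {json.dumps(project_name)}

[server]
port = {port}
"""

print(config_text)
```

The template author controls every detail of the layout, which sidesteps the formatting debate but leaves escaping and validation to the caller.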

4 Likes

Awesome, thanks! :smiley_cat:

The main motivation for this PEP in my mind is fixing the pyproject.toml-based Python bootstrapping nightmare, and also giving tools an easy way to read configuration from that same file.
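A minimal sketch of that read-only use case, assuming an interpreter where tomllib is available (Python 3.11 or later) and falling back to the third-party tomli otherwise. The `[tool.mytool]` table and its `line-length` key are hypothetical.

```python
try:
    import tomllib  # the stdlib module proposed by PEP 680
except ModuleNotFoundError:
    import tomli as tomllib  # third-party package with the same read API

PYPROJECT = """
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[tool.mytool]
line-length = 100
"""

data = tomllib.loads(PYPROJECT)
backend = data["build-system"]["build-backend"]

# A tool reads its own table and falls back to defaults when it is absent.
tool_config = data.get("tool", {}).get("mytool", {})
line_length = tool_config.get("line-length", 88)
print(backend, line_length)
```

When reading from disk, `tomllib.load()` expects a file opened in binary mode, for example `open("pyproject.toml", "rb")`.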

The main motivation is not to maximize use cases, and I don’t think the stdlib should attempt to maximize use cases now that we have PyPI, packaging standards and the Internet. But that’s another discussion.

This is true. But this PEP doesn’t remove the option of adding a write API in the future.

The idea is to mention tomli_w and other third-party TOML writers clearly in the docs.

The PEP tries to explain the degrees of freedom that we must take a stance on. Even simple things like the default output style (indentation width, single or double quotes etc.) should be considered features and changing them is a breaking change in the stdlib IMO.
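To show how much room there is in output style, the two documents below differ in quoting, key layout and array formatting yet parse to the same data. This is only a sketch; it assumes tomllib is available (Python 3.11 or later) or that tomli is installed as a fallback.

```python
try:
    import tomllib
except ModuleNotFoundError:
    import tomli as tomllib  # third-party package with the same read API

# Table header, double-quoted string, single-line array.
style_a = """
[server]
host = "localhost"
ports = [8000, 8001]
"""

# Dotted keys, single-quoted literal string, multi-line array.
style_b = """
server.host = 'localhost'
server.ports = [
    8000,
    8001,
]
"""

same_data = tomllib.loads(style_a) == tomllib.loads(style_b)
print(same_data)
```

A writer has to pick one of these forms as its default, and any later change to that default alters the output of existing programs.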

3 Likes

Thanks CAM! I’ll edit the top level post to include links to previous discussion.

On the name “toml”:

I think we can all agree that the “toml” name is better.

Breakage seems unavoidable, as discussed here
The amount of breakage is discussed here

So it seems to me that the only productive discussion on this subject is determining how much breakage is acceptable, where I believe CPython has a high prior on the answer “zero breakage”. If there isn’t consensus or a ruling from the SC, my view is to stick with “tomllib”.

On adding a write API:

Discussed in the PEP here

A previous version of the PEP had some more detail on this, including listing reasons why a write API would be useful (nothing that CAM doesn’t mention) and a concrete discussion of the degrees of freedom in the design space. This can be found in encukou’s pull request #3 against hauntsaninja/peps on GitHub, “Editing for the tomllib PEP” (see Appendix B + the section discussing write API). If people find it useful, I can restore that discussion.

I think people’s view of this depends on how much they weigh the core motivations of reading pyproject.toml vs that of using TOML more generally. It’s not the worst thing to push users to PyPI when they would be better served by more capable packages but might otherwise stick to the standard library Schelling point out of inertia (I’ve found several uses of “toml” that would likely be served better by “tomlkit”). Petr Viktorin made an interesting point on another thread that this might actually be a good reason to not use the name “toml”: to leave the more desirable name to the community.

As others have said, the current PEP doesn’t preclude the possibility of a write API in the future. The main reason to push for including a write API in this PEP, rather than later, is if we feel that a) it is likely that we want to adopt the “toml” name, and b) including a write API will effectively minimise the breakage of doing so. Since I’m inclined to think using the “toml” name is not an option, I am very happy to have discussion of a write API deferred.

With all that said, I would be curious if any CPython core developers feel strongly enough about this that they’d be willing to take on maintenance of a write API.

1 Like

Not me, certainly. Personally, I think of TOML as very much a human-editable format, whereas I see JSON, XML, CSV and similar as much less so. The fiddly details of formatting output in a write API are much more controversial in a format that’s human editable than in one that’s not. So I don’t think the design constraints for TOML are the same as those other formats. And as a result, I think it’s the right decision to not include a write API, at least in the initial version, and possibly never in the stdlib (because it’s too hard to change the stdlib).

3 Likes