I tend to be in the anarchistic laissez faire camp about this, but I don’t see why there must be something official or semi-official. The big benefit I see to including a TOML parser in the standard library is that there’s a standard file that packaging libraries need to read in many contexts, and bootstrapping packaging stuff is difficult, so it’s helpful to minimize dependencies. If we adopt a TOML reader for those purposes, also developing a new, full-featured TOML library just so that there’s an “official recommendation” seems unnecessary. Just let the best library win.
I think the main issue here is that there’s no best library; every one of the full-featured solutions kind of sucks.
Maybe bootstrapping packaging stuff is not as difficult as it used to be.
Adopting a toml library means pypa maintains one.
Putting it in the standard library is something else.
IMO, it’d be very hard to make a full-featured solution that won’t suck for anyone.
- I don’t need to write TOML at all.
- Just write a TOML file so other tools can read it; I don’t care about formatting.
- Also, numeric values under the `download_count` key should use thousands separators by default, but only if they’re greater than 10_000.
A library that would solve the last step would be so complex that it would no longer be a good fit for users at the beginning of the journey.
I think that rather than making sure the stdlib version can be extended, we need to encourage external libraries for formatting TOML. Just like `attrs` exists to fill the gaps in
To clarify, I wasn’t referring to the implementation, but the projects. All the projects mentioned here except tomli are either abandonware or well on its way to become one. The only other one with hope is atoml, but from what I’ve read (sorry if I misunderstood the situation), the project needs maintenance help.
pytomlpp is maintained and was mentioned above, but does rely on C++17 in the underlying C++ implementation.
Note that the discussion regarding preservation of comments has come up there as well. The problem is that the underlying `std::map` is not order-preserving (unlike python’s `dict`), and would need some pretty fancy C++ to do that while also enabling heterogeneous lookup. It’s somewhere on a long list of things I’d like to get to at some point (independently of what the stdlib does; I don’t expect cpython to adopt something as recent as C++17 anytime soon).
tomlkit is not abandoned in any way, but between Poetry, Pendulum (which is being reworked) and other projects I am spread pretty thin. Tomlkit was born from a need for Poetry to have a style-preserving parser/formatter, and I have updated it for each TOML spec bump. So it’s currently compliant with the latest version of the spec.
And, honestly, saying it sucks while `pipenv` jumped on it when it was released is ironic at best, cynical at worst.
As far as I know there is no other style-preserving parser out there, in any language, so I had to take it upon myself to build my own from the ground up. And apart from @frostming – Sorry for not getting back to you about the proposition to co-maintain tomlkit, hit me up if you still want to talk about it – no one here who complained about it has stepped up to help with its development.
Is tomlkit perfect? No. Does it have bugs? Yes. Does it fit in the standard library? No, and in my opinion it does not make sense to integrate a full-featured library into the stdlib.
Considering we’ve managed to standardize the toml without having a toml parser in the stdlib and the fact that the language tries to remove packages from the stdlib, I’d be ok with not having it in the stdlib.
First off, thanks a lot for your efforts! TBH, there’s such a proliferation of toml-libraries that despite searching for them, I missed tomlkit, even though it apparently does exactly what I’d need (I’ll try it as soon as I can). This proliferation itself is IMO an argument for inclusion in the stdlib, but reasonable people can disagree about that.
Could you explain your thinking on that a bit more? I’d understand if the API still needs to move quickly, but TOML itself looks like it will be a very stable format, so assuming the right abstraction has been found (granted, that might take 2-3 years) - why should general python users not also be enabled to write the format that is becoming more and more ubiquitous, even in the python ecosystem itself?
The argument probably goes that you might need to read toml to install the writing toml library that you prefer. There are a lot of reasons you might want a different writer but the reader only has to produce a correct dict from a toml file. sdispater’s library is probably not ready for a permanent feature freeze required by the standard library.
I have very often worked in constrained environments where only the standard library is available and relying on additional modules is either impossible (e.g. no Internet) or requires vendoring (the installation folder is locked and pip cannot get through the proxy). Possibly this is a niche situation, but I know a lot of companies where such a policy applies. For me, “batteries included” has been a deciding factor in favor of Python compared to other languages/frameworks for developing tools.
For all the tools I developed in these environments, I came to the same trade-offs when selecting how I would store my application’s configuration.
- Conf/INI is human-friendly, and the implementation in Python makes it really no-surprise (the concept for the syntax of conf/ini is widely known and the parser available in Python supports a large variety of separators, quotation, …). However, it falls short when the necessary structure grows in complexity.
- JSON has much better support for structured data, but it doesn’t count as a human-editable format. So it can be used for persistence, where data compatibility makes a binary format less suited (cross-version or cross-tool compatibility, or when you need developers to edit the configuration but users should not).
- Python data, but it’s not easily writable and you may not wish (inexperienced) users to mess with the internals of your program (and for example defeat a nice error catcher you wrote expressly so that users don’t poke you everyday because they got a cryptic error)
- XML, the format is both human-readable and provides complex structuring, with comments for self-documentation. However, there will also be a trade-off as the format is very verbose and extremely strict. So the more complex your structure, the less human-readable and human-editable it will be.
I understand that style-preserving de/serialization can be out of scope for the standard library, given the desire to keep performance high and size and maintenance effort low. That said, compared to the enormous footprint of the XML/eTree support library despite alternatives like lxml, even tomlkit looks thin. Providing style-preserving modification for TOML would bring to Python a new format that could compete against what is currently the only solution, XML, which is slowly showing its age and limits.
IMHO, TOML has its place in the stdlib, reading obviously, writing surely, style-preserving modification… probably. I mean… We have an SMTP server in the stdlib, don’t we?
I’d like to link this article to the discussion:
The points made in that article are interesting but ultimately not relevant. Packaging has chosen TOML as a format, and it’s frankly too late to change that easily even if we wanted to. The question for this thread is what TOML parser to use, not whether to use TOML in the first place.
Hi, everyone! I would like to describe a use case where writing TOML files is required and formatting is not an issue. It can serve as an example how including the most basic (i.e. without guarantees on formatting) writing capabilities can be useful.
Namely, I used TOML files as inputs to my scripts when I was working on the following research project: GitHub - yandex-research/rtdl: The `rtdl` library + The official implementation of the paper "Revisiting Deep Learning Models for Tabular Data". There are >1000 TOML files across the repository, and the vast majority of them were generated automatically. The pattern I see here is “generating inputs for other programs”. The formatting was not important to me at all.
Additionally, the “writing without guarantees on formatting” approach is conceptually simple and means the same to everyone: “the content is preserved, the formatting is up to maintainers of the TOML-writer” (honestly, I did not like at all the style of the TOML-writer I used, but it did its job just fine).
Things are getting quite hairy lately on that front:
Do I understand correctly that having a TOML parser with reading capabilities helps address the packaging bootstrapping issues? And if so, is there a chance to reach a consensus on having read-only TOML parsing in the stdlib?
The discussion in this thread has several branches, but scanning quickly, I recall that:
- @sdispater has stated he doesn’t want tomlkit in the stdlib, so that rules tomlkit out.
- @brettcannon has asked whether a “(probably massive) discussion about the future of the stdlib” should be had before making this decision, but I perceive that TOML parsing is more urgent.
- @bernatgabor is inclined not to include any TOML parser in the stdlib, but it looks like it would indeed be useful.
- Several people have stated that they’d rather this not turn into bikeshedding over the pros/cons of TOML with respect to other formats.
- While writing capabilities seem desirable for some use cases, it is not clear that these use cases warrant adding such complexity to the stdlib.
- @hukkinj1 has stated that tomli is “spec 1.0.0 compliant and has 100% test coverage”
Thoughts on adopting tomli for the stdlib?
My thoughts are here.
My understanding is however that this discussion is about PyPA recommending a parser so stdlib inclusion may be offtopic?
Note that there’s already an ongoing discussion about stdlib inclusion of a TOML parser here Issue 40059: Provide a toml module in the standard library - Python tracker
And a general discussion about stdlib inclusions/removals Standardizing how to handle adding/removing modules from the stdlib · Issue #92 · python/steering-council · GitHub
While everyone can have reservations about TOML as a format (I find it utterly useless and misguided myself), if a TOML reader is needed in the stdlib for packaging sanity, and since the `tomli` author seems to agree with putting it in the stdlib, then why isn’t it happening already? Surely practicality can beat purity here and spare us lengthy discussions about writers, style preservation and whatnot.
Because adding something to the stdlib that is less than a year old and exists outside the stdlib by a non-core dev is always a big discussion, especially when there’s a pre-existing `toml` package which I suspect people would want to use as the name in the stdlib.
This part is a bit of a problem, but I don’t think it is a deal-breaker. The public package can evolve as `tomli`, and the vendoring into stdlib can transform it into `toml`. Similar to how `importlib_metadata` is the 3rd party API but `importlib.metadata` is the stdlib API. A bigger problem would be backwards compatibility: unless tomli’s API matches the toml API, shipping a new module might break some applications when the import resolves from the standard library rather than the 3rd party package.
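For readers unfamiliar with the `importlib_metadata` / `importlib.metadata` split, the usual consumer-side pattern looks roughly like the sketch below. It only illustrates that existing precedent; whether a stdlib TOML module would follow the same naming scheme is exactly what is being discussed here.

```python
try:
    # Standard library API (available since Python 3.8).
    from importlib import metadata
except ImportError:
    # Third-party backport from PyPI that offers the same API.
    import importlib_metadata as metadata

# Either way, calling code uses a single name.
print(metadata.__name__)
print(callable(metadata.version))
```

The pattern works because the two modules have different import names, so neither can shadow the other.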
I think the only contentious point here compared to `importlib.metadata` is that that package is maintained by a core developer, whereas this one is not. That could be solved by accepting the maintainer (@hukkinj1) as a core developer, which would make sense as he would maintain part of the standard library. Alternatively, another solution could be to convince an existing core developer to become a co-maintainer of the `tomli` package, if @hukkinj1 agrees.
A bigger problem would be backwards compatibility: unless tomli’s API matches the toml API, shipping a new module might break some applications when the import resolves from the standard library rather than the 3rd party package.
Yep, this is a problem. The APIs are very similar, but there are a few differences where I’m unfortunately not interested at all in matching the `toml` API, and I do think it would be a mistake to add the `toml` API to the standard library.
I’ll try to list the key differences and the reasons why the `toml` API is not always great.
`toml.load` takes as input one of the following types: a text file object, `pathlib.Path`, a list of `pathlib.Path`s, or a string (representing a filepath). `tomli.load` only takes binary file objects as input.
Accepting the various data types that `toml` does is a problem because:
- it is unlike the behavior of any other `load` function in the standard library
- accepting many types makes for code that is hard to read. My first thought when I see `toml.load("path_to/conf_file.toml")` is always “that must be a TypeError, one should `open` the file first”
- `list[pathlib.Path]` is just needless IMO, and whatever problem it solves should be trivial to solve by the consumer of the library
- accepting a text file object (instead of a binary file object) is the easiest footgun ever, because correctly parsing TOML requires setting arguments as follows: `open(path, encoding="utf8", newline="")`. Omitting one of these two arguments or using other values runs the risk of incorrect parse results. TOML, specifying file encoding and valid newline sequences among other things, is simply a much stricter format than what a text file object represents.
`toml` accepts a `_dict` keyword argument for parsing TOML tables to mapping types other than `dict`. In contrast, `tomli` has no such keyword argument.
It’s not exactly clear what the value of using a type other than `dict` here would be, but this sure seems like an easy way to introduce bugs. And also
loadobjects that raise
`toml` accepts a `decoder` keyword argument for customizing decoding. The decoder must implement the `toml.TomlDecoder` interface. `tomli` doesn’t have any of this.
It seems this is mostly useful for comment preservation, which I don’t want a poor implementation of. Also, the `toml.TomlDecoder` interface / base class with its 9 public methods seems a bit messy, not something I’d want to recreate or support.
`toml` uses and exposes custom `toml.tz.TomlTz` timezone objects. In contrast, `tomli` uses `datetime.timezone`s from the standard library.
`tomli` raises `TOMLDecodeError`s. The casing that `toml` uses conflicts with PEP8 and standard library conventions.
`toml` includes the whole encode/dump API while `tomli` does not. This is probably the most breaking difference out of all of these.
So yeah, there are actually quite a few differences considering how small the APIs are. I’m not sure I even have everything listed here.
In conclusion, if it’s required to match the `toml` API perfectly, then I think I prefer not to add `tomli` to the standard library.
I think the only contentious point here compared to `importlib.metadata` is that that package is maintained by a core developer, whereas this one is not. That could be solved by accepting the maintainer (@hukkinj1) as a core developer, which would make sense as he would maintain part of the standard library. Alternatively, another solution could be to convince an existing core developer to become a co-maintainer of the `tomli` package, if @hukkinj1 agrees.
I don’t really have a problem with either (or both) of these approaches.