Maybe bootstrapping packaging stuff is not as difficult as it used to be.
Adopting a toml library means pypa maintains one.
Putting it in the standard library is something else.
IMO, it'd be very hard to make a full-featured solution that won't suck for anyone.
This journey?
download_count key should use thousands separators by default, but only if they're greater than 10_000.
A library that would solve the last step would be so complex that it would no longer be a good fit for users at the beginning of the journey.
I think that rather than making sure the stdlib version can be extended, we need to encourage external libraries for formatting TOML. Just like attrs exists to fill the gaps in dataclasses.
To clarify, I wasn't referring to the implementation, but the projects. All the projects mentioned here except tomli are either abandonware or well on their way to becoming so. The only other one with hope is atoml, but from what I've read (sorry if I misunderstood the situation), the project needs maintenance help.
pytomlpp is maintained and was mentioned above, but does rely on C++17 in the underlying C++ implementation.
Note that the discussion regarding preservation of comments has come up there as well. Problem is that the underlying std::map is not order-preserving (unlike Python's dict), and would need some pretty fancy C++ to do that while also enabling heterogeneous lookup. It's somewhere on a long list of things I'd like to get to at some point (independently of what the stdlib does; I don't expect CPython to adopt something as recent as C++17 anytime soon).
tomlkit is not abandoned in any way, but between Poetry, Pendulum (which is being reworked) and other projects I am spread pretty thin. Tomlkit was born from a need for Poetry to have a style-preserving parser/formatter, and I updated it for each TOML spec bump. So it's currently compliant with the latest version of the spec.
And, honestly, saying it sucks while pipenv jumped on it when it was released is ironic at best, cynical at worst.
As far as I know there is no other style-preserving parser out there, in any language, so I had to take it upon myself to build my own from the ground up. And apart from @frostming (sorry for not getting back to you about the proposition to co-maintain tomlkit, hit me up if you still want to talk about it), no one here who complained about it has stepped up to help with its development.
Is tomlkit perfect? No. Does it have bugs? Yes. Does it fit in the standard library? No, and in my opinion it does not make sense to integrate a full-featured library into the stdlib.
Considering we've managed to standardize on TOML without having a TOML parser in the stdlib, and the fact that the language is trying to remove packages from the stdlib, I'd be OK with not having it in the stdlib.
First off, thanks a lot for your efforts! TBH, there's such a proliferation of TOML libraries that despite searching for them, I missed tomlkit, even though it apparently does exactly what I'd need (I'll try it as soon as I can). This proliferation itself is IMO an argument for inclusion in the stdlib, but reasonable people can disagree about that.
Could you explain your thinking on that a bit more? I'd understand if the API still needs to move quickly, but TOML itself looks like it will be a very stable format, so assuming the right abstraction has been found (granted, that might take 2-3 years), why should general Python users not also be enabled to write the format that is becoming more and more ubiquitous, even in the Python ecosystem itself?
The argument probably goes that you might need to read TOML to install the TOML-writing library that you prefer. There are a lot of reasons you might want a different writer, but the reader only has to produce a correct dict from a TOML file. sdispater's library is probably not ready for the permanent feature freeze required by the standard library.
I have very often worked in constrained environments where only the standard library is available and relying on additional modules is either impossible (e.g. no Internet) or requires vendoring (the installation folder is locked and pip cannot pass through the proxy). Possibly this is a niche situation, but I know a lot of companies where such policies apply. For me, "batteries included" has been a deciding factor in favor of Python compared to other languages/frameworks for developing tools.
For all the tools I developed in these environments, I came to the same trade-offs when selecting how I would store my application's configuration.
I understand that style-preserving de/serialization can be out of scope for the standard library, given the desire to keep performance high and size and maintenance effort low. That said, compared to the enormous footprint of the XML/etree support library despite alternatives like lxml, even tomlkit looks thin. Providing style-preserving modification for TOML would bring to Python a new format that could compete against what is currently the only solution, XML, which is slowly showing its age and limits.
IMHO, TOML has its place in the stdlib: reading obviously, writing surely, style-preserving modification... probably. I mean... We have an SMTP server in the stdlib, don't we?
I'd like to link this article to the discussion:
The points made in that article are interesting but ultimately not relevant. Packaging has chosen TOML as a format, and it's frankly too late to change that easily even if we wanted to. The question for this thread is what TOML parser to use, not whether to use TOML in the first place.
Hi, everyone! I would like to describe a use case where writing TOML files is required and formatting is not an issue. It can serve as an example of how including the most basic writing capabilities (i.e. without guarantees on formatting) can be useful.
Namely, I used TOML files as inputs to my scripts when I was working on the following research project: GitHub - yandex-research/rtdl: The `rtdl` library + The official implementation of the paper "Revisiting Deep Learning Models for Tabular Data"
There are more than 1000 TOML files across the repository, and the vast majority of them were generated automatically. The pattern I see here is "generating inputs for other programs". The formatting was not important to me at all.
Additionally, the "writing without guarantees on formatting" approach is conceptually simple and means the same to everyone: "the content is preserved, the formatting is up to the maintainers of the TOML writer" (honestly, I did not like the style of the TOML writer I used at all, but it did its job just fine).
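To make the "content is preserved, formatting is up to the writer" idea concrete, here is a minimal sketch of such a writer. The function name dumps_simple and its helpers are hypothetical, not the API of any existing library, and the sketch assumes the input only holds strings, booleans, integers, floats, lists of those, and at most one level of nested tables.

```python
import json
import re

# Bare TOML keys may contain only ASCII letters, digits, "_" and "-".
_BARE_KEY = re.compile(r"[A-Za-z0-9_-]+")


def _format_key(key: str) -> str:
    # Anything that is not a valid bare key is written as a quoted key.
    return key if _BARE_KEY.fullmatch(key) else json.dumps(key, ensure_ascii=False)


def _format_value(value) -> str:
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):
        return repr(value)
    if isinstance(value, str):
        # Assumption: JSON escaping is close enough to TOML basic strings here.
        return json.dumps(value, ensure_ascii=False)
    if isinstance(value, list):
        return "[" + ", ".join(_format_value(item) for item in value) + "]"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def dumps_simple(config: dict) -> str:
    """Serialize top-level values first, then one level of tables."""
    lines = []
    tables = {}
    for key, value in config.items():
        if isinstance(value, dict):
            tables[key] = value  # emitted after all top-level key/value pairs
        else:
            lines.append(f"{_format_key(key)} = {_format_value(value)}")
    for name, table in tables.items():
        lines.append("")
        lines.append(f"[{_format_key(name)}]")
        for key, value in table.items():
            lines.append(f"{_format_key(key)} = {_format_value(value)}")
    return "\n".join(lines) + "\n"


generated = dumps_simple({"seed": 0, "model": {"name": "mlp", "lr": 0.001}})
```

The output layout (blank line before each table, no alignment, no comments) is entirely the writer's choice, which is the point: a consumer that only parses the file never sees it.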
Things are getting quite hairy lately on that front:
Do I understand correctly that having a TOML parser with reading capabilities helps address the packaging bootstrapping issues? And if so, is there a chance to reach a consensus on having read-only TOML parsing in the stdlib?
The discussion in this thread has several branches, but scanning quickly, I recall that:
Thoughts on adopting tomli for the stdlib?
My thoughts are here.
My understanding, however, is that this discussion is about PyPA recommending a parser, so stdlib inclusion may be off-topic?
Note that there's already an ongoing discussion about stdlib inclusion of a TOML parser here: Issue 40059: Provide a toml module in the standard library - Python tracker
And a general discussion about stdlib inclusions/removals: Standardizing how to handle adding/removing modules from the stdlib · Issue #92 · python/steering-council · GitHub
While everyone can have reservations about TOML as a format (I find it utterly useless and misguided myself), if a TOML reader is needed in the stdlib for packaging sanity, and since the tomli author seems to agree with putting it in the stdlib, then why isn't it happening already? Surely practicality can beat purity here and spare us lengthy discussions about writers, style preservation and whatnot.
Because adding something to the stdlib that is less than a year old and is maintained outside the stdlib by a non-core dev is always a big discussion, especially when there's a pre-existing toml package whose name I suspect people would want to use in the stdlib.
This part is a bit of a problem, but I don't think it is a deal-breaker. The public package can evolve as tomli, and the vendoring into the stdlib can transform it into toml. Similar to how importlib_metadata is the 3rd-party API but importlib.metadata is the stdlib API. A bigger problem would be backwards compatibility: unless the tomli package's API matches the toml API, shipping a new module might break some applications when the import resolves from the standard library rather than the 3rd-party package.
I think the only contentious point here compared to importlib.metadata is that that package is maintained by a core developer, whereas this one is not. That could be solved by accepting the maintainer (@hukkinj1) as a core developer, which would make sense as he would maintain part of the standard library. Alternatively, a solution could be to convince an existing core developer to become a co-maintainer of the tomli package, if @hukkinj1 agrees.
A bigger problem would be backwards compatibility: unless the tomli package's API matches the toml API, shipping a new module might break some applications when the import resolves from the standard library rather than the 3rd-party package.
Yep, this is a problem. The APIs are very similar, but there are a few differences where I'm unfortunately not interested at all in matching the toml API, and I do think it would be a mistake to add the toml API to the standard library.
I'll try to list the key differences and the reasons why the toml API is not always great.
toml.load takes as input one of the following types: a text file object, pathlib.Path, a list of pathlib.Paths, or a string (representing a filepath). In contrast, tomli.load only takes binary file objects as input.
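As an illustration of the binary-only input described above, here is a small sketch. It assumes either Python 3.11+, where the standard library module tomllib exposes the same load/loads API as tomli, or an installed recent tomli; io.BytesIO stands in for a file opened with open(path, "rb").

```python
import io

try:
    import tomllib  # standard library since Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # third-party package with the same load/loads API

document = b'[project]\nname = "demo"\n'

# load() expects a binary file object; the parser does the UTF-8 decoding itself.
config = tomllib.load(io.BytesIO(document))

# A text-mode file object is rejected rather than parsed with whatever
# encoding and newline translation it happened to be opened with.
try:
    tomllib.load(io.StringIO(document.decode("utf-8")))
except TypeError as exc:
    text_mode_error = exc
else:
    text_mode_error = None
```

Taking only bytes means the caller cannot accidentally pick the wrong encoding or newline handling, since the parser controls both.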
Accepting the various data types that toml does is a problem because:
- it is not consistent with any other load function in the standard library
- toml.load("path_to/conf_file.toml") always reads as "that must be a TypeError, one should open the file first"
- list[pathlib.Path] is just needless IMO, and whatever problem it solves should be trivial to solve by the consumer of the library
- a text file object only gives correct results if it was opened with open(path, encoding="utf8", newline=""). Omitting one of these two arguments or using other values runs the risk of incorrect parse results. TOML, specifying file encoding and valid newline sequences among other things, is simply a lot stricter format than what a text file object represents.

toml.load and toml.loads accept a _dict keyword argument for parsing TOML tables to mapping types other than dict. In contrast, tomli has no such keyword argument.
It's not exactly clear what the value of using a type other than dict here would be, but this sure seems like an easy way to introduce bugs, and also to load objects that raise TypeError when dumped.
toml.load and toml.loads accept a decoder keyword argument for customizing decoding. The decoder must implement the toml.TomlDecoder interface. tomli doesn't have any of this.
It seems this is mostly useful for comment preservation, which I don't want a poor implementation of. Also, the toml.TomlDecoder interface / base class with its 9 public methods seems a bit messy, not something I'd want to recreate or support.
toml uses and exposes custom toml.tz.TomlTz timezone objects. In contrast, tomli uses datetime.timezone objects from the standard library.
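A quick way to see the standard timezone objects in practice is sketched below. It assumes Python 3.11+ (tomllib, which shares tomli's API) or an installed tomli; the key name released is just an example.

```python
from datetime import timedelta, timezone

try:
    import tomllib  # standard library since Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # third-party package with the same API

# An offset date-time, as defined by the TOML spec.
parsed = tomllib.loads("released = 1979-05-27T07:32:00-08:00\n")
released = parsed["released"]

# The offset is carried by a plain datetime.timezone, so the value works with
# the rest of the datetime module without any library-specific tzinfo class.
as_utc = released.astimezone(timezone.utc)
```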
toml raises TomlDecodeError while tomli raises TOMLDecodeError. The casing that toml uses conflicts with PEP 8 and standard library conventions.
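For reference, handling the tomli-style exception might look like the sketch below. The helper parse_or_none is hypothetical, and the code assumes Python 3.11+ (tomllib) or an installed tomli.

```python
try:
    import tomllib  # standard library since Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # third-party package with the same API


def parse_or_none(text: str):
    """Return the parsed document, or None if the text is not valid TOML."""
    try:
        return tomllib.loads(text)
    except tomllib.TOMLDecodeError:
        # TOMLDecodeError subclasses ValueError, so `except ValueError` also works.
        return None


valid = parse_or_none("answer = 42")
invalid = parse_or_none("answer = ")  # a key with no value is a syntax error
```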
toml includes the whole encode/dump API while tomli does not. This is probably the most breaking difference out of all of these.
So yeah, there are actually quite a few differences considering how small the APIs are. I'm not sure I even have everything listed here.
In conclusion, if it's required to match the toml API perfectly, then I think I'd prefer not to add tomli to the standard library.
I think the only contentious point here compared to importlib.metadata is that that package is maintained by a core developer, whereas this one is not. That could be solved by accepting the maintainer (@hukkinj1) as a core developer, which would make sense as he would maintain part of the standard library. Alternatively, a solution could be to convince an existing core developer to become a co-maintainer of the tomli package, if @hukkinj1 agrees.
I don't really have a problem with either (or both) of these approaches.
I mostly agree with the design choices.
Based on this, paths ahead:
1. Adopt the toml library, as it guarantees backwards compatibility.
2. Adopt the tomli library as is and use the tomli namespace.
3. Adopt the tomli library under the toml id and put it under some namespace (similar to how importlib.metadata is under importlib). There isn't any good place I see at the moment though.

So I'm personally in favour of 2. Either way, this would likely require a PEP. @hukkinj1, I'm open to writing that up if you co-sign it. Probably the least controversial path would be to find a core developer who is willing to co-maintain the library and also sign off on it. Perhaps @pf_moore might be willing to help out here.
We cannot do (3). It would break all software that uses the toml package.
We can always use a different name for a TOML parser module in the stdlib, e.g. tomllib (like pathlib or contextlib) or tomlparser (like configparser). This would prevent any conflict with upstream projects.