While everyone can have reservations about TOML as a format (I find it utterly useless and misguided myself), if a TOML reader is needed in the stdlib for packaging sanity, and since the tomli
author seems to agree with putting it in the stdlib, then why isn’t it happening already? Surely practicality can beat purity here and spare us lengthy discussions about writers, style preservation and whatnot.
Because adding something to the stdlib that is less than a year old and exists outside the stdlib by a non-core dev is always a big discussion, especially when there’s a pre-existing toml
package which I suspect people would want to use as the name in the stdlib.
This part is a bit of a problem, but I don’t think it is a deal-breaker. The public package can evolve as tomli, and the vendoring into the stdlib can transform it into toml, similar to how importlib_metadata is the 3rd party API but importlib.metadata is the stdlib API. A bigger problem would be backwards compatibility: unless the tomli package’s API matches the toml API, shipping a new module might break some applications when the import resolves from the standard library rather than the 3rd party package.
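For readers unfamiliar with the importlib_metadata / importlib.metadata split, the usual consumer-side pattern looks roughly like the sketch below. It assumes Python 3.8+ for the stdlib module and that the PyPI backport is installed on older interpreters; it illustrates the naming precedent only, not how a TOML module would actually be shipped.

```python
# Prefer the stdlib module; fall back to the PyPI backport, which keeps
# its own distinct top-level name (importlib_metadata).
try:
    from importlib import metadata
except ImportError:  # older Pythons without importlib.metadata
    import importlib_metadata as metadata

# Both names expose the same kind of API, e.g. metadata.version("pip").
has_version_api = callable(metadata.version)
```

Because the two import names differ, neither can shadow the other, which is the property a stdlib TOML module under a new name would share.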
I think the only contentious point here compared to importlib.metadata is that that package is maintained by a core developer, whereas this one is not. That could be solved by accepting the maintainer (@hukkinj1) as a core developer, which would make sense as he would maintain part of the standard library. Alternatively, a solution could be to convince an existing core developer to become a co-maintainer for the tomli package, if @hukkinj1 agrees.
A bigger problem would be backwards compatibility: unless the tomli package’s API matches the toml API, shipping a new module might break some applications when the import resolves from the standard library rather than the 3rd party package.
Yep, this is a problem. The APIs are very similar, but there are a few differences where I’m unfortunately not at all interested in matching the toml API, and I do think it would be a mistake to add the toml API to the standard library.
I’ll try to list the key differences and the reasons why the toml API is not always great.
- toml.load takes as input one of the following types: a text file object, pathlib.Path, a list of pathlib.Paths, or a string (representing a filepath). In contrast, tomli.load only takes binary file objects as input.
  Accepting the various data types that toml does is a problem because:
  - it is unlike the behavior of any other load function in the standard library
  - accepting many types makes for code that is hard to read. My first thought when I see toml.load("path_to/conf_file.toml") is always “that must be a TypeError, one should open the file first”
  - accepting list[pathlib.Path] is just needless IMO, and whatever problem it solves should be trivial to solve by the consumer of the library
  - accepting a text file object (instead of a binary file object) is the easiest footgun ever, because correctly parsing TOML requires setting arguments as follows: open(path, encoding="utf8", newline=""). Omitting one of these two arguments or using other values runs the risk of incorrect parse results. TOML, specifying file encoding and valid newline sequences among other things, is simply a lot stricter format than what a text file object represents.
- toml.load and toml.loads accept a _dict keyword argument for parsing TOML tables to mapping types other than dict. In contrast, tomli has no such keyword argument.
  It’s not exactly clear what the value of using a type other than dict here would be, but this sure seems like an easy way to introduce bugs, and also to load objects that raise TypeError when dumped.
- toml.load and toml.loads accept a decoder keyword argument for customizing decoding. The decoder must implement the toml.TomlDecoder interface. tomli doesn’t have any of this.
  It seems this is mostly useful for comment preservation, which I don’t want a poor implementation of. Also, the toml.TomlDecoder interface / base class with its 9 public methods seems a bit messy, not something I’d want to recreate or support.
- toml uses and exposes custom toml.tz.TomlTz timezone objects. In contrast, tomli uses datetime.timezones from the standard library.
- toml raises TomlDecodeErrors while tomli raises TOMLDecodeErrors. The casing that toml uses conflicts with PEP8 and standard library conventions.
- toml includes the whole encode/dump API while tomli does not. This is probably the most breaking difference out of all of these.
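To make the text-file footgun from the first bullet concrete, here is a minimal sketch that uses only the standard library (no TOML parser is involved). It assumes, as I read the TOML specification, that a bare carriage return is not a valid newline; the sample bytes and the helper name read_text are made up for illustration.

```python
import os
import tempfile


def read_text(path, **open_kwargs):
    """Read a file in text mode as UTF-8 with the given open() options."""
    with open(path, encoding="utf-8", **open_kwargs) as f:
        return f.read()


# A bare "\r" after the first line (assumed invalid TOML), then a CRLF.
raw = b"a = 1\rb = 2\r\n"

with tempfile.NamedTemporaryFile(delete=False, suffix=".toml") as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # Default text mode: universal newlines silently turn "\r" and
    # "\r\n" into "\n", so a parser would never see the bare "\r".
    translated = read_text(path)
    # newline="" disables the translation and keeps the data intact.
    untouched = read_text(path, newline="")
    # Binary mode hands the parser the exact bytes on disk.
    with open(path, "rb") as f:
        binary = f.read()
finally:
    os.unlink(path)
```

With the default text mode the suspicious line ending is gone before any parser runs, which is why an API that only accepts binary file objects can be stricter about what it accepts.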
So yeah, there are actually quite a few differences considering how small the APIs are. I’m not sure I even have everything listed here.
In conclusion, if it’s required to match toml
API perfectly, then I think I prefer to not add tomli
to the standard library.
I think the only contentious point here compared to importlib.metadata is that that package is maintained by a core developer, whereas this one is not. That could be solved by accepting the maintainer (@hukkinj1) as a core developer, which would make sense as he would maintain part of the standard library. Alternatively, a solution could be to convince an existing core developer to become a co-maintainer for the tomli package, if @hukkinj1 agrees.
I don’t really have a problem with either (or both) of these approaches.
I mostly agree with the design choices.
Based on this, the paths ahead are:
1. Adopt the toml library, as it guarantees backwards compatibility.
2. Adopt the tomli library as is and use the tomli namespace.
3. Adopt the tomli library under the toml id and put it under some namespace (similar to how importlib.metadata is under importlib). There isn’t any good place that I see at the moment, though.
So I’m personally in favour of 2. Either way, this would likely require a PEP. @hukkinj1, I’m open to writing that up if you co-sign it. Probably the least controversial path would be to find a core developer who is willing to co-maintain the library and also sign off on it. Perhaps @pf_moore might be willing to help out here.
We cannot do (3). It would break all software that uses the toml
package.
We can always use a different name for a toml parser in the stdlib module, e.g. tomllib
(like pathlib
or contextlib
) or tomlparser
(like configparser
). This would prevent any conflict with upstream projects.
Just to clarify, I was under the impression that the proposal of adopting ‘tomli’ also included ‘tomli_w’ to write files. Is that the case?
(In the past I heard people in the packaging community talking about backends writing TOML files, e.g. for core metadata or refined versions of pyproject.toml with dynamic fields resolved. I believe that including a writer brings substantial benefit.)
Aside: I’d guess the motivation was to allow preserving order (by using collections.OrderedDict
), before regular dicts preserved order (implemented in 3.6, documented as part of the language for 3.7). The json
module has a similar parameter. I agree that tomli shouldn’t need this.
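The similar json parameter is object_pairs_hook. A small sketch of why such a hook matters less now that plain dicts keep insertion order (the sample document is made up for illustration):

```python
import json
from collections import OrderedDict

doc = '{"b": 1, "a": 2}'

# Plain dicts preserve insertion order on Python 3.7+, so the default
# result already reflects the key order of the document.
plain = json.loads(doc)

# The hook that used to be needed to get the same ordering guarantee.
ordered = json.loads(doc, object_pairs_hook=OrderedDict)

plain_keys = list(plain)
ordered_keys = list(ordered)
```

Both results list their keys in document order; the only remaining difference is the mapping type.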
I think this is still an open question. I’m inclined to leave writing out at first, because it involves a whole lot more design choices, and thus more discussion. Writing TOML files isn’t (currently) fundamental to packaging like reading them is, so it’s easier to use a TOML writer library as a regular dependency installed from PyPI.
I might avoid tomlparser
as a name even so, because I imagine we might want to add the write part later on. But configparser
can write INI files, so it wouldn’t be a big problem if tomlparser
also wrote TOML.
It wouldn’t be.
I don’t think it would break all toml
users, but it is a reason to not go with that name in the stdlib.
Not necessarily.
At this point I think a PEP is necessary. I can sponsor it, but I don’t have the bandwidth to be an author on it. I would review this thread and the issue on bugs.python.org for potential concerns to address in the PEP. And to be clear, this would be a standards-track PEP and not a packaging PEP, so that means it will be going to python-dev and the SC.
FWIW, I think it would make sense to make changes to toml
on PyPI (potentially even some backwards incompatible ones), to get it to a point where including it in the standard library would be sensible (getting it to parse + dump TOML 1.0.0, have it better match the json
API and so on). It could certainly be disruptive and backwards incompatible, but the work for that can be undertaken before considering adding it to 3.11 or 3.12.
I’ve not had the time to do this and it would effectively be the same idea as @bernatgabor’s case (3) with all the disruptive work happening outside of the Python standard library; and prior to considering addition into the standard library.
I think the end state here is a much better one though: There’s a single toml
package (likely based off of tomli’s current implementation) that evolves to a stage where including it in the Python standard library is a straightforward thing to do.
PS: This is obviously contingent on getting the current author for the toml
package on PyPI on board for doing this.
Hmm firstly, I feel like people are systematically misunderstanding @bernatgabor’s case 3.
I don’t think they intended to break/steal the import toml
namespace but rather use a name something like import parser.toml
(or from parser import toml
if you prefer).
Great, if there was something you disagree with I’d be happy to hear (but perhaps better take it to tomli’s issue tracker).
I’d be happy to help and co-sign!
Would it make sense to name squat tomllib
and tomlparser
just in case we end up wanting to use one of them?
I agree 100%. The case for TOML parsing is pretty easy to justify solely based on the fact that it fixes packaging/bootstrapping circular dependency madness. The case for writing is not nearly as clear.
Yeah this would be great and definitely have the nicest end state. I’m curious, how much would you be willing to break uiri/toml
? For instance, would you remove write capability? If not then we end up having the debate whether writing belongs in the standard library etc…
I’d be willing to support this proposal, but I’m cautious about offering to co-maintain, as I’m likely to be pretty busy over the next few months, so I don’t want to commit to too much. Longer term, maybe.
I’ve looked at tomli
, I agree with its minimalist+strict philosophy, and I believe I’d be able to maintain it.
@hukkinj1, if you want to do the heavy lifting of integrating tomli into the stdlib and maintaining it there (probably along with a backport on PyPI, à la importlib_resources), I can co-maintain (i.e. advise, merge your PRs, and take over in the worst-case scenario of you disappearing).
One thing that worries me is how future versions of TOML will be supported. There’s precedent in e.g. json
and pickle
, but it’ll need to be in the PEP, so everyone can agree on it.
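As an illustration of the pickle precedent mentioned here: pickle numbers its format versions ("protocols"), lets the writer choose one, and has the reader detect the version on its own. This is only a sketch of that precedent, not a proposal for how TOML versions should be handled; the sample data is made up.

```python
import pickle

data = {"name": "tomli", "tags": ["toml", "parser"]}

# The writer picks a format version explicitly; every protocol up to
# HIGHEST_PROTOCOL remains available for writing.
blobs = {
    protocol: pickle.dumps(data, protocol=protocol)
    for protocol in range(pickle.HIGHEST_PROTOCOL + 1)
}

# The reader takes no version argument: loads() detects the protocol.
restored = [pickle.loads(blob) for blob in blobs.values()]
```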
Yes. I’d pick one, and when there’s a PEP draft, ask @dustin
to reserve it. (Please don’t squat by uploading a placeholder.)
That has a major disadvantage: it would break anyone using a pinned version of toml
.
Has there ever been any discussion about adding a module named formats
or something similar to the stdlib, that could be used for these kinds of encoding library promotions? It would make backward compatibility easier since fewer new package names would need to be “taken over” in the future, and allow for simple format-to-module-name conventions. So for example, if a YAML parser were added later, it could simply live in formats.yaml
, next to formats.toml
. I guess encodings
is the most similar name that could be reused.
I guess the downsides would be “Flat is better than nested”, although “Readability counts”, and it would make it easier for a new user to check for a list of available serialisation formats. And there are a gazillion other parsers for other formats (.json, .zip, .eml, .csv, .html, .cfg…), would helpers for them be added as well?
Guess I’ve answered my own question, but I just wanted to mention the idea if it is of any help to anyone.
Unfortunately, formats
is already taken:
It’s unclear whether this package is actively maintained or has ever been widely used.
-Fred
Great. Yeah I can do stdlib integration, maintenance, and backport. I probably won’t start until we’ve drafted a PEP.
There’s some work already done in this Tomli issue. I’ve tagged you there!
Just for clarity, to ensure everyone is aware, uiri/toml
has been completely unmaintained for well over a year now, but it seems that @pradyunsg may be able to get the name transferred. As a side note, @hukkinj1, if this will be an officially blessed and supported project, it might be a good idea to learn from that experience, minimize the bus factor, and follow responsible practices by ensuring it lives in a GitHub/GitLab/etc. org and has multiple maintainers on PyPI, to avoid a single point of failure, which was indeed the case with uiri/toml
. Though, that might only be relevant for the backport/upstream, if there is no longer an independent, maintained hukkin/tomli
repo and tomli
PyPI project at all (not sure what your plans are there).
Personally, I don’t think this is relevant, because I don’t see how we can avoid breaking at least some people if we start having a stdlib module and a 3rd party library under the same name.
If people don’t specify their dependencies appropriately then you’re right that some people will break. But if people use python_version
markers and such appropriately it works fine (or just choose to always use the PyPI version thanks to the stdlib being later on sys.path
).
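As a rough sketch of what a python_version-gated dependency means on the import side: the version cutoff and the stdlib module name below are assumptions for illustration only (tomllib is just one of the names floated in this thread), and toml_module_name is a hypothetical helper.

```python
import sys

# Hypothetical cutoff: the first Python version that ships the parser.
# The matching requirement line would be something like:
#     tomli; python_version < "3.11"
STDLIB_SINCE = (3, 11)


def toml_module_name(version_info=sys.version_info,
                     stdlib_name="tomllib",
                     backport_name="tomli"):
    """Return the module name to import for the given Python version."""
    if tuple(version_info[:2]) >= STDLIB_SINCE:
        return stdlib_name
    return backport_name


old = toml_module_name((3, 10, 0))
new = toml_module_name((3, 11, 0))
```

A project would then pass the returned name to importlib.import_module, so the backport is only needed (and only installed) on interpreters older than the cutoff.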