Adopting/recommending a toml parser?

I mostly agree with the design choices.

Based on this, paths ahead:

  1. Adopt the toml library, as that guarantees backwards compatibility.
  2. Adopt the tomli library as is and use the tomli namespace.
  3. Adopt the tomli library under the toml id and put it under some namespace (similar to how importlib.metadata is under importlib). There isn’t any good place I see at the moment though.

So I’m personally in favour of 2. Either way, this would likely require a PEP. @hukkinj1, I’m open to writing that up if you co-sign it. Probably the least controversial path would be to find a core developer who’s willing to co-maintain the library and also sign off on it. Perhaps @pf_moore might be willing to help out here. :blush:

2 Likes

We cannot do (3). It would break all software that uses the toml package.

We can always use a different name for a toml parser in the stdlib module, e.g. tomllib (like pathlib or contextlib) or tomlparser (like configparser). This would prevent any conflict with upstream projects.

3 Likes

Just to clarify, I was under the impression that the proposal of adopting ‘tomli’ also included ‘tomli_w’ to write files. Is that the case?

(In the past I heard people in the packaging community talking about backends writing TOML files, e.g. for core metadata or refined versions of pyproject.toml with dynamic fields resolved. I believe that including a writer brings substantial benefit.)

1 Like

Aside: I’d guess the motivation was to allow preserving order (by using collections.OrderedDict), before regular dicts preserved order (implemented in 3.6, documented as part of the language for 3.7). The json module has a similar parameter. I agree that tomli shouldn’t need this.
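For anyone unfamiliar with the json precedent mentioned above, here is a minimal sketch of its `object_pairs_hook` parameter next to the plain-dict behaviour. The sample document is made up for illustration; it only shows why the hook is no longer needed for ordering on Python 3.7+.

```python
import json
from collections import OrderedDict

# Made-up sample document, purely for illustration.
text = '{"name": "demo", "version": "1.0", "authors": []}'

# Plain dicts keep insertion order on Python 3.7+, so no hook is needed.
plain = json.loads(text)

# The older approach: ask the parser to build an OrderedDict instead.
ordered = json.loads(text, object_pairs_hook=OrderedDict)

print(list(plain))
print(list(ordered))
```

Both calls yield the keys in document order; the only difference is the mapping type returned.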

1 Like

I think this is still an open question. I’m inclined to leave writing out at first, because it involves a whole lot more design choices, and thus more discussion. Writing TOML files isn’t (currently) fundamental to packaging like reading them is, so it’s easier to use a TOML writer library as a regular dependency installed from PyPI.

I might avoid tomlparser as a name even so, because I imagine we might want to add the write part later on. But configparser can write INI files, so it wouldn’t be a big problem if tomlparser also wrote TOML.

1 Like

It wouldn’t be.

I don’t think it would break all toml users, but it is a reason to not go with that name in the stdlib.

Not necessarily.

At this point I think a PEP is necessary. I can sponsor it but I don’t have the bandwidth to be an author on it. I would review this thread and the issue on bugs.python.org for potential concerns to address in the PEP. And to be clear, this would be a standards-track PEP and not a packaging PEP, so that means it will be going to python-dev and the SC.

1 Like

FWIW, I think it would make sense to make changes to toml on PyPI (potentially even some backwards incompatible ones), to get it to a point where including it in the standard library would be sensible (getting it to parse + dump TOML 1.0.0, have it better match the json API and so on). It could certainly be disruptive and backwards incompatible; but the work for that can be undertaken before considering adding it to 3.11 or 3.12.
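As a reference point for "match the json API", this is a small sketch of the four-function shape that json exposes today (string-based `loads`/`dumps`, file-based `load`/`dump`). It says nothing about what a TOML module would actually end up providing; the sample data is made up.

```python
import io
import json

# Made-up sample data, purely for illustration.
config = {"tool": {"name": "demo", "strict": True}}

# String-based pair: dumps() returns text, loads() parses text.
text = json.dumps(config)
parsed_from_text = json.loads(text)

# File-based pair: dump() writes to a file object, load() reads from one.
buffer = io.StringIO()
json.dump(config, buffer)
buffer.seek(0)
round_tripped = json.load(buffer)
```

A read-only TOML module would presumably only need the `load`/`loads` half of this shape.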

I’ve not had the time to do this and it would effectively be the same idea as @bernatgabor’s case (3) with all the disruptive work happening outside of the Python standard library; and prior to considering addition into the standard library.

I think the end state here is a much better one though: There’s a single toml package (likely based off of tomli’s current implementation) that evolves to a stage where including it in the Python standard library is a straightforward thing to do.

PS: This is obviously contingent on getting the current author for the toml package on PyPI on board for doing this.

5 Likes

Hmm firstly, I feel like people are systematically misunderstanding @bernatgabor’s case 3. :grinning_face_with_smiling_eyes:
I don’t think they intended to break/steal the import toml namespace but rather use a name something like import parser.toml (or from parser import toml if you prefer).

Great, if there was something you disagree with I’d be happy to hear (but perhaps better take it to tomli’s issue tracker).

I’d be happy to help and co-sign!

Would it make sense to name squat tomllib and tomlparser just in case we end up wanting to use one of them?

I agree 100%. The case for TOML parsing is pretty easy to justify solely based on the fact that it fixes packaging/bootstrapping circular dependency madness. The case for writing is not nearly as clear.

Yeah this would be great and definitely have the nicest end state. I’m curious, how much would you be willing to break uiri/toml? For instance, would you remove write capability? If not then we end up having the debate whether writing belongs in the standard library etc…

1 Like

I’d be willing to support this proposal, but I’m cautious about offering to co-maintain, as I’m likely to be pretty busy over the next few months, so I don’t want to commit to too much. Longer term, maybe.

I’ve looked at tomli, I agree with its minimalist+strict philosophy, and I believe I’d be able to maintain it.
@hukkinj1, if you want to do the heavy lifting of integrating tomli into the stdlib and maintaining it there (probably along with a backport on PyPI, à la importlib_resources), I can co-maintain (i.e. advise, merge your PRs, and take over in the worst-case scenario of you disappearing).

One thing that worries me is how future versions of TOML will be supported. There’s precedent in e.g. json and pickle, but it’ll need to be in the PEP, so everyone can agree on it.
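To make the pickle precedent concrete, here is a minimal sketch of how that module handles format versions: the writer chooses a protocol number, and the reader detects it from the data. This only illustrates the existing pickle behaviour, not a proposal for how TOML versions should be handled; the sample record is made up.

```python
import pickle

# Made-up sample data, purely for illustration.
record = {"name": "demo", "values": [1, 2, 3]}

# The writer picks the format version explicitly...
old_format = pickle.dumps(record, protocol=2)
new_format = pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL)

# ...while the reader detects the version from the data itself.
restored_old = pickle.loads(old_format)
restored_new = pickle.loads(new_format)
```

For a parse-only module the question is narrower (which spec versions the reader accepts), but the PEP would still need to say how that set changes over time.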

Yes. I’d pick one, and when there’s a PEP draft, ask @dustin to reserve it. (Please don’t squat by uploading a placeholder.)

That has a major disadvantage: it would break anyone using a pinned version of toml.

2 Likes

Has there ever been any discussion about adding a module named formats or something similar to the stdlib, that could be used for this kind of encoding library promotion? It would make backward compatibility easier since fewer new package names would need to be “taken over” in the future, and allow for simple format-to-module-name conventions. So for example if a YAML parser were added later it could simply live in formats.yaml, next to formats.toml. I guess encodings is the most similar name that could be reused.

I guess the downsides would be “Flat is better than nested”, although “Readability counts”, and it would make it easier for a new user to check for a list of available serialisation formats. And there are a gazillion other parsers for other formats (.json, .zip, .eml, .csv, .html, .cfg…), would helpers for them be added as well?

Guess I’ve answered my own question, but I just wanted to mention the idea if it is of any help to anyone. :slightly_smiling_face:

2 Likes

Unfortunately, formats is already taken:

It’s unclear whether this package is actively maintained or has ever been widely used.

-Fred

1 Like

Great. Yeah I can do stdlib integration, maintenance, and backport. I probably won’t start until we’ve drafted a PEP.

There’s some work already done in this Tomli issue. I’ve tagged you there!

2 Likes

Just for clarity, to ensure everyone is aware: uiri/toml has been completely unmaintained for well over a year now, but it seems that @pradyunsg may be able to get the name transferred. As a side note, @hukkinj1, if this will be an officially blessed and supported project, it might be a good idea to learn from that experience, minimize bus factor, and follow responsible practices by ensuring it is under a GitHub/GitLab/etc. org and has multiple maintainers on PyPI, to avoid a single point of failure, which indeed was the case with uiri/toml. Though, that might only be relevant for the backport/upstream, if there is no longer an independent, maintained hukkin/tomli repo and tomli PyPI project at all (not sure what your plans are there).

I don’t think this is relevant personally because I don’t see a way how we’ll not break at least some people if we start having a stdlib and a 3rd party library under the same name.

3 Likes

If people don’t specify their dependencies appropriately then you’re right that some people will break. But if people use python_version markers and such appropriately it works fine (or just choose to always use the PyPI version thanks to the stdlib being later on sys.path).
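A `python_version` marker handles the install side; the import side usually needs a matching fallback. Below is a minimal sketch of that pattern. The helper `first_importable` is hypothetical, and the module names are assumptions: `tomllib` is just one of the stdlib names floated earlier in this thread, not a decided name.

```python
import importlib


def first_importable(names):
    """Return the first module in `names` that can be imported, else None."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


# Hypothetical preference order: a future stdlib module first,
# then the package from PyPI. Either may be missing, giving None.
toml_reader = first_importable(["tomllib", "tomli"])
```

With this shape the package never relies on which copy happens to come first on sys.path, because the two candidates have different names.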

1 Like

Well, assuming the version from PyPI is in site packages, that’s going to come after the stdlib on sys.path, not earlier.
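A rough way to see this: the sketch below models import resolution as "first directory on the search path that provides the name wins", then prints where the stdlib and site-packages directories sit on the current interpreter's sys.path. The helper `first_provider` and the directory names are hypothetical, and the model ignores import hooks, `.pth` files and the like.

```python
import sys
import sysconfig


def first_provider(search_path, contents, module_name):
    """Return the first directory on search_path that provides module_name."""
    for directory in search_path:
        if module_name in contents.get(directory, ()):
            return directory
    return None


# Where this interpreter keeps the two locations, and their sys.path positions
# (None if a location is not on sys.path verbatim).
paths = sysconfig.get_paths()
for label in ("stdlib", "purelib"):
    location = paths[label]
    position = sys.path.index(location) if location in sys.path else None
    print(label, location, position)
```

In the simplified model, a `toml` module present in both places resolves to the stdlib copy whenever the stdlib directory comes first, which is the point being made above.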

1 Like

Right, sorry; was thinking of the current directory.

1 Like

Just throwing it out there since I haven’t seen it mentioned: has anyone checked out cktan’s tomlc99? It’s quite mature (from 2017), well maintained (0 open / 33 closed issues, same stats for PRs, last commit 5 days ago), TOML v1.0 compliant, and < 100kB source. It seems like it’s parse-only but I can’t tell for sure (this is a long, long message chain and I’m not sure what the final decision was on that).

Anyway, seems like it would be easy to wrap into Python. I know the efficiency of C isn’t an absolute necessity for toml, but it’s also not a bad thing if pypa/stdlib toml will become the de facto standard. If the language keeps becoming more popular, it won’t be long until somebody is crunching files by the millions.

(Sorry if we’re past the point of this discussion - I tried reading the whole thread to catch up but gave up half an hour in)

1 Like

Welcome, Trevor!

Well, as you mention, it’s a C library. There aren’t existing Python wrappers or an API, and its usage in current Python projects is (AFAIK) non-existent. Since the CPython source is C89 with limited C99 constructs, it would require:

  1. Adopting or hard-forking the library
  2. Rewriting it in C89
  3. Finding someone to maintain it
  4. Designing a Python API and writing appropriate bindings

TOML, at least as currently designed and deployed, is rarely appropriate for or used in particularly large-scale, highly performance-sensitive contexts. Doing all this to essentially invent a new solution (and convince dependent packages to switch), while retaining the overhead and difficulty of maintaining it, seems a lot less viable than adopting an existing, proven, widely-used solution, particularly for the much more stable standard library intended for already mature projects.

1 Like