Adopting/recommending a toml parser?

pip recently switched from pytoml to toml, and flit might well follow it. But I notice that toml has not had a release since 2018, and the pip PR I linked contains a discussion about a newer TOML syntax which Poetry would like to use, but pip would be unable to parse.

@dholth suggested that PyPA should take over maintenance of a toml parser. I think it’s a good idea to ensure there’s one package we can wholeheartedly recommend, because we’ve adopted the format in packaging standards (PEP 517 & 518). I think this should ultimately be a candidate for adding to the standard library, maybe once the TOML specification has reached version 1.0.

The three contenders that I’m aware of:

  • toml (github): last release October 2018, last commit January 2020. Newly vendored in pip.
  • pytoml (github): last release July 2019, last commit July 2019. Marked as unmaintained in the readme.
  • tomlkit (github): last release April 2020, last commit April 2020. Supports newer TOML features than the other two. Extra features to work with comments & layout, beyond simple parse/dump.

Tomlkit is clearly the most actively maintained, and it belongs to @sdispater, who’s also the author of Poetry. However, I imagine that the ability to preserve and manipulate style adds a considerable amount of complexity relative to a library which parses TOML and discards the style. I find it hard to imagine that being added to the standard library, for instance.

I will mention that once TOML hits 1.0 I plan to talk to python-dev about somehow getting a TOML parser into the stdlib. This might also lead to a reckoning over what the future of the stdlib should be. :grin:

I chose pytoml for enscons because its source code seemed easier to read than toml’s; I understand @takluyver had the same impression. We all chose TOML as a file format because you can have interoperable parsers. It’s not a lot of code.

Doesn’t @pradyunsg maintain one?

Reading through the feature lists and implementations, I personally find tomlkit to be the best implementation. IMHO the ability to round-trip a file (read it, then write it back out unchanged) should be offered.
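To make the round-trip property concrete, here is a minimal sketch of how one might check it. The helper name `preserves_roundtrip` and the sample text are made up for illustration; `tomlkit.parse` and `tomlkit.dumps` are tomlkit's own functions, and the check against tomlkit only runs if that package happens to be installed.

```python
from typing import Any, Callable


def preserves_roundtrip(
    text: str,
    parse: Callable[[str], Any],
    dumps: Callable[[Any], str],
) -> bool:
    """Return True if parsing and re-serialising gives back the identical text."""
    return dumps(parse(text)) == text


# Hypothetical sample: comments and spacing are exactly what a
# style-discarding parser would lose.
SAMPLE = '# build config\n[build-system]\nrequires = ["flit_core"]  # keep me\n'

try:
    import tomlkit  # third-party; assumed to be installed
except ImportError:
    tomlkit = None

if tomlkit is not None:
    print(preserves_roundtrip(SAMPLE, tomlkit.parse, tomlkit.dumps))
```

The same helper can be pointed at any other library's parse and dump functions to compare how much of the original layout survives.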

I don’t think so. IIUC @pradyunsg is one of the core members of the TOML spec, but they don’t maintain a reference implementation.

Hi, I just want to mention that I have wrapped a single-header C++ library to provide a fast TOML parser in Python.

The repo is here: https://github.com/bobfang1992/pytomlpp

Thanks Bob. For my purposes, the ease of managing a pure Python package outweighs the performance benefit of a compiled parser, but it’s great to have a compiled option if people need that performance.

If pure speed is what one is looking for, there’s https://github.com/samuelcolvin/rtoml too. I tried to compare pytomlpp to it, but did not manage to make pytomlpp work on macOS.

Interesting, but toml++ uses C++17, an extremely recent version of the C++ language, while PEP 7 mandates the use of “C89 with several select C99 features”. So I’m afraid integrating pytomlpp into the stdlib would be a no-no, and would remain so for at least the next 10 years.

As for speed, well, the dominant use case for TOML is configuration files, and I don’t think those are usually performance-critical (unless you’re batch-processing thousands of them, perhaps :slight_smile: ).

Hi, thanks for introducing rtoml to me; I had not seen it before. I am currently trying to resolve the macOS issue, and once that is done I will provide a benchmark. Thanks!

Hey, thanks for the reply. Yeah, I never expected it to be part of the stdlib, as it also requires pybind11, which is not part of the stdlib. My aim is to provide an alternative route when performance matters.

As for apps that care about TOML parsing performance: I worked at a quant fund where TOML is used for configuring trading strategies. We had thousands of them, and when the app starts it reads in all of the TOML configs, which takes quite a while. This was my main motive for starting this project in the first place. I agree this is quite a niche scenario, but I think pytomlpp provides other benefits as well: for example, it is fully compatible with TOML 1.0.0-rc.1 and sticks to native Python types only.

Thanks!

Hi, thanks! Yeah, I agree this should not be part of the stdlib; I just thought that if I mention it here, people who need a performant TOML parser can discover it!

What’s the easiest way to allow the user to choose between packages which provide the same interface, without code changes? Is that possible? Is it a security risk?
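One common pattern for this, sketched here rather than recommended, is to try a list of candidate modules in order of preference and use the first one that imports. The helper name `load_first_available` is hypothetical, and the sketch assumes every candidate exposes a compatible interface (for example a `loads` function), which would need to be verified for each library.

```python
import importlib
from types import ModuleType
from typing import Iterable


def load_first_available(candidates: Iterable[str]) -> ModuleType:
    """Import and return the first module in `candidates` that is installed."""
    tried = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Not installed (or failed to import); try the next candidate.
            tried.append(name)
    raise ImportError(f"none of the candidate modules could be imported: {tried}")
```

A caller might then write something like `backend = load_first_available(["rtoml", "pytomlpp", "toml"])` followed by `backend.loads(text)`. On the security question: this imports whatever is installed under those names, so it trusts the environment exactly as much as a plain `import` statement does.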

This might be Provides-Dist, in theory only though, since there doesn’t seem to be any tool that implements it.

Discussed here, recently:

There is now a toml 0.10.1 release, from May 14, 2020.

It seems that date is getting close. As of October 7, there is a pre-release candidate (1.0.0-rc.3) for TOML.

Again, as the time approaches… I’m looking forward to seeing what @brettcannon is thinking on the topic of the standard library. While “batteries included” served us well for many years, we’re beyond it being an advantage now.

I would also like to hear what @brettcannon thinks on the topic of the standard library. :wink: (Seriously, it fluctuates, so I don’t even know where I’m going to land after the topic is discussed.)

I agree with this sentiment in general, but it really is hard for stuff like setuptools and pip to take on dependencies, which is why they vendor everything. Considering the universal nature of pyproject.toml, it would be very nice to be able to parse these things without taking on a dependency.

Of course, it’s not likely that we’ll drop support for Python 3.9 for many years yet, and maybe by then the need for pip and setuptools to vendor all their dependencies will have been considerably alleviated, so we’d gain very little from it.

That said, the other time I find it convenient to work without any dependencies is for little scripts where I essentially replace what would have been a bash script with a Python script. I tend to use only the standard library for this sort of thing, since it’s inconvenient to declare dependencies and build a virtualenv for single standalone .py files. I can imagine a TOML parser / emitter will increasingly be valuable for these sorts of things as well, so I’m +1 on including it, even as someone who is mostly in the “kernel / minimal python” camp.
