Revisiting adding a writer to `tomllib`

FFY00 · September 5, 2023, 4:14pm

I have repeatedly run into the lack of write functionality in tomllib, and would be willing to help merge and maintain this feature.

When adding tomllib, an API to write TOML was not included, as explained by PEP 680. To consider adding this feature, the points raised by the PEP should be addressed, which I will do below.

Please let me know if you think my response to the points raised by the PEP is sufficient. If the general feedback is positive, I can go ahead and start drafting a new PEP.

cc @encukou @hukkinj1

Addressing the PEP 680 motivation for not including a writer:

The ability to write TOML is not needed for the use cases that motivate this PEP: core Python packaging tools, and projects that need to read TOML configuration files.

While the ability to write TOML is not often needed for the mentioned use cases, there are actually situations where it would be useful (e.g. python/cpython issue #107956, “Install a static installation description file as part of the Python installation”).

This functionality is also used by build backends, such as pdm-backend, where having it available in the standard library would make bootstrapping easier for downstream distributors, and by key but non-core packaging tools, such as flit and hatch.

Use cases that involve editing an existing TOML file (as opposed to writing a brand new one) are better served by a style preserving library. TOML is intended as a human-readable and -editable configuration format, so it’s important to preserve comments, formatting and other markup. This requires a parser whose output includes style-related metadata, making it impractical to output plain Python types like str and dict. Furthermore, it substantially complicates the design of the API.

This is true. However, having this functionality without style preservation would already be a huge improvement.

All the projects mentioned in my reply above (pdm-backend, flit, and hatch), for example, depend on tomli-w, which is not style-preserving.

Even without considering style preservation, there are too many degrees of freedom in how to design a write API. For example, what default style (indentation, vertical and horizontal spacing, quotes, etc) should the library use for the output, and how much control should users be given over it? How should the library handle input and output validation? Should it support serialization of custom types, and if so, how? While there are reasonable options for resolving these issues, the nature of the standard library is such that we only get “one chance to get it right”.

IMO if we were to add this feature, I’d like to go for a KISS approach, where we provide a simple implementation that can be extended, similarly to the json module.

Currently, no CPython core developers have expressed willingness to maintain a write API, or sponsor a PEP that includes one. Since it is hard to change or remove something in the standard library, it is safer to err on the side of exclusion for now, and potentially revisit this later.

That has changed now, so we are at the mentioned “revisiting this later” step.

ajoino · September 5, 2023, 5:05pm

I disagree that having a toml writer in the stdlib would be any sort of improvement. In my experience using tomli and later tomllib, I typically never write any toml files (unless by hand); I only read them, much like the PEP suggests. When I make the decision between using toml or any other format, I use toml if it’s meant to be written by a human and other formats otherwise. But this is a circular opinion, as the tomli/tomllib and tomli-w distinction heavily influenced me in making that choice.

Furthermore, having the toml writer in the stdlib will only really matter in the cases where you want to write toml files, and where you have no other external dependencies. In my experience, any time I’ve considered having an auto-generated configuration file, the project already has a couple of dependencies, so adding tomli-w is not a concern. I guess this could be a concern in certain orgs, but probably not the ones you’ve pointed out.

In summary, I don’t think the rationale is invalid now, at least not any more than it was when the PEP was accepted. Any reason to add a toml writer would still need to trump the given reasons not to. But personally, it wouldn’t change much for me if a minimal toml writer was added, as I wouldn’t use it.

However, I would use it if it did do style preservation, so count me in camp +0 if minimal, +1 if maximal.

FFY00 · September 5, 2023, 5:54pm

Thanks for the reply.

Not wanting to sound harsh here, but I did provide objective examples on how and why having a TOML writer in tomllib would be an improvement.

It’s perfectly fine if you disagree on how much of an improvement that would be, but I think stating it wouldn’t “be any sort of improvement” without addressing any of the examples to explain how this isn’t actually an improvement in those situations seems like a flawed argument. If I misunderstood what you were trying to say here, please let me know.

That said, I think your experience is probably on par with that of most general users. The main beneficiaries would probably be packaging-related projects and downstream packagers, which can have their bootstrapping workflows for Python packaging tooling simplified.

Sure, though, this is often the situation with packaging tooling. Even when you could have external dependencies, it is very often undesirable. One of the main reasons for this is packaging tooling bootstrapping workflows.

Can you elaborate? Again, not wanting to sound harsh, but this seems like a weird thing to comment without any further elaboration. I am struggling to understand the motivation behind this comment.

Adding a writer with style preservation is a much bigger task. To make it clear, it is out of scope for now and not something I am planning on working on in the foreseeable future.

However, this proposal is not incompatible with that. Even though style preservation is not something I am currently planning to work on, nothing prevents us from adding it in the future.

ajoino · September 5, 2023, 6:36pm

I think the only person that possibly sounded harsh is me, I’m very tired today and probably should have kept from posting until tomorrow.

I’ll start by saying that I think the most important thing is that there is a maintainer willing to shoulder the work. I also thought some more on the minimalism vs maximalism aspect and I agree that total style preservation is likely too complicated and fragile, but for me to consider using the writer, comments would need to be preserved. But that’s just me, consider it just one data point.

But to elaborate on the comment-preservation point, to me it seems that would require the output of tomllib.load to not be a regular dict but a custom object implementing the MutableMapping protocol. Not sure if that would be backwards compatible with what exists at the moment.
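A minimal sketch of the kind of object this would mean, to show where the compatibility concern comes from. `CommentedTable` and its `comments` attribute are hypothetical names invented for this example; nothing like this exists in tomllib today.

```python
from collections.abc import MutableMapping


class CommentedTable(MutableMapping):
    """Hypothetical table type: behaves like a mapping, remembers comments."""

    def __init__(self, values=None, comments=None):
        self._values = dict(values or {})
        # Maps a key to the comment text attached to it (hypothetical design).
        self.comments = dict(comments or {})

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        self._values[key] = value

    def __delitem__(self, key):
        del self._values[key]
        self.comments.pop(key, None)

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


table = CommentedTable({"name": "demo"}, comments={"name": "shown in the UI"})
table["version"] = "1.0"

# Mapping-style access works as usual.
print(table["name"], dict(table))

# Code that checks for a real dict would not accept this object.
print(isinstance(table, dict))
```

The last line prints `False`, so any caller that relies on `isinstance(result, dict)` would break if `tomllib.load` returned such an object by default.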

Now to answer your questions/comments in detail (beyond my comments above):

What I meant to say was that, to me, it’s a net neutral change, hence my objections to calling it a huge improvement. While your examples are valid data points showing that this feature would be used, I don’t see that as objective proof that this is an improvement. I didn’t address any specific examples because I think my point was valid despite there being examples of projects using toml writers. I also tend to shy away from talking specifics in broader arguments because then we tend to miss the forest for the trees.

In the end I think we both agree that a toml writer would be better for some and neutral for others, so the amount of improvement really is dependent on your perspective (the only negative would be increased maintenance burden IMO).

I looked at poetry, pdm, flit, hatch, and setuptools and the only one not having any other external dependencies (at least as best I could tell) was setuptools, which is what I expected. I honestly don’t see how adding another dependency in this case is in any way bad, but I’d be happy to hear if I’m missing something.

My point was that in certain organizations it might be difficult to use projects on PyPI (like in a security-minded company), but AFAICT the projects you pointed out are open source and should not have any problems in that sense. Just a throwaway comment really that I should have removed.

I totally agree that this proposal works as is for the use cases you presented, I just wanted to give one perspective of someone who wouldn’t use this writer (unless comments are preserved). Seems I failed on my first try so I hope this second one clarifies things.

Once again, I’m not opposed to the proposal and I hope my comments are helpful. If anything else is unclear, just ask.

FFY00 · September 6, 2023, 10:35pm

Thank you for the reply, it clarifies things.

There are a couple of different ways we could do it in a backwards-compatible way (e.g. using a custom dict instance, requiring the user to opt in to style preservation in the writer, etc.), so I wouldn’t worry too much about it. This is just something that we should consider when we design the writer API, to make things easier if/when we want to add this feature in the future, but it shouldn’t be a big issue.
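As a sketch of the custom dict idea, assuming a hypothetical `CommentedDict` class with a `comments` attribute (both names are made up for this example):

```python
import json


class CommentedDict(dict):
    """Hypothetical dict subclass carrying comments as extra metadata."""

    def __init__(self, *args, comments=None, **kwargs):
        super().__init__(*args, **kwargs)
        # Maps a key to the comment text attached to it (hypothetical design).
        self.comments = dict(comments or {})


table = CommentedDict({"name": "demo"}, comments={"name": "shown in the UI"})

# Existing callers keep working, because this still is a dict.
print(isinstance(table, dict))
print(table == {"name": "demo"})
print(json.dumps(table))

# The metadata is only visible to code that knows to look for it.
print(table.comments)
```

One caveat with this approach: copies made with `dict(table)` or `table.copy()` are plain dicts and silently drop the metadata, so a writer could only preserve comments for objects that were passed through unchanged.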

That’s because, with the exception of setuptools, the actual build backend bit is split into a different package (poetry-core, flit_core, hatchling, respectively).

flit_core, for example, has to vendor tomli [1] to make bootstrapping possible, which then requires some downstream distributors to build the package twice: once with the vendored dependencies for bootstrapping, and then again to de-vendor the dependencies. All this requires heavy patching of the build system and is a major PITA.

There are many quirks when it comes to bootstrapping, and I could spend all day here discussing it. While, as I mentioned above, not having a writer in tomllib is not super impactful, there are a couple of situations where it would in fact simplify things.

Gotcha, I think I misunderstood your comment then.

[1] With tomllib now available in the standard library, flit_core was able to drop the tomli dependency, but still needs to vendor it for older Python versions.

EpicWink · September 6, 2023, 11:18pm

For static metadata file generation, this proposal has to compete with multiple f-strings f"""...""" (of valid TOML) simply embedded in the generator. If there is dynamic or nested data being generated, then that’s a good argument for TOML serialising. Otherwise, there’s still the argument for build backends, but that’s weaker because they already successfully vendor a writer.

encukou · September 7, 2023, 12:08pm

The tomllib PEP addressed a particular bootstrap loop: you need a TOML reader to build a TOML reader library.
But, AFAIK, writing doesn’t have this loop. You don’t need a TOML writer to build a TOML writer library. And once you have that library, it can be used as a dependency. (And yes, dependencies are tricky, and sometimes you need to do things like the flit/flit-core split. But IMO, that’s solvable without stdlib help.)

For tomllib’s externally visible behaviour, there were about one or two design decisions that weren’t answered by doing the one most minimal correct thing.
A TOML writer, especially an extendable one, has many such decisions to make.

tomllib is done: it has a well-defined scope, and it fills it. By including it in stdlib, we’re actively discouraging the development of other TOML readers (except ones that expand the scope significantly – like those paired with a style-preserving writer).
I don’t think we want to do that for TOML writers.
(I recommend reading “The Dark Times” in Hynek’s post on attrs vs. dataclasses as a cautionary tale.)

JamesParrott · September 7, 2023, 1:32pm

I’m not familiar with any extra requirements for tomllib (the core module) over tomli, but I just want to add some hopefully useful input, and a shameless plug.

It’s surprisingly easy to merge tomli and tomli-w.

At least that was the case when forking, merging, and back-porting both of them (to Python 2!) for toml-tools. I was delighted it still passed all the relevant tests. It was so little work, I left it all under Taneli’s copyright.

FFY00 · October 5, 2023, 1:32am

Yes, this is true. It would be an improvement of the current situation, rather than a necessity like the TOML reader.

Perhaps I am underestimating the problem, but from what I saw when I looked into this possibility, the required decisions didn’t seem that difficult (maybe I am just too used to working with packaging code).

That’s a fair point. I considered it, and the situation didn’t seem bad enough, but again, this is all subjective.

It seems like, at least for now, there isn’t a positive consensus regarding the inclusion of a TOML writer in tomllib, and I don’t think it makes sense for me to pursue it further for the time being.
