Add json.ExtendedEncoder to the standard library for common types (datetime, UUID, Decimal, set)

Introduction

Python’s built-in json module is a vital tool, but it currently lacks native support for several foundational types within Python’s own standard library. Nearly every modern application relies on datetime, uuid.UUID, decimal.Decimal, and set, yet passing any of these to json.dumps() results in a TypeError.
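A minimal sketch of the failure described above. The sample values are illustrative assumptions, not taken from the original post; each one is passed to json.dumps() on its own and the resulting error message is collected.

```python
import json
import uuid
from datetime import datetime
from decimal import Decimal

# Illustrative sample values, one per type named in the proposal.
samples = {
    "datetime": datetime(2024, 1, 1, 12, 0),
    "UUID": uuid.UUID(int=0),
    "Decimal": Decimal("100.50"),
    "set": {"finance", "api"},
}

failures = {}
for label, value in samples.items():
    try:
        json.dumps(value)
    except TypeError as exc:
        # The default encoder rejects every one of these types.
        failures[label] = str(exc)

for label, message in failures.items():
    print(f"{label}: {message}")
```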

This forces developers into a repetitive cycle of rewriting custom JSONEncoder subclasses, copying boilerplate snippets from Stack Overflow, or importing heavy external libraries just to handle basic serialization.

I propose adding an opt-in json.ExtendedEncoder to json/encoder.py. This provides a standard, “batteries-included” way to serialize these common types without changing the default behavior or breaking backward compatibility.

The Proposed Class

import json
from datetime import datetime, date, time
from decimal import Decimal
from uuid import UUID

class ExtendedEncoder(json.JSONEncoder):
    """
    An opt-in JSONEncoder that natively handles common Python standard library 
    types like datetime, UUID, Decimal, and set.
    """
    def default(self, obj):
        if isinstance(obj, (datetime, date, time)):
            return obj.isoformat()
        if isinstance(obj, UUID):
            return str(obj)
        if isinstance(obj, Decimal):
            # Serialized as string to prevent IEEE 754 float precision loss
            return str(obj)
        if isinstance(obj, (set, frozenset)):
            try:
                return sorted(list(obj))
            except TypeError:
                # Mixed-type sets (e.g. {1, 'a'}) cannot be sorted reliably.
                # We acknowledge this fallback is imperfect; an alternative
                # would be raising TypeError with a clear message. Open to
                # discussion on which is preferable.
                return sorted(list(obj), key=str)
        return super().default(obj)

Usage Example

import json
from datetime import datetime
from decimal import Decimal
import uuid

data = {
    "transaction_id": uuid.uuid4(),
    "amount": Decimal("100.50"),
    "timestamp": datetime.now(),
    "tags": {"finance", "api"}
}

# Clean, safe, out-of-the-box serialization
json_string = json.dumps(data, cls=json.ExtendedEncoder)

Why This Belongs in the Standard Library

  1. Zero Regression Risk: Because this is strictly opt-in (cls=json.ExtendedEncoder), the default behavior of json.dumps() is left entirely untouched. Performance and compatibility for existing codebases remain identical.

  2. Upstreaming Existing Ecosystem Consensus: We wouldn’t be inventing new standards. Major Python frameworks have used this exact logic for over a decade: Django’s DjangoJSONEncoder handles datetime, UUID, and Decimal; Flask/Werkzeug natively serializes datetime to ISO-8601; orjson and msgspec gained massive adoption partly by baking these exact defaults into their core engines.

  3. Fulfilling “Batteries Included”: It is a poor developer experience when native standard library siblings like json and datetime cannot communicate without boilerplate wrapper code.

Anticipated Design Critiques & Defenses

1. Why draw the line at these four types? What about bytes, pathlib.Path, or Enum?

We apply a strict rule: only serialize types that map cleanly to an unambiguous internet-standard primitive.

  • bytes is excluded: JSON is a text format. Converting bytes requires choosing an encoding (Base64, Hex, UTF-8), and no single right answer exists.
  • pathlib.Path is excluded: Path strings change structurally across platforms (\ vs /), introducing cross-platform bugs.
  • enum.Enum is excluded: Preferences vary too widely between .name and .value.
  • The included types: datetime maps to ISO-8601 (RFC 3339); UUID maps to its canonical 36-character string (RFC 4122); Decimal serializes as a string to prevent IEEE 754 precision loss.
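To make the mappings for the included types concrete, here is a small sketch of what each conversion produces. The specific values are assumptions chosen for illustration. Note that a naive datetime (without tzinfo) would produce an isoformat() string with no UTC offset.

```python
import json
import uuid
from datetime import datetime, timezone
from decimal import Decimal

stamp = datetime(2024, 1, 1, 12, 30, tzinfo=timezone.utc)
ident = uuid.UUID("12345678-1234-5678-1234-567812345678")
amount = Decimal("100.50")

print(stamp.isoformat())             # 2024-01-01T12:30:00+00:00
print(str(ident), len(str(ident)))   # canonical form, 36 characters

# As a string, the Decimal keeps its trailing zero; as a float it does not.
print(json.dumps(str(amount)))       # "100.50"
print(json.dumps(float(amount)))     # 100.5

# Binary floats can also introduce representation error.
print(json.dumps(0.1 + 0.2))                        # 0.30000000000000004
print(str(Decimal("0.1") + Decimal("0.2")))         # 0.3
```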

2. Won’t serializing set to a list cause non-deterministic behavior?

ExtendedEncoder enforces sorting via sorted(list(obj)) to guarantee deterministic output. For mixed-type sets that can’t be sorted naturally, it falls back to key=str. We acknowledge this fallback is imperfect and are open to the alternative of raising a TypeError with a descriptive message instead. Happy to discuss which behavior the community prefers.
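For comparison, the stricter alternative mentioned above could look roughly like this. StrictSetEncoder is a hypothetical name and the error wording is an assumption; it is a sketch of the idea, not the proposed implementation.

```python
import json


class StrictSetEncoder(json.JSONEncoder):
    """Hypothetical variant: sort sets, but refuse unorderable ones."""

    def default(self, obj):
        if isinstance(obj, (set, frozenset)):
            try:
                return sorted(obj)
            except TypeError:
                # Mixed-type sets such as {1, "a"} cannot be ordered.
                raise TypeError(
                    f"cannot serialize {type(obj).__name__} with unorderable "
                    "elements deterministically; convert it to a list first"
                ) from None
        return super().default(obj)


print(json.dumps({"b", "a"}, cls=StrictSetEncoder))  # ["a", "b"]

try:
    json.dumps({1, "a"}, cls=StrictSetEncoder)
except TypeError as exc:
    print("rejected:", exc)
```

One thing to weigh when comparing the two behaviors: with key=str, elements such as 1 and "1" share the same sort key, so their relative order falls back to the set's iteration order.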

3. Why not keep this on PyPI?

The standard library should provide standard solutions to universal standard library problems. Requiring a pip install to convert a datetime to JSON forces fragmentation for a problem with decade-long industry consensus.

The implementation itself is small: roughly 20–30 lines in json/encoder.py plus tests. I’d love to get the core team’s thoughts on whether a PR is something the steering committee would be open to reviewing.


+1 from me, with one comment: sorted(list(obj)) introduces a non-obvious performance penalty. Is avoiding non-determinism a good enough reason to add sorting?


I’ve run into this issue too.

I think on balance it’s worth including bytes, Path and StrEnum too, but just having a built-in way to encode datetime would be a big win for me. And those 3 could always be added later?


@pawelswiecki Fair point on the sorting overhead. The tradeoff is:

- Sort: deterministic output (same set → same JSON every time), but O(n log n)
- Don’t sort: O(n), but non-deterministic output (can break caching, diffs, tests)

For small sets (which is the common case in configs/API responses), the performance difference is negligible. For large sets, the user probably shouldn’t be serializing them to JSON anyway.
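A rough way to check this tradeoff on your own machine, sketched with timeit. The set sizes and repeat counts are arbitrary assumptions, and the timings will vary by hardware, so no expected numbers are shown.

```python
import json
import timeit

# String elements, so the sort has real work to do.
small = {f"item-{i}" for i in range(20)}
large = {f"item-{i}" for i in range(100_000)}


def dump_sorted(values):
    """Deterministic output, O(n log n)."""
    return json.dumps(sorted(values))


def dump_unsorted(values):
    """O(n), but element order follows set iteration order."""
    return json.dumps(list(values))


for name, data, number in (("small", small, 2000), ("large", large, 3)):
    t_sorted = timeit.timeit(lambda: dump_sorted(data), number=number)
    t_plain = timeit.timeit(lambda: dump_unsorted(data), number=number)
    print(f"{name}: sorted={t_sorted:.4f}s unsorted={t_plain:.4f}s")
```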


@peterc I totally get it: bytes, Path, and StrEnum definitely trip a lot of people up too.

My main hesitation with throwing them in right now is just keeping things simple for the first launch. For instance, with bytes, there isn’t one obvious way to handle encoding: should it be Base64, Hex, or UTF-8? No matter what we pick as a default, it’s going to frustrate someone. I’d suggest it’s safer to ship the types everyone agrees on first.

Path is another tricky one because platform-dependent separators tend to introduce annoying cross-OS bugs in JSON. As for StrEnum, that one is definitely easier, but I really want to get the “core four” merged before we start expanding the scope.

That said, if everyone agrees we should just include StrEnum now, I’m totally open to it since it maps so cleanly to a string anyway.

My instinct is that for a stdlib feature, there’s a lot more need to add configuration knobs. If the user is writing their own encoder, they can include whatever they want, but for a stdlib supplied encoder, it’s use it or don’t. To give an example, some people might be OK with the sorting overhead for sets. Others might not. Or they might not even want sets to be encoded, preferring an error to catch unexpected data, while still wanting to encode datetime values.

Obviously, we don’t want some frankenstein monster of a class with a multitude of config options. That’s no easier to use than writing your own. But if there’s no flexibility, people will simply end up having to write their own encoder and nothing is gained.

What’s the current state of the art here? I know there are libraries like cattrs, which are more full-featured than what’s being proposed here, but can we learn anything from them? How often do people write their own encoders at the moment, and what types do they typically cover? Do they need decoders as well as encoders, and if so how do they work?

The proposal seems reasonable, but I think it needs a bit of development to make sure it covers actual real world use cases (that’s why some research to establish what those use cases actually are would be helpful as well).

If you just had an encoder but no decoder, is that enough? I’m having trouble understanding how being able to write extra types, but having no means to read them, is useful.


Great questions, really appreciate the detailed feedback.

  • On cattrs: It’s the most sophisticated solution out there: it handles both structuring (decode) and unstructuring (encode), and has preconfigured converters for orjson, msgspec, and others. The tradeoff is it’s a full framework, not a simple drop-in for json.dumps.
  • On the decoder question: This is the honest gap in my proposal. Decoding is fundamentally harder: a JSON string “2024-01-01” is ambiguous without a schema or type hints. cattrs solves this by requiring you to declare target types explicitly. I don’t think ExtendedEncoder should attempt decoding for that reason, but I’m open to discussion.
  • On configuration: Your point is well taken. One option worth considering is making ExtendedEncoder subclass-friendly by design, so users can override just the behaviors they don’t want, rather than adding config knobs to the class itself.
  • On real-world usage: Django and Flask have shipped essentially this class for ~10 years with no configuration options and minimal complaints. That suggests the opinionated defaults are largely acceptable for the common case.
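As a sketch of the subclass-friendly option mentioned in the configuration point: the ExtendedEncoder below is a trimmed local stand-in for the proposed class (it is not in the stdlib today), and NoSetsEncoder is a hypothetical subclass that keeps the datetime/UUID/Decimal handling but goes back to rejecting sets.

```python
import json
from datetime import date, datetime, time
from decimal import Decimal
from uuid import UUID


class ExtendedEncoder(json.JSONEncoder):
    """Local stand-in for the proposed class (not part of the stdlib)."""

    def default(self, obj):
        if isinstance(obj, (datetime, date, time)):
            return obj.isoformat()
        if isinstance(obj, (UUID, Decimal)):
            return str(obj)
        if isinstance(obj, (set, frozenset)):
            return sorted(obj)
        return super().default(obj)


class NoSetsEncoder(ExtendedEncoder):
    """Hypothetical subclass: opt out of set support only."""

    def default(self, obj):
        if isinstance(obj, (set, frozenset)):
            # Bypass ExtendedEncoder so the base class raises TypeError.
            return json.JSONEncoder.default(self, obj)
        return super().default(obj)


print(json.dumps({"day": date(2024, 1, 1)}, cls=NoSetsEncoder))

try:
    json.dumps({"tags": {"a", "b"}}, cls=NoSetsEncoder)
except TypeError as exc:
    print("rejected:", exc)
```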

Happy to do more research on actual usage patterns if that would help move this forward.

When you’re playing around with interactive python or in a Notebook, there’s usually options to load from a json, then notice the dates are strings, then try_parse_dates() or something in that direction, and it doesn’t feel like it costs much time or interrupts the flow.

In contrast hitting the JsonCantHandleDateTimeError often feels like hitting a rock.


I do a lot of work deserializing binary data into python dicts. It’s helpful to be able to output that data to JSON, for debugging and auditing. Sometimes the formats are complicated enough to warrant a full on custom encoder, but oftentimes I do get annoyed needing to use one just for a simple UUID or datetime. I’m +0 overall on the idea, as it’s not THAT much of an annoyance, but it is one I come across.

  • DjangoJSONEncoder today:

    • datetime.datetime → isoformat(sep="T" … +“Z”
    • datetime.date → isoformat()
    • datetime.time → isoformat(...)
    • datetime.timedelta → …
    • decimal.Decimal, uuid.UUID → str()
    • … (no set?)
  • DjangoJSONEncoder 10 years ago: Similar, various small changes (last yesterday)

  • flask.json.JSONEncoder 4 years ago:

    • date → werkzeug.http.http_date
    • decimal.Decimal, uuid.UUID → str()
    • … (No datetime, decimal, set etc.?)
  • Removed 4 years ago?

I’m not that familiar with flask. Could you link to where they do this now?

It seems decimal.Decimal and uuid.UUID are the only really stable cases?

These days I use pydantic_core.to_json whenever I need something more powerful than the standard encoder.

It handles all the types mentioned above (see the full conversion table), with a few configuration knobs.

A few thoughts on the points raised:

@duncathan encoder without decoder?
Yeah, this is write-only for now. The real pain is on the way out. APIs,
logs, debugging, caching. That’s where I keep writing the same 10 lines
over and over. A decoder would be nice someday, but ISO-8601 strings are
ambiguous (is that a datetime or a date?) and UUIDs need context. Encoder
first, decoder can be a separate conversation. It solves the 80% case.
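The ambiguity mentioned here can be seen directly with the stdlib parsers. This is only an illustration; the sample strings are assumptions.

```python
import uuid
from datetime import date, datetime

text = "2024-01-01"

# The same string is a valid date and a valid (midnight) datetime.
print(repr(date.fromisoformat(text)))      # datetime.date(2024, 1, 1)
print(repr(datetime.fromisoformat(text)))  # datetime.datetime(2024, 1, 1, 0, 0)

# A UUID-shaped string parses fine, but nothing in the JSON says whether
# the producer meant a UUID or just an opaque string key.
maybe_uuid = "12345678-1234-5678-1234-567812345678"
print(repr(uuid.UUID(maybe_uuid)))
```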

@petersuter Django/Flask history
Good research. Django’s version has been stable for ~10 years with
basically no complaints: datetime, date, time, timedelta, Decimal, UUID,
all via str() and isoformat(). That’s the strongest argument I can think
of: if one of the biggest Python frameworks has shipped this exact code
for a decade, the design is proven.

Flask’s story is messier, I won’t pretend otherwise. But Django’s
longevity is the data point I’d lean on.

On set: Django doesn’t include it. I threw it in because set-to-JSON
comes up surprisingly often in API responses and configs. But I’m not
married to it. If the committee wants narrower scope for v1 (datetime,
UUID, Decimal only), that’s totally fine.

But without a decoder, how does it solve the 80% case? You can write the data out, but then you can’t do anything with it. The encodings you used aren’t standardised, so non-Python tools won’t recognise them, and without a decoder, nor will Python.

I’m pretty certain that far more than 20% of cases where someone writes out JSON data, it’s expected to be read by someone.


No need to pretend, because this just isn’t true. It really reads like you’re using an LLM to generate your responses here. Stop doing that.

Flask’s is here, the interface has changed but not the use case. We (and Django etc) dump some additional types, because otherwise we constantly get people who are confused why they can’t output common Python data in their API response. They don’t like being told that JSON doesn’t support that data and they need to decide what to do.

That said, if I was designing this now, I wouldn’t add a custom JSON serializer, it just complicates things. The answer should always be “use a data serialization library such as Marshmallow, Attrs/Cattrs, Pydantic, etc to produce data the exact way you expect your users to consume it.” Better to have users understand their data vs their API and the design of that, rather than relying on unstandardized decisions.


@pf_moore That’s a fair point on the decoder. When I said “80% case” I meant local developer ergonomics rather than system interchange. I’m thinking of those times you just need to dump a config file, log some structured data, or throw a debug dict into the terminal at 11 PM. In those cases, the consumer is just me or my own app, and I’m just trying to avoid writing the same boilerplate custom encoder for the hundredth time.
If the consensus is that a decoder is a hard prerequisite for this to make sense in the stdlib, I’m open to expanding the scope to cover both. But the friction I was trying to solve is really just that quick-and-dirty, local serialization step.
@davidism Apologies for the sloppiness on the Flask point, you’re totally right. I didn’t check the current source before replying and shouldn’t have thrown that out there without verifying first. That’s my bad.
On the bigger point about Pydantic or Cattrs, I agree they are the right move for serious apps. I just find they feel a bit heavy when you’re working on a tiny script or a quick tool and just want json.dumps to work out of the box with basic types.
But maybe that friction isn’t a big enough problem to justify changing the standard library, which is totally fair. I just wanted to pitch the use case and see what the consensus was.

The stdlib shouldn’t be designed around these use cases. And json.dump is about outputting data to be consumed by another system, it’s not a pretty printing library.

Sorry, but I don’t think this is a useful addition. Write your own decoder and put it in a utility library (or in your PYTHONSTARTUP file, if you prefer) if you want to make json.dump more useful for adhoc debugging.
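A personal helper along these lines might look like the sketch below, using the existing default= hook of json.dumps. The names debug_default and debug_dumps are hypothetical, and the choice of types simply mirrors the ones discussed in this thread.

```python
import json
from datetime import date, datetime, time
from decimal import Decimal
from uuid import UUID


def debug_default(obj):
    """Hypothetical fallback for ad hoc debugging output."""
    if isinstance(obj, (datetime, date, time)):
        return obj.isoformat()
    if isinstance(obj, (UUID, Decimal)):
        return str(obj)
    if isinstance(obj, (set, frozenset)):
        # repr() as the key avoids TypeError on mixed-type sets.
        return sorted(obj, key=repr)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def debug_dumps(obj, **kwargs):
    """json.dumps with the fallback above unless the caller overrides it."""
    kwargs.setdefault("default", debug_default)
    return json.dumps(obj, **kwargs)


print(debug_dumps({"when": date(2024, 1, 1), "tags": {"b", "a"}}))
```

Dropped into a utility module or a PYTHONSTARTUP file, this covers the quick debugging case without touching the json module itself.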

Of course it exists, and it’s fortit… utf-8. It’s the default.

I could agree with [Pure]Path, but not with [Pure]WindowsPath and [Pure]UnixPath. If you want your decoder, you can create and use it as always.

What about adding them by default to json? Note that from 3.15 you can import datetime etc lazily, so the performance will be the same.

About compatibility, I’m not sure. There could be code where you added a try-except, and instead of failing with these new types, it will now pass. And maybe this is not what you want.
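A sketch of the kind of code this compatibility concern is about. The helper name is_json_safe is hypothetical; the point is that its result depends on the default encoder raising TypeError for these types.

```python
import json
from datetime import datetime


def is_json_safe(value):
    """Hypothetical guard that relies on json.dumps raising TypeError."""
    try:
        json.dumps(value)
    except TypeError:
        return False
    return True


print(is_json_safe({"a": 1}))              # True
print(is_json_safe(datetime(2024, 1, 1)))  # False with today's default encoder
```

If the default encoder started accepting datetime, the second call would silently flip to True, which is the behavior change an opt-in class avoids.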

I don’t know what was done in the past. Was a new type ever added to json? If the answer is “No”, then ExtendedEncoder is the way.

It’s what we have now with any type supported by json: you can use it with them as it’s shipped or you pass a custom encoder/decoder.

json already has a lot of params:

Will yet another param for sorting sets be useful? Dunno. sort_keys already exists for dicts, for the same reasons Uwhuseba wants it. But how many people want to sort sets?
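For reference, this is how the existing sort_keys parameter behaves. It orders dict keys only and leaves list contents alone; the sample record is an assumption for illustration.

```python
import json

record = {"b": 1, "a": 2, "tags": ["z", "y"]}

print(json.dumps(record))                  # {"b": 1, "a": 2, "tags": ["z", "y"]}
print(json.dumps(record, sort_keys=True))  # {"a": 2, "b": 1, "tags": ["z", "y"]}
```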

I suppose that, if this proposal is ever accepted and there’s really a compelling need, a lot of people will ask for it later. Anyway, I’m -1 on sorting sets by default.

Of course it exists, and it’s fortit… utf-8. It’s the default.

JSON requires text. If your byte stream is UTF-8, then you should be using it as text. The question is what to do with bytes that AREN’T text.


Why? It depends on what you’re using. A socket, for example, returns bytes. A bytearray can be text that you want to change without speed and memory penalties.

bytes.decode() uses utf-8 by default, and that’s what pydantic does, as Loic shared. If you want to use something else, such as base64, you can use your own encoder.
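A short illustration of both sides of this exchange, with made-up sample bytes: UTF-8 text decodes cleanly, while arbitrary binary data does not and needs some other agreed encoding (base64 is used here only as one option).

```python
import base64
import json

text_bytes = "héllo".encode("utf-8")
binary = bytes([0xFF, 0xFE, 0x00])  # not valid UTF-8

# Bytes that really are UTF-8 text decode with the default codec.
as_text = json.dumps(text_bytes.decode())
print(as_text)

# Arbitrary bytes may not decode at all.
try:
    binary.decode()
except UnicodeDecodeError as exc:
    print("not UTF-8:", exc.reason)

# A lossless alternative that both sides would have to agree on.
encoded = base64.b64encode(binary).decode("ascii")
print(json.dumps(encoded))
```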

But then it wouldn’t be JSON, it would be, err PythON, or Pythonon, or PYON lol /s

I’m -1 on this because it seems like bloat with too many special cases, and creates a special version of JSON that isn’t portable between languages.

Previously I’ve used dataclasses and a 3rd party module dacite to instantiate them from JSON but the boilerplate is big and I went back to just using the raw Python dicts / lists. Also allowing a JSON file to self-declare its schema, ie what function should be applied to its data to transform it into eg a pathlib.Path seems like a big security risk.

The only thing I’ve ever semi-wanted from stdlib JSON module is the ability to read JSON files with comments. Otherwise I’m a JSON >> TOML > YAML person.

The JSON file itself should be a UTF-8 encoded text file, but within it we may wish to store arbitrary bytes, as you say, sub-encoded using a subset of UTF-8 code points (which for internationalization / simplicity usually means the lower 128 of UTF-8, aka ASCII).

I’ve used ASCII hex digits before, being lazy (information rate of 4/8):

>>> ''.join(f'{c:02X}' for c in b'Hi\0')
'486900'

but base64 also exists (information rate of 6/8) if you’ve agreed on what the extra two symbols are going to be (RFC 4648 or its websafe suggestion; YouTube urls use websafe -_, some people use the raw +/)
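The two base64 alphabets and the size difference against hex can be compared with the stdlib base64 module. The payload bytes below are chosen so that the last two symbols of the alphabet actually appear.

```python
import base64

payload = b"\xfb\xff"

print(payload.hex().upper())              # FBFF
print(base64.b64encode(payload))          # b'+/8='  (standard alphabet)
print(base64.urlsafe_b64encode(payload))  # b'-_8='  (URL-safe alphabet)

# Size comparison on the earlier example: 3 bytes in, 6 hex chars
# versus 4 base64 chars out.
sample = b"Hi\0"
print(len(sample.hex()), len(base64.b64encode(sample)))  # 6 4
```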