New default/preferred dbm backend

Sixteen years ago, in bpo-3783, @smontanaro suggested adding an SQLite backend for the stdlib dbm module. Some activity happened on the issue, but it stalled for various reasons. A little over a year ago, @rhettinger rebooted the discussion in gh-100414.

I created a PR for adding the dbm.sqlite3 backend, based on Raymond’s year-old patch.

Additionally, I want to make dbm.sqlite3 the preferred backend:

  • sqlite3 is available on pretty much all platforms, which means:
    • using the backend-agnostic dbm.open("test.db", "c"), you can create a database on Linux, move the database over to your macOS or Windows box, and continue your database hacking there
    • shelve is available on all platforms, including Windows, with a backend that uses a standard format (an SQLite database)
  • dbm.sqlite3 is easy to introspect; we can use the stdlib sqlite3 CLI to examine the database contents
  • dbm.sqlite3 is faster than dbm.dumb
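
For readers who have not used the backend-agnostic entry points, here is a minimal sketch of what the bullets above refer to. It only uses dbm.open(), dbm.whichdb() and shelve.open() as they exist today; the temporary directory and the key names are made up for the example, and the backend that ends up being used depends on the Python build.

```python
import dbm
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.db")

    # "c" opens the database for reading and writing, creating it if needed.
    # Which backend is used depends on what this Python build provides.
    with dbm.open(path, "c") as db:
        db["greeting"] = "hello"      # str keys and values are stored as bytes

    # Ask dbm which backend it thinks created the database.
    backend = dbm.whichdb(path)

    with dbm.open(path, "r") as db:
        greeting = db["greeting"]     # values come back as bytes

    # shelve layers pickle on top of whatever dbm.open() selects.
    shelf_path = os.path.join(tmp, "shelf")
    with shelve.open(shelf_path) as shelf:
        shelf["numbers"] = [1, 2, 3]
    with shelve.open(shelf_path) as shelf:
        numbers = shelf["numbers"]

print(backend, greeting, numbers)
```

Nothing in the calling code names a backend, which is why the choice of default decides whether the resulting files can be moved between platforms.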

What do you think? I created a poll for committers only[1]. I’ll quote Raymond[2]:

Right now we support ndbm and gnu.dbm which might or might not be part of a given build. The fallback is the super slow dumbdbm. Now that sqlite3 is part of the standard build, we can do better.

The module docstring says:

Future versions may change the order in which implementations are tested for existence, and add interfaces to other dbm-like implementations.

The future is now. Let’s provide a fast, stable, robust, always available alternative.

Should dbm.sqlite3 be introduced as the default dbm backend in 3.13?
  • Yes, dbm.sqlite3 should be the default; the future is now :rocket:
  • No, the current default (dbm.gnu) should be used

  1. sorry, triagers, there is no dedicated category for you :frowning: ↩︎

  2. Add sqlite3 as another possible backing store for the dbm module · Issue #100414 · python/cpython · GitHub ↩︎

Why is there no option for C?

Yes, dbm.sqlite3 should be the default in the next version. The future is the next version :slight_smile:

The value of the dbm.sqlite3 backend is greatly diminished if it is not made the default backend. Good user experience needs a good default, and a good default has the traits that @rhettinger mentioned[1].


  1. fast, stable, robust, always available ↩︎

I voted to switch to sqlite3 since I like it. It’s fast, reliable, portable, and shipped with Python, whereas other backends are only available on some platforms.

I have some questions.

What do you mean by pretty much? As far as I know, it’s always available on all platforms, no? The sqlite3 documentation says nothing about “Availability” (only about specific features).

The fallback is the super slow dumbdbm. Now that sqlite3 is part of the standard build, we can do better.

I’m curious: is there a benchmark showing how slow “dumbdbm” is, and how fast sqlite3 is? :slight_smile:

The future is now. Let’s provide a fast, stable, robust, always available alternative.

Has dumbdbm known issues of losing data, that sqlite3 doesn’t have?

I heard that the “dbm” module has small limits for the maximum key and value size. What are these limits for the different backends, especially dumb and sqlite3?

I was unsure about WebAssembly platforms :slight_smile:

Brett likes to ask “what is Python?”, “is Python with the REPL still Python?”, etc. :wink: Many stdlib modules are not available in the WASI build. I think that it would be reasonable that if sqlite3 is not available, it’s not the default dbm backend :slight_smile: (I don’t know if it’s available or not.)

SQLite is not a specialized key-value database, and we do not know how it will behave in different circumstances compared with specialized solutions. We assume that it is better than dbm.dumb, so placing it just before dbm.dumb will benefit Windows users, for whom alternatives are not available. As time passes and we have more data on hand, we can reconsider the priority.

Meanwhile, I would add the possibility to change the priority of dbm implementations for dbm.open() and shelve.open() and/or to specify the implementation for a new database. Currently this is only possible by hacking internal variables (and it is easy to do incorrectly).
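
As a point of comparison, here is a small sketch of what can already be done through public APIs: the priority order cannot be changed, but a new database can be created with a specific backend by calling that backend's own open(). dbm.dumb is used only because it is available on every build; the path and key are made up for the example.

```python
import dbm
import dbm.dumb
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "settings")

    # Bypass the priority list by calling the backend module directly.
    with dbm.dumb.open(path, "c") as db:
        db[b"theme"] = b"dark"

    # dbm.dumb keeps its data in sibling files next to the given path.
    files = sorted(os.listdir(tmp))

    # The generic entry point still detects and reopens the database.
    detected = dbm.whichdb(path)
    with dbm.open(path, "r") as db:
        theme = db[b"theme"]

print(files, detected, theme)
```

This does not help shelve.open(), which always goes through dbm.open(), so the request above still stands.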

I think I agree with Serhiy; it’s better to get some real-world experience with the new sqlite backend instead of immediately making it the default. So let’s first add dbm.sqlite3 in 3.13, and if it works well in practice, make it the default in 3.14 or 3.15.

As long as it goes before dbm.dumb in the priority order (so that dbm.open(filename, flag="n") creates an SQLite database on Windows, where gdbm and ndbm are unavailable), I can live with that. However, I think the benefit of having the default be a backend that is available on all platforms (so that dbm and shelve files are cross-platform by default) shouldn’t be ignored, so I still prefer the “SQLite as the default” option.

I think we need a little more detail on the implications, especially around performance and semantics. Does SQLite force a fsync on every write? Are concurrent accesses safe? What about other backends?

I actually did not expect the Spanish inquisition! :slight_smile: Thanks for the discussion so far! I’m no longer sure about my own recommendation in the OP, which IMO is a sign that the discussion is good; thanks again.

Since @encukou didn’t say it yet, I can say it myself: perhaps this change[1] needs a PEP.


  1. both the introduction of a new (sub)module, and the change of the default dbm backend. ↩︎

No, it does not.

Yes, it is.

Gut take: Nobody who uses the lowly old dbm module cares about any of that. They already chose not to use a real database. Especially if they chose to let it pick an implementation.

I suggest what others have already suggested: Just put dbm.sqlite3 before dbm.dumb because it’ll always be better than that. Update the docs to pre-announce that the default will change in release .n+2, and we can reorder the preference list at that time.

I see two questions:

  • Should sqlite be the default eventually?

    IMO, yes. It’s available everywhere[1], consistent, and has fewer limitations than common dbm implementations.
    If someone is relying on the characteristics of a particular dbm implementation, they should use that particular implementation, e.g. import dbm.gnu.

    That said, it might be nice to implement a backend’s “extra” API (like gdbm’s firstkey/nextkey, reorganize, sync, and “fast mode”), before moving past that backend in the priority list.

  • Should sqlite be the default now?

    I can see two reasons to delay:

    • Incompatibility with older versions of Python: databases created on 3.13 wouldn’t be readable on 3.12. If we delay changing the default for, say, 2 years, db’s created on 3.15 won’t be readable on 3.12 – which might make it easier to upgrade.
    • discovering real-world issues. Unfortunately, I doubt we’ll see much real-world usage until sqlite is the default.

  1. if you build CPython without SQLite, let’s assume you know what you’re doing ↩︎

That is not correct; dbm.open(path) calls dbm.whichdb(path) to determine which backend to use. You’d still be able to open a database created with dbm.gnu, even if the default changes. The default is only used when you’re creating a new database (flag="c" or flag="n").
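
A minimal sketch of that behaviour, using only dbm.open(), dbm.whichdb() and dbm.dumb.open(): the database is created with one specific backend and then reopened through the generic entry point with flag="c". The path and keys are made up for the example.

```python
import dbm
import dbm.dumb
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "existing")

    # Nothing on disk yet: whichdb() reports None, which is the case where
    # "c" or "n" would fall back to the default backend.
    before = dbm.whichdb(path)

    # Create the database with one specific backend.
    with dbm.dumb.open(path, "c") as db:
        db[b"k"] = b"v"

    # Reopening through dbm.open() with "c" keeps the detected backend,
    # whatever this build's default happens to be.
    with dbm.open(path, "c") as db:
        db[b"k2"] = b"v2"
        keys = sorted(db.keys())

    after = dbm.whichdb(path)

print(before, after, keys)
```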

The current scenario is:

  • Windows: only dbm.dumb is available; you can create a database using dbm.open(path, "n") and transfer the database file to any other platform and open it there
  • macOS (python.org installer): dbm.dumb and dbm.ndbm are available; you can create a database using dbm.open(path, "n"); ndbm will take precedence over dumbdbm; you can transfer the database file to another macOS system or possibly a Linux system (depending on which backends are available)
  • Linux: dbm.dumb is available, possibly dbm.ndbm, and likely dbm.gnu (gdbm); you can create a database using dbm.open(path, "n"); gdbm will likely be chosen, so you cannot easily transfer the database to another platform.

If dbm.sqlite3 is made available, but not made the default, this will be the user experience:

  • Windows: create a database using dbm.open(path, "n"); the sqlite3 backend is selected; you can transfer your database to any other system => improved user experience
  • macOS: create a database using dbm.open(path, "n"); the ndbm backend is selected; no change in user experience
  • Linux: create a database using dbm.open(path, "n"); either gdbm or ndbm is selected; no change in user experience

If dbm.sqlite3 is made available and made the default, this will be the user experience:

  • Any platform: create a database using dbm.open(path, "n"); transfer the database file to any other system and continue hacking on it there => improved user experience on all platforms

Regarding performance: as Greg says, if people are looking for a fine-tuned database, they’ll probably be using a real database that they can tune for whatever requirements their application has.

(So I’m still +1 for making sqlite3 the default backend. Let’s provide a consistent user experience.)

To clarify: yes, I’m talking about new databases created on 3.13. If they are SQLite by default, older whichdb won’t be able to detect them.
That is, the issue is with sharing a database between apps that use different Python versions.
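
To make the detection problem concrete, here is a rough sketch that writes a small SQLite file with the sqlite3 module and asks dbm.whichdb() about it. The table layout is invented for the example and is not meant to match the schema of the proposed backend. My expectation is that a Python that ships dbm.sqlite3 reports "dbm.sqlite3", while an older whichdb reports an empty string (format not recognised), which in turn makes dbm.open() raise an error.

```python
import contextlib
import dbm
import os
import sqlite3
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "new.db")

    # Write a small SQLite file. The table is a placeholder for this sketch,
    # not the schema used by any dbm backend.
    with contextlib.closing(sqlite3.connect(path)) as conn:
        conn.execute("CREATE TABLE kv (key BLOB PRIMARY KEY, value BLOB)")
        conn.commit()

    # The answer depends on whether this Python's whichdb() knows the format.
    detected = dbm.whichdb(path)

print(sys.version_info[:2], repr(detected))
```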

That is a valid point and a good reason not to make it default right now.

Would someone actually use dbm for that?

Good question. But if we don’t know the answer, IMO we need to default to yes.

I’ll admit I’ve never considered using dbm for anything before (I always assumed it wouldn’t work at all on Windows based on seeing the tests skip), but if I did it’d probably be as a quick and easy settings or cache file.

For an app like, say, Jupyter, that file is very likely to end up being shared between versions. But even more likely is in-house apps using it for that purpose. So I think on probability, it’s more likely to be “yes” than no.

The other question is what the behaviour would be when encountering a dbm file that can’t be opened. Presumably a user-provided file would just result in an error, but I’d expect a private app settings/cache file to be overwritten, assuming that the error means no file currently exists. Is that acceptable? I suspect not, with no warning, so I lean towards Greg’s proposal (put it above dumb for two releases, then put it at the top).
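
A sketch of how an app could avoid the silent overwrite, assuming it is acceptable to fail loudly instead. open_cache() is a hypothetical helper, not a stdlib API; it relies on the documented dbm.whichdb() return values (None when nothing readable exists at the path, an empty string when the format is not recognised).

```python
import dbm
import os
import tempfile


def open_cache(path):
    """Open a cache database, creating it only if nothing is there.

    Hypothetical helper for this sketch, not a stdlib API.
    """
    kind = dbm.whichdb(path)
    if kind is None:
        # No database at this path: safe to create one.
        return dbm.open(path, "c")
    if kind == "":
        # Something exists, but this Python does not recognise its format.
        raise RuntimeError(f"refusing to overwrite unrecognised file: {path}")
    # Recognised format; dbm.open() raises dbm.error if the backend is missing.
    return dbm.open(path, "w")


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "cache")
    with open_cache(good) as db:      # first call creates the database
        db[b"answer"] = b"42"
    with open_cache(good) as db:      # second call reopens it
        answer = db[b"answer"]

    # A stand-in for a file written in a format this Python cannot detect.
    bad = os.path.join(tmp, "foreign")
    with open(bad, "wb") as f:
        f.write(b"not a dbm database, just some bytes")
    try:
        open_cache(bad)
        refused = False
    except RuntimeError:
        refused = True

print(answer, refused)
```

Apps that instead treat any open failure as "no file yet" and recreate with flag="n" are the ones that would lose data without warning.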
