Dbm module: add vacuuming

Hi all,
first-time poster here, happy to join the community :slight_smile: .
A couple of days ago I found out that dbm.dumb, which was the default backend of shelve until Python 3.13, does not free disk space when a key is deleted.
Last week I used shelve and ended up consuming hundreds of GB in insertions/deletions :joy:
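
For anyone who wants to see the behaviour, here is a minimal sketch that should reproduce it. The database name demo and the helper churn are made up for illustration, and it relies on dbm.dumb storing its values in a file with a .dat suffix.

```python
import dbm.dumb
import os
import tempfile


def churn(path, rounds=100, value_size=1000):
    """Insert and immediately delete `rounds` keys, then report what is left."""
    with dbm.dumb.open(path, "c") as db:
        for i in range(rounds):
            key = f"key{i}".encode()
            db[key] = b"x" * value_size
            del db[key]  # removes the index entry only, not the stored bytes
        remaining = len(db)
    # dbm.dumb keeps the values in "<path>.dat"
    return remaining, os.path.getsize(path + ".dat")


with tempfile.TemporaryDirectory() as tmp:
    remaining, data_size = churn(os.path.join(tmp, "demo"))

print(f"keys left: {remaining}, data file size: {data_size} bytes")
```

The database ends up with no keys, yet the data file keeps every value that was ever written.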

Do you think contributing code that vacuums the DB in-place would be useful?
I am thinking:

  • A .vacuum() method to be added to all submodules of dbm and to the shelve.Shelf class. shelve.Shelf.vacuum would simply call the .vacuum method of the underlying dbm object
  • Vacuuming on dbm.sqlite3: runs a “VACUUM” query
  • Vacuuming on dbm.dumb: a new method I can contribute, which copies the used parts of the binary file in place and updates the index. The advantage is that this won’t use more disk space while vacuuming, but if the program is interrupted during the vacuum, the DB will be corrupted (note: this is already the case for many dbm.dumb operations)
  • Would do whatever is appropriate for the other submodules, or nothing if not needed (haven’t looked into them yet)
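
To make the dbm.dumb idea concrete, here is a rough sketch of the in-place copy on a toy layout, not the actual patch. It assumes an index that maps each key to an (offset, size) pair in a single data file; compact_in_place and that layout are hypothetical, and the sketch ignores the block alignment and on-disk index file that dbm.dumb really uses.

```python
import os
import tempfile


def compact_in_place(data_path, index):
    """Slide live records toward the start of the file and truncate the rest.

    `index` maps key -> (offset, size). Returns the updated index.
    If the process dies midway, the file and the old index no longer agree.
    """
    new_index = {}
    write_pos = 0
    with open(data_path, "rb+") as f:
        # Visit records in file order: each one moves to an offset that is
        # never past its current one, so unmoved records are not overwritten.
        for key, (offset, size) in sorted(index.items(), key=lambda kv: kv[1][0]):
            if offset != write_pos:
                f.seek(offset)
                chunk = f.read(size)
                f.seek(write_pos)
                f.write(chunk)
            new_index[key] = (write_pos, size)
            write_pos += size
        f.truncate(write_pos)  # drop the now-unused tail
    return new_index


# Toy data file: three live records separated by dead space.
with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "toy.dat")
    with open(data_path, "wb") as f:
        f.write(b"AAAA" + b"dead" + b"BB" + b"junkjunk" + b"CCC")
    old_index = {"a": (0, 4), "b": (8, 2), "c": (18, 3)}
    new_index = compact_in_place(data_path, old_index)
    with open(data_path, "rb") as f:
        compacted = f.read()

print(new_index, compacted)
```

No second copy of the file is needed, which is the point of doing it in place. The price is the interruption risk mentioned above.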

Additionally, I would like to update the documentation to highlight the disadvantages of dbm.dumb other than slowness. For now they appear only as comments in the source code, so developers reading the docs never see them.

Looking forward to hearing your thoughts!

As a note: although Python 3.13 uses sqlite3 as the default backend, which helps because one can import sqlite3 and vacuum when necessary, this backend is not available on WASI, which will therefore still fall back to dbm.dumb!
I can only guess this is the reason dbm.dumb was not deprecated in Python 3.13
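
A minimal sketch of what manual vacuuming through sqlite3 could look like. It uses a throwaway database with a made-up kv table rather than a real dbm.sqlite3 file, whose schema I am not assuming here; for a real file one would connect to its path and run the same VACUUM statement.

```python
import os
import sqlite3
import tempfile


def vacuum_demo(rows=200, value_size=4096):
    """Fill a table, empty it, and compare the file size before and after VACUUM."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sqlite3")
        # isolation_level=None means autocommit, so VACUUM never runs
        # inside an open transaction (SQLite refuses that).
        conn = sqlite3.connect(path, isolation_level=None)
        try:
            conn.execute("CREATE TABLE kv (key BLOB PRIMARY KEY, value BLOB)")
            conn.execute("BEGIN")
            for i in range(rows):
                conn.execute(
                    "INSERT INTO kv VALUES (?, ?)",
                    (str(i).encode(), b"x" * value_size),
                )
            conn.execute("COMMIT")
            conn.execute("DELETE FROM kv")  # pages become free, file keeps its size
            size_before = os.path.getsize(path)
            conn.execute("VACUUM")  # rebuilds the file without the free pages
            size_after = os.path.getsize(path)
        finally:
            conn.close()
    return size_before, size_after


size_before, size_after = vacuum_demo()
print(f"before VACUUM: {size_before} bytes, after: {size_after} bytes")
```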

These sound like sensible ideas to me. Feel free to open issues on CPython with your ideas and start working on PRs. The biggest problem is probably going to be finding a core dev interested in working on dbm to merge your changes, but hopefully somebody will come along.

I’ll be happy to review such a change. Feel free to ping me.

Good morning, just a heads-up that I have submitted my pull request, called gh-134004: Dbm vacuuming #134028. Feel free to review it whenever it suits you, and thank you in advance :slightly_smiling_face: