Hi all,
first-time poster here, happy to join the community.
A couple of days ago I found out that dbm.dumb, which until Python 3.13 was the backend shelve fell back to when no other dbm implementation was available (e.g. on Windows), does not free disk space when a key is deleted.
Last week I used shelve for a workload with many insertions and deletions, and the database ended up consuming hundreds of GB on disk.
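For anyone who wants to reproduce this, here is a minimal sketch (the helper name `dat_size_after_churn` and the sizes are just made up for illustration). It repeatedly inserts and deletes a key in a fresh dbm.dumb database and then looks at the size of the `.dat` file:

```python
import dbm.dumb
import os
import tempfile


def dat_size_after_churn(rounds=50, value_size=10_000):
    """Insert and delete `rounds` keys, return (live key count, .dat size)."""
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.join(tmp, "demo")
        with dbm.dumb.open(base, "c") as db:
            for i in range(rounds):
                key = f"key{i}".encode()
                db[key] = b"x" * value_size
                del db[key]  # removes the index entry, not the bytes on disk
            live_keys = len(db)
        # dbm.dumb stores the values in "<base>.dat"
        return live_keys, os.path.getsize(base + ".dat")


live_keys, size = dat_size_after_churn()
print(f"{live_keys} live keys, .dat file is {size} bytes")
```

The database ends up with no keys at all, yet the data file still holds every value that was ever written.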
Do you think contributing code that vacuums the DB in-place would be useful?
I am thinking:
- Add a .vacuum() method to the database objects of all dbm submodules and to the shelve.Shelf class. shelve.Shelf.vacuum() would simply call the .vacuum() method of the underlying dbm object
- Vacuuming on dbm.sqlite3: run a “VACUUM” statement
- Vacuuming on dbm.dumb: a new method I can contribute, which copies the used parts of the binary file in place and updates the index. The advantage is that this won’t use extra disk space while vacuuming, but if the program is interrupted during the vacuum, the DB will be corrupted (note: this is already the case for many dbm.dumb operations)
- For the other submodules: do whatever is appropriate, or nothing if it is not needed (I haven’t looked into them yet)
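To make the dbm.dumb idea more concrete, here is a rough sketch of what I have in mind. It is not a finished implementation. `vacuum_dumb` and `VacuumShelf` are hypothetical names, and the code pokes at private CPython details of dbm.dumb (`_index`, `_datfile`, `_modified`) that a real patch would handle inside the module. I am also assuming the 512-byte block alignment that dbm.dumb uses internally has to be preserved so that in-place overwrites stay safe.

```python
import dbm.dumb
import os
import shelve
import tempfile

BLOCKSIZE = 512  # assumption: mirrors dbm.dumb's private _BLOCKSIZE


def vacuum_dumb(db):
    """Hypothetical in-place compaction of an open dbm.dumb database."""
    write_pos = 0  # next block-aligned offset a live record is moved to
    end = 0        # end of the last live record, used to truncate the file
    with open(db._datfile, "rb+") as f:
        # Visit live records in file order so each one only moves toward
        # the start and never overwrites a record not yet processed.
        records = sorted(db._index.items(), key=lambda kv: kv[1][0])
        for key, (pos, siz) in records:
            if pos != write_pos:
                f.seek(pos)
                data = f.read(siz)
                f.seek(write_pos)
                f.write(data)
                db._index[key] = (write_pos, siz)
            end = write_pos + siz
            write_pos = -(-end // BLOCKSIZE) * BLOCKSIZE  # round up to a block
        f.truncate(end)
    db._modified = True  # private flag, so that sync() rewrites the index
    db.sync()


class VacuumShelf(shelve.Shelf):
    """Hypothetical Shelf that delegates vacuum() to the underlying dbm."""

    def vacuum(self):
        self.sync()  # flush any writeback cache first
        vacuum_dumb(self.dict)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.join(tmp, "demo")
        with VacuumShelf(dbm.dumb.open(base, "c")) as shelf:
            for i in range(100):
                shelf[f"key{i}"] = "x" * 1000
            for i in range(90):
                del shelf[f"key{i}"]
            before = os.path.getsize(base + ".dat")
            shelf.vacuum()
            after = os.path.getsize(base + ".dat")
        # Reopen to check that the surviving entries are still readable.
        with shelve.Shelf(dbm.dumb.open(base, "r")) as shelf:
            survivors = dict(shelf)
    return before, after, survivors


before, after, survivors = demo()
print(f".dat size before: {before}, after: {after}, keys left: {len(survivors)}")
```

As mentioned above, a crash between moving a record and rewriting the index would leave the database inconsistent, which is the trade-off for not needing any extra disk space.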
Additionally, I would like to update the documentation to highlight the disadvantages of dbm.dumb other than slowness. For now they are only mentioned in comments in the source code, so developers reading the docs never see them.
Looking forward to hearing your thoughts!