Expose SQLite's `sqlite3_recover` API in `sqlite3`?

SQLite 3.40.0 (November 2022) introduced a public recovery API for recovering data from corrupted databases: sqlite3_recover_init(), sqlite3_recover_step(), and sqlite3_recover_finish().

The sqlite3 standard library module already wraps somewhat comparable C APIs, like Connection.backup() wraps sqlite3_backup_* or Connection.blobopen() wraps sqlite3_blob_*, although maybe none of these are as involved as the recover extension.

For recovery, users are currently required to shell out to the sqlite3 CLI binary, which is not always available. A GitHub serach for "subprocess.run" "sqlite3" ".recover" demonstrates that Python users often use subprocess to access this functionality and often have dedicated exception handling for when sqlite3 binary is absent.

Having the recovery functionality accessible from Python bindings would make downstream code more robust. Currently in Jupyter we often just drop the databases and recreate on error. Being able to recover some of the content would make it much more user friendly.

The challenge here is twofold:

  • the recovery extension is not bundled in the default sqlite3.c amalgamation, although a dedicated expanded sqlite3r.c amalgamation target exists since SQLite 3.41.0
  • I do not know of other bindings that would ship this extension (although both Perl and Rust binding libraries were keen to explore it, the timelines did not align with when the feature was made available by SQLite)

What could be a path forward here? Is it still too early for Python to consider adding this extension?

Relevant SQLite versions:

  • 3.29.0 (2019-10-04) added .recover CLI option
  • 3.40.0 (2022-11-16) added the recovery extension
  • 3.41.0 (2023-02-21) added the sqlite3r.c amalgamation that includes the recovery extension.

Relevant discussions outside of Python:

I was encouraged to post here, coming from Expose SQLite's `sqlite3_recover` API in `sqlite3`? · Issue #149735 · python/cpython · GitHub .

1 Like

This does look like it should be provided in a third-party tool, rather than the standard library. I 100% agree with Simon Slavin’s 2022 post you linked.
You say you drop & recreate the DB; that’ll always be way more robust than the recovery API.

3 Likes

Do you mean as in the success rate will be higher? I plan to keep the drop & recreate logic as a fallback for when recovery does not work. Also, the databases involved are really trivial (1 table).

No, I mean you can’t trust the data in a recovered database. Look at the documented limitations of the recovery API:

  • Some content might be permanently deleted and unrecoverable.
  • Previously deleted content might reappear.
  • Recovered content might be altered.
  • Constraints may not be valid after recovery. (This does not apply to you, but is relevant if you’re asking to add stdlib API.)
  • Content might be moved from one table into another. (Same.)

Essentially, after recovery you need to check the result, and correct it somehow. That seems impossible to automate.

3 Likes