Large, low-maintenance database recommendation?

I’m looking for a database library (probably a key-value store, or something like it).

I need it to scale to multiple terabytes, and I need it to be sturdy (no unrecoverable databases).

I’d like it to be:

  1. low maintenance
  2. fast
  3. single file per table?
  4. concurrent readers and writers
  5. no daemon to fire up?
  6. able to store null bytes in keys and values (nice to have)

It doesn’t necessarily have to shard to multiple servers. In fact, I’d probably avoid a database library that required that.

I’m almost happy with anydbm+gdbm, but it’s a bit slow for large tables, and I don’t think it supports concurrent access. And I’m almost happy with dbhash, but I’m not sure what maintenance is like for it, and I don’t think it’s concurrent either - and more importantly, it seems to be moribund.
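For concreteness, this is the anydbm-style pattern I mean (in Python 3 the old anydbm module became `dbm`, and `dbm.open` picks gdbm, ndbm, or the dumb fallback depending on what’s installed). Keys and values are bytes, so embedded null bytes round-trip fine; what you don’t get is safe concurrent writers:

```python
# Minimal sketch of the anydbm/gdbm pattern via Python 3's dbm module.
import dbm

with dbm.open("example.db", "c") as db:          # "c" = create if missing
    db[b"key\x00with\x00nulls"] = b"value\x00too"  # null bytes are fine
    print(db[b"key\x00with\x00nulls"])
# A second process opening the same file for writing is where this
# pattern falls down - there's no broker coordinating access.
```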

I know that shelve and Samba’s trivial db (tdb, not the same as text db) exist, but I have no experience with them.

The general trend seems to be away from single-table key-value stores of yore, and toward things that require a server process. Why?

Any suggestions?


Multiple terabytes? I’d stick that into PostgreSQL. You’d like it to be:

  1. low maintenance - that’s a relative term, but a properly-deployed PGSQL server shouldn’t require too much hassle
  2. fast - highly respected in the industry
  3. single file per table? - not exactly; Postgres manages its own storage layout (a table becomes one or more segment files in the data directory, split at 1 GB), but you never touch those files directly
  4. concurrent readers and writers - no problems here
  5. no daemon to fire up? - ehh, this one it fails on
  6. able to store null bytes in keys and values - check (bytea columns take arbitrary bytes)

So it does lose out on point 5, but you’re looking at multiple terabytes, so I’m hoping a Postgres back end is worth that cost. (And for the record, on my Linux system, I can easily forget that I have Postgres installed - it starts automatically and doesn’t consume significant resources when not in use.) Having an external daemon is the easiest way to manage concurrent readers and writers, since it means that there’s a dedicated broker that keeps track of everything.
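The key-value-over-SQL pattern suggested above is just one table with binary key and value columns, with the daemon brokering concurrent access. A minimal sketch, shown against the stdlib sqlite3 driver purely so the snippet is self-contained and runnable; against Postgres (e.g. via psycopg2) the columns would be BYTEA instead of BLOB, the placeholders %s instead of ?, and the upsert spelled with ON CONFLICT:

```python
# Key-value store as a single SQL table with binary key/value columns.
import sqlite3

conn = sqlite3.connect("kv.db")
conn.execute("CREATE TABLE IF NOT EXISTS kv (k BLOB PRIMARY KEY, v BLOB)")
conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)",
             (b"key\x00null", b"value\x00null"))   # null bytes are fine
conn.commit()
row = conn.execute("SELECT v FROM kv WHERE k = ?", (b"key\x00null",)).fetchone()
print(row[0])
conn.close()
```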


This: A Guide to Partitioning Data In PostgreSQL | Severalnines
…sadly says:

> The maximum table size allowed in a PostgreSQL database is 32TB, however unless it’s running on a not-yet-invented computer from the future, performance issues may arise on a table with only a hundredth of that space.


I haven’t tried it at the scale you’re talking about, but I’ve had good luck with DiskCache: Disk Backed Cache — DiskCache 5.6.1 documentation.

Typically, databases run in their own process so that work can be offloaded from the application using them, and because a dedicated process can synchronize concurrent requests and accept connections over sockets from other hosts.