It’s Time to Stop Building KV Databases • Buttondown

2025-03-24 16:31:00
buttondown.com

March 24, 2025

I’m so sick of Key-Value. The most unopinionated, uninspired data model that is a pain to use for anything.

They’re very popular! Among database vendors. Or, specifically, storage engine vendors. They love to offer you a byte array to byte array map, and if they are feeling particularly generous, it might even be ordered so that you can do range scans.

The problem is that the only thing that a KV interface is good for is as a building block for a reasonable data model. This being the primitive means that everyone who uses a KV database has to do that from scratch. And they (read: me) usually do a middling job of it, because it’s not actually the problem they were setting out to solve. Yes: we could have a third-party library that is built to sit atop RocksDB. That is the natural “separation of concerns” answer. But look: are you going to use some guy’s Rust library with 300 dependencies in its Cargo.toml and an anime logo? Because I’m not. I want to use a singular database to the extent possible and I don’t want that database to foist the responsibility of more dependencies onto me.

“Data independence” is, to me, the core selling point of the relational model. Data independence comprises rules 8 and 9 of Codd’s twelve rules for relational databases. From Wikipedia:

Rule 8: Physical data independence:

Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representations or access methods.

Rule 9: Logical data independence:

Application programs and terminal activities remain logically unimpaired when information-preserving changes of any kind that theoretically permit unimpairment are made to the base tables.

This draws a distinction between the physical schema, which constitutes things like data encoding and index structure, and logical schema, which represents things like which columns are in a given table and what their names and types and constraints are. The platonic relational view of the world is that queries should always be written against the logical schema, and never against the physical.

The piece of a database that enables data independence is the query planner. The thing that takes a declarative statement of what the user would like to achieve and translates it into an imperative plan. It’s this translation layer that protects application code from knowing details about how data is stored.

I have heard repeatedly from many users of databases that they like KV over SQL because they don’t like to wade through a query planner in order to get their queries to work. This is absolutely justified! For a lot of use cases it’s simply not appropriate to declaratively specify your computation. You need to know that you are doing a constrained range scan, not that “if I write the SQL this way, the database is very likely to execute it as a constrained range scan.” My contention is that this is a false dilemma: we can achieve something better than “byte to byte array” without going all the way up to “relational utopia data independence.”

My preferred base data model would be records. Typed records that you might find in a relational database consisting of both a logical level describing “here are the columns I want access to” and a physical level describing “here are sets of columns I want to be able to query by.” That is, a decoupling of the logical schema and physical schema.

More simply:

KV: no distinction between logical and physical schema.
Relational Model: logical and physical schema are distinct, queries are written again the logical schema.
My proposed sweet spot for embedded databases: logical and physical schema are distinct, queries are written against the physical schema.

Let’s get more concrete. We might create a table in SQL like this:

-- Logical schema
CREATE TABLE data (
  ts TIMESTAMP NOT NULL,
  id INT NOT NULL,
  user_id INT NOT NULL,
  reason TEXT NOT NULL
);

-- Physical schema
CREATE UNIQUE INDEX ON data primary_idx (ts, id);

This is fine, and normal, and maps cleanly onto a KV database like RocksDB: you just make the key some kind of encoding of (id, ts) which preserves lexicographic comparisons (which is easy for fixed-size types, and a little tricky for variable-sized types), and then you make the value some concatenation of user_id and reason.

The concern with actually using a SQL database that can arise if you have other indexes:

CREATE INDEX ON data secondary_user_id_idx (user_id, ts);
CREATE INDEX ON data secondary_reason_idx (reason, ts);

And I write a SQL query like:

SELECT * FROM data WHERE ts > 100 AND ts  200 AND user_id = 4 AND reason = 'expired';

There are two natural plans for this query:

Scan the index secondary_user_id_idx restricting user_id and ts, having a residual filter on the reason column or
scan the index secondary_reason_idx restricting reason and ts, having a residual filter on the user_id column.

Picking between these comes down to knowing if reason_id or user_id are more selective. We want to pick the option that will force us to process fewer rows with our residual filter on top. This is the job of the query planner.

The objection to query planners in general comes down to not wanting this “smart” layer in between the authoring of a query and its execution. There is a risk here that the query planner will make the wrong choice and we won’t have an easy way to predict or correct that. For a very limited use case, where we know for the most part, ahead of time what our access patterns will be, and have a small set of queries, this risk (and headache) might not be worth the benefits of data independence. This is why people sometimes argue in favour of KV databases, it means this kind of thing doesn’t come up.

We don’t have to go all the way to that, though. We can have a lot of the nice benefits of records and avoid these headaches by simply…not having a query planner. If you don’t mind writing query plans explicitly (and writing code to process KV data is strictly easier than writing query plans), then this is completely fine. You just make your query language require you to specify what index you want to read and in what order you want operations performed.

You don’t get the benefits of a real query planner in that you don’t get to dynamically respond to data distribution changes, and you don’t get to specialize queries based on their parameters, and you need to at once understand the physical and logical schema of your database, but these are all things that are still true of a KV database.

If your database understands your schema, here are some of the things it is now able to do for you:

Physical schema changes such as secondary indexing, including asynchronous secondary index builds.
Logical schema changes, like adding or deleting new columns.
Transparent swapping between row- and column-oriented layouts without requiring code changes. Perhaps even co-existing within the same database.
Save you from implementing the ad hoc, informally-specified, bug-ridden, slow implementation of column encoding you were probably going to write anyway.

Of course, there are good reasons to provide the most unopinionated interface possible. If you want to provide, say, range scans, well, now you are shipping a type system and semantics on your data. Every decision you have to make is an opportunity to make the wrong one.

Perhaps the biggest decision that has to be made is “how deeply do your records permeate your data model,” meaning, do you have types that you write to disk and then know how to compare? Or do you have types that you know how to convert into byteslices that will then compare lexicographically. Both options come with tradeoffs. I’m not aware of a perfect solution here.

Despite that, I think there is a place for embedded databases that has a more opinionated view on how data should be stored. SQLite is sort of this! SQLite doesn’t get floated as a RocksDB alternative all that often but it actually is that, in a lot of ways. I think it is perhaps hampered by the fact that it is SQL and so has a lot of bad connotations for the kind of workloads people would want to use RocksDB for, which is why I think it’s time for something new.

To summarize, I would like an embedded database with the following features:

A reasonable type system and language semantics that resemble SQL only closely enough for people to port their understanding of the relational model over.
Bounded memory use and linear-time queries. Any operations that this can’t be guaranteed for are not allowed.
Asynchronous schema change support. Both logical (adding columns) and physical (adding or removing secondary indexes).
No query planner, plans are executed as they are written. I could soften on this one for very reliable transformations like removing redundant predicates, which are extremely important if you do any kind of programmatic query construction, but it should be basically possible to tell how the query will run by looking at it.
The ability to switch between row- and column-based layouts.

Things like this have been tried, but I think their time has come again. No more KV.

Source Link

Keep your files stored safely and securely with the SanDisk 2TB Extreme Portable SSD. With over 69,505 ratings and an impressive 4.6 out of 5 stars, this product has been purchased over 8K+ times in the past month. At only $129.99, this Amazon’s Choice product is a must-have for secure file storage.

Help keep private content private with the included password protection featuring 256-bit AES hardware encryption. Order now for just $129.99 on Amazon!

Unlock unlimited streaming with a free Amazon Prime trial!
Sign up today!

Help Power Techcratic’s Future – Scan To Support

If Techcratic’s content and insights have helped you, consider giving back by supporting the platform with crypto. Every contribution makes a difference, whether it’s for high-quality content, server maintenance, or future updates. Techcratic is constantly evolving, and your support helps drive that progress.

As a solo operator who wears all the hats, creating content, managing the tech, and running the site, your support allows me to stay focused on delivering valuable resources. Your support keeps everything running smoothly and enables me to continue creating the content you love. I’m deeply grateful for your support, it truly means the world to me! Thank you!

BITCOIN

Bitcoin Logo