Why Build a Key-Value Store in C?
Key-value stores are at the heart of systems like Redis, LevelDB, and Amazon DynamoDB. If you want to truly understand database internals, there is no better exercise than building one yourself from scratch. In this tutorial, we will build a key-value store in C that supports insert, lookup, delete, and basic persistence to disk.
By the end of this post, you will have a working, minimal key-value store you can extend into something more ambitious. Everything here is practical and hands-on, with full code examples you can compile and run today.
What We Are Building
Our key-value store will have the following features:
- In-memory hash table for fast lookups
- Separate chaining for collision handling
- Dynamic memory allocation with proper cleanup
- Persistence to disk so data survives restarts
- A simple command-line interface for interacting with the store
Here is a quick overview of the architecture:
| Component | Purpose |
|---|---|
| Hash Function | Maps string keys to array indices |
| Hash Table (Bucket Array) | Core data structure holding pointers to entries |
| Linked List Nodes | Handle collisions via separate chaining |
| Persistence Layer | Save and load data to/from a flat file |
| CLI Interface | Accept GET, SET, DEL commands from stdin |
Step 1: Define the Data Structures
Every key-value store needs a way to represent entries. We will use a simple struct for each entry and a linked list for collision handling.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define TABLE_SIZE 1024
#define MAX_KEY_LEN 256
#define MAX_VAL_LEN 1024
#define DATA_FILE "kvstore.dat"
typedef struct Entry {
char *key;
char *value;
struct Entry *next; /* for separate chaining */
} Entry;
typedef struct {
Entry *buckets[TABLE_SIZE];
size_t count;
} KVStore;
Key design decisions:
TABLE_SIZEis set to 1024. You can increase this for better performance with large datasets.- Both key and value are dynamically allocated strings, giving us flexibility.
- Each bucket points to a linked list of entries (separate chaining).
Step 2: Implement the Hash Function
A good hash function distributes keys evenly across buckets. We will use the popular djb2 algorithm, which is simple, fast, and effective for string keys.
unsigned long hash_function(const char *key) {
unsigned long hash = 5381;
int c;
while ((c = *key++)) {
hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
}
return hash % TABLE_SIZE;
}
Why djb2?
There are many hash functions you could use. Here is how djb2 compares to some alternatives:
| Hash Function | Speed | Distribution Quality | Complexity |
|---|---|---|---|
| djb2 | Very fast | Good | Low |
| FNV-1a | Fast | Very good | Low |
| MurmurHash3 | Fast | Excellent | Medium |
| SHA-256 (overkill) | Slow | Excellent | High |
For a simple key-value store, djb2 strikes the right balance. If you later need better distribution, swapping in FNV-1a or MurmurHash3 is straightforward.
Step 3: Initialize and Free the Store
We need functions to create and destroy the store, properly managing memory.
KVStore *kv_create(void) {
KVStore *store = malloc(sizeof(KVStore));
if (!store) {
perror("Failed to allocate KVStore");
exit(EXIT_FAILURE);
}
memset(store->buckets, 0, sizeof(store->buckets));
store->count = 0;
return store;
}
void kv_destroy(KVStore *store) {
for (int i = 0; i < TABLE_SIZE; i++) {
Entry *entry = store->buckets[i];
while (entry) {
Entry *next = entry->next;
free(entry->key);
free(entry->value);
free(entry);
entry = next;
}
}
free(store);
}
Important: Always walk the entire chain in each bucket when freeing. Forgetting this is a classic memory leak in hash table implementations.
Step 4: Implement SET (Insert or Update)
The SET operation either inserts a new key-value pair or updates an existing one.
int kv_set(KVStore *store, const char *key, const char *value) {
unsigned long index = hash_function(key);
Entry *entry = store->buckets[index];
/* Check if key already exists */
while (entry) {
if (strcmp(entry->key, key) == 0) {
/* Update existing value */
char *new_value = strdup(value);
if (!new_value) return -1;
free(entry->value);
entry->value = new_value;
return 0;
}
entry = entry->next;
}
/* Insert new entry at head of chain */
Entry *new_entry = malloc(sizeof(Entry));
if (!new_entry) return -1;
new_entry->key = strdup(key);
new_entry->value = strdup(value);
if (!new_entry->key || !new_entry->value) {
free(new_entry->key);
free(new_entry->value);
free(new_entry);
return -1;
}
new_entry->next = store->buckets[index];
store->buckets[index] = new_entry;
store->count++;
return 0;
}
How Collision Handling Works Here
When two different keys hash to the same bucket index, we simply add the new entry at the head of that bucket’s linked list. This is called separate chaining. It is one of the most common collision resolution strategies and works well when the load factor (entries / table size) stays reasonable.
Other collision handling strategies you might explore later:
- Open addressing (linear probing) – store everything in the array itself, probing for the next open slot
- Robin Hood hashing – a variation of open addressing that reduces variance in probe lengths
- Cuckoo hashing – uses two hash functions and guarantees O(1) worst-case lookups
Step 5: Implement GET (Lookup)
The GET operation searches for a key and returns its value, or NULL if not found.
const char *kv_get(KVStore *store, const char *key) {
unsigned long index = hash_function(key);
Entry *entry = store->buckets[index];
while (entry) {
if (strcmp(entry->key, key) == 0) {
return entry->value;
}
entry = entry->next;
}
return NULL; /* key not found */
}
Average time complexity: O(1) when the hash table is not overloaded. Worst case is O(n) if all keys collide into the same bucket, but with a decent hash function and table size this almost never happens.
Step 6: Implement DEL (Delete)
Deletion requires careful pointer manipulation since we are working with a singly linked list.
int kv_del(KVStore *store, const char *key) {
unsigned long index = hash_function(key);
Entry *entry = store->buckets[index];
Entry *prev = NULL;
while (entry) {
if (strcmp(entry->key, key) == 0) {
if (prev) {
prev->next = entry->next;
} else {
store->buckets[index] = entry->next;
}
free(entry->key);
free(entry->value);
free(entry);
store->count--;
return 0; /* success */
}
prev = entry;
entry = entry->next;
}
return -1; /* key not found */
}
Step 7: Add Persistence to Disk
An in-memory store is fast but loses all data when the process exits. Let’s add simple file-based persistence. We will write all key-value pairs to a flat file and read them back on startup.
Saving to Disk
int kv_save(KVStore *store, const char *filename) {
FILE *fp = fopen(filename, "w");
if (!fp) {
perror("Failed to open file for writing");
return -1;
}
for (int i = 0; i < TABLE_SIZE; i++) {
Entry *entry = store->buckets[i];
while (entry) {
/* Format: key_length key value_length value */
fprintf(fp, "%zu %s %zu %s\n",
strlen(entry->key), entry->key,
strlen(entry->value), entry->value);
entry = entry->next;
}
}
fclose(fp);
return 0;
}
Loading from Disk
int kv_load(KVStore *store, const char *filename) {
FILE *fp = fopen(filename, "r");
if (!fp) {
/* File might not exist on first run, that is fine */
return 0;
}
size_t key_len, val_len;
char key_buf[MAX_KEY_LEN];
char val_buf[MAX_VAL_LEN];
while (fscanf(fp, "%zu %255s %zu %1023s",
&key_len, key_buf, &val_len, val_buf) == 4) {
kv_set(store, key_buf, val_buf);
}
fclose(fp);
return 0;
}
Limitations of this approach: This simple format does not handle keys or values that contain spaces or newlines. For a production store, you would want a binary format or proper escaping. But for learning purposes, this works perfectly.
Step 8: Build the Command-Line Interface
Now let’s tie everything together with a simple interactive loop.
int main(void) {
KVStore *store = kv_create();
kv_load(store, DATA_FILE);
printf("Key-Value Store ready. Commands: SET key value | GET key | DEL key | SAVE | QUIT\n");
char command[16];
char key[MAX_KEY_LEN];
char value[MAX_VAL_LEN];
while (1) {
printf("> ");
if (scanf("%15s", command) != 1) break;
if (strcmp(command, "SET") == 0) {
if (scanf("%255s %1023s", key, value) == 2) {
if (kv_set(store, key, value) == 0)
printf("OK\n");
else
printf("ERROR: could not set key\n");
}
} else if (strcmp(command, "GET") == 0) {
if (scanf("%255s", key) == 1) {
const char *result = kv_get(store, key);
if (result)
printf("%s\n", result);
else
printf("(nil)\n");
}
} else if (strcmp(command, "DEL") == 0) {
if (scanf("%255s", key) == 1) {
if (kv_del(store, key) == 0)
printf("DELETED\n");
else
printf("NOT FOUND\n");
}
} else if (strcmp(command, "SAVE") == 0) {
if (kv_save(store, DATA_FILE) == 0)
printf("SAVED\n");
else
printf("ERROR: save failed\n");
} else if (strcmp(command, "QUIT") == 0) {
kv_save(store, DATA_FILE);
break;
} else {
printf("Unknown command: %s\n", command);
}
}
kv_destroy(store);
return 0;
}
Compiling and Running
Save all the code in a single file called kvstore.c and compile it:
gcc -Wall -Wextra -O2 -o kvstore kvstore.c
./kvstore
You should see the prompt. Try these commands:
> SET name LinuxFuture
OK
> SET version 1.0
OK
> GET name
LinuxFuture
> DEL version
DELETED
> GET version
(nil)
> SAVE
SAVED
> QUIT
Performance Characteristics
| Operation | Average Case | Worst Case |
|---|---|---|
| SET | O(1) | O(n) |
| GET | O(1) | O(n) |
| DEL | O(1) | O(n) |
| SAVE | O(n) | O(n) |
| LOAD | O(n) | O(n) |
Where n is the total number of stored entries. The worst case for GET, SET, and DEL only occurs if every key hashes to the same bucket, which is extremely unlikely with djb2.
Ideas for Extending Your Key-Value Store
Once you have the basics working, here are some features worth adding to deepen your understanding:
- Dynamic resizing – Automatically grow the hash table when the load factor exceeds a threshold (e.g., 0.75). This keeps operations fast even with millions of entries.
- Binary file format – Replace the text-based persistence with a binary format that supports keys and values containing any characters, including spaces and newlines.
- Write-Ahead Log (WAL) – Log every write operation before applying it. This protects against data loss from crashes during a save.
- TTL (Time-To-Live) – Add expiration timestamps to entries, similar to Redis EXPIRE.
- Network interface – Use POSIX sockets to accept TCP connections and handle commands over the network, turning your store into a proper server.
- Thread safety – Add mutexes or read-write locks so multiple threads can access the store concurrently.
- SSTable/LSM-Tree persistence – Replace the flat file with sorted string tables for much better write and read performance at scale.
How This Compares to Real-World Key-Value Stores
Our implementation is intentionally minimal. Here is how it stacks up against production systems:
| Feature | Our Store | Redis | LevelDB |
|---|---|---|---|
| In-memory storage | Yes | Yes | No (disk-based) |
| Persistence | Basic flat file | RDB + AOF | LSM-Tree + SSTables |
| Concurrency | Single-threaded | Single-threaded (event loop) | Multi-threaded reads |
| Data types | String only | Strings, lists, sets, hashes, etc. | Byte arrays |
| Network support | No | Yes (TCP) | No (embedded library) |
The gap is large, but the core concepts are the same. Understanding how a hash table maps keys to values, how collisions are resolved, and how data gets written to disk gives you the foundation to read and understand the source code of Redis or any other key-value store.
Common Pitfalls When Building a Key-Value Store in C
Watch out for these mistakes that trip up many developers:
- Forgetting to free memory on update: When you overwrite a value with SET, you must
free()the old value string first. - Not handling
mallocfailures: Always check the return value ofmallocandstrdup. In low-memory situations, they return NULL. - Buffer overflows in
scanf: Always specify field width (e.g.,%255s) to prevent writing past the buffer. - Dangling pointers after delete: Make sure your delete function properly relinks the chain before freeing the entry.
- Not flushing or closing files: Always call
fclose()after writing. Data may sit in a buffer and never reach disk otherwise.
Full Source Code
You can find the complete, single-file source code by combining all the code blocks above into one kvstore.c file. The order should be:
- Includes and defines
- Struct definitions
- Hash function
kv_createandkv_destroykv_set,kv_get,kv_delkv_saveandkv_loadmain
If you prefer, you can also split these into separate .h and .c files for better project organization.
Frequently Asked Questions
What is a key-value store?
A key-value store is a type of database that stores data as pairs of keys and values. You use a unique key to store and retrieve a corresponding value. Think of it like a dictionary or a phone book: you look up a name (key) to find a phone number (value).
Why use C to build a key-value store instead of Python or Go?
C gives you direct control over memory allocation, pointer manipulation, and system calls. This is exactly what you need to understand how databases work at a low level. Most production key-value stores (Redis, LevelDB, RocksDB) are written in C or C++ for this reason.
Is a hash table the only way to implement a key-value store?
No. Other popular approaches include B-Trees (used by many on-disk databases), LSM-Trees (used by LevelDB and RocksDB), and skip lists. Each has different trade-offs for read vs. write performance and memory usage. A hash table is the simplest starting point for an in-memory store.
How do I handle keys and values with spaces?
The simple scanf-based parser in this tutorial does not support spaces in keys or values. To support them, you could read full lines with fgets and parse them manually, or switch to a binary protocol with length-prefixed strings.
Can I use this key-value store in a production application?
This implementation is designed for learning. It lacks features like thread safety, proper error recovery, efficient persistence, and network access. However, it is an excellent foundation to build upon. Adding each of those features is a great exercise in systems programming.
How many entries can this store handle?
With a table size of 1024, performance stays good up to roughly 700-800 entries (load factor ~0.75). Beyond that, chains get longer and lookups slow down. Adding dynamic resizing (doubling the table and rehashing when the load factor is exceeded) removes this limitation entirely.
