How to Build a Key-Value Store from Scratch in C: A Step-by-Step Tutorial

by Samantha Maresca | May 7, 2026 | Uncategorized | 0 comments

Why Build a Key-Value Store in C?

Key-value stores are at the heart of systems like Redis, LevelDB, and Amazon DynamoDB. If you want to truly understand database internals, there is no better exercise than building one yourself from scratch. In this tutorial, we will build a key-value store in C that supports insert, lookup, delete, and basic persistence to disk.

By the end of this post, you will have a working, minimal key-value store you can extend into something more ambitious. Everything here is practical and hands-on, with full code examples you can compile and run today.

What We Are Building

Our key-value store will have the following features:

In-memory hash table for fast lookups
Separate chaining for collision handling
Dynamic memory allocation with proper cleanup
Persistence to disk so data survives restarts
A simple command-line interface for interacting with the store

Here is a quick overview of the architecture:

Component	Purpose
Hash Function	Maps string keys to array indices
Hash Table (Bucket Array)	Core data structure holding pointers to entries
Linked List Nodes	Handle collisions via separate chaining
Persistence Layer	Save and load data to/from a flat file
CLI Interface	Accept GET, SET, DEL commands from stdin

Step 1: Define the Data Structures

Every key-value store needs a way to represent entries. We will use a simple struct for each entry and a linked list for collision handling.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 1024
#define MAX_KEY_LEN 256
#define MAX_VAL_LEN 1024
#define DATA_FILE "kvstore.dat"

typedef struct Entry {
    char *key;
    char *value;
    struct Entry *next;  /* for separate chaining */
} Entry;

typedef struct {
    Entry *buckets[TABLE_SIZE];
    size_t count;
} KVStore;

Key design decisions:

TABLE_SIZE is set to 1024. You can increase this for better performance with large datasets.
Both key and value are dynamically allocated strings, giving us flexibility.
Each bucket points to a linked list of entries (separate chaining).

Step 2: Implement the Hash Function

A good hash function distributes keys evenly across buckets. We will use the popular djb2 algorithm, which is simple, fast, and effective for string keys.

unsigned long hash_function(const char *key) {
    unsigned long hash = 5381;
    int c;
    while ((c = *key++)) {
        hash = ((hash << 5) + hash) + c;  /* hash * 33 + c */
    }
    return hash % TABLE_SIZE;
}

Why djb2?

There are many hash functions you could use. Here is how djb2 compares to some alternatives:

Hash Function	Speed	Distribution Quality	Complexity
djb2	Very fast	Good	Low
FNV-1a	Fast	Very good	Low
MurmurHash3	Fast	Excellent	Medium
SHA-256 (overkill)	Slow	Excellent	High

For a simple key-value store, djb2 strikes the right balance. If you later need better distribution, swapping in FNV-1a or MurmurHash3 is straightforward.

Step 3: Initialize and Free the Store

We need functions to create and destroy the store, properly managing memory.

KVStore *kv_create(void) {
    KVStore *store = malloc(sizeof(KVStore));
    if (!store) {
        perror("Failed to allocate KVStore");
        exit(EXIT_FAILURE);
    }
    memset(store->buckets, 0, sizeof(store->buckets));
    store->count = 0;
    return store;
}

void kv_destroy(KVStore *store) {
    for (int i = 0; i < TABLE_SIZE; i++) {
        Entry *entry = store->buckets[i];
        while (entry) {
            Entry *next = entry->next;
            free(entry->key);
            free(entry->value);
            free(entry);
            entry = next;
        }
    }
    free(store);
}

Important: Always walk the entire chain in each bucket when freeing. Forgetting this is a classic memory leak in hash table implementations.

Step 4: Implement SET (Insert or Update)

The SET operation either inserts a new key-value pair or updates an existing one.

int kv_set(KVStore *store, const char *key, const char *value) {
    unsigned long index = hash_function(key);
    Entry *entry = store->buckets[index];

    /* Check if key already exists */
    while (entry) {
        if (strcmp(entry->key, key) == 0) {
            /* Update existing value */
            char *new_value = strdup(value);
            if (!new_value) return -1;
            free(entry->value);
            entry->value = new_value;
            return 0;
        }
        entry = entry->next;
    }

    /* Insert new entry at head of chain */
    Entry *new_entry = malloc(sizeof(Entry));
    if (!new_entry) return -1;

    new_entry->key = strdup(key);
    new_entry->value = strdup(value);
    if (!new_entry->key || !new_entry->value) {
        free(new_entry->key);
        free(new_entry->value);
        free(new_entry);
        return -1;
    }

    new_entry->next = store->buckets[index];
    store->buckets[index] = new_entry;
    store->count++;
    return 0;
}

How Collision Handling Works Here

When two different keys hash to the same bucket index, we simply add the new entry at the head of that bucket’s linked list. This is called separate chaining. It is one of the most common collision resolution strategies and works well when the load factor (entries / table size) stays reasonable.

Other collision handling strategies you might explore later:

Open addressing (linear probing) – store everything in the array itself, probing for the next open slot
Robin Hood hashing – a variation of open addressing that reduces variance in probe lengths
Cuckoo hashing – uses two hash functions and guarantees O(1) worst-case lookups

Step 5: Implement GET (Lookup)

The GET operation searches for a key and returns its value, or NULL if not found.

const char *kv_get(KVStore *store, const char *key) {
    unsigned long index = hash_function(key);
    Entry *entry = store->buckets[index];

    while (entry) {
        if (strcmp(entry->key, key) == 0) {
            return entry->value;
        }
        entry = entry->next;
    }
    return NULL;  /* key not found */
}

Average time complexity: O(1) when the hash table is not overloaded. Worst case is O(n) if all keys collide into the same bucket, but with a decent hash function and table size this almost never happens.

Step 6: Implement DEL (Delete)

Deletion requires careful pointer manipulation since we are working with a singly linked list.

int kv_del(KVStore *store, const char *key) {
    unsigned long index = hash_function(key);
    Entry *entry = store->buckets[index];
    Entry *prev = NULL;

    while (entry) {
        if (strcmp(entry->key, key) == 0) {
            if (prev) {
                prev->next = entry->next;
            } else {
                store->buckets[index] = entry->next;
            }
            free(entry->key);
            free(entry->value);
            free(entry);
            store->count--;
            return 0;  /* success */
        }
        prev = entry;
        entry = entry->next;
    }
    return -1;  /* key not found */
}

Step 7: Add Persistence to Disk

An in-memory store is fast but loses all data when the process exits. Let’s add simple file-based persistence. We will write all key-value pairs to a flat file and read them back on startup.

Saving to Disk

int kv_save(KVStore *store, const char *filename) {
    FILE *fp = fopen(filename, "w");
    if (!fp) {
        perror("Failed to open file for writing");
        return -1;
    }

    for (int i = 0; i < TABLE_SIZE; i++) {
        Entry *entry = store->buckets[i];
        while (entry) {
            /* Format: key_length key value_length value */
            fprintf(fp, "%zu %s %zu %s\n",
                    strlen(entry->key), entry->key,
                    strlen(entry->value), entry->value);
            entry = entry->next;
        }
    }

    fclose(fp);
    return 0;
}

Loading from Disk

int kv_load(KVStore *store, const char *filename) {
    FILE *fp = fopen(filename, "r");
    if (!fp) {
        /* File might not exist on first run, that is fine */
        return 0;
    }

    size_t key_len, val_len;
    char key_buf[MAX_KEY_LEN];
    char val_buf[MAX_VAL_LEN];

    while (fscanf(fp, "%zu %255s %zu %1023s",
                  &key_len, key_buf, &val_len, val_buf) == 4) {
        kv_set(store, key_buf, val_buf);
    }

    fclose(fp);
    return 0;
}

Limitations of this approach: This simple format does not handle keys or values that contain spaces or newlines. For a production store, you would want a binary format or proper escaping. But for learning purposes, this works perfectly.

Step 8: Build the Command-Line Interface

Now let’s tie everything together with a simple interactive loop.

int main(void) {
    KVStore *store = kv_create();
    kv_load(store, DATA_FILE);

    printf("Key-Value Store ready. Commands: SET key value | GET key | DEL key | SAVE | QUIT\n");

    char command[16];
    char key[MAX_KEY_LEN];
    char value[MAX_VAL_LEN];

    while (1) {
        printf("> ");
        if (scanf("%15s", command) != 1) break;

        if (strcmp(command, "SET") == 0) {
            if (scanf("%255s %1023s", key, value) == 2) {
                if (kv_set(store, key, value) == 0)
                    printf("OK\n");
                else
                    printf("ERROR: could not set key\n");
            }
        } else if (strcmp(command, "GET") == 0) {
            if (scanf("%255s", key) == 1) {
                const char *result = kv_get(store, key);
                if (result)
                    printf("%s\n", result);
                else
                    printf("(nil)\n");
            }
        } else if (strcmp(command, "DEL") == 0) {
            if (scanf("%255s", key) == 1) {
                if (kv_del(store, key) == 0)
                    printf("DELETED\n");
                else
                    printf("NOT FOUND\n");
            }
        } else if (strcmp(command, "SAVE") == 0) {
            if (kv_save(store, DATA_FILE) == 0)
                printf("SAVED\n");
            else
                printf("ERROR: save failed\n");
        } else if (strcmp(command, "QUIT") == 0) {
            kv_save(store, DATA_FILE);
            break;
        } else {
            printf("Unknown command: %s\n", command);
        }
    }

    kv_destroy(store);
    return 0;
}

Compiling and Running

Save all the code in a single file called kvstore.c and compile it:

gcc -Wall -Wextra -O2 -o kvstore kvstore.c
./kvstore

You should see the prompt. Try these commands:

> SET name LinuxFuture
OK
> SET version 1.0
OK
> GET name
LinuxFuture
> DEL version
DELETED
> GET version
(nil)
> SAVE
SAVED
> QUIT

Performance Characteristics

Operation	Average Case	Worst Case
SET	O(1)	O(n)
GET	O(1)	O(n)
DEL	O(1)	O(n)
SAVE	O(n)	O(n)
LOAD	O(n)	O(n)

Where n is the total number of stored entries. The worst case for GET, SET, and DEL only occurs if every key hashes to the same bucket, which is extremely unlikely with djb2.

Ideas for Extending Your Key-Value Store

Once you have the basics working, here are some features worth adding to deepen your understanding:

Dynamic resizing – Automatically grow the hash table when the load factor exceeds a threshold (e.g., 0.75). This keeps operations fast even with millions of entries.
Binary file format – Replace the text-based persistence with a binary format that supports keys and values containing any characters, including spaces and newlines.
Write-Ahead Log (WAL) – Log every write operation before applying it. This protects against data loss from crashes during a save.
TTL (Time-To-Live) – Add expiration timestamps to entries, similar to Redis EXPIRE.
Network interface – Use POSIX sockets to accept TCP connections and handle commands over the network, turning your store into a proper server.
Thread safety – Add mutexes or read-write locks so multiple threads can access the store concurrently.
SSTable/LSM-Tree persistence – Replace the flat file with sorted string tables for much better write and read performance at scale.

How This Compares to Real-World Key-Value Stores

Our implementation is intentionally minimal. Here is how it stacks up against production systems:

Feature	Our Store	Redis	LevelDB
In-memory storage	Yes	Yes	No (disk-based)
Persistence	Basic flat file	RDB + AOF	LSM-Tree + SSTables
Concurrency	Single-threaded	Single-threaded (event loop)	Multi-threaded reads
Data types	String only	Strings, lists, sets, hashes, etc.	Byte arrays
Network support	No	Yes (TCP)	No (embedded library)

The gap is large, but the core concepts are the same. Understanding how a hash table maps keys to values, how collisions are resolved, and how data gets written to disk gives you the foundation to read and understand the source code of Redis or any other key-value store.

Common Pitfalls When Building a Key-Value Store in C

Watch out for these mistakes that trip up many developers:

Forgetting to free memory on update: When you overwrite a value with SET, you must free() the old value string first.
Not handling malloc failures: Always check the return value of malloc and strdup. In low-memory situations, they return NULL.
Buffer overflows in scanf: Always specify field width (e.g., %255s) to prevent writing past the buffer.
Dangling pointers after delete: Make sure your delete function properly relinks the chain before freeing the entry.
Not flushing or closing files: Always call fclose() after writing. Data may sit in a buffer and never reach disk otherwise.

Full Source Code

You can find the complete, single-file source code by combining all the code blocks above into one kvstore.c file. The order should be:

Includes and defines
Struct definitions
Hash function
kv_create and kv_destroy
kv_set, kv_get, kv_del
kv_save and kv_load
main

If you prefer, you can also split these into separate .h and .c files for better project organization.

Frequently Asked Questions

What is a key-value store?

A key-value store is a type of database that stores data as pairs of keys and values. You use a unique key to store and retrieve a corresponding value. Think of it like a dictionary or a phone book: you look up a name (key) to find a phone number (value).

Why use C to build a key-value store instead of Python or Go?

C gives you direct control over memory allocation, pointer manipulation, and system calls. This is exactly what you need to understand how databases work at a low level. Most production key-value stores (Redis, LevelDB, RocksDB) are written in C or C++ for this reason.

Is a hash table the only way to implement a key-value store?

No. Other popular approaches include B-Trees (used by many on-disk databases), LSM-Trees (used by LevelDB and RocksDB), and skip lists. Each has different trade-offs for read vs. write performance and memory usage. A hash table is the simplest starting point for an in-memory store.

How do I handle keys and values with spaces?

The simple scanf-based parser in this tutorial does not support spaces in keys or values. To support them, you could read full lines with fgets and parse them manually, or switch to a binary protocol with length-prefixed strings.

Can I use this key-value store in a production application?

This implementation is designed for learning. It lacks features like thread safety, proper error recovery, efficient persistence, and network access. However, it is an excellent foundation to build upon. Adding each of those features is a great exercise in systems programming.

How many entries can this store handle?

With a table size of 1024, performance stays good up to roughly 700-800 entries (load factor ~0.75). Beyond that, chains get longer and lookups slow down. Adding dynamic resizing (doubling the table and rehashing when the load factor is exceeded) removes this limitation entirely.

SUBSCRIBE TO OUR NEWSLETTERS

How to Build a Key-Value Store from Scratch in C: A Step-by-Step Tutorial

How to Build a Key-Value Store from Scratch in C: A Step-by-Step Tutorial

Why Build a Key-Value Store in C?

What We Are Building

Step 1: Define the Data Structures

Step 2: Implement the Hash Function

Why djb2?

Step 3: Initialize and Free the Store

Step 4: Implement SET (Insert or Update)

How Collision Handling Works Here

Step 5: Implement GET (Lookup)

Step 6: Implement DEL (Delete)

Step 7: Add Persistence to Disk

Saving to Disk

Loading from Disk

Step 8: Build the Command-Line Interface

Compiling and Running

Performance Characteristics

Ideas for Extending Your Key-Value Store

How This Compares to Real-World Key-Value Stores

Common Pitfalls When Building a Key-Value Store in C

Full Source Code

Frequently Asked Questions

What is a key-value store?

Why use C to build a key-value store instead of Python or Go?

Is a hash table the only way to implement a key-value store?

How do I handle keys and values with spaces?

Can I use this key-value store in a production application?

How many entries can this store handle?

Recent Posts

SUBSCRIBE TO OUR NEWSLETTERS