NoSQL Engine

A from-scratch NoSQL key-value storage engine written in Go, implementing an LSM-tree architecture inspired by RocksDB and Cassandra. Built as a learning project toward a distributed database system.

Language: Go 1.23.2
Module: nosqlEngine
Interface: interactive CLI (go run ./cmd) + programmatic API

Architecture

Write ──► WAL (group commit) ──► Memtable ──► Immutable queue ──► Flusher ──► SSTable (lvl0)
                                                                              │
                                                                    Compactor (size-tiered)
                                                                              │
                                                                        SSTable (lvl1..N)

Read ──► Active memtable ──► Immutable queue ──► SSTable levels (bloom → index → data)

Write path

Writes are serialized through a write queue (single writer goroutine) — eliminates all write-path locking.
The WAL writer buffers multiple writes per call, then issues one fsync covering the entire batch (group commit). Async writes (WriteAsync) skip the fsync wait entirely.
Each write goes into the active memtable (hash map, skip-list, or B-tree, selectable via config).
When the memtable reaches MEMTABLE_SIZE bytes it is sealed into an immutable memtable carrying its max LSN and pushed to the immutable queue.
The flusher goroutine drains the immutable queue, writing sorted SSTables to lvl0. Flush and compaction run concurrently.
When a level accumulates COMPACTION_THRESHOLD SSTables, the compactor goroutine k-way-merges them into the next level (min-heap, O(N log K)).

Read path

Active memtable (O(1) for hash map, O(log n) for skip-list/B-tree).
Immutable queue, newest-first.
SSTables: level 0 newest-first, levels 1+ in order. Per table: Bloom filter → in-memory dense index → block-aligned data read via LRU block cache.

SSTable format

Single-file layout, all fields big-endian:

[Data blocks][Index block][Filter block][Footer: 56 bytes]

Footer: [indexOffset:8][indexSize:8][filterOffset:8][filterSize:8]
        [itemCount:8][maxLSN:8][magic:8]

The Bloom filter and dense index are loaded into memory when an SSTable is first opened (SSTableReader). TableCache (LRU, configurable capacity) keeps the most recently accessed readers in memory; evicted readers release their file descriptors back to the FD pool in BlockManager.

Concurrency model

Component	Mechanism
Write serialization	Single writer goroutine + buffered channel
Memtable reads	`sync.RWMutex` on active mem pointer; inner impl lock-free where possible
Immutable queue	`sync.Mutex` + `sync.Cond` for backpressure
SSTable version registry	`sync.RWMutex`; readers hold RLock for all SSTable I/O
Compaction	Snapshots version list under brief RLock, then works without any lock
Rate limiting	Per-user token buckets

Distribution layer readiness

Three APIs are implemented to support the first replica joining:

WAL streaming (`WALCursor`)

cursor, err := engine.wal.CursorFrom(afterLSN)
for {
    entry, err := cursor.Next()
    if err == io.EOF {
        <-cursor.Notify() // blocks until durableLSN advances
        continue
    }
    // ship entry to follower
}
cursor.Close()

Next() only returns entries that have been fsync'd. Notify() is a buffered channel signalled on every WAL flush — no polling needed.

Snapshot export

snap, err := engine.TakeSnapshot()
// snap.LSN    — WAL position at snapshot time
// snap.SSTables — []SnapshotFile{Level, Path, Size}
// transfer files, then:
snap.Release() // unpins files; deferred compaction deletions run

TakeSnapshot drains the active memtable, waits for all pending flushes, then captures a consistent {LSN, file list} pair. Files are reference-counted — compaction cannot delete a pinned file; deletion is deferred until Release().

Safe compaction LSN gate

engine.SetSafeCompactionLSN(minReplicaLSN)

Each SSTable footer stores the maxLSN of its entries. The compactor gates tombstone dropping on:

dropTombstones = isLastLevel &&
    (safeCompactionLSN == 0 || (batchMaxLSN > 0 && batchMaxLSN <= safeCompactionLSN))

safeCompactionLSN == 0 (default) preserves standalone behaviour — tombstones drop freely.
In distributed mode, the distribution layer bumps this as replicas confirm progress, preventing data resurrection on lagging replicas.

Benchmarks

Machine: AMD Ryzen 7 7730U, WSL2 (Linux), go test -bench=. -benchtime=5s.
All writes are sync-durability (fsync) unless noted. Value size: 64 bytes.

Benchmark	ops/sec	Notes
`PutSequential`	697	Single goroutine, sync, waits flush per op
`PutThroughput`	764	Single goroutine, sync, no flush wait per op
`PutThroughputParallel`	5,937	16 goroutines, sync, group commit batching
`PutParallel`	5,915	Same as above, different key space
`PutBurst`	3,327	64-goroutine bursts, group commit
`PutAsyncThroughput`	2,765	Single goroutine, no fsync wait
`PutAsyncParallel`	3,466	16 goroutines, no fsync wait
`GetMemtableHit`	410,429	Active memtable (hash map)
`GetSSTable`	441,070	SSTable read via block cache (warm)
`DeleteSequential`	875	Write + tombstone, sync

Key observations:

Sequential sync puts are WAL-bound — every op pays one fsync (~1.4 ms). The single-writer design means no write amplification from lock contention.
Parallel sync puts show ~8.5x speedup over sequential. Group commit amortizes the fsync cost across concurrent writers: 16 goroutines sharing one fsync per batch.
SSTable reads match memtable reads because the block cache keeps hot blocks in memory; the Bloom filter + in-memory dense index avoids most disk seeks entirely.
Async writes offer a middle ground — WAL-buffered but not fsync'd; useful for bulk ingest where a subsequent sync write or graceful shutdown provides the durability boundary.

Build & run

go mod tidy
go build -o bin/nosql-engine ./cmd
./bin/nosql-engine

# or without building:
go run ./cmd

Tests

go test ./...                                    # all packages
go test ./src/tests/integration/ -v              # integration tests
go test ./src/tests/benchmark/ -bench=. -benchtime=5s  # benchmarks

Note: tests write real files under data/. Run from the repo root.
Note: config.json is embedded at compile time — config changes require a rebuild.

Configuration (`src/config/config.json`)

Key	Default	Description
`BLOCK_SIZE`	4096	Disk I/O block size in bytes
`MEMTABLE_SIZE`	1000	Flush threshold in bytes (sum of key+value lengths)
`MEMTABLE_TYPE`	`hashmap`	`hashmap`, `skiplist`, or `btree`
`MAX_IMMUTABLE_COUNT`	4	Immutable queue depth before writes block
`WAL_SEGMENT_SIZE`	65536	Max bytes per WAL segment file
`WAL_BUFFER_SIZE`	8	WAL write buffer slots for group commit
`LSM_LEVELS`	4	Number of LSM levels
`COMPACTION_THRESHOLD`	4	SSTables per level before compaction
`CACHE_CAPACITY`	512	Block cache LRU capacity (number of blocks)
`TABLE_CACHE_SIZE`	64	SSTableReader LRU capacity
`BLOOM_FILTER_FALSE_POSITIVE_RATE`	0.01	Target Bloom filter FPR

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 210 Commits
.cursor/rules		.cursor/rules
benchmarks		benchmarks
cmd		cmd
proto/cluster/v1		proto/cluster/v1
scripts		scripts
src		src
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CLI_USAGE.md		CLI_USAGE.md
Makefile		Makefile
README.md		README.md
buf.gen.yaml		buf.gen.yaml
go.mod		go.mod
go.sum		go.sum

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

NoSQL Engine

Architecture

Write path

Read path

SSTable format

Concurrency model

Distribution layer readiness

WAL streaming (`WALCursor`)

Snapshot export

Safe compaction LSN gate

Benchmarks

Build & run

Tests

Configuration (`src/config/config.json`)

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

NoSQL Engine

Architecture

Write path

Read path

SSTable format

Concurrency model

Distribution layer readiness

WAL streaming (WALCursor)

Snapshot export

Safe compaction LSN gate

Benchmarks

Build & run

Tests

Configuration (src/config/config.json)

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

WAL streaming (`WALCursor`)

Configuration (`src/config/config.json`)

Packages