
Is WAL Redundant for an Append-Only Time-Series Database (TSDB)?



Is WAL (Write-Ahead Log) Redundant for an Append-Only Time-Series Database (TSDB)?

I’m building a time-series database (TSDB) in C++ for learning purposes, where all changes to record sets are strictly append-only.

My Reasoning for WAL Being Unnecessary:

  1. The data file is sequential and only modified via appends, making random disk operations unnecessary.
  2. Transactions are appended sequentially, similar to a WAL.
  3. If a crash occurs after acknowledging success but before flushing, data loss happens regardless—WAL would just duplicate the data file (with minor structural differences), adding storage and overhead without unique benefits.
  4. In general databases, WAL solves random writes and ensures durability (ACID ‘D’) by fsync’ing the log before acknowledging. But here, the data file itself behaves like a log.

Questions:

  • What are the flaws in this reasoning?
  • Why might a WAL still be necessary for durability, crash recovery, or other reasons in an append-only TSDB?
  • Are there specific scenarios (e.g., power failure, OS crashes) where WAL provides value even in this setup?

No, a WAL (Write-Ahead Log) isn’t redundant even for a strictly append-only time-series database (TSDB) like the one you’re building in C++. Your data file might handle sequential appends just fine, but it lacks the explicit fsync-before-acknowledge durability that a WAL provides—crucial for surviving power failures or OS crashes where the kernel’s page cache swallows your writes without hitting disk. A WAL also enables clean crash recovery via replay and simplifies replication, without forcing random I/O or duplicating your entire dataset.


The Append-Only TSDB Myth: Why WAL Isn’t Redundant

You’re spot on that append-only designs shine for TSDBs: no updates, no deletes, just timestamps piling up sequentially. It feels elegant: why bother with a write-ahead log when your data file is the log? But here’s the catch. That reasoning overlooks how modern OSes and hardware betray your appends. Linux’s page cache (and its Windows equivalent) buffers writes in RAM for speed; a successful write() only means the data reached that cache, and nothing is guaranteed to be on disk until an explicit fsync() or fdatasync().

Without a WAL, if you ack a write to your client before fsync’ing the data file, a power cut while that data still sits in the kernel’s buffers loses it for good. A WAL flips this: log the operation (tiny, sequential), fsync it first, ack the client, then append to the main file asynchronously. Crash? Replay the WAL to rebuild. No data-file fsync lag killing your write throughput.

Think about it—your “similar to WAL” data file still needs fsync discipline. But layering a WAL separates concerns: log for durability, data for queries. Tools like QuestDB append incoming data to a WAL first for exactly this reason, surviving failures without loss.
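
A minimal C++ sketch of that contrast, using raw POSIX calls (the function names here are illustrative, not from any library): the first version acks on write() alone, the second only after the WAL entry is on disk.

cpp
#include <unistd.h>   // write(), fdatasync()

// Unsafe: the caller acks after write() alone, so the point may still live only
// in the kernel page cache. A power cut here loses it silently.
bool append_unsafe(int data_fd, const char* buf, size_t len) {
    return write(data_fd, buf, len) == (ssize_t)len;
}

// WAL-first: the entry is forced to disk before the caller is allowed to ack.
bool append_wal_first(int wal_fd, const char* buf, size_t len) {
    if (write(wal_fd, buf, len) != (ssize_t)len) return false;
    return fdatasync(wal_fd) == 0;   // only now is it safe to acknowledge the client
}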


What Exactly Is a WAL?

At its core, a WAL is an append-only file logging every change before it hits the main store. “Write-ahead” means log first, apply later. From Wikipedia: it’s a disk-resident structure for atomicity and durability in databases.

In practice? For your TSDB, a WAL entry might be: {timestamp: 2026-01-17T10:00, metric: cpu_load, value: 0.75}. Append to WAL, fsync(), ack client. A background thread then applies it to your append-only data file. SQLite’s WAL mode works the same way: changes append to the -wal file until it reaches the checkpoint threshold (1000 pages by default, roughly 4 MB), then a checkpoint copies them into the main database and the WAL is recycled.

Why sequential? WALs avoid random I/O by design. No redundancy myth here—it’s a lightweight operation log, not a data mirror.
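
To make the entry above concrete, here is one possible length-prefixed encoding for it; the layout and the encode_entry name are purely illustrative, not a standard format:

cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Illustrative WAL record: [u32 payload_len][u64 ts][f64 value][metric bytes].
// Real WALs usually add a checksum so a torn (half-written) record is detectable.
std::vector<char> encode_entry(uint64_t ts, const std::string& metric, double value) {
    const uint32_t payload_len =
        static_cast<uint32_t>(sizeof(ts) + sizeof(value) + metric.size());
    std::vector<char> buf(sizeof(payload_len) + payload_len);
    char* p = buf.data();
    std::memcpy(p, &payload_len, sizeof(payload_len)); p += sizeof(payload_len);
    std::memcpy(p, &ts, sizeof(ts));                   p += sizeof(ts);
    std::memcpy(p, &value, sizeof(value));             p += sizeof(value);
    std::memcpy(p, metric.data(), metric.size());      // variable-length tail
    return buf;
}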


Flaws in the “Data File = Log” Reasoning

Let’s poke holes in your points one by one.

  1. Sequential appends avoid random I/O? True, but irrelevant—WAL is also pure appends. Zero random ops.

  2. Transactions append like WAL? Kinda. But does your data file fsync before ack? If not, same fragility. WAL enforces it contractually.

  3. Crash after ack but pre-flush loses data anyway? Yes, unless WAL fsync’d first. Client sees “committed” only after disk durability. Your data file alone risks “ghost writes” in kernel cache.

  4. Data file behaves like a log? Surface-level yes. But logs are fsync’d religiously; data files often aren’t, for perf. WAL decouples: cheap fsyncs on tiny logs vs. chunky data blocks.

A nakabonne.dev TSDB writeup nails it: the WAL is optional if you tolerate loss, but vital for replay after a crash. Your C++ write() to the data file? Buffered until fsync. No fsync? Poof.

And replication? WAL ships changes to followers easily. Data file replay? Nightmare.


Failure Modes That Break Append-Only Durability

Append-only sounds bulletproof. Reality? Hardware and OS conspire against you.

  • Power failure post-ack, pre-fsync: Kernel cache holds your append. Reboot: gone. WAL fsync’d? Replays safely. Architecture Weekly stresses: WAL flushed first ensures recovery.

  • OS crash/kernel panic: Page cache vanishes. Same loss. WAL on disk survives.

  • Device write caching: SSDs/NVMe report completion while data still sits in their volatile cache. O_DIRECT bypasses the page cache, but fsync/fdatasync ordering is still needed to flush the drive’s own cache. Partial appends corrupt your sequential file.

  • Teardown races: the process is killed mid-write (SIGKILL, unhandled SIGTERM). Anything still sitting in user-space buffers such as stdio or std::ofstream never reaches the kernel at all.

Even PostgreSQL’s WAL introduction highlights replay fixing inconsistencies after a crash. For TSDBs, where writes flood in, these failure modes hit hard: even a few seconds of lost points ripples into every downstream query and aggregation.

What if your TSDB ingests 1M points/sec? fsync’ing the data file on every batch tanks latency. WAL fsyncs are tiny and sequential, and with group commit (one fsync covering every entry queued since the last one, as sketched below) the per-point overhead stays negligible.
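
A rough sketch of group commit, assuming the caller has already collected the pending entries (commit_batch is a placeholder name, not an established API):

cpp
#include <string>
#include <unistd.h>   // write(), fdatasync()
#include <vector>

// Group commit: one fdatasync() covers every entry appended since the last sync,
// so per-point fsync cost shrinks as the ingest rate grows.
size_t commit_batch(int wal_fd, const std::vector<std::string>& pending) {
    for (const std::string& entry : pending)
        if (write(wal_fd, entry.data(), entry.size()) < 0) return 0;
    if (fdatasync(wal_fd) != 0) return 0;   // durability point for the whole batch
    return pending.size();                  // safe to ack all of these writers now
}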


Unique WAL Benefits for TSDBs

Beyond durability:

  • Crash recovery: Replay WAL to rebuild in-memory state or data file tails. Fast, deterministic.

  • Replication: Ship the WAL to replicas for bootstrap. QuestDB ships WAL segments asynchronously to object storage, so new nodes catch up quickly.

  • Point-in-time recovery (PITR): Rollback to WAL position. Append-only data file? Truncate and replay.

  • Logical replication/change capture: Stream ops without full data scan.

  • Checkpoints: Periodically apply the WAL to the data file and reset it. SQLite auto-checkpoints at 1000 pages; see the sketch below.

For your learning TSDB? WAL teaches ACID ‘D’ properly. Skip it, and you’re building a fast log, not a durable DB.
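
A checkpoint in this design is little more than “make sure the data file has caught up, then recycle the WAL.” A simplified sketch, assuming the background thread has already applied every logged entry to data_fd:

cpp
#include <unistd.h>   // fsync(), ftruncate()

// Simplified checkpoint: once every logged entry has been applied to the data
// file, make the data file durable and recycle the WAL.
bool checkpoint(int data_fd, int wal_fd) {
    if (fsync(data_fd) != 0) return false;       // data file now covers the WAL contents
    if (ftruncate(wal_fd, 0) != 0) return false; // reuse the WAL from offset 0
    return fsync(wal_fd) == 0;                   // persist the truncation itself
}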


Real-World TSDB Implementations Using WAL

Don’t take my word—production TSDBs swear by WAL.

  • QuestDB: “Writes durable before processing via WAL.” Ordering preserved, failures survived.

  • tstorage (from scratch): WAL for memory partition recovery. Disable if loss OK.

  • Prometheus TSDB: incoming samples land in a WAL before being compacted into persistent blocks.

Even SQLite’s WAL mode shows that append-only plus WAL scales. There is no real redundancy: the WAL only holds writes that haven’t been checkpointed yet, typically a few percent of total storage.


When You Might Skip WAL (and Tradeoffs)

Fair question: a WAL adds extra I/O and some temporary storage. Skip it if:

  • Data loss tolerable (analytics, not finance).

  • In-memory only, with snapshots.

  • O_SYNC on data file (brutal perf hit).

Tradeoffs? No recovery, no easy replication. For learning? Implement both behind a toggle flag, as in the sketch below, and measure throughput.
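
The toggle can be as small as a flag on the write path; a rough sketch with placeholder names:

cpp
#include <unistd.h>   // write(), fdatasync()

struct WriteOptions { bool use_wal = true; };  // flip off to benchmark the no-WAL path

// Same write path either way; only the durability guarantee changes.
bool append_point(int wal_fd, int data_fd, const char* buf, size_t len,
                  const WriteOptions& opts) {
    if (opts.use_wal) {
        if (write(wal_fd, buf, len) < 0 || fdatasync(wal_fd) != 0) return false;
    }
    // Without the WAL branch, this append is only as durable as the page cache.
    return write(data_fd, buf, len) == (ssize_t)len;
}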


Quick C++ WAL Checklist for Your TSDB

Building in C++? Here’s a minimal blueprint:

  1. Open WAL: int wal_fd = open("tsdb.wal", O_WRONLY|O_CREAT|O_APPEND, 0644); (skip O_DIRECT unless you are prepared to align every buffer and write size to the block size).

  2. Per-write: Serialize op to buffer, write(wal_fd, buf, len); fdatasync(wal_fd); Ack client.

  3. Async apply: Thread appends to data_fd.

  4. Recovery: On startup, read the WAL sequentially and replay it into the data file (see the replay sketch after the pseudocode).

  5. Checkpoint: Apply WAL to data, truncate WAL.

Pseudocode:

cpp
#include <cstdint>     // uint64_t
#include <string>
#include <unistd.h>    // write(), fdatasync()
struct WalEntry { uint64_t ts; std::string metric; double value; };
// Append one entry and make it durable before the caller acks the client.
ssize_t append_wal(int fd, const WalEntry& e) {
 char buf[256]; size_t len = serialize(e, buf); // Your serializer (not shown)
 ssize_t w = write(fd, buf, len);    // sequential append; fd opened with O_APPEND
 if (w < 0) return w;                // surface write errors instead of syncing garbage
 if (fdatasync(fd) != 0) return -1;  // Durability! On disk only after this succeeds
 return w;
}
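
For step 4 (recovery), startup re-opens the WAL for reading and replays each record into the data file. A sketch assuming the length-prefixed framing from earlier and a hypothetical apply_to_data() callback; a short read on the last record is the usual sign of a crash mid-append and is simply discarded:

cpp
#include <cstdint>
#include <unistd.h>   // read()
#include <vector>

// Replay the WAL at startup. apply_to_data() is a placeholder for whatever
// appends the decoded payload to your data file.
void replay_wal(int wal_fd, void (*apply_to_data)(const char* payload, uint32_t len)) {
    uint32_t payload_len = 0;
    while (read(wal_fd, &payload_len, sizeof(payload_len)) == (ssize_t)sizeof(payload_len)) {
        std::vector<char> payload(payload_len);
        if (read(wal_fd, payload.data(), payload_len) != (ssize_t)payload_len)
            break;                                  // torn final record, discard
        apply_to_data(payload.data(), payload_len);
    }
}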

Test it by hammering the disk (e.g., stress-ng --hdd 4 --hdd-bytes 1G) while you kill -9 the process, or better, by cutting power to a test machine. The WAL path replays cleanly; the fsync-less data-file path loses whatever was still buffered.


Sources

  1. The Write-Ahead Log: A Foundation for Reliability
  2. Writing a Time-Series Database Engine from Scratch
  3. QuestDB | Next-generation time-series database
  4. SQLite — Write-Ahead Logging
  5. Write-ahead logging — Wikipedia
  6. PostgreSQL Documentation — Write-Ahead Logging (WAL)
  7. WAL usage looks broken in modern Time Series Databases?

Conclusion

WAL isn’t redundant for your append-only TSDB—it plugs gaping holes in durability, recovery, and scalability that a lone data file ignores. Power failures, OS crashes, and replication needs demand it; real TSDBs like QuestDB prove the point. For your C++ project, start with a toggleable WAL—you’ll hit fewer “where’d my data go?” moments. Trade perf for guarantees? Almost always worth it in 2026’s unreliable world. Build it, benchmark it, learn from the crashes.
