DevOps Interview Prep
Database Architecture & Design
Glossary
Core workload and design
- Transaction: All-or-nothing group of changes. Example: create order + charge + decrement stock (see the sketch after this glossary).
- OLTP (Online Transactional Processing): Many small reads and writes. Example: checkout, login.
- OLAP (Online Analytical Processing): Fewer but heavy queries that scan and aggregate. Example: BI dashboards.
- Isolation level: How transactions interact. Higher isolation avoids anomalies, can increase blocking.
- Strong consistency: After a write succeeds, reads see it immediately.
- Eventual consistency: Reads can be briefly stale, then converge.
- Partitioning: Split one logical table into smaller parts (often by time or range) inside the same DB.
- Sharding: Split the dataset across multiple databases by a shard key.
- Hot key: One key gets disproportionate traffic.
- Working set: Frequently accessed data. If it fits in memory, performance improves.
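A minimal sketch of the checkout transaction above, assuming Postgres and the psycopg2 driver; the orders/stock/charges tables, columns, and connection string are made up for illustration.

```python
import psycopg2

# Hypothetical connection string; adjust for your environment.
conn = psycopg2.connect("dbname=shop user=app host=localhost")

try:
    with conn:                                  # commit on success, roll back on any exception
        with conn.cursor() as cur:
            # 1. Create the order.
            cur.execute(
                "INSERT INTO orders (user_id, total_cents) VALUES (%s, %s) RETURNING id",
                (42, 1999),
            )
            order_id = cur.fetchone()[0]

            # 2. Decrement stock; abort the whole transaction if nothing is left.
            cur.execute(
                "UPDATE stock SET quantity = quantity - 1 "
                "WHERE product_id = %s AND quantity > 0",
                (7,),
            )
            if cur.rowcount != 1:
                raise RuntimeError("out of stock")

            # 3. Record the charge. All three changes commit together or not at all.
            cur.execute(
                "INSERT INTO charges (order_id, amount_cents) VALUES (%s, %s)",
                (order_id, 1999),
            )
finally:
    conn.close()
```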
Database types (high-level)
- Relational Database (RDBMS): Tables with rows and columns; supports SQL, transactions, and joins. Example: Postgres. Mechanism: stores structured data in tables and uses indexes, constraints, and transactions to manage relationships and consistency.
- Document Database: Stores JSON-like documents with flexible structure. Example: MongoDB. Mechanism: each record is stored as a self-contained document, so adding new fields does not require altering a central table structure.
- Key-Value Store: Stores data as key → value pairs. Example: Redis. Mechanism: data is accessed directly by key, usually via a hash or in-memory structure, so lookups avoid scans and joins.
- Wide-Column Store: Stores data by partition key and clustering structure across distributed nodes. Example: Cassandra. Mechanism: data is partitioned by a primary key (for example, user_id), and each key range is assigned to a specific node, so requests go only to the node that owns that partition.
- Columnar / Analytical Database: Stores data by column instead of row. Example: ClickHouse. Mechanism: queries read only the needed columns and use strong compression, which reduces disk I/O for scans and aggregates.
- Time-Series Database: Stores and indexes data primarily by time. Example: TimescaleDB. Mechanism: time-based indexing and partitioning make range queries and retention efficient.
- Search Engine: Stores and indexes text for retrieval and ranking. Example: Elasticsearch. Mechanism: inverted indexes map terms to documents, so search jumps directly to matches instead of scanning all rows.
Write operations (examples and short explanations)
- Append: Add new rows without modifying existing ones. Example: inserting new event logs.
- Update: Modify existing rows. Example: changing order status from pending to shipped.
- Upsert: Update if the row exists, insert if it does not (see the sketch after this list). Example: syncing a user profile from an external system.
- Delete: Remove rows. Example: deleting user data for GDPR compliance.
- Bulk backfill: Large historical insert or migration. Example: importing one year of archived data.
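A minimal upsert sketch, assuming Postgres's INSERT ... ON CONFLICT syntax; the user_profiles table, its columns, and the DSN are hypothetical.

```python
import psycopg2

conn = psycopg2.connect("dbname=app user=app host=localhost")  # hypothetical DSN

with conn, conn.cursor() as cur:
    # Insert the profile, or update it in place if the user already exists.
    cur.execute(
        """
        INSERT INTO user_profiles (user_id, email, display_name)
        VALUES (%s, %s, %s)
        ON CONFLICT (user_id)
        DO UPDATE SET email = EXCLUDED.email,
                      display_name = EXCLUDED.display_name
        """,
        (42, "a@example.com", "Alice"),
    )
conn.close()
```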
Reliability and recovery
- SLO (Service Level Objective): A target such as 99.9% uptime and p99 latency under 200 ms.
- RPO (Recovery Point Objective): Max data loss allowed.
- RTO (Recovery Time Objective): Max downtime allowed.
- PITR (Point-in-Time Recovery): Restore to an exact time.
- Replica: Copy of the data kept for HA and/or read scaling.
- Replication lag: How far a replica is behind the primary.
- Failover: Switching over to a replica when the primary fails.
Performance fundamentals
- p95, p99: Tail latency percentiles.
- IOPS: I/O operations per second; capacity for small random I/O.
- Throughput: Sequential data rate, usually measured in MB per second.
- fsync: A system call that forces the operating system to flush a file's buffered data from memory to stable storage.
- WAL: Write-ahead log; changes are logged before data files are updated, so committed work survives a crash.
- Binlog: MySQL replication log.
- MVCC (Multiversion Concurrency Control): Readers do not block writers, but old row versions accumulate and must be cleaned up.
- Bloat: Space and work waste from old versions or fragmented indexes.
- Compaction: Background merging in log-structured engines.
- Tombstone: Delete marker in log-structured engines.
Redis specific
- AOF: Append Only File. Logs writes to rebuild after restart. More durable than snapshots, adds disk I/O.
- RDB: Point-in-time snapshot file. Faster restarts than replaying the AOF, but writes since the last snapshot are lost on crash.
Partitioning vs sharding
Summary
- What it means
- Partitioning: one database, many partitions.
- Sharding: many databases, each holds a slice.
- What it solves
- Partitioning: query speed on huge tables, cheaper retention, easier maintenance.
- Sharding: write and storage scale beyond a single DB.
- What breaks
- Partitioning: mostly operational complexity (creating partitions, ensuring queries prune).
- Sharding: cross-shard joins and transactions become hard or impossible.
- When to choose
- Choose partitioning first when tables are large and time-based.
- Choose sharding only when a single primary cannot keep up with writes or storage.
Partitioning
What it is
Splitting a single database’s data into smaller logical pieces, usually by:
- Range (by date, ID range)
- Hash
- List
Who decides?
Depends on the database.
Example: PostgreSQL
Postgres supports native table partitioning, but you define it:
PARTITION BY RANGE (created_at)
You decide (a sketch follows this list):
- Partition key
- Strategy
- When to add new partitions
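A minimal sketch of that setup, assuming Postgres 11+ declarative partitioning; the events table, its columns, and the monthly partitions are hypothetical.

```python
import psycopg2

conn = psycopg2.connect("dbname=app user=app host=localhost")  # hypothetical DSN

with conn, conn.cursor() as cur:
    # Parent table: you pick the partition key and strategy.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS events (
            user_id    bigint NOT NULL,
            created_at timestamptz NOT NULL,
            payload    jsonb
        ) PARTITION BY RANGE (created_at)
    """)

    # Partitions: you (or a scheduled job) create them ahead of time.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS events_2024_01
        PARTITION OF events
        FOR VALUES FROM ('2024-01-01') TO ('2024-02-01')
    """)
    cur.execute("""
        CREATE TABLE IF NOT EXISTS events_2024_02
        PARTITION OF events
        FOR VALUES FROM ('2024-02-01') TO ('2024-03-01')
    """)
conn.close()
```

Queries that filter on created_at can then prune to only the matching partitions, and retention becomes dropping old partitions instead of bulk deletes.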
Example: MySQL
You define the partitioning strategy; MySQL does not auto-partition your tables.
Key points
Partitioning is:
- Logical
- Inside one DB instance
- Usually performance and manageability focused
- Configured by you
Sharding
What it is
Splitting data across multiple database instances or nodes (horizontal scaling).
Example:
- Users 1 - 1M → DB1
- Users 1M - 2M → DB2
Who decides?
In most systems: You. At the architecture level.
You decide:
- Shard key
- Routing logic
- How to rebalance
Classic examples:
- App-level sharding in web apps (see the routing sketch after this list)
- Proxy-based sharding (Vitess for MySQL)
- Custom sharding logic
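A minimal sketch of app-level shard routing, assuming two Postgres shards keyed by user_id; the DSNs and the modulo rule are illustrative (real systems often use consistent hashing or a directory service so shards can be rebalanced).

```python
import psycopg2

# Hypothetical shard map: shard id -> connection string.
SHARDS = {
    0: "dbname=app_shard0 host=db-shard0.internal",
    1: "dbname=app_shard1 host=db-shard1.internal",
}

def shard_for(user_id: int) -> str:
    """Route by shard key. Changing this rule later means migrating data."""
    return SHARDS[user_id % len(SHARDS)]

def get_user(user_id: int):
    conn = psycopg2.connect(shard_for(user_id))
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT id, email FROM users WHERE id = %s", (user_id,))
            return cur.fetchone()
    finally:
        conn.close()
```

Cross-shard reads have to fan out to every shard and merge results in the application, which is why cross-shard joins and transactions get hard.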
Databases That Auto-Shard
Some distributed databases manage sharding internally.
Example: MongoDB
You choose a shard key, but the cluster handles:
- Chunk distribution
- Balancing
- Migration
Example: Apache Cassandra
Automatically distributes data by partition key across nodes using consistent hashing.
You choose the partition key, but:
- Node distribution is automatic
- Rebalancing is automatic
In modern systems
If you're running:
- Single-node Postgres → partitioning is config-level
- Distributed Cassandra → sharding is built-in system behavior
- Large-scale SaaS → sharding is an architecture strategy
Real-World Database Design
Operating mindset
Design for p99 latency, failure recovery, and day-2 operations. Keep it simple until evidence forces complexity.
For every decision:
- Why is this a fit?
- What trade-off are we accepting?
- What should we monitor?
- What is the failure plan (and have we tested restore)?
Constraints That Shape the Database
Workload truth (do not guess)
Capture:
- Top 10 queries by importance (see the sketch below).
- Read/write ratio (avg and peak).
- Concurrency: QPS (Queries Per Second) and active connections.
- Write shapes: append, update, upsert, delete, bulk backfill.
- Data size now and in 12 to 24 months.
Why this matters:
- DB performance usually fails at disk I/O, locks, or bad plans. Your workload tells you which.
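One way to capture the top-queries piece without guessing, a minimal sketch assuming Postgres with the pg_stat_statements extension enabled (column names follow Postgres 13+; older versions use total_time/mean_time).

```python
import psycopg2

conn = psycopg2.connect("dbname=app user=app host=localhost")  # hypothetical DSN

with conn, conn.cursor() as cur:
    # Heaviest queries by total time spent in them.
    cur.execute("""
        SELECT query, calls, total_exec_time, mean_exec_time, rows
        FROM pg_stat_statements
        ORDER BY total_exec_time DESC
        LIMIT 10
    """)
    for query, calls, total_ms, mean_ms, rows in cur.fetchall():
        print(f"{calls:>8} calls  {total_ms:>10.0f} ms total  {mean_ms:>8.2f} ms avg  {query[:60]}")
conn.close()
```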
Examples:
- Payments: relational is a fit because transactions and constraints prevent money bugs.
- Trade-off: harder write scaling due to coordination.
- Architecture: app servers connect to a primary relational DB with 1-2 read replicas for HA, backups with PITR enabled, and SSD storage for low-latency commits.
- Activity feed: append and time-based access makes partitioning a fit.
- Trade-off: flexible queries later can be harder.
- Architecture: write-heavy service appends events to a partitioned table (often by time), optional cache in front for hot reads, read replicas for feed queries, and background jobs for cleanup or aggregation.
Correctness requirements
Decide:
- Consistency: strong vs eventual.
- Transaction scope.
- Isolation level.
Examples:
- Ledger: strong consistency is a fit because stale reads cause real damage.
- Trade-off: more coordination, higher latency.
- Architecture: primary DB with synchronous replica (or strict commit settings), clients read/write to primary, fast SSD storage, automated failover.
- Dashboard: eventual is a fit because small staleness is acceptable.
- Trade-off: reads may be briefly behind.
- Architecture: primary handles writes, async read replicas serve dashboard queries, optional cache layer for aggregates.
Availability and recovery
Define:
- SLOs (include p99).
- RPO and RTO.
Rule:
- Backups without restore tests are not real backups.
Database categories - fit and trade-offs
Relational (Postgres, MySQL)
When to use:
- Transactions, joins, constraints, strong consistency.
Why it fits:
- The database enforces business rules and keeps related data consistent.
Trade-off:
- Write scaling is harder because writes must coordinate and remain consistent.
Wide-column and key-value (Cassandra, DynamoDB)
When to use:
- Predictable access patterns and high throughput.
Why it fits:
- Requests are designed around known keys or partitions, which scales well horizontally.
Trade-off:
- Limited query flexibility because arbitrary queries become distributed scans.
- Explanation: these systems are optimized to read by partition key. If you query by a different field, the database must ask many nodes for partial results and merge them. That means higher latency, more network traffic, and unpredictable performance at scale.
Document (MongoDB)
When to use:
- Evolving document-shaped data.
Why it fits:
- Flexible documents make schema changes faster and easier.
Trade-off:
- Schema drift and harder relational queries.
- Explanation: because the schema is flexible, different documents can evolve with different fields and shapes over time. This makes indexing and optimization harder, and performing joins or enforcing strict relationships becomes more complex and less efficient compared to relational databases.
Columnar analytics (BigQuery, Snowflake, ClickHouse)
When to use:
- Scans and aggregates.
Why it fits:
- Reads only needed columns and compresses well, which speeds up analytics.
Trade-off:
- Frequent row updates are expensive because column segments get rewritten.
- Explanation: in columnar storage, data is stored by column in large compressed blocks. Updating a single row often means rewriting part of an entire column block, which increases I/O and reduces performance compared to row-based systems.
Cache (Redis)
When to use:
- Very low latency and simple data structures.
Why it fits:
- Keeps hot data in memory for fast access (see the cache-aside sketch below).
Trade-off:
- Memory is finite; persistence adds I/O and can add latency spikes.
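A minimal cache-aside sketch, assuming redis-py in front of a Postgres primary; key names, TTL, and the products table are illustrative. The trade-off above shows up directly: reads can be up to the TTL stale, and the hot keys must fit in memory.

```python
import json
import psycopg2
import redis

cache = redis.Redis(host="localhost", port=6379)
db = psycopg2.connect("dbname=app user=app host=localhost")  # hypothetical DSN

def get_product(product_id: int) -> dict:
    key = f"product:{product_id}"

    # 1. Try the cache first (hot path).
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)

    # 2. Miss: read from the database.
    with db, db.cursor() as cur:
        cur.execute("SELECT id, name, price_cents FROM products WHERE id = %s", (product_id,))
        row = cur.fetchone()
    product = {"id": row[0], "name": row[1], "price_cents": row[2]}

    # 3. Populate the cache with a TTL so stale entries expire on their own.
    cache.setex(key, 300, json.dumps(product))
    return product
```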
Scaling strategy ladder (use the least complex that works)
- Single node + indexes + right hardware.
- Read replicas for read scale.
- Partitioning for huge tables and retention.
- Caching for hot reads.
- Sharding only when forced by write or storage ceilings.
Storage and I/O decisions
Why storage dominates
Most OLTP pain is random I/O + fsync. If storage p99 gets worse, DB p99 gets worse.
Local storage types, SSD vs HDD
- SSD
- Fit: OLTP, random reads/writes, low p99 latency.
- Why: SSD has much lower seek latency.
- Trade-off: more expensive per GB.
- HDD
- Fit: cold data, archives, sequential workloads, cheap capacity.
- Why: HDD is slow for random I/O due to physical seeks.
- Trade-off: p99 latency is much worse for OLTP.
Quick rule:
- If you need p99 stability for OLTP, use SSD.
RAID basics (local disks)
RAID is about performance and fault tolerance. It does not replace backups.
- RAID 0 (striping)
- Fit: max performance when data loss is acceptable.
- Trade-off: one disk failure loses everything.
- RAID 1 (mirror)
- Fit: simple redundancy.
- Trade-off: 50% capacity overhead.
- RAID 10 (striped mirrors)
- Fit: OLTP, good performance + redundancy.
- Trade-off: 50% capacity overhead, more disks.
- RAID 5/6 (parity)
- Fit: read-heavy, cost-efficient capacity.
- Trade-off: slower writes (parity overhead) and long rebuilds under load. (Rebuild is the process of reconstructing data onto a replaced disk using parity after a disk failure; it is risky under load because it requires intensive full-array reads and writes that stress I/O and degrade performance.)
Considerations:
- Write-heavy OLTP prefers RAID 10 over parity RAID.
- Rebuild time matters. Big disks rebuild slowly, and performance degrades during rebuild.
Networked block storage (cloud disks)
- Fit: durability and managed DB patterns.
- Trade-off: network adds latency and providers cap IOPS and throughput.
What to size
- OLTP: random IOPS + p99 latency.
- OLAP: throughput.
Failure-mode table
| Symptom | Likely root cause | Why it happens | What to check | Quick fix | Long-term fix |
|---|---|---|---|---|---|
p99 latency jumps, CPU ok | Disk latency spike, checkpoint, fsync stalls | Queries wait on storage, not CPU | Disk latency, queue depth, checkpoint time, WAL fsync | Pause batch, reduce load | Faster storage, tune checkpoints/WAL |
CPU pegged, slow gets worse | Missing index, bad plan, hot query | DB reads far more rows than needed | Slow query log, EXPLAIN, rows scanned vs returned | Add index, rewrite query | Query review, guardrails, partitioning |
Timeouts, CPU moderate | Lock contention, long tx | Queries block waiting for locks | Lock waits, long tx, deadlocks | Kill blocker, reduce concurrency | Fix tx scope, add indexes |
Unstable during deploy or failover | Connection storm, no pooling, retries | Connections and retries amplify load | Conn count and churn, retry rate | Pooling + throttling + backoff | Standardize pooling, limits, circuit breakers |
Replica lag grows | Heavy writes, slow replica I/O | Replica cannot apply changes fast enough | Lag, WAL/binlog backlog, replica disk latency | Route reads to primary, stop heavy replica queries | Upgrade replica storage, move analytics to OLAP |
Disk fills quickly | WAL/binlog retention, stuck slots | Logs append, cleanup blocked | WAL/binlog size, slots, retention | Free space, fix slot, expand | Fix retention, early alerts |
OOMKilled or random restarts | Memory limits, per-query memory * concurrency | Memory multiplies under concurrency | OOMKilled events, memory graphs | Raise limit, lower per-query mem, reduce concurrency | Right-size, tune, isolate workloads |
Pooling and throttling, concise:
- Pooling: reuse a small set of DB connections instead of opening one per request (see the sketch after this list).
- Throttling: cap request rate so spikes do not overwhelm the DB.
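A minimal sketch of both, assuming psycopg2's built-in ThreadedConnectionPool plus a retry wrapper with exponential backoff and jitter; pool sizes, retry counts, and the DSN are illustrative.

```python
import random
import time
import psycopg2
from psycopg2 import pool

# Pooling: a small, fixed set of connections shared by the whole app.
# maxconn also acts as a throttle: it caps concurrent work hitting the DB.
db_pool = pool.ThreadedConnectionPool(
    minconn=2,
    maxconn=10,
    dsn="dbname=app user=app host=localhost",   # hypothetical DSN
)

def run_query(sql, params=None, retries=3):
    """Retry transient failures with jittered backoff so a spike does not
    turn into a retry storm against the database."""
    for attempt in range(retries):
        conn = db_pool.getconn()
        try:
            with conn, conn.cursor() as cur:
                cur.execute(sql, params)
                return cur.fetchall()
        except psycopg2.OperationalError:
            if attempt == retries - 1:
                raise
            time.sleep((2 ** attempt) + random.random())  # exponential backoff + jitter
        finally:
            db_pool.putconn(conn)
```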
Cross-engine operational patterns
Durability log
- What it is: append-only write log used for crash recovery and often replication.
- Why it matters: protects committed data and supports restore/replication.
- What can go wrong: logs grow too fast, fill disk, or add write latency.
- What to monitor: log growth rate, disk usage, fsync latency.
- How it maps per engine:
- Postgres: WAL
- MySQL: redo log + binlog
- Redis: AOF
- Cassandra: commit log
Background maintenance
- What it is: background work that keeps storage healthy and query performance stable.
- Why it matters: prevents bloat, cleans old data, merges files, and persists state.
- What can go wrong: maintenance can consume I/O and cause latency spikes.
- What to monitor: maintenance duration, pending work, disk latency during maintenance.
- How it maps per engine:
- Postgres: autovacuum, checkpoints
- MySQL: purge, flush, page cleaning
- Redis: AOF rewrite, RDB save
- Cassandra: compaction, repair
Replication lag
- What it is: replicas fall behind the write source.
- Why it matters: stale reads and riskier failover.
- What can go wrong: lag grows during heavy writes or slow replica I/O.
- What to monitor: lag time, lag bytes, replica disk latency.
- How it maps per engine:
- Postgres: replica lag from WAL replay (see the lag-check sketch after this list)
- MySQL: replica lag from binlog apply
- Redis: replica staleness in async replication
- Cassandra: consistency and repair drift show up differently than classic lag
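A minimal sketch of the Postgres case, assuming you can connect to the primary (pg_stat_replication) and to a replica (pg_last_xact_replay_timestamp); hostnames and DSNs are hypothetical.

```python
import psycopg2

# On the primary: how far behind each replica's replay position is, in bytes.
with psycopg2.connect("host=primary.internal dbname=app") as primary:
    with primary.cursor() as cur:
        cur.execute("""
            SELECT application_name,
                   pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
            FROM pg_stat_replication
        """)
        for name, lag_bytes in cur.fetchall():
            print(f"{name}: {lag_bytes} bytes behind")

# On a replica: how old the last replayed transaction is, in seconds.
with psycopg2.connect("host=replica1.internal dbname=app") as replica:
    with replica.cursor() as cur:
        cur.execute("SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))")
        lag_s = cur.fetchone()[0]
        print(f"replica lag: {lag_s:.1f} s" if lag_s is not None else "not a replica")
```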
Memory pressure
- What it is: the engine runs short on useful memory for cache or active work.
- Why it matters: cache misses rise, latency rises, and OOM can happen.
- What can go wrong: spills to disk, eviction, or process restarts.
- What to monitor: memory usage, cache hit rate, evictions, OOM events.
- How it maps per engine:
- Postgres: shared_buffers, work_mem
- MySQL: buffer pool
- Redis: maxmemory, eviction
- Cassandra: heap/off-heap pressure, compaction pressure
Disk growth
- What it is: storage usage grows faster than expected.
- Why it matters: full disks often stop writes.
- What can go wrong: retention misconfig, stuck cleanup, log backlog.
- What to monitor: disk growth rate, free space, log directories, cleanup blockers.
- How it maps per engine:
- Postgres: WAL growth, replication slots, bloat
- MySQL: binlog growth
- Redis: AOF growth
- Cassandra: compaction backlog, tombstones, SSTable growth
Connection and request pressure
- What it is: too many concurrent clients or requests hit the DB at once.
- Why it matters: connection overhead and retries can destabilize the system.
- What can go wrong: connection storms, pool exhaustion, retry amplification.
- What to monitor: connection count, churn, retry rate, queueing.
- How it maps per engine:
- Postgres: max_connections pressure, pooling need
- MySQL: connection/thread pressure
- Redis: client pressure and blocked clients
- Cassandra: coordinator overload and request timeouts
Write stalls and latency spikes
- What it is: writes slow down suddenly or periodically.
- Why it matters: user-facing p99 rises and queues build.
- What can go wrong: fsync stalls, checkpoints, rewrites, compaction.
- What to monitor: fsync latency, checkpoint timing, compaction backlog, persistence events.
- How it maps per engine:
- Postgres: checkpoint stalls, WAL fsync spikes
- MySQL: flush pressure, redo/binlog sync cost
- Redis: AOF rewrite or RDB save spikes
- Cassandra: compaction backlog and commit log pressure
Engine quick reference
Postgres
- Key knobs:
- shared_buffers: cache pages.
- work_mem: per-query sort/hash memory.
- max_connections: keep controlled, prefer pooling.
- Engine-specific risks:
- replication slots can prevent WAL cleanup
- autovacuum debt causes bloat
- Common metric names / terms (a check sketch follows this list):
- WAL rate
- checkpoint duration
- lock waits
- long transactions
- bloat
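A minimal check sketch for the risks and metrics above (stuck replication slots, autovacuum debt, long transactions), assuming psycopg2 and a hypothetical DSN; thresholds are arbitrary.

```python
import psycopg2

conn = psycopg2.connect("dbname=app user=app host=localhost")  # hypothetical DSN

with conn, conn.cursor() as cur:
    # Replication slots: an inactive slot pins WAL and can fill the disk.
    cur.execute("SELECT slot_name, active FROM pg_replication_slots")
    for slot, active in cur.fetchall():
        if not active:
            print(f"WARNING: inactive replication slot {slot} is retaining WAL")

    # Autovacuum debt: tables with many dead tuples tend toward bloat.
    cur.execute("""
        SELECT relname, n_dead_tup, last_autovacuum
        FROM pg_stat_user_tables
        ORDER BY n_dead_tup DESC
        LIMIT 5
    """)
    for rel, dead, last_av in cur.fetchall():
        print(f"{rel}: {dead} dead tuples, last autovacuum {last_av}")

    # Long transactions hold back vacuum and hold locks.
    cur.execute("""
        SELECT pid, now() - xact_start AS age, left(query, 60)
        FROM pg_stat_activity
        WHERE xact_start IS NOT NULL AND now() - xact_start > interval '5 minutes'
    """)
    for pid, age, query in cur.fetchall():
        print(f"long transaction pid={pid} age={age} query={query}")
conn.close()
```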
MySQL (InnoDB)
- Key knobs:
- innodb_buffer_pool_size: main cache.
- innodb_log_file_size: redo log capacity for write bursts.
- innodb_flush_log_at_trx_commit, sync_binlog: durability vs latency.
- Engine-specific risks:
- metadata locks during DDL
- binlog retention issues
- Common metric names / terms (see the status sketch after this list):
- buffer pool hit rate
- replication lag
- binlog growth
- lock waits
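A minimal status sketch for the metrics above, assuming the PyMySQL driver; on MySQL 8.0.22+ the replication command and column are SHOW REPLICA STATUS / Seconds_Behind_Source (older versions use SHOW SLAVE STATUS / Seconds_Behind_Master). Credentials and hostnames are illustrative.

```python
import pymysql

conn = pymysql.connect(host="localhost", user="app", password="secret", database="app")

with conn.cursor() as cur:
    # Buffer pool efficiency: disk reads vs. read requests overall.
    cur.execute("SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%'")
    status = dict(cur.fetchall())
    requests = int(status["Innodb_buffer_pool_read_requests"])
    disk_reads = int(status["Innodb_buffer_pool_reads"])
    if requests:
        print(f"buffer pool hit rate: {1 - disk_reads / requests:.4f}")

    # Replication lag as seen by the replica (run this against a replica).
    cur.execute("SHOW REPLICA STATUS")
    row = cur.fetchone()
    if row is not None:
        columns = [desc[0] for desc in cur.description]
        print("Seconds_Behind_Source:", dict(zip(columns, row)).get("Seconds_Behind_Source"))
conn.close()
```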
Redis
- Key knobs:
- maxmemory and eviction policy.
- AOF and RDB persistence settings.
- Engine-specific risks:
- hot key bottlenecks
- fork-related latency spikes
- Common metric names / terms (see the INFO sketch after this list):
- used memory
- evictions
- blocked clients
- persistence duration
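A minimal sketch of pulling the metrics above from INFO, assuming redis-py; host and port are illustrative.

```python
import redis

r = redis.Redis(host="localhost", port=6379)

info = r.info()  # one round trip; returns a dict of server stats

print("used_memory_human:      ", info["used_memory_human"])
print("maxmemory_policy:       ", info.get("maxmemory_policy"))
print("evicted_keys:           ", info["evicted_keys"])
print("blocked_clients:        ", info["blocked_clients"])
print("connected_clients:      ", info["connected_clients"])

# Persistence pressure: AOF rewrites and RDB saves fork the process and can
# cause latency spikes while they run.
print("aof_rewrite_in_progress:", info.get("aof_rewrite_in_progress"))
print("rdb_bgsave_in_progress: ", info.get("rdb_bgsave_in_progress"))
```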
Cassandra
- Key knobs:
- replication factor and consistency level (see the sketch below).
- compaction strategy.
- Engine-specific risks:
- tombstone storms
- repair neglect
- Common metric names / terms:
- coordinator latency
- compaction pending
- tombstones
- repair status
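A minimal sketch of those two knobs (replication factor at keyspace creation, consistency level per query), assuming the DataStax cassandra-driver; the keyspace, table, datacenter name, and contact points are hypothetical.

```python
import uuid

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["10.0.0.1", "10.0.0.2"])   # contact points, hypothetical
session = cluster.connect()

# Replication factor is a property of the keyspace.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS app
    WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3}
""")
session.set_keyspace("app")

session.execute("""
    CREATE TABLE IF NOT EXISTS events_by_user (
        user_id    uuid,
        event_time timestamp,
        payload    text,
        PRIMARY KEY (user_id, event_time)   -- partition key + clustering column
    )
""")

# Consistency level is chosen per request: QUORUM trades some latency for
# stronger read-after-write behavior than ONE.
query = SimpleStatement(
    "SELECT payload FROM events_by_user WHERE user_id = %s LIMIT 10",
    consistency_level=ConsistencyLevel.QUORUM,
)
for row in session.execute(query, (uuid.uuid4(),)):
    print(row.payload)
```

Because the query filters on the partition key (user_id), the request goes only to the replicas that own that partition, which is the access pattern wide-column stores are built for.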