DevOps Interview Prep
Database Architecture & Design
Glossary
Core workload and design
- Transaction: All-or-nothing group of changes. Example: create order + charge + decrement stock (see the sketch after this glossary).
- OLTP (Online Transactional Processing): Many small reads and writes. Example: checkout, login.
- OLAP (Online Analytical Processing): Fewer but heavy queries that scan and aggregate. Example: BI dashboards.
- Isolation level: How transactions interact. Higher isolation avoids anomalies, can increase blocking.
- Strong consistency: After a write succeeds, reads see it immediately.
- Eventual consistency: Reads can be briefly stale, then converge.
- Partitioning: Split one logical table into smaller parts (often by time or range) inside the same DB.
- Sharding: Split the dataset across multiple databases by a shard key.
- Hot key: One key gets disproportionate traffic.
- Working set: Frequently accessed data. If it fits in memory, performance improves.
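A minimal sketch of the checkout transaction above, assuming Postgres and the psycopg2 driver; the orders/stock/charges tables, columns, and connection string are made up for illustration.

```python
import psycopg2

# Hypothetical connection string; adjust for your environment.
conn = psycopg2.connect("dbname=shop user=app host=localhost")

try:
    with conn:                                  # commit on success, roll back on any exception
        with conn.cursor() as cur:
            # 1. Create the order.
            cur.execute(
                "INSERT INTO orders (user_id, total_cents) VALUES (%s, %s) RETURNING id",
                (42, 1999),
            )
            order_id = cur.fetchone()[0]

            # 2. Decrement stock; abort the whole transaction if nothing is left.
            cur.execute(
                "UPDATE stock SET quantity = quantity - 1 "
                "WHERE product_id = %s AND quantity > 0",
                (7,),
            )
            if cur.rowcount != 1:
                raise RuntimeError("out of stock")

            # 3. Record the charge. All three changes commit together or not at all.
            cur.execute(
                "INSERT INTO charges (order_id, amount_cents) VALUES (%s, %s)",
                (order_id, 1999),
            )
finally:
    conn.close()
```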
Database types (high-level)
- Relational Database (RDBMS): Tables with rows and columns; supports SQL, transactions, and joins. Example: Postgres. Mechanism: stores structured data in tables and uses indexes, constraints, and transactions to manage relationships and consistency.
- Document Database: Stores JSON-like documents with flexible structure. Example: MongoDB. Mechanism: each record is stored as a self-contained document, so adding new fields does not require altering a central table structure.
- Key-Value Store: Stores data as key → value pairs. Example: Redis. Mechanism: data is accessed directly by key, usually via a hash or in-memory structure, so lookups avoid scans and joins.
- Wide-Column Store: Stores data by partition key and clustering structure across distributed nodes. Example: Cassandra. Mechanism: data is partitioned by a primary key (for example, user_id), and each key range is assigned to a specific node, so requests go only to the node that owns that partition.
- Columnar / Analytical Database: Stores data by column instead of row. Example: ClickHouse. Mechanism: queries read only the needed columns and use strong compression, which reduces disk I/O for scans and aggregates.
- Time-Series Database: Stores and indexes data primarily by time. Example: TimescaleDB. Mechanism: time-based indexing and partitioning make range queries and retention efficient.
- Search Engine: Stores and indexes text for retrieval and ranking. Example: Elasticsearch. Mechanism: inverted indexes map terms to documents, so search jumps directly to matches instead of scanning all rows.
Write operations (examples and short explanations)
- Append: Add new rows without modifying existing ones. Example: inserting new event logs.
- Update: Modify existing rows. Example: changing order status from pending to shipped.
- Upsert: Update if the row exists, insert if it does not (see the sketch after this list). Example: syncing a user profile from an external system.
- Delete: Remove rows. Example: deleting user data for GDPR compliance.
- Bulk backfill: Large historical insert or migration. Example: importing one year of archived data.
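A minimal upsert sketch, assuming Postgres's INSERT ... ON CONFLICT syntax; the user_profiles table, its columns, and the DSN are hypothetical.

```python
import psycopg2

conn = psycopg2.connect("dbname=app user=app host=localhost")  # hypothetical DSN

with conn, conn.cursor() as cur:
    # Insert the profile, or update it in place if the user already exists.
    cur.execute(
        """
        INSERT INTO user_profiles (user_id, email, display_name)
        VALUES (%s, %s, %s)
        ON CONFLICT (user_id)
        DO UPDATE SET email = EXCLUDED.email,
                      display_name = EXCLUDED.display_name
        """,
        (42, "a@example.com", "Alice"),
    )
conn.close()
```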
Reliability and recovery
- SLO (Service Level Objective): A target such as 99.9% uptime and p99 latency under 200 ms.
- RPO (Recovery Point Objective): Max data loss allowed.
- RTO (Recovery Time Objective): Max downtime allowed.
- PITR (Point-in-Time Recovery): Restore to an exact time.
- Replica: Copy of the data kept for HA and/or read scaling.
- Replication lag: How far a replica is behind the primary.
- Failover: Switching over to a replica when the primary fails.
Performance fundamentals
- p95, p99: Tail latency percentiles.
- IOPS: I/O operations per second; capacity for small random I/O.
- Throughput: Sequential data rate, usually measured in MB per second.
- fsync: A system call that forces the operating system to flush a file's buffered data from memory to stable storage.
- WAL: Write-ahead log; changes are logged before data files are updated, so committed work survives a crash.
- Binlog: MySQL replication log.
- MVCC (Multiversion Concurrency Control): Readers do not block writers, but old row versions accumulate and must be cleaned up.
- Bloat: Space and work waste from old versions or fragmented indexes.
- Compaction: Background merging in log-structured engines.
- Tombstone: Delete marker in log-structured engines.
Redis specific
- AOF: Append Only File. Logs writes to rebuild after restart. More durable than snapshots, adds disk I/O.
- RDB: Point-in-time snapshot file. Faster restarts than replaying the AOF, but writes since the last snapshot are lost on crash.
Partitioning vs sharding
Summary
- What it means
- Partitioning: one database, many partitions.
- Sharding: many databases, each holds a slice.
- What it solves
- Partitioning: query speed on huge tables, cheaper retention, easier maintenance.
- Sharding: write and storage scale beyond a single DB.
- What breaks
- Partitioning: mostly operational complexity (creating partitions, ensuring queries prune).
- Sharding: cross-shard joins and transactions become hard or impossible.
- When to choose
- Choose partitioning first when tables are large and time-based.
- Choose sharding only when a single primary cannot keep up with writes or storage.
Partitioning
What it is
Splitting a single database’s data into smaller logical pieces, usually by:
- Range (by date, ID range)
- Hash
- List
Who decides?
Depends on the database.
Example: PostgreSQL
Postgres supports native table partitioning, but you define it:
PARTITION BY RANGE (created_at)
You decide (a sketch follows this list):
- Partition key
- Strategy
- When to add new partitions
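A minimal sketch of that setup, assuming Postgres 11+ declarative partitioning; the events table, its columns, and the monthly partitions are hypothetical.

```python
import psycopg2

conn = psycopg2.connect("dbname=app user=app host=localhost")  # hypothetical DSN

with conn, conn.cursor() as cur:
    # Parent table: you pick the partition key and strategy.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS events (
            user_id    bigint NOT NULL,
            created_at timestamptz NOT NULL,
            payload    jsonb
        ) PARTITION BY RANGE (created_at)
    """)

    # Partitions: you (or a scheduled job) create them ahead of time.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS events_2024_01
        PARTITION OF events
        FOR VALUES FROM ('2024-01-01') TO ('2024-02-01')
    """)
    cur.execute("""
        CREATE TABLE IF NOT EXISTS events_2024_02
        PARTITION OF events
        FOR VALUES FROM ('2024-02-01') TO ('2024-03-01')
    """)
conn.close()
```

Queries that filter on created_at can then prune to only the matching partitions, and retention becomes dropping old partitions instead of bulk deletes.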
Example: MySQL
You define the partitioning strategy; MySQL does not auto-partition your tables.
Key points
Partitioning is:
- Logical
- Inside one DB instance
- Usually performance and manageability focused
- Configured by you
Sharding
What it is
Splitting data across multiple database instances or nodes (horizontal scaling).
Example:
- Users 1 - 1M → DB1
- Users 1M - 2M → DB2
Who decides?
In most systems: You. At the architecture level.
You decide:
- Shard key
- Routing logic
- How to rebalance
Classic examples:
- App-level sharding in web apps (see the routing sketch after this list)
- Proxy-based sharding (Vitess for MySQL)
- Custom sharding logic
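A minimal sketch of app-level shard routing, assuming two Postgres shards keyed by user_id; the DSNs and the modulo rule are illustrative (real systems often use consistent hashing or a directory service so shards can be rebalanced).

```python
import psycopg2

# Hypothetical shard map: shard id -> connection string.
SHARDS = {
    0: "dbname=app_shard0 host=db-shard0.internal",
    1: "dbname=app_shard1 host=db-shard1.internal",
}

def shard_for(user_id: int) -> str:
    """Route by shard key. Changing this rule later means migrating data."""
    return SHARDS[user_id % len(SHARDS)]

def get_user(user_id: int):
    conn = psycopg2.connect(shard_for(user_id))
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT id, email FROM users WHERE id = %s", (user_id,))
            return cur.fetchone()
    finally:
        conn.close()
```

Cross-shard reads have to fan out to every shard and merge results in the application, which is why cross-shard joins and transactions get hard.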
Databases That Auto-Shard
Some distributed databases manage sharding internally.
Example: MongoDB
You choose a shard key, but the cluster handles:
- Chunk distribution
- Balancing
- Migration
Example: Apache Cassandra
Automatically distributes data by partition key across nodes using consistent hashing.
You choose the partition key, but:
- Node distribution is automatic
- Rebalancing is automatic
In modern systems
If you're running:
- Single-node Postgres → partitioning is config-level
- Distributed Cassandra → sharding is built-in system behavior
- Large-scale SaaS → sharding is an architecture strategy
Real-World Database Design
Operating mindset
Design for p99 latency, failure recovery, and day-2 operations. Keep it simple until evidence forces complexity.
For every decision:
- Why is this a fit?
- What trade-off are we accepting?
- What should we monitor?
- What is the failure plan (and have we tested restore)?
Constraints That Shape the Database
Workload truth (do not guess)
Capture:
- Top 10 queries by importance (see the sketch below).
- Read/write ratio (avg and peak).
- Concurrency: QPS (Queries Per Second) and active connections.
- Write shapes: append, update, upsert, delete, bulk backfill.
- Data size now and in 12 to 24 months.
Why this matters:
- DB performance usually fails at disk I/O, locks, or bad plans. Your workload tells you which.
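One way to capture the top-queries piece without guessing, a minimal sketch assuming Postgres with the pg_stat_statements extension enabled (column names follow Postgres 13+; older versions use total_time/mean_time).

```python
import psycopg2

conn = psycopg2.connect("dbname=app user=app host=localhost")  # hypothetical DSN

with conn, conn.cursor() as cur:
    # Heaviest queries by total time spent in them.
    cur.execute("""
        SELECT query, calls, total_exec_time, mean_exec_time, rows
        FROM pg_stat_statements
        ORDER BY total_exec_time DESC
        LIMIT 10
    """)
    for query, calls, total_ms, mean_ms, rows in cur.fetchall():
        print(f"{calls:>8} calls  {total_ms:>10.0f} ms total  {mean_ms:>8.2f} ms avg  {query[:60]}")
conn.close()
```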
Examples:
- Payments: relational is a fit because transactions and constraints prevent money bugs.
- Trade-off: harder write scaling due to coordination.
- Architecture: app servers connect to a primary relational DB with 1-2 read replicas for HA, backups with PITR enabled, and SSD storage for low-latency commits.
- Activity feed: append and time-based access makes partitioning a fit.
- Trade-off: flexible queries later can be harder.
- Architecture: write-heavy service appends events to a partitioned table (often by time), optional cache in front for hot reads, read replicas for feed queries, and background jobs for cleanup or aggregation.
Correctness requirements
Decide:
- Consistency: strong vs eventual.
- Transaction scope.
- Isolation level.
Examples:
- Ledger: strong consistency is a fit because stale reads cause real damage.
- Trade-off: more coordination, higher latency.
- Architecture: primary DB with synchronous replica (or strict commit settings), clients read/write to primary, fast SSD storage, automated failover.
- Dashboard: eventual is a fit because small staleness is acceptable.
- Trade-off: reads may be briefly behind.
- Architecture: primary handles writes, async read replicas serve dashboard queries, optional cache layer for aggregates.
Availability and recovery
Define:
- SLOs (include p99).
- RPO and RTO.
Rule:
- Backups without restore tests are not real backups.
Database categories - fit and trade-offs
Relational (Postgres, MySQL)
When to use:
- Transactions, joins, constraints, strong consistency.
Why it fits:
- The database enforces business rules and keeps related data consistent.
Trade-off:
- Write scaling is harder because writes must coordinate and remain consistent.
Wide-column and key-value (Cassandra, DynamoDB)
When to use:
- Predictable access patterns and high throughput.
Why it fits:
- Requests are designed around known keys or partitions, which scales well horizontally.
Trade-off:
- Limited query flexibility because arbitrary queries become distributed scans.
- Explanation: these systems are optimized to read by partition key. If you query by a different field, the database must ask many nodes for partial results and merge them. That means higher latency, more network traffic, and unpredictable performance at scale.
Document (MongoDB)
When to use:
- Evolving document-shaped data.
Why it fits:
- Flexible documents make schema changes faster and easier.
Trade-off:
- Schema drift and harder relational queries.
- Explanation: because the schema is flexible, different documents can evolve with different fields and shapes over time. This makes indexing and optimization harder, and performing joins or enforcing strict relationships becomes more complex and less efficient compared to relational databases.
Columnar analytics (BigQuery, Snowflake, ClickHouse)
When to use:
- Scans and aggregates.
Why it fits:
- Reads only needed columns and compresses well, which speeds up analytics.
Trade-off:
- Frequent row updates are expensive because column segments get rewritten.
- Explanation: in columnar storage, data is stored by column in large compressed blocks. Updating a single row often means rewriting part of an entire column block, which increases I/O and reduces performance compared to row-based systems.
Cache (Redis)
When to use:
- Very low latency and simple data structures.
Why it fits:
- Keeps hot data in memory for fast access (see the cache-aside sketch below).
Trade-off:
- Memory is finite; persistence adds I/O and can add latency spikes.
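A minimal cache-aside sketch, assuming redis-py in front of a Postgres primary; key names, TTL, and the products table are illustrative. The trade-off above shows up directly: reads can be up to the TTL stale, and the hot keys must fit in memory.

```python
import json
import psycopg2
import redis

cache = redis.Redis(host="localhost", port=6379)
db = psycopg2.connect("dbname=app user=app host=localhost")  # hypothetical DSN

def get_product(product_id: int) -> dict:
    key = f"product:{product_id}"

    # 1. Try the cache first (hot path).
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)

    # 2. Miss: read from the database.
    with db, db.cursor() as cur:
        cur.execute("SELECT id, name, price_cents FROM products WHERE id = %s", (product_id,))
        row = cur.fetchone()
    product = {"id": row[0], "name": row[1], "price_cents": row[2]}

    # 3. Populate the cache with a TTL so stale entries expire on their own.
    cache.setex(key, 300, json.dumps(product))
    return product
```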
Scaling strategy ladder (use the least complex that works)
- Single node + indexes + right hardware.
- Read replicas for read scale.
- Partitioning for huge tables and retention.
- Caching for hot reads.
- Sharding only when forced by write or storage ceilings.
Storage and I/O decisions
Why storage dominates
Most OLTP pain is random I/O + fsync. If storage p99 gets worse, DB p99 gets worse.
Local storage types, SSD vs HDD
- SSD
- Fit: OLTP, random reads/writes, low p99 latency.
- Why: SSD has much lower seek latency.
- Trade-off: more expensive per GB.
- HDD
- Fit: cold data, archives, sequential workloads, cheap capacity.
- Why: HDD is slow for random I/O due to physical seeks.
- Trade-off: p99 latency is much worse for OLTP.
Quick rule:
- If you need p99 stability for OLTP, use SSD.
RAID basics (local disks)
RAID is about performance and fault tolerance. It does not replace backups.
- RAID 0 (striping)
- Fit: max performance when data loss is acceptable.
- Trade-off: one disk failure loses everything.
- RAID 1 (mirror)
- Fit: simple redundancy.
- Trade-off: 50% capacity overhead.
- RAID 10 (striped mirrors)
- Fit: OLTP, good performance + redundancy.
- Trade-off: 50% capacity overhead, more disks.
- RAID 5/6 (parity)
- Fit: read-heavy, cost-efficient capacity.
- Trade-off: slower writes (parity overhead) and long rebuilds under load. (Rebuild is the process of reconstructing data onto a replaced disk using parity after a disk failure; it is risky under load because it requires intensive full-array reads and writes that stress I/O and degrade performance.)
Considerations:
- Write-heavy OLTP prefers RAID 10 over parity RAID.
- Rebuild time matters. Big disks rebuild slowly, and performance degrades during rebuild.
Networked block storage (cloud disks)
- Fit: durability and managed DB patterns.
- Trade-off: network adds latency and providers cap IOPS and throughput.
What to size
- OLTP: random IOPS + p99 latency.
- OLAP: throughput.
Failure-mode table
| Symptom | Likely root cause | Why it happens | What to check | Quick fix | Long-term fix |
|---|---|---|---|---|---|
p99 latency jumps, CPU ok | Disk latency spike, checkpoint, fsync stalls | Queries wait on storage, not CPU | Disk latency, queue depth, checkpoint time, WAL fsync | Pause batch, reduce load | Faster storage, tune checkpoints/WAL |
CPU pegged, slow gets worse | Missing index, bad plan, hot query | DB reads far more rows than needed | Slow query log, EXPLAIN, rows scanned vs returned | Add index, rewrite query | Query review, guardrails, partitioning |
Timeouts, CPU moderate | Lock contention, long tx | Queries block waiting for locks | Lock waits, long tx, deadlocks | Kill blocker, reduce concurrency | Fix tx scope, add indexes |
Unstable during deploy or failover | Connection storm, no pooling, retries | Connections and retries amplify load | Conn count and churn, retry rate | Pooling + throttling + backoff | Standardize pooling, limits, circuit breakers |
Replica lag grows | Heavy writes, slow replica I/O | Replica cannot apply changes fast enough | Lag, WAL/binlog backlog, replica disk latency | Route reads to primary, stop heavy replica queries | Upgrade replica storage, move analytics to OLAP |
Disk fills quickly | WAL/binlog retention, stuck slots | Logs append, cleanup blocked | WAL/binlog size, slots, retention | Free space, fix slot, expand | Fix retention, early alerts |
OOMKilled or random restarts | Memory limits, per-query memory * concurrency | Memory multiplies under concurrency | OOMKilled events, memory graphs | Raise limit, lower per-query mem, reduce concurrency | Right-size, tune, isolate workloads |
Pooling and throttling, concise:
- Pooling: reuse a small set of DB connections instead of opening one per request (see the sketch after this list).
- Throttling: cap request rate so spikes do not overwhelm the DB.
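A minimal sketch of both, assuming psycopg2's built-in ThreadedConnectionPool plus a retry wrapper with exponential backoff and jitter; pool sizes, retry counts, and the DSN are illustrative.

```python
import random
import time
import psycopg2
from psycopg2 import pool

# Pooling: a small, fixed set of connections shared by the whole app.
# maxconn also acts as a throttle: it caps concurrent work hitting the DB.
db_pool = pool.ThreadedConnectionPool(
    minconn=2,
    maxconn=10,
    dsn="dbname=app user=app host=localhost",   # hypothetical DSN
)

def run_query(sql, params=None, retries=3):
    """Retry transient failures with jittered backoff so a spike does not
    turn into a retry storm against the database."""
    for attempt in range(retries):
        conn = db_pool.getconn()
        try:
            with conn, conn.cursor() as cur:
                cur.execute(sql, params)
                return cur.fetchall()
        except psycopg2.OperationalError:
            if attempt == retries - 1:
                raise
            time.sleep((2 ** attempt) + random.random())  # exponential backoff + jitter
        finally:
            db_pool.putconn(conn)
```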
Cross-engine operational patterns
Durability log
- What it is: append-only write log used for crash recovery and often replication.
- Why it matters: protects committed data and supports restore/replication.
- What can go wrong: logs grow too fast, fill disk, or add write latency.
- What to monitor: log growth rate, disk usage, fsync latency.
- How it maps per engine:
- Postgres: WAL
- MySQL: redo log + binlog
- Redis: AOF
- Cassandra: commit log
Background maintenance
- What it is: background work that keeps storage healthy and query performance stable.
- Why it matters: prevents bloat, cleans old data, merges files, and persists state.
- What can go wrong: maintenance can consume I/O and cause latency spikes.
- What to monitor: maintenance duration, pending work, disk latency during maintenance.
- How it maps per engine:
- Postgres: autovacuum, checkpoints
- MySQL: purge, flush, page cleaning
- Redis: AOF rewrite, RDB save
- Cassandra: compaction, repair
Replication lag
- What it is: replicas fall behind the write source.
- Why it matters: stale reads and riskier failover.
- What can go wrong: lag grows during heavy writes or slow replica I/O.
- What to monitor: lag time, lag bytes, replica disk latency.
- How it maps per engine:
- Postgres: replica lag from WAL replay (see the lag-check sketch after this list)
- MySQL: replica lag from binlog apply
- Redis: replica staleness in async replication
- Cassandra: consistency and repair drift show up differently than classic lag
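A minimal sketch of the Postgres case, assuming you can connect to the primary (pg_stat_replication) and to a replica (pg_last_xact_replay_timestamp); hostnames and DSNs are hypothetical.

```python
import psycopg2

# On the primary: how far behind each replica's replay position is, in bytes.
with psycopg2.connect("host=primary.internal dbname=app") as primary:
    with primary.cursor() as cur:
        cur.execute("""
            SELECT application_name,
                   pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
            FROM pg_stat_replication
        """)
        for name, lag_bytes in cur.fetchall():
            print(f"{name}: {lag_bytes} bytes behind")

# On a replica: how old the last replayed transaction is, in seconds.
with psycopg2.connect("host=replica1.internal dbname=app") as replica:
    with replica.cursor() as cur:
        cur.execute("SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))")
        lag_s = cur.fetchone()[0]
        print(f"replica lag: {lag_s:.1f} s" if lag_s is not None else "not a replica")
```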
Memory pressure
- What it is: the engine runs short on useful memory for cache or active work.
- Why it matters: cache misses rise, latency rises, and OOM can happen.
- What can go wrong: spills to disk, eviction, or process restarts.
- What to monitor: memory usage, cache hit rate, evictions, OOM events.
- How it maps per engine:
- Postgres: shared_buffers, work_mem
- MySQL: buffer pool
- Redis: maxmemory, eviction
- Cassandra: heap/off-heap pressure, compaction pressure
Disk growth
- What it is: storage usage grows faster than expected.
- Why it matters: full disks often stop writes.
- What can go wrong: retention misconfig, stuck cleanup, log backlog.
- What to monitor: disk growth rate, free space, log directories, cleanup blockers.
- How it maps per engine:
- Postgres: WAL growth, replication slots, bloat
- MySQL: binlog growth
- Redis: AOF growth
- Cassandra: compaction backlog, tombstones, SSTable growth
Connection and request pressure
- What it is: too many concurrent clients or requests hit the DB at once.
- Why it matters: connection overhead and retries can destabilize the system.
- What can go wrong: connection storms, pool exhaustion, retry amplification.
- What to monitor: connection count, churn, retry rate, queueing.
- How it maps per engine:
- Postgres: max_connections pressure, pooling need
- MySQL: connection/thread pressure
- Redis: client pressure and blocked clients
- Cassandra: coordinator overload and request timeouts
Write stalls and latency spikes
- What it is: writes slow down suddenly or periodically.
- Why it matters: user-facing p99 rises and queues build.
- What can go wrong: fsync stalls, checkpoints, rewrites, compaction.
- What to monitor: fsync latency, checkpoint timing, compaction backlog, persistence events.
- How it maps per engine:
- Postgres: checkpoint stalls, WAL fsync spikes
- MySQL: flush pressure, redo/binlog sync cost
- Redis: AOF rewrite or RDB save spikes
- Cassandra: compaction backlog and commit log pressure
Engine quick reference
Postgres
- Key knobs:
- shared_buffers: cache pages.
- work_mem: per-query sort/hash memory.
- max_connections: keep controlled, prefer pooling.
- Engine-specific risks:
- replication slots can prevent WAL cleanup
- autovacuum debt causes bloat
- Common metric names / terms (a check sketch follows this list):
- WAL rate
- checkpoint duration
- lock waits
- long transactions
- bloat
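A minimal check sketch for the risks and metrics above (stuck replication slots, autovacuum debt, long transactions), assuming psycopg2 and a hypothetical DSN; thresholds are arbitrary.

```python
import psycopg2

conn = psycopg2.connect("dbname=app user=app host=localhost")  # hypothetical DSN

with conn, conn.cursor() as cur:
    # Replication slots: an inactive slot pins WAL and can fill the disk.
    cur.execute("SELECT slot_name, active FROM pg_replication_slots")
    for slot, active in cur.fetchall():
        if not active:
            print(f"WARNING: inactive replication slot {slot} is retaining WAL")

    # Autovacuum debt: tables with many dead tuples tend toward bloat.
    cur.execute("""
        SELECT relname, n_dead_tup, last_autovacuum
        FROM pg_stat_user_tables
        ORDER BY n_dead_tup DESC
        LIMIT 5
    """)
    for rel, dead, last_av in cur.fetchall():
        print(f"{rel}: {dead} dead tuples, last autovacuum {last_av}")

    # Long transactions hold back vacuum and hold locks.
    cur.execute("""
        SELECT pid, now() - xact_start AS age, left(query, 60)
        FROM pg_stat_activity
        WHERE xact_start IS NOT NULL AND now() - xact_start > interval '5 minutes'
    """)
    for pid, age, query in cur.fetchall():
        print(f"long transaction pid={pid} age={age} query={query}")
conn.close()
```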
MySQL (InnoDB)
- Key knobs:
- innodb_buffer_pool_size: main cache.
- innodb_log_file_size: redo log capacity for write bursts.
- innodb_flush_log_at_trx_commit, sync_binlog: durability vs latency.
- Engine-specific risks:
- metadata locks during DDL
- binlog retention issues
- Common metric names / terms (see the status sketch after this list):
- buffer pool hit rate
- replication lag
- binlog growth
- lock waits
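A minimal status sketch for the metrics above, assuming the PyMySQL driver; on MySQL 8.0.22+ the replication command and column are SHOW REPLICA STATUS / Seconds_Behind_Source (older versions use SHOW SLAVE STATUS / Seconds_Behind_Master). Credentials and hostnames are illustrative.

```python
import pymysql

conn = pymysql.connect(host="localhost", user="app", password="secret", database="app")

with conn.cursor() as cur:
    # Buffer pool efficiency: disk reads vs. read requests overall.
    cur.execute("SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%'")
    status = dict(cur.fetchall())
    requests = int(status["Innodb_buffer_pool_read_requests"])
    disk_reads = int(status["Innodb_buffer_pool_reads"])
    if requests:
        print(f"buffer pool hit rate: {1 - disk_reads / requests:.4f}")

    # Replication lag as seen by the replica (run this against a replica).
    cur.execute("SHOW REPLICA STATUS")
    row = cur.fetchone()
    if row is not None:
        columns = [desc[0] for desc in cur.description]
        print("Seconds_Behind_Source:", dict(zip(columns, row)).get("Seconds_Behind_Source"))
conn.close()
```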
Redis
- Key knobs:
- maxmemory and eviction policy.
- AOF and RDB persistence settings.
- Engine-specific risks:
- hot key bottlenecks
- fork-related latency spikes
- Common metric names / terms (see the INFO sketch after this list):
- used memory
- evictions
- blocked clients
- persistence duration
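A minimal sketch of pulling the metrics above from INFO, assuming redis-py; host and port are illustrative.

```python
import redis

r = redis.Redis(host="localhost", port=6379)

info = r.info()  # one round trip; returns a dict of server stats

print("used_memory_human:      ", info["used_memory_human"])
print("maxmemory_policy:       ", info.get("maxmemory_policy"))
print("evicted_keys:           ", info["evicted_keys"])
print("blocked_clients:        ", info["blocked_clients"])
print("connected_clients:      ", info["connected_clients"])

# Persistence pressure: AOF rewrites and RDB saves fork the process and can
# cause latency spikes while they run.
print("aof_rewrite_in_progress:", info.get("aof_rewrite_in_progress"))
print("rdb_bgsave_in_progress: ", info.get("rdb_bgsave_in_progress"))
```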
Cassandra
- Key knobs:
- replication factor and consistency level (see the sketch below).
- compaction strategy.
- Engine-specific risks:
- tombstone storms
- repair neglect
- Common metric names / terms:
- coordinator latency
- compaction pending
- tombstones
- repair status
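A minimal sketch of those two knobs (replication factor at keyspace creation, consistency level per query), assuming the DataStax cassandra-driver; the keyspace, table, datacenter name, and contact points are hypothetical.

```python
import uuid

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["10.0.0.1", "10.0.0.2"])   # contact points, hypothetical
session = cluster.connect()

# Replication factor is a property of the keyspace.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS app
    WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3}
""")
session.set_keyspace("app")

session.execute("""
    CREATE TABLE IF NOT EXISTS events_by_user (
        user_id    uuid,
        event_time timestamp,
        payload    text,
        PRIMARY KEY (user_id, event_time)   -- partition key + clustering column
    )
""")

# Consistency level is chosen per request: QUORUM trades some latency for
# stronger read-after-write behavior than ONE.
query = SimpleStatement(
    "SELECT payload FROM events_by_user WHERE user_id = %s LIMIT 10",
    consistency_level=ConsistencyLevel.QUORUM,
)
for row in session.execute(query, (uuid.uuid4(),)):
    print(row.payload)
```

Because the query filters on the partition key (user_id), the request goes only to the replicas that own that partition, which is the access pattern wide-column stores are built for.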