DB DevBrain

DevOps Interview Prep

Kafka – Cheat-sheet

What Kafka Is

One-Line Summary

“Kafka stores events as immutable logs across partitions, replicates them for fault tolerance, and lets consumer groups read independently using offsets.”

Mental Model

Kafka is not a queue.
It is a distributed, durable, ordered log where consumers move, not messages.
Simple terms:
A giant, durable, append-only log that lets different services send, store, and read events at massive scale and replay them whenever they want.
Technical terms:
A distributed commit log with partitioned topics, replicated storage, consumer groups, offset-based consumption, and high-throughput sequential disk writes. Kafka acts as both a message broker and a durable event store.
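The commit-log model can be sketched as a toy Python class (illustrative only, nothing like the real broker internals): a partition is an append-only list, an offset is simply a record's index, and reading from any offset is a replay.

```python
# Toy model (not Kafka's actual implementation): a partition is an
# append-only list, and an offset is simply a record's index in that list.
class Partition:
    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)       # records are never modified
        return len(self.records) - 1      # offset of the new record

    def read(self, offset, max_records=10):
        # Consumers pick their own starting offset: this is what makes
        # replay and independent consumers possible.
        return self.records[offset:offset + max_records]

p = Partition()
for event in ["created", "paid", "shipped"]:
    p.append(event)

assert p.read(1) == ["paid", "shipped"]   # replay from any offset
```

Note that reading does not delete anything: the log only shrinks when retention policies remove old data.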

Glossary

| Term | Simple Explanation | Technical Explanation |
| --- | --- | --- |
| Topic | A named stream of messages | A partitioned log storing records with keys, values, and timestamps |
| Partition | A slice of a topic | An ordered, immutable sequence of records with index-based offsets |
| Offset | A message’s position in a partition | A monotonically increasing pointer allowing random-access reads |
| Broker | A Kafka server that stores partitions | Handles replication, leader elections, and fetch/produce requests |
| Producer | Sends messages into Kafka | Pushes batches to the partition leader, chosen by the partitioner |
| Consumer | Reads messages from Kafka | Pulls data and commits offsets to track progress |
| Consumer Group | A set of consumers working together | Kafka distributes partitions across group members |
| ISR | “In-Sync Replicas” that are up to date | Followers fully caught up with the leader; required for safe failover |
| Retention | How long Kafka keeps messages | Time-based or size-based segment cleanup policies |
| Rebalance | Redistribution of partitions | Triggered by membership changes or topic metadata updates |
| KRaft | Kafka without ZooKeeper | Internal Raft-based metadata quorum |
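To make the Consumer Group and Rebalance entries concrete, here is a toy sketch of how partitions get spread over group members. It is loosely modeled on Kafka's range assignor, not the actual implementation: each member gets a contiguous slice, and no partition is ever shared within one group.

```python
# Toy sketch of consumer-group partition assignment (roughly the range
# assignor): members are sorted, then each gets a contiguous slice.
def assign(partitions, members):
    members = sorted(members)
    per, extra = divmod(len(partitions), len(members))
    out, start = {}, 0
    for i, m in enumerate(members):
        n = per + (1 if i < extra else 0)   # early members absorb the remainder
        out[m] = partitions[start:start + n]
        start += n
    return out

# 6 partitions over 4 consumers: no partition is owned twice.
a = assign(list(range(6)), ["c1", "c2", "c3", "c4"])
assert a == {"c1": [0, 1], "c2": [2, 3], "c3": [4], "c4": [5]}
```

A rebalance is just this computation re-run whenever membership changes, which is why adding or removing a consumer briefly pauses consumption.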

Kafka Architecture Explained in Words

Mental Diagram

Think of Kafka as a set of durable, replicated, append-only logs: producers append at the head, each consumer walks forward at its own pace, and nothing is removed just because it was read.

Step by Step

  1. Producers send events to Kafka
  2. Events go to a topic
  3. The topic is split into partitions
  4. Each partition lives on one leader broker
  5. Other brokers keep replica copies
  6. Consumers read partitions and track offsets themselves
  7. Kafka deletes old data based on retention rules
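Step 7 can be illustrated with a toy retention sweep (a sketch, not Kafka's log-cleaner code): whole segments are dropped once their newest record ages past the retention window, and reads never trigger deletion.

```python
# Toy retention sweep: Kafka deletes whole log segments whose newest
# record is older than the retention window. Consumption plays no part.
def sweep(segments, retention_ms, now_ms):
    return [s for s in segments
            if now_ms - s["max_timestamp_ms"] <= retention_ms]

now = 1_000_000
segments = [
    {"base_offset": 0,   "max_timestamp_ms": now - 800_000},  # old segment
    {"base_offset": 500, "max_timestamp_ms": now - 100_000},  # recent segment
]

# With a 600s retention window, only the recent segment survives:
assert [s["base_offset"] for s in sweep(segments, 600_000, now)] == [500]
```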

Key Architectural Insight

Kafka moves readers, not data. Messages stay in place on disk; each consumer simply advances its own offset through them.

Core Strengths

1. High throughput from batched, sequential disk writes
2. Durability and fault tolerance through replication
3. Replayability: consumers can re-read from any offset
4. Decoupling: many consumer groups read the same data independently
5. Horizontal scaling via partitions

Common Pitfalls – With Explanations

1. Misconfigured partitions – too few caps consumer parallelism; too many adds broker and rebalance overhead
2. Wrong retention strategy – data expires before slow consumers read it, or disks fill up
3. Consumer offset mistakes – committing before processing loses messages; committing too late duplicates them
4. Hot partitions – skewed keys push most traffic onto one partition, creating a bottleneck
5. Rebalancing storms – flapping consumers trigger constant partition reassignment and pause consumption
6. ISR (In-Sync Replica) issues – slow followers shrink the ISR and weaken durability guarantees
7. Running Kafka on slow disks or network – throughput depends on fast sequential I/O and replication
8. Treating Kafka like a job queue – there is no per-message acknowledgment, priority, or delayed delivery
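Pitfall 3 (consumer offset mistakes) comes down to when you commit. A toy simulation, using a plain list as the partition: committing before processing gives at-most-once (a crash loses records), committing after gives at-least-once (a crash reprocesses them).

```python
# Toy model of the commit-timing trade-off. Kafka makes you choose:
# commit-first = at-most-once, commit-after = at-least-once.
log = ["a", "b", "c"]

def run(commit_first, crash_at):
    committed, processed = 0, []
    try:
        for i, rec in enumerate(log):
            if commit_first:
                committed = i + 1          # commit BEFORE processing
            if i == crash_at:
                raise RuntimeError("consumer crashed")
            processed.append(rec)
            if not commit_first:
                committed = i + 1          # commit AFTER processing
    except RuntimeError:
        pass
    # On restart, consumption resumes from the committed offset:
    return processed + log[committed:]

assert run(commit_first=True, crash_at=1) == ["a", "c"]        # "b" lost
assert run(commit_first=False, crash_at=1) == ["a", "b", "c"]  # "b" retried
```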

When to Use Kafka (Good Fits)

High-throughput event streaming
CDC - Change Data Capture
Analytics pipelines
Microservice decoupling
Event replay and audit trails
Stream processing

When Not to Use Kafka (Bad Fits)

Low-latency request/response messaging
Message priority systems
Exactly-once job execution
Very small workloads
Dynamic routing, filtering, or complex message semantics

Mini Examples – With Reasoning

Bad: Keying partitions by user ID when a few users produce most events.
Bad: Assuming Kafka deletes messages after consumption.
Good: Using composite keys (entity ID plus a small shard suffix) when ordering per entity matters but you also need distribution.
Good: Using Kafka for event replay when debugging production issues.
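The hot-partition examples can be made concrete with a toy partitioner (a byte-sum hash as a stand-in for Kafka's murmur2; the property that matters is that the same key always maps to the same partition):

```python
def toy_partition(key: str, n: int = 6) -> int:
    # Byte-sum hash as a toy stand-in for Kafka's murmur2 partitioner.
    # The key property: same key -> same partition, always.
    return sum(key.encode()) % n

# A hot user's events all land on one partition (the "whale" problem):
assert toy_partition("user-1") == toy_partition("user-1")

# A composite key (entity ID + shard suffix) spreads the load, at the
# cost of only preserving order per (user, shard) rather than per user:
assert {toy_partition(f"user-1#{s}") for s in range(3)} == {0, 1, 2}
```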

End-to-End Example

Simple Story Example
  1. A payment service emits “payment_completed” events
  2. Kafka stores them in order
  3. Analytics and billing services read them independently
  4. If analytics crashes, it continues from where it left off
  5. Nothing is lost, nothing is blocked
Technical Walkthrough
  1. Producer sends records to topic payments
  2. Records are hashed by payment_id → specific partition
  3. Partition leader writes to disk and replicates to ISR followers
  4. Consumers in different consumer groups poll the partition
  5. Each group commits its own offsets
  6. Retention deletes old segments after policy threshold
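Steps 4 and 5 of the walkthrough, independent groups with independent offsets, as a toy model (a sketch, not the real consumer API):

```python
# Toy model: two consumer groups poll the same partition; each commits
# its own offset, so one group's progress never affects the other.
records = ["p1", "p2", "p3", "p4"]              # one partition's log
committed = {"analytics": 0, "billing": 0}      # offsets, per group

def poll(group, max_records=2):
    start = committed[group]
    batch = records[start:start + max_records]
    committed[group] = start + len(batch)       # commit after processing
    return batch

assert poll("analytics") == ["p1", "p2"]
assert poll("analytics") == ["p3", "p4"]
assert poll("billing") == ["p1", "p2"]   # billing is unaffected
```

This is also why a crashed group picks up where it left off: its committed offset survives the crash.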

Failure Scenarios

1. Broker Dies

Simple terms:
Kafka promotes a backup broker automatically and keeps going.
Technical terms:
If a broker fails, the controller elects a new leader for each of its partitions from the remaining in-sync replicas, and clients retry against the new leader.
✅ Works if replication factor ≥ 2 (3 is the usual production choice)

2. Consumer Dies

Simple terms:
Another consumer takes over its work.
Technical terms:
The group coordinator notices the missed heartbeats and triggers a rebalance; the dead consumer's partitions are reassigned, and the new owners resume from the last committed offsets.
✅ Messages are reprocessed at worst (at-least-once delivery)

3. ISR Shrinks (Very Important)

Simple terms:
Backups fall behind. Kafka becomes fragile.
Technical terms:
Lagging followers drop out of the ISR. If it shrinks below min.insync.replicas, producers using acks=all start failing, and a leader failure now means reduced durability or unavailability.
⚠️ Production red flag

4. Producer Crashes

Simple terms:
Some messages may be sent twice.
Technical terms:
Retries + acks may cause duplicates unless idempotence is enabled.
✅ Use enable.idempotence=true
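A minimal producer configuration sketch, assuming the confluent-kafka Python client (the broker address is a placeholder). With idempotence enabled, the broker de-duplicates retried batches using a producer ID plus sequence numbers, so retries no longer create duplicate writes:

```python
# Hedged sketch, assuming the confluent-kafka Python client.
# Pass this dict to confluent_kafka.Producer(producer_config).
producer_config = {
    "bootstrap.servers": "localhost:9092",  # placeholder address
    "enable.idempotence": True,             # broker de-dupes producer retries
    "acks": "all",                          # wait for all in-sync replicas
}
```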

5. Controller Failure (KRaft)

Simple terms:
Another controller takes over.
Technical terms:
The remaining KRaft controllers run a Raft election; the new active controller takes over from the replicated metadata log, with no ZooKeeper involved.