Fundamentals of Apache Kafka — Interview Questions & Answers

50 essential Apache Kafka interview questions covering topics, partitions, producers, consumers, brokers, offsets, and stream processing.

Meritshot · 17 min read

Kafka Basics

1. What is Apache Kafka?

Apache Kafka is an open-source distributed event streaming platform originally developed at LinkedIn and later donated to the Apache Software Foundation. It is designed to publish, subscribe to, store, and process streams of records in real time with high throughput and fault tolerance. Kafka is commonly used for building real-time data pipelines and streaming applications that move data reliably between systems.

2. What are the core use cases for Kafka?

Kafka is widely used for messaging, website activity tracking, metrics collection, log aggregation, stream processing, and event sourcing. It serves as a central nervous system that decouples data producers from consumers, allowing many independent systems to read the same data stream. Its durability and replay capability also make it popular for building event-driven microservice architectures.

3. What is an event or message in Kafka?

An event, also called a record or message, is the fundamental unit of data in Kafka and represents something that happened. Each record consists of an optional key, a value (the payload), a timestamp, and optional headers. The key is often used to determine partitioning, while the value carries the actual business data being transmitted.

4. How does Kafka differ from a traditional message queue?

Unlike traditional queues that typically delete messages after they are consumed, Kafka retains messages on disk for a configurable retention period regardless of consumption. This means multiple consumers can independently read the same data, and consumers can replay past events by resetting their offset. Kafka also scales horizontally through partitioning, offering far higher throughput than most legacy message brokers.

5. What is the publish-subscribe model in Kafka?

In the publish-subscribe model, producers publish messages to topics without knowing who will consume them, and consumers subscribe to topics to receive those messages. This decouples the senders and receivers, allowing many consumers to subscribe to the same topic independently. Kafka combines this pub-sub model with queue-like semantics through consumer groups, giving it flexible delivery patterns.

6. What are the main components of the Kafka ecosystem?

The core components are producers, consumers, brokers, topics, and partitions, with cluster metadata coordinated historically by ZooKeeper and now by KRaft. The broader ecosystem also includes Kafka Connect for integrating external systems, Kafka Streams for stream processing, and schema registries such as Confluent Schema Registry (a separate project, not part of Apache Kafka) for managing data schemas. Together these tools form a complete platform for ingesting, storing, processing, and exporting streaming data.

7. What is ZooKeeper and why was it used in Kafka?

ZooKeeper is a centralized coordination service that Kafka historically used for storing metadata, electing the controller broker, tracking cluster membership, and managing configuration. It helped coordinate the distributed brokers and maintained information about topics, partitions, and access control lists. Kafka has since replaced ZooKeeper with the built-in KRaft mode: ZooKeeper mode was deprecated during the 3.x releases and removed entirely in Kafka 4.0.

8. What is KRaft mode in Kafka?

KRaft (Kafka Raft) is a metadata management mode that removes the dependency on ZooKeeper by using the Raft consensus protocol internally. In KRaft mode, a quorum of controller nodes manages cluster metadata directly within Kafka, simplifying deployment and operations. It improves scalability for clusters with many partitions and reduces the number of external systems administrators must maintain.

9. What does it mean that Kafka is a distributed system?

Kafka runs as a cluster of one or more servers called brokers that can span multiple machines or data centers. Data is partitioned and replicated across these brokers so that the system can scale out and tolerate node failures without losing data. This distributed design enables Kafka to handle very large volumes of messages with high availability.

10. What programming languages and clients can interact with Kafka?

Kafka provides an official Java client, and many other languages are supported through community- and Confluent-maintained libraries, including Python (the community kafka-python and Confluent's confluent-kafka), Go, C/C++ (librdkafka), .NET, and Node.js. Applications communicate with brokers using Kafka's binary protocol over TCP. When a native client is not available, an HTTP gateway such as the Confluent REST Proxy (a separate component, not part of Apache Kafka) lets clients produce and consume over HTTP.

Topics & Partitions

11. What is a topic in Kafka?

A topic is a named category or feed to which records are published, similar to a table in a database or a folder in a filesystem. Producers write to topics and consumers read from them, and topics are always multi-subscriber. A topic is logically a stream of records, and it is physically split into one or more partitions for scalability.

12. What is a partition in Kafka?

A partition is an ordered, immutable sequence of records that is continuously appended to, forming a commit log. Each topic is divided into partitions, and partitioning is what allows Kafka to scale horizontally and process data in parallel. Records within a single partition are strictly ordered, but ordering is not guaranteed across different partitions of the same topic.

13. How does Kafka decide which partition a message goes to?

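If a producer specifies a key, the producer's default partitioner hashes the serialized key (the Java client uses murmur2) and takes the result modulo the number of partitions, so all records with the same key consistently map to the same partition. If no key is provided, records are spread across partitions to balance load; older clients used round-robin, while newer ones use a sticky strategy that fills a batch for one partition before switching to another. Producers can also implement a custom partitioner to control message placement based on business logic.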
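
The key-hashing rule can be sketched as a toy model. The helper name `partition_for` is hypothetical, and the sketch swaps in Python's `zlib.crc32` for the murmur2 hash the Java client actually uses, so the partition numbers it prints will not match a real cluster. What carries over is the shape of the rule: the same key and the same partition count always give the same partition.

```python
import zlib
from collections import defaultdict

def partition_for(key: str, num_partitions: int) -> int:
    """Toy partitioner: hash(key) mod partition count.

    Kafka's Java client hashes with murmur2; crc32 is only a stand-in here.
    """
    key_bytes = key.encode("utf-8")
    # Mask to a non-negative 31-bit value before taking the modulo.
    return (zlib.crc32(key_bytes) & 0x7FFFFFFF) % num_partitions

# Route a few keyed events to a hypothetical six-partition topic.
placement = defaultdict(list)
events = [("user-42", "login"), ("user-7", "click"), ("user-42", "logout")]
for key, value in events:
    placement[partition_for(key, 6)].append((key, value))

# Every event for user-42 lands in the same partition, so their relative order is kept.
p = partition_for("user-42", 6)
print(p, placement[p])
```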

14. Why are partitions important for scalability?

Partitions allow a topic's data and consumer load to be spread across multiple brokers and consumer instances simultaneously. Because each partition is consumed by at most one consumer within a group, the number of partitions sets the maximum consumption parallelism for that group; any extra consumers sit idle. Adding partitions can therefore increase throughput, though it must be balanced against per-partition overhead and key-ordering requirements.

15. What is an offset in the context of partitions?

An offset is a unique, monotonically increasing integer that identifies the position of each record within a partition. Offsets are assigned sequentially as records are appended, and they never change once assigned. Consumers track their offsets to know which messages they have already read and where to resume reading.

16. Can the number of partitions in a topic be changed after creation?

Yes, you can increase the number of partitions for an existing topic, but you cannot decrease it without recreating the topic. Increasing partitions affects key-based ordering because the hash-to-partition mapping changes for future messages, so existing keys may move to different partitions. For this reason, partition count should be planned carefully up front.
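
A quick experiment shows why adding partitions disturbs key placement. This sketch reuses a toy hash-mod-N partitioner (a hypothetical `partition_for` helper, with `zlib.crc32` standing in for Kafka's murmur2) and counts how many of 1,000 hypothetical keys land on a different partition when a topic grows from 6 to 8 partitions.

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    # Toy stand-in for the producer's hash partitioner (crc32 instead of murmur2).
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_partitions

keys = [f"order-{i}" for i in range(1000)]
before = {k: partition_for(k, 6) for k in keys}
after = {k: partition_for(k, 8) for k in keys}

# Keys whose partition differs once the topic has 8 partitions instead of 6.
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys map to a different partition after 6 -> 8")
```

With any reasonably uniform hash, most keys move, so around the moment the partition count changes, older records for a key sit in one partition while newer records for the same key go to another.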

17. What is the retention policy of a topic?

Retention controls how long Kafka keeps records in a topic before they become eligible for deletion, configured by time (retention.ms) or by size per partition (retention.bytes). Once a limit is exceeded, old log segments are removed regardless of whether consumers have read them. Because deletion is driven by retention rather than by consumption, multiple consumers can read and replay the same data independently.

18. What is log compaction in Kafka?

Log compaction is a retention strategy that retains at least the latest value for each message key within a partition, rather than deleting purely by age. It is useful for maintaining a current snapshot of state, such as the most recent value for each entity, while still allowing older duplicate keys to be cleaned up. Compaction is enabled by setting the topic's cleanup.policy to compact.
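
The effect of compaction on a partition can be modeled in a few lines. This toy `compact` function is hypothetical and is not the broker's log cleaner: it keeps the newest record per key with its original offset, and treats a value of None as a tombstone (Kafka's delete marker). The real cleaner keeps tombstones for delete.retention.ms before removing them; this sketch drops them immediately.

```python
from typing import Optional

Record = tuple[int, str, Optional[str]]  # (offset, key, value)

def compact(log: list[Record]) -> list[Record]:
    """Keep only the newest record per key; drop keys whose newest value is a tombstone."""
    latest: dict[str, Record] = {}
    for record in log:
        _, key, _ = record
        latest[key] = record  # later offsets overwrite earlier ones
    # Survivors keep their original offsets, so the compacted log has gaps.
    survivors = [r for r in latest.values() if r[2] is not None]
    return sorted(survivors, key=lambda r: r[0])

log = [
    (0, "user-1", "alice@old.example"),
    (1, "user-2", "bob@example"),
    (2, "user-1", "alice@new.example"),
    (3, "user-3", "carol@example"),
    (4, "user-3", None),  # tombstone: delete user-3
]
print(compact(log))
# [(1, 'user-2', 'bob@example'), (2, 'user-1', 'alice@new.example')]
```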

19. What is a segment in a Kafka partition?

A partition's log is physically stored on disk as a set of smaller files called segments, rather than one giant file. Kafka writes to the active segment until it reaches a configured size or time threshold, then rolls over to a new one. Segmentation makes retention, deletion, and compaction efficient because old segments can be removed as whole files.
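
Segment-level deletion can be sketched as a toy model. `Segment` and `expired_segments` are hypothetical, and the rule is simplified: a closed segment is deleted once its newest record is older than retention.ms, and the last (active) segment is never deleted here, whereas the real broker handles more edge cases.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    base_offset: int       # offset of the first record in the segment
    max_timestamp_ms: int  # timestamp of the newest record in the segment

def expired_segments(segments: list[Segment], now_ms: int, retention_ms: int) -> list[Segment]:
    """Return closed segments whose newest record is older than the retention window."""
    closed = segments[:-1]  # treat the last segment as the active one
    return [s for s in closed if now_ms - s.max_timestamp_ms > retention_ms]

DAY_MS = 24 * 60 * 60 * 1000
now = 10 * DAY_MS
segments = [
    Segment(base_offset=0, max_timestamp_ms=1 * DAY_MS),
    Segment(base_offset=5000, max_timestamp_ms=4 * DAY_MS),
    Segment(base_offset=9000, max_timestamp_ms=9 * DAY_MS),  # active segment
]
# With 7-day retention, only the first segment is old enough to remove as a whole file.
print([s.base_offset for s in expired_segments(segments, now, retention_ms=7 * DAY_MS)])
# [0]
```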

20. How is ordering guaranteed in Kafka?

Kafka guarantees strict ordering only within an individual partition, where records are stored and delivered in the order they were appended. To preserve ordering for related events, producers should send them with the same key so they land in the same partition. There is no global ordering guarantee across all partitions of a topic.

Producers & Consumers

21. What is a Kafka producer?

A producer is a client application that publishes records to one or more Kafka topics. It is responsible for serializing data, choosing the target partition, and sending batches of records to the appropriate brokers. Producers can be tuned for throughput or latency and can request different levels of delivery acknowledgment from the brokers.

22. What is the acks setting in a producer?

The acks setting controls how many broker acknowledgments the producer waits for before considering a write successful. With acks=0 the producer does not wait at all, acks=1 waits only for the leader to write the record, and acks=all (the default since Kafka 3.0) waits for all in-sync replicas to confirm. Stronger acks settings increase durability at the cost of latency.

23. How do producers achieve high throughput?

Producers batch multiple records together before sending them to a broker, controlled by settings like batch.size and linger.ms. They can also compress batches using codecs such as gzip, snappy, lz4, or zstd to reduce network and storage usage. Batching and compression together significantly improve throughput compared to sending records one at a time.
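
The benefit of compressing whole batches rather than single records can be demonstrated with Python's standard `gzip` module. This is only an analogy: the records are hypothetical JSON click events, newline-joining stands in for Kafka's record-batch format, and real ratios depend on the codec and your data. Repeated field names and values across records are what a batch-level compressor can exploit.

```python
import gzip
import json

# Hypothetical click events; real payloads would come from your application.
records = [
    json.dumps({"user_id": i % 50, "page": "/products", "action": "view"}).encode("utf-8")
    for i in range(500)
]

# Compress each record on its own (a batch size of one).
individually = sum(len(gzip.compress(r)) for r in records)

# Compress all records together as one batch.
batched = len(gzip.compress(b"\n".join(records)))

print(f"raw={sum(map(len, records))} individually={individually} batched={batched}")
```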

24. What is a Kafka consumer?

A consumer is a client application that subscribes to topics and reads records from their partitions. Consumers pull data from brokers at their own pace and keep track of their position using offsets. They deserialize the records and pass the data to application logic for processing.

25. What is a consumer group?

A consumer group is a set of consumers that cooperate to consume a topic by dividing its partitions among the group members. Each partition is assigned to exactly one consumer within the group (though one consumer may own several partitions), so each record is delivered to only one member of the group. Multiple consumer groups can read the same topic independently, each maintaining its own offsets.

26. What happens when a consumer in a group fails?

When a consumer fails (its heartbeats stop arriving within session.timeout.ms, or it stops calling poll() within max.poll.interval.ms) or leaves the group, Kafka triggers a rebalance that reassigns its partitions to the remaining consumers in the group. This ensures continued processing without manual intervention and preserves the rule that each partition has exactly one owner within the group. The group coordinator on a broker manages membership and orchestrates these rebalances.
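
How a group splits partitions, and what a rebalance does after a failure, can be sketched with a simplified round-robin scheme. The `assign` function is hypothetical; Kafka ships several real assignors (range, round-robin, sticky, and cooperative-sticky), and their exact placement differs from this toy. The invariant it illustrates holds for all of them: every partition has exactly one owner in the group.

```python
def assign(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Toy round-robin assignor: each partition gets exactly one owner in the group."""
    members = sorted(consumers)
    assignment: dict[str, list[int]] = {m: [] for m in members}
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return assignment

partitions = list(range(6))
print(assign(partitions, ["c1", "c2", "c3"]))
# {'c1': [0, 3], 'c2': [1, 4], 'c3': [2, 5]}

# More consumers than partitions: the extra member receives nothing and sits idle.
print(assign([0, 1], ["c1", "c2", "c3"]))
# {'c1': [0], 'c2': [1], 'c3': []}

# c2 fails: the rebalance spreads all six partitions over the survivors.
print(assign(partitions, ["c1", "c3"]))
# {'c1': [0, 2, 4], 'c3': [1, 3, 5]}
```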

27. What is consumer group rebalancing?

Rebalancing is the process of redistributing partition assignments among the consumers of a group when membership changes, such as when consumers join, leave, or fail. During a rebalance, consumers may briefly stop processing while partitions are reassigned, which is sometimes called a stop-the-world pause. Strategies like cooperative sticky assignment reduce disruption by reassigning only the partitions that need to move.
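
To see why sticky assignment is less disruptive, the toy sketch below compares what happens when a fourth consumer joins a three-member group. Under an eager protocol every member revokes all of its partitions before the new assignment is computed; a sticky strategy keeps existing ownership and moves only what balance requires. The `sticky_rebalance` function is a hypothetical, greatly simplified stand-in and not Kafka's `CooperativeStickyAssignor` algorithm.

```python
def sticky_rebalance(current: dict[str, list[int]], members: list[str]) -> dict[str, list[int]]:
    """Toy sticky rebalance: keep existing ownership, move only what balance requires."""
    new = {m: list(current.get(m, [])) for m in members}
    # Partitions owned by departed members must find a new home.
    orphans = sorted(p for m, ps in current.items() if m not in new for p in ps)
    for p in orphans:
        least = min(members, key=lambda m: len(new[m]))
        new[least].append(p)
    # Move one partition at a time from the busiest to the idlest member until balanced.
    while True:
        most = max(members, key=lambda m: len(new[m]))
        least = min(members, key=lambda m: len(new[m]))
        if len(new[most]) - len(new[least]) <= 1:
            break
        new[least].append(new[most].pop())
    return new

def owners(assignment: dict[str, list[int]]) -> dict[int, str]:
    return {p: m for m, ps in assignment.items() for p in ps}

before = {"c1": [0, 3], "c2": [1, 4], "c3": [2, 5]}
after = sticky_rebalance(before, ["c1", "c2", "c3", "c4"])
moved = [p for p, m in owners(after).items() if owners(before)[p] != m]

print(after)  # {'c1': [0], 'c2': [1, 4], 'c3': [2, 5], 'c4': [3]}
print(f"eager: all {len(owners(before))} partitions revoked; sticky: {len(moved)} moved")
```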

28. What is the role of the key in a producer record?

The key in a producer record determines which partition a message is routed to, since the producer's default partitioner hashes the key to select a partition consistently. Using a key ensures that all records sharing that key go to the same partition and are therefore processed in order. If no business ordering is required, the key can be left null and messages are spread across partitions to balance load.

29. What is the difference between push and pull in Kafka consumption?

Kafka uses a pull-based model in which consumers actively request data from brokers rather than having data pushed to them. This lets consumers control their consumption rate and avoid being overwhelmed during traffic spikes. The pull model also makes it easy to batch fetches and to replay messages by simply requesting older offsets.

30. What are serializers and deserializers in Kafka?

Serializers convert application objects into byte arrays before producers send them, while deserializers convert received bytes back into objects for consumers. Kafka ships with built-in implementations for common types such as strings, integers, and byte arrays; richer formats like Avro, JSON Schema, and Protobuf are handled by custom or third-party serializers, for example those from Confluent. Producers and consumers must agree on compatible serialization formats to exchange data correctly.

Brokers & Replication

31. What is a Kafka broker?

A broker is a single Kafka server that stores data and serves client requests for producing and consuming records. A Kafka cluster is made up of multiple brokers, and each broker handles a share of the partitions across all topics. Brokers are identified by a unique integer ID and coordinate with one another to form a fault-tolerant cluster.

32. What is the controller in a Kafka cluster?

The controller is responsible for administrative tasks such as partition leader elections and tracking broker liveness. There is exactly one active controller in a cluster at any time: in ZooKeeper mode it is a broker elected through ZooKeeper, and in KRaft mode it is the leader of the controller quorum. If the active controller fails, a new one is automatically elected to take over its duties.

33. What is replication in Kafka?

Replication is the practice of keeping multiple copies of each partition on different brokers to protect against data loss. The replication factor defines how many copies exist, and a typical production value is three. Replication ensures that if a broker hosting a partition fails, another broker holding a replica can continue serving that data.

34. What is the difference between a leader and a follower replica?

For each partition, one replica is designated the leader and handles all writes and, by default, all reads for that partition, while the other replicas are followers. Followers replicate the leader's data by fetching records from it to stay synchronized; since Kafka 2.4, consumers can optionally be configured to fetch from a nearby follower instead. If the leader fails, one of the in-sync followers is promoted to become the new leader.

35. What are in-sync replicas (ISR)?

In-sync replicas are the set of replicas (including the leader) that are caught up with the partition leader within a configured lag threshold (replica.lag.time.max.ms). When unclean leader election is disabled, only ISR members are eligible to become leader, which guarantees that writes acknowledged with acks=all are not lost during failover. The min.insync.replicas setting defines the minimum number of ISR members that must acknowledge a write when acks=all is used; if the ISR shrinks below it, such writes are rejected.
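
The interaction between acks and min.insync.replicas can be summarized as a small decision function. This is a toy model with a hypothetical `write_outcome` helper; in a real cluster a rejected acks=all write surfaces to the producer as a NotEnoughReplicas error. The example assumes a replication factor of 3 with min.insync.replicas=2, a common production pairing.

```python
def write_outcome(acks: str, isr: set[int], min_insync_replicas: int) -> str:
    """Toy model of whether a produce request succeeds for one partition."""
    if acks == "0":
        return "sent, no acknowledgment awaited"
    if acks == "1":
        # Only the leader must have the record; min.insync.replicas is not checked.
        return "acknowledged by leader"
    if acks == "all":
        if len(isr) < min_insync_replicas:
            return "rejected: not enough in-sync replicas"
        return f"acknowledged by {len(isr)} in-sync replicas"
    raise ValueError(f"unsupported acks value: {acks}")

# Replication factor 3 (brokers 1, 2, 3) with min.insync.replicas=2.
print(write_outcome("all", {1, 2, 3}, 2))  # acknowledged by 3 in-sync replicas
print(write_outcome("all", {1, 2}, 2))     # acknowledged by 2 in-sync replicas
print(write_outcome("all", {1}, 2))        # rejected: not enough in-sync replicas
print(write_outcome("1", {1}, 2))          # acknowledged by leader
```

The last line is the durability trade-off in miniature: with acks=1 the write succeeds even though only one copy exists, and it can be lost if that leader fails before the record is replicated.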

36. How does Kafka achieve fault tolerance?

Kafka achieves fault tolerance primarily through partition replication across multiple brokers and automatic leader election. If a broker goes down, leadership of the partitions it led moves to in-sync replicas on other brokers, so producers and consumers continue working. Combined with durable disk storage and acks=all writes, this design lets clusters survive node failures without losing acknowledged data.

37. What happens during a leader election?

A leader election occurs when a partition's current leader becomes unavailable and a new leader must be chosen from the in-sync replica set. The controller selects an eligible ISR member and updates the cluster metadata so clients can redirect their requests to the new leader. Electing only from the ISR ensures the new leader already holds all committed messages.

38. What is unclean leader election?

Unclean leader election allows a replica that is not in the in-sync set to become the leader when no ISR members are available. While this improves availability by keeping the partition online, it risks losing data that the failed leader had but the out-of-sync replica lacked. It is controlled by unclean.leader.election.enable and is usually disabled in durability-sensitive systems.
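
Clean and unclean election can be contrasted in a short toy model for a single partition. The `elect_leader` function is hypothetical; it assumes the controller prefers the first live replica, in replica-list order, that is still in the ISR, and falls back to any live replica only when unclean election is allowed.

```python
from typing import Optional

def elect_leader(replicas: list[int], isr: list[int], live: set[int],
                 unclean_enabled: bool = False) -> Optional[int]:
    """Toy leader election for one partition; returns a broker ID or None (offline)."""
    for broker in replicas:
        if broker in live and broker in isr:
            return broker  # clean election: this replica has every committed record
    if unclean_enabled:
        for broker in replicas:
            if broker in live:
                return broker  # unclean: may be missing acknowledged records
    return None  # partition stays offline until an ISR member returns

replicas = [1, 2, 3]  # broker IDs holding this partition; 1 was the leader
isr = [1, 2]          # broker 3 has fallen behind
print(elect_leader(replicas, isr, live={2, 3}))                     # 2
print(elect_leader(replicas, isr, live={3}))                        # None
print(elect_leader(replicas, isr, live={3}, unclean_enabled=True))  # 3
```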

39. How do brokers handle data storage on disk?

Brokers store partition data as append-only log segments written sequentially to disk, which is highly efficient for both writes and reads. Kafka relies heavily on the operating system's page cache rather than maintaining its own in-memory cache, and it uses zero-copy transfer to send data directly from disk to the network. This combination allows Kafka to achieve very high throughput on commodity hardware.

40. How do you scale a Kafka cluster?

You scale a Kafka cluster horizontally by adding more brokers and redistributing partitions across them to balance load and storage. Newly added brokers do not take over existing partitions automatically, so replicas are moved onto them with tools such as kafka-reassign-partitions.sh. Scaling also involves choosing an appropriate partition count and replication factor to match throughput and availability goals.

Offsets & Delivery Guarantees

41. How do consumers commit offsets?

Consumers commit offsets to record how far they have read in each partition, storing them in an internal Kafka topic called __consumer_offsets. Commits happen automatically at a fixed interval when enable.auto.commit is true (the default, with the interval set by auto.commit.interval.ms), or manually through explicit commitSync or commitAsync calls. Manual commits give the application precise control over when a message is considered processed.

42. What is the difference between current offset and committed offset?

The current offset (the consumer's position) is the offset of the next record the consumer will fetch in the current session, advancing as records are returned by poll. The committed offset is the last position the consumer has durably saved back to Kafka; by convention it is the offset of the next record to process, that is, the last processed offset plus one. After a restart or rebalance, a consumer resumes from the committed offset rather than from the lost in-memory position.
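
The two offsets can be tracked side by side in a toy consumer for a single partition. `ToyConsumer` is hypothetical: a Python list stands in for the partition, and the committed value lives on the object rather than in __consumer_offsets. The example polls four records, confirms only two as processed, and then simulates a restart.

```python
class ToyConsumer:
    """Tracks an in-memory position and a durably committed offset for one partition."""

    def __init__(self, committed: int = 0) -> None:
        self.committed = committed  # survives restarts (stored in Kafka in reality)
        self.position = committed   # next offset to fetch; lost on a crash

    def poll(self, log: list[str], max_records: int) -> list[tuple[int, str]]:
        end = min(self.position + max_records, len(log))
        batch = [(offset, log[offset]) for offset in range(self.position, end)]
        self.position += len(batch)
        return batch

    def commit(self, last_processed: int) -> None:
        # Convention: commit the offset of the *next* record to process.
        self.committed = last_processed + 1

    def restart(self) -> None:
        self.position = self.committed  # the in-memory position is discarded

log = [f"event-{i}" for i in range(10)]
consumer = ToyConsumer()
batch = consumer.poll(log, max_records=4)  # fetches offsets 0..3
consumer.commit(batch[1][0])               # only offsets 0 and 1 confirmed processed
print(consumer.position, consumer.committed)  # 4 2
consumer.restart()
print(consumer.poll(log, max_records=2))   # [(2, 'event-2'), (3, 'event-3')]
```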

43. What does auto.offset.reset control?

The auto.offset.reset setting determines where a consumer starts reading when it has no committed offset or its committed offset is no longer valid (for example, because retention has deleted those records). Setting it to earliest starts from the oldest available record in the partition, while latest (the default) reads only records produced after the consumer starts. A value of none makes the consumer throw an exception instead of choosing a position.
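
This decision can be written as a toy function. `starting_offset` and its arguments are hypothetical; `log_start` is the oldest offset still retained and `log_end` is the offset the next produced record will receive. In the real Java client, the none policy surfaces as an exception (NoOffsetForPartitionException when nothing has been committed), which this sketch replaces with a generic `LookupError`.

```python
from typing import Optional

def starting_offset(committed: Optional[int], log_start: int, log_end: int,
                    reset: str = "latest") -> int:
    """Toy model of where a consumer begins reading one partition."""
    if committed is not None and log_start <= committed <= log_end:
        return committed  # a valid committed offset always wins
    if reset == "earliest":
        return log_start  # oldest record still retained
    if reset == "latest":
        return log_end    # only records produced from now on
    if reset == "none":
        raise LookupError("no valid committed offset and auto.offset.reset=none")
    raise ValueError(f"unknown auto.offset.reset value: {reset}")

# Retention has already deleted offsets 0..99; the next record will get offset 250.
print(starting_offset(None, 100, 250, "earliest"))  # 100
print(starting_offset(None, 100, 250, "latest"))    # 250
print(starting_offset(40, 100, 250, "earliest"))    # 100 (offset 40 was deleted)
print(starting_offset(180, 100, 250, "earliest"))   # 180
```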

44. What is at-most-once delivery?

At-most-once delivery means each message is delivered zero or one times, so messages may be lost but are never reprocessed. This happens when a consumer commits its offset before processing the message, so a crash after committing skips that record. It offers the lowest overhead and is acceptable when occasional data loss is tolerable.

45. What is at-least-once delivery?

At-least-once delivery guarantees that no message is lost, but a message may be delivered more than once. It is achieved by processing a message first and committing its offset only afterward, so a failure before committing causes reprocessing on restart. This is the most common default, and applications often pair it with idempotent processing to handle duplicates.
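
The difference between the two semantics comes down to the order of two steps. This toy simulation (the `run` helper is hypothetical) processes a four-record log, injects one crash between the commit step and the processing step, and restarts from the committed offset. The same crash loses a record under at-most-once and duplicates one under at-least-once.

```python
def run(log: list[str], commit_first: bool, crash_at: int) -> list[str]:
    """Consume a log once, crashing a single time at offset crash_at.

    commit_first=True:  commit the offset, then process (at-most-once)
    commit_first=False: process, then commit the offset (at-least-once)
    The crash strikes between the two steps; the restart resumes from the committed offset.
    """
    processed: list[str] = []
    committed = 0
    offset = committed
    crashed = False
    while offset < len(log):
        # First step.
        if commit_first:
            committed = offset + 1
        else:
            processed.append(log[offset])
        # Simulated crash between the two steps; the in-memory position is lost.
        if offset == crash_at and not crashed:
            crashed = True
            offset = committed
            continue
        # Second step.
        if commit_first:
            processed.append(log[offset])
        else:
            committed = offset + 1
        offset += 1
    return processed

log = ["a", "b", "c", "d"]
print(run(log, commit_first=True, crash_at=2))   # ['a', 'b', 'd']  ("c" is lost)
print(run(log, commit_first=False, crash_at=2))  # ['a', 'b', 'c', 'c', 'd']  ("c" is duplicated)
```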

46. What is exactly-once semantics in Kafka?

Exactly-once semantics (EOS) ensures each message's effects are applied once and only once, with no loss and no duplicates, even across failures. Kafka supports this through idempotent producers and transactions that atomically write to multiple partitions and commit consumer offsets together. The guarantee covers reading from and writing to Kafka; side effects in external systems need their own idempotent or transactional handling. EOS is especially valuable in stream processing pipelines built with Kafka Streams.

47. What is an idempotent producer?

An idempotent producer guarantees that retries do not result in duplicate records being written to a partition. The broker assigns each producer a unique producer ID, and the producer attaches a sequence number per partition so brokers can detect and discard duplicate sends. Setting enable.idempotence=true (enabled by default in recent Kafka versions) gives duplicate-free, in-order writes to each partition for the lifetime of a producer session, without significant application changes.
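
Sequence-number deduplication can be illustrated with a toy model of one partition. `ToyPartition` is hypothetical and deliberately simplified: it remembers only the last sequence number per producer ID, whereas real brokers keep more state and report gaps with an OutOfOrderSequenceException rather than a generic error.

```python
class ToyPartition:
    """Toy broker-side deduplication for one partition (simplified)."""

    def __init__(self) -> None:
        self.records: list[str] = []
        self.last_seq: dict[int, int] = {}  # producer ID -> last sequence number written

    def append(self, producer_id: int, seq: int, value: str) -> bool:
        last = self.last_seq.get(producer_id, -1)
        if seq <= last:
            return False  # duplicate of a write already stored: acknowledge, don't append
        if seq != last + 1:
            raise RuntimeError("out-of-order sequence number")  # a gap means data went missing
        self.records.append(value)
        self.last_seq[producer_id] = seq
        return True

partition = ToyPartition()
partition.append(producer_id=7, seq=0, value="debit 10")
# The ack for seq 0 was lost in transit, so the producer retries the same send.
partition.append(producer_id=7, seq=0, value="debit 10")
partition.append(producer_id=7, seq=1, value="credit 5")
print(partition.records)  # ['debit 10', 'credit 5']
```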

48. What are Kafka transactions?

Kafka transactions allow a producer to write to multiple topics and partitions atomically, so either all writes succeed or none are visible to consumers reading with the read_committed isolation level. Transactions also let producers commit consumer offsets as part of the same atomic unit, enabling consume-transform-produce patterns. They are the foundation for exactly-once processing across an entire pipeline.
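
What a read_committed consumer sees can be modeled in a few lines. This is a toy sketch with hypothetical names and transaction labels: records from aborted transactions are skipped, and reading stops at the first record of a still-open transaction (Kafka calls that point the last stable offset), so later records, even non-transactional ones, wait until the transaction resolves.

```python
from typing import Optional

# (offset, transaction label or None for a non-transactional write, value)
Record = tuple[int, Optional[str], str]

def read_committed(records: list[Record], status: dict[str, str]) -> list[str]:
    """Toy read_committed view of one partition."""
    visible: list[str] = []
    for _offset, txn, value in records:
        if txn is not None:
            state = status[txn]
            if state == "open":
                break     # stop at the first record of an unresolved transaction
            if state == "aborted":
                continue  # aborted writes are never shown
        visible.append(value)
    return visible

records = [
    (0, None, "a"),
    (1, "t1", "b"),
    (2, "t2", "c"),
    (3, "t1", "d"),
    (4, None, "e"),
    (5, "t3", "f"),
    (6, None, "g"),
]
status = {"t1": "committed", "t2": "aborted", "t3": "open"}
print(read_committed(records, status))  # ['a', 'b', 'd', 'e']
print([v for _, _, v in records])       # a read_uncommitted consumer sees all seven
```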

Kafka Streams & Ecosystem

49. What is Kafka Streams?

Kafka Streams is a client library for building real-time stream processing applications and microservices using data stored in Kafka. It lets developers perform operations like filtering, mapping, joining, aggregating, and windowing directly within their Java or Scala applications without a separate processing cluster. It supports stateful processing with fault-tolerant local state stores and integrates exactly-once semantics natively.
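
Windowing is the least obvious of these operations, so here is the idea as a plain-Python toy model rather than Kafka Streams code. The `tumbling_counts` helper is hypothetical; it assigns each event to a fixed-size window aligned to the epoch and counts events per key and window. In the Java DSL, a comparable computation is roughly `groupByKey().windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1))).count()`, which also keeps the counts in a fault-tolerant state store.

```python
from collections import Counter

def tumbling_counts(events: list[tuple[int, str]], window_ms: int) -> dict[tuple[str, int], int]:
    """Count events per (key, window start) using fixed, non-overlapping windows."""
    counts = Counter()
    for timestamp_ms, key in events:
        window_start = timestamp_ms - timestamp_ms % window_ms  # align to the window boundary
        counts[(key, window_start)] += 1
    return dict(counts)

# Hypothetical page-view events: (timestamp in ms, page key), one-minute windows.
events = [(1_000, "page-a"), (30_000, "page-a"), (61_000, "page-a"), (5_000, "page-b")]
result = tumbling_counts(events, window_ms=60_000)
print(result)  # {('page-a', 0): 2, ('page-a', 60000): 1, ('page-b', 0): 1}
```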

50. What is Kafka Connect?

Kafka Connect is a framework and runtime for reliably streaming data between Kafka and external systems such as databases, object stores, and search indexes. It uses reusable connectors, where source connectors import data into Kafka and sink connectors export data out, configured declaratively without writing custom code. Connect scales across workers, handles offset tracking, and is the standard way to build integration pipelines around Kafka.