Kafka Interview Questions and Answers from Beginner to Advanced
Apache Kafka is one of the most common topics in backend engineering, distributed systems and platform architecture interviews. Kafka interview performance usually depends less on memorizing definitions and more on whether you can explain how Kafka actually behaves under production load.
In real interviews, simply saying Kafka is a message broker is usually not enough.
A strong Kafka answer usually does three things:
- explains the concept precisely.
- shows you understand the trade-offs.
- connects theory to how Kafka behaves in production.
That third point matters most. Interviewers often want to know whether you understand what happens when traffic grows, brokers fail, consumers lag, partitions rebalance or message correctness becomes more important than raw throughput. This guide focuses on exactly those three things.
1. What is Kafka?
Apache Kafka is a distributed event streaming platform used for publishing, storing and consuming streams of records called events. Kafka is designed for systems that continuously generate data and need to move that data reliably between services, applications or data platforms. According to the official Kafka documentation, it combines publishing, durable storage and real-time consumption of event streams.
What is an event?
An event represents the fact that something happened in a system. Common examples include:
- user signup
- order created
- payment processed
- page viewed
- inventory updated
In production systems, these events are generated continuously by applications, APIs, databases, devices and user interactions.
Where did Kafka come from?
Kafka was originally developed at LinkedIn to solve large-scale real-time data movement problems. It later became part of the Apache Software Foundation and evolved into one of the most widely used platforms for event-driven architectures.
A strong interview definition
Kafka is a distributed, partitioned, durable commit log optimized for high-throughput event ingestion, storage and asynchronous consumption. Each partition is an ordered, append-only log.
Why this is a strong answer
This definition is strong because it describes how Kafka is built, not just what it does.
- distributed -> Kafka runs across multiple servers
- partitioned -> data is split across partitions for scalability
- durable -> records are persisted to disk and retained
- commit log -> new records are appended sequentially
- high-throughput -> designed to handle very large event volumes efficiently
- asynchronous consumption -> producers and consumers operate independently
That is more accurate than simply calling Kafka a message queue.
A message queue usually focuses on delivering and removing messages.
Kafka behaves differently.
Kafka stores records for a configured retention period even after consumers read them, which allows replay, independent consumers and historical reprocessing.
Interview-ready short answer
Kafka is a distributed event streaming platform that stores events durably in partitioned logs so multiple consumers can process or replay the same data independently.
2. Is Kafka a message queue?
Not exactly. Kafka can behave like a message queue in some use cases, but its core model is different. A traditional message queue usually removes a message after it has been successfully consumed. Kafka does not do that. Kafka stores records in a durable log and keeps them for a configured retention period, regardless of whether consumers have already read them.
That means consuming a message does not delete it. Instead, each consumer maintains its own reading position using offsets. Because of this, multiple independent consumers can read the same data at different times without affecting each other.
Interview-ready answer:
Kafka is not a traditional message queue. It is a distributed event streaming platform built around a durable log. Unlike queues that usually remove messages after consumption, Kafka retains records for a configured period, allowing multiple consumers to read the same data independently.
3. What are the main Kafka components?
The main Kafka components are producers, topics, partitions, brokers, consumers, consumer groups and offsets. Producers write events, brokers store them inside topic partitions and consumers read them by tracking offsets. Together, these components enable scalable, durable, and distributed event streaming.
4. What is a producer in Kafka?
A producer in Kafka is the client application that publishes events (records) to Kafka topics.
In simple terms, the producer is the component that sends data into Kafka. For example, when a user places an order, the order service may produce an event like OrderCreated. Kafka receives that event and stores it in the appropriate topic partition.
A producer does not write to a topic as a whole. In Kafka, records are written to partitions inside a topic, and the producer decides which partition receives each record. Depending on configuration, this happens through explicit partition selection, key-based routing or automatic distribution. That matters because partition selection affects ordering and load distribution.
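The three routing options can be sketched in a few lines of Python. This is a simplified model, not Kafka client code: choose_partition is a hypothetical helper, zlib.crc32 is only a stand-in for the hash a real client applies to the key (the Java client's default partitioner uses murmur2), and real clients may spread keyless records differently from the plain round-robin shown here.

```python
import itertools
import zlib

# Hypothetical counter used to spread records that have no key.
_round_robin = itertools.count()

def choose_partition(num_partitions, key=None, partition=None):
    """Pick a partition for a record (simplified model, not Kafka's code)."""
    if partition is not None:
        # 1. Explicit partition selection by the producer.
        return partition
    if key is not None:
        # 2. Key-based routing: the same key always maps to the same partition.
        #    crc32 is an assumption standing in for the client's real hash.
        return zlib.crc32(key.encode("utf-8")) % num_partitions
    # 3. Automatic distribution for records without a key.
    return next(_round_robin) % num_partitions

# Records with the same key land in the same partition,
# which is what preserves their relative order.
first = choose_partition(6, key="customer-42")
second = choose_partition(6, key="customer-42")
print(first == second)  # True
```

In an interview, the point to draw out is that the key, not the topic, is what ties related events to one partition.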
Interview-ready answer
A Kafka producer is a client application that writes events to Kafka topics. It publishes records to topic partitions, making it the entry point through which data enters the Kafka system.
5. What is a topic in Kafka?
A topic in Kafka is a named logical stream of records (events). You can think of it as a category or channel where related events are stored. For example:
- orders
- payments
- user-signups
Producers publish records to a topic and consumers read records from that topic. Kafka defines topics as the core abstraction for organizing streams of records. A topic itself is a logical concept, not a single physical file.
Internally, Kafka stores a topic as one or more partitions. Each partition is an ordered sequence of records and new records are continuously appended to the end.
An important interview point is this: A topic can have multiple producers and multiple consumers.
That means several applications can write to the same topic and multiple independent consumers can read from it. Kafka also retains records in the topic for a configurable period, even after they are consumed. This allows replay and independent consumption.
Interview-ready answer
A topic in Kafka is a named stream of related events to which producers publish records and from which consumers read records. It is the logical way Kafka organizes data, while the actual storage happens through partitions.
6. What are partitions in Kafka?
A partition in Kafka is a subdivision of a topic.
Instead of storing all records of a topic in one place, Kafka splits the topic into multiple partitions. Each partition stores a subset of the topic's records. This is how Kafka achieves scalability and parallel processing.
Each partition is an ordered, append-only log. That means when new records arrive, Kafka writes them to the end of the partition. Records inside a partition are stored in the order they are written.
Why are partitions important?
Partitions are important because they provide scalability and parallelism. A single machine has limits in storage, network and processing capacity. By splitting a topic into multiple partitions, Kafka can distribute data across multiple brokers and allow multiple consumers to process data in parallel.
Ordering in partitions
Kafka guarantees ordering only within a partition. For example, if a partition receives: order-1, order-2, order-3
Consumers will read them in the same order. But if records go to different partitions, Kafka does not guarantee global ordering across the entire topic.
Offsets inside partitions
Every record inside a partition gets an offset. An offset is simply the position of that record within that partition. Offsets are unique within a partition, not across the whole topic.
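A toy model makes the append-only log and its offsets concrete. The PartitionLog class below is a hypothetical in-memory sketch, not how Kafka stores data on disk; it only shows that appending returns the next position and that each partition numbers its records independently.

```python
class PartitionLog:
    """A toy append-only log; the offset is simply the list position."""

    def __init__(self):
        self._records = []

    def append(self, value):
        # New records always go to the end of the log.
        self._records.append(value)
        return len(self._records) - 1

    def read(self, offset):
        # Reading returns the record but does not remove it.
        return self._records[offset]

# Two partitions of the same topic assign offsets independently.
p0, p1 = PartitionLog(), PartitionLog()
print(p0.append("order-1"), p0.append("order-3"))  # 0 1
print(p1.append("order-2"))                        # 0
```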
Interview-ready answer
A partition in Kafka is a smaller ordered unit of a topic that stores a subset of records. Partitions allow Kafka to scale horizontally, distribute data across brokers and enable parallel consumption, while preserving ordering within each partition.
7. What is a broker in Kafka?
A broker in Kafka is a Kafka server. It is the component that stores data and handles requests from producers and consumers.
When a producer sends a record, the broker receives it and stores it in the appropriate topic partition. When a consumer reads data, it fetches that data from the broker.
A Kafka cluster usually contains multiple brokers working together. Each broker stores some partitions of topics rather than the entire topic. This allows Kafka to distribute data across multiple servers for scalability and fault tolerance.
An important interview point is this: A broker stores partitions, not whole topics.
Interview-ready answer
A broker in Kafka is a server that stores topic partitions and handles read and write requests from producers and consumers. Multiple brokers form a Kafka cluster, allowing Kafka to scale horizontally and provide fault tolerance.
8. What is a consumer in Kafka?
A consumer in Kafka is the client application that reads records from Kafka topics.
In simple terms, the consumer is the component that takes data out of Kafka. For example, if an order service publishes an OrderCreated event, a billing service or analytics service can consume that event and process it.
An important interview point is this: Consumers do not remove records from Kafka. Kafka retains the records for a configured retention period and the consumer simply reads them. Kafka consumers read records from topic partitions and track their own position in the log.
Each consumer keeps track of its progress using offsets. An offset tells the consumer which record it has already read and where it should continue from next. Because of this design, multiple independent consumers can read the same data without affecting each other.
Interview-ready answer
A Kafka consumer is a client application that reads records from Kafka topics. It does not delete data after reading. Instead, it tracks its own position using offsets, which allows independent consumption and replay of events.
9. What is a consumer group in Kafka?
A consumer group in Kafka is a set of consumers that work together to read data from a topic. Instead of every consumer reading all records, Kafka divides the topic's partitions among the consumers in the group. This allows records to be processed in parallel. Kafka's consumer model uses groups to coordinate partition ownership and scalable consumption.
An important interview rule is: Within a single consumer group, one partition can be assigned to only one consumer at a time. This preserves ordering within that partition.
Example
Suppose a topic has 4 partitions and a consumer group has 2 consumers. Kafka may assign:
- Consumer A -> partitions 0 and 1
- Consumer B -> partitions 2 and 3
If the group has more consumers than partitions, some consumers will remain idle.
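The assignment in this example can be reproduced with a small sketch. The assign_partitions helper is hypothetical and only loosely modeled on a range-style strategy (contiguous blocks per consumer); Kafka supports several assignment strategies and the real result depends on which one is configured.

```python
def assign_partitions(num_partitions, consumers):
    """Give each consumer a contiguous block of partitions (range-style)."""
    base, extra = divmod(num_partitions, len(consumers))
    assignment, start = {}, 0
    for index, consumer in enumerate(sorted(consumers)):
        # The first `extra` consumers take one additional partition.
        count = base + (1 if index < extra else 0)
        assignment[consumer] = list(range(start, start + count))
        start += count
    return assignment

print(assign_partitions(4, ["A", "B"]))
# {'A': [0, 1], 'B': [2, 3]}
print(assign_partitions(2, ["A", "B", "C"]))
# {'A': [0], 'B': [1], 'C': []}  -> consumer C stays idle
```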
Why consumer groups matter
Consumer groups provide horizontal scalability. As partitions increase, more consumers can process data in parallel. They also allow independent processing. For example, a billing service and an analytics service can belong to different consumer groups and both consume the same topic without affecting each other.
Interview-ready answer
A consumer group in Kafka is a set of consumers that cooperate to read records from a topic. Kafka distributes partitions among consumers in the group and each partition is assigned to only one consumer at a time. This enables parallel processing while preserving ordering within each partition.
10. What are offsets in Kafka?
An offset in Kafka is the position of a record inside a partition.
Every record written to a partition gets a sequential number called an offset. For example, if records are written to a partition, their offsets may be: 0, 1, 2, 3, ...
This number tells Kafka exactly where a record exists within that partition's log.
An important interview point is: Offsets are unique within a partition, not across the entire topic. That means two different partitions can both have records with offset 0.
Consumers use offsets to track their reading progress. When a consumer reads records, it remembers the last processed offset so that it knows where to continue reading next. If the consumer restarts or fails, it can resume from the last committed offset instead of reading everything again.
Interview-ready answer
An offset in Kafka is the sequential position of a record within a partition. Consumers use offsets to track which records have already been read and where they should continue consuming from next.
11. Why does Kafka use partitions?
Kafka uses partitions to achieve scalability, parallelism and high throughput.
If a topic stored all its records in one place, a single server would eventually become a bottleneck for storage, network and processing. Partitions solve that problem by splitting a topic into smaller pieces. Each partition stores a subset of the topic's records and those partitions can be distributed across multiple brokers. Because of this, Kafka can scale horizontally by adding more brokers instead of relying on one large machine.
Partitions also allow parallel consumption. Different consumers in a consumer group can read different partitions at the same time, which increases processing capacity. Kafka uses partitions as the basic unit of parallelism and distribution.
Another important reason is ordering. Kafka guarantees ordering within a partition. Records written to the same partition are read in the same order.
Interview-ready answer
Kafka uses partitions to split a topic into smaller ordered units so data can be distributed across multiple brokers and consumed in parallel. This allows Kafka to scale horizontally, improve throughput and preserve ordering within each partition.
12. Does Kafka guarantee ordering?
Yes, but only within a partition.
Kafka guarantees that records written to the same partition are stored and read in the same order in which they were produced. If record M1 is written before M2 to the same partition, M1 will have a lower offset and consumers will read it first.
An important interview point is: Kafka does not guarantee global ordering across multiple partitions. If a topic has several partitions, records may be distributed across different partitions and processed independently. In that case, Kafka preserves order inside each partition, but not across the entire topic.
Example
Suppose a topic has two partitions.
- Partition 0 -> order-1, order-3
- Partition 1 -> order-2, order-4
Kafka guarantees the order inside each partition, but it does not guarantee that order-2 will always be processed before order-3.
How to preserve ordering for related events
If ordering matters for related events, those events should be sent to the same partition. A common way is to use a message key such as customerId or orderId, so all related records are routed to the same partition.
Interview-ready answer
Kafka guarantees ordering only at the partition level. Records written to the same partition are consumed in the same order they were produced, but Kafka does not guarantee ordering across multiple partitions.
13. How does Kafka know what a consumer has read?
Kafka knows what a consumer has read by using offsets. Every record inside a partition has a sequential number called an offset. As a consumer reads records, it keeps track of the latest offset it has processed. That offset represents the consumer's current position in the partition.
An important interview point is this: Kafka does not track read messages directly. It tracks the consumer's position.
For example, if a consumer has processed records up to offset 25, Kafka understands that the next record to read is offset 26.
Consumers usually commit offsets after processing records. Kafka stores committed offsets so that if the consumer restarts or crashes, it can resume from the last committed position instead of starting from the beginning. Kafka stores consumer group offsets in the internal __consumer_offsets topic.
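As a rough mental model, committed offsets behave like a map keyed by consumer group, topic and partition. The OffsetStore class below is a hypothetical in-memory stand-in for that idea, not the format of the internal topic. It stores the offset of the next record to read, and it starts unknown groups at 0 for simplicity (in a real cluster the starting point for a group without commits depends on the consumer's auto.offset.reset setting).

```python
class OffsetStore:
    """Toy stand-in for committed offsets, keyed by group and partition."""

    def __init__(self):
        self._committed = {}

    def commit(self, group, topic, partition, next_offset):
        # The committed value is the offset of the next record to read.
        self._committed[(group, topic, partition)] = next_offset

    def position(self, group, topic, partition):
        # Simplification: a group with no commit starts from offset 0.
        return self._committed.get((group, topic, partition), 0)

store = OffsetStore()
# "billing" has processed offsets 0..25 of orders-0, so it commits 26.
store.commit("billing", "orders", 0, 26)
print(store.position("billing", "orders", 0))    # 26
print(store.position("analytics", "orders", 0))  # 0, an independent group
```

Because the key includes the group, two groups reading the same partition never disturb each other's position.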
Interview-ready answer
Kafka knows what a consumer has read by tracking offsets. A consumer records its current position in each partition and Kafka uses the last committed offset to determine where consumption should continue.
14. What happens if a consumer crashes before committing?
If a consumer crashes before committing its offset, Kafka assumes that the last uncommitted records were not fully processed. When the consumer restarts, or another consumer in the same group takes over, consumption resumes from the last committed offset, not from the point where the crash happened.
That means some records may be read again and processed again.
Example
Suppose a consumer processes records up to offset 25, but its last commit before the crash covered only the records up to offset 20. After restart, Kafka will begin reading from offset 21, so records 21 to 25 may be processed again. This is why Kafka is commonly described as providing at-least-once delivery.
An important interview point is: If a crash happens before offset commit, Kafka may re-deliver already processed records, which can lead to duplicate processing.
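The arithmetic in this example is simple enough to write down. The replayed_offsets helper is hypothetical; it takes the last processed offset and the last offset covered by a commit, and returns the records that will be read again after a restart.

```python
def replayed_offsets(last_processed, last_committed_record):
    """Offsets processed before the crash but not covered by a commit."""
    resume_from = last_committed_record + 1
    return list(range(resume_from, last_processed + 1))

# Processed through offset 25, but the last commit covered only offset 20.
print(replayed_offsets(last_processed=25, last_committed_record=20))
# [21, 22, 23, 24, 25]
```

Those five records are the ones the processing logic has to tolerate seeing twice.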
Interview-ready answer
If a Kafka consumer crashes before committing its offset, Kafka restarts consumption from the last committed offset. As a result, records processed after that committed offset may be read again, which can cause duplicate processing.
15. What are Kafka delivery semantics?
Kafka delivery semantics describe how reliably a message is delivered and processed in the presence of failures.
In simple terms, they answer an important question: If a producer sends a message or a consumer crashes, will the message be lost, delivered again or processed exactly once? Kafka defines delivery guarantees between producers, brokers and consumers. Kafka provides three delivery semantics.
1. At-most-once
A message is delivered zero or one time. If a failure happens, the message may be lost, but it will not be delivered again. This usually happens when offsets are committed before processing completes.
2. At-least-once
A message is delivered one or more times.
Kafka avoids message loss, but duplicates may happen. For example, if a consumer processes a record but crashes before committing the offset, Kafka may deliver that record again after restart. This is the most common delivery model in production.
3. Exactly-once
A message is processed exactly one time. No message is lost and no duplicate processing occurs.
This requires additional mechanisms such as idempotent producers, transactions and proper consumer configuration. Exactly-once is not the default behavior in Kafka.
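The difference between at-most-once and at-least-once comes down to whether the offset is committed before or after processing. The sketch below is a hypothetical simulation with no Kafka client involved: consume_until_crash handles records in order and "crashes" between the two steps of one record, then reports what was processed and where a restarted consumer would resume.

```python
def consume_until_crash(records, commit_first, crash_at):
    """Handle records in order and crash between the two steps of one record.

    Returns (processed, resume_offset) as observed after the crash.
    """
    processed, committed = [], 0
    steps = ["commit", "process"] if commit_first else ["process", "commit"]
    for offset, record in enumerate(records):
        for i, step in enumerate(steps):
            if step == "commit":
                committed = offset + 1   # next offset to read
            else:
                processed.append(record)
            if offset == crash_at and i == 0:
                return processed, committed  # crash between the two steps
    return processed, committed

records = ["r0", "r1", "r2"]

# At-most-once: the offset is committed first, then the crash hits.
print(consume_until_crash(records, commit_first=True, crash_at=1))
# (['r0'], 2)  -> r1 is skipped after restart, so it is lost

# At-least-once: r1 is processed, then the crash hits before the commit.
print(consume_until_crash(records, commit_first=False, crash_at=1))
# (['r0', 'r1'], 1)  -> r1 is read again after restart, so it is duplicated
```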
Interview-ready answer
Kafka delivery semantics define the guarantees Kafka provides about how many times a message may be delivered or processed. Kafka supports three delivery semantics: at-most-once, at-least-once and exactly-once. They determine whether messages may be lost, duplicated or processed exactly once when failures occur.
16. What is replication in Kafka?
Replication in Kafka means keeping multiple copies of a partition on different brokers.
Kafka uses replication to provide fault tolerance and data availability. If one broker fails, another broker that has a replica of that partition can continue serving data. Kafka replicates partitions across multiple servers using a configurable replication factor.
The number of copies is controlled by the replication factor. For example, if a topic has a replication factor of 3, Kafka keeps three copies of each partition across different brokers.
An important interview point is: Replication happens at the partition level, not at the topic level.
Each partition has:
- one leader replica
- one or more follower replicas
Producers write to the leader and followers continuously copy data from the leader. If the leader broker fails, Kafka can elect one of the follower replicas as the new leader.
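To picture how a replication factor turns into copies on brokers, here is a minimal placement sketch. The place_replicas helper is hypothetical and uses a plain round-robin layout; Kafka's actual replica assignment is more involved, so treat this only as an illustration of "each partition gets N copies on distinct brokers".

```python
def place_replicas(num_partitions, brokers, replication_factor):
    """Spread each partition's replicas over distinct brokers, round-robin."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed broker count")
    placement = {}
    for partition in range(num_partitions):
        # In this sketch the first broker in each list acts as the leader.
        placement[partition] = [
            brokers[(partition + i) % len(brokers)]
            for i in range(replication_factor)
        ]
    return placement

print(place_replicas(3, brokers=[1, 2, 3], replication_factor=3))
# {0: [1, 2, 3], 1: [2, 3, 1], 2: [3, 1, 2]}
```

Note that every partition has its own replica list, which is why replication is described at the partition level.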
Interview-ready answer
Replication in Kafka is the process of maintaining multiple copies of each partition across different brokers. It improves fault tolerance and availability by allowing Kafka to continue serving data even if a broker fails.
17. What is ISR?
ISR stands for In-Sync Replicas.
In Kafka, ISR is the set of replicas of a partition that are fully caught up with the leader and considered up-to-date. This set always includes the leader replica itself.
A partition may have multiple replicas, but not every replica is always part of the ISR. If a follower replica falls behind the leader for too long (the threshold is set by the broker configuration replica.lag.time.max.ms) or stops syncing, Kafka removes it from the ISR. When it catches up again, it can rejoin the ISR.
Why is ISR important?
ISR matters because only replicas in the ISR are considered safe copies of the data. If the leader fails, Kafka normally elects a new leader from the ISR members. This helps reduce the risk of data loss.
Example
Suppose a partition has replication factor 3.
- Broker 1 -> leader
- Broker 2 -> follower
- Broker 3 -> follower
If all replicas are fully synchronized:
- ISR = {1, 2, 3}
If Broker 3 starts lagging behind:
- ISR = {1, 2}
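The example can be expressed as a small function. This is a hypothetical sketch, loosely modeled on a time-based lag threshold: in_sync_replicas and its parameters are illustrative names, and the timestamps are made-up values in seconds.

```python
def in_sync_replicas(leader, last_caught_up, now, max_lag_seconds):
    """Return the ISR: the leader plus followers that caught up recently."""
    isr = {leader}
    for broker, caught_up_at in last_caught_up.items():
        if now - caught_up_at <= max_lag_seconds:
            isr.add(broker)
    return isr

# Broker 1 leads; each follower reports when it last caught up.
followers = {2: 99.0, 3: 60.0}
print(sorted(in_sync_replicas(1, followers, now=100.0, max_lag_seconds=30.0)))
# [1, 2]  -> broker 3 has lagged too long and drops out of the ISR
```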
Interview-ready answer
ISR (In-Sync Replicas) is the set of partition replicas that are currently synchronized with the leader. Only ISR members are considered up-to-date and are normally eligible to become the new leader if the current leader fails.
18. What does acks mean in Kafka?
acks stands for acknowledgments. It is a producer configuration that defines how many broker acknowledgments the producer waits for before considering a write successful. In simple terms, it controls the trade-off between durability and latency.
Kafka supports three common values.
1. acks=0
The producer does not wait for any acknowledgment. It sends the record and immediately continues. This gives the lowest latency, but messages can be lost if the broker fails before storing the record.
2. acks=1
The producer waits for acknowledgment from the leader broker. The leader confirms once it has written the record locally. This gives a balance between performance and reliability, but data can still be lost if the leader fails before followers replicate the record.
3. acks=all (or acks=-1)
The producer waits until all in-sync replicas (ISR) acknowledge the write. This provides the strongest durability, but it usually adds more latency.
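The three settings can be summarized as a decision rule. The write_succeeded function below is a hypothetical model of that rule, not producer code: it takes the acks value, the leader, the current ISR and the set of brokers that have stored the record so far.

```python
def write_succeeded(acks, leader, isr, acked_by):
    """Decide whether the producer would treat a send as successful."""
    if acks == "0":
        return True                  # no acknowledgment is awaited
    if acks == "1":
        return leader in acked_by    # the leader's local write is enough
    if acks in ("all", "-1"):
        return isr <= acked_by       # every in-sync replica must have it
    raise ValueError(f"unsupported acks value: {acks}")

isr = {1, 2, 3}
# Only the leader (broker 1) has stored the record so far.
print(write_succeeded("1", leader=1, isr=isr, acked_by={1}))    # True
print(write_succeeded("all", leader=1, isr=isr, acked_by={1}))  # False
```

The second call is the durability trade-off in one line: with acks=all the producer keeps waiting until the followers in the ISR have the record too.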
Interview-ready answer
In Kafka, acks is a producer setting that determines how many broker acknowledgments are required before a message is considered successfully written. It controls the trade-off between write performance and durability.
19. What is an idempotent producer?
An idempotent producer in Kafka is a producer that prevents duplicate records when a message is retried.
Normally, if a producer sends a record but does not receive an acknowledgment because of a temporary network failure, it may retry sending the same record. Without idempotence, that retry can create duplicates. With idempotence enabled, Kafka can detect that the retried record is the same one and avoid writing it again.
Kafka does this by assigning the producer a producer ID and using sequence numbers for records sent to each partition. The broker uses these sequence numbers to detect duplicate retries.
An important interview point is: Idempotent producers protect against duplicates caused by producer retries within the same producer session. Idempotence does not mean Kafka automatically deduplicates all duplicate messages created by application logic or across producer restarts.
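The broker-side check can be sketched with a toy log for a single partition. PartitionWriter is a hypothetical class and deliberately simplified: it only rejects sequence numbers it has already stored for a producer ID, whereas a real broker applies stricter checks on the sequence.

```python
class PartitionWriter:
    """Toy broker-side log that drops retried sends it has already stored."""

    def __init__(self):
        self.log = []
        self._last_sequence = {}  # producer_id -> last sequence written

    def append(self, producer_id, sequence, value):
        if sequence <= self._last_sequence.get(producer_id, -1):
            return False  # duplicate retry: the record is already in the log
        self.log.append(value)
        self._last_sequence[producer_id] = sequence
        return True

writer = PartitionWriter()
writer.append(producer_id=7, sequence=0, value="OrderCreated")
# The acknowledgment is lost, so the producer retries the same sequence.
writer.append(producer_id=7, sequence=0, value="OrderCreated")
print(writer.log)  # ['OrderCreated']
```

A restarted producer would normally get a new producer ID, which is why this check alone does not cover duplicates across restarts.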
Interview-ready answer
An idempotent producer in Kafka is a producer that ensures retried messages are not written more than once. It uses producer IDs and sequence numbers so the broker can detect and ignore duplicate retries.
20. How does consumer group rebalancing work?
Consumer group rebalancing is the process Kafka uses to redistribute partitions among consumers in the same consumer group. A rebalance happens when the membership or assignment of the group changes. Common triggers are:
- a new consumer joins the group
- a consumer leaves the group
- a consumer crashes
- partitions are added to the topic
Kafka then recalculates which consumer should read which partitions.
How does it work?
During rebalancing, Kafka temporarily pauses normal consumption for the affected group. The group coordinator collects the active consumers, looks at the available partitions and creates a new partition assignment. Each consumer then releases its old assignments if necessary and starts consuming from the newly assigned partitions.
Example
Suppose a topic has 4 partitions and initially there are 2 consumers.
- Consumer A -> partitions 0 and 1
- Consumer B -> partitions 2 and 3
Now a third consumer joins. Kafka triggers a rebalance and may assign:
- Consumer A -> partitions 0 and 1
- Consumer B -> partition 2
- Consumer C -> partition 3
The goal is to distribute work across the available consumers.
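The rebalance in this example can be replayed with a short sketch that computes the assignment before and after Consumer C joins and reports which partitions changed owner. Both helpers are hypothetical, and the range-style assign function is only one possible strategy, so a real cluster may produce a different layout.

```python
def assign(num_partitions, consumers):
    """Range-style assignment: contiguous blocks, extras to the first ones."""
    base, extra = divmod(num_partitions, len(consumers))
    result, start = {}, 0
    for index, consumer in enumerate(sorted(consumers)):
        count = base + (1 if index < extra else 0)
        result[consumer] = list(range(start, start + count))
        start += count
    return result

def owner_of(assignment):
    """Invert {consumer: [partitions]} into {partition: consumer}."""
    return {p: c for c, parts in assignment.items() for p in parts}

before = owner_of(assign(4, ["A", "B"]))
after = owner_of(assign(4, ["A", "B", "C"]))  # consumer C joins the group
moved = {p: (before[p], after[p]) for p in before if before[p] != after[p]}
print(moved)  # {3: ('B', 'C')}
```

Only partition 3 changes owner here; the records in every partition stay exactly where they were.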
Why is it important?
Rebalancing provides scalability and fault tolerance.
If one consumer fails, another consumer in the group can take over its partitions.
An important interview point is: During a rebalance, partition ownership changes, not the data itself.
Interview-ready answer
Consumer group rebalancing is the process by which Kafka redistributes topic partitions among consumers in the same group when group membership changes. It ensures partitions remain assigned, balanced and available even when consumers join, leave or fail.
21. What causes consumer lag?
Consumer lag happens when producers write records faster than consumers can process them.
In simple terms, consumers start falling behind the latest records in the topic. Consumer lag is measured as the difference between the latest offset in a partition and the last committed offset of the consumer group. The most common causes are:
- slow message processing: the consumer takes too long to process each record
- traffic spikes: producers suddenly publish data faster than usual
- insufficient consumer capacity: there are not enough consumers or resources to keep up
- slow downstream systems: database writes, API calls or external services slow processing
- partition skew (hot partitions): some partitions receive much more traffic than others
- consumer group rebalancing: partition reassignment temporarily pauses consumption
An important interview point is: Consumer lag does not always mean Kafka is slow. Very often the bottleneck is the consumer application or downstream dependencies.
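The lag definition above translates directly into a subtraction per partition. The consumer_lag helper is hypothetical and the offsets are made-up illustrative numbers; the point is that lag is computed per partition, so one hot partition can dominate the total.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag: latest offset minus the group's committed offset."""
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in log_end_offsets.items()
    }

log_end = {0: 1500, 1: 1480, 2: 9000}    # partition 2 is a hot partition
committed = {0: 1495, 1: 1480, 2: 4000}
lag = consumer_lag(log_end, committed)
print(lag)                # {0: 5, 1: 0, 2: 5000}
print(sum(lag.values()))  # 5005
```

Looking at the per-partition numbers rather than the total is what separates "the whole group is too slow" from "one partition is skewed".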
Interview-ready answer
Consumer lag occurs when records are produced faster than consumers can process them. Common causes include slow processing logic, traffic spikes, insufficient consumer capacity, hot partitions, slow downstream systems and consumer group rebalancing.
22. What happens if a broker fails?
If a broker fails, Kafka checks whether that broker was the leader for any partitions. If it was, Kafka elects a new leader from the in-sync replicas (ISR) of those partitions. The new leader then starts handling reads and writes for that partition. Kafka normally selects new leaders from ISR members to preserve committed data. Producers and consumers automatically refresh metadata and reconnect to the new leader.
If replication is configured properly, the system usually continues operating with minimal disruption.
The quality of failover mainly depends on:
- replication factor: how many copies of the partition exist
- ISR health: whether follower replicas are sufficiently caught up
- acknowledgment settings (acks): how strongly writes were acknowledged before failure
An important interview point is: Kafka does not create new data during failover. It simply promotes an existing in-sync replica to leader. If no in-sync replica is available, the partition may become temporarily unavailable until a valid replica comes back.
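The failover rule can be sketched as a lookup over the replica list. The elect_leader function is a hypothetical simplification: it returns the first surviving replica that is still in the ISR, or None when no in-sync replica is left.

```python
def elect_leader(replicas, isr, failed_broker):
    """Pick the first surviving in-sync replica, or None if there is none."""
    for broker in replicas:
        if broker != failed_broker and broker in isr:
            return broker
    return None  # partition stays offline until an eligible replica returns

print(elect_leader(replicas=[1, 2, 3], isr={1, 2}, failed_broker=1))  # 2
print(elect_leader(replicas=[1, 2, 3], isr={1}, failed_broker=1))     # None
```

The second call is the unavailable case: broker 3 exists but is not in sync, so promoting it could lose acknowledged records.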
Interview-ready answer
If a Kafka broker fails, Kafka elects a new leader from the in-sync replicas of affected partitions. Producers and consumers reconnect to the new leader. If replication and ISR are healthy, the system continues operating with minimal interruption.
23. What is retention in Kafka?
Retention in Kafka is the rule that defines how long Kafka keeps records in a topic before they become eligible for deletion.
Kafka retains records according to configured limits rather than deleting them immediately after they are consumed. Retention policies control how long data remains available in the log. Retention is commonly based on:
- time: keep records for a configured duration
- total log size: keep records until the topic reaches a configured size limit
An important interview point is: Retention is independent of whether consumers have consumed the records.
Kafka does not check whether all consumers have read a record before deleting it. It simply applies the configured retention policy. That means a slow consumer may miss older records if they are removed because the retention limit was reached. This is one of Kafka's most distinctive design behaviors.
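A small sketch shows why consumer progress plays no role in retention. The apply_retention helper is hypothetical and works on individual records for readability (Kafka removes data in larger chunks of the log, and the limits come from topic settings such as retention.ms and retention.bytes). Notice that no consumer offset appears anywhere in its inputs.

```python
def apply_retention(records, now, max_age=None, max_bytes=None):
    """Drop the oldest records that exceed the age or size limit.

    `records` is a list of (timestamp, size_in_bytes) tuples, oldest first.
    Consumer progress is deliberately not an input.
    """
    kept = list(records)
    if max_age is not None:
        kept = [r for r in kept if now - r[0] <= max_age]
    if max_bytes is not None:
        while kept and sum(size for _, size in kept) > max_bytes:
            kept.pop(0)  # remove from the old end of the log
    return kept

log = [(0, 100), (50, 100), (90, 100)]
print(apply_retention(log, now=100, max_age=60))     # [(50, 100), (90, 100)]
print(apply_retention(log, now=100, max_bytes=150))  # [(90, 100)]
```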
Interview-ready answer
Retention in Kafka is the policy that determines how long records are kept in a topic before they are deleted. It is usually based on time or log size and it works independently of whether consumers have already read the records.
24. What is log compaction?
Log compaction is a Kafka retention mechanism that keeps the latest value for each message key instead of keeping every historical record forever.
For example:
- user1 -> active
- user1 -> suspended
- user1 -> deleted
After compaction, Kafka may retain only:
- user1 -> deleted
Kafka's log compaction guarantees that the log retains at least the latest value for each key.
An important interview point is: Log compaction is enabled on a topic, but it works per key. That means records with different keys are compacted independently.
Where is log compaction useful?
Log compaction is commonly used for:
- materialized views: keeping the latest state of an entity
- state recovery: rebuilding application state after restart
- changelog topics: maintaining state changes over time
Important nuance
- Compaction does not mean only one record exists immediately.
- Kafka performs compaction as a background cleanup process.
- Until compaction runs, older records may still remain in the log.
That nuance matters in interviews because many candidates assume Kafka removes older records instantly.
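The end result of a compaction pass can be sketched as "keep the last record per key". The compact function below is a hypothetical, in-memory simplification that ignores when the background process runs; it keeps the original offsets of the surviving records, since compaction does not renumber them.

```python
def compact(log):
    """Keep only the latest record per key, preserving original offsets."""
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = offset  # later records overwrite earlier ones
    return [(offset, *log[offset]) for offset in sorted(latest.values())]

log = [
    ("user1", "active"),
    ("user2", "active"),
    ("user1", "suspended"),
    ("user1", "deleted"),
]
print(compact(log))
# [(1, 'user2', 'active'), (3, 'user1', 'deleted')]
```

The user2 record survives untouched, which illustrates that keys are compacted independently of each other.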
Interview-ready answer
Log compaction in Kafka is a retention mechanism that keeps the latest record for each key while older records with the same key become eligible for background cleanup. It is commonly used for state recovery, materialized views and changelog topics.
25. What is the difference between partition key and offset?
A partition key and an offset serve completely different purposes in Kafka.
Partition key
A partition key is provided by the producer. Kafka uses it to decide which partition a record should be written to. For example, if the key is customerId, all records with the same key usually go to the same partition. This helps preserve ordering for related events. Kafka producers use keys to determine partition placement.
Offset
An offset is assigned by Kafka after the record is appended to a partition. It represents the position of that record inside that partition. Consumers use offsets to know which records have already been read and where they should continue reading from next.
An important interview point is: The partition key helps decide where a record goes. The offset tells where that record is stored.
Many candidates confuse these because both relate to partitions, but they operate at different stages.
Interview-ready answer
A partition key is provided by the producer to determine which partition should receive a record. An offset is assigned by Kafka after the record is written and consumers use it to track their reading position within that partition.
26. When should Kafka not be used?
Kafka is powerful, but it is not the right tool for every problem.
1. When you need synchronous request–response
Kafka is designed for asynchronous communication. A producer writes an event and continues. It does not naturally wait for an immediate response from the consumer.
If your use case requires an instant request and an immediate reply, such as login validation, fetching account details or synchronous service calls, a REST or RPC model usually fits better.
2. When you need immediate transactional database consistency
Kafka is built for event-driven asynchronous workflows. That means updates propagate through consumers after the event is published. If your requirement is immediate transactional consistency, where multiple operations must succeed or fail together in one transaction, Kafka is usually not the primary tool.
3. When the workload is small and operational complexity is not justified
Kafka introduces operational overhead. Running Kafka usually means managing brokers, partitions, replication, monitoring and consumer groups. For very small workloads, simple background jobs or low-volume internal communication, that operational cost may not be worth it.
An important interview point is: A good engineer knows where not to use a tool.
Interview-ready answer
Kafka should not be used when you need synchronous request–response, immediate transactional consistency or when the workload is too small to justify Kafka's operational complexity. It is most valuable for asynchronous, high-throughput event-driven systems.
27. Why can Kafka scale horizontally?
Kafka can scale horizontally because it stores data in partitions and distributes those partitions across multiple brokers. Instead of keeping all records of a topic on one server, Kafka splits the topic into smaller partitions. Each partition can live on a different broker in the cluster. Kafka uses partitions as the basic unit of data distribution and scalability.
As traffic grows, Kafka can add more brokers and spread partitions across them. This means storage, network traffic and processing load are shared across multiple machines rather than limited by a single server.
Partitions also allow parallel consumption. Different consumers in the same consumer group can read different partitions at the same time, which increases throughput.
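The effect of spreading partitions over more brokers can be shown with a toy placement function. This is a sketch under simplifying assumptions: assign_partitions and the broker names are hypothetical, replicas are ignored, and placement is plain round-robin. In a real cluster, existing partitions are not moved automatically when brokers are added; an explicit reassignment is needed.

```python
def assign_partitions(num_partitions, brokers):
    """Place partitions on brokers round-robin (toy model, leaders only)."""
    placement = {broker: [] for broker in brokers}
    for partition in range(num_partitions):
        placement[brokers[partition % len(brokers)]].append(partition)
    return placement


# 12 partitions on 3 brokers, then the same topic spread over 6 brokers.
three_brokers = assign_partitions(12, ["broker-1", "broker-2", "broker-3"])
six_brokers = assign_partitions(12, [f"broker-{i}" for i in range(1, 7)])

load_with_three = max(len(parts) for parts in three_brokers.values())
load_with_six = max(len(parts) for parts in six_brokers.values())
print(load_with_three, load_with_six)
```

Doubling the broker count halves the number of partitions each broker has to store and serve, which is the core of the horizontal scaling argument.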
An important interview point is: Kafka scales by adding more brokers and distributing partitions, not by making one broker bigger.
Interview-ready answer
Kafka scales horizontally because topics are divided into partitions that can be distributed across multiple brokers. As more brokers are added, Kafka can spread storage and processing across machines, which increases throughput and scalability.
28. Why can multiple consumers read the same data?
Multiple consumers can read the same data because Kafka keeps records after they are consumed.
When a consumer reads a record, Kafka does not remove it from the topic. The record remains available until the retention policy deletes it. Kafka stores data independently of consumption. Each consumer keeps track of its own reading position using offsets. That means one consumer reading a record does not affect another consumer's ability to read the same record later.
An important interview point is: Consumers do not share a single read position. Each consumer group maintains its own offsets.
Example
An OrderCreated event can be consumed by:
- billing service
- analytics service
- fraud detection service
- warehouse service
Each of these can read the same event independently because each has its own consumer group and its own offsets. Within a consumer group, partitions are divided among the members. Across different groups, the same records can be read independently.
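A minimal model of one partition makes the independence visible. The PartitionLog class below is hypothetical and simplifies commits by advancing a group's offset on every poll. The point it shows is that reading never removes records and that each group has its own position.

```python
class PartitionLog:
    """Toy model of one partition with a separate read position per group."""

    def __init__(self):
        self.records = []
        self.committed = {}  # group id -> next offset to read

    def append(self, record):
        self.records.append(record)

    def poll(self, group, max_records=10):
        start = self.committed.get(group, 0)
        batch = self.records[start:start + max_records]
        self.committed[group] = start + len(batch)  # simplified auto-commit
        return batch


log = PartitionLog()
log.append("OrderCreated(orderId=1)")
log.append("OrderCreated(orderId=2)")

billing_first = log.poll("billing", max_records=1)  # billing reads one record
analytics_all = log.poll("analytics")               # analytics still sees both
billing_rest = log.poll("billing")                  # billing resumes at its own offset
print(billing_first, analytics_all, billing_rest)
```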
Interview-ready answer
Multiple consumers can read the same data in Kafka because records remain stored after consumption and each consumer group tracks its own offsets. This allows independent consumers to read the same records at different times without affecting one another.
29. Can two consumers in the same group consume the same partition simultaneously?
No. Within a single consumer group, Kafka assigns each partition to only one consumer at a time.
This means two consumers in the same group cannot read the same partition simultaneously. Kafka uses this rule to preserve ordering within a partition and coordinate parallel consumption.
For example, if a topic has 4 partitions and a consumer group has 2 consumers, Kafka distributes the partitions between them. Each partition belongs to only one consumer in that group.
An important interview point is: The restriction applies only within the same consumer group. Different consumer groups can consume the same partition independently because each group maintains its own offsets.
Interview-ready answer
No. In Kafka, only one consumer within a consumer group can consume a given partition at a time. However, different consumer groups can consume the same partition independently.
30. If a topic has 8 partitions, how many active consumers can fully utilize it?
At most 8 consumers within one consumer group.
In Kafka, a partition can be assigned to only one consumer at a time within the same consumer group. That means the maximum parallelism of a consumer group is limited by the number of partitions. So if a topic has 8 partitions, up to 8 consumers can actively consume in parallel.
If you add more than 8 consumers, the extra consumers will remain idle because there are no unassigned partitions left.
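The partition limit can be checked with a toy assignment function. This is a sketch, not one of Kafka's actual assignors: the assign function and the consumer names are hypothetical, and it simply hands out partitions round-robin so that each partition has exactly one owner within the group.

```python
def assign(partitions, consumers):
    """Give each partition to exactly one consumer in the group (round-robin)."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        assignment[consumers[index % len(consumers)]].append(partition)
    return assignment


# 4 partitions, 2 consumers: each consumer owns 2 partitions.
small_group = assign(list(range(4)), ["c0", "c1"])

# 8 partitions, 10 consumers: two consumers receive nothing.
large_group = assign(list(range(8)), [f"c{i}" for i in range(10)])
idle = [consumer for consumer, owned in large_group.items() if not owned]
print(small_group, idle)
```

The idle consumers are not useless in practice: they can take over partitions if an active member of the group fails.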
An important interview point is: The number of partitions defines the maximum number of active consumers in a consumer group.
Interview-ready answer
If a Kafka topic has 8 partitions, at most 8 consumers in a single consumer group can actively consume from it. Any additional consumers will remain idle because Kafka assigns only one consumer per partition within a group.
31. If ordering is critical, what should you do?
If ordering is critical, use a stable partition key so that related events always go to the same partition.
Kafka guarantees ordering within a partition, not across multiple partitions. If related records are sent to different partitions, they may be processed in parallel and their overall order is not guaranteed. Kafka preserves order only at the partition level.
For example, if all events for the same order use orderId as the key, Kafka will consistently route those events to the same partition. That means events such as:
- OrderCreated
- PaymentProcessed
- OrderShipped
will be read in the same sequence in which they were written.
An important interview point is: If ordering matters, the key must remain stable. Changing the key can send related events to different partitions and break ordering.
Interview-ready answer
If ordering is critical, use a stable partition key so related events are routed to the same partition. Kafka guarantees ordering within a partition, so keeping related records together preserves their processing order.
32. What is the biggest misconception about Kafka?
The biggest misconception is that Kafka guarantees global ordering. It does not. Kafka guarantees ordering only within a partition. Records written to the same partition are stored and read in the order they were produced. Each record gets an increasing offset and consumers read that partition sequentially. Kafka documents ordering at the partition level, not across the whole topic. If a topic has multiple partitions, records may be distributed across different partitions and processed independently.
That means there is no global ordering across the entire topic. This is one of the most common interview mistakes because many candidates assume ordering applies to the whole topic.
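A small simulation shows the difference between per-partition and global ordering. The partition contents, the read_all helper and the two poll orders are all hypothetical. They model two consumers, or two runs, that happen to interleave reads from the partitions differently.

```python
# Two partitions, each holding the events of one order in write order.
partitions = {
    0: [("order-1", "OrderCreated"), ("order-1", "PaymentProcessed"), ("order-1", "OrderShipped")],
    1: [("order-2", "OrderCreated"), ("order-2", "PaymentProcessed")],
}


def read_all(partitions, poll_order):
    """Read one record per step from the partition named in poll_order."""
    positions = {partition: 0 for partition in partitions}
    seen = []
    for partition in poll_order:
        seen.append(partitions[partition][positions[partition]])
        positions[partition] += 1
    return seen


def events_for(key, records):
    return [event for record_key, event in records if record_key == key]


run_a = read_all(partitions, [0, 1, 0, 1, 0])  # alternating reads
run_b = read_all(partitions, [1, 1, 0, 0, 0])  # partition 1 drained first

# The overall sequences differ, but each order's events keep their sequence.
print(run_a != run_b, events_for("order-1", run_a) == events_for("order-1", run_b))
```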
Interview-ready answer
The biggest misconception about Kafka is that it guarantees global ordering. In reality, Kafka guarantees ordering only within a partition, not across multiple partitions.
33. How would you design an order event pipeline using Kafka?
I would create an orders topic and partition it by orderId.
Using orderId as the partition key ensures that all events for the same order such as OrderCreated, PaymentProcessed and OrderShipped always go to the same partition. Since Kafka guarantees ordering within a partition, the lifecycle of a single order remains ordered. Partitioning by entity key is a common Kafka design pattern for ordered event streams.
On the producer side, I would enable idempotence and use acks=all.
Idempotence helps prevent duplicate writes caused by retries, while acks=all makes the producer wait for acknowledgment from all in-sync replicas before considering the write successful. This improves durability and reduces the chance of data loss during failures.
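As a rough illustration, the producer settings could be collected like this. The keys enable.idempotence and acks are standard Kafka producer configuration names. The broker address is a placeholder, and how the dictionary is handed to a producer depends on the client library, which is not shown here.

```python
# Producer settings for the design above, written as a plain dictionary.
producer_config = {
    "bootstrap.servers": "localhost:9092",  # placeholder address, an assumption
    "enable.idempotence": True,             # broker discards duplicates from retries
    "acks": "all",                          # wait for all in-sync replicas
}

print(producer_config)
```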
Consumers would run in a consumer group so partitions can be processed in parallel.
Kafka would distribute partitions across consumers, allowing multiple orders to be processed concurrently while still preserving ordering for events that belong to the same orderId.
I would commit offsets only after successful processing.
That gives at-least-once delivery semantics. If a consumer crashes after processing but before committing, Kafka will re-deliver some records after restart. Because duplicates are possible, downstream consumers should be idempotent.
For example, if PaymentProcessed(orderId=123) is received twice, the payment service should detect that the order has already been processed and avoid applying the same business action twice.
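The crash-and-redeliver scenario can be sketched without a broker. Everything here is a hypothetical model: PaymentService keeps processed order IDs in memory (a real service would use durable storage), and consume simulates a crash by returning the old committed offset after processing a record.

```python
class PaymentService:
    """Idempotent handler: applies the business action once per orderId."""

    def __init__(self):
        self.processed_orders = set()
        self.charges = []

    def handle(self, event):
        order_id = event["orderId"]
        if order_id in self.processed_orders:
            return False  # duplicate delivery, skip the side effect
        self.charges.append(order_id)
        self.processed_orders.add(order_id)
        return True


def consume(records, committed_offset, service, crash_before_commit_at=None):
    """Process from the committed offset; commit only after processing succeeds."""
    for offset in range(committed_offset, len(records)):
        service.handle(records[offset])
        if offset == crash_before_commit_at:
            return committed_offset  # crash: record processed, offset not committed
        committed_offset = offset + 1
    return committed_offset


records = [
    {"type": "PaymentProcessed", "orderId": 123},
    {"type": "PaymentProcessed", "orderId": 124},
]
service = PaymentService()

committed = consume(records, 0, service, crash_before_commit_at=0)  # crashes after 123
committed = consume(records, committed, service)                    # 123 is redelivered
print(committed, service.charges)
```

Order 123 is delivered twice but charged once, which is the behavior at-least-once delivery requires from downstream handlers.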
For reliability, I would configure replication for the topic.
Each partition should have follower replicas on other brokers. If the leader broker fails, Kafka can elect a new leader from the in-sync replicas so the pipeline continues operating without losing committed data.
Interview-ready answer
I would create an orders topic partitioned by orderId so all events for the same order remain ordered. Producers would use idempotence and acks=all for durable writes. Consumers would run in a consumer group for parallel processing. Offsets would be committed after successful processing to achieve at-least-once delivery. Since retries can cause duplicates, downstream handlers should be idempotent. Replication should be configured so broker failures do not cause data loss.
