Kafka Interview Questions
- What is Kafka? An open-source distributed message broker offering
scalability, durability, stream processing, zero data loss, high
throughput, fault tolerance, etc.
- What are the main components in Kafka?
- Topic - A stream of messages belonging to the same type
- Producer - Publishes messages to a topic
- Brokers - The server(s) where the published messages are stored
- Consumer - Subscribes to one or more topics and pulls data from the brokers
- What are the advantages of Kafka?
- High throughput - Handles high-volume data at high velocity without
requiring specialized hardware
- Low latency - Handles a large number of messages within milliseconds
- Fault tolerance - Resilient to node/machine failure within the cluster
- Durability - Messages are never lost; they are replicated and persisted to disk
- Scalability - Additional nodes can be added on the fly, without downtime
- Apache Kafka Streams vs Apache Spark Streaming

| Spark Streaming | Kafka Streams |
| --- | --- |
| Like a traditional queue, messages are gone once processing completes | Messages persist even after being processed |
| Not event-based | Works entirely on an event basis |
| Standalone framework | Can be embedded in a microservice, as it is just a library |
| Data stream is divided into micro-batches for processing | Processes records in real time, as the data streams in |
| A separate processing cluster is required | No separate processing cluster is required |
| Needs reconfiguration for scaling | Scales easily by just adding Java processes; no reconfiguration required |
| At-least-once semantics | Exactly-once semantics |
| Better at processing groups of rows (group-by, ML, window functions, etc.) | Provides true record-at-a-time processing; better for per-record functions such as parsing and data enrichment |
- What is Stream Processing? The continuous, real-time
flow of records, and the processing of those records as they arrive, is
called stream processing.
- What is the traditional method of message transfer?
Queuing - A pool of consumers reads messages from the server, and each
message goes to exactly one of them.
Publish-Subscribe - Messages are broadcast to all consumers.
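The difference between the two delivery models can be sketched with a small Python simulation (not Kafka code; the round-robin assignment is just one possible queuing policy):

```python
from itertools import cycle

def queueing(messages, consumers):
    """Traditional queuing: each message is delivered to exactly one
    consumer (here, round-robin over the consumer pool)."""
    deliveries = {c: [] for c in consumers}
    pool = cycle(consumers)
    for msg in messages:
        deliveries[next(pool)].append(msg)
    return deliveries

def publish_subscribe(messages, subscribers):
    """Publish-subscribe: every message is broadcast to all subscribers."""
    return {s: list(messages) for s in subscribers}

msgs = ["m1", "m2", "m3", "m4"]
print(queueing(msgs, ["c1", "c2"]))          # each message reaches one consumer
print(publish_subscribe(msgs, ["s1", "s2"])) # every subscriber sees every message
```

Kafka's consumer groups generalize both: consumers in the same group share a queue-like split of the partitions, while separate groups each receive the full stream.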
- What is the maximum size of a message that the Kafka
server can receive? About 1 MB by default; this is configurable.
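The 1 MB default is a broker setting, not a hard limit. A sketch of the relevant `server.properties` entries (values shown are illustrative, roughly the defaults):

```properties
# Largest record batch the broker will accept (default is about 1 MB)
message.max.bytes=1048588
# Followers must be able to fetch whatever the broker accepts
replica.fetch.max.bytes=1048588
```

Consumers may also need `max.partition.fetch.bytes` raised to match if larger messages are enabled.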
- What is Zookeeper in Kafka? Zookeeper is an open-source,
high-performance coordination service for distributed applications,
which Kafka uses. Once Zookeeper is down, Kafka cannot serve client
requests. Zookeeper's work includes leader election, distributed
synchronization, configuration management, detecting when a node joins
or leaves the cluster, and tracking node status in real time.
- Can we use Kafka without Zookeeper? Not in versions prior to Kafka
2.8; newer versions can run in KRaft mode, which removes the Zookeeper
dependency.
- How is a message consumed by a consumer in Kafka? The consumer
subscribes to topics and issues fetch requests to the brokers leading
the partitions it wants to consume, tracking its position in each
partition via offsets.
- How can you improve the throughput of a remote
consumer? Tune the socket buffer size to compensate for network latency.
- How can you get exactly-once messaging from Kafka?
Use a single writer per partition, and every time you get a network
error, check the last message in that partition to see if your last
write succeeded. Also add a primary key (UUID) to each message and
de-duplicate on the consumer.
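The consumer-side de-duplication step can be sketched as follows (a minimal Python illustration, not the Kafka client API; the message format is assumed):

```python
import uuid

def make_message(payload):
    # Attach a unique ID so duplicates can be detected downstream.
    return {"id": str(uuid.uuid4()), "payload": payload}

class DedupConsumer:
    """Consumer-side de-duplication: remember IDs already processed and
    skip any redelivered copies (turning at-least-once delivery into
    effectively-once processing)."""
    def __init__(self):
        self.seen = set()
        self.processed = []

    def consume(self, message):
        if message["id"] in self.seen:
            return False              # duplicate, skip it
        self.seen.add(message["id"])
        self.processed.append(message["payload"])
        return True

consumer = DedupConsumer()
m = make_message("order-42")
consumer.consume(m)
consumer.consume(m)                   # redelivery after a producer retry
print(consumer.processed)             # → ['order-42']
```

In production the `seen` set would need bounding (e.g. by time window) or an idempotent sink, since it grows without limit here.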
- Role of the offset: Messages contained in the
partitions are each assigned a unique sequential ID number called the
offset. The role of the offset is to uniquely identify every message
within a partition.
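A partition behaves like an append-only log where the offset is simply the record's position; a minimal Python sketch of that idea:

```python
class Partition:
    """Append-only log: each appended record receives the next offset."""
    def __init__(self):
        self.log = []

    def append(self, record):
        offset = len(self.log)    # offsets are dense, starting at 0
        self.log.append(record)
        return offset

    def read(self, offset):
        # The offset uniquely identifies a record within this partition.
        return self.log[offset]

p = Partition()
assert p.append("a") == 0
assert p.append("b") == 1
assert p.read(1) == "b"
```

Note that offsets are only unique within one partition; the triple (topic, partition, offset) identifies a record globally.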
- Leader and Follower in Kafka: Every partition in
Kafka has one server that plays the role of Leader, and other servers
that act as Followers. The Leader handles all read and write requests
for the partition, while the Followers passively replicate the Leader.
If the Leader fails, one of the Followers takes over as Leader. This
ensures load balancing across the servers.
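The leader/follower flow above can be sketched in a few lines of Python (a toy model, assuming instantaneous replication; real Kafka replicates asynchronously via follower fetch requests):

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.log = []

class PartitionGroup:
    """Leader takes all writes; followers passively copy the leader's
    log. If the leader dies, a follower is promoted."""
    def __init__(self, broker_names):
        self.replicas = [Replica(n) for n in broker_names]
        self.leader = self.replicas[0]

    def write(self, record):
        self.leader.log.append(record)
        for r in self.replicas:
            if r is not self.leader:
                r.log.append(record)      # passive replication

    def fail_leader(self):
        self.replicas.remove(self.leader)
        self.leader = self.replicas[0]    # a follower takes over

g = PartitionGroup(["broker-1", "broker-2", "broker-3"])
g.write("x")
g.fail_leader()
g.write("y")
print(g.leader.name, g.leader.log)        # → broker-2 ['x', 'y']
```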
- What is a Partitioning Key? The Partitioning Key
indicates the destination partition of the message. By default, a
hashing-based Partitioner is used to determine the partition ID given
the key.
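The hashing step can be sketched as below. Kafka's default partitioner actually uses murmur2; CRC-32 stands in for it here, but the key property is the same: equal keys always map to the same partition, preserving per-key ordering.

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition ID (sketch of a hashing-based
    partitioner; Kafka's real default uses murmur2, not CRC-32)."""
    return zlib.crc32(key) % num_partitions

# Same key, same partition, every time:
assert partition_for(b"customer-17", 6) == partition_for(b"customer-17", 6)
```

Messages with no key are instead spread across partitions (round-robin or sticky, depending on the client version).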
- How do message brokers and API gateways like Kafka work in
microservices? Kafka's publish/subscribe model (including
pattern-based subscriptions) is an ideal way to notify other
microservices of "events" that happen in a Kafka stream/pipeline,
without having to wait for a reply. e.g. if a new customer is added,
you can publish a message to a topic so that the various microservices
interested in customer activity can consume it and act on it. The
microservice that publishes the message is not concerned with who
subscribes to it.
Kafka also supports durable consumers, so a service that goes offline
for a period of time can catch up on missed messages when it returns.
When you are communicating between microservices in a
request/response style, message queues can be useful. e.g. you might
want to implement some sort of priority queueing so that important
messages are seen first, or conflate adjacent messages in the queue so
that a microservice-based consumer does not have to do as much work.
This kind of "massaging" is difficult to do with a REST-based API
unless you are using an API gateway.
- Can we use Kafka as an API gateway in microservices?
Yes, you can use Kafka as an API gateway.
- What is the process for starting a Kafka server? Start Zookeeper
first, then start the Kafka broker (the commands are listed below).
- What can you do with Kafka?
- Transmit data between two systems
- Build a real-time streaming platform
- Use Kafka as an API gateway in microservices
- ________#####_______Smart questions______#####_____
- What is the purpose of the retention period in a Kafka
cluster? The Kafka cluster durably persists all published
records, whether or not they have been consumed, for a configurable
retention period. (KIP-186 increased the default offset retention time
from 1 day to 7 days.) The relevant properties are:
log.retention.hours --> how long a message is stored in a topic before
old log segments are discarded to free up space (default 168 hours,
i.e. 7 days)
log.retention.bytes --> maximum allowed size of the log per partition
before old segments are discarded (default -1, i.e. no size limit)
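As a broker config fragment, retention tuning might look like this (values shown are illustrative, roughly the defaults):

```properties
# Time-based retention: keep log segments for 7 days
log.retention.hours=168
# Size-based retention: -1 means no per-partition size limit
log.retention.bytes=-1
# Individual segment files roll at 1 GB
log.segment.bytes=1073741824
```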
- How to start Zookeeper and the Kafka server:
terminal zookeeper ==> bin/zookeeper-server-start.sh config/zookeeper.properties
terminal kafka server ==> bin/kafka-server-start.sh config/server.properties
- What is Apache Flume? Apache Flume is
distributed, reliable, and available software for efficiently
collecting, aggregating, and moving large amounts of log data.
- What is ISR in Kafka? Kafka
maintains a set of in-sync replicas (ISR) that are caught up to the
leader. Only members of this set are eligible for election as
leader. A write to a Kafka partition is not considered committed until
all in-sync replicas have received the write. This ISR set is
persisted to ZooKeeper whenever it changes. Because of this, any
replica in the ISR is eligible to be elected leader. This is an
important factor for Kafka's usage model, where there are many
partitions and ensuring leadership balance is important. With this
ISR model and f+1 replicas, a Kafka topic can tolerate f failures
without losing committed messages.
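The commit rule and the f+1 guarantee can be sketched with a toy Python model (assuming, for simplicity, that every ISR member acknowledges each write synchronously):

```python
class ISRPartition:
    """A write counts as committed only once every in-sync replica has
    it; with f + 1 replicas in the ISR, up to f failures lose no
    committed data, since any survivor holds the full committed log."""
    def __init__(self, replica_names):
        self.isr = {name: [] for name in replica_names}

    def write(self, record):
        for log in self.isr.values():
            log.append(record)    # committed only after all ISR acks
        return record

    def fail(self, name):
        self.isr.pop(name)        # ISR shrinks; survivors keep all commits

p = ISRPartition(["r1", "r2", "r3"])  # 3 replicas tolerate 2 failures
p.write("committed-1")
p.fail("r1")
p.fail("r2")
print(p.isr["r3"])                # → ['committed-1']
```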
- What is Log Compaction? Log compaction ensures
that Kafka will always retain at least the last known value for each
message key within the log of data for a single topic partition. It
addresses use cases such as restoring state after an application
crash or failure, or reloading caches after an application restarts
during operational maintenance.
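The effect of compaction on a keyed log can be shown with a short Python sketch (a simplified model; real compaction runs incrementally on old log segments, not over the whole log at once):

```python
def compact(log):
    """Log compaction: keep only the latest value for each key.
    Keys appear in the order they were first written (dict insertion
    order); later writes simply overwrite earlier ones."""
    latest = {}
    for key, value in log:
        latest[key] = value
    return list(latest.items())

log = [("user1", "a@x.com"), ("user2", "b@x.com"), ("user1", "c@x.com")]
print(compact(log))   # → [('user1', 'c@x.com'), ('user2', 'b@x.com')]
```

Replaying the compacted log restores exactly the latest state per key, which is why it suits cache reloading and state recovery.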
- What is Log Anatomy? Another way to view a
partition: basically, a data source writes messages to the log, and
one or more consumers read that data from the log at any
time they want.
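The "many independent readers of one log" view can be sketched as follows (toy Python, assuming a simple in-memory list as the log):

```python
log = ["r0", "r1", "r2", "r3"]    # the partition log, written by a producer

class Reader:
    """Each consumer keeps its own position in the log and advances
    independently of every other consumer."""
    def __init__(self):
        self.position = 0

    def poll(self, log):
        records = log[self.position:]
        self.position = len(log)
        return records

fast, slow = Reader(), Reader()
assert fast.poll(log) == ["r0", "r1", "r2", "r3"]
log.append("r4")                  # the log keeps growing
assert fast.poll(log) == ["r4"]   # fast reader sees only the new record
assert slow.poll(log) == ["r0", "r1", "r2", "r3", "r4"]
```

Because readers only track positions, adding a consumer never disturbs the log or the other consumers.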
- Disadvantages of Apache Kafka
- Very few, e.g.:
- Does not support wildcard topic selection the way some other message brokers do
- What is Topic Replication Factor?