Kafka Interview Questions
- What is Kafka? An open-source distributed message broker offering
scalability, durability, stream processing, zero data loss, high
throughput, fault tolerance, etc.
- What are the main components in Kafka?
- Topic - A stream of messages belonging to the same type
- Producer - Publishes messages to a topic
- Brokers - The server(s) where the published messages are stored
- Consumer - Subscribes to one or more topics and pulls data from the brokers
- What are the advantages of Kafka?
- High throughput - Handles high-volume data at high velocity without
requiring specialized hardware
- Low latency - Handles a large number of messages within milliseconds
- Fault tolerance - Resilient to node/machine failure within the cluster
- Durability - Messages are never lost; they are replicated and persisted to disk
- Scalability - Additional nodes can be added on the fly, without downtime
- Apache Kafka Streams vs Apache Spark Streaming

| Spark Streaming | Kafka Streams |
| --- | --- |
| Like a traditional queue, messages are gone once processing completes | Messages persist even after being processed |
| Not event-based | Works entirely on an event basis |
| Standalone framework | Can be embedded in a microservice, as it is just a library |
| Data stream is divided into micro-batches for processing | Processes records in real time, as the data streams in |
| A separate processing cluster is required | No separate processing cluster is required |
| Needs reconfiguration for scaling | Scales easily by just adding Java processes; no reconfiguration required |
| At-least-once semantics | Exactly-once semantics |
| Better at processing groups of rows (group-by, ML, window functions, etc.) | Provides true record-at-a-time processing; better for per-record functions such as parsing and data enrichment |
- What is Stream Processing? The continuous, real-time
flow of records, and the processing of those records as they arrive, is
called stream processing.
- What is the traditional method of message transfer?
Queuing - A pool of consumers reads messages from the server, and each
message goes to exactly one of them.
Publish-Subscribe - Messages are broadcast to all consumers.
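The difference between the two delivery models can be sketched with a small Python simulation (not Kafka code; the round-robin assignment is just one possible queuing policy):

```python
from itertools import cycle

def queueing(messages, consumers):
    """Traditional queuing: each message is delivered to exactly one
    consumer (here, round-robin over the consumer pool)."""
    deliveries = {c: [] for c in consumers}
    pool = cycle(consumers)
    for msg in messages:
        deliveries[next(pool)].append(msg)
    return deliveries

def publish_subscribe(messages, subscribers):
    """Publish-subscribe: every message is broadcast to all subscribers."""
    return {s: list(messages) for s in subscribers}

msgs = ["m1", "m2", "m3", "m4"]
print(queueing(msgs, ["c1", "c2"]))          # each message reaches one consumer
print(publish_subscribe(msgs, ["s1", "s2"])) # every subscriber sees every message
```

Kafka's consumer groups generalize both: consumers in the same group share a queue-like split of the partitions, while separate groups each receive the full stream.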
- What is the maximum size of a message that the Kafka
server can receive? About 1 MB by default; this is configurable.
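The 1 MB default is a broker setting, not a hard limit. A sketch of the relevant `server.properties` entries (values shown are illustrative, roughly the defaults):

```properties
# Largest record batch the broker will accept (default is about 1 MB)
message.max.bytes=1048588
# Followers must be able to fetch whatever the broker accepts
replica.fetch.max.bytes=1048588
```

Consumers may also need `max.partition.fetch.bytes` raised to match if larger messages are enabled.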
- What is Zookeeper in Kafka? Zookeeper is an open-source,
high-performance coordination service for distributed applications,
which Kafka uses. Once Zookeeper is down, Kafka cannot serve client
requests. Zookeeper's work includes leader election, distributed
synchronization, configuration management, detecting when a node joins
or leaves the cluster, and tracking node status in real time.
- Can we use Kafka without Zookeeper? Not in versions prior to Kafka
2.8; newer versions can run in KRaft mode, which removes the Zookeeper
dependency.
- How is a message consumed by a consumer in Kafka? The consumer
subscribes to topics and issues fetch requests to the brokers leading
the partitions it wants to consume, tracking its position in each
partition via offsets.
- How can you improve the throughput of a remote
consumer? Tune the socket buffer size to compensate for network latency.
- How can you get exactly-once messaging from Kafka?
Use a single writer per partition, and every time you get a network
error, check the last message in that partition to see if your last
write succeeded. Also add a primary key (UUID) to each message and
de-duplicate on the consumer.
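The consumer-side de-duplication step can be sketched as follows (a minimal Python illustration, not the Kafka client API; the message format is assumed):

```python
import uuid

def make_message(payload):
    # Attach a unique ID so duplicates can be detected downstream.
    return {"id": str(uuid.uuid4()), "payload": payload}

class DedupConsumer:
    """Consumer-side de-duplication: remember IDs already processed and
    skip any redelivered copies (turning at-least-once delivery into
    effectively-once processing)."""
    def __init__(self):
        self.seen = set()
        self.processed = []

    def consume(self, message):
        if message["id"] in self.seen:
            return False              # duplicate, skip it
        self.seen.add(message["id"])
        self.processed.append(message["payload"])
        return True

consumer = DedupConsumer()
m = make_message("order-42")
consumer.consume(m)
consumer.consume(m)                   # redelivery after a producer retry
print(consumer.processed)             # → ['order-42']
```

In production the `seen` set would need bounding (e.g. by time window) or an idempotent sink, since it grows without limit here.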
- Role of the offset: Messages contained in the
partitions are each assigned a unique sequential ID number called the
offset. The role of the offset is to uniquely identify every message
within a partition.
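A partition behaves like an append-only log where the offset is simply the record's position; a minimal Python sketch of that idea:

```python
class Partition:
    """Append-only log: each appended record receives the next offset."""
    def __init__(self):
        self.log = []

    def append(self, record):
        offset = len(self.log)    # offsets are dense, starting at 0
        self.log.append(record)
        return offset

    def read(self, offset):
        # The offset uniquely identifies a record within this partition.
        return self.log[offset]

p = Partition()
assert p.append("a") == 0
assert p.append("b") == 1
assert p.read(1) == "b"
```

Note that offsets are only unique within one partition; the triple (topic, partition, offset) identifies a record globally.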
- Leader and Follower in Kafka: Every partition in
Kafka has one server that plays the role of Leader, and other servers
that act as Followers. The Leader handles all read and write requests
for the partition, while the Followers passively replicate the Leader.
If the Leader fails, one of the Followers takes over as Leader. This
ensures load balancing across the servers.
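The leader/follower flow above can be sketched in a few lines of Python (a toy model, assuming instantaneous replication; real Kafka replicates asynchronously via follower fetch requests):

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.log = []

class PartitionGroup:
    """Leader takes all writes; followers passively copy the leader's
    log. If the leader dies, a follower is promoted."""
    def __init__(self, broker_names):
        self.replicas = [Replica(n) for n in broker_names]
        self.leader = self.replicas[0]

    def write(self, record):
        self.leader.log.append(record)
        for r in self.replicas:
            if r is not self.leader:
                r.log.append(record)      # passive replication

    def fail_leader(self):
        self.replicas.remove(self.leader)
        self.leader = self.replicas[0]    # a follower takes over

g = PartitionGroup(["broker-1", "broker-2", "broker-3"])
g.write("x")
g.fail_leader()
g.write("y")
print(g.leader.name, g.leader.log)        # → broker-2 ['x', 'y']
```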
- What is a Partitioning Key? The Partitioning Key
indicates the destination partition of the message. By default, a
hashing-based Partitioner is used to determine the partition ID given
the key.
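The hashing step can be sketched as below. Kafka's default partitioner actually uses murmur2; CRC-32 stands in for it here, but the key property is the same: equal keys always map to the same partition, preserving per-key ordering.

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition ID (sketch of a hashing-based
    partitioner; Kafka's real default uses murmur2, not CRC-32)."""
    return zlib.crc32(key) % num_partitions

# Same key, same partition, every time:
assert partition_for(b"customer-17", 6) == partition_for(b"customer-17", 6)
```

Messages with no key are instead spread across partitions (round-robin or sticky, depending on the client version).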
- How do message brokers and API gateways like Kafka work in
microservices? Kafka's publish/subscribe model (including
pattern-based subscriptions) is an ideal way to notify other
microservices of "events" that happen in a Kafka stream/pipeline,
without having to wait for a reply. e.g. if a new customer is added,
you can publish a message to a topic so that the various microservices
interested in customer activity can consume it and act on it. The
microservice that publishes the message is not concerned with who
subscribes to it.
Kafka also supports durable consumers, so a service that goes offline
for a period of time can catch up on missed messages when it returns.
When you are communicating between microservices in a
request/response style, message queues can be useful. e.g. you might
want to implement some sort of priority queueing so that important
messages are seen first, or conflate adjacent messages in the queue so
that a microservice-based consumer does not have to do as much work.
This kind of "massaging" is difficult to do with a REST-based API
unless you are using an API gateway.
- Can we use Kafka as an API gateway in microservices?
Yes, you can use Kafka as an API gateway.
- What is the process for starting a Kafka server? Start Zookeeper
first, then start the Kafka broker (the commands are listed below).
- What can you do with Kafka?
- Transmit data between two systems
- Build a real-time streaming platform
- Use Kafka as an API gateway in microservices
- ________#####_______Smart questions______#####_____
- What is the purpose of the retention period in a Kafka
cluster? The Kafka cluster durably persists all published
records, whether or not they have been consumed, for a configurable
retention period. (KIP-186 increased the default offset retention time
from 1 day to 7 days.) The relevant properties are:
log.retention.hours --> how long a message is stored in a topic before
old log segments are discarded to free up space (default 168 hours,
i.e. 7 days)
log.retention.bytes --> maximum allowed size of the log per partition
before old segments are discarded (default -1, i.e. no size limit)
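As a broker config fragment, retention tuning might look like this (values shown are illustrative, roughly the defaults):

```properties
# Time-based retention: keep log segments for 7 days
log.retention.hours=168
# Size-based retention: -1 means no per-partition size limit
log.retention.bytes=-1
# Individual segment files roll at 1 GB
log.segment.bytes=1073741824
```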
- How to start Zookeeper and the Kafka server:
terminal zookeeper ==> bin/zookeeper-server-start.sh config/zookeeper.properties
terminal kafka server ==> bin/kafka-server-start.sh config/server.properties
- What is Apache Flume? Apache Flume is
distributed, reliable, and available software for efficiently
collecting, aggregating, and moving large amounts of log data.
- What is ISR in Kafka? Kafka
maintains a set of in-sync replicas (ISR) that are caught up to the
leader. Only members of this set are eligible for election as
leader. A write to a Kafka partition is not considered committed until
all in-sync replicas have received the write. This ISR set is
persisted to ZooKeeper whenever it changes. Because of this, any
replica in the ISR is eligible to be elected leader. This is an
important factor for Kafka's usage model, where there are many
partitions and ensuring leadership balance is important. With this
ISR model and f+1 replicas, a Kafka topic can tolerate f failures
without losing committed messages.
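The commit rule and the f+1 guarantee can be sketched with a toy Python model (assuming, for simplicity, that every ISR member acknowledges each write synchronously):

```python
class ISRPartition:
    """A write counts as committed only once every in-sync replica has
    it; with f + 1 replicas in the ISR, up to f failures lose no
    committed data, since any survivor holds the full committed log."""
    def __init__(self, replica_names):
        self.isr = {name: [] for name in replica_names}

    def write(self, record):
        for log in self.isr.values():
            log.append(record)    # committed only after all ISR acks
        return record

    def fail(self, name):
        self.isr.pop(name)        # ISR shrinks; survivors keep all commits

p = ISRPartition(["r1", "r2", "r3"])  # 3 replicas tolerate 2 failures
p.write("committed-1")
p.fail("r1")
p.fail("r2")
print(p.isr["r3"])                # → ['committed-1']
```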
- What is Log Compaction? Log compaction ensures
that Kafka will always retain at least the last known value for each
message key within the log of data for a single topic partition. It
addresses use cases such as restoring state after an application
crash or failure, or reloading caches after an application restarts
during operational maintenance.
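The effect of compaction on a keyed log can be shown with a short Python sketch (a simplified model; real compaction runs incrementally on old log segments, not over the whole log at once):

```python
def compact(log):
    """Log compaction: keep only the latest value for each key.
    Keys appear in the order they were first written (dict insertion
    order); later writes simply overwrite earlier ones."""
    latest = {}
    for key, value in log:
        latest[key] = value
    return list(latest.items())

log = [("user1", "a@x.com"), ("user2", "b@x.com"), ("user1", "c@x.com")]
print(compact(log))   # → [('user1', 'c@x.com'), ('user2', 'b@x.com')]
```

Replaying the compacted log restores exactly the latest state per key, which is why it suits cache reloading and state recovery.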
- What is Log Anatomy? Another way to view a
partition: basically, a data source writes messages to the log, and
one or more consumers read that data from the log at any
time they want.
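The "many independent readers of one log" view can be sketched as follows (toy Python, assuming a simple in-memory list as the log):

```python
log = ["r0", "r1", "r2", "r3"]    # the partition log, written by a producer

class Reader:
    """Each consumer keeps its own position in the log and advances
    independently of every other consumer."""
    def __init__(self):
        self.position = 0

    def poll(self, log):
        records = log[self.position:]
        self.position = len(log)
        return records

fast, slow = Reader(), Reader()
assert fast.poll(log) == ["r0", "r1", "r2", "r3"]
log.append("r4")                  # the log keeps growing
assert fast.poll(log) == ["r4"]   # fast reader sees only the new record
assert slow.poll(log) == ["r0", "r1", "r2", "r3", "r4"]
```

Because readers only track positions, adding a consumer never disturbs the log or the other consumers.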
- Disadvantages of Apache Kafka
- Very few, e.g.:
- Does not support wildcard topic selection the way some other message brokers do
- What is Topic Replication Factor?