Kafka Interview Question


  • What is Kafka?
  • Kafka is an open-source, distributed message broker and streaming platform that provides scalability, durability, stream processing, high throughput, fault tolerance, and zero data loss.
  • What are the main components in Kafka?
    • Topic - Stream of messages belonging to the same type
    • Producer - Publish messages to a topic
    • Brokers - Servers where the published messages are stored
    • Consumer - Subscribes to one or more topics and pulls data from the brokers
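    For illustration, a minimal Java producer sketch that publishes one message to a topic (assuming the standard kafka-clients library, a broker on localhost:9092, and a hypothetical topic named demo-topic):

        import java.util.Properties;
        import org.apache.kafka.clients.producer.KafkaProducer;
        import org.apache.kafka.clients.producer.ProducerRecord;

        public class SimpleProducer {
            public static void main(String[] args) {
                Properties props = new Properties();
                props.put("bootstrap.servers", "localhost:9092");   // broker(s) to connect to
                props.put("key.serializer",
                          "org.apache.kafka.common.serialization.StringSerializer");
                props.put("value.serializer",
                          "org.apache.kafka.common.serialization.StringSerializer");

                try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                    // Publish one record to the (hypothetical) "demo-topic" topic
                    producer.send(new ProducerRecord<>("demo-topic", "key-1", "hello kafka"));
                }
            }
        }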
  • What are the advantages of Kafka?
    • High throughput - Handles high-volume data at high velocity without requiring large hardware
    • Low latency - Handles a large number of messages with millisecond latency
    • Fault tolerance - Partition replication across brokers protects against node failures
    • Durability - Messages are persisted to disk and replicated, so they are never lost
    • Scalability - Additional nodes can be added on the fly without downtime
  • Apache Kafka vs Apache Spark
    • Message handling: Traditional queuing deletes messages from the end of the queue as soon as they are processed; in Kafka, messages persist even after being processed.
    • Event model: Spark Streaming is not purely event-based; Kafka Streams works completely on an event basis.
    • Deployment: Spark Streaming is a standalone framework; Kafka Streams is just a library, so it can run as part of a microservice.
    • Processing model: Spark divides the data stream into micro-batches for processing; Kafka Streams processes records in real time as they arrive.
    • Cluster: Spark requires a separate processing cluster; Kafka Streams does not.
    • Scaling: Spark needs reconfiguration to scale; Kafka Streams scales easily by just adding Java processes, with no reconfiguration required.
    • Delivery semantics: Spark Streaming offers at-least-once semantics; Kafka Streams offers exactly-once semantics.
    • Strengths: Spark Streaming is better at processing groups of rows (group-by, ML, window functions, etc.); Kafka Streams provides true record-at-a-time processing, which is better for functions like row parsing and data cleansing.
    • Indicative throughput: roughly 20,000 messages/second (Spark Streaming) vs 100,000 messages/second (Kafka).
  • What is Stream Processing?
  • Stream processing is the continuous, real-time flow of records and the processing of those records in a similar timeframe, as they arrive.
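    For illustration, a minimal Kafka Streams topology that transforms records one at a time as they arrive (a sketch assuming the kafka-streams library; the application id and topic names are hypothetical):

        import java.util.Properties;
        import org.apache.kafka.common.serialization.Serdes;
        import org.apache.kafka.streams.KafkaStreams;
        import org.apache.kafka.streams.StreamsBuilder;
        import org.apache.kafka.streams.StreamsConfig;
        import org.apache.kafka.streams.kstream.KStream;

        public class SimpleStreamProcessor {
            public static void main(String[] args) {
                Properties props = new Properties();
                props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app");
                props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
                props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
                props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

                StreamsBuilder builder = new StreamsBuilder();
                // Read each record from the input topic, transform it, and write it out immediately
                KStream<String, String> input = builder.stream("input-topic");
                input.mapValues(value -> value.toUpperCase())
                     .to("output-topic");

                new KafkaStreams(builder.build(), props).start();
            }
        }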
  • What is the traditional method of message transfer?
  • Queuing - A pool of consumers may read messages from the server, and each message goes to only one of them.
    Publish-Subscribe - Messages are broadcast to all consumers.
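    Kafka combines both models through consumer groups: consumers that share a group.id split the messages between them like a queue, while consumers in different groups each receive every message, like publish-subscribe. A small sketch of the relevant configuration (standard kafka-clients properties; the group name is illustrative):

        import java.util.Properties;
        import org.apache.kafka.clients.consumer.KafkaConsumer;

        public class GroupSemantics {
            // Consumers sharing a group.id split the topic's messages between them (queuing);
            // consumers with different group.ids each receive every message (publish-subscribe).
            static KafkaConsumer<String, String> consumerInGroup(String groupId) {
                Properties props = new Properties();
                props.put("bootstrap.servers", "localhost:9092");
                props.put("group.id", groupId);
                props.put("key.deserializer",
                          "org.apache.kafka.common.serialization.StringDeserializer");
                props.put("value.deserializer",
                          "org.apache.kafka.common.serialization.StringDeserializer");
                return new KafkaConsumer<>(props);
            }
        }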
  • What is the maximum size of a message that the Kafka server can receive?
  • Approximately 1 MB by default (configurable via the broker's message.max.bytes setting).
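    To send larger messages, the limit has to be raised consistently on the broker (message.max.bytes), producer, and consumer. A hedged sketch of the client-side settings (the values are examples only; the broker-side setting lives in server.properties and is not shown):

        import java.util.Properties;

        public class LargeMessageConfig {
            static Properties producerProps() {
                Properties props = new Properties();
                props.put("bootstrap.servers", "localhost:9092");
                // Producer must be allowed to send requests at least as large as the message
                props.put("max.request.size", "5242880");            // 5 MB (example value)
                return props;
            }

            static Properties consumerProps() {
                Properties props = new Properties();
                props.put("bootstrap.servers", "localhost:9092");
                // Consumer must be able to fetch batches at least as large as the message
                props.put("max.partition.fetch.bytes", "5242880");   // 5 MB (example value)
                return props;
            }
        }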
  • What is Zookeeper in Kafka?
  • Zookeeper is an open-source, high-performance coordination service for distributed applications, which Kafka uses. Once Zookeeper is down, Kafka cannot serve client requests. Zookeeper's work includes leader detection, distributed synchronization, configuration management, identifying when a node joins or leaves the cluster, and tracking node status in real time.
  • Can we use Kafka without Zookeeper?
  • No
  • How are messages consumed by a consumer in Kafka?
  • Consumers pull messages from the brokers: each consumer subscribes to one or more topics, issues fetch requests to the brokers that lead the partitions it is assigned, and tracks its position in each partition with an offset that it commits as it makes progress.
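    A minimal poll loop for illustration (assuming the standard kafka-clients library; the topic and group names are hypothetical):

        import java.time.Duration;
        import java.util.Collections;
        import java.util.Properties;
        import org.apache.kafka.clients.consumer.ConsumerRecord;
        import org.apache.kafka.clients.consumer.ConsumerRecords;
        import org.apache.kafka.clients.consumer.KafkaConsumer;

        public class SimpleConsumer {
            public static void main(String[] args) {
                Properties props = new Properties();
                props.put("bootstrap.servers", "localhost:9092");
                props.put("group.id", "demo-group");
                props.put("key.deserializer",
                          "org.apache.kafka.common.serialization.StringDeserializer");
                props.put("value.deserializer",
                          "org.apache.kafka.common.serialization.StringDeserializer");

                try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                    consumer.subscribe(Collections.singletonList("demo-topic"));
                    while (true) {
                        // Pull the next batch of records from the brokers
                        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                        for (ConsumerRecord<String, String> record : records) {
                            System.out.printf("partition=%d offset=%d value=%s%n",
                                              record.partition(), record.offset(), record.value());
                        }
                    }
                }
            }
        }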
  • How can you improve the throughput of a remote consumer?
  • Tune the socket buffer size so that it can accommodate the higher network latency of the remote link.
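    A hedged sketch of that tuning on the consumer side (receive.buffer.bytes is the standard kafka-clients property; the broker address and value are examples only):

        import java.util.Properties;

        public class RemoteConsumerTuning {
            static Properties consumerProps() {
                Properties props = new Properties();
                props.put("bootstrap.servers", "remote-broker:9092");   // hypothetical remote broker
                props.put("group.id", "remote-group");
                // A larger TCP receive buffer helps throughput over high-latency links;
                // -1 would use the operating system default instead.
                props.put("receive.buffer.bytes", "1048576");            // 1 MB (example value)
                props.put("key.deserializer",
                          "org.apache.kafka.common.serialization.StringDeserializer");
                props.put("value.deserializer",
                          "org.apache.kafka.common.serialization.StringDeserializer");
                return props;
            }
        }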
  • How can you get exactly-once messaging from Kafka?
  • Use a single writer per partition, and every time you get a network error, check the last message in that partition to see whether your last write succeeded.
    Alternatively, add a primary key (e.g. a UUID) to each message and de-duplicate on the consumer.
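    A sketch of the second approach, de-duplicating on the consumer by a unique record key (the in-memory "seen" set is illustrative only; a real service would persist the processed keys):

        import java.util.HashSet;
        import java.util.Set;
        import java.util.UUID;
        import org.apache.kafka.clients.consumer.ConsumerRecord;
        import org.apache.kafka.clients.producer.ProducerRecord;

        public class DeduplicationSketch {
            // Producer side: attach a unique id as the record key
            static ProducerRecord<String, String> withUniqueKey(String topic, String payload) {
                return new ProducerRecord<>(topic, UUID.randomUUID().toString(), payload);
            }

            // Consumer side: skip records whose key has already been processed
            private final Set<String> processedKeys = new HashSet<>();

            void handle(ConsumerRecord<String, String> record) {
                if (!processedKeys.add(record.key())) {
                    return;                       // duplicate delivery, ignore it
                }
                // ... process record.value() exactly once ...
            }
        }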
  • Role of the offset.
  • Messages contained in the partitions are assigned a unique ID number that is called the offset. The role of the offset is to uniquely identify every message within the partition.
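    Because the offset identifies a position inside a partition, a consumer can seek back to any retained message. A small sketch (the topic name and offset value are illustrative):

        import java.time.Duration;
        import java.util.Collections;
        import java.util.Properties;
        import org.apache.kafka.clients.consumer.KafkaConsumer;
        import org.apache.kafka.common.TopicPartition;

        public class SeekExample {
            public static void main(String[] args) {
                Properties props = new Properties();
                props.put("bootstrap.servers", "localhost:9092");
                props.put("group.id", "replay-group");
                props.put("key.deserializer",
                          "org.apache.kafka.common.serialization.StringDeserializer");
                props.put("value.deserializer",
                          "org.apache.kafka.common.serialization.StringDeserializer");

                try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                    TopicPartition partition = new TopicPartition("demo-topic", 0);
                    consumer.assign(Collections.singletonList(partition));
                    consumer.seek(partition, 42L);      // start reading from offset 42
                    consumer.poll(Duration.ofSeconds(1))
                            .forEach(r -> System.out.println(r.offset() + " -> " + r.value()));
                }
            }
        }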
  • Leader and Follower in Kafka
  • Every partition in Kafka has one server that plays the role of Leader, while the other servers act as Followers. The Leader handles all read and write requests for the partition, while the Followers passively replicate the Leader. If the Leader fails, one of the Followers takes over as Leader. This ensures load balancing across the servers.
  • What is a Partitioning Key?
  • The partitioning key indicates the destination partition of the message. By default, a hashing-based partitioner determines the partition ID from the key.
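    A small sketch showing that two records with the same key are routed to the same partition by the default hashing partitioner (the topic name is hypothetical; the partition is read back from the broker's acknowledgement metadata):

        import java.util.Properties;
        import org.apache.kafka.clients.producer.KafkaProducer;
        import org.apache.kafka.clients.producer.ProducerRecord;
        import org.apache.kafka.clients.producer.RecordMetadata;

        public class KeyedSendExample {
            public static void main(String[] args) throws Exception {
                Properties props = new Properties();
                props.put("bootstrap.servers", "localhost:9092");
                props.put("key.serializer",
                          "org.apache.kafka.common.serialization.StringSerializer");
                props.put("value.serializer",
                          "org.apache.kafka.common.serialization.StringSerializer");

                try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                    // Both records use the key "customer-42", so the default hashing
                    // partitioner routes them to the same partition.
                    RecordMetadata first  = producer.send(
                            new ProducerRecord<>("orders", "customer-42", "order created")).get();
                    RecordMetadata second = producer.send(
                            new ProducerRecord<>("orders", "customer-42", "order shipped")).get();
                    System.out.println(first.partition() + " == " + second.partition());
                }
            }
        }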
  • How do message brokers and API gateways such as Kafka work in microservices?
  • Kafka's publish/subscribe model and wildcard-based subscriptions are an ideal way to notify other microservices of "events" that happen in a Kafka stream/pipeline without having to wait for a reply. For example, if a new customer is added, you can publish a message to a topic so that the various microservices interested in customer activity can consume it and act on it. The microservice that publishes the message is not concerned with who subscribes to it.
    Durable consumers mean a service can go offline for a particular amount of time and still receive its messages when it comes back.
    When microservices communicate in a request/response style, message queues can also be useful: for example, you might implement priority queueing so that important messages are seen first, or conflate adjacent messages in the queue so that a consuming microservice has less work to do. This kind of "massaging" is difficult with a REST-based API unless you are using an API gateway.
  • Can we use Kafka as an API gateway in microservices?
  • Yes, you can use Kafka as an API gateway.
  • What is the process for starting a Kafka server?
  • Start Zookeeper first, then start the Kafka broker; the exact commands are shown under "How do you start Zookeeper and the Kafka server?" below.
  • What can you do with Kafka?
  • - You can transmit data between two systems
    - You can build a real-time streaming platform of data pipelines
    - You can use Kafka as an API gateway in microservices
  • _______ Smart questions _______
  • What is the purpose of the retention period in a Kafka cluster?
  • The Kafka cluster durably persists all published records, whether or not they have been consumed, using a configurable retention period. KIP-186 increased the default offset retention time from 1 day to 7 days. The main properties are:
    log.retention.hours --> how long a message is stored on a topic before old log segments are discarded to free up space (default 168 hours, i.e. 7 days)
    log.retention.bytes --> the maximum size a partition's log may grow to before old segments are deleted (no size limit by default)
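    Retention can also be set per topic (retention.ms / retention.bytes). A hedged sketch using the Java AdminClient's incrementalAlterConfigs (the topic name and values are illustrative):

        import java.util.Arrays;
        import java.util.Collection;
        import java.util.Map;
        import java.util.Properties;
        import org.apache.kafka.clients.admin.AdminClient;
        import org.apache.kafka.clients.admin.AlterConfigOp;
        import org.apache.kafka.clients.admin.ConfigEntry;
        import org.apache.kafka.common.config.ConfigResource;

        public class TopicRetentionExample {
            public static void main(String[] args) throws Exception {
                Properties props = new Properties();
                props.put("bootstrap.servers", "localhost:9092");

                try (AdminClient admin = AdminClient.create(props)) {
                    ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "orders");
                    // Keep messages for 7 days or until the partition log reaches ~1 GB
                    AlterConfigOp retentionMs = new AlterConfigOp(
                            new ConfigEntry("retention.ms", "604800000"), AlterConfigOp.OpType.SET);
                    AlterConfigOp retentionBytes = new AlterConfigOp(
                            new ConfigEntry("retention.bytes", "1073741824"), AlterConfigOp.OpType.SET);
                    Map<ConfigResource, Collection<AlterConfigOp>> updates =
                            Map.of(topic, Arrays.asList(retentionMs, retentionBytes));
                    admin.incrementalAlterConfigs(updates).all().get();
                }
            }
        }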
  • How do you start Zookeeper and the Kafka server?
  • Start Zookeeper first, then the Kafka broker, each in its own terminal:

        bin/zookeeper-server-start.sh config/zookeeper.properties
        bin/kafka-server-start.sh config/server.properties

  • What is Apache Flume?
  • Apache Flume is a distributed, reliable, and available software for efficiently collecting, aggregating, and moving large amounts of log data.
  • What is ISR in Kafka?
  • Kafka dynamically maintains a set of in-sync replicas (ISR) that are caught up to the leader. Only members of this set are eligible for election as leader. A write to a Kafka partition is not considered committed until all in-sync replicas have received the write. The ISR set is persisted to ZooKeeper whenever it changes, so any replica in the ISR is eligible to be elected leader. This is an important factor for Kafka's usage model, where there are many partitions and ensuring leadership balance is important. With this ISR model and F+1 replicas, a Kafka topic can tolerate F failures without losing committed messages.
  • What is Log Compaction?
  • Log compaction ensures that Kafka will always retain at least the last known value for each message key within the log of a single topic partition. It addresses use cases such as restoring state after an application crash or failure, or reloading caches after an application restarts during operational maintenance.
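    A hedged sketch of creating a compacted topic with the Java AdminClient (cleanup.policy=compact is the standard topic-level setting; the topic name is hypothetical):

        import java.util.Collections;
        import java.util.Map;
        import java.util.Properties;
        import org.apache.kafka.clients.admin.AdminClient;
        import org.apache.kafka.clients.admin.NewTopic;

        public class CompactedTopicExample {
            public static void main(String[] args) throws Exception {
                Properties props = new Properties();
                props.put("bootstrap.servers", "localhost:9092");

                try (AdminClient admin = AdminClient.create(props)) {
                    // Only the latest value per key is retained once old segments are compacted
                    NewTopic topic = new NewTopic("customer-latest-state", 3, (short) 1)
                            .configs(Map.of("cleanup.policy", "compact"));
                    admin.createTopics(Collections.singletonList(topic)).all().get();
                }
            }
        }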
  • What is Log Anatomy?
  • Log anatomy is another way to view a partition: as a log. A data source writes messages to the log, and one or more consumers read that data from the log at any time they want.
  • Disadvantages of Apache Kafka
  • - Limited monitoring tooling
    - Does not support wildcard topic selection the way some other message brokers do
  • What is Topic Replication Factor?
  • The replication factor defines how many copies of each partition are maintained across the brokers in the cluster. With a replication factor of N, a topic can tolerate up to N-1 broker failures without losing committed messages.
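    A hedged sketch that inspects a topic's replication factor, leader, and ISR with the Java AdminClient (the topic name is hypothetical):

        import java.util.Collections;
        import java.util.Properties;
        import org.apache.kafka.clients.admin.AdminClient;
        import org.apache.kafka.clients.admin.TopicDescription;

        public class DescribeTopicExample {
            public static void main(String[] args) throws Exception {
                Properties props = new Properties();
                props.put("bootstrap.servers", "localhost:9092");

                try (AdminClient admin = AdminClient.create(props)) {
                    TopicDescription description = admin
                            .describeTopics(Collections.singletonList("orders"))
                            .all().get().get("orders");
                    description.partitions().forEach(p ->
                            System.out.printf("partition=%d replicationFactor=%d leader=%s isr=%s%n",
                                    p.partition(), p.replicas().size(), p.leader(), p.isr()));
                }
            }
        }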
