Apache Kafka Cluster Architecture | Kafka Streams Series-2

Kadir Alan
5 min read · May 9, 2021

TL;DR | As the title suggests, this is the second article in the Kafka Streams Series. In the previous article, I covered the Apache Kafka Storage Architecture: the fundamentals of Apache Kafka, the related terminology and more. If you haven't read it yet, or need a refresher, you should check it out.
In this article, I'm going to look at why Kafka should run as a distributed system and how it does so. ZooKeeper, the Kafka controller and more!

By now, you have heard keywords such as "scalable system" and "distributed system", and you already know that Kafka is both. You can probably guess why Kafka needs them: it's all about the workload. If we have decided to use Apache Kafka, it usually means we are dealing with big, or at least relatively large, data most of the time. Real-time processing is therefore essential in a Kafka environment. The expectation is to work fully in real time, especially when working with Kafka Streams; we don't want to feel even a one-second delay.

Imagine having to handle that much data in real time.

Usually, we don't choose to work with a single Kafka broker. We'd rather run multiple Kafka broker instances at the same time and work with them. Recall the replication factor I mentioned in the previous article; it is tied to the number of Kafka instances. This group of Kafka brokers is called a Kafka Cluster: one or more brokers come together to form a cluster. The cluster gives us a highly configurable, scalable and distributed system. The main goal, of course, is to spread the workload fairly across partitions and replicas.
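To make that concrete, here is a minimal sketch using Kafka's Java AdminClient. The three-broker bootstrap list, the "orders" topic name and the partition count are just assumptions for illustration; the point is that a replication factor of 3 puts a copy of every partition on three different brokers in the cluster.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;

public class CreateReplicatedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumed: a local cluster of three brokers on these ports.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,
                  "localhost:9092,localhost:9093,localhost:9094");

        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions, replication factor 3: every partition is
            // stored on three different brokers of the cluster.
            NewTopic topic = new NewTopic("orders", 6, (short) 3);
            admin.createTopics(Collections.singleton(topic)).all().get();
            System.out.println("Topic created across the cluster.");
        }
    }
}
```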

Did you say fairly? Yes, I did. Don't worry, I'll cover it in the next article; I just want to plant the question in your mind. Keep it in the background: how does Kafka do that? Can we rely on Kafka's sword of justice?

We can work with a single broker in local development, but we will probably need more brokers in production, and we may need to add brokers again when the workload increases dramatically. All of these scenarios can happen, and neither the system nor Kafka should be caught off guard by them. Not to worry: Kafka already handles this; you just need to configure it.
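If you ever want to check how many brokers a cluster currently has after scaling it, a small AdminClient sketch like the one below can list them. The bootstrap address is an assumption; any reachable broker of the cluster will do.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.Node;

import java.util.Collection;
import java.util.Properties;

public class ListClusterBrokers {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumed: one broker of the cluster is reachable at this address.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // describeCluster() returns the brokers that are currently live.
            Collection<Node> brokers = admin.describeCluster().nodes().get();
            System.out.println("Live brokers: " + brokers.size());
            for (Node b : brokers) {
                System.out.println("  id=" + b.id() + " at " + b.host() + ":" + b.port());
            }
        }
    }
}
```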

Every community, group or society needs a manager, and a Kafka Cluster is no different. In most distributed systems there is a master node that makes sure the worker nodes run correctly, managing and maintaining them. But there is no built-in master in an Apache Kafka Cluster. Apache Kafka doesn't have a master-worker architecture; it's a masterless architecture.

So Kafka needs to hire a wise manager, and ZooKeeper was the best candidate. ZooKeeper manages and maintains the brokers in the cluster: it tracks them, knows which brokers have crashed and which have recently joined, and takes care of their lifecycle. In a nutshell, be sure you know that Apache Kafka does not run alone.

Apache Kafka works together with ZooKeeper.

Both Kafka and ZooKeeper have configuration files, and every Kafka broker has an identifier, a unique id, defined in its configuration file. Each side also has a connection config field so they can reach each other, and we should configure these before we start. Once a broker successfully connects, an ephemeral node is created in ZooKeeper for its unique broker id. These nodes are the brokers' reflections inside ZooKeeper: when a broker dies, its ephemeral node dies with it. The ephemeral nodes, i.e. the active brokers, live under the /brokers/ids path.
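You can see these reflections yourself. The sketch below uses the plain ZooKeeper Java client to list the ephemeral nodes under /brokers/ids; the connection string is an assumption for a local setup.

```java
import org.apache.zookeeper.ZooKeeper;

import java.util.List;

public class ListBrokerZnodes {
    public static void main(String[] args) throws Exception {
        // Assumed: ZooKeeper is reachable at localhost:2181.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 10_000, event -> {});
        try {
            // Each child is an ephemeral znode named after a live broker's id.
            List<String> brokerIds = zk.getChildren("/brokers/ids", false);
            System.out.println("Active broker ids: " + brokerIds);
            for (String id : brokerIds) {
                byte[] registration = zk.getData("/brokers/ids/" + id, false, null);
                System.out.println(id + " -> " + new String(registration));
            }
        } finally {
            zk.close();
        }
    }
}
```

When a broker shuts down or loses its ZooKeeper session, its child node under /brokers/ids simply disappears from this list.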

The controller role.

Something has to manage the state of partitions and replicas and perform administrative tasks such as reassigning partitions in the Kafka Cluster. But we said there is no master node to do that, and there still isn't. Every Kafka broker is willing to take on that responsibility, yet we need only one of them to do so; that broker is called the Controller. So Kafka needs to choose one broker for the job.
So we can split the brokers in a cluster into two groups: the controller broker and the others. The controller is the same as any other broker except for the extra responsibilities I described above. It is an elected broker.
The election is simple. The first broker to start is elected as the active controller of the cluster. If that controller dies for some unexpected reason, every other live broker tries to become the controller, and Kafka elects one of them; the rest receive an "already exists" exception. If the previous controller comes back, it will try to become the controller again, but it will receive the same exception, because there is already an active elected controller. Once this process completes, all other brokers follow the controller.

As I said earlier, brokers have reflections inside ZooKeeper; likewise, the controller has an ephemeral node at the /controller path in ZooKeeper.
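Conceptually, the election boils down to "whoever creates the ephemeral /controller znode first wins". The sketch below shows that idea at the ZooKeeper level; it is not Kafka's actual implementation, and the znode payload and broker id are made up, so treat it purely as an illustration (and don't run it against a real cluster).

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

import java.nio.charset.StandardCharsets;

public class ControllerElectionSketch {
    public static void main(String[] args) throws Exception {
        // Assumed: ZooKeeper at localhost:2181. The payload is simplified,
        // not the exact JSON Kafka writes to /controller.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 10_000, event -> {});
        byte[] claim = "{\"brokerid\": 1}".getBytes(StandardCharsets.UTF_8);
        try {
            // Ephemeral: the znode vanishes when the session dies,
            // which is what triggers a new election.
            zk.create("/controller", claim,
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            System.out.println("Won the election, acting as controller.");
        } catch (KeeperException.NodeExistsException e) {
            // The "already exists" case: another broker is the controller.
            System.out.println("A controller is already elected; following it.");
        } finally {
            zk.close();
        }
    }
}
```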

Let's clear up the confusion

The difference between a master-worker architecture and Kafka's controller design can be confusing.
The controller monitors the active broker sessions, knows each broker's status and keeps the cluster metadata. Yet it is not a single point of failure, as a master often is in other distributed systems: the controller is no different from the other brokers, and they are all equal.

Conclusion

Kafka's strength comes from being a distributed system: multiple brokers work together in perfect harmony inside a cluster. To do that it needs a manager, ZooKeeper. ZooKeeper keeps the metadata of the brokers in the cluster, follows their lifecycle and performs some administrative tasks together with the controller. The controller is the same as any other broker in the cluster except for some extra responsibilities; it is elected to coordinate the other brokers.

That's it. Thank you all for your interest and for reading. I'm going to talk about Apache Kafka's work distribution architecture in the next article. See you next time.
