Ubuntu 20.04 is a robust operating system and a common base for deploying data infrastructure that demands smooth transmission and fast, efficient processing of data streams. Today you will deploy one such system, because in this post you will learn how to install Apache Kafka on Ubuntu 20.04.
Apache Kafka is an open-source, cross-platform application developed by the Apache Software Foundation and specialized in stream processing. It allows you to publish, store, process, and subscribe to streams of records in real time. It is designed to ingest data streams from many sources and distribute them to many consumers.
Apache Kafka is an alternative to a traditional enterprise messaging system. It started as an internal system that LinkedIn developed to handle 1.4 billion messages per day.
The platform gained popularity thanks to large companies such as Netflix and Microsoft using it in their architectures. Kafka is written in Java and Scala, so a Java runtime has to be present on the system to run it.
Install Apache Kafka on Ubuntu 20.04
Apache Kafka is built with Java, so we have to install it before proceeding with any other steps.
So, open a terminal or connect to your server via SSH and update Ubuntu:
sudo apt update
sudo apt upgrade
Now install Java on Ubuntu.
sudo apt install default-jdk default-jre
The next step is to add a new user to the system so that Kafka can be managed by it.
sudo adduser kafka
The user you created has to be added to the sudo group so that it has sufficient permissions to run the program.
sudo adduser kafka sudo
Now that the kafka user is created and ready, you can log in to it using the su command:
su -l kafka
Downloading and installing Apache Kafka
Create a new folder to download the program into. I will call it kafka, but you can choose another name.
mkdir kafka
Now enter it, and from there, with the help of the wget command, download the latest stable version of the program:
cd kafka
wget https://downloads.apache.org/kafka/2.7.0/kafka_2.13-2.7.0.tgz
Sample Output:
--2021-04-15 23:13:07--  https://downloads.apache.org/kafka/2.7.0/kafka_2.13-2.7.0.tgz
Resolving downloads.apache.org (downloads.apache.org)... 2a01:4f8:10a:201a::2, 88.99.95.219
Connecting to downloads.apache.org (downloads.apache.org)|2a01:4f8:10a:201a::2|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 68583422 (65M) [application/x-gzip]
Saving to: ‘kafka_2.13-2.7.0.tgz’

kafka_2.13-2.7.0.tgz 100%[=====================================================================================>]  65.41M  3.08MB/s    in 20s

2021-04-15 23:13:27 (3.21 MB/s) - ‘kafka_2.13-2.7.0.tgz’ saved [68583422/68583422]
After that, unpack it using the tar command.
tar -xvzf kafka_2.13-2.7.0.tgz --strip 1
We now have the binaries on the system, but we have to do some configuration before we can use them.
Configuring Apache Kafka before using it
By default, Apache Kafka will not allow you to delete a topic. In this context, a topic is a category, group, or feed name to which messages can be published. So it is a good idea to change this.
To do this, open the server.properties file inside the config folder:
nano config/server.properties
Locate the delete.topic.enable directive and set it to true.
delete.topic.enable = true
In this same file, you can change the folder where Apache Kafka stores the data logs it generates (the messages themselves).
log.dirs=/home/kafka/logs
In this case, the logs folder will live inside the kafka user's home directory.
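If you point log.dirs at a directory that does not yet exist, it is worth creating it before the first start. The path below assumes the value set above:

```shell
# Create the data-log directory referenced by log.dirs.
# Assumes log.dirs=/home/kafka/logs, as configured above.
mkdir -p /home/kafka/logs
```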
Another thing we have to do is create a unit file so that Kafka can be managed as a system service. This will make it easier to start it, stop it, and check its status.
However, we have to start with Zookeeper, the service Kafka uses to manage cluster configuration and state.
To do this, create a new unit file for Zookeeper in the directory where systemd services are hosted:
sudo nano /etc/systemd/system/zookeeper.service
And add the following
[Unit]
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/kafka/bin/zookeeper-server-start.sh /home/kafka/kafka/config/zookeeper.properties
ExecStop=/home/kafka/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target
Save the changes and close the editor.
Now do the same for kafka.
sudo nano /etc/systemd/system/kafka.service
And add the following:
[Unit]
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
User=kafka
ExecStart=/bin/sh -c '/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/kafka/kafka.log 2>&1'
ExecStop=/home/kafka/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target
Again, save the changes and close the editor.
To apply the changes, just refresh the system daemon list.
sudo systemctl daemon-reload
And enable and start the Zookeeper and Kafka services. Since Kafka depends on Zookeeper, start Zookeeper first:
sudo systemctl enable zookeeper
sudo systemctl start zookeeper
sudo systemctl enable kafka
sudo systemctl start kafka
This will complete the installation.
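To verify that the broker actually works, you can run a quick smoke test from the installation directory (/home/kafka/kafka, as set up above). The topic name TestTopic below is just an example; the scripts and flags are the standard ones shipped with Kafka 2.7:

```shell
# Create a test topic on the local broker (default port 9092).
bin/kafka-topics.sh --create --topic TestTopic \
  --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1

# Publish a message to it...
echo "Hello from Kafka" | bin/kafka-console-producer.sh \
  --topic TestTopic --bootstrap-server localhost:9092

# ...and read it back (press Ctrl+C to stop the consumer).
bin/kafka-console-consumer.sh --topic TestTopic \
  --from-beginning --bootstrap-server localhost:9092
```

If the consumer prints the message you just produced, Kafka and Zookeeper are running correctly.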
Conclusion
Apache Kafka is a professional open-source solution for companies that need reliable, high-throughput data streaming. Its adoption by so many large companies is a good indication of how powerful and manageable it is.
So, share this post and leave us a comment.