How to Install HDFS on Linux Mint
In this tutorial, you will learn how to install HDFS (Hadoop Distributed File System) on Linux Mint. HDFS is a distributed file system designed to store very large datasets reliably and efficiently across the machines of a cluster. It is part of the Apache Hadoop project and underpins many big data applications.
Prerequisites
Before we begin, ensure that:
- You have Linux Mint installed and running.
- You have a user account with sudo privileges.
Step 1: Install Java
Hadoop requires Java to be installed on the system. You can check whether Java is already installed on your system by running the following command:
$ java -version
If Java is not installed, run the following command to install Java on your system:
$ sudo apt-get update
$ sudo apt-get install default-jdk
Verify that Java is installed by running the command:
$ java -version
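The check-then-install logic above can be sketched as one guarded script, safe to run whether or not Java is already present:

```shell
# Report the installed Java, or point at the install command if it is absent.
if command -v java >/dev/null 2>&1; then
  # java -version writes to stderr, so redirect it before capturing
  java_msg="Java found: $(java -version 2>&1 | head -n 1)"
else
  java_msg="Java not found; run: sudo apt-get install default-jdk"
fi
echo "$java_msg"
```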
Step 2: Download Hadoop
Download the Hadoop distribution from an Apache mirror (if the mirror below no longer carries version 3.3.1, the same file is kept permanently at https://archive.apache.org/dist/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz):
$ wget https://ftp.jaist.ac.jp/pub/apache/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
Extract the downloaded archive file by running the following command:
$ tar -xzvf hadoop-3.3.1.tar.gz
Move the extracted directory to the /usr/local directory:
$ sudo mv hadoop-3.3.1 /usr/local/hadoop
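It is good practice to verify the tarball before extracting it: Apache publishes a `.sha512` checksum file next to each release (e.g. hadoop-3.3.1.tar.gz.sha512 on the same mirror). The sketch below simulates the check with a local stand-in file so it is self-contained; note that some Apache checksum files use the BSD format, in which case `shasum -a 512 -c` parses them where `sha512sum -c` does not.

```shell
# Simulated checksum verification: create a stand-in archive, compute its
# checksum file, then verify it the same way you would verify the real
# hadoop-3.3.1.tar.gz against its published .sha512 file.
echo "stand-in for hadoop-3.3.1.tar.gz" > demo.tar.gz
sha512sum demo.tar.gz > demo.tar.gz.sha512
check_msg=$(sha512sum -c demo.tar.gz.sha512)
echo "$check_msg"   # demo.tar.gz: OK
rm -f demo.tar.gz demo.tar.gz.sha512
```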
Step 3: Configure Environment Variables
Hadoop requires some environment variables to be set up in order to run properly. These variables include HADOOP_HOME, HADOOP_INSTALL, HADOOP_MAPRED_HOME, and HADOOP_COMMON_HOME.
To configure these variables, open the ~/.bashrc file and add the following lines at the end of the file:
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Save and close the file, then run the following command to apply the changes:
$ source ~/.bashrc
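After sourcing ~/.bashrc you can confirm the variables took effect. A small check script (it re-exports the two key variables so it also works in a fresh shell):

```shell
# Sanity-check: HADOOP_HOME resolves, and PATH picks up its bin/ directory.
export HADOOP_HOME=/usr/local/hadoop
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) path_msg="PATH includes $HADOOP_HOME/bin" ;;
  *)                      path_msg="PATH is missing $HADOOP_HOME/bin" ;;
esac
echo "HADOOP_HOME=$HADOOP_HOME"
echo "$path_msg"
```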
Step 4: Configure Hadoop
Hadoop needs to be configured before being used. There are several configuration files located in the /usr/local/hadoop/etc/hadoop directory that need to be edited.
- core-site.xml: This file contains the configuration settings for Hadoop’s core components.
Open the file using your favorite text editor and add the following configuration:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
Note: fs.defaultFS is the current name of this setting; the older fs.default.name still works in Hadoop 3.x but is deprecated.
Save and close the file.
- hdfs-site.xml: This file contains the configuration settings for the Hadoop Distributed File System.
Open the file using your favorite text editor and add the following configuration:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/local/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/local/hadoop/hdfs/datanode</value>
  </property>
</configuration>
Save and close the file.
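Two details the configuration above depends on are easy to miss: Hadoop 3's start scripts require JAVA_HOME to be set in etc/hadoop/hadoop-env.sh, and the directories named in hdfs-site.xml must exist and be writable by the user running the daemons. A guarded sketch using this tutorial's paths (/usr/lib/jvm/default-java is the usual default-jdk location on Mint/Ubuntu; confirm yours with `readlink -f "$(which java)"`):

```shell
# HADOOP_HOME defaults to the path used throughout this tutorial.
HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"

# Point hadoop-env.sh at the JDK, if the Hadoop config directory exists.
if [ -d "$HADOOP_HOME/etc/hadoop" ]; then
  echo 'export JAVA_HOME=/usr/lib/jvm/default-java' >> "$HADOOP_HOME/etc/hadoop/hadoop-env.sh"
fi

# Create the NameNode/DataNode storage directories from hdfs-site.xml.
if mkdir -p "$HADOOP_HOME/hdfs/namenode" "$HADOOP_HOME/hdfs/datanode" 2>/dev/null; then
  setup_msg="storage directories ready under $HADOOP_HOME/hdfs"
else
  setup_msg="cannot write $HADOOP_HOME/hdfs; create it with sudo mkdir -p, then sudo chown -R \$USER $HADOOP_HOME/hdfs"
fi
echo "$setup_msg"
```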
Step 5: Format HDFS NameNode
Before starting HDFS for the first time, format the NameNode. This initializes its metadata directory and only needs to be done once:
$ hdfs namenode -format
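Because formatting wipes any existing NameNode metadata, it should not be repeated on a running installation. A guarded sketch using the tutorial's paths, which skips the format if the metadata directory already exists:

```shell
# Only format a NameNode directory that has not been initialized yet.
NN_DIR="${NN_DIR:-/usr/local/hadoop/hdfs/namenode}"
if [ -d "$NN_DIR/current" ]; then
  fmt_msg="NameNode already formatted ($NN_DIR/current exists); skipping"
elif command -v hdfs >/dev/null 2>&1; then
  hdfs namenode -format
  fmt_msg="formatted $NN_DIR"
else
  fmt_msg="hdfs not on PATH; finish Step 3 first"
fi
echo "$fmt_msg"
```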
Step 6: Start Hadoop Services
Start the Hadoop services by running the following commands:
$ start-dfs.sh
$ start-yarn.sh
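Note that start-dfs.sh launches each daemon over ssh, even on a single machine, so it will prompt for a password unless key-based login to localhost is configured. A sketch (assumes openssh-client and openssh-server are installed; it leaves any existing key untouched):

```shell
# Set up passwordless ssh to localhost for the Hadoop start scripts.
if command -v ssh-keygen >/dev/null 2>&1; then
  mkdir -p ~/.ssh && chmod 700 ~/.ssh
  # Generate a key only if one does not already exist
  [ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
  cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  chmod 600 ~/.ssh/authorized_keys
  ssh_msg="key-based login to localhost configured"
else
  ssh_msg="ssh-keygen not found; install openssh-client and openssh-server first"
fi
echo "$ssh_msg"
```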
To stop the services, run the following commands:
$ stop-dfs.sh
$ stop-yarn.sh
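A quick way to confirm the daemons came up is jps, the JVM process lister that ships with the JDK. A guarded check:

```shell
# List Java processes and look for the HDFS daemons started by start-dfs.sh.
if command -v jps >/dev/null 2>&1; then
  jps_out=$(jps)
  case "$jps_out" in
    *NameNode*) jps_msg="HDFS daemons running: $jps_out" ;;
    *)          jps_msg="no NameNode in jps output; check the logs under \$HADOOP_HOME/logs" ;;
  esac
else
  jps_msg="jps not found; it is installed with default-jdk"
fi
echo "$jps_msg"
```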
Conclusion
Congratulations! You have successfully installed HDFS on Linux Mint. This single-node (pseudo-distributed) setup is a good starting point for storing and processing large datasets, and the same configuration files form the basis of a multi-node cluster.
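As a first exercise, the sketch below copies a local file into HDFS and lists it back. It is guarded so that it is a safe no-op until the daemons from Step 6 are actually running:

```shell
# Basic HDFS usage: create a home directory, upload a file, list it.
if command -v hdfs >/dev/null 2>&1 && hdfs dfs -ls / >/dev/null 2>&1; then
  hdfs dfs -mkdir -p "/user/$USER"
  hdfs dfs -put -f /etc/hostname "/user/$USER/"
  use_msg=$(hdfs dfs -ls "/user/$USER")
else
  use_msg="HDFS not reachable; start the daemons from Step 6 first"
fi
echo "$use_msg"
```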