How to Install HDFS on Linux Mint
In this tutorial, you will learn how to install HDFS (Hadoop Distributed File System) on Linux Mint. HDFS is a distributed file system designed to store very large datasets reliably and efficiently across the machines of a cluster. It is part of the Apache Hadoop project and underpins many big data applications.
Prerequisites
Before we begin, ensure that:
- You have Linux Mint installed and running.
- You have a user account with sudo privileges.
Step 1: Install Java
Hadoop requires Java to be installed on the system. You can check whether Java is already installed on your system by running the following command:
$ java -version
If Java is not installed, run the following command to install Java on your system:
$ sudo apt-get update
$ sudo apt-get install default-jdk
Verify that Java is installed by running the command:
$ java -version
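The check-then-install logic above can be sketched as one guarded script, safe to run whether or not Java is already present:

```shell
# Report the installed Java, or point at the install command if it is absent.
if command -v java >/dev/null 2>&1; then
  # java -version writes to stderr, so redirect it before capturing
  java_msg="Java found: $(java -version 2>&1 | head -n 1)"
else
  java_msg="Java not found; run: sudo apt-get install default-jdk"
fi
echo "$java_msg"
```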
Step 2: Download Hadoop
Download the Hadoop distribution from an Apache mirror (if the mirror below no longer carries version 3.3.1, the same file is kept permanently at https://archive.apache.org/dist/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz):
$ wget https://ftp.jaist.ac.jp/pub/apache/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
Extract the downloaded archive file by running the following command:
$ tar -xzvf hadoop-3.3.1.tar.gz
Move the extracted directory to the /usr/local directory:
$ sudo mv hadoop-3.3.1 /usr/local/hadoop
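It is good practice to verify the tarball before extracting it: Apache publishes a `.sha512` checksum file next to each release (e.g. hadoop-3.3.1.tar.gz.sha512 on the same mirror). The sketch below simulates the check with a local stand-in file so it is self-contained; note that some Apache checksum files use the BSD format, in which case `shasum -a 512 -c` parses them where `sha512sum -c` does not.

```shell
# Simulated checksum verification: create a stand-in archive, compute its
# checksum file, then verify it the same way you would verify the real
# hadoop-3.3.1.tar.gz against its published .sha512 file.
echo "stand-in for hadoop-3.3.1.tar.gz" > demo.tar.gz
sha512sum demo.tar.gz > demo.tar.gz.sha512
check_msg=$(sha512sum -c demo.tar.gz.sha512)
echo "$check_msg"   # demo.tar.gz: OK
rm -f demo.tar.gz demo.tar.gz.sha512
```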
Step 3: Configure Environment Variables
Hadoop requires some environment variables to be set up in order to run properly. These variables include HADOOP_HOME, HADOOP_INSTALL, HADOOP_MAPRED_HOME, and HADOOP_COMMON_HOME.
To configure these variables, open the ~/.bashrc file and add the following lines at the end of the file:
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Save and close the file, then run the following command to apply the changes:
$ source ~/.bashrc
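After sourcing ~/.bashrc you can confirm the variables took effect. A small check script (it re-exports the two key variables so it also works in a fresh shell):

```shell
# Sanity-check: HADOOP_HOME resolves, and PATH picks up its bin/ directory.
export HADOOP_HOME=/usr/local/hadoop
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) path_msg="PATH includes $HADOOP_HOME/bin" ;;
  *)                      path_msg="PATH is missing $HADOOP_HOME/bin" ;;
esac
echo "HADOOP_HOME=$HADOOP_HOME"
echo "$path_msg"
```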
Step 4: Configure Hadoop
Hadoop needs to be configured before being used. There are several configuration files located in the /usr/local/hadoop/etc/hadoop directory that need to be edited.
- core-site.xml: This file contains the configuration settings for Hadoop’s core components.
Open the file using your favorite text editor and add the following configuration:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
Note: fs.defaultFS is the current name of this setting; the older fs.default.name still works in Hadoop 3.x but is deprecated.
Save and close the file.
- hdfs-site.xml: This file contains the configuration settings for the Hadoop Distributed File System.
Open the file using your favorite text editor and add the following configuration:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/local/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/local/hadoop/hdfs/datanode</value>
  </property>
</configuration>
Save and close the file.
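Two details the configuration above depends on are easy to miss: Hadoop 3's start scripts require JAVA_HOME to be set in etc/hadoop/hadoop-env.sh, and the directories named in hdfs-site.xml must exist and be writable by the user running the daemons. A guarded sketch using this tutorial's paths (/usr/lib/jvm/default-java is the usual default-jdk location on Mint/Ubuntu; confirm yours with `readlink -f "$(which java)"`):

```shell
# HADOOP_HOME defaults to the path used throughout this tutorial.
HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"

# Point hadoop-env.sh at the JDK, if the Hadoop config directory exists.
if [ -d "$HADOOP_HOME/etc/hadoop" ]; then
  echo 'export JAVA_HOME=/usr/lib/jvm/default-java' >> "$HADOOP_HOME/etc/hadoop/hadoop-env.sh"
fi

# Create the NameNode/DataNode storage directories from hdfs-site.xml.
if mkdir -p "$HADOOP_HOME/hdfs/namenode" "$HADOOP_HOME/hdfs/datanode" 2>/dev/null; then
  setup_msg="storage directories ready under $HADOOP_HOME/hdfs"
else
  setup_msg="cannot write $HADOOP_HOME/hdfs; create it with sudo mkdir -p, then sudo chown -R \$USER $HADOOP_HOME/hdfs"
fi
echo "$setup_msg"
```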
Step 5: Format HDFS NameNode
Before starting HDFS for the first time, format the NameNode. This initializes its metadata directory and only needs to be done once:
$ hdfs namenode -format
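Because formatting wipes any existing NameNode metadata, it should not be repeated on a running installation. A guarded sketch using the tutorial's paths, which skips the format if the metadata directory already exists:

```shell
# Only format a NameNode directory that has not been initialized yet.
NN_DIR="${NN_DIR:-/usr/local/hadoop/hdfs/namenode}"
if [ -d "$NN_DIR/current" ]; then
  fmt_msg="NameNode already formatted ($NN_DIR/current exists); skipping"
elif command -v hdfs >/dev/null 2>&1; then
  hdfs namenode -format
  fmt_msg="formatted $NN_DIR"
else
  fmt_msg="hdfs not on PATH; finish Step 3 first"
fi
echo "$fmt_msg"
```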
Step 6: Start Hadoop Services
Start the Hadoop services by running the following commands:
$ start-dfs.sh
$ start-yarn.sh
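Note that start-dfs.sh launches each daemon over ssh, even on a single machine, so it will prompt for a password unless key-based login to localhost is configured. A sketch (assumes openssh-client and openssh-server are installed; it leaves any existing key untouched):

```shell
# Set up passwordless ssh to localhost for the Hadoop start scripts.
if command -v ssh-keygen >/dev/null 2>&1; then
  mkdir -p ~/.ssh && chmod 700 ~/.ssh
  # Generate a key only if one does not already exist
  [ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
  cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  chmod 600 ~/.ssh/authorized_keys
  ssh_msg="key-based login to localhost configured"
else
  ssh_msg="ssh-keygen not found; install openssh-client and openssh-server first"
fi
echo "$ssh_msg"
```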
To stop the services, run the following commands:
$ stop-dfs.sh
$ stop-yarn.sh
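A quick way to confirm the daemons came up is jps, the JVM process lister that ships with the JDK. A guarded check:

```shell
# List Java processes and look for the HDFS daemons started by start-dfs.sh.
if command -v jps >/dev/null 2>&1; then
  jps_out=$(jps)
  case "$jps_out" in
    *NameNode*) jps_msg="HDFS daemons running: $jps_out" ;;
    *)          jps_msg="no NameNode in jps output; check the logs under \$HADOOP_HOME/logs" ;;
  esac
else
  jps_msg="jps not found; it is installed with default-jdk"
fi
echo "$jps_msg"
```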
Conclusion
Congratulations! You have successfully installed HDFS on Linux Mint. This single-node (pseudo-distributed) setup is a good starting point for storing and processing large datasets, and the same configuration files form the basis of a multi-node cluster.
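As a first exercise, the sketch below copies a local file into HDFS and lists it back. It is guarded so that it is a safe no-op until the daemons from Step 6 are actually running:

```shell
# Basic HDFS usage: create a home directory, upload a file, list it.
if command -v hdfs >/dev/null 2>&1 && hdfs dfs -ls / >/dev/null 2>&1; then
  hdfs dfs -mkdir -p "/user/$USER"
  hdfs dfs -put -f /etc/hostname "/user/$USER/"
  use_msg=$(hdfs dfs -ls "/user/$USER")
else
  use_msg="HDFS not reachable; start the daemons from Step 6 first"
fi
echo "$use_msg"
```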