Installing HDFS on Windows 10
Apache Hadoop is an open-source framework that provides distributed storage and processing of large data sets. Hadoop Distributed File System (HDFS) is a distributed file system that is designed to store large data sets across multiple machines.
In this tutorial, we will go through the step-by-step process of installing HDFS on Windows 10.
Prerequisites
Before starting the installation process, make sure that you have the following prerequisites installed on your Windows 10 machine:
- Java (version 8 or later)
- Hadoop (version 3.3.0 or later; downloaded in Step 1)
Step 1: Download Hadoop
First, download the latest version of Hadoop from the official website (http://hadoop.apache.org/) and extract the archive to a folder on your machine (e.g. C:\hadoop).
Step 2: Set up Environment Variables
After downloading Hadoop, you need to set up environment variables in Windows to use the Hadoop commands.
Open the Control Panel and search for System.
Click on Edit the system environment variables.
In the System Properties window, click on the Environment Variables button.
Under System Variables, click on the New button and enter the following values:
- Variable Name: HADOOP_HOME
- Variable Value: the path where Hadoop is installed on your machine (e.g. C:\hadoop)
Edit the Path variable and add the following values:
- %HADOOP_HOME%\bin
- %HADOOP_HOME%\sbin
Click OK to save the changes.
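A quick way to confirm that the variables took effect is to inspect the environment from a newly opened Command Prompt. The following is a minimal Python sketch, not part of Hadoop itself; the function name check_hadoop_env is hypothetical, and it assumes that Windows has already expanded %HADOOP_HOME% in Path by the time the process starts.

```python
from pathlib import PureWindowsPath


def check_hadoop_env(env):
    """Return a list of problems found in an environment mapping.

    `env` is any dict-like object, for example os.environ.
    """
    home = env.get("HADOOP_HOME")
    if not home:
        return ["HADOOP_HOME is not set"]

    # Normalize Path entries so the comparison ignores case and trailing slashes.
    raw_path = env.get("Path", env.get("PATH", ""))
    entries = {str(PureWindowsPath(p)).lower() for p in raw_path.split(";") if p}

    problems = []
    for sub in ("bin", "sbin"):
        expected = str(PureWindowsPath(home) / sub).lower()
        if expected not in entries:
            problems.append(f"{expected} is missing from Path")
    return problems
```

Calling check_hadoop_env(os.environ) after importing os returns an empty list when both directories are on Path, and one message per missing entry otherwise.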
Step 3: Configure core-site.xml
Navigate to the Hadoop configuration directory (e.g. C:\hadoop\etc\hadoop).
Hadoop 3.x distributions normally ship a core-site.xml file in this directory already. If yours only contains a core-site.xml.template file, copy it and rename the copy to core-site.xml.
Open the core-site.xml file in a text editor and add the following property inside the <configuration> tag:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
This sets the default file system to HDFS, served from the local machine on port 9000. Newer Hadoop releases name this property fs.defaultFS; fs.default.name is a deprecated alias that is still accepted.
Save and close the core-site.xml file.
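If you would rather generate the file than edit it by hand, the structure is simple enough to build with a script. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name render_site_xml is hypothetical, and the property shown is the one used in this step. ET.indent requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def render_site_xml(properties):
    """Build the text of a Hadoop *-site.xml file from a name -> value mapping."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    ET.indent(root)  # pretty-print the tree in place (Python 3.9+)
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


# Print the content that this step places in core-site.xml.
print(render_site_xml({"fs.default.name": "hdfs://localhost:9000"}))
```

The returned text can be written to core-site.xml in the configuration directory. The same helper works for any other site file that follows the <configuration>/<property> layout.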
Step 4: Configure hdfs-site.xml
In the same directory, locate the hdfs-site.xml file. If only an hdfs-site.xml.template file is present, copy it and rename the copy to hdfs-site.xml.
Open the hdfs-site.xml file in a text editor and add the following properties inside the <configuration> tag:
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/hadoop/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/hadoop/hdfs/datanode</value>
</property>
This sets the replication factor to 1, which suits a single-machine setup, and configures the storage directories for the NameNode and the DataNode.
Save and close the hdfs-site.xml file.
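A typo in a site file usually surfaces only when the daemons start, so it can help to read the file back and check the three properties before moving on. The following Python sketch is one way to do that under the assumptions of this tutorial; read_site_properties and check_hdfs_site are hypothetical helper names, not Hadoop tools.

```python
import xml.etree.ElementTree as ET


def read_site_properties(xml_text):
    """Parse Hadoop site XML (str or bytes) into a dict of name -> value."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def check_hdfs_site(props):
    """Return a list of deviations from the single-node settings in this tutorial."""
    problems = []
    if props.get("dfs.replication") != "1":
        problems.append("dfs.replication is not 1")
    for key in ("dfs.namenode.name.dir", "dfs.datanode.data.dir"):
        # The tutorial writes both directories as file: URIs.
        if not props.get(key, "").startswith("file:"):
            problems.append(f"{key} is missing or not a file: URI")
    return problems
```

To use it, read hdfs-site.xml in binary mode, pass the bytes to read_site_properties, and pass the resulting dict to check_hdfs_site. An empty list means the file matches the settings above.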
Step 5: Format NameNode
Open Command Prompt as an administrator.
Navigate to the Hadoop bin directory (e.g. C:\hadoop\bin).
Run the following command to format the NameNode:
hdfs namenode -format
This will initialize the file system metadata and create the NameNode directory.
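Formatting is meant to be done once. Running it again on a directory that already holds metadata discards that metadata, so a small guard can be useful before re-running the command. The sketch below relies on the fact that a formatted NameNode directory contains a current\VERSION file; the helper name is_formatted is hypothetical, and the directory you pass in must be wherever dfs.namenode.name.dir resolves on your machine.

```python
from pathlib import Path


def is_formatted(name_dir):
    """Return True if the NameNode directory already holds formatted metadata."""
    # A formatted NameNode storage directory contains current/VERSION.
    return (Path(name_dir) / "current" / "VERSION").is_file()
```

For example, if the NameNode directory resolves to C:\hadoop\hdfs\namenode (an assumption that depends on the drive you run Hadoop from), is_formatted(r"C:\hadoop\hdfs\namenode") returning True indicates that the format step has already been done.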
Step 6: Start HDFS
In the same Command Prompt window, run the following command to start the NameNode and DataNode services:
start-dfs.cmd
This will start the HDFS cluster.
To check the status of the HDFS cluster, you can run the following command:
jps
This lists the Java processes running on your machine, which should include the NameNode and DataNode processes.
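The jps check can also be scripted. Each line of jps output consists of a process ID followed by the name of the main class, so a short parser is enough to report which daemons are missing. This is a minimal Python sketch; missing_daemons and run_jps are hypothetical names, and run_jps assumes that jps from your JDK is on Path.

```python
import subprocess


def missing_daemons(jps_output, expected=("NameNode", "DataNode")):
    """Return the expected daemon names that do not appear in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # Keep only lines shaped like "<pid> <main class name>".
        if len(parts) >= 2 and parts[0].isdigit():
            running.add(parts[1])
    return [name for name in expected if name not in running]


def run_jps():
    """Run jps and return its standard output as text."""
    return subprocess.run(
        ["jps"], capture_output=True, text=True, check=True
    ).stdout
```

Calling missing_daemons(run_jps()) returns an empty list when both daemons are up. Names are compared exactly, so a SecondaryNameNode process is not mistaken for the NameNode.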
Conclusion
Congratulations! You have successfully installed HDFS on Windows 10. You are now ready to start storing and processing large data sets with Hadoop.