Introduction to Hadoop
Hadoop is an open-source framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from a single server to thousands of machines, each offering local computation and storage.
Key Components of Hadoop
- Hadoop Distributed File System (HDFS): A distributed file system that stores data across multiple machines and provides high-throughput access to application data.
- MapReduce: A programming model for processing and generating large data sets with a parallel, distributed algorithm on a cluster.
- YARN (Yet Another Resource Negotiator): Manages and schedules resources in the Hadoop cluster.
- Hadoop Common: The common utilities and libraries that support other Hadoop modules.
Objective
Implement essential file management tasks in Hadoop, specifically:
- Adding files and directories
- Retrieving files from HDFS to the local filesystem
- Deleting files from HDFS
Procedure
Step 1: Adding Files and Directories to HDFS
Before running Hadoop programs on data stored in HDFS, the data needs to be added to HDFS. Let's start by creating a directory and adding a file to it.
- Create a directory in HDFS:
  hadoop fs -mkdir /user/myfile
  This command creates a new directory named myfile in the /user directory in HDFS.
- Add a file to HDFS:
  hadoop fs -put a.txt
  Because no destination is given, this command uploads the file a.txt from the local filesystem to your HDFS home directory (typically /user/<username>), not to the root directory.
- Add the file to the newly created directory:
  hadoop fs -put a.txt /user/myfile
  This command uploads the file a.txt from the local filesystem directly into the /user/myfile directory in HDFS.
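The commands in this step can be combined into a small reusable function. The following is a minimal sketch, assuming a running cluster and a local file to upload; the names HADOOP_BIN and upload_to_hdfs are chosen for this example and are not part of Hadoop.

```shell
#!/usr/bin/env bash
# Sketch: create an HDFS directory, upload a local file into it, and list it.
# HADOOP_BIN and upload_to_hdfs are illustrative names (assumptions).
HADOOP_BIN="${HADOOP_BIN:-hadoop}"

# upload_to_hdfs LOCAL_FILE HDFS_DIR
upload_to_hdfs() {
  local local_file="$1" hdfs_dir="$2"
  # -p creates missing parent directories and succeeds if the directory exists.
  "$HADOOP_BIN" fs -mkdir -p "$hdfs_dir" || return 1
  # Copy the local file into the HDFS directory.
  "$HADOOP_BIN" fs -put "$local_file" "$hdfs_dir" || return 1
  # List the directory so the upload can be confirmed by eye.
  "$HADOOP_BIN" fs -ls "$hdfs_dir"
}
```

Calling upload_to_hdfs a.txt /user/myfile reproduces the first and third commands of this step and then lists the directory. Note that -put fails if the destination file already exists; the -f option overwrites it.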
Step 2: Retrieving Files from HDFS
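To copy files from HDFS back to the local filesystem, use the get command. If you only want to view a.txt without copying it, use the cat command:
hadoop fs -cat a.txt
This command prints the contents of the file a.txt to the console. To actually copy the file to the local filesystem, use:
hadoop fs -get a.txt /local/path
Replace /local/path with the desired path on your local filesystem. Because a.txt is a relative path, both commands resolve it against your HDFS home directory; to work with the copy in the new directory, use /user/myfile/a.txt instead.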
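Retrieval can be made a little safer by checking that the file exists in HDFS before copying it. This is a sketch under the same assumptions as the commands above; HADOOP_BIN and fetch_from_hdfs are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: copy a file from HDFS to a local directory, only if it exists in HDFS.
# HADOOP_BIN and fetch_from_hdfs are illustrative names (assumptions).
HADOOP_BIN="${HADOOP_BIN:-hadoop}"

# fetch_from_hdfs HDFS_PATH LOCAL_DIR
fetch_from_hdfs() {
  local hdfs_path="$1" local_dir="$2"
  # "fs -test -e" exits with status 0 when the path exists in HDFS.
  if "$HADOOP_BIN" fs -test -e "$hdfs_path"; then
    # Make sure the local target directory exists before copying.
    mkdir -p "$local_dir"
    "$HADOOP_BIN" fs -get "$hdfs_path" "$local_dir"
  else
    echo "not found in HDFS: $hdfs_path" >&2
    return 1
  fi
}
```

For example, fetch_from_hdfs /user/myfile/a.txt ./downloads copies the file into a local downloads directory (a placeholder path) and returns a non-zero status if the file is missing from HDFS.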
Step 3: Deleting Files from HDFS
To delete a file from HDFS, use the rm command. Here’s how to
delete a.txt:
hadoop fs -rm a.txt
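This command removes the file a.txt from your HDFS home directory; the copy in /user/myfile is not affected. If the HDFS trash feature is enabled, the file is moved to the trash rather than deleted immediately.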
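Deleting a directory needs the recursive form of rm. The sketch below wraps both cases; HADOOP_BIN, delete_file, and delete_dir are names assumed for this example, not Hadoop commands.

```shell
#!/usr/bin/env bash
# Sketch: delete a single file or a whole directory from HDFS.
# HADOOP_BIN, delete_file, and delete_dir are illustrative names (assumptions).
HADOOP_BIN="${HADOOP_BIN:-hadoop}"

# delete_file HDFS_PATH: remove one file.
delete_file() {
  "$HADOOP_BIN" fs -rm "$1"
}

# delete_dir HDFS_DIR: remove a directory and everything under it.
# Adding -skipTrash after -r would bypass the trash, if trash is enabled.
delete_dir() {
  "$HADOOP_BIN" fs -rm -r "$1"
}
```

Here delete_file a.txt matches the command above, while delete_dir /user/myfile would remove the directory created in Step 1 together with its contents.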
Output
The successful execution of the above commands will result in the following:
- Creation of the /user/myfile directory in HDFS.
- Addition of a.txt to HDFS and then to /user/myfile.
- Retrieval of a.txt from HDFS to the local filesystem.
- Deletion of a.txt from HDFS.
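One way to confirm these outcomes is the test subcommand of hadoop fs, which reports through its exit status whether a path exists. The following is only a sketch; HADOOP_BIN and report_path are hypothetical names introduced here.

```shell
#!/usr/bin/env bash
# Sketch: report whether an HDFS path passes a "hadoop fs -test" check.
# HADOOP_BIN and report_path are illustrative names (assumptions).
HADOOP_BIN="${HADOOP_BIN:-hadoop}"

# report_path FLAG HDFS_PATH
# FLAG is -d (is a directory), -f (is a file), or -e (exists).
report_path() {
  if "$HADOOP_BIN" fs -test "$1" "$2"; then
    echo "ok: $2"
  else
    echo "missing: $2"
  fi
}
```

For instance, report_path -d /user/myfile checks that the directory exists, and report_path -e /user/myfile/a.txt checks for the uploaded file.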
These steps demonstrate the basic file management capabilities within Hadoop's HDFS, essential for any data processing tasks using Hadoop.
