What is "MapReduce" and how does it work?
The term "MapReduce" refers to two separate and distinct tasks that Hadoop programs perform: a map task and a reduce task.
MapReduce is a programming paradigm that enables massive scalability across hundreds or thousands of servers in a Hadoop cluster. As the processing component, MapReduce is the heart of Apache Hadoop.
MapReduce is a programming model for processing large data sets with a parallel, distributed algorithm on a cluster (source: Wikipedia). MapReduce, when coupled with HDFS, can be used to handle big data.
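Conceptually, the map phase emits a (word, 1) pair for every word, the shuffle phase groups the pairs by word, and the reduce phase sums each group. The same flow can be sketched outside Hadoop in a few lines of plain Java (a minimal illustration of the idea, not Hadoop code; the `count` helper is invented for this sketch):

```java
import java.util.*;
import java.util.stream.*;

public class MiniMapReduce {
    // Map: emit (word, 1) for every word; shuffle: group by word; reduce: sum.
    static Map<String, Integer> count(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))  // map phase
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(word -> word,      // shuffle: group by key
                        TreeMap::new,
                        Collectors.summingInt(word -> 1)));       // reduce: sum the 1s
    }

    public static void main(String[] args) {
        System.out.println(count(List.of("big data is big", "data is everywhere")));
        // prints {big=2, data=2, everywhere=1, is=2}
    }
}
```

Hadoop does exactly this, except the map and reduce steps run as separate tasks on different machines and the shuffle happens over the network.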
Step 1 - Start the Cloudera Virtual Machine
Step 2 - Open Eclipse
Step 3 - Create a project (go to File > New > Java Project) with the name ‘Project4’ and click Finish
Step 4 - Add three new Java class files to ‘Project4’:
- Right click on Project4 > New > Class > enter class name WordCount > Finish
- Right click on Project4 > New > Class > enter class name WordMapper > Finish
- Right click on Project4 > New > Class > enter class name WordReducer > Finish
Step 5 - Copy the following code into the respective class .java files
WordCount.java
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.Job;

public class WordCount {
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.printf("Usage: WordCount <input dir> <output dir>\n");
            System.exit(-1);
        }
        @SuppressWarnings("deprecation")
        Job job = new Job();
        job.setJarByClass(WordCount.class); // entry point
        job.setJobName("Word Count");
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(WordMapper.class);
        job.setReducerClass(WordReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        boolean success = false;
        try {
            success = job.waitForCompletion(true); // submit the job and wait
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        System.exit(success ? 0 : 1);
    }
}
WordMapper.java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        for (String word : line.split(" ")) {
            if (word.length() > 0) {
                // intermediate output: emit (word, 1) for each word
                context.write(new Text(word), new IntWritable(1));
            }
        }
    }
}
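The mapper's core logic is just split-on-spaces plus an empty-token filter. That logic can be exercised outside Hadoop with a plain-Java sketch (the `tokenize` helper below is invented for illustration, not part of the Hadoop API):

```java
import java.util.*;

public class MapperSketch {
    // Mirrors WordMapper.map(): split a line on single spaces and keep
    // only non-empty tokens; each token would be emitted with count 1.
    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String word : line.split(" ")) {
            if (word.length() > 0) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // A doubled space produces an empty token, which the length check drops.
        System.out.println(tokenize("hello  hadoop world"));  // [hello, hadoop, world]
    }
}
```

This is why the `word.length() > 0` check matters: without it, runs of consecutive spaces would be counted as words.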
WordReducer.java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int wordCount = 0;
        for (IntWritable value : values) {
            wordCount += value.get();
        }
        context.write(key, new IntWritable(wordCount));
    }
}
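The reducer's job reduces to a per-key sum: for each word, the shuffle phase delivers all the 1s the mappers emitted, and the reducer adds them up. A plain-Java sketch of that loop (the `sum` helper is invented for illustration, not part of the Hadoop API):

```java
import java.util.*;

public class ReducerSketch {
    // Mirrors WordReducer.reduce(): sum all the counts received for one key.
    static int sum(Iterable<Integer> values) {
        int wordCount = 0;
        for (int value : values) {
            wordCount += value;
        }
        return wordCount;
    }

    public static void main(String[] args) {
        // If three mappers each emitted ("hadoop", 1), the reducer receives
        // ("hadoop", [1, 1, 1]) and writes ("hadoop", 3).
        System.out.println(sum(List.of(1, 1, 1)));  // 3
    }
}
```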
Step 6 - Add the Hadoop and MapReduce libraries to the project
Right click on Project4 > Build Path > Configure Build Path
Click on Add External JARs > File System > usr > lib > hadoop > select all .jar files and click on OK
Then click on Add External JARs > File System > usr > lib > hadoop-0.20-mapreduce > select all .jar files and click on OK
After adding the external JARs, click on OK
Step 7 - Export the .jar file by following these steps:
a) Right click on the project Project4
b) Click on Export; a new window will open
c) Open/expand the ‘Java’ folder in that window
d) Choose ‘JAR file’
e) Then click on ‘Next’
f) In the next window, browse for the destination path where the .jar file will be saved
g) Choose your destination and enter a jar file name with the .jar extension, e.g. WCFile.jar
h) Then select the source WordCount file from the JAR Export window
i) Click on Next
j) Then check:
Export class files with compile errors,
Export class files with compile warnings, and
Build projects if not built automatically
k) In the next window, browse for the Main class (it will auto-detect the main class ‘WordCount’) and click on ‘OK’
l) Then ‘Finish’
m) In the ‘Save modified resources’ dialog, choose WordReducer.java and click on ‘OK’
n) Click on ‘OK’ in the JAR Export window
o) After exporting, the .jar file (e.g. WCFile.jar) will be at the chosen destination, e.g. the Hadoop desktop
Then create a sample text file with some content, e.g. myintro.txt, on the Hadoop desktop.
Then open the Hadoop terminal and run the following commands:
hdfs dfs -mkdir /input
hdfs dfs -put myintro.txt /input
hadoop jar WCFile.jar WordCount /input /output
hdfs dfs -cat /output/part-r-00000
Now you are done and will see the word counts for your file.
