
Execute a Java MapReduce sample using Eclipse


To create a new Eclipse project:

  • Create a new Java project in Eclipse.

  • Select “Java Project” and click the “Next” button.

  • Enter the project name and click the “Finish” button.

The new Java project is created in Eclipse.

To create a new Java class:

  • Once the project is created, add a class file by right-clicking the “src” folder in the project and selecting “New” > “Class” from the menu.

  • Type the class name and click the “Finish” button.

  • The class is created under the “src” folder. Add your code to this class.

To add the dependencies to the project:

  • Right-click the project and select Build Path > Configure Build Path.

  • Click “Add External JARs”, browse to the required JAR files, and add them.

  • Add the following dependencies to build the project:
  1. hadoop-common-*.*.*.jar
  2. hadoop-mapreduce-client-core-*.*.*.jar

To create a MapReduce Java program:

A MapReduce program implements its Map and Reduce algorithms in a Mapper class and a Reducer class, respectively. Brief details about the Mapper and Reducer classes follow.

Mapper Class:

A mapper’s main job is to produce a list of key-value pairs to be processed later. It receives one key-value pair as its parameters and produces a list of new key-value pairs.

For Example:

For each input line, the mapper emits every word in the line as a key, paired with the value 1. A word that occurs twice is emitted twice.

Input: (aaa bbb ccc aaa)

Output: List(aaa 1, bbb 1, ccc 1, aaa 1)

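Code:

// These imports go at the top of the source file.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public static class MapClass extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Reused for every output pair to avoid allocating new objects.
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // Called once per input line: key is the byte offset, value is the line.
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
      String line = value.toString();
      // Split the line on whitespace and emit (word, 1) for each token.
      StringTokenizer itr = new StringTokenizer(line);
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }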
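To see what the mapper emits without starting Hadoop, the same tokenizing loop can be tried in plain Java. This is a minimal sketch, not part of the original sample: the class name MapSketch and the method emit are hypothetical, and the (word, 1) pairs are collected in a list instead of being written to a Hadoop context.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical stand-in for the mapper: no Hadoop classes are used.
class MapSketch {

    // Splits a line on whitespace and pairs every token with the count 1.
    static List<Map.Entry<String, Integer>> emit(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(itr.nextToken(), 1));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Prints one "word<TAB>1" line per token, in input order.
        for (Map.Entry<String, Integer> pair : emit("aaa bbb ccc aaa")) {
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```

Note that the default StringTokenizer delimiters are whitespace characters, so a comma-separated line would be treated as a single word.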

 

Shuffle Phase:

After the mapper and before the reducer, the shuffle and grouping phases take place. The shuffle phase ensures that every key-value pair with the same key goes to the same reducer. The grouping part then collects all the values that share a key into a single (key, list(values)) pair, which is what the reducer ultimately receives.
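The grouping step can be pictured with ordinary collections. The following is only an illustrative sketch: Hadoop performs this work across machines, whereas the hypothetical ShuffleSketch.group method below simply collects the values for each key in a sorted map on one machine.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-machine illustration of grouping by key.
class ShuffleSketch {

    // Collects the values of all pairs that share a key into one list.
    // TreeMap keeps the keys in sorted order.
    static Map<String, List<Integer>> group(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        pairs.add(new SimpleEntry<>("aaa", 1));
        pairs.add(new SimpleEntry<>("bbb", 1));
        pairs.add(new SimpleEntry<>("ccc", 1));
        pairs.add(new SimpleEntry<>("aaa", 1));
        System.out.println(group(pairs)); // prints {aaa=[1, 1], bbb=[1], ccc=[1]}
    }
}
```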

Reducer Class:

The reducer’s job is to take a (key, list(values)) pair, operate on the grouped values, and store the result somewhere. In this example it loops through the values, adds them up, and sends the new key-value pair to the output.

 

For Example:

Input: [(aaa,List(1,1)),(bbb,List(1)),(ccc,List(1))]

Output: [(aaa,2),(bbb,1),(ccc,1)]

 

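Code:

// These imports go at the top of the source file.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

  public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    // Called once per key with all the counts emitted for that key.
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable value : values) {
        sum += value.get();
      }
      // Emit the word together with its total count.
      context.write(key, new IntWritable(sum));
    }
  }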
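The summing logic can be checked in isolation against the example input and output given for the reducer. This is a minimal sketch that assumes plain Integer values instead of IntWritable; ReduceSketch, sum, and reduceAll are hypothetical names, not Hadoop APIs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the reducer: no Hadoop classes are used.
class ReduceSketch {

    // Adds up the counts grouped under one key, like the loop in reduce().
    static int sum(Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Applies sum to every (key, list(values)) pair.
    static Map<String, Integer> reduceAll(Map<String, List<Integer>> grouped) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            totals.put(entry.getKey(), sum(entry.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        grouped.put("aaa", Arrays.asList(1, 1));
        grouped.put("bbb", Arrays.asList(1));
        grouped.put("ccc", Arrays.asList(1));
        System.out.println(reduceAll(grouped)); // prints {aaa=2, bbb=1, ccc=1}
    }
}
```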

 

Main Method:

Create an instance of the Job class, set the Mapper and Reducer classes on it in the main() method, and then run the job.

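Code:

// These imports go at the top of the source file.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public static void main(String[] args) throws Exception
{
    String arguments[] = new String[2];
    //For remote cluster set remote host_name:port instead of localhost:9000
    arguments[0] = "hdfs://localhost:9000/Data/WarPeace.txt"; // Input HDFS File
    arguments[1] = "hdfs://localhost:9000/OutPut"; // Output directory

    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "WordCount");
    job.setJarByClass(WordCount.class);

    // Configure the job completely before submitting it.
    job.setMapperClass(MapClass.class);
    job.setReducerClass(Reduce.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileInputFormat.addInputPath(job, new Path(arguments[0]));
    FileOutputFormat.setOutputPath(job, new Path(arguments[1]));

    // Submit the job and wait for it; exit with a non-zero code on failure.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}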
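The main method above hard-codes the input and output paths. If you would rather supply them at run time, one option is to fall back to the defaults only when no program arguments are given. This is a hedged sketch: PathArgs and resolve are hypothetical helpers, not Hadoop APIs, and the default URIs are simply the ones used in the main method.

```java
// Hypothetical helper for choosing the job's input and output paths.
class PathArgs {

    static final String DEFAULT_INPUT = "hdfs://localhost:9000/Data/WarPeace.txt";
    static final String DEFAULT_OUTPUT = "hdfs://localhost:9000/OutPut";

    // Returns {input, output}: the first two program arguments when both
    // are supplied, otherwise the defaults.
    static String[] resolve(String[] args) {
        if (args != null && args.length >= 2) {
            return new String[] { args[0], args[1] };
        }
        return new String[] { DEFAULT_INPUT, DEFAULT_OUTPUT };
    }

    public static void main(String[] args) {
        String[] paths = resolve(args);
        System.out.println("Input:  " + paths[0]);
        System.out.println("Output: " + paths[1]);
    }
}
```

With such a helper, the two elements of the returned array would take the place of arguments[0] and arguments[1] in the main method.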

 

Required dependencies to execute the project:

The JAR files under the following folders are required to execute the MapReduce Java program:
  1. HADOOP_HOME\share\hadoop\common
  2. HADOOP_HOME\share\hadoop\common\lib
  3. HADOOP_HOME\share\hadoop\hdfs
  4. HADOOP_HOME\share\hadoop\yarn
  5. HADOOP_HOME\share\hadoop\mapreduce
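To check which JAR files those folders actually contain before adding them in Eclipse, a small stand-alone program can list them. This is a sketch under assumptions: HadoopJars and list are hypothetical names, only files directly inside each folder are listed, and the program expects the HADOOP_HOME environment variable to point at your Hadoop installation.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that lists the JAR files in the folders named above.
class HadoopJars {

    // The five folders from the list, relative to HADOOP_HOME.
    static final String[] FOLDERS = {
        "share/hadoop/common",
        "share/hadoop/common/lib",
        "share/hadoop/hdfs",
        "share/hadoop/yarn",
        "share/hadoop/mapreduce"
    };

    // Returns the *.jar files directly inside each folder; folders that
    // do not exist are skipped.
    static List<Path> list(Path hadoopHome) throws IOException {
        List<Path> jars = new ArrayList<>();
        for (String folder : FOLDERS) {
            Path dir = hadoopHome.resolve(folder);
            if (!Files.isDirectory(dir)) {
                continue;
            }
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jar")) {
                for (Path jar : stream) {
                    jars.add(jar);
                }
            }
        }
        return jars;
    }

    public static void main(String[] args) throws IOException {
        String home = System.getenv("HADOOP_HOME");
        if (home == null) {
            System.err.println("HADOOP_HOME is not set");
            return;
        }
        for (Path jar : list(Paths.get(home))) {
            System.out.println(jar);
        }
    }
}
```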

To run the project:

  • Once the build is successful, right-click the class and select Run As > Run Configurations... from the menu.

  • Double-click “Java Application” in the window that opens.

  • Navigate to the “Arguments” tab and enter the arguments in the space provided, if your MapReduce program takes arguments at run time.

  • Navigate to the “Classpath” tab, select “User Entries”, click “Add External JARs”, and add the dependencies.

  • After adding the dependencies, click the “Advanced” button, select “Add External Folder”, and click the “OK” button.

  • Select the Hadoop configuration file directory and click the “OK” button.

  • Navigate to the “Environment” tab, set “HADOOP_HOME”, click “Apply”, and then click “Run”.

When execution completes, the job logs are displayed in the Eclipse Console view.
