To create a new Eclipse project:
Select “Java Project” and click the “Next” button.
Enter the project name and click the “Finish” button.
A new Java project will be created in Eclipse.
To create a new Java class:
Type the class name and click the “Finish” button.
The class will be created under the “src” folder. Add your code to the created class.
To add the dependencies to the project:
Click “Add External JARs”, browse to the required JAR files, and add them.
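One way to confirm that the JARs were picked up is to try loading a known class from them. The following is a minimal sketch, not part of the original steps: the class name ClasspathCheck is hypothetical, and the two Hadoop class names assume the Hadoop 2.x API (adjust them to the JARs you actually added).

```java
// Hypothetical helper: reports whether selected classes can be loaded
// from the project's classpath.
final class ClasspathCheck {

    // Returns true if the named class can be found by this class's loader.
    static boolean isOnClasspath(String className) {
        try {
            // 'false' avoids running static initializers; we only test presence.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed class names from the Hadoop 2.x API.
        String[] required = {
            "org.apache.hadoop.conf.Configuration",
            "org.apache.hadoop.mapreduce.Job"
        };
        for (String name : required) {
            System.out.println(name + (isOnClasspath(name) ? " found" : " MISSING"));
        }
    }
}
```

If a class is reported as missing, the JAR that contains it has probably not been added to the build path yet.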
To create a MapReduce Java program:
A MapReduce program contains the Map and Reduce algorithms in the Mapper and Reducer classes, respectively. Brief details about the Mapper and Reducer classes are as follows:
A mapper’s main job is to produce a list of key-value pairs to be processed later.
A mapper receives a key-value pair as parameters and produces a list of new key-value pairs.
For each input to the mapper, the generated list of key-value pairs consists of the input key combined with each of the comma-separated values.
Output: List(aaa 1, bbb 1, ccc 1, aaa 1)
After the mapper and before the reducer, the shuffling and combining phases take place. The shuffling phase ensures that every key-value pair with the same key goes to the same reducer. The combining phase groups all the key-value pairs that share a key into a single key, list(values) pair, which is what the reducer ultimately receives.
The reducer’s job is to take the key, list(values) pair, operate on the grouped values, and store the result somewhere. It loops through the values, concatenates them into a pipe-separated string, and sends the new key-value pair to the output.
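The flow described above can be sketched in plain Java, without Hadoop, to show how the data moves between the phases. This is only an illustration under stated assumptions: the class MapReduceSketch, its method names, and the sample records are hypothetical, and the tab between key and value in the output is a choice made for this sketch.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory imitation of the map, shuffle, and reduce phases.
final class MapReduceSketch {

    // Map: pair the input key with each of the comma-separated values.
    static List<Map.Entry<String, String>> map(String key, String commaSeparatedValues) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (String value : commaSeparatedValues.split(",")) {
            pairs.add(new SimpleEntry<>(key, value.trim()));
        }
        return pairs;
    }

    // Shuffle and group: collect all values that share a key into one list.
    static Map<String, List<String>> shuffle(List<Map.Entry<String, String>> pairs) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, String> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce: concatenate the grouped values into a pipe-separated string.
    static String reduce(String key, List<String> values) {
        return key + "\t" + String.join("|", values);
    }

    // Runs the three phases over records of the form {key, "v1,v2,..."}.
    static List<String> run(String[][] records) {
        List<Map.Entry<String, String>> mapped = new ArrayList<>();
        for (String[] record : records) {
            mapped.addAll(map(record[0], record[1]));
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> group : shuffle(mapped).entrySet()) {
            output.add(reduce(group.getKey(), group.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        // Invented sample input for illustration only.
        String[][] records = { {"aaa", "1,2"}, {"bbb", "3"}, {"aaa", "4"} };
        run(records).forEach(System.out::println);
    }
}
```

With the sample records, both "aaa" inputs end up in the same group, so the output has one line per key: "aaa" followed by 1|2|4, and "bbb" followed by 3.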
Create an instance of the Job class, set the Mapper and Reducer classes in the main() method, and execute the program.
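A driver of this kind might look like the sketch below. It assumes the Hadoop 2.x org.apache.hadoop.mapreduce API is on the build path, so it will not compile without the Hadoop JARs. The names PipeJoinJob, SplitMapper, and JoinReducer are hypothetical. The use of KeyValueTextInputFormat (one tab-separated key and value per line) and of two path arguments is also an assumption, since the article does not specify the input format.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical job: splits comma-separated values, then joins them with '|'.
public class PipeJoinJob {

    // Emits one (key, value) pair for each comma-separated value.
    public static class SplitMapper extends Mapper<Text, Text, Text, Text> {
        private final Text outValue = new Text();

        @Override
        protected void map(Text key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String part : value.toString().split(",")) {
                outValue.set(part.trim());
                context.write(key, outValue);
            }
        }
    }

    // Concatenates all values of a key into a pipe-separated string.
    public static class JoinReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            StringBuilder joined = new StringBuilder();
            for (Text value : values) {
                if (joined.length() > 0) {
                    joined.append('|');
                }
                joined.append(value.toString());
            }
            context.write(key, new Text(joined.toString()));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "pipe join");   // create the Job instance
        job.setJarByClass(PipeJoinJob.class);
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        job.setMapperClass(SplitMapper.class);          // set the Mapper class
        job.setReducerClass(JoinReducer.class);         // set the Reducer class
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // Assumed arguments: args[0] = input path, args[1] = output path.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The output directory passed as the second argument must not already exist, because Hadoop refuses to overwrite an existing output path.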
Required dependencies to execute the project:
To run the project:
Select “Run As” > “Run Configurations...” from the menu.
Double-click “Java Application” in the window that opens.
Navigate to the “Arguments” tab and add the arguments in the space provided if your MapReduce program takes arguments at run time.
Navigate to the “Classpath” tab, select “User Entries”, click “Add External JARs”, and add the dependencies.
After adding the dependencies, click the “Advanced” button, select “Add External Folder”, and click the “OK” button.
Select the Hadoop configuration file directory and click the “OK” button.
Navigate to the “Environment” tab, set “HADOOP_HOME”, click “Apply”, and then click “Run”.
When execution completes, the logs are displayed in the Eclipse console.
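The values entered in the “Arguments” and “Environment” tabs can be sanity-checked from code before the job starts. The sketch below is one possible way to do so. The class name RunConfigCheck is hypothetical, and so is the expectation of exactly two arguments (an input path and an output path), since the article does not say which arguments a program takes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical pre-flight check for the Eclipse run configuration.
final class RunConfigCheck {

    // Returns a description of each problem found; an empty list means none.
    static List<String> problems(String[] args, Map<String, String> env) {
        List<String> problems = new ArrayList<>();
        // Assumption: the program expects an input path and an output path.
        if (args.length < 2) {
            problems.add("expected 2 arguments (input path, output path) but got "
                    + args.length);
        }
        String hadoopHome = env.get("HADOOP_HOME");
        if (hadoopHome == null || hadoopHome.trim().isEmpty()) {
            problems.add("HADOOP_HOME is not set");
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> problems = problems(args, System.getenv());
        if (problems.isEmpty()) {
            System.out.println("Run configuration looks complete.");
        } else {
            problems.forEach(p -> System.err.println("Problem: " + p));
        }
    }
}
```

Passing the arguments and environment in as parameters keeps the check independent of the actual process environment, so it can also be exercised with sample values.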
|Article ID:|Published Date:|Last Revised Date:|Platform:|Control:|
|7054|08/12/2016|09/08/2016|Big Data Platform|General|