MapReduce Tutorial with sample code WordCount.java

download WordCount.java from https://github.com/zhuby1973/python/blob/master/WordCount.java to the Hadoop VM
add environment variables:
export PATH=${JAVA_HOME}/bin:${PATH}
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
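
The compile step depends on `tools.jar`, which ships with JDK 8 and earlier, so it can help to confirm the file exists before going further. Below is a minimal sketch of such a check; `check_tools_jar` is a hypothetical helper name, not part of Hadoop or the JDK.

```shell
# Hypothetical helper (not part of Hadoop): verify that tools.jar is present
# under the given JDK directory, defaulting to $JAVA_HOME.
check_tools_jar() {
  java_home="${1:-${JAVA_HOME:-}}"
  if [ -z "$java_home" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  if [ ! -f "$java_home/lib/tools.jar" ]; then
    echo "no tools.jar under $java_home/lib" >&2
    return 1
  fi
  echo "found $java_home/lib/tools.jar"
}

# Report the result without aborting the shell session.
check_tools_jar || echo "fix JAVA_HOME before compiling"
```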
compile WordCount.java and package the classes into a jar:
$ hadoop com.sun.tools.javac.Main WordCount.java
$ jar cf wc.jar WordCount*.class
create the input directory in HDFS and upload two sample files:
hdfs dfs -mkdir /wordcount
hdfs dfs -mkdir /wordcount/input
echo "Hello World Bye World" > file01
echo "Hello Hadoop Goodbye Hadoop" > file02
hadoop fs -put file0* /wordcount/input
hadoop fs -ls /wordcount/input
hadoop fs -cat /wordcount/input/file01
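
Before running the job, it can be useful to know what counts to expect. The sketch below computes them locally with standard Unix tools. It assumes the WordCount job splits lines on whitespace and emits one tab-separated `word count` pair per line, as the stock Hadoop example does. `local_wordcount` is a hypothetical helper, not a Hadoop command.

```shell
# Hypothetical helper: count whitespace-separated words in local files and
# print "word<TAB>count" lines sorted by byte order.
local_wordcount() {
  cat "$@" | tr -s '[:space:]' '\n' | grep -v '^$' | LC_ALL=C sort | uniq -c |
    awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Same sample files as above (recreated so the snippet stands alone).
echo "Hello World Bye World" > file01
echo "Hello Hadoop Goodbye Hadoop" > file02

local_wordcount file01 file02
# Bye      1
# Goodbye  1
# Hadoop   2
# Hello    2
# World    2
```

These five lines add up to 41 bytes, which is a quick way to sanity-check the size of the job's output file.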
edit "~/hadoop-3.1.3/etc/hadoop/yarn-site.xml" as below:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
</configuration>

edit ~/hadoop-3.1.3/etc/hadoop/mapred-site.xml as below:

<configuration>
 <property>
   <name>mapreduce.framework.name</name>
   <value>yarn</value>
 </property>
 <property>
   <name>yarn.app.mapreduce.am.env</name>
   <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
 </property>
 <property>
   <name>mapreduce.map.env</name>
   <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
 </property>
 <property>
   <name>mapreduce.reduce.env</name>
   <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
 </property>
</configuration>

Restart Hadoop with stop-all.sh followed by start-all.sh so that the changes take effect.

Run the application:
$ hadoop jar wc.jar WordCount /wordcount/input /wordcount/output
verify the output:
hadoop@ubunu2004:~$ hadoop fs -ls /wordcount/output
Found 2 items
-rw-r--r-- 1 hadoop supergroup 0 2020-07-13 13:34 /wordcount/output/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 41 2020-07-13 13:34 /wordcount/output/part-r-00000
delete /wordcount/output before running the job again, because the job fails if the output directory already exists:
hadoop fs -rm -r -f /wordcount/output
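
The cleanup and rerun steps can be wrapped in one small function. This is a sketch only: `rerun_wordcount` is a hypothetical name, the default paths are the ones used in this tutorial, and printing `part-r-00000` assumes the job ran with a single reducer, as in the listing above.

```shell
# Hypothetical helper: remove the old output directory, rerun the job,
# then print the result file. Each step runs only if the previous one succeeded.
rerun_wordcount() {
  input_dir="${1:-/wordcount/input}"
  output_dir="${2:-/wordcount/output}"
  hadoop fs -rm -r -f "$output_dir" &&
    hadoop jar wc.jar WordCount "$input_dir" "$output_dir" &&
    hadoop fs -cat "$output_dir/part-r-00000"
}
```

Run `rerun_wordcount` from the directory that contains wc.jar, or pass other HDFS paths as the first and second arguments.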
