MapReduce Tutorial with sample code WordCount.java

  1. git pull https://github.com/zhuby1973/python/blob/master/WordCount.java to hadoop VM
  2. add environment variables:
    export PATH=${JAVA_HOME}/bin:${PATH}
    export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
  3. compile
    $ hadoop com.sun.tools.javac.Main WordCount.java
    $ jar cf wc.jar WordCount*.class
  4. create input directory in HDFS
    hdfs dfs -mkdir /wordcount
    hdfs dfs -mkdir /wordcount/input
    echo "Hello World Bye World" > file01
    echo "Hello Hadoop Goodbye Hadoop" > file02
    hadoop fs -put file0* /wordcount/input
    hadoop fs -ls /wordcount/input
    hadoop fs -cat /wordcount/input/file01
  5. edit "~/hadoop-3.1.3/etc/hadoop/yarn-site.xml" as below:
<configuration>
  <property>
   <name>mapreduceyarn.nodemanager.aux-services</name>
   <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
</configuration>
  1. edit ~/hadoop-3.1.3/etc/hadoop/mapred-site.xml as below:
<configuration>
 <property>
   <name>mapreduce.framework.name</name>
   <value>yarn</value>
 </property>
 <property>
   <name>yarn.app.mapreduce.am.env</name>
   <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
 </property>
 <property>
   <name>mapreduce.map.env</name>
   <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
 </property>
 <property>
   <name>mapreduce.reduce.env</name>
   <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
 </property>
 <property>
   <name>yarn.app.mapreduce.am.env</name>
   <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
 </property>
 <property>
   <name>mapreduce.map.env</name>
   <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
 </property>
 <property>
   <name>mapreduce.reduce.env</name>
   <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
 </property>
</configuration>

you need stop-all.sh and start-all.sh to restart hadoop after changes.

  1. Run the application:
    $ hadoop jar wc.jar WordCount /wordcount/input /wordcount/output
    verify the output:
    hadoop@ubunu2004:~$ hadoop fs -ls /wordcount/output
    Found 2 items
    -rw-r–r– 1 hadoop supergroup 0 2020-07-13 13:34 /wordcount/output/_SUCCESS
    -rw-r–r– 1 hadoop supergroup 41 2020-07-13 13:34 /wordcount/output/part-r-00000
    you need delete /wordcount/output if you need run it again:
    hadoop fs -rm -r -f /wordcount/output

2 Replies to “MapReduce Tutorial with sample code WordCount.java”

  1. Hi my family member! I want to say that this article is amazing, nice written and include almost all significant infos.
    I’d like to look more posts like this .

  2. It’s a shame you don’t have a donate button! I’d definitely
    donate to this brilliant blog! I suppose for now i’ll settle for book-marking and adding your
    RSS feed to my Google account. I look forward to fresh updates and will share this website with my Facebook group.

    Chat soon!

Leave a Reply

Your email address will not be published. Required fields are marked *