Experiencing Hadoop Big Data from Scratch (5)


<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>172.18.109.235:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>172.18.109.235:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>172.18.109.235:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>172.18.109.235:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>172.18.109.235:8088</value>
  </property>
  <!-- Note: in Hadoop 2.7 the minimum container allocation is 1024 MB, so the value below
       must be at least 1024, otherwise the NodeManager will not start. In a test environment
       that is short on memory you can omit this property and let the system allocate memory
       based on the actual situation, but every node should still have more than 1 GB of
       physical RAM. -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>1024</value>
  </property>
</configuration>
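Before moving on, it is worth checking that the edited file is still well-formed XML, since a stray tag is a common reason for daemons refusing to start. A minimal check (assuming yarn-site.xml lives in /usr/local/hadoop/etc/hadoop as elsewhere in this series; xmllint ships with the libxml2 package):

cd /usr/local/hadoop/etc/hadoop
xmllint --noout yarn-site.xml && echo "yarn-site.xml is well-formed"
xmllint --noout core-site.xml hdfs-site.xml mapred-site.xml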
 
 
9. Configure the NameNode: set JAVA_HOME in the env scripts

Before configuring, make sure Java 6 or Java 7 is installed and that the Java environment variables are set up.
Hadoop must be able to find the Java environment installed earlier. This can be done either by adding the Java binary directory to the PATH environment variable or by setting JAVA_HOME in the env scripts; here we edit the scripts. Change the JAVA_HOME line in hadoop-env.sh, mapred-env.sh and yarn-env.sh to point at the installed JDK, for example:
export JAVA_HOME=/home/java/jdk1.7.0_79   (JDK unpacked manually under /home/java)
export JAVA_HOME=/usr                     (JDK installed via yum; see the note below)

Alternatively, set a system-wide environment variable: edit /etc/profile.d/java.sh
and add the following lines:
JAVA_HOME=/home/java/jdk1.7.0_79
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME PATH
Note: a yum-installed JDK places the java binary at /usr/bin/java, but Hadoop invokes Java as JAVA=$JAVA_HOME/bin/java, so in that case JAVA_HOME must be set to /usr, not /usr/bin.
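A quick way to confirm that hadoop-env.sh now points at a working JDK is to source it and run the java it resolves to (a minimal sketch, assuming the /usr/local/hadoop layout used throughout this series):

source /usr/local/hadoop/etc/hadoop/hadoop-env.sh
$JAVA_HOME/bin/java -version    #should print the 1.7.0 JDK version, not an error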

10. Edit the slaves file under /usr/local/hadoop/etc/hadoop: delete the default localhost entry and add the two slave nodes:
172.18.109.236
172.18.109.237
11. Copy the configured Hadoop tree to the same location on every node, using scp:
[root@hadoop-master local]# scp -r hadoop/ hadoop1:/usr/local/
[root@hadoop-master local]# scp -r hadoop/ hadoop2:/usr/local/
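The start scripts in the next step reach each node over SSH, so without key-based login you will be prompted for root's password several times, as the log below shows. A minimal sketch for setting up passwordless SSH from the master (assuming everything runs as root, as in the rest of this series):

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa   #generate a key pair on the master (once)
ssh-copy-id root@hadoop-master             #the master also SSHes to itself
ssh-copy-id root@hadoop1
ssh-copy-id root@hadoop2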
 
 
12. Start Hadoop on the master server; the slave nodes are started automatically. Change into the Hadoop directory, /usr/local/hadoop:
(1) Initialize HDFS by formatting the NameNode: bin/hdfs namenode -format
(2) Start everything with sbin/start-all.sh, or start the pieces separately with sbin/start-dfs.sh and sbin/start-yarn.sh
[root@hadoop-master hadoop]# sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [hadoop-master]
root@hadoop-master's password:
hadoop-master: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-hadoop-master.out
172.18.109.237: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-hadoop2.out
172.18.109.236: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-hadoop1.out
Starting secondary namenodes [hadoop-master]
root@hadoop-master's password:
hadoop-master: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-hadoop-master.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-root-resourcemanager-hadoop-master.out
172.18.109.236: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-hadoop1.out
172.18.109.237: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-hadoop2.out
 
(3) To stop everything, run sbin/stop-all.sh
(4) Run jps on each node to check the running Java processes:
[root@hadoop-master ~]# jps
21698 NameNode
21882 SecondaryNameNode
22027 ResourceManager
22497 Jps
[root@hadoop1 ~]# jps
12940 Jps
12684 DataNode
12795 NodeManager
[root@hadoop2 ~]# jps
2048 DataNode
2305 Jps
2161 NodeManager
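Beyond jps, you can ask the cluster itself whether everything registered. A couple of quick checks (a sketch; the dfsadmin report is a standard HDFS command, and the two URLs use the webapp port configured above and the default NameNode web port 50070 of Hadoop 2.x):

/usr/local/hadoop/bin/hdfs dfsadmin -report            #should list two live DataNodes (172.18.109.236 and .237)
curl -s http://172.18.109.235:8088/ws/v1/cluster/info  #ResourceManager REST endpoint answers once YARN is up
curl -s http://172.18.109.235:50070/ | head            #NameNode web UI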
 
 
(5) Set the Hadoop environment variables.
Edit /etc/profile and add the following:
HADOOP_BASE=/usr/local/hadoop
PATH=$HADOOP_BASE/bin:$PATH
export HADOOP_BASE PATH
source /etc/profile #make the new variables take effect
Test command: hadoop version
Hadoop 2.7.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by root on 2016-08-18T01:41Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.3.jar
(6) Upload a file to HDFS
[root@hadoop-master ~]# hadoop fs -mkdir -p test #create a directory in HDFS (relative paths live under /user/root)
[root@hadoop-master ~]# hadoop fs -ls #list the directory just created
Found 1 items
drwxr-xr-x   - root supergroup          0 2016-10-12 22:26 test
[root@hadoop-master ~]# hadoop fs -put test.txt test #upload the local test.txt into the test directory in HDFS
[root@hadoop-master ~]# hadoop fs -ls test #check that the upload succeeded
Found 1 items
-rw-r--r--   2 root supergroup       1647 2016-10-12 22:28 test/test.txt
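The 2 after the permission bits in the listing is the replication factor: each block of test.txt is stored on both DataNodes. To double-check that the file really landed in HDFS, you can read it straight back out:

[root@hadoop-master ~]# hadoop fs -cat test/test.txt #prints the file's contents from HDFS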
 
 
13. Test MapReduce
Hadoop ships with a jar of example MapReduce jobs, including a word-count program.
[root@hadoop-master ~]# hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar #run the jar with no arguments to list the available examples
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
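Any of these examples is run the same way. The pi estimator, for instance, makes a handy smoke test because it needs no input data; the two arguments below (number of maps and samples per map) are arbitrary small values:

[root@hadoop-master ~]# hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 2 10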
 
Now run a job that counts the occurrences of each word in test.txt. To get a result quickly, the data set here is tiny; if you have the time and your test nodes can handle it, use a larger data set (see the sketch after the note below) for a more realistic feel of what big data is about.
Contents of test.txt:
hellow welcome to chendu
hellow welcome
[root@hadoop-master ~]# hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount test/test.txt wordout
Note: wordout is the output directory for the results; it must not be a directory that already exists in HDFS, otherwise the job will fail.
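If you need to re-run the job, or want to try the larger input suggested above, a minimal sketch (the repetition count and the big_test/wordout_big names are arbitrary):

hadoop fs -rm -r wordout    #remove a previous output directory before re-running the same job
for i in $(seq 1 5000); do cat test.txt; done > big_test.txt    #build a bigger input by repeating the sample file
hadoop fs -put big_test.txt test
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount test/big_test.txt wordout_big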
The job's progress output looks like this:
16/10/13 01:52:19 INFO client.RMProxy: Connecting to ResourceManager at /172.18.109.235:8032
16/10/13 01:52:20 INFO input.FileInputFormat: Total input paths to process : 1
16/10/13 01:52:21 INFO mapreduce.JobSubmitter: number of splits:1
16/10/13 01:52:21 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1476294690985_0002
16/10/13 01:52:21 INFO impl.YarnClientImpl: Submitted application application_1476294690985_0002
16/10/13 01:52:21 INFO mapreduce.Job: The url to track the job: http://hadoop-master:8088/proxy/application_1476294690985_0002/
16/10/13 01:52:21 INFO mapreduce.Job: Running job: job_1476294690985_0002
16/10/13 01:52:31 INFO mapreduce.Job: Job job_1476294690985_0002 running in uber mode : false