Hadoop Big Data from Scratch (5)
Date: 2016-10-13 09:16  Source: 潇湘夜雨  Author: 华嵩阳
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>172.18.109.235:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>172.18.109.235:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>172.18.109.235:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>172.18.109.235:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>172.18.109.235:8088</value>
    </property>

Note: for the following property, the minimum value in version 2.7 is 1024, so the memory allocation must be at least 1024 MB; otherwise the NodeManager will not start. In a test environment with limited memory you can leave the property out and let Hadoop allocate memory based on what is actually available, but it is still recommended that every node have more than 1 GB of physical RAM.

    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>1024</value>
    </property>
</configuration>

9. Configure the NameNode: edit the env scripts.

Before doing this, make sure Java 6 or Java 7 is installed and the Java environment variables are configured. Then change JAVA_HOME in hadoop-env.sh, mapred-env.sh, and yarn-env.sh to the directory where the JDK is installed (here /home/java/jdk1.7.0_79).

At run time Hadoop must be able to find the Java environment installed earlier. You can either add the directory containing the java binary to the PATH environment variable, or set the path in the hadoop-env.sh script. Here we edit the script:

export JAVA_HOME=/home/java/jdk1.7.0_79   # JDK installed from a tarball
export JAVA_HOME=/usr                     # yum-installed JDK (see the note below)

Alternatively, set the system-wide environment variables by editing /etc/profile.d/java.sh and adding:

JAVA_HOME=/home/java/jdk1.7.0_79
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME PATH

Note: a yum-installed JDK puts the java binary at /usr/bin/java, and Hadoop invokes Java as $JAVA_HOME/bin/java, so in that case set JAVA_HOME=/usr.

10. Edit the slaves file in /usr/local/hadoop/etc/hadoop: delete the default localhost entry and add the two slave nodes:

172.18.109.236
172.18.109.237

11. Copy the configured Hadoop directory to the same location on every node with scp:

[root@hadoop-master local]# scp -r hadoop/ hadoop1:/usr/local/
[root@hadoop-master local]# scp -r hadoop/ hadoop2:/usr/local/
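scp, and the start scripts in the next step, will prompt for each node's root password on every connection (the prompts are visible in the start-all.sh output below). If you want the cluster scripts to run unattended, one common option is password-less SSH from the master to all nodes. This is a minimal sketch, not part of the original walkthrough, and it assumes the root account is used on all three hosts as in this setup:

[root@hadoop-master ~]# ssh-keygen -t rsa                 # generate a key pair once; accept the defaults
[root@hadoop-master ~]# ssh-copy-id root@hadoop-master    # the master also connects to itself
[root@hadoop-master ~]# ssh-copy-id root@hadoop1
[root@hadoop-master ~]# ssh-copy-id root@hadoop2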
12. Start Hadoop on the master server; the slave nodes start automatically. Change into the /usr/local/hadoop directory.

(1) Format the NameNode:

bin/hdfs namenode -format

(2) Start everything with sbin/start-all.sh, or start the components separately with sbin/start-dfs.sh and sbin/start-yarn.sh:

[root@hadoop-master hadoop]# sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [hadoop-master]
root@hadoop-master's password:
hadoop-master: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-hadoop-master.out
172.18.109.237: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-hadoop2.out
172.18.109.236: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-hadoop1.out
Starting secondary namenodes [hadoop-master]
root@hadoop-master's password:
hadoop-master: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-hadoop-master.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-root-resourcemanager-hadoop-master.out
172.18.109.236: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-hadoop1.out
172.18.109.237: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-hadoop2.out

(3) To stop everything, run sbin/stop-all.sh.

(4) Run jps to check the running daemons:

[root@hadoop-master ~]# jps
21698 NameNode
21882 SecondaryNameNode
22027 ResourceManager
22497 Jps

[root@hadoop1 ~]# jps
12940 Jps
12684 DataNode
12795 NodeManager

[root@hadoop2 ~]# jps
2048 DataNode
2305 Jps
2161 NodeManager

(5) Set the Hadoop environment variables. Edit /etc/profile and add:

HADOOP_BASE=/usr/local/hadoop
PATH=$HADOOP_BASE/bin:$PATH
export HADOOP_BASE PATH

source /etc/profile   # make the environment variables take effect

Test it:

hadoop version
Hadoop 2.7.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by root on 2016-08-18T01:41Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.3.jar

(6) Upload a file to HDFS:

[root@hadoop-master ~]# hadoop fs -mkdir -p test       # create a directory in HDFS
[root@hadoop-master ~]# hadoop fs -ls                  # list the directory just created
Found 1 items
drwxr-xr-x   - root supergroup          0 2016-10-12 22:26 test
[root@hadoop-master ~]# hadoop fs -put test.txt test   # upload the file into the test directory in HDFS
[root@hadoop-master ~]# hadoop fs -ls test             # verify the upload
Found 1 items
-rw-r--r--   2 root supergroup       1647 2016-10-12 22:28 test/test.txt
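As a quick sanity check that both DataNodes registered and that the uploaded file is readable, a couple of read-only commands can be run on the master. This is a minimal sketch, not part of the original walkthrough:

[root@hadoop-master ~]# hadoop fs -cat test/test.txt   # print the uploaded file back out of HDFS
[root@hadoop-master ~]# hdfs dfsadmin -report          # live DataNodes, capacity, and usage

The NameNode web UI (port 50070 by default in Hadoop 2.7) and the ResourceManager UI at http://172.18.109.235:8088 (the webapp address configured above) show the same information in a browser.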
13. Test MapReduce.

Hadoop ships with a set of example jobs, including a word-count program:

[root@hadoop-master ~]# hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar   # show the available examples
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.

Run a job that counts how many times each word occurs in test.txt. The data set is deliberately tiny here so the result comes back quickly; if you have the time and reasonably capable test nodes, increase the amount of data to get a more realistic feel for big data.

test.txt contains:

hellow welcome to chendu hellow welcome

[root@hadoop-master ~]# hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount test/test.txt wordout

Note: wordout is the output directory for the results; it must not be a directory that already exists in HDFS.

The job runs as follows:

16/10/13 01:52:19 INFO client.RMProxy: Connecting to ResourceManager at /172.18.109.235:8032
16/10/13 01:52:20 INFO input.FileInputFormat: Total input paths to process : 1
16/10/13 01:52:21 INFO mapreduce.JobSubmitter: number of splits:1
16/10/13 01:52:21 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1476294690985_0002
16/10/13 01:52:21 INFO impl.YarnClientImpl: Submitted application application_1476294690985_0002
16/10/13 01:52:21 INFO mapreduce.Job: The url to track the job: http://hadoop-master:8088/proxy/application_1476294690985_0002/
16/10/13 01:52:21 INFO mapreduce.Job: Running job: job_1476294690985_0002
16/10/13 01:52:31 INFO mapreduce.Job: Job job_1476294690985_0002 running in uber mode : false
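When the job completes, the word counts are written to the wordout directory given on the command line. A minimal sketch of how to read them back, assuming the default single reducer (so the results land in part-r-00000):

[root@hadoop-master ~]# hadoop fs -ls wordout                 # _SUCCESS marker plus the reducer output file
[root@hadoop-master ~]# hadoop fs -cat wordout/part-r-00000   # one "word <TAB> count" line per word

For the test.txt shown above, the output should be:

chendu  1
hellow  2
to      1
welcome 2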