Getting Started with Building a Distributed Hadoop Cluster
Required software: JDK, SSH
I. Configure the environment
1. Set the hostname and the corresponding address mappings
[root@master ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4
localhost4.localdomain4
::1 localhost localhost.localdomain localhost6
localhost6.localdomain6
192.168.230.130 master
192.168.230.131 slave1
192.168.230.100 slave2
# configure the hostname and /etc/hosts in the same way on all three machines
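A minimal sketch of what "configure on all three machines" means in commands; the IPs are the ones used above, and `hostnamectl` is assumed to be available (systemd systems):

```shell
# Append the cluster mappings to /etc/hosts (demonstrated here against a
# scratch file; on a real node the target is /etc/hosts, run as root).
cat >> /tmp/hosts.cluster <<'EOF'
192.168.230.130 master
192.168.230.131 slave1
192.168.230.100 slave2
EOF
# cat /tmp/hosts.cluster >> /etc/hosts
# hostnamectl set-hostname master   # use the node's own name on each machine
cat /tmp/hosts.cluster
```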
2. Create a hadoop user on each of the three nodes
[root@master ~]# tail -1 /etc/passwd
hadoop:x:1001:1001::/home/hadoop:/bin/bash
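The command that produced the account above is not shown; a sketch of the usual way (run as root on each node; setting the password is interactive):

```shell
# Create the hadoop user with a home directory and a bash login shell.
useradd -m -s /bin/bash hadoop
# passwd hadoop    # set the password interactively
id hadoop          # confirm the account exists
```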
II. Set up passwordless SSH between all nodes for the hadoop user
1. Generate a key pair
[hadoop@master ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
/home/hadoop/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
1c:16:61:04:4f:76:93:cd:da:9a:08:04:15:58:7d:96 hadoop@master
The key's randomart image is:
+--[ RSA 2048]----+
| .===B.o= |
| . .=.oE.o |
| . +o o |
| .o .. . |
| .S. o |
| . o |
| |
| |
| |
+-----------------+
[hadoop@master ~]$
2. Distribute the public key
[hadoop@master ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave1
The authenticity of host 'slave1 (192.168.230.131)' can't be
established.
ECDSA key fingerprint is
32:1a:8a:37:f8:11:bc:cc:ec:35:e6:37:c2:b8:e1:45.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to
filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you
are prompted now it is to install the new keys
hadoop@slave1's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop@slave1'"
and check to make sure that only the key(s) you wanted were added.
[hadoop@master ~]$
[hadoop@master ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave2
[hadoop@master ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@master
# the same commands are run from slave1 and slave2 to the other nodes (omitted)
3. Verify the logins
[hadoop@master ~]$ ssh hadoop@slave1
Last login: Wed Jul 26 01:11:22 2017 from master
[hadoop@slave1 ~]$ exit
logout
Connection to slave1 closed.
[hadoop@master ~]$ ssh hadoop@slave2
Last login: Wed Jul 26 13:12:00 2017 from master
[hadoop@slave2 ~]$ exit
logout
Connection to slave2 closed.
[hadoop@master ~]$
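The interactive checks above can also be automated; a sketch that fails loudly instead of prompting (`BatchMode=yes` makes ssh return an error rather than ask for a password, so a clean run means the keys are in place):

```shell
# Check key-based login to every node in one pass; node names are the
# ones mapped in /etc/hosts earlier.
NODES="master slave1 slave2"
for h in $NODES; do
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "hadoop@$h" true 2>/dev/null; then
    echo "$h: key login OK"
  else
    echo "$h: key login FAILED"
  fi
done
```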
III. Configure Java
1. Upload hadoop-2.7.3.tar.gz and jdk-8u131-linux-x64.tar.gz to master with xftp (or any SFTP client)
[hadoop@master ~]$ ls
hadoop-2.7.3.tar.gz jdk-8u131-linux-x64.tar.gz
2. As root, unpack the JDK and move it to /usr/local
[hadoop@master ~]$ exit
exit
[root@master ~]# cd /home/hadoop/
[root@master hadoop]# ls
hadoop-2.7.3.tar.gz jdk-8u131-linux-x64.tar.gz
[root@master hadoop]# tar -zxf jdk-8u131-linux-x64.tar.gz
[root@master hadoop]# ls
hadoop-2.7.3.tar.gz jdk1.8.0_131 jdk-8u131-linux-x64.tar.gz
[root@master hadoop]# mv jdk1.8.0_131 /usr/local/
[root@master hadoop]# cd /usr/local/
[root@master local]# ls
bin etc games include jdk1.8.0_131 lib lib64 libexec sbin
share src
[root@master local]#
3. Configure the Java environment variables (set system-wide here)
[root@master ~]# vim /etc/profile
# append the following Java environment variables at the end of the file
[root@master ~]# tail -5 /etc/profile
export JAVA_HOME=/usr/local/jdk1.8.0_131 # mind the JDK version
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$JAVA_HOME/bin:$PATH
[root@master ~]#
[root@master ~]# source /etc/profile # make the changes take effect
4. Test that Java is configured on master
[root@master ~]# java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
[root@master ~]#
5. Copy the JDK to slave1 and slave2 with scp
[root@master ~]# scp -r /usr/local/jdk1.8.0_131/
root@slave1:/usr/local/
[root@master ~]# scp -r /usr/local/jdk1.8.0_131/
root@slave2:/usr/local/
6. Configure the environment variables on slave1 and slave2 (same as step 3), then verify with java -version
IV. Configure the Hadoop environment
1. Unpack Hadoop and move it to /usr/local
[root@master ~]# cd /home/hadoop/
[root@master hadoop]# ls
hadoop-2.7.3.tar.gz jdk-8u131-linux-x64.tar.gz
[root@master hadoop]# tar -zxf hadoop-2.7.3.tar.gz
[root@master hadoop]# mv hadoop-2.7.3 /usr/local/hadoop
[root@master hadoop]# ls /usr/local/
bin etc games hadoop include jdk1.8.0_131 lib lib64 libexec
sbin share src
2. Change the ownership of the Hadoop files to the hadoop user
[root@master ~]# cd /usr/local
[root@master local]# chown -R hadoop:hadoop /usr/local/hadoop
[root@master local]# ll
drwxr-xr-x 9 hadoop hadoop 149 Aug 17 2016 hadoop
[root@master local]#
3. Configure the Hadoop environment variables
[root@master local]# vim /etc/profile
[root@master local]# tail -4 /etc/profile
#hadoop
export HADOOP_HOME=/usr/local/hadoop # mind the path
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
[root@master local]#
[root@master local]# source /etc/profile # make the changes take effect
4. Test
[root@master local]# hadoop version
Hadoop 2.7.3
Subversion -r
baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by root on 2016-08-18T01:41Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
This command was run using
/usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.3.jar
[root@master local]#
5. Configure hadoop-env.sh
[root@master local]# cd $HADOOP_HOME/etc/hadoop
[root@master hadoop]# pwd
/usr/local/hadoop/etc/hadoop
[root@master hadoop]#
[root@master hadoop]# vim hadoop-env.sh
[root@master hadoop]# tail -1 hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_131 # appended at the end of the file
[root@master hadoop]#
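Editing hadoop-env.sh by hand works, but the same change can be made non-interactively; a sketch demonstrated on a scratch copy (point CONF at /usr/local/hadoop/etc/hadoop on the real cluster):

```shell
# Replace the stock JAVA_HOME line with an explicit path, since the
# ${JAVA_HOME} default is not always resolved when daemons start over ssh.
CONF=$(mktemp -d)
echo 'export JAVA_HOME=${JAVA_HOME}' > "$CONF/hadoop-env.sh"  # stock line
sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/local/jdk1.8.0_131|' \
  "$CONF/hadoop-env.sh"
grep '^export JAVA_HOME' "$CONF/hadoop-env.sh"
```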
6. Configure core-site.xml
<configuration>
<!-- the URI of the HDFS namenode -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
</configuration>
7. Configure hdfs-site.xml
<configuration>
<!-- number of block replicas -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<!-- namenode data directory; create it manually if it does not exist and chown it to hadoop -->
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop/dfs/name</value>
</property>
<!-- datanode data directory; create it manually if it does not exist and chown it to hadoop -->
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop/dfs/data</value>
</property>
</configuration>
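The two data directories referenced in hdfs-site.xml do not exist yet; a sketch of creating them (run as root on every node, then hand them to the hadoop user):

```shell
# Create the namenode and datanode directories named in hdfs-site.xml.
mkdir -p /usr/local/hadoop/dfs/name /usr/local/hadoop/dfs/data
# Hand ownership to the hadoop user (the account must already exist).
chown -R hadoop:hadoop /usr/local/hadoop/dfs 2>/dev/null \
  || echo "chown skipped: no hadoop user on this machine"
```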
8. Configure yarn-site.xml
<configuration>
<!-- hostname of the YARN ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<!-- how reducers fetch data -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
9. Configure mapred-site.xml
[root@master hadoop]# cp mapred-site.xml.template mapred-site.xml
[root@master hadoop]# vim mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
10. Configure the slaves file
[root@master hadoop]# vim slaves
[root@master hadoop]# cat slaves
slave1
slave2
[root@master hadoop]#
11. Copy the configured Hadoop tree to slave1 and slave2 with scp
[root@master ~]# scp -r /usr/local/hadoop root@slave1:/usr/local/
[root@master ~]# scp -r /usr/local/hadoop root@slave2:/usr/local/
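One caveat worth noting: files copied as root arrive root-owned on the slaves, while the daemons there should run as hadoop. A sketch of fixing the ownership remotely (hostnames as configured earlier):

```shell
# Re-apply on both slaves the chown that was done on master in step 2.
for h in slave1 slave2; do
  ssh "root@$h" chown -R hadoop:hadoop /usr/local/hadoop \
    || echo "$h: chown failed (is the host reachable?)"
done
```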
12. Configure the environment variables on slave1 and slave2 (same as step 3), then verify with hadoop version
13. Format the HDFS namenode (hdfs namenode -format)
[root@master hadoop]# su hadoop
[hadoop@master hadoop]$ cd /usr/local/hadoop/
[hadoop@master hadoop]$ hdfs namenode -format
# this must be run as the hadoop user
17/07/26 20:26:12 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = master/192.168.230.130
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.7.3
.
.
.
17/07/26 20:26:15 INFO util.ExitUtil: Exiting with status 0 # status 0 means success
17/07/26 20:26:15 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.230.130
************************************************************/
[hadoop@master hadoop]$
V. Start the Hadoop services
1. Start all of the services
[hadoop@master dfs]$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
hadoop@master's password: # enter the hadoop user's password on master
master: starting namenode, logging to
/usr/local/hadoop/logs/hadoop-hadoop-namenode-master.out
slave1: starting datanode, logging to
/usr/local/hadoop/logs/hadoop-hadoop-datanode-slave1.out
slave2: starting datanode, logging to
/usr/local/hadoop/logs/hadoop-hadoop-datanode-slave2.out
Starting secondary namenodes [0.0.0.0]
hadoop@0.0.0.0's password: # enter the hadoop user's password on master
0.0.0.0: starting secondarynamenode, logging to
/usr/local/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to
/usr/local/hadoop/logs/yarn-hadoop-resourcemanager-master.out
slave1: starting nodemanager, logging to
/usr/local/hadoop/logs/yarn-hadoop-nodemanager-slave1.out
slave2: starting nodemanager, logging to
/usr/local/hadoop/logs/yarn-hadoop-nodemanager-slave2.out
[hadoop@master dfs]$
2. Verify
[hadoop@master dfs]$ jps # processes on master
7491 Jps
6820 NameNode
7014 SecondaryNameNode
7164 ResourceManager
[hadoop@master dfs]$
[root@slave1 name]# jps # processes on slave1
3160 NodeManager
3050 DataNode
3307 Jps
[root@slave1 name]#
[root@slave2 name]# jps # processes on slave2
3233 DataNode
3469 Jps
3343 NodeManager
[root@slave2 name]#
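Logging into each node to run jps gets tedious; a sketch that collects the same information from master in one loop (assumes the key-based login from section II):

```shell
# Print the Java process list of every node from one place.
NODES="master slave1 slave2"
for h in $NODES; do
  echo "== $h =="
  ssh -o BatchMode=yes -o ConnectTimeout=5 "hadoop@$h" jps 2>/dev/null \
    || echo "(unreachable or jps not on PATH)"
done
```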
3. Manage the cluster from a browser
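The article stops at this heading; for Hadoop 2.7.x the stock web UI ports are 50070 (NameNode) and 8088 (ResourceManager), so the addresses below should work unless those defaults were overridden. A quick reachability sketch:

```shell
# Probe the default web UIs from any machine that resolves "master".
# 50070 = HDFS NameNode UI (dfs.namenode.http-address)
# 8088  = YARN ResourceManager UI (yarn.resourcemanager.webapp.address)
for url in http://master:50070 http://master:8088; do
  code=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 5 "$url" || true)
  echo "$url -> HTTP ${code:-unreachable}"
done
```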
Permanent link to this article: http://www.linuxidc.com/Linux/2017-10/147640.htm