High Availability: Heartbeat in Dual-Master Mode

I. Introduction to Heartbeat

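heartbeat is open-source software that provides high-availability (HA) services. With heartbeat, resources on a failed machine can be moved quickly to a healthy machine, which keeps serving them; this is what is usually called a high-availability service. In production, heartbeat has much in common with keepalived, another open-source HA tool, but the two suit different workloads: keepalived mainly handles failover of a floating IP and is simple to configure and use, whereas heartbeat can fail over IPs too but is better at controlling resource services, at the cost of more complex configuration. heartbeat official site: http://linux-ha.org/wiki/Main_page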

Setting up heartbeat in dual-master mode

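master: eth0 10.0.0.72 (management IP, used for WAN traffic); eth1 10.20.23.115 (direct heartbeat link); VIP 10.0.0.73 (serves the mount service for application A)
backup: eth0 10.0.0.71 (management IP, used for WAN traffic); eth1 10.20.23.111 (direct heartbeat link); VIP 10.0.0.74 (serves the mount service for application A)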

II. Building the heartbeat HA environment

1. Configure /etc/hosts

cat>>/etc/hosts<<eof
10.0.0.72  master
10.0.0.71  backup
eof
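The `cat >>` above appends the entries again every time it is run. A minimal idempotent sketch, assuming a POSIX shell with awk; `add_host_entry` is a hypothetical helper name, not part of heartbeat:

```shell
# add_host_entry FILE IP NAME  (hypothetical helper, not part of heartbeat)
# Appends "IP  NAME" to FILE only if no line already maps IP to NAME,
# so the hosts step can be re-run without creating duplicate lines.
add_host_entry() {
    local file=$1 ip=$2 name=$3
    if ! awk -v ip="$ip" -v name="$name" '
        $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }' "$file" 2>/dev/null; then
        printf '%s  %s\n' "$ip" "$name" >> "$file"
    fi
}

# Usage on both nodes (as root):
# add_host_entry /etc/hosts 10.0.0.72 master
# add_host_entry /etc/hosts 10.0.0.71 backup
```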

2. Configure the direct heartbeat link

a. master
/sbin/route add -host 10.20.23.111 dev eth1
route -n
To keep the route across reboots:
echo '/sbin/route add -host 10.20.23.111 dev eth1' >>/etc/rc.local
b. backup
/sbin/route add -host 10.20.23.115 dev eth1
route -n
To keep the route across reboots:
echo '/sbin/route add -host 10.20.23.115 dev eth1' >>/etc/rc.local
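To confirm that the host route really goes through the heartbeat NIC, the `route -n` output can be checked by script instead of by eye. A sketch assuming the standard eight-column `route -n` layout (Iface in the last column); `route_iface_for` is a hypothetical helper:

```shell
# route_iface_for HOST  (hypothetical helper)
# Reads "route -n" output on stdin and prints the interface of the host
# route (Genmask 255.255.255.255) to HOST; fails if there is none.
route_iface_for() {
    awk -v host="$1" '
        $1 == host && $3 == "255.255.255.255" { print $NF; found = 1 }
        END { exit !found }'
}

# Usage on master:
# iface=$(route -n | route_iface_for 10.20.23.111) && [ "$iface" = eth1 ] \
#     || echo "heartbeat route to 10.20.23.111 is missing or not on eth1" >&2
```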

3. Install heartbeat

wget http://mirrors.sohu.com/fedora-epel/6/i386/epel-release-6-8.noarch.rpm
rpm -ivh epel-release-6-8.noarch.rpm
yum install 'heartbeat*'
Check the installed version:
[root@backup ~]# rpm -qa heartbeat
heartbeat-3.0.4-2.el6.x86_64
[root@master ~]# rpm -qa heartbeat
heartbeat-3.0.4-2.el6.x86_64

4. Configuration files

ha.cf: heartbeat parameter file
authkeys: heartbeat authentication file
haresources: resource file
Template configuration files:

[root@master ha.d]# ll /usr/share/doc/heartbeat-3.0.4/
-rw-r--r--. 1 root root  1873 Dec  2  2013 apphbd.cf
-rw-r--r--. 1 root root   645 Dec  2  2013 authkeys
-rw-r--r--. 1 root root  3701 Dec  2  2013 AUTHORS
-rw-r--r--. 1 root root 58752 Dec  2  2013 ChangeLog
-rw-r--r--. 1 root root 17989 Dec  2  2013 COPYING
-rw-r--r--. 1 root root 26532 Dec  2  2013 COPYING.LGPL
-rw-r--r--. 1 root root 10502 Dec  2  2013 ha.cf
-rw-r--r--. 1 root root  5905 Dec  2  2013 haresources
-rw-r--r--. 1 root root  2935 Dec  2  2013 README
cd /etc/ha.d
[root@master ha.d]# cp /usr/share/doc/heartbeat-3.0.4/ha.cf ./
[root@master ha.d]# cp /usr/share/doc/heartbeat-3.0.4/authkeys ./
[root@master ha.d]# cp /usr/share/doc/heartbeat-3.0.4/haresources ./
Set permissions:
chmod 600 /etc/ha.d/authkeys
Edit the configuration files (back up the templates first):
[root@master ha.d]# cp ha.cf{,.bak} 
[root@master ha.d]# cp authkeys{,.bak}  
[root@master ha.d]# cp haresources{,.bak}  
cat >/etc/ha.d/ha.cf<<EOF
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 60
mcast eth1 225.0.0.1 694 1 0
auto_failback on
node master
node backup
EOF
[root@master ha.d]# grep -Ev '#|^$' /etc/ha.d/ha.cf
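heartbeat expects the local hostname, as printed by `uname -n`, to appear on one of the `node` lines, so a mismatch between /etc/hosts, the hostname and ha.cf is worth catching before starting the service. A sketch; `check_local_node` is a hypothetical helper:

```shell
# check_local_node HA_CF [HOSTNAME]  (hypothetical helper)
# Succeeds if HOSTNAME (default: output of uname -n) appears on a "node"
# line of HA_CF; commented-out "#node" lines are ignored.
check_local_node() {
    local cf=$1 host=${2:-$(uname -n)}
    if awk -v h="$host" '
        $1 == "node" { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
        END { exit !found }' "$cf"; then
        echo "OK: $host is a node in $cf"
    else
        echo "ERROR: $host is not listed on a node line in $cf" >&2
        return 1
    fi
}

# Usage on each node after writing ha.cf:
# check_local_node /etc/ha.d/ha.cf
```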
cat >/etc/ha.d/authkeys <<EOF
auth 1
1 sha1 c4f9375f9834b4e7f0a528cc65c055702bf5f24a
EOF
chmod 600 /etc/ha.d/authkeys
ll /etc/ha.d/authkeys
grep -Ev '#|^$' /etc/ha.d/authkeys
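The sha1 key only has to be a secret shared by both nodes. One way to produce a random 40-hex-digit key, sketched with `dd`, `sha1sum` and `awk`; `write_authkeys` is a hypothetical helper, and the resulting file must be copied unchanged to backup:

```shell
# write_authkeys FILE  (hypothetical helper)
# Writes "auth 1" plus a sha1 key derived from 512 random bytes,
# and makes sure the file ends up with mode 600.
write_authkeys() {
    local file=$1 key
    key=$(dd if=/dev/urandom bs=512 count=1 2>/dev/null | sha1sum | awk '{print $1}')
    (
        umask 077    # a newly created file starts out as 600
        printf 'auth 1\n1 sha1 %s\n' "$key" > "$file"
    )
    chmod 600 "$file"    # also fixes the mode of a pre-existing file
}

# Usage on master, then copy the identical file to backup:
# write_authkeys /etc/ha.d/authkeys
# scp -p /etc/ha.d/authkeys backup:/etc/ha.d/
```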
cat >/etc/ha.d/haresources<<EOF
master IPaddr::10.0.0.73/25/eth0
backup IPaddr::10.0.0.74/25/eth0
EOF

## The haresources entries above set the VIPs (one preferred VIP per node)
[root@master ha.d]# grep -Ev '#|^$' /etc/ha.d/haresources
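In dual-master mode each node is the preferred owner of one resource group, here one VIP each. A small sketch that prints this mapping from haresources, assuming one entry per line (backslash-continued entries are not handled); `list_preferred_resources` is a hypothetical helper:

```shell
# list_preferred_resources HARESOURCES  (hypothetical helper)
# Prints "node -> resources" for every non-comment entry.
list_preferred_resources() {
    awk '$1 !~ /^#/ && NF >= 2 {
        res = $2
        for (i = 3; i <= NF; i++) res = res " " $i
        printf "%s -> %s\n", $1, res
    }' "$1"
}

# Usage: list_preferred_resources /etc/ha.d/haresources
# haresources should be identical on both nodes; one way to compare:
# ssh backup cat /etc/ha.d/haresources | cmp - /etc/ha.d/haresources
```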
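Start heartbeat on master:
[root@master network-scripts]# /etc/init.d/heartbeat start
Check whether the VIPs are up:
No VIP appears at first; the VIPs are bound after about 60 s because initdead is 60. Since backup is not running yet, master also takes over backup's VIP 10.0.0.74.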

[root@master network-scripts]# ip add|grep 10.0.0.
    inet 10.0.0.72/25 brd 10.0.0.127 scope global eth0
    inet 10.0.0.74/25 brd 10.0.0.127 scope global secondary eth0
    inet 10.0.0.73/25 brd 10.0.0.127 scope global secondary eth0

Start heartbeat on backup:

[root@backup ~]# /etc/init.d/heartbeat start
[root@backup ~]# ip addr |grep 10.0.0
    inet 10.0.0.71/25 brd 10.0.0.127 scope global eth0
    inet 10.0.0.73/25 brd 10.0.0.127 scope global secondary eth0
    inet 10.0.0.74/25 brd 10.0.0.127 scope global secondary eth0
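backup also brings up both VIPs, so each node holds 10.0.0.73 and 10.0.0.74 at the same time: this is split brain. The nodes are not receiving each other's heartbeats; here iptables was blocking the heartbeat traffic, so stop iptables and restart heartbeat on both nodes:
/etc/init.d/iptables stop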
[root@master network-scripts]# /etc/init.d/heartbeat stop
[root@backup ~]# /etc/init.d/heartbeat stop
[root@master network-scripts]# /etc/init.d/heartbeat start
[root@backup ~]# /etc/init.d/heartbeat start
[root@master network-scripts]# ip add|grep 10.0.0.
    inet 10.0.0.72/25 brd 10.0.0.127 scope global eth0
    inet 10.0.0.73/25 brd 10.0.0.127 scope global secondary eth0
[root@backup ~]# ip addr |grep 10.0.0
    inet 10.0.0.71/25 brd 10.0.0.127 scope global eth0
    inet 10.0.0.74/25 brd 10.0.0.127 scope global secondary eth0
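Stopping iptables resolves the split brain but leaves both nodes without a firewall. A narrower alternative is to admit only the heartbeat packets. This is a sketch that assumes the heartbeat traffic is exactly the `mcast eth1 225.0.0.1 694 1 0` line from ha.cf; `allow_heartbeat_mcast` and its `DRY_RUN` switch are hypothetical:

```shell
# allow_heartbeat_mcast [IFACE] [GROUP] [PORT]  (hypothetical helper)
# Inserts an iptables rule accepting heartbeat multicast on the heartbeat
# NIC. With DRY_RUN=1 it only prints the command instead of running it.
allow_heartbeat_mcast() {
    local iface=${1:-eth1} group=${2:-225.0.0.1} port=${3:-694}
    set -- iptables -I INPUT -i "$iface" -d "$group" -p udp --dport "$port" -j ACCEPT
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage as root on both nodes, then persist the rule (CentOS 6):
# allow_heartbeat_mcast eth1 225.0.0.1 694 && service iptables save
```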
Stop heartbeat on master and check whether backup takes over.
[root@master ha.d]# /etc/init.d/heartbeat stop
Stopping High-Availability services: Done.
Check on backup whether it has taken over: the takeover happens quickly.
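The takeover can be watched by script instead of re-running `ip addr` by hand. A sketch with hypothetical helpers `vip_bound` and `wait_for_vip`; if master crashes rather than stopping cleanly, backup only takes over after deadtime (30 s here), so the timeout should be larger than that:

```shell
# vip_bound IP  (hypothetical helper)
# Reads "ip addr" output on stdin and succeeds if IP is assigned to an
# interface. The address before "/prefix" is compared exactly, so
# 10.0.0.7 does not match 10.0.0.73.
vip_bound() {
    awk -v ip="$1" '
        $1 == "inet" { split($2, a, "/"); if (a[1] == ip) found = 1 }
        END { exit !found }'
}

# wait_for_vip IP [TIMEOUT]  (hypothetical helper)
# Polls the local IPv4 addresses once per second until IP shows up.
wait_for_vip() {
    local ip=$1 timeout=${2:-60} waited=0
    until ip -4 addr show | vip_bound "$ip"; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "$ip not bound after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "$ip bound after about ${waited}s"
}

# Usage on backup after stopping heartbeat on master:
# wait_for_vip 10.0.0.73 60
```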

Parameter reference:
1. Main configuration file (/etc/ha.d/ha.cf)
Each ha.cf option is described below; the text after "#" is the explanation of the option.
#debugfile /var/log/ha-debug #Where the debug log is written.
logfile /var/log/ha-log #Where heartbeat writes its log.
logfacility local0 #Send logs to syslog through the local0 facility.
#crm yes #Whether to enable the Cluster Resource Manager (CRM).
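bcast eth1 #Send heartbeats as Ethernet broadcasts on interface eth1.
keepalive 2 #Heartbeat interval of 2 seconds (one broadcast on eth1 every two seconds).
deadtime 30 #If the standby node receives no heartbeat from the primary for 30 seconds, it immediately takes over the primary's resources.
warntime 10 #Heartbeat delay warning time. If the standby receives no heartbeat from the primary for 10 seconds, a warning is written to the log, but no failover happens yet.
initdead 120 #On some systems the network needs some time after boot or restart before it works normally; this option covers that startup interval. Set it to at least twice deadtime.
udpport 694 #UDP port used for broadcast heartbeats; 694 is the default.
baud 19200 #Baud rate for serial communication.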
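#serial /dev/ttyS0 #Serial device, used when the two machines are connected by a serial cable. Disable this option if they are connected over Ethernet.
#ucast eth0 192.168.1.2 #Send heartbeats as UDP unicast on eth0; the IP address must be the peer node's address.
#mcast eth0 225.0.0.1 694 1 0 #Send heartbeats as UDP multicast on eth0, typically used when there is more than one standby node. The arguments are interface, multicast group, UDP port, TTL (1 keeps packets on the local segment) and loop (0 means packets are not looped back to the sending host). bcast, ucast and mcast (broadcast, unicast and multicast) are the three ways to carry heartbeats; choose one.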
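auto_failback on #Whether services automatically move back when the primary recovers. Of the two heartbeat hosts, the primary normally holds the resources and runs all services; when it fails, it hands the resources to the standby, which then runs them. With on, the recovered primary automatically takes its resources back from the standby; with off, the recovered primary becomes the standby and the former standby remains primary. In the dual-master setup above, on means each node reclaims its own VIP when it rejoins.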
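#stonith baytech /etc/ha.d/conf/stonith.baytech #STONITH removes a misbehaving node from the cluster so that its resources are released and two nodes never contend for the same resource, protecting the safety and integrity of shared data.
#watchdog /dev/watchdog #Optional. Lets heartbeat monitor the system's health. This requires the "softdog" kernel module, which creates the actual device file; if the kernel lacks the module, enable it and rebuild the kernel. Then run "insmod softdog" to load it, check "grep misc /proc/devices" (should show 10) and "cat /proc/misc | grep watchdog" (should show 130), and finally create the device file with "mknod /dev/watchdog c 10 130". The feature can then be used.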
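node node1 #Hostname of the primary node; it must match the output of "uname -n".
node node2 #Hostname of the standby node.
ping 192.168.60.1 #Ping node. The better the ping node is chosen, the more robust the HA cluster; a fixed router is a good choice, but avoid using a cluster member. The ping node is used only to test network connectivity.
respawn hacluster /usr/lib/heartbeat/ipfail #Optional. Lists processes started and stopped together with heartbeat, usually plugins integrated with heartbeat; they are restarted automatically if they fail. The most common one is ipfail, which detects and handles network failures and relies on the nodes given by the ping directive to test connectivity. hacluster is the user that ipfail runs as.
crm no #Disable the Cluster Resource Manager; resources are then taken from haresources, as in this setup.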
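The timing options above depend on each other: a heartbeat every keepalive seconds, a warning after warntime, takeover after deadtime, and initdead at least twice deadtime. A sanity-check sketch, assuming plain integer seconds (no unit suffix); `check_ha_timers` and the keepalive < warntime < deadtime ordering are assumptions of this sketch:

```shell
# check_ha_timers HA_CF  (hypothetical helper)
# Checks keepalive < warntime < deadtime and initdead >= 2 * deadtime.
# Prints "OK" and succeeds, or prints warnings and fails.
check_ha_timers() {
    awk '
        $1 == "keepalive" { ka = $2 + 0 }
        $1 == "warntime"  { wt = $2 + 0 }
        $1 == "deadtime"  { dt = $2 + 0 }
        $1 == "initdead"  { id = $2 + 0 }
        END {
            if (ka == "" || dt == "") { print "ERROR: keepalive or deadtime missing"; exit 1 }
            bad = 0
            if (wt != "" && !(ka < wt && wt < dt)) { print "WARN: expected keepalive < warntime < deadtime"; bad = 1 }
            if (id != "" && id < 2 * dt) { print "WARN: initdead should be at least 2 x deadtime"; bad = 1 }
            if (!bad) print "OK"
            exit bad
        }' "$1"
}

# Usage: check_ha_timers /etc/ha.d/ha.cf
```

With the ha.cf written earlier (keepalive 2, warntime 10, deadtime 30, initdead 60) the check passes, since initdead is exactly twice deadtime.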

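Configuring authkeys:
Three authentication methods are available: crc, md5 and sha1, in increasing order of security.
If the heartbeat cluster runs on a trusted network, crc is sufficient (it only detects corrupted packets and offers no protection against forged ones). If the nodes have ample hardware resources, sha1 is recommended, as it is the most secure. md5 is a middle ground between network security and system resource usage.
# Authentication file. Must be mode 600 #authkeys must have permissions 600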
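A sketch of a pre-start check for the two authkeys rules above: the file must be mode 600, and the key index named on the `auth N` line must have a matching `N crc`, `N md5 <key>` or `N sha1 <key>` line. `check_authkeys` is a hypothetical helper, and `stat -c` assumes GNU coreutils:

```shell
# check_authkeys FILE  (hypothetical helper)
# Fails unless FILE is mode 600 and "auth N" refers to a usable key line.
check_authkeys() {
    local f=$1 mode
    mode=$(stat -c %a "$f") || return 1
    if [ "$mode" != 600 ]; then
        echo "ERROR: $f has mode $mode, expected 600" >&2
        return 1
    fi
    awk '
        $1 == "auth" { idx = $2 }
        $1 ~ /^[0-9]+$/ && ($2 == "crc" || (($2 == "md5" || $2 == "sha1") && NF >= 3)) { method[$1] = $2 }
        END {
            if (idx == "" || !(idx in method)) { print "ERROR: auth " idx " has no usable key line" > "/dev/stderr"; exit 1 }
            print "OK: auth " idx " uses " method[idx]
        }' "$f"
}

# Usage on both nodes: check_authkeys /etc/ha.d/authkeys
```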