Mounting a Storage Array on NFS Servers

Preparation

Configure RAID 1 for the system disks of the two servers that will act as NFS servers, and install CentOS 7.9 on them.

Set the hostnames to nfs01 and nfs02, and plan the IP addresses for the nodes.

====================================  ==============
IP plan                               IP address
====================================  ==============
Floating IP                           172.16.0.10/16
nfs01 management & heartbeat network  172.16.0.11/16
nfs02 management & heartbeat network  172.16.0.12/16
====================================  ==============
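
The original does not show the commands for this step. A minimal sketch using hostnamectl and nmcli, run on nfs01, assuming the management interface and its NetworkManager connection are both named eno1 (the interface name that appears later in this guide):

# hostnamectl set-hostname nfs01
# nmcli connection modify eno1 ipv4.method manual ipv4.addresses 172.16.0.11/16
# nmcli connection up eno1

Repeat on nfs02 with the hostname nfs02 and the address 172.16.0.12/16.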

Edit the /etc/hosts file on the nfs01 and nfs02 nodes:

172.16.0.11   nfs01
172.16.0.12 nfs02

Check the cabling of the two NFS servers: each server's two FC ports must be connected to FC ports on the upper and lower controllers of the storage array, respectively.
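
As an optional check from the operating system (not part of the original steps), the FC link state of each port can be read from sysfs; every cabled port should report Online:

# cat /sys/class/fc_host/host*/port_state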

Setting Up Host Mapping

Check the WWPNs of the FC ports on the nfs01 and nfs02 nodes:

# cat /sys/class/fc_host/host*/port_name
0x100000109b50a92b
0x100000109b50a92c
  • Note: use the characters after the "0x" prefix.

Open https://172.16.0.40 in a browser to log in to the storage array's management interface. Add two hosts, and for each host add Fibre Channel ports, filling in the WWPNs of the corresponding ports.

Log in to nfs01 and nfs02 to confirm that the block storage can be discovered:

# lsblk
# lsscsi
[0:2:0:0] disk AVAGO INSPUR 4.67 /dev/sda
[15:0:0:0] disk INSPUR MCS 0000 /dev/sdb
[16:0:0:0] disk INSPUR MCS 0000 /dev/sdc

Setting Up the Partition

Log in to nfs01 for this step; the operation only needs to be performed on one of the block devices.

Change the partition table to GPT and create a single partition spanning all of the space:

# parted /dev/sdb 
(parted) mklabel gpt
(parted) mkpart primary 1 100%
(parted) p
Model: INSPUR MCS (scsi)
Disk /dev/sdb: 4783GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number Start End Size File system Name Flags
1 1049kB 4783GB 4783GB primary
(parted) q

Reboot the system after making the change:

# reboot

Setting Up Multipath Aggregation

Install the packages required for multipath configuration:

# yum -y install *multipath*

Check the installed packages:

# rpm -qa | grep multipath
device-mapper-multipath-devel-0.4.9-134.el7_9.x86_64
device-mapper-multipath-libs-0.4.9-134.el7_9.x86_64
device-mapper-multipath-0.4.9-134.el7_9.x86_64
device-mapper-multipath-sysvinit-0.4.9-134.el7_9.x86_64

Load the two kernel modules in turn:

# modprobe dm-multipath
# modprobe dm-service-time

Copy the multipath.conf template file to the /etc directory:

# cp -p /usr/share/doc/device-mapper-multipath-0.4.9/multipath.conf /etc/

Add the following configuration to /etc/multipath.conf:

devices {
    device {
        vendor "INSPUR"
        product "MCS"
        path_grouping_policy group_by_prio
        path_checker tur
        path_selector "round-robin 0"
        failback immediate
        features "1 queue_if_no_path"
        prio alua
        no_path_retry "60"
        rr_min_io 1
        dev_loss_tmo 120
        fast_io_fail_tmo 5
    }
}

Start the service and enable it at boot:

# systemctl start multipathd.service
# systemctl enable multipathd.service
  • Note: if the service was already running, you can reload the configuration file with systemctl reload multipathd.service.

Check whether the multipath configuration succeeded:

# multipath  -v1
# multipath -v2
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 279G 0 disk
├─sda1 8:1 0 512M 0 part /boot
└─sda2 8:2 0 278.4G 0 part
├─centos-root 253:0 0 262.4G 0 lvm /
└─centos-swap 253:1 0 16G 0 lvm [SWAP]
sdb 8:16 0 4.4T 0 disk
└─mpatha 253:2 0 4.4T 0 mpath
└─mpatha1 253:3 0 4.4T 0 part
sdc 8:32 0 4.4T 0 disk
└─mpatha 253:2 0 4.4T 0 mpath
└─mpatha1 253:3 0 4.4T 0 part

# multipath -ll
mpatha (36005076708808074f800000000000006) dm-2 INSPUR ,MCS
size=4.3T features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 16:0:0:0 sdc 8:32 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
`- 7:0:0:0 sdb 8:16 active ready running

Setting Up the Logical Volume

Initialize the physical volume:

# pvcreate /dev/mapper/mpatha1
Physical volume "/dev/mapper/mpatha1" successfully created.

Create the volume group:

# vgcreate nfsdata_vg /dev/mapper/mpatha1
Volume group "nfsdata_vg" successfully created

Create the logical volume:

# lvcreate -l 100%VG -n nfsdata_lv nfsdata_vg
Logical volume "nfsdata_lv" created.

View the logical volumes:

# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
root centos -wi-ao---- <262.43g
swap centos -wi-ao---- 16.00g
nfsdata_lv nfsdata_vg -wi-a----- <4.35t

Format the logical volume with an ext4 filesystem:

# mkfs.ext4 /dev/nfsdata_vg/nfsdata_lv 
mke2fs 1.42.9 (28-Dec-2013)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=8 blocks, Stripe width=8 blocks
145965056 inodes, 1167692800 blocks
58384640 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=3315597312
35636 block groups
32768 blocks per group, 32768 fragments per group
4096 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848, 512000000, 550731776, 644972544

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

Create the /nfsshare directory and mount the ext4 filesystem:

# mkdir /nfsshare
# mount /dev/nfsdata_vg/nfsdata_lv /nfsshare

Create the exports directory, which will later be used as the NFS shared directory:

# mkdir /nfsshare/exports

Unmount the filesystem you just mounted and deactivate the nfsdata_vg volume group:

# umount /nfsshare
# vgchange -an nfsdata_vg

After deactivation, the nfsdata_vg volume group is no longer visible:

# ll /dev/nfsdata_vg
ls: cannot access /dev/nfsdata_vg: No such file or directory

Setting Up Exclusive Activation of the Volume Group in the Cluster

The following configures LVM to ensure that only the cluster can activate the volume group, and that it is activated exclusively on a single node.

Run the following command to ensure that locking_type is set to 1 and use_lvmetad is set to 0 in the /etc/lvm/lvm.conf file. The command also immediately disables and stops any lvmetad processes:

# lvmconf --enable-halvm --services --startstopservices
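
As an optional verification (not in the original), the two settings can be inspected directly:

# grep -E '^\s*(locking_type|use_lvmetad)\s*=' /etc/lvm/lvm.conf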

List all volume groups on the node:

# vgs --noheadings -o vg_name
centos
nfsdata_vg

The volume_list entry in the /etc/lvm/lvm.conf configuration file lists the volume groups that may be activated automatically on this node, outside the control of the cluster manager, so the nfsdata_vg volume group must be excluded from that setting.

The volume_list entry is commented out by default; add a new configuration line to /etc/lvm/lvm.conf:

volume_list = [ "centos" ]

The centos volume group holds the local root filesystem and must be included in volume_list.

Rebuild the initramfs boot image to make sure the configuration above takes effect:

# dracut -H -f /boot/initramfs-$(uname -r).img $(uname -r)

After the command completes, reboot the node:

# reboot

Log in to the other node and perform the same configuration.
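
As an optional check after both nodes are back up (not in the original), lvscan should now list the cluster-managed logical volume as inactive while the local centos volumes remain ACTIVE:

# lvscan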

Installing the NFS Service

Install the nfs-utils and rpcbind packages:

# yum -y install nfs-utils rpcbind

Start the services and enable them at boot:

# systemctl enable --now rpcbind
# systemctl enable --now nfs-server
# systemctl enable --now nfs-secure

Stop and disable the firewall:

# systemctl stop firewalld
# systemctl disable firewalld

Edit the /etc/exports configuration file:

/nfsshare        172.16.0.0/16(rw,no_all_squash,no_root_squash,sync)

Reload the NFS configuration:

# systemctl reload nfs-server

Confirm the NFS export information on the local host:

# showmount -e localhost
Export list for localhost:
/nfsshare 172.16.0.0/16

After confirming that the NFS service works, clear the configuration in /etc/exports and stop nfs-server; once HA has been configured, the NFS service will be managed by PCS.
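
A sketch of how this step can be carried out (the exact commands are not in the original; disabling the unit at boot is an extra precaution, since PCS will manage the service from here on):

# : > /etc/exports              # empty the exports file (or remove the line added earlier)
# exportfs -ra                  # stop exporting the directory
# systemctl disable --now nfs-server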

Configuring High Availability (HA)

Installing the PCS Software

Install the PCS-related packages on nfs01 and nfs02:

# yum install pcs pacemaker fence-agents-all

Start the pcsd service on nfs01 and nfs02 and enable it at boot:

# systemctl enable --now pcsd

Set a password for the hacluster user on nfs01 and nfs02:

# passwd hacluster

Creating the Cluster

Set up private and public keys so that nfs01 and nfs02 can access each other without a password.
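
A minimal sketch of the key setup (the original does not show the commands), run as root on each of the two nodes:

# ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
# ssh-copy-id root@nfs01
# ssh-copy-id root@nfs02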

Then, on either node, authenticate as the hacluster user with pcs cluster auth:

# pcs cluster auth nfs01 nfs02

Next, run the command to create the cluster on the same node:

# pcs cluster setup --name nfsdata_cluster nfs01 nfs02
Destroying cluster on nodes: nfs01, nfs02...
nfs02: Stopping Cluster (pacemaker)...
nfs01: Stopping Cluster (pacemaker)...
nfs02: Successfully destroyed cluster
nfs01: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'nfs01', 'nfs02'
nfs01: successful distribution of the file 'pacemaker_remote authkey'
nfs02: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
nfs01: Succeeded
nfs02: Succeeded

Synchronizing pcsd certificates on nodes nfs01, nfs02...
nfs01: Success
nfs02: Success
Restarting pcsd on the nodes in order to reload the certificates...
nfs01: Success
nfs02: Success

Start all nodes in the cluster and enable them to start at boot:

# pcs cluster start --all
# pcs cluster enable --all

Check the status of the cluster nodes:

# pcs cluster status
Cluster Status:
Stack: corosync
Current DC: nfs02 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Tue Aug 10 00:05:12 2021
Last change: Mon Aug 9 16:13:36 2021 by hacluster via crmd on nfs02
2 nodes configured
0 resource instances configured

PCSD Status:
nfs01: Online
nfs02: Online

Check whether cluster communication on the current node is working properly:

# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
id = 172.16.0.11
status = ring 0 active with no faults

View the cluster membership information:

# pcs status corosync

Membership information
----------------------
Nodeid Votes Name
1 1 nfs01 (local)
2 1 nfs02

Configuring the High-Availability Resources

Add the volume group resource:

# pcs resource create nfsdata_lvm LVM \
volgrpname=nfsdata_vg exclusive=true --group nfsgroup

Add the filesystem resource:

# pcs resource create XFS Filesystem \
device=/dev/nfsdata_vg/nfsdata_lv directory=/nfsshare \
fstype=ext4 --group nfsgroup

Add the nfsserver resource:

# pcs resource create nfs-daemon nfsserver \
nfs_shared_infodir=/nfsshare/nfsinfo \
nfs_no_notify=true --group nfsgroup

The nfs_shared_infodir parameter of the nfsserver resource specifies a directory used to store NFS-related state information. The directory is placed on the shared storage array so that the other node can still use this information after a failover.

Add the NFS export resource:

# pcs resource create nfs-export exportfs \
clientspec=172.16.0.0/16 options=rw,sync,no_root_squash,no_all_squash \
directory=/nfsshare/exports fsid=0 --group nfsgroup

Add the floating IP (virtual IP) resource:

# pcs resource create VIP IPaddr2 ip="172.16.0.10" cidr_netmask=16 nic=eno1:0 \
op monitor interval=30s OCF_CHECK_LEVEL=10 --group nfsgroup

Add the nfsnotify resource so that an NFSv3 reboot notification is sent once the whole NFS deployment has initialized:

# pcs resource create nfs-notify nfsnotify source_host=172.16.0.10 --group nfsgroup

View the resource information:

# pcs resource show
Resource Group: nfsgroup
nfsdata_lvm (ocf:heartbeat:LVM): Stopped
XFS (ocf:heartbeat:Filesystem): Stopped
nfs-daemon (ocf:heartbeat:nfsserver): Started nfs02
nfs-export (ocf:heartbeat:exportfs): Stopped
VIP (ocf:heartbeat:IPaddr2): Stopped
nfs-notify (ocf:heartbeat:nfsnotify): Stopped

Make sure the resources above are all in the nfsgroup resource group. On failover, the resources in the group move to the other node together and in order, so the resource creation commands above must be run in the sequence shown.
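
If a resource was accidentally created outside the group, it can still be appended to the group afterwards; a hypothetical example (not a step from the original):

# pcs resource group add nfsgroup nfs-notify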

Some of the resources cannot be started while the stonith component is enabled, so disable it:

# pcs property set stonith-enabled=false

View the resource information again; the resources have now been started on the nfs01 node:

# pcs resource show
Resource Group: nfsgroup
nfsdata_lvm (ocf:heartbeat:LVM): Started nfs01
XFS (ocf:heartbeat:Filesystem): Started nfs01
nfs-daemon (ocf:heartbeat:nfsserver): Started nfs01
nfs-export (ocf:heartbeat:exportfs): Started nfs01
VIP (ocf:heartbeat:IPaddr2): Started nfs01
nfs-notify (ocf:heartbeat:nfsnotify): Started nfs01

# pcs status
Cluster name: nfsdata_cluster
Stack: corosync
Current DC: nfs02 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Fri Aug 13 23:18:24 2021
Last change: Fri Aug 13 23:17:21 2021 by root via cibadmin on nfs01

2 nodes configured
6 resource instances configured

Online: [ nfs01 nfs02 ]

Full list of resources:

Resource Group: nfsgroup
nfsdata_lvm (ocf:heartbeat:LVM): Started nfs01
XFS (ocf:heartbeat:Filesystem): Started nfs01
nfs-daemon (ocf:heartbeat:nfsserver): Started nfs01
nfs-export (ocf:heartbeat:exportfs): Started nfs01
VIP (ocf:heartbeat:IPaddr2): Started nfs01
nfs-notify (ocf:heartbeat:nfsnotify): Started nfs01

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled

Checking the Resource Assignment

Check that the nfsdata_lv logical volume exists:

# ll /dev/nfsdata_vg/nfsdata_lv 
lrwxrwxrwx. 1 root root 7 Aug 13 23:17 /dev/nfsdata_vg/nfsdata_lv -> ../dm-4

Check the filesystem mount:

# mount | grep nfsshare
/dev/mapper/nfsdata_vg-nfsdata_lv on /nfsshare type ext4 (rw,relatime,seclabel,stripe=8,data=ordered)
sunrpc on /nfsshare/nfsinfo/rpc_pipefs type rpc_pipefs (rw,relatime)

Check the nfs-daemon state files:

# ls /nfsshare/nfsinfo/
etab export-lock nfsdcltrack rmtab rpc_pipefs statd v4recovery xtab

Check the NFS export information:

# showmount -e localhost 
Export list for localhost:
/nfsshare/exports 172.16.0.0/16

Check the floating IP configuration:

# ip a 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 6c:92:bf:bc:f1:2c brd ff:ff:ff:ff:ff:ff
inet 172.16.0.11/16 brd 172.16.255.255 scope global eno1
valid_lft forever preferred_lft forever
inet 172.16.0.10/16 brd 172.16.255.255 scope global secondary eno1:0
valid_lft forever preferred_lft forever
inet6 fe80::6e92:bfff:febc:f12c/64 scope link
valid_lft forever preferred_lft forever
...

Testing NFS and High Availability

Testing the NFS Mount

Prepare a node to act as an NFS client for testing.

Add the following entry to /etc/hosts:

172.16.0.10	nfs	nfs.pi.sjtu.edu.cn

Use the showmount tool to view the directories exported by nfs:

# showmount -e nfs
Export list for nfs:
/nfsshare/exports 172.16.0.0/16

Add the following entry to /etc/fstab:

nfs:/nfsshare/exports    /mnt    nfs     defaults,_netdev        0       0
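
The new fstab entry can then be mounted and checked (a small verification step, not shown in the original):

# mount -a
# df -h /mnt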

Enter the mounted directory and test that it is readable and writable:

# cd /mnt
# echo "nfs test page" >> test.txt

Testing a Node Outage

All resources are currently on the nfs01 node; shut that node down:

# init 0

Log in to the nfs02 node and check the cluster status; all resources have been switched over:

# pcs status
Cluster name: nfsdata_cluster
Stack: corosync
Current DC: nfs02 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Tue Aug 17 10:06:08 2021
Last change: Tue Aug 17 09:58:16 2021 by hacluster via crmd on nfs01

2 nodes configured
6 resource instances configured

Online: [ nfs02 ]
OFFLINE: [ nfs01 ]

Full list of resources:

Resource Group: nfsgroup
nfsdata_lvm (ocf:heartbeat:LVM): Started nfs02
XFS (ocf:heartbeat:Filesystem): Started nfs02
nfs-daemon (ocf:heartbeat:nfsserver): Started nfs02
nfs-export (ocf:heartbeat:exportfs): Started nfs02
VIP (ocf:heartbeat:IPaddr2): Started nfs02
nfs-notify (ocf:heartbeat:nfsnotify): Started nfs02

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled

Log in to the NFS client and confirm that the NFS mount is still working:

# cat test.txt 
nfs test page

Power nfs01 back on through the BMC; nfs01 does not take the resources back:

# pcs status 
Cluster name: nfsdata_cluster
Stack: corosync
Current DC: nfs02 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Tue Aug 17 18:09:54 2021
Last change: Tue Aug 17 09:58:16 2021 by hacluster via crmd on nfs01

2 nodes configured
6 resource instances configured

Online: [ nfs01 nfs02 ]

Full list of resources:

Resource Group: nfsgroup
nfsdata_lvm (ocf:heartbeat:LVM): Started nfs02
XFS (ocf:heartbeat:Filesystem): Started nfs02
nfs-daemon (ocf:heartbeat:nfsserver): Started nfs02
nfs-export (ocf:heartbeat:exportfs): Started nfs02
VIP (ocf:heartbeat:IPaddr2): Started nfs02
nfs-notify (ocf:heartbeat:nfsnotify): Started nfs02

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled

If nfs01 shows up in the cluster in standby state after the reboot, switch it back to Online; otherwise, if nfs02 goes down, a host in standby state will not take over the resources:

# pcs node unstandby nfs01
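
The node states can be confirmed afterwards (an optional check, not in the original):

# pcs status nodes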

Testing an NFS Server Network Failure

The resources are currently on the nfs02 node; unplug the network cable from the eno1 port of the nfs02 server.

Log in to the nfs01 node and check the cluster status; all resources have been switched over to nfs01, and nfs02 is shown as OFFLINE:

# pcs status
Cluster name: nfsdata_cluster
Stack: corosync
Current DC: nfs01 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Tue Aug 17 22:02:42 2021
Last change: Tue Aug 17 13:47:56 2021 by root via cibadmin on nfs01

2 nodes configured
6 resource instances configured

Online: [ nfs01 ]
OFFLINE: [ nfs02 ]

Full list of resources:

Resource Group: nfsgroup
nfsdata_lvm (ocf:heartbeat:LVM): Started nfs01
XFS (ocf:heartbeat:Filesystem): Started nfs01
nfs-daemon (ocf:heartbeat:nfsserver): Started nfs01
nfs-export (ocf:heartbeat:exportfs): Started nfs01
VIP (ocf:heartbeat:IPaddr2): Started nfs01
nfs-notify (ocf:heartbeat:nfsnotify): Started nfs01

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled

Log in to the NFS client and confirm that the NFS mount is still working:

# cat test.txt 
nfs test page

Testing an FC Link Failure

Unplug one of nfs01's FC cables.

Because multipath aggregation is configured for the storage array, a single FC cable keeps the connection working.
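
This can be confirmed on nfs01 with the multipath tooling used earlier; the surviving path should still be shown as active ready running (an optional check, not in the original):

# multipath -ll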

Log in to the NFS client and confirm that the NFS mount is still working:

# cat test.txt 
nfs test page

Unplug nfs01's other FC cable as well.

nfs01 remains Online; although some resources have failed, the resources are not handed over to the standby node nfs02:

# pcs status 
Cluster name: nfsdata_cluster
Stack: corosync
Current DC: nfs01 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Tue Aug 17 14:28:21 2021
Last change: Tue Aug 17 13:47:56 2021 by root via cibadmin on nfs01

2 nodes configured
6 resource instances configured

Online: [ nfs01 nfs02 ]

Full list of resources:

Resource Group: nfsgroup
nfsdata_lvm (ocf:heartbeat:LVM): FAILED nfs01
XFS (ocf:heartbeat:Filesystem): Started nfs01
nfs-daemon (ocf:heartbeat:nfsserver): Started nfs01
nfs-export (ocf:heartbeat:exportfs): FAILED nfs01
VIP (ocf:heartbeat:IPaddr2): Stopped
nfs-notify (ocf:heartbeat:nfsnotify): Stopped

Failed Resource Actions:
* nfsdata_lvm_monitor_10000 on nfs01 'unknown error' (1): call=75, status=Timed Out, exitreason='',
last-rc-change='Tue Aug 17 14:27:53 2021', queued=0ms, exec=0ms
* nfs-export_monitor_10000 on nfs01 'unknown error' (1): call=81, status=Timed Out, exitreason='',
last-rc-change='Tue Aug 17 14:27:49 2021', queued=0ms, exec=0ms

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
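
After the FC cables are reconnected, the failed-resource state shown above does not clear on its own; one way to clear it is pcs resource cleanup (a recovery hint, not part of the original test):

# pcs resource cleanup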

References