Detecting and recovering a failed master instance in Greenplum

Author: AiDBA宝典
Published 2023-04-27 13:27:44
Published in the column: 小麦苗的DB宝专栏

A Greenplum cluster is made up of multiple servers, any of which can suffer a software or hardware failure. Here we simulate failures of individual nodes and walk through Greenplum's fault tolerance and recovery procedures.

This article focuses on how to handle a failure of the master or the standby master.

Checking master status

Use gpstate -f, or run the query select * from gp_segment_configuration order by 2,1;.
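For instance, with psql's unaligned output the query result can be piped through a small helper that counts segments that are not up. The helper below is an illustrative sketch (count_down_segments is not a Greenplum utility), fed a hard-coded sample instead of live psql output:

```shell
# count_down_segments: read "role|status" pairs, e.g. produced by
#   psql -At -c "select role, status from gp_segment_configuration"
# and count the rows whose status is not 'u' (up).
count_down_segments() {
    awk -F'|' '$2 != "u" { n++ } END { print n+0 }'
}

# Illustrative sample: three segments up, one mirror down ('d').
printf 'p|u\nm|u\np|u\nm|d\n' | count_down_segments   # prints 1
```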

gpactivatestandby: the command for recovering a failed master in Greenplum

https://www.xmmup.com/greenplumzhonghuifuguzhangdemasterminglinggpactivatestandby.html

Master recovery example

Environment details: https://www.xmmup.com/mppjiagouzhigreenplumdeanzhuangpeizhigaojiban.html#huan_jing_shen_qing

Healthy cluster state

Check the current state of the database on the master:

[gpadmin@mdw1 ~]$ gpstate
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-Starting gpstate with args: 
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source'
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.26 (Greenplum Database 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Dec 20 2022 08:02:23'
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-Obtaining Segment details from master...
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-Gathering data from segments...
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-Greenplum instance status summary
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-----------------------------------------------------
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-   Master instance                                           = Active
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-   Master standby                                            = mdw2
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-   Standby master state                                      = Standby host passive
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-   Total segment instance count from metadata                = 32
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-----------------------------------------------------
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-   Primary Segment Status
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-----------------------------------------------------
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-   Total primary segments                                    = 16
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-   Total primary segment valid (at master)                   = 16
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-   Total primary segment failures (at master)                = 0
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-   Total number of postmaster.pid files missing              = 0
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-   Total number of postmaster.pid files found                = 16
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-   Total number of postmaster.pid PIDs missing               = 0
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-   Total number of postmaster.pid PIDs found                 = 16
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-   Total number of /tmp lock files missing                   = 0
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-   Total number of /tmp lock files found                     = 16
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-   Total number postmaster processes missing                 = 0
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-   Total number postmaster processes found                   = 16
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-----------------------------------------------------
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-   Mirror Segment Status
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-----------------------------------------------------
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-   Total mirror segments                                     = 16
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-   Total mirror segment valid (at master)                    = 16
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-   Total mirror segment failures (at master)                 = 0
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-   Total number of postmaster.pid files missing              = 0
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-   Total number of postmaster.pid files found                = 16
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-   Total number of postmaster.pid PIDs missing               = 0
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-   Total number of postmaster.pid PIDs found                 = 16
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-   Total number of /tmp lock files missing                   = 0
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-   Total number of /tmp lock files found                     = 16
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-   Total number postmaster processes missing                 = 0
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-   Total number postmaster processes found                   = 16
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-   Total number mirror segments acting as primary segments   = 0
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-   Total number mirror segments acting as mirror segments    = 16
20230131:16:37:05:021525 gpstate:mdw1:gpadmin-[INFO]:-----------------------------------------------------

Failure scenario 1: master server failure

When the master node fails, we need to activate the standby as the new master (if the deployment uses a VIP, switch the VIP over to the standby server as well).

When activating the standby, you can either designate a new standby host right away, or wait until the original master server is repaired and then designate it as the new standby.
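The two steps (promote the standby, then move the VIP) can be sketched as a small runbook. The data directory, port, and VIP below are this lab's values, not defaults, and the run wrapper only prints each command when DRY_RUN=1, so the sketch can be rehearsed without touching the cluster:

```shell
# Failover runbook sketch: run on the standby host (mdw2).
MASTER_DATA_DIRECTORY=/opt/greenplum/data/master/gpseg-1
VIP=172.72.6.56/16

run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

promote_standby() {
    export PGPORT=5432                                  # gpactivatestandby aborts if PGPORT is unset
    run gpactivatestandby -d "$MASTER_DATA_DIRECTORY"   # prompts y/N, then promotes
    run ip addr add "$VIP" dev eth0 label eth0:1        # move the VIP (needs root)
}

DRY_RUN=1 promote_standby   # print the commands instead of executing them
```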

Shut down the master node
[root@lhrxxt ~]# docker stop mdw1
mdw1
Activate the standby node
[root@mdw2 /]# su - gpadmin
Last login: Tue Jan 31 16:05:54 CST 2023 on pts/1
[gpadmin@mdw2 ~]$ 
[gpadmin@mdw2 ~]$ gpactivatestandby -d /opt/greenplum/data/master/gpseg-1   
20230201:13:47:23:016042 gpactivatestandby:mdw2:gpadmin-[CRITICAL]:-PGPORT environment variable not set.
[gpadmin@mdw2 ~]$ export PGPORT=5432
[gpadmin@mdw2 ~]$ gpactivatestandby -d /opt/greenplum/data/master/gpseg-1
20230201:13:47:43:016069 gpactivatestandby:mdw2:gpadmin-[DEBUG]:-Running Command: echo 'START_CMD_OUTPUT';ps -ef | grep postgres | grep -v grep | awk '{print $2}' | grep \`cat /opt/greenplum/data/master/gpseg-1/postmaster.pid | head -1\` || echo -1
20230201:13:47:43:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:------------------------------------------------------
20230201:13:47:43:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-Standby data directory    = /opt/greenplum/data/master/gpseg-1
20230201:13:47:43:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-Standby port              = 5432
20230201:13:47:43:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-Standby running           = yes
20230201:13:47:43:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-Force standby activation  = no
20230201:13:47:43:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:------------------------------------------------------
20230201:13:47:43:016069 gpactivatestandby:mdw2:gpadmin-[DEBUG]:-Running Command: cat /tmp/.s.PGSQL.5432.lock
Do you want to continue with standby master activation? Yy|Nn (default=N):
> y
20230201:13:47:51:016069 gpactivatestandby:mdw2:gpadmin-[DEBUG]:-Running Command: ps -ef | grep postgres | grep -v grep | awk '{print $2}' | grep `cat /opt/greenplum/data/master/gpseg-1/postmaster.pid | head -1` || echo -1
20230201:13:47:51:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-found standby postmaster process
20230201:13:47:51:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-Promoting standby...
20230201:13:47:51:016069 gpactivatestandby:mdw2:gpadmin-[DEBUG]:-Running Command: pg_ctl promote -D /opt/greenplum/data/master/gpseg-1
20230201:13:47:51:016069 gpactivatestandby:mdw2:gpadmin-[DEBUG]:-Waiting for connection...
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-Standby master is promoted
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-Reading current configuration...
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[DEBUG]:-Connecting to dbname='postgres'
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:------------------------------------------------------
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-The activation of the standby master has completed successfully.
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-mdw2 is now the new primary master.
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-You will need to update your user access mechanism to reflect
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-the change of master hostname.
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-Do not re-start the failed master while the fail-over master is
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-operational, this could result in database corruption!
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-MASTER_DATA_DIRECTORY is now /opt/greenplum/data/master/gpseg-1 if
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-this has changed as a result of the standby master activation, remember
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-to change this in any startup scripts etc, that may be configured
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-to set this value.
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-MASTER_PORT is now 5432, if this has changed, you
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-may need to make additional configuration changes to allow access
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-to the Greenplum instance.
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-Refer to the Administrator Guide for instructions on how to re-activate
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-the master to its previous state once it becomes available.
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-Query planner statistics must be updated on all databases
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-following standby master activation.
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:-When convenient, run ANALYZE against all user databases.
20230201:13:47:52:016069 gpactivatestandby:mdw2:gpadmin-[INFO]:------------------------------------------------------
[gpadmin@mdw2 ~]$ 

Enable the VIP on the standby master:

[root@mdw2 ~]$ ip addr add 172.72.6.56/16 dev eth0 label eth0:1
[root@mdw2 /]# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.72.6.51  netmask 255.255.0.0  broadcast 172.72.255.255
        ether 02:42:ac:48:06:33  txqueuelen 0  (Ethernet)
        RX packets 94731  bytes 7030183 (6.7 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 37837  bytes 166369481 (158.6 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.72.6.56  netmask 255.255.0.0  broadcast 0.0.0.0
        ether 02:42:ac:48:06:33  txqueuelen 0  (Ethernet)

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 1682  bytes 463712 (452.8 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1682  bytes 463712 (452.8 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@mdw2 /]# 


[root@lhrxxt ~]# psql -U gpadmin -h 172.72.6.56 -d postgres
Password for user gpadmin: 
psql (13.9, server 9.4.26)
Type "help" for help.

postgres=# \l
                               List of databases
   Name    |  Owner  | Encoding |  Collate   |   Ctype    |  Access privileges  
-----------+---------+----------+------------+------------+---------------------
 lhrgpdb   | gpadmin | UTF8     | en_US.utf8 | en_US.utf8 | 
 postgres  | gpadmin | UTF8     | en_US.utf8 | en_US.utf8 | 
 template0 | gpadmin | UTF8     | en_US.utf8 | en_US.utf8 | =c/gpadmin         +
           |         |          |            |            | gpadmin=CTc/gpadmin
 template1 | gpadmin | UTF8     | en_US.utf8 | en_US.utf8 | =c/gpadmin         +
           |         |          |            |            | gpadmin=CTc/gpadmin
 testdb    | gpadmin | UTF8     | en_US.utf8 | en_US.utf8 | 
(5 rows)

postgres=# 

At this point, the cluster is usable again.

Repair the mdw1 node, then designate it as the new standby

We designate the original master node as the new standby server.

The original master's data files must be deleted first; then simply re-run the standby initialization.
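As a sketch (hostnames and paths are this lab's values; the run wrapper just prints each command when DRY_RUN=1, so nothing is executed here):

```shell
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

readd_old_master() {
    # On mdw1: remove the stale master data directory first, otherwise
    # gpinitstandby fails with "Data directory already exists on host mdw1".
    run ssh mdw1 rm -rf /opt/greenplum/data/master/gpseg-1
    # On mdw2 (the active master): initialize mdw1 as the new standby.
    run gpinitstandby -a -s mdw1
}

DRY_RUN=1 readd_old_master   # print the commands only
```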

Wrong approach: starting the whole Greenplum cluster from mdw1 fails, and trying to repair the cluster from there fails too:

[gpadmin@mdw1 ~]$ gpstate
20230201:13:56:07:000419 gpstate:mdw1:gpadmin-[INFO]:-Starting gpstate with args: 
20230201:13:56:07:000419 gpstate:mdw1:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source'
20230201:13:56:07:000419 gpstate:mdw1:gpadmin-[CRITICAL]:-gpstate failed. (Reason='could not connect to server: Connection refused
        Is the server running on host "localhost" (127.0.0.1) and accepting
        TCP/IP connections on port 5432?
could not connect to server: Cannot assign requested address
        Is the server running on host "localhost" (::1) and accepting
        TCP/IP connections on port 5432?
') exiting...
[gpadmin@mdw1 ~]$ gpstart
20230201:13:56:34:000462 gpstart:mdw1:gpadmin-[INFO]:-Starting gpstart with args: 
20230201:13:56:34:000462 gpstart:mdw1:gpadmin-[INFO]:-Gathering information and validating the environment...
20230201:13:56:34:000462 gpstart:mdw1:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source'
20230201:13:56:34:000462 gpstart:mdw1:gpadmin-[INFO]:-Greenplum Catalog Version: '301908232'
20230201:13:56:34:000462 gpstart:mdw1:gpadmin-[INFO]:-Starting Master instance in admin mode
20230201:13:56:34:000462 gpstart:mdw1:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20230201:13:56:34:000462 gpstart:mdw1:gpadmin-[INFO]:-Obtaining Segment details from master...
20230201:13:56:34:000462 gpstart:mdw1:gpadmin-[INFO]:-Setting new master era
20230201:13:56:34:000462 gpstart:mdw1:gpadmin-[INFO]:-Master Started...
20230201:13:56:35:000462 gpstart:mdw1:gpadmin-[INFO]:-Master Stopped...
20230201:13:56:35:000462 gpstart:mdw1:gpadmin-[ERROR]:-gpstart error: Standby activated, this node no more can act as master.
[gpadmin@mdw1 ~]$ gpinitstandby -a -s mdw1
20230201:13:57:06:000532 gpinitstandby:mdw1:gpadmin-[ERROR]:-Failed to retrieve configuration information from the master.
20230201:13:57:06:000532 gpinitstandby:mdw1:gpadmin-[ERROR]:-Failed to create standby
20230201:13:57:06:000532 gpinitstandby:mdw1:gpadmin-[ERROR]:-Error initializing standby master: could not connect to server: Connection refused
        Is the server running on host "localhost" (127.0.0.1) and accepting
        TCP/IP connections on port 5432?
could not connect to server: Cannot assign requested address
        Is the server running on host "localhost" (::1) and accepting
        TCP/IP connections on port 5432?

We should operate on mdw2 instead:

[gpadmin@mdw2 ~]$ gpinitstandby -a -s mdw1
20230201:13:57:16:016792 gpinitstandby:mdw2:gpadmin-[INFO]:-Validating environment and parameters for standby initialization...
20230201:13:57:16:016792 gpinitstandby:mdw2:gpadmin-[INFO]:-Checking for data directory /opt/greenplum/data/master/gpseg-1 on mdw1
20230201:13:57:16:016792 gpinitstandby:mdw2:gpadmin-[ERROR]:-Data directory already exists on host mdw1
20230201:13:57:16:016792 gpinitstandby:mdw2:gpadmin-[ERROR]:-Failed to create standby
20230201:13:57:16:016792 gpinitstandby:mdw2:gpadmin-[ERROR]:-Error initializing standby master: master data directory exists


-- Note: this fails; the original data directory on mdw1 must be deleted first:
[gpadmin@mdw1 master]$ cd /opt/greenplum/data/master/
[gpadmin@mdw1 master]$ rm -rf gpseg-1/
[gpadmin@mdw1 master]$ ll



-- Continue the repair on mdw2:
[gpadmin@mdw2 ~]$ gpinitstandby -a -s mdw1
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Validating environment and parameters for standby initialization...
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Checking for data directory /opt/greenplum/data/master/gpseg-1 on mdw1
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:------------------------------------------------------
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum standby master initialization parameters
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:------------------------------------------------------
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum master hostname               = mdw2
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum master data directory         = /opt/greenplum/data/master/gpseg-1
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum master port                   = 5432
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum standby master hostname       = mdw1
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum standby master port           = 5432
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum standby master data directory = /opt/greenplum/data/master/gpseg-1
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum update system catalog         = On
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Syncing Greenplum Database extensions to standby
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-The packages on mdw1 are consistent.
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Adding standby master to catalog...
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Database catalog updated successfully.
20230201:13:58:30:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Updating pg_hba.conf file...
20230201:13:58:31:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-pg_hba.conf files updated successfully.
20230201:13:58:38:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Starting standby master
20230201:13:58:38:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Checking if standby master is running on host: mdw1  in directory: /opt/greenplum/data/master/gpseg-1
20230201:13:58:39:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Cleaning up pg_hba.conf backup files...
20230201:13:58:39:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Backup files of pg_hba.conf cleaned up successfully.
20230201:13:58:39:016867 gpinitstandby:mdw2:gpadmin-[INFO]:-Successfully created standby master on mdw1

Cluster state after the repair:

[gpadmin@mdw2 ~]$ gpconfig -s max_connections
Values on all segments are consistent
GUC          : max_connections
Master  value: 500
Segment value: 1500
[gpadmin@mdw2 ~]$ gpstate 
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-Starting gpstate with args: 
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source'
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.26 (Greenplum Database 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Dec 20 2022 08:02:23'
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-Obtaining Segment details from master...
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-Gathering data from segments...
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-Greenplum instance status summary
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-----------------------------------------------------
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-   Master instance                                           = Active
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-   Master standby                                            = mdw1
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-   Standby master state                                      = Standby host passive
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-   Total segment instance count from metadata                = 32
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-----------------------------------------------------
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-   Primary Segment Status
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-----------------------------------------------------
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-   Total primary segments                                    = 16
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-   Total primary segment valid (at master)                   = 16
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-   Total primary segment failures (at master)                = 0
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-   Total number of postmaster.pid files missing              = 0
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-   Total number of postmaster.pid files found                = 16
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-   Total number of postmaster.pid PIDs missing               = 0
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-   Total number of postmaster.pid PIDs found                 = 16
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-   Total number of /tmp lock files missing                   = 0
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-   Total number of /tmp lock files found                     = 16
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-   Total number postmaster processes missing                 = 0
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-   Total number postmaster processes found                   = 16
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-----------------------------------------------------
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-   Mirror Segment Status
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-----------------------------------------------------
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-   Total mirror segments                                     = 16
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-   Total mirror segment valid (at master)                    = 16
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-   Total mirror segment failures (at master)                 = 0
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-   Total number of postmaster.pid files missing              = 0
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-   Total number of postmaster.pid files found                = 16
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-   Total number of postmaster.pid PIDs missing               = 0
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-   Total number of postmaster.pid PIDs found                 = 16
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-   Total number of /tmp lock files missing                   = 0
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-   Total number of /tmp lock files found                     = 16
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-   Total number postmaster processes missing                 = 0
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-   Total number postmaster processes found                   = 16
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-   Total number mirror segments acting as primary segments   = 0
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-   Total number mirror segments acting as mirror segments    = 16
20230201:13:59:04:016970 gpstate:mdw2:gpadmin-[INFO]:-----------------------------------------------------
[gpadmin@mdw2 ~]$ gpstate  -f
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:-Starting gpstate with args: -f
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source'
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.26 (Greenplum Database 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Dec 20 2022 08:02:23'
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:-Obtaining Segment details from master...
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:-Standby master details
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:-----------------------
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:-   Standby address          = mdw1
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:-   Standby data directory   = /opt/greenplum/data/master/gpseg-1
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:-   Standby port             = 5432
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:-   Standby PID              = 1061
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:-   Standby status           = Standby host passive
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:--------------------------------------------------------------
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:--pg_stat_replication
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:--------------------------------------------------------------
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:--WAL Sender State: streaming
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:--Sync state: sync
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:--Sent Location: 0/2C000000
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:--Flush Location: 0/2C000000
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:--Replay Location: 0/2C000000
20230201:13:59:15:017051 gpstate:mdw2:gpadmin-[INFO]:--------------------------------------------------------------

Failure scenario 2: standby server failure

Case 1: the standby can be restarted normally
[gpadmin@mdw2 ~]$ gpstate -f
20230201:15:08:57:020835 gpstate:mdw2:gpadmin-[INFO]:-Starting gpstate with args: -f
20230201:15:08:57:020835 gpstate:mdw2:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source'
20230201:15:08:57:020835 gpstate:mdw2:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.26 (Greenplum Database 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Dec 20 2022 08:02:23'
20230201:15:08:57:020835 gpstate:mdw2:gpadmin-[INFO]:-Obtaining Segment details from master...
20230201:15:08:57:020835 gpstate:mdw2:gpadmin-[INFO]:-Standby master details
20230201:15:08:57:020835 gpstate:mdw2:gpadmin-[INFO]:-----------------------
20230201:15:08:57:020835 gpstate:mdw2:gpadmin-[INFO]:-   Standby address          = mdw1
20230201:15:08:57:020835 gpstate:mdw2:gpadmin-[INFO]:-   Standby data directory   = /opt/greenplum/data/master/gpseg-1
20230201:15:08:57:020835 gpstate:mdw2:gpadmin-[INFO]:-   Standby port             = 5432
20230201:15:08:57:020835 gpstate:mdw2:gpadmin-[WARNING]:-Standby PID              = 1061                                 No socket connection or lock file (/tmp/.s.PGSQL.5432.lock) found for port 5432   <<<<<<<<
20230201:15:08:57:020835 gpstate:mdw2:gpadmin-[WARNING]:-Standby status           = Status could not be determined       <<<<<<<<
20230201:15:08:57:020835 gpstate:mdw2:gpadmin-[INFO]:--------------------------------------------------------------
20230201:15:08:57:020835 gpstate:mdw2:gpadmin-[INFO]:--pg_stat_replication
20230201:15:08:57:020835 gpstate:mdw2:gpadmin-[INFO]:--------------------------------------------------------------
20230201:15:08:57:020835 gpstate:mdw2:gpadmin-[INFO]:-No entries found.
20230201:15:08:57:020835 gpstate:mdw2:gpadmin-[INFO]:--------------------------------------------------------------


[gpadmin@mdw1 ~]$ nohup /usr/local/greenplum-db-6.23.0/bin/postgres -D /opt/greenplum/data/master/gpseg-1 -p 5432 -E &
[1] 2385

[gpadmin@mdw2 ~]$ gpstate -f
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:-Starting gpstate with args: -f
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source'
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.26 (Greenplum Database 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Dec 20 2022 08:02:23'
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:-Obtaining Segment details from master...
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:-Standby master details
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:-----------------------
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:-   Standby address          = mdw1
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:-   Standby data directory   = /opt/greenplum/data/master/gpseg-1
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:-   Standby port             = 5432
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:-   Standby PID              = 2385
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:-   Standby status           = Standby host passive
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:--------------------------------------------------------------
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:--pg_stat_replication
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:--------------------------------------------------------------
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:--WAL Sender State: streaming
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:--Sync state: sync
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:--Sent Location: 0/2C0000A8
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:--Flush Location: 0/2C0000A8
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:--Replay Location: 0/2C0000A8
20230201:15:09:38:020946 gpstate:mdw2:gpadmin-[INFO]:--------------------------------------------------------------
[gpadmin@mdw2 ~]$ 

Case 2: database files are corrupted and the standby cannot start

If the database files are corrupted and the standby cannot start, delete the standby's data directory and then re-initialize the standby server.
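The procedure can be sketched as follows (lab hostnames and paths; the run wrapper prints rather than executes when DRY_RUN=1). Unlike the post-failover case, the standby is still registered in the catalog here, so it must first be removed with gpinitstandby -r before being re-added:

```shell
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

reinit_standby() {
    run ssh mdw1 rm -rf /opt/greenplum/data/master/gpseg-1  # wipe the corrupt data directory
    run gpinitstandby -a -r                                 # remove the old standby from the catalog
    run gpinitstandby -a -s mdw1                            # re-create the standby on mdw1
}

DRY_RUN=1 reinit_standby   # print the commands only
```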

[gpadmin@mdw1 master]$ ps -ef|grep green
gpadmin   2385  2338  0 15:09 pts/1    00:00:00 /usr/local/greenplum-db-6.23.0/bin/postgres -D /opt/greenplum/data/master/gpseg-1 -p 5432 -E
gpadmin   2740  2338  0 15:13 pts/1    00:00:00 grep --color=auto green
[gpadmin@mdw1 master]$ kill -9 2385
[gpadmin@mdw1 master]$ 
[gpadmin@mdw1 ~]$ rm -rf  /opt/greenplum/data/master/gpseg-1
[gpadmin@mdw1 ~]$ cd  /opt/greenplum/data/master/
[gpadmin@mdw1 master]$ ll
total 0
[gpadmin@mdw2 ~]$ gpstate -f
20230201:15:13:58:021233 gpstate:mdw2:gpadmin-[INFO]:-Starting gpstate with args: -f
20230201:15:13:58:021233 gpstate:mdw2:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source'
20230201:15:13:58:021233 gpstate:mdw2:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.26 (Greenplum Database 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Dec 20 2022 08:02:23'
20230201:15:13:58:021233 gpstate:mdw2:gpadmin-[INFO]:-Obtaining Segment details from master...
20230201:15:13:59:021233 gpstate:mdw2:gpadmin-[INFO]:-Standby master details
20230201:15:13:59:021233 gpstate:mdw2:gpadmin-[INFO]:-----------------------
20230201:15:13:59:021233 gpstate:mdw2:gpadmin-[INFO]:-   Standby address          = mdw1
20230201:15:13:59:021233 gpstate:mdw2:gpadmin-[INFO]:-   Standby data directory   = /opt/greenplum/data/master/gpseg-1
20230201:15:13:59:021233 gpstate:mdw2:gpadmin-[INFO]:-   Standby port             = 5432
20230201:15:13:59:021233 gpstate:mdw2:gpadmin-[WARNING]:-Standby PID              = 0                                    <<<<<<<<
20230201:15:13:59:021233 gpstate:mdw2:gpadmin-[WARNING]:-Standby status           = Standby process not running          <<<<<<<<
20230201:15:13:59:021233 gpstate:mdw2:gpadmin-[INFO]:--------------------------------------------------------------
20230201:15:13:59:021233 gpstate:mdw2:gpadmin-[INFO]:--pg_stat_replication
20230201:15:13:59:021233 gpstate:mdw2:gpadmin-[INFO]:--------------------------------------------------------------
20230201:15:13:59:021233 gpstate:mdw2:gpadmin-[INFO]:-No entries found.
20230201:15:13:59:021233 gpstate:mdw2:gpadmin-[INFO]:--------------------------------------------------------------
[gpadmin@mdw2 ~]$ 
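The same standby-down condition can be detected programmatically by parsing the `gpstate -f` output. Below is a minimal, hypothetical Python parser for the log format shown above; the `standby_health` helper is purely illustrative and is not part of Greenplum's tooling:

```python
import re

def standby_health(gpstate_output: str) -> dict:
    """Parse `gpstate -f` text and report the standby PID and liveness.

    Returns {'pid': int, 'running': bool}. Illustrative parser for the
    log lines shown above (e.g. 'Standby PID = 0' plus
    'Standby status = Standby process not running' means the standby
    is down); not an official Greenplum API.
    """
    pid, status = 0, ""
    for line in gpstate_output.splitlines():
        m = re.search(r"Standby PID\s+=\s+(\d+)", line)
        if m:
            pid = int(m.group(1))
        m = re.search(r"Standby status\s+=\s+(.+)$", line)
        if m:
            status = m.group(1).strip()
    # Healthy when a PID was reported and the status does not say 'not running'
    return {"pid": pid, "running": pid != 0 and "not running" not in status}
```

Such a check could be wrapped in a cron job that alerts when `running` is False, instead of eyeballing `gpstate -f` by hand.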

To repair it, the standby must first be removed from the catalog and then re-initialized (running gpinitstandby directly fails because a standby is already configured):

[gpadmin@mdw2 ~]$ gpinitstandby -s mdw1 -S /opt/greenplum/data/master/gpseg-1   -P 5432
20230201:15:14:41:021292 gpinitstandby:mdw2:gpadmin-[INFO]:-Validating environment and parameters for standby initialization...
20230201:15:14:41:021292 gpinitstandby:mdw2:gpadmin-[ERROR]:-Standby master already configured
20230201:15:14:41:021292 gpinitstandby:mdw2:gpadmin-[INFO]:-If you want to start the stopped standby master, use the -n option
20230201:15:14:41:021292 gpinitstandby:mdw2:gpadmin-[ERROR]:-Failed to create standby
20230201:15:14:41:021292 gpinitstandby:mdw2:gpadmin-[ERROR]:-Error initializing standby master: standby master already configured
[gpadmin@mdw2 ~]$ gpinitstandby -r
20230201:15:15:29:021341 gpinitstandby:mdw2:gpadmin-[INFO]:------------------------------------------------------
20230201:15:15:29:021341 gpinitstandby:mdw2:gpadmin-[INFO]:-Warm master standby removal parameters
20230201:15:15:29:021341 gpinitstandby:mdw2:gpadmin-[INFO]:------------------------------------------------------
20230201:15:15:29:021341 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum master hostname               = mdw2
20230201:15:15:29:021341 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum master data directory         = /opt/greenplum/data/master/gpseg-1
20230201:15:15:29:021341 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum master port                   = 5432
20230201:15:15:29:021341 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum standby master hostname       = mdw1
20230201:15:15:29:021341 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum standby master port           = 5432
20230201:15:15:29:021341 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum standby master data directory = /opt/greenplum/data/master/gpseg-1
Do you want to continue with deleting the standby master? Yy|Nn (default=N):
> y
20230201:15:15:32:021341 gpinitstandby:mdw2:gpadmin-[INFO]:-Removing standby master from catalog...
20230201:15:15:32:021341 gpinitstandby:mdw2:gpadmin-[INFO]:-Database catalog updated successfully.
20230201:15:15:32:021341 gpinitstandby:mdw2:gpadmin-[INFO]:-Removing data directory on standby master...
20230201:15:15:33:021341 gpinitstandby:mdw2:gpadmin-[INFO]:-Successfully removed standby master
[gpadmin@mdw2 ~]$ gpinitstandby -s mdw1 -S /opt/greenplum/data/master/gpseg-1   -P 5432
20230201:15:15:37:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Validating environment and parameters for standby initialization...
20230201:15:15:37:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Checking for data directory /opt/greenplum/data/master/gpseg-1 on mdw1
20230201:15:15:37:021371 gpinitstandby:mdw2:gpadmin-[INFO]:------------------------------------------------------
20230201:15:15:37:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum standby master initialization parameters
20230201:15:15:37:021371 gpinitstandby:mdw2:gpadmin-[INFO]:------------------------------------------------------
20230201:15:15:37:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum master hostname               = mdw2
20230201:15:15:37:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum master data directory         = /opt/greenplum/data/master/gpseg-1
20230201:15:15:37:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum master port                   = 5432
20230201:15:15:37:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum standby master hostname       = mdw1
20230201:15:15:37:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum standby master port           = 5432
20230201:15:15:37:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum standby master data directory = /opt/greenplum/data/master/gpseg-1
20230201:15:15:37:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Greenplum update system catalog         = On
Do you want to continue with standby master initialization? Yy|Nn (default=N):
> y  
20230201:15:15:42:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Syncing Greenplum Database extensions to standby
20230201:15:15:42:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-The packages on mdw1 are consistent.
20230201:15:15:42:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Adding standby master to catalog...
20230201:15:15:42:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Database catalog updated successfully.
20230201:15:15:42:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Updating pg_hba.conf file...
20230201:15:15:43:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-pg_hba.conf files updated successfully.
20230201:15:15:50:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Starting standby master
20230201:15:15:50:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Checking if standby master is running on host: mdw1  in directory: /opt/greenplum/data/master/gpseg-1
20230201:15:15:51:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Cleaning up pg_hba.conf backup files...
20230201:15:15:51:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Backup files of pg_hba.conf cleaned up successfully.
20230201:15:15:51:021371 gpinitstandby:mdw2:gpadmin-[INFO]:-Successfully created standby master on mdw1
[gpadmin@mdw2 ~]$ gpstate -f
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:-Starting gpstate with args: -f
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source'
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.26 (Greenplum Database 6.23.0 build commit:5b5e432f35f92a40c18dffe4e5bca94790aae83c Open Source) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Dec 20 2022 08:02:23'
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:-Obtaining Segment details from master...
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:-Standby master details
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:-----------------------
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:-   Standby address          = mdw1
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:-   Standby data directory   = /opt/greenplum/data/master/gpseg-1
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:-   Standby port             = 5432
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:-   Standby PID              = 3328
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:-   Standby status           = Standby host passive
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:--------------------------------------------------------------
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:--pg_stat_replication
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:--------------------------------------------------------------
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:--WAL Sender State: streaming
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:--Sync state: sync
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:--Sent Location: 0/34000000
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:--Flush Location: 0/34000000
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:--Replay Location: 0/34000000
20230201:15:16:02:021460 gpstate:mdw2:gpadmin-[INFO]:--------------------------------------------------------------
[gpadmin@mdw2 ~]$ 

Summary

1. If the master fails, run the following command on the standby master to activate it as the new master:

gpactivatestandby -d /opt/greenplum/data/master/gpseg-1

If a VIP is in use, it must also be switched over to the new master.

The original master node can then rejoin the cluster as the new standby, after its old data directory has been removed:

gpinitstandby -a -s mdw1

2. If the standby master fails, there are two scenarios. If the process simply went down, it can usually be restarted directly:

nohup /usr/local/greenplum-db-6.23.0/bin/postgres -D /opt/greenplum/data/master/gpseg-1 -p 5432 -E &

If the database files are corrupted and the standby cannot start, delete the standby's data and re-initialize the standby server: remove it from the catalog first, then add it back:

gpinitstandby -r
gpinitstandby -s mdw1 -S /opt/greenplum/data/master/gpseg-1   -P 5432

3. The master and standby master are simply a PostgreSQL streaming-replication primary/standby pair. To get automatic failover, an external tool such as keepalived, repmgr, Patroni, or Pgpool-II is needed to perform the high-availability switchover.
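At their core, what those HA tools automate is a probe loop with a failure threshold followed by a one-shot promotion. The hypothetical `watchdog` sketch below illustrates that logic; in practice the `probe` and `failover` callbacks would wrap `pg_isready` and `gpactivatestandby`, and real tools additionally handle fencing, quorum, and VIP movement, so this is a teaching sketch rather than a production solution:

```python
import time

def watchdog(probe, failover, threshold=3, interval=5, max_checks=None):
    """Run `failover()` once after `threshold` consecutive failed probes.

    probe()    -> bool: True if the master answers (e.g. wraps pg_isready).
    failover() -> None: promotion action (e.g. runs gpactivatestandby).
    Returns True if failover was triggered, False if the check budget
    (`max_checks`) ran out first. Hypothetical sketch only.
    """
    failures, checks = 0, 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if probe():
            failures = 0          # any success resets the counter
        else:
            failures += 1
            if failures >= threshold:
                failover()        # promote the standby exactly once
                return True
        if interval:
            time.sleep(interval)
    return False
```

The consecutive-failure threshold is the key design choice: it avoids promoting the standby on a single transient network blip, at the cost of a slower reaction to a genuine master outage.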

References

https://blog.csdn.net/weixin_33724570/article/details/89720415

https://www.xmmup.com/greenplumshujukugaokeyongxinggaishu.html#Master_jing_xiang

https://zhuanlan.zhihu.com/p/165983070

Originally published 2023-02-02, shared from the DB宝 WeChat official account.