PostgreSQL service on the standby node starts and stops over and over after Patroni is started

Stack Overflow user
Asked 2020-04-02 21:23:14
1 answer · 714 views · 0 followers · 1 vote

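The PostgreSQL service on the standby node of my PostgreSQL cluster starts and stops over and over once I start the Patroni service.

I want to build a PostgreSQL HA cluster on two machines using Patroni 1.6.4 and etcd 3.3. First I built an etcd cluster, and it is healthy.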

member 230e646882873b50 is healthy: got healthy result from http://10.19.170.119:2379
member afcefe35d67a646c is healthy: got healthy result from http://10.19.170.155:2379
cluster is healthy

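Next, I built a streaming-replication PostgreSQL cluster on the two machines (running on port 5433), and it worked well.

Then I stopped the PostgreSQL cluster and started Patroni on both the primary and the standby.

PostgreSQL and Patroni on the primary seem to work fine.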

2020-04-02 18:17:22,402 INFO: Lock owner: pgsql_119; I am pgsql_119
2020-04-02 18:17:22,430 INFO: no action.  i am the leader with the lock
2020-04-02 18:17:26,402 INFO: Lock owner: pgsql_119; I am pgsql_119
2020-04-02 18:17:26,430 INFO: no action.  i am the leader with the lock

The standby, however, has a problem. Patroni on the standby prints the following messages:

2020-04-02 18:45:25,995 INFO: no action.  i am a secondary and i am following a leader
2020-04-02 18:45:27,285 INFO: closed patroni connection to the postgresql cluster
2020-04-02 18:45:27,722 INFO: postmaster pid=7448
2020-04-02 18:45:27.832 HKT [7448] LOG:  listening on IPv4 address "0.0.0.0", port 5433
2020-04-02 18:45:27.994 HKT [7448] LOG:  redirecting log output to logging collector process
2020-04-02 18:45:27.994 HKT [7448] HINT:  Future log output will appear in directory "log".
2020-04-02 18:45:30,058 INFO: Lock owner: pgsql_node119; I am pgsql_node155
2020-04-02 18:45:30,058 INFO: does not have lock
2020-04-02 18:45:30,058 INFO: establishing a new patroni connection to the postgres cluster
2020-04-02 18:45:31,162 INFO: no action.  i am a secondary and i am following a leader
2020-04-02 18:45:32,460 INFO: closed patroni connection to the postgresql cluster
2020-04-02 18:45:32,875 INFO: postmaster pid=8820
2020-04-02 18:45:32.996 HKT [8820] LOG:  listening on IPv4 address "0.0.0.0", port 5433
2020-04-02 18:45:33.161 HKT [8820] LOG:  redirecting log output to logging collector process
2020-04-02 18:45:33.161 HKT [8820] HINT:  Future log output will appear in directory "log".
2020-04-02 18:45:35,211 INFO: Lock owner: pgsql_node119; I am pgsql_node155
2020-04-02 18:45:35,211 INFO: does not have lock
2020-04-02 18:45:35,211 INFO: establishing a new patroni connection to the postgres cluster
2020-04-02 18:45:37,215 INFO: establishing a new patroni connection to the postgres cluster

The PostgreSQL log repeats the following:

FATAL:  the database system is starting up
LOG:  redo starts at 0/3A000060
LOG:  consistent recovery state reached at 0/3A000140
LOG:  invalid record length at 0/3A000140: wanted 24, got 0
LOG:  database system is ready to accept read only connections
LOG:  started streaming WAL from primary at 0/3A000000 on timeline 40
LOG:  received fast shutdown request
LOG:  aborting any active transactions
FATAL:  terminating connection due to administrator command
FATAL:  terminating walreceiver process due to administrator command
LOG:  shutting down
LOG:  database system is shut down
LOG:  database system was shut down in recovery at 2020-04-02 18:50:20 CST

This means PostgreSQL on the standby restarts every 5 seconds!

Here is one of my patroni.yml files. The other is identical except for the IP addresses.
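
The restart interval can be measured from the Patroni output instead of estimated by eye. Below is a minimal Python sketch, assuming each Patroni line starts with a `YYYY-MM-DD HH:MM:SS,mmm` timestamp as in the excerpt above. The names `postmaster_starts` and `restart_intervals` are hypothetical helpers written for this illustration; they are not part of Patroni.

```python
from datetime import datetime

# Sample lines copied from the standby's Patroni output shown above.
SAMPLE_LOG = """\
2020-04-02 18:45:27,722 INFO: postmaster pid=7448
2020-04-02 18:45:30,058 INFO: does not have lock
2020-04-02 18:45:32,875 INFO: postmaster pid=8820
"""


def postmaster_starts(log_text):
    """Return (timestamp, pid) for every 'postmaster pid=' line."""
    starts = []
    for line in log_text.splitlines():
        if "INFO: postmaster pid=" not in line:
            continue
        stamp = line[:23]  # e.g. "2020-04-02 18:45:27,722"
        pid = int(line.rsplit("=", 1)[1])
        starts.append((datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f"), pid))
    return starts


def restart_intervals(log_text):
    """Return the seconds elapsed between consecutive postmaster starts."""
    times = [t for t, _ in postmaster_starts(log_text)]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    print(restart_intervals(SAMPLE_LOG))
```

Each new `postmaster pid=` line with a different pid marks a fresh PostgreSQL start, so the gaps between those lines give the restart period directly.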

scope: pgsql
namespace: /pgsql/
name: pgsql_node119

restapi:
  listen: 10.19.170.119:8008
  connect_address: 10.19.170.119:8008
 
etcd:
  host: 10.19.170.119:2379
 
bootstrap:
  # this section will be written into Etcd:/<namespace>/<scope>/config after initializing new cluster
  # and all other cluster members will use it as a `global configuration`
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    master_start_timeout: 300
    synchronous_mode: false
    # check_timeline: true
    postgresql:
      use_pg_rewind: true
      use_slots: true
 
postgresql:
  listen: 0.0.0.0:5433
  connect_address: 10.19.170.119:5433
  data_dir: "/opt/postgresql-11/data"
  bin_dir: "/opt/postgresql-11/bin"
#  config_dir: /etc/postgresql/9.6/main
  authentication:
    replication:
      username: repuser
      password: repuserpwd
    superuser:
      username: postgres
      password: postgrespwd
 
#watchdog:
#  mode: automatic # Allowed values: off, automatic, required
#  device: /dev/watchdog
#  safety_margin: 5
 
tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false

Do you know why this happens, or how to fix it? Thanks.


1 Answer

Stack Overflow user

Answered 2021-05-20 20:31:30

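I found a similar problem in an issue on Patroni's GitHub.

The direct cause of Patroni restarting PostgreSQL so often on the standby is that the password file Patroni relies on was being modified by another process. My patroni.yml did not set pgpass, an important parameter of the postgresql section. Its default is $HOME/.pgpass, where $HOME is the home directory of the "postgres" user.

However, the file "$HOME/.pgpass" may be modified by other applications, which makes the Patroni service misbehave.

The solution is to set pgpass to the path of a .pgpass file that only Patroni accesses. For example: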
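
One way to confirm that the password file is being rewritten is to poll it and compare content hashes. The following is a rough Python sketch under stated assumptions: `fingerprint`, `content_changed`, and `watch` are hypothetical helpers written for this illustration, not Patroni functions, and the path to check is whatever file your setup actually uses. The sketch reports only that the bytes changed, not which process wrote them, and Patroni itself may write this file.

```python
import hashlib
import os
import time


def fingerprint(path):
    """Return (size, mtime_ns, sha256 hex) for the file, or None if it is missing."""
    try:
        st = os.stat(path)
        with open(path, "rb") as fh:
            digest = hashlib.sha256(fh.read()).hexdigest()
    except FileNotFoundError:
        return None
    return (st.st_size, st.st_mtime_ns, digest)


def content_changed(before, after):
    """True if the file appeared, disappeared, or its bytes differ."""
    if before is None or after is None:
        return before is not after
    return before[2] != after[2]


def watch(path, checks=5, delay=1.0):
    """Poll the file `checks` times; return the poll numbers at which it changed."""
    changed_at = []
    last = fingerprint(path)
    for i in range(1, checks + 1):
        time.sleep(delay)
        current = fingerprint(path)
        if content_changed(last, current):
            changed_at.append(i)
        last = current
    return changed_at
```

For example, `watch(os.path.expanduser("~/.pgpass"), checks=30, delay=1.0)`, run as the postgres user while Patroni is cycling, returns a non-empty list if the file content changed during those 30 seconds.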

pgpass: /etc/patroni/.pgpass
Votes 0
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/60992603
