云计算百科
云计算领域专业知识百科平台

服务器时间发生跳变导致hghac中对应主机状态频繁切换为crash或stop

文章目录

  • 环境
  • 症状
  • 问题原因
  • 解决方案
  • 相关文档
  • 报错编码

环境

系统平台:N/A 版本:N/A

症状

集群状态:

[root@bthbj-hgywsjkjq-ip28-cen76 ~]# hghactl list

  • Cluster: highgo-ee-cluster —±————-±——–±—±———-+

| Member | Host | Role | State | TL | Lag in MB |

±———±——————-±————-±——–±—±———-+

| pgsql_1 | x.x.100.1:5866 | leader | running | 84 | |

| pgsql_2 | x.x.100.2:5866 | crash | running | 84 | 0 |

±———±——————-±————-±——–±—±———-+

注意:虽然状态显示为crash,但是实际状态并未停止,仍然是running,数据库正常访问。

问题原因

检查hghac日志:

2020-04-03 14:40:35,434 INFO: Lock owner: None; I am pgsql_2

2020-04-03 14:40:35,524 INFO: PAUSE: postgres is not running

2020-04-03 14:40:45,433 INFO: Process 77888 is not postmaster, too much difference between PID file start time 1565058216.95 and process start time 1565058213

2020-04-03 14:40:45,434 INFO: Process 77888 is not postmaster, too much difference between PID file start time 1565058216.95 and process start time 1565058213

2020-04-03 14:40:45,434 WARNING: Postgresql is not running.

2020-04-03 14:40:45,434 INFO: Lock owner: None; I am pgsql_2

2020-04-03 14:40:45,523 INFO: PAUSE: postgres is not running

2020-04-03 14:40:55,435 INFO: Process 77888 is not postmaster, too much difference between PID file start time 1565058216.95 and process start time 1565058213

2020-04-03 14:40:55,435 INFO: Process 77888 is not postmaster, too much difference between PID file start time 1565058216.95 and process start time 1565058213

2020-04-03 14:40:55,435 WARNING: Postgresql is not running.

原因:

由于服务器时间发生跳变,则进程在操作系统中的启动时间戳也会发生改变,不再与$PGDATA/postmaster.pid文件中记录的时间相同,

hghac(patroni)会认为有集群以外其他的postmaster正在该$PGDATA运行,并尝试停止postmaster进程,但是由于操作系统pid一致,未能成功停止该进程,所以会不断切换节点状态。

解决方案

解决方案:

  • (稳妥推荐)条件允许情况下,在问题节点手动重启集群服务和数据库服务,让数据库的postmaster.pid中记录的时间戳更新。或者:

  • 手动修改postmaster.pid,修改:

  • [root@localhost data]# cat postmaster.pid

    77888

    /opt/highgo/hgdb-see-4.5.8/data

    1565058213 <—修改为hghac日志中的记录时间,

    5866

    /tmp * 5866001 48

    ready

    注意:一些版本中,hghac日志会把进程时间和pid创建时间位置反了,所以将postmaster.pid的时间改为日志中另一个时间,让两个时间相同即可。

    相关文档

    报错编码

    赞(0)
    未经允许不得转载:网硕互联帮助中心 » 服务器时间发生跳变导致hghac中对应主机状态频繁切换为crash或stop
    分享到: 更多 (0)

    评论 抢沙发

    评论前必须登录!