sanlock

#7 host_dead_seconds too long?

Closed 2 years ago by teigland. Opened 2 years ago by yisong.

Hi,

With the default configuration, host_dead_seconds is 140 seconds, as shown in the example in src/timeouts.h:
io_timeout_seconds = 10 (default, configurable)
watchdog_fire_timeout = 60 (constant)
id_renewal_fail_seconds = 80 (= 4 * delta_renew_max = 8 * io_timeout_seconds)
host_dead_seconds = 140 (id_renewal_fail_seconds + watchdog_fire_timeout)

watchdog_fire_timeout is hardcoded to 60, which means host_dead_seconds can never be less than 60.
For an HA system, when the active node goes down, the standby node needs at least 60 seconds to take over the resource lock. This seems too long.
Could watchdog_fire_timeout be made a configurable option?

Thanks

teigland commented 2 years ago

Hi, watchdog_fire_timeout represents the timeout used by the watchdog hardware. If the watchdog device on the system has a configurable timeout, then we could change watchdog_fire_timeout to match the hardware setting. I think that most hardware watchdog devices have fixed 60 second timeouts, so we have not tried to make the sanlock value configurable. If you know of some hardware that has configurable timeouts, then I think it would be nice to change this to make the time shorter.

yisong commented 2 years ago

Hi,

I am using an OpenStack VM, and its watchdog timeout appears to be configurable.

# /usr/bin/wdctl      
Device:        /dev/watchdog
Identity:      Software Watchdog [version 0]
Timeout:       60 seconds
Pre-timeout:    0 seconds
FLAG           DESCRIPTION               STATUS BOOT-STATUS
KEEPALIVEPING  Keep alive ping reply          1           0
MAGICCLOSE     Supports magic close char      0           0
SETTIMEOUT     Set timeout (in seconds)       0           0

# /usr/bin/wdctl -s 10
Timeout has been set to 10 seconds.
Device:        /dev/watchdog
Identity:      Software Watchdog [version 0]
Timeout:       10 seconds
Pre-timeout:    0 seconds
FLAG           DESCRIPTION               STATUS BOOT-STATUS
KEEPALIVEPING  Keep alive ping reply          1           0
MAGICCLOSE     Supports magic close char      0           0
SETTIMEOUT     Set timeout (in seconds)       0           0

I also checked a bare metal machine, and its watchdog timeout appears to be configurable as well.

# /usr/bin/wdctl
Device:        /dev/watchdog
Identity:      IPMI [version 1]
Timeout:       60 seconds
Pre-timeout:    0 seconds

# /usr/bin/wdctl -s 10
Timeout has been set to 10 seconds.
Device:        /dev/watchdog
Identity:      IPMI [version 1]
Timeout:       10 seconds
Pre-timeout:    0 seconds

But when the wdmd service starts, it always resets the watchdog timeout to 60 seconds.
From wdmd/main.c:

        rv = ioctl(dev_fd, WDIOC_GETTIMEOUT, &timeout);
        if (rv < 0) {
                log_error("%s failed to report timeout", watchdog_path);
                close_watchdog();
                return -1;
        }

        if (timeout == fire_timeout)
                goto out;

        timeout = fire_timeout;

        rv = ioctl(dev_fd, WDIOC_SETTIMEOUT, &timeout);
        if (rv < 0) {
                log_error("%s failed to set timeout", watchdog_path);
                close_watchdog();
                return -1;
        }
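
For illustration, the same get/set sequence can be written with the target timeout passed in as a parameter instead of a fixed value. This is only a sketch: set_watchdog_timeout is a hypothetical helper, not a function in wdmd, and it assumes the caller already holds an open descriptor for the watchdog device. Note that on most drivers, opening /dev/watchdog arms the timer, so this should only be tried on a machine that may be reset.

```c
#include <assert.h>
#include <linux/watchdog.h>
#include <sys/ioctl.h>

/* Hypothetical helper (not part of wdmd): ask the watchdog driver behind
 * dev_fd to use `wanted` seconds. Returns the timeout the driver reports
 * afterwards, or -1 if any ioctl fails. */
static int set_watchdog_timeout(int dev_fd, int wanted)
{
    int timeout = 0;

    assert(wanted > 0);

    /* Read the current timeout; fails if dev_fd is not a watchdog. */
    if (ioctl(dev_fd, WDIOC_GETTIMEOUT, &timeout) < 0)
        return -1;

    /* Nothing to do if the device already uses the wanted value. */
    if (timeout == wanted)
        return timeout;

    timeout = wanted;
    if (ioctl(dev_fd, WDIOC_SETTIMEOUT, &timeout) < 0)
        return -1;

    /* The driver may adjust the request, so read back the value in effect. */
    if (ioctl(dev_fd, WDIOC_GETTIMEOUT, &timeout) < 0)
        return -1;

    return timeout;
}
```

A caller would compare the returned value with the one it asked for, since a device that does not support the requested timeout may report a different value or fail.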

yisong commented 2 years ago

BTW, if watchdog_fire_timeout is made configurable, the "wdmd test interval" should also be made configurable instead of being hardcoded to 10 seconds.

src/timeouts.h

 * wdmd test interval           = 10 (defined in wdmd/main.c)
 * watchdog_fire_timeout        = 60 (constant)
 * io_timeout_seconds           = 10 (defined by us)
 * id_renewal_seconds           = 20 (= delta_renew_max = 2 * io_timeout_seconds)
 * id_renewal_fail_seconds      = 80 (= 4 * delta_renew_max = 8 * io_timeout_seconds)
 * host_dead_seconds            = 140 (id_renewal_fail_seconds + watchdog_fire_timeout)

teigland commented 2 years ago

Could you test the configurable timeout by setting it to 30 sec and check that the machine is reset 30 sec after opening the watchdog (and not pinging it)? I'm most interested in knowing if the hardware-based watchdog can be configured.

You're correct that it's not as simple as just changing watchdog_fire_timeout. Many of the timeouts are interdependent, so others will need to be adjusted.
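
To illustrate how these values depend on one another, the relations quoted from src/timeouts.h can be written out as a small C sketch. The struct and function names are made up for illustration and are not sanlock code. The sketch only restates the arithmetic in that comment and does not show how the values would actually be adjusted.

```c
#include <assert.h>

/* Timeouts derived from the two inputs, following the relations quoted
 * from src/timeouts.h in this thread. Names are illustrative only. */
struct derived_timeouts {
    int id_renewal_seconds;      /* delta_renew_max = 2 * io_timeout_seconds */
    int id_renewal_fail_seconds; /* 4 * delta_renew_max = 8 * io_timeout_seconds */
    int host_dead_seconds;       /* id_renewal_fail_seconds + watchdog_fire_timeout */
};

static struct derived_timeouts derive_timeouts(int io_timeout_seconds,
                                               int watchdog_fire_timeout)
{
    struct derived_timeouts t;

    assert(io_timeout_seconds > 0);
    assert(watchdog_fire_timeout > 0);

    t.id_renewal_seconds = 2 * io_timeout_seconds;
    t.id_renewal_fail_seconds = 4 * t.id_renewal_seconds;
    t.host_dead_seconds = t.id_renewal_fail_seconds + watchdog_fire_timeout;
    return t;
}
```

With io_timeout_seconds = 10 and watchdog_fire_timeout = 60 this gives host_dead_seconds = 140, matching the default. Under these relations alone, a 30 second watchdog timeout would give 110, although other values such as the wdmd test interval would probably have to change along with it.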

yisong commented 2 years ago

Hi,

I tested on an HPE DL380 G9 server, and configuring the hardware watchdog timeout works.

First, the HP watchdog driver has a pretimeout parameter, as described below.
https://docs.kernel.org/watchdog/hpwdt.html

Parameter	Description
soft_margin	Allows the user to set the watchdog timer value. Default value is 30 seconds.
timeout	An alias of soft_margin.
pretimeout	Allows the user to set the watchdog pretimeout value. This is the number of seconds before timeout when an NMI is delivered to the system. Setting the value to zero disables the pretimeout NMI. Default value is 9 seconds.

So, if the watchdog is not kept alive, Linux stops working after 30 - 9 = 21 seconds by default.

[root@compute6 ~]# dmesg | grep -i dog
[    0.430726] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
[    6.557842] hpwdt 0000:01:00.0: HPE Watchdog Timer Driver: NMI decoding initialized, allow kernel dump: ON (default = 1/ON)
[    6.558197] hpwdt 0000:01:00.0: HPE Watchdog Timer Driver: 1.4.0, timer margin: 30 seconds (nowayout=0).
[    7.117534] iTCO_wdt: Intel TCO WatchDog Timer Driver v1.11

[root@compute6 ~]# wdctl 
Device:        /dev/watchdog
Identity:      HPE iLO2+ HW Watchdog Timer [version 0]
Timeout:       30 seconds
Timeleft:      29 seconds
FLAG           DESCRIPTION               STATUS BOOT-STATUS
KEEPALIVEPING  Keep alive ping reply          0           0
MAGICCLOSE     Supports magic close char      0           0
SETTIMEOUT     Set timeout (in seconds)       0           0

[root@compute6 ~]# cat >> /dev/watchdog

After I stopped pressing Enter in the "cat >> /dev/watchdog" command above, the system hung after about 21 seconds, presumably because HP iLO sent the NMI at that point. Later, the system rebooted.

After setting the timeout to 60 as shown below and repeating the test, the system hung after about 51 seconds and then rebooted.

[root@compute6 ~]# wdctl -s 60
Set timeout to 60 seconds
Device:        /dev/watchdog
Identity:      HPE iLO2+ HW Watchdog Timer [version 0]
Timeout:       60 seconds
Timeleft:      59 seconds
FLAG           DESCRIPTION               STATUS BOOT-STATUS
KEEPALIVEPING  Keep alive ping reply          0           0
MAGICCLOSE     Supports magic close char      0           0
SETTIMEOUT     Set timeout (in seconds)       0           0
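
The hang times seen in both tests are consistent with the timeout minus the 9 second default pretimeout. As a small sketch of that arithmetic (the function name is made up for illustration and is not part of hpwdt or sanlock):

```c
#include <assert.h>

/* Seconds from the last keepalive until the pretimeout NMI, based on the
 * hpwdt description quoted above: the NMI is delivered `pretimeout`
 * seconds before the watchdog timeout. With pretimeout = 0 the NMI is
 * disabled and the result is simply the full timeout. */
static int seconds_until_nmi(int timeout, int pretimeout)
{
    assert(timeout > 0);
    assert(pretimeout >= 0 && pretimeout < timeout);

    return timeout - pretimeout;
}
```

seconds_until_nmi(30, 9) is 21 and seconds_until_nmi(60, 9) is 51, which matches the roughly 21 and 51 seconds observed before the system hung.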

teigland commented 2 years ago

Thanks for the info, I hope to soon work on a patch to make this configurable.

teigland commented 2 years ago

I am currently testing the code in branch https://pagure.io/sanlock/commits/config-wd-timeout

yisong commented 2 years ago

Hi,

Do you have a private rpm for Red Hat 8.6? If so, I can help test it.

Here is my current sanlock rpm:

rpm -qa|grep sanlock

sanlock-lib-3.8.4-3.el8.x86_64
sanlock-3.8.4-3.el8.x86_64

teigland commented 2 years ago

This feature is now complete
https://pagure.io/sanlock/c/748e8325fd0b2e09469c76f584b8e08c1ef03ca6?branch=master

Metadata Update from @teigland:
- Issue status updated to: Closed (was: Open)

2 years ago

yisong commented 2 years ago

Thanks for the feature.

We usually get our rpms from Red Hat.
Will sanlock publish a new release so that Red Hat can integrate it into a new rpm?

teigland commented 2 years ago

It's possible that this could appear in a future RHEL update, but there are no immediate plans to do that. sanlock is supported as a component of other products. If you will use this feature as part of a specific product, then you could file a bugzilla RFE for that product to support this feature.

Edited 2 years ago by teigland
