148e37e wdmd: close device when test fails

Authored and Committed by teigland 12 years ago
1 file changed. 55 lines added. 8 lines removed.
    wdmd: close device when test fails
    
    Instead of just not petting the device after a test fails,
    close the device.  Because the close generates a ping, we
    want to get it done early.  Otherwise, if wdmd exits (e.g.
    a crash or SIGKILL) just before the device is ready to fire,
    the close performed by the kernel extends the life of the
    machine by an extra 60 sec.  Closing the device means we
    need to re-open it if we want to resume petting it.
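
The effect of closing early can be illustrated with a toy model. This is only a sketch, not wdmd code: `FakeWatchdog`, `WATCHDOG_TIMEOUT`, and both helper functions are hypothetical names, and the model assumes only what the message states, namely that a close generates a ping and that the device fires 60 seconds after the last ping.

```python
WATCHDOG_TIMEOUT = 60  # seconds from the last ping to firing (assumed from the message)


class FakeWatchdog:
    """Hypothetical stand-in for a watchdog device whose close generates a ping."""

    def __init__(self, opened_at):
        self.is_open = True
        self.last_ping = opened_at

    def close(self, now):
        # The first close counts as a ping; closing an already closed
        # device does nothing.
        if self.is_open:
            self.last_ping = now
            self.is_open = False

    def fire_time(self):
        return self.last_ping + WATCHDOG_TIMEOUT


def old_behavior_fire_time(last_pet, crash_time):
    """Old behavior: stop petting, leave the device open until the daemon dies."""
    wd = FakeWatchdog(opened_at=last_pet)
    wd.close(crash_time)  # close done by the kernel when the daemon exits
    return wd.fire_time()


def new_behavior_fire_time(fail_seen, crash_time):
    """New behavior: close the device as soon as the failed test is seen."""
    wd = FakeWatchdog(opened_at=0)  # open time is irrelevant once we close
    wd.close(fail_seen)   # early close, the last ping the device will get
    wd.close(crash_time)  # a later daemon exit has nothing left to close
    return wd.fire_time()


print(old_behavior_fire_time(last_pet=0, crash_time=59))
print(new_behavior_fire_time(fail_seen=10, crash_time=59))
```

In this model, a crash one second before the device would have fired pushes the old behavior's firing time out by almost a full timeout, while the early close pins it at 60 seconds after the failure was seen.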
    
    So, depending on whether the tests happen just prior
    to the expiry or just after the expiry, the watchdog
    will fire between 60 and 70 seconds after the expiry
    time.
    
    It would be 70 seconds if:
    
    we do the check just before the expiration and the client
    then expires.  10 seconds (TEST_INTERVAL) later we see the
    expiration and close the device, which generates a ping,
    so the firing happens 60 seconds after the close, which
    is itself already 10 seconds after the expiration.
    
    It would be 60 seconds if:
    
    we do the check just after the expiration, we see
    the expiration and close the device, which generates a
    ping, so the firing happens 60 seconds after the
    close, which is itself just after the expiration
    time.
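
The two cases above reduce to simple arithmetic, sketched below. `TEST_INTERVAL` and the 60 second timeout come from the message; `WATCHDOG_TIMEOUT`, `fire_delay_after_expiry`, and `check_lag` are hypothetical names used only for illustration.

```python
TEST_INTERVAL = 10     # seconds between wdmd tests (from the message)
WATCHDOG_TIMEOUT = 60  # seconds from the last ping to firing (from the message)


def fire_delay_after_expiry(check_lag):
    """Seconds between a client's expiry and the watchdog firing.

    check_lag is how long after the expiry the next test runs. It is 0
    if the test lands right at the expiry and TEST_INTERVAL if the
    previous test ran just before it.
    """
    if not 0 <= check_lag <= TEST_INTERVAL:
        raise ValueError("check_lag must be between 0 and TEST_INTERVAL")
    # The close at detection time generates a ping, so the timeout
    # starts counting from the moment of the close.
    return check_lag + WATCHDOG_TIMEOUT


print(fire_delay_after_expiry(0))              # test just after the expiry
print(fire_delay_after_expiry(TEST_INTERVAL))  # test just before the expiry
```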
    
    Previously, the assumption was that the host would
    be reset between 50 and 60 seconds from the expiration
    time, but this did not account for the fact that
    the daemon could exit just before the host reset,
    which would lead the kernel to generate a new ping.
    
    If we can patch the kernel so that a device close
    does not generate a ping, then we do not need to
    close the device when a test fails, but we can
    simply not pet the device, as we've been doing.
    
    Signed-off-by: David Teigland <teigland@redhat.com>
    
        