8e948fb sanlock: process commands arriving during poll() promptly

2 files. Authored by Jonathan Davies 8 years ago, committed by teigland 8 years ago.
    sanlock: process commands arriving during poll() promptly
    
    If a command is issued to the sanlock daemon soon after the previous command
    from the same client has completed, it might not be processed for up to 1
    second. This scenario is commonplace during sequential operations issued through
    lvmlockd, making them feel unusually sluggish.
    
    The delay occurs when a command is issued by a client soon after that client has
    been marked as 'resumed' and while the sanlock daemon is in the main loop
    executing poll().
    
    This is because the set of fds that poll() monitors is fixed when poll() is
    called, and it contains only the fds of the clients that are not suspended at
    that moment. Since poll() can block for up to STANDARD_CHECK_INTERVAL
    milliseconds, i.e. 1 second, a client that resumes during that period and
    issues another command will not be picked up by that invocation of poll().
    Instead, the command has to wait until that invocation of poll() returns and
    poll() is called again on the next loop iteration.
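
    As a rough illustration of why a command can sit unnoticed, the sketch below
    polls a single client slot. It assumes, purely for illustration, that a
    suspended client is represented by an fd of -1 in the pollfd entry (poll()
    ignores negative fds); poll_slot and suspended_slot_misses_command are
    hypothetical names, not sanlock functions, and a pipe stands in for the
    client socket.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: poll one client slot. A suspended slot is assumed
 * to carry fd == -1, which poll() ignores. Returns poll()'s result. */
static int poll_slot(int slot_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = slot_fd, .events = POLLIN, .revents = 0 };
    return poll(&pfd, 1, timeout_ms);
}

/* Returns 1 if a pending command is missed while the slot is suspended
 * and seen once the slot is polled again with its real fd, 0 if not,
 * and -1 on a setup error. */
static int suspended_slot_misses_command(void)
{
    int p[2];

    if (pipe(p) < 0)
        return -1;

    /* A "command" is already waiting on the client's fd. */
    if (write(p[1], "x", 1) != 1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    /* Slot suspended: poll() does not look at the fd and times out. */
    int while_suspended = poll_slot(-1, 50);

    /* Slot resumed: the next poll() sees the pending data at once. */
    int after_resume = poll_slot(p[0], 50);

    close(p[0]);
    close(p[1]);
    return while_suspended == 0 && after_resume == 1;
}
```

    In the daemon the first timeout is up to 1000 ms rather than the 50 ms used
    here, which is where the observed delay comes from.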
    
    This problem was observed using lvmlockd between successive invocations of lvs:
    poll() is entered before client_resume is called for the first lvs's 'unlock'
    command, so when the second lvs's 'acquire' command arrives, it must wait for
    that poll() to time out and be restarted, and so takes longer than necessary.
    
    This is illustrated in the following sequence of events caused by two
    consecutive invocations of lvs, which we pick up as the first lvs command is
    nearing completion:
    
      1. lvmlockd sends "unlock" command to sanlock daemon socket;
      2. sanlock daemon dispatches this as "cmd_release" to a worker thread and
         calls client_suspend;
      3. sanlock daemon invokes poll() (not listening to the suspended lvmlockd
         client);
      4. sanlock worker thread finishes handling the "cmd_release" command, returns
         the response on the socket, and calls client_resume;
      5. the second lvs command is issued;
      6. lvmlockd issues an "acquire" command on the same connection (but the daemon
         isn't listening yet);
      7. sanlock daemon's poll() times out 1000 ms after it was invoked and returns;
      8. sanlock daemon's main loop executes poll() again, this time listening to
         the lvmlockd client;
      9. poll() returns immediately and receives the "acquire" command.
    
    This patch makes client_resume interrupt the currently-executing poll() by
    poking an internal eventfd on which poll() is listening in addition to the
    non-suspended clients. This causes the current poll() to return and immediately
    restart, this time listening on the resumed client's fd, ready to receive a new
    command from the client.
    
    Some performance measurements follow, demonstrating how this patch makes the
    second command more responsive.
    
    Before:
    
    % time lvs >/dev/null; time lvs >/dev/null
    
    real    0m0.051s
    user    0m0.008s
    sys     0m0.008s
    
    real    0m0.880s
    user    0m0.000s
    sys     0m0.012s
    
    After:
    
    % time lvs >/dev/null; time lvs >/dev/null
    
    real    0m0.039s
    user    0m0.004s
    sys     0m0.012s
    
    real    0m0.036s
    user    0m0.000s
    sys     0m0.016s
    
    Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com>
    
        
file modified
+25 -3
file modified
+1 -0