From c3250447865073aa475260c3d0e35fe216edc208 Mon Sep 17 00:00:00 2001 From: David Teigland Date: Jul 18 2024 20:13:11 +0000 Subject: sanlock: change paxos_acquire error for initial host state paxos_acquire will return IDLIVE for a dead host if the acquire happens soon after add_lockspace, before two host renewal checks have happened. This differs from the normal behavior of paxos_acquire, where it will wait for the dead host to time out, and then run a ballot to acquire the lease. The difference in behavior depends on how long after add_lockspace the paxos_acquire is called, which can make it complicated for applications to know how to respond (perhaps wanting to retry the acquire for a while.) Internally: After add_lockspace, the first time check_other_leases() (checking renewals of other host_ids) is called, first_check and last_check are both set to 'now', and last_live is also set to 'now' because no previous renewal timestamp from that host_id has been recorded. This also applies to the host_id of a dead host. The core of the issue is that after add_lockspace, a dead host looks no different from a live host. The dead/alive difference is based on seeing the host_id lease timestamp change. Until at least a renewal period after add_lockspace, the timestamp won't be changed, so there's no basis for concluding dead vs alive. In most cases, it is only safe to treat this unknown ase as alive. But, in this case, the handling of the unknown case is fail and retry for a while vs block for a while. Before change: paxos_acquire is called just after add_lockspace, and the liveness check of the current lease owner was: last_live is non-zero, and last_check equals last_live. A dead host is considered alive by that condition, causing paxos_acquire to return IDLIVE. It would need to be retried for a period of another renewal interval, before paxos_acquire would wait for the dead host to time out. A live host is also considered alive, and IDLIVE is correctly returned immediately. After change: paxos_acquire is called just after add_lockspace, and the liveness check of the current lease owner is: last_live is non-zero, and last_check equals last_live, and first_check is not equal to last_live. A dead host would not be considered alive by that condition, causing paxos_acquire to wait for the dead host to time out, just as if acquire was called long after add_lockspace. A live host would also not be considered alive, causing paxos_acquire to wait until another renewal from the live host is seen before returning IDLIVE. This may be an unwelcome change, where previously the correct result was returned immediately, now the correct result is only returned after a delay. However, SANLK_ACQUIRE_OWNER_NOWAIT can be used in sanlock_acquire() to avoid waiting for a dead host's lease to expire, and may already be used by callers wanting to avoid blocking in sanlock_acquire for long periods waiting for lease expiration. --- diff --git a/src/paxos_lease.c b/src/paxos_lease.c index 7d3ac51..3f11b30 100644 --- a/src/paxos_lease.c +++ b/src/paxos_lease.c @@ -1824,6 +1824,7 @@ int paxos_lease_acquire(struct task *task, } rv = host_info(cur_leader.space_name, cur_leader.owner_id, &hs); + if (!rv && hs.last_check && hs.last_live && hs.owner_id == cur_leader.owner_id && hs.owner_generation == cur_leader.owner_generation) { @@ -1908,15 +1909,34 @@ int paxos_lease_acquire(struct task *task, * * 2. If our own renewal thread saw the owner's timestamp change * the last time it was checked, then consider the owner to be alive. + * + * first_check != last_live: avoid returning IDLIVE when this + * checks the host state soon after add_lockspace, before more + * than one check_other_leases() has happened. Before two + * checks of the other lease, a dead host and live host look + * the same (and first_check == last_live). If the lease owner + * is dead before acquire, then acquire will wait for the lease + * to expire. If the caller doesn't want acquire to wait for a + * lease to expire, they include the NOWAIT flag. */ if ((host_id_leader.timestamp != last_timestamp) || - (hs.last_live && (hs.last_check == hs.last_live))) { + (hs.last_live && (hs.last_check == hs.last_live) && (hs.first_check != hs.last_live))) { + log_token(token, "paxos_acquire owner %llu delta %llu %llu %llu alive", (unsigned long long)cur_leader.owner_id, (unsigned long long)host_id_leader.owner_id, (unsigned long long)host_id_leader.owner_generation, (unsigned long long)host_id_leader.timestamp); + + log_token(token, "paxos_acquire alive host info %llu %llu timestamp %llu first_check %llu last_check %llu last_live %llu", + (unsigned long long)hs.owner_id, + (unsigned long long)hs.owner_generation, + (unsigned long long)hs.timestamp, + (unsigned long long)hs.first_check, + (unsigned long long)hs.last_check, + (unsigned long long)hs.last_live); + memcpy(leader_ret, &cur_leader, sizeof(struct leader_record)); /* It's possible that the live owner has released the