1. 31 January 2008 (9 commits)
    • dlm: keep cached master rsbs during recovery · 85f0379a
      Committed by David Teigland
      To prevent the master of an rsb from changing rapidly, an unused rsb is kept
      on the "toss list" for a period of time to be reused.  The toss list was
      being cleared completely for each recovery, which is unnecessary.  Much of
      the benefit of the toss list can be maintained if nodes keep rsb's in their
      toss list that they are the master of.  These rsb's need to be included
      when the resource directory is rebuilt during recovery.
      Signed-off-by: David Teigland <teigland@redhat.com>
    • dlm: change error message to debug · 594199eb
      Committed by David Teigland
      The invalid lockspace messages are normal and can appear relatively
      often.  They should be suppressed unless debugging is enabled.
      Signed-off-by: David Teigland <teigland@redhat.com>
    • dlm: limit dir lookup loop · 755b5eb8
      Committed by David Teigland
      In a rare case we may need to repeat a local resource directory lookup
      due to a race with removing the rsb and removing the resdir record.
      We'll never need to do more than a single additional lookup, though,
      so the infinite loop around the lookup can be removed.  In addition
      to being unnecessary, the infinite loop is dangerous since some other
      unknown condition may appear causing the loop to never break.
      Signed-off-by: David Teigland <teigland@redhat.com>
    • dlm: reject normal unlock when lock is waiting for lookup · 42dc1601
      Committed by David Teigland
      Non-forced unlocks should be rejected if the lock is waiting on the
      rsb_lookup list for another lock to establish the master node.
      Signed-off-by: David Teigland <teigland@redhat.com>
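The check this commit describes can be sketched as follows. Everything here is invented for illustration (names, struct fields, and the flag value are not the dlm's actual code); it only shows the shape of rejecting a non-forced unlock while the master is still being established:

```c
/* Sketch only: invented names, fields and flag value, not the dlm's
 * actual code.  A normal unlock is rejected while the lkb waits on the
 * rsb_lookup list for another lock to establish the master node; only
 * a forced unlock may proceed at that point. */
#define LKF_FORCEUNLOCK 0x1	/* illustrative flag value */

struct lkb {
	int on_rsb_lookup;	/* waiting for a master lookup to finish */
};

/* returns 0 if the unlock may proceed, -16 (-EBUSY) if rejected */
static int validate_unlock(const struct lkb *lkb, unsigned int flags)
{
	if (lkb->on_rsb_lookup && !(flags & LKF_FORCEUNLOCK))
		return -16;	/* master not established yet */
	return 0;
}
```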
    • dlm: validate messages before processing · c54e04b0
      Committed by David Teigland
      Message validation was previously hit-and-miss; it has now been
      cleaned up and unified.  Before processing a message, the new
      validate_message() function checks that the lkb is the appropriate type,
      process-copy or master-copy, and that the message is from the correct
      nodeid for the given lkb.  Other checks and assertions on the
      lkb type and nodeid have been removed.  The assertions were particularly
      bad since they would panic the machine instead of just ignoring the bad
      message.

      Although other recent patches have made processing old messages unlikely,
      it may still be possible for an old message to be processed and caught
      by these checks.
      Signed-off-by: David Teigland <teigland@redhat.com>
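The validation described above can be sketched like this. The structures and fields are invented simplifications, not the kernel's actual code; only the idea is shown, that a mismatched message is silently ignored rather than asserted on:

```c
#include <stdint.h>

/* Sketch only: simplified, invented structures, not the kernel's code.
 * An lkb is either a process copy (on the node that requested the
 * lock) or a master copy (on the resource's master node); each message
 * type is valid for exactly one of the two, and only from the nodeid
 * the lkb corresponds with. */
struct lkb {
	int is_master_copy;	/* 1 = master copy, 0 = process copy */
	uint32_t nodeid;	/* remote node this lkb talks to */
};

/* returns 0 if the message may be processed, -1 if it should be
 * ignored (not asserted on) */
static int validate_message(const struct lkb *lkb, uint32_t from_nodeid,
			    int msg_expects_master_copy)
{
	if (lkb->is_master_copy != msg_expects_master_copy)
		return -1;	/* wrong lkb type for this message */
	if (lkb->nodeid != from_nodeid)
		return -1;	/* stale or misdirected message */
	return 0;
}
```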
    • dlm: reject messages from non-members · 46b43eed
      Committed by David Teigland
      Messages from nodes that are no longer members of the lockspace should be
      ignored.  When nodes are removed from the lockspace, recovery can
      sometimes complete quickly enough that messages arrive from a removed node
      after recovery has completed.  When processed, these messages would often
      cause an error message, and could in some cases change some state, causing
      problems.
      Signed-off-by: David Teigland <teigland@redhat.com>
    • dlm: another call to confirm_master in receive_request_reply · aec64e1b
      Committed by David Teigland
      When a failed request (EBADR or ENOTBLK) is unlocked/canceled instead of
      retried, there may be other lkb's waiting on the rsb_lookup list for it
      to complete.  A call to confirm_master() is needed to move on to the next
      waiting lkb since the current one won't be retried.
      Signed-off-by: David Teigland <teigland@redhat.com>
    • dlm: recover locks waiting for overlap replies · 601342ce
      Committed by David Teigland
      When recovery looks at locks waiting for replies, it fails to consider
      locks that have already received a reply for their first remote operation,
      but not received a reply for secondary, overlapping unlock/cancel.  The
      appropriate stub reply needs to be called for these waiters.
      
      This appears when recovery starts in the presence of many overlapping
      unlock/cancel ops.
      Signed-off-by: David Teigland <teigland@redhat.com>
    • dlm: clear ast_type when removing from astqueue · 8a358ca8
      Committed by David Teigland
      The lkb_ast_type field indicates whether the lkb is on the astqueue list.
      When clearing locks for a process, lkb's were being removed from the astqueue
      list without clearing the field.  If release_lockspace then happened
      immediately afterward, it could try to remove the lkb from the list a second
      time.
      
      This appears when a process calls libdlm's dlm_release_lockspace(), which
      first closes the lockspace device (triggering clear_proc_locks) and then
      removes the lockspace with a write to the control device (causing
      release_lockspace()).
      Signed-off-by: David Teigland <teigland@redhat.com>
  2. 30 January 2008 (3 commits)
  3. 10 October 2007 (2 commits)
    • [DLM] block dlm_recv in recovery transition · c36258b5
      Committed by David Teigland
      Introduce a per-lockspace rwsem that's held in read mode by dlm_recv
      threads while working in the dlm.  This allows dlm_recv activity to be
      suspended when the lockspace transitions to, from and between recovery
      cycles.
      
      The specific bug prompting this change is one where an in-progress
      recovery cycle is aborted by a new recovery cycle.  While dlm_recv was
      processing a recovery message, the recovery cycle was aborted and
      dlm_recoverd began cleaning up.  dlm_recv decremented recover_locks_count
      on an rsb after dlm_recoverd had reset it to zero.  This is fixed by
      suspending dlm_recv (taking write lock on the rwsem) before aborting the
      current recovery.
      
      The transitions to/from normal and recovery modes are simplified by using
      this new ability to block dlm_recv.  The switch from normal to recovery
      mode means dlm_recv goes from processing locking messages, to saving them
      for later, and vice versa.  Races are avoided by blocking dlm_recv when
      setting the flag that switches between modes.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] don't overwrite castparam if it's NULL · b434eda6
      Committed by Patrick Caulfield
      If the castaddr passed to the userland API is NULL, don't overwrite the
      existing castparam.  This allows a different thread to cancel a lock
      request while the CANCEL AST is still delivered to the original thread.
      
      bz#306391 (for RHEL4) refers.
      Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  4. 14 August 2007 (1 commit)
    • [DLM] fix basts for granted PR waiting CW · 36509258
      Committed by David Teigland
      Fix a long standing bug where a blocking callback would be missed
      when there's a granted lock in PR mode and waiting locks in both
      PR and CW modes (and the PR lock was added to the waiting queue
      before the CW lock).  The logic simply compared the numerical values
      of the modes to determine if a blocking callback was required, but in
      the one case of PR and CW, the lower valued CW mode blocks the higher
      valued PR mode.  We just need to add a special check for this PR/CW
      case in the tests that decide when a blocking callback is needed.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
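The decision can be sketched like this. The mode values match those in `<linux/dlmconstants.h>`, but the helper itself is an invented simplification of the kernel's logic, showing only the numeric comparison plus the PR/CW special case this patch adds:

```c
/* DLM lock mode values (as in <linux/dlmconstants.h>) */
enum {
	DLM_LOCK_NL = 0,	/* null */
	DLM_LOCK_CR = 1,	/* concurrent read */
	DLM_LOCK_CW = 2,	/* concurrent write */
	DLM_LOCK_PW = 3,	/* protected write */
	DLM_LOCK_PR = 4,	/* protected read */
	DLM_LOCK_EX = 5,	/* exclusive */
};

/* Sketch (invented helper, simplified from the kernel's checks): does
 * a request in rqmode call for a blocking callback to the holder of a
 * lock granted in grmode?  A plain numeric comparison misses the PR/CW
 * pair, where the lower-valued CW still conflicts with PR. */
static int requires_bast(int grmode, int rqmode)
{
	if (grmode == DLM_LOCK_NL)
		return 0;	/* NL is compatible with everything */
	if (rqmode > grmode)
		return 1;	/* the old purely numeric test */
	if (grmode == DLM_LOCK_PR && rqmode == DLM_LOCK_CW)
		return 1;	/* the special case this patch adds */
	return 0;
}
```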
  5. 09 July 2007 (9 commits)
  6. 01 May 2007 (5 commits)
    • [DLM] fix mode munging · 7d3c1feb
      Committed by David Teigland
      There are flags to enable two specialized features in the dlm:
      1. CONVDEADLK causes the dlm to resolve conversion deadlocks internally by
         changing the granted mode of locks to NL.
      2. ALTPR/ALTCW cause the dlm to change the requested mode of locks to PR
         or CW to grant them if the normal requested mode can't be granted.
      
      GFS direct i/o exercises both of these features, especially when mixed
      with buffered i/o.  The dlm has problems with them.
      
      The first problem is on the master node.  If it demotes a lock as part of
      converting it, the actual step of converting the lock isn't done
      after the demotion; the lock is just left sitting on the granted queue
      with a granted mode of NL.  I think the mistaken assumption was that the
      call to grant_pending_locks() would grant it, but that function naturally
      doesn't look at locks on the granted queue.
      
      The second problem is on the process node.  If the master either demotes
      or gives an altmode, the munging of the gr/rq modes is never done in the
      process copy of the lock, leaving the master/process copies out of sync.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] change lkid format · ce03f12b
      Committed by David Teigland
      A lock id is a uint32 and is used as an opaque reference to the lock.  For
      userland apps, the lkid is passed up, through libdlm, as the return value
      from a write() on the dlm device.  This created a problem when the high
      bit was 1, making the lkid look like an error.  This is fixed by changing
      how the lkid is composed.  The low 16 bits identified the hash bucket for
      the lock and the high 16 bits were a per-bucket counter (which eventually
      hit 0x8000 causing the problem).  These are simply swapped around; the
      number of hash table buckets is far below 0x8000, making all lkid's
      positive when viewed as signed.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
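The swap can be shown directly. The helper names are invented; the field layout is taken from the description above:

```c
#include <stdint.h>

/* Sketch of the two layouts described above (helper names invented).
 * Old: low 16 bits = hash bucket, high 16 bits = per-bucket counter,
 * so the counter reaching 0x8000 set the sign bit.  New: the halves
 * are swapped; since the number of buckets stays far below 0x8000,
 * the sign bit is always clear. */
static uint32_t lkid_old(uint16_t bucket, uint16_t seq)
{
	return ((uint32_t)seq << 16) | bucket;
}

static uint32_t lkid_new(uint16_t bucket, uint16_t seq)
{
	return ((uint32_t)bucket << 16) | seq;
}
```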
    • [DLM] add orphan purging code (1/2) · 8499137d
      Committed by David Teigland
      Add code for purging orphan locks.  A process can also purge all of its
      own non-orphan locks by passing a pid of zero.  Code already exists for
      processes to create persistent locks that become orphans when the process
      exits, but the complementary capability for another process to then purge
      these orphans has been missing.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] split create_message function · 7e4dac33
      Committed by David Teigland
      This splits the current create_message() function into two parts so that
      later patches can call the new lower-level _create_message() function when
      they don't have an rsb struct.  No functional change in this patch.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] overlapping cancel and unlock · ef0c2bb0
      Committed by David Teigland
      Full cancel and force-unlock support.  In the past, cancel and force-unlock
      wouldn't work if there was another operation in progress on the lock.  Now,
      both cancel and unlock-force can overlap an operation on a lock, meaning there
      may be 2 or 3 operations in progress on a lock in parallel.  This support is
      important not only because cancel and force-unlock are explicit operations
      that an app can use, but both are used implicitly when a process exits while
      holding locks.
      
      Summary of changes:
      
      - add-to and remove-from waiters functions were rewritten to handle situations
        with more than one remote operation outstanding on a lock
      
      - validate_unlock_args detects when an overlapping cancel/unlock-force
        can be sent and when it needs to be delayed until a request/lookup
        reply is received
      
      - processing request/lookup replies detects when cancel/unlock-force
        occurred during the op, and carries out the delayed cancel/unlock-force
      
      - manipulation of the "waiters" (remote operation) state of a lock moved under
        the standard rsb mutex that protects all the other lock state
      
      - the two recovery routines related to locks on the waiters list changed
        according to the way lkb's are now locked before accessing waiters state
      
      - waiters recovery detects when lkb's being recovered have overlapping
        cancel/unlock-force, and may not recover such locks
      
      - revert_lock (cancel) returns a value to distinguish cases where it did
        nothing vs cases where it actually did a cancel; the cancel completion ast
        should only be done when cancel did something
      
      - orphaned locks put on new list so they can be found later for purging
      
      - cancel must be called on a lock when making it an orphan
      
      - flag user locks (ENDOFLIFE) at the end of their useful life (to the
        application) so we can return an error for any further cancel/unlock-force
      
      - we weren't setting COMP/BAST ast flags if one was already set, so we'd lose
        either a completion or blocking ast
      
      - clear an unread bast on a lock that's become unlocked
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  7. 06 February 2007 (9 commits)
    • [DLM] zero new user lvbs · 62a0f623
      Committed by David Teigland
      A new lvb for a userland lock wasn't being initialized to zero.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] can miss clearing resend flag · b790c3b7
      Committed by David Teigland
      A long, complicated sequence of events, beginning with the RESEND flag not
      being cleared on an lkb, can result in an unlock never completing.
      
      - lkb on waiters list for remote lookup
      - the remote node is both the dir node and the master node, so
        it optimizes the lookup into a request and sends a request
        reply back
      - the request reply is saved on the requestqueue to be processed
        after recovery
      - recovery runs dlm_recover_waiters_pre() which sets RESEND flag
        so the lookup will be resent after recovery
      - end of recovery: process_requestqueue takes saved request reply
        which removes the lkb from the waiters list, _without_ clearing
        the RESEND flag
      - end of recovery: dlm_recover_waiters_post() doesn't do anything
        with the now completed lookup lkb (would usually clear RESEND)
      - later, the node unmounts, unlocks this lkb that still has RESEND
        flag set
      - the lkb is on the waiters list again, now for unlock; when recovery
        occurs, dlm_recover_waiters_pre() sees the lkb for unlock with RESEND
        set, and does nothing since the master still exists
      - end of recovery: dlm_recover_waiters_post() takes this lkb off
        the waiters list because it has the RESEND flag set, then reports
        an error because unlocks are never supposed to be handled in
        recover_waiters_post().
      - later, the unlock reply is received, doesn't find the lkb on
        the waiters list because recover_waiters_post() has wrongly
        removed it.
      - the unlock operation has been lost, and we're left with a
        stray granted lock
      - unmount spins waiting for the unlock to complete
      
      The visible evidence of this problem will be a node where gfs umount is
      spinning, the dlm waiters list will be empty, and the dlm locks list will
      show a granted lock.
      
      The fix is simply to clear the RESEND flag when taking an lkb off the
      waiters list.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
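The fix in the last paragraph can be reduced to a tiny sketch. Names, fields, and the flag value are invented for illustration; the point is that the flag is cleared in the one helper every path goes through:

```c
/* Sketch only (invented names and fields): the fix clears RESEND at
 * the single point where an lkb leaves the waiters list, so no stale
 * flag can survive into a later operation such as the final unlock. */
#define LKB_RESEND 0x01	/* illustrative flag value */

struct lkb {
	unsigned int flags;
	int on_waiters;
};

static void remove_from_waiters(struct lkb *lkb)
{
	lkb->on_waiters = 0;
	lkb->flags &= ~LKB_RESEND;	/* previously left set on some paths */
}
```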
    • [DLM] saved dlm message can be dropped · 8fd3a98f
      Committed by David Teigland
      dlm_receive_message() returns 0 instead of returning 'error'.  What would
      happen is that process_requestqueue would take a saved message off the
      requestqueue and call receive_message on it.  receive_message would then
      see that recovery had been aborted, set error to EINTR, and 'goto out',
      expecting that the error would be returned.  Instead, 0 was always
      returned, so process_requestqueue would think that the message had been
      processed and delete it instead of saving it to process next time.  This
      means the message (usually an unlock in my tests) would be lost.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] fix user unlocking · a1bc86e6
      Committed by David Teigland
      When a user process exits, we clear all the locks it holds.  There is a
      problem, though, with locks that the process had begun unlocking before it
      exited.  We couldn't find the lkb's that were in the process of being
      unlocked remotely, to flag that they are DEAD.  To solve this, we move
      lkb's being unlocked onto a new list in the per-process structure that
      tracks what locks the process is holding.  We can then go through this
      list to flag the necessary lkb's when clearing locks for a process when it
      exits.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] rename dlm_config_info fields · 68c817a1
      Committed by David Teigland
      Add a "ci_" prefix to the fields in the dlm_config_info struct so that we
      can use macros to add configfs functions to access them (in a later
      patch).  No functional changes in this patch, just naming changes.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] fix lost flags in stub replies · 075529b5
      Committed by David Teigland
      When the dlm fakes an unlock/cancel reply from a failed node using a stub
      message struct, it wasn't setting the flags in the stub message.  So, in
      the process of receiving the fake message the lkb flags would be updated
      and cleared from the zero flags in the message.  The problem observed in
      tests was the loss of the USER flag which caused the dlm to think a user
      lock was a kernel lock and subsequently fail an assertion checking the
      validity of the ast/callback field.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] fix receive_request() lvb copying · 8d07fd50
      Committed by David Teigland
      LVB's are not sent as part of new requests, but the code receiving the
      request was copying data into the lvb anyway.  The space in the message
      where it mistakenly thought the lvb lived actually contained the resource
      name, so it wound up incorrectly copying this name data into the lvb.  Fix
      is to just create the lvb, not copy junk into it.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] fix send_args() lvb copying · da49f36f
      Committed by David Teigland
      The send_args() function is used to copy parameters into a message for a
      number of different message types.  Only some of those types are set up
      beforehand (in create_message) to include space for sending lvb data.
      send_args was wrongly copying the lvb for all message types as long as the
      lock had an lvb.  This means that the lvb data was being written past the
      end of the message into unknown space.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
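The class of bug in this and the previous commit can be sketched with invented types (this is not the dlm's message layout): the copy must be gated on the message type, not merely on the lock having an lvb.

```c
#include <string.h>

/* Sketch only: invented types and message layout.  Only some message
 * types are created with room for lvb data after the fixed part, so
 * the copy must depend on the type, not merely on the lock having an
 * lvb. */
enum msg_type { MSG_REQUEST, MSG_CONVERT, MSG_GRANT };

struct message {
	enum msg_type type;
	char lvb_space[32];	/* only allocated/valid for some types */
};

static int msg_carries_lvb(enum msg_type t)
{
	return t == MSG_CONVERT || t == MSG_GRANT;	/* illustrative */
}

static void send_args_sketch(struct message *ms, const char *lvb)
{
	/* the bug: copying whenever lvb != NULL wrote past the end of
	 * messages created without lvb space */
	if (lvb && msg_carries_lvb(ms->type))
		memcpy(ms->lvb_space, lvb, sizeof(ms->lvb_space));
}
```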
    • [DLM] fix resend rcom lock · dc200a88
      Committed by David Teigland
      There's a chance the new master of a resource hasn't learned it's the new
      master before another node sends it a lock during recovery.  The node
      sending the lock needs to resend if this happens.
      
      - A sends a master lookup for resource R to C
      - B sends a master lookup for resource R to C
      - C receives A's lookup, assigns A to be master of R and
        sends a reply back to A
      - C receives B's lookup and sends a reply back to B saying
        that A is the master
      - B receives lookup reply from C and sends its lock for R to A
      - A receives lock from B, doesn't think it's the master of R
        and sends an error back to B
      - A receives lookup reply from C and becomes master of R
      - B gets error back from A and resends its lock back to A
        (this resending is what this patch does)
      - A receives lock from B, it now sees it's the master of R
        and takes the lock
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  8. 30 November 2006 (2 commits)
    • [DLM] clear sbflags on lock master · 6f90a8b1
      Committed by David Teigland
      RH BZ 211622
      
      The ALTMODE flag can be set in the lock master's copy of the lock but
      never cleared, so ALTMODE will also be returned in a subsequent conversion
      of the lock when it shouldn't be.  This results in lock_dlm incorrectly
      switching to the alternate lock mode when returning the result to gfs
      which then asserts when it sees the wrong lock state.  The fix is to
      propagate the cleared sbflags value to the master node when the lock is
      requested.  QA's d_rwrandirectlarge test triggers this bug very quickly.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] fix requestqueue race · d4400156
      Committed by David Teigland
      Red Hat BZ 211914
      
      There's a race between dlm_recoverd (1) enabling locking and (2) clearing
      out the requestqueue, and dlm_recvd (1) checking if locking is enabled and
      (2) adding a message to the requestqueue.  An order of recoverd(1),
      recvd(1), recvd(2), recoverd(2) will result in a message being left on the
      requestqueue.  The fix is to have dlm_recvd check whether dlm_recoverd has
      enabled locking after taking the mutex for the requestqueue and, if it
      has, to process the message instead of queueing it.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
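The fixed ordering is the classic check-under-the-lock pattern, sketched here with a POSIX mutex standing in for the kernel mutex (all names invented, counters standing in for the real queue):

```c
#include <pthread.h>

/* Sketch only: a POSIX mutex stands in for the kernel mutex and the
 * names are invented.  The receive side re-checks the "locking
 * enabled" flag after taking the requestqueue mutex, so the recovery
 * side can no longer flip the flag and drain the queue between the
 * receive side's check and its enqueue. */
static pthread_mutex_t rq_mutex = PTHREAD_MUTEX_INITIALIZER;
static int locking_enabled;
static int queued_count;

/* recvd side: returns 1 if the message was queued for later,
 * 0 if the caller should process it directly */
int try_queue_message(void)
{
	int queued = 0;

	pthread_mutex_lock(&rq_mutex);
	if (!locking_enabled) {		/* re-checked under the mutex */
		queued_count++;		/* stands in for adding to the queue */
		queued = 1;
	}
	pthread_mutex_unlock(&rq_mutex);
	return queued;
}

/* recoverd side: drain saved messages and enable locking atomically
 * with respect to try_queue_message() */
void enable_locking(void)
{
	pthread_mutex_lock(&rq_mutex);
	queued_count = 0;		/* stands in for draining the queue */
	locking_enabled = 1;		/* flipped under the same mutex */
	pthread_mutex_unlock(&rq_mutex);
}
```

Because both the flag flip and the enqueue happen under `rq_mutex`, the problematic interleaving described above cannot leave a message stranded on the queue.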