- 30 1月, 2008 2 次提交
-
-
由 David Teigland 提交于
Change log_error() to log_debug() for conditions that can occur in large number in normal operation. Signed-off-by: NDavid Teigland <teigland@redhat.com>
-
由 Adrian Bunk 提交于
This patch adds a proper prototype for some functions in fs/dlm/dlm_internal.h Signed-off-by: NAdrian Bunk <bunk@kernel.org> Signed-off-by: NDavid Teigland <teigland@redhat.com>
-
- 10 10月, 2007 2 次提交
-
-
由 David Teigland 提交于
Introduce a per-lockspace rwsem that's held in read mode by dlm_recv threads while working in the dlm. This allows dlm_recv activity to be suspended when the lockspace transitions to, from and between recovery cycles. The specific bug prompting this change is one where an in-progress recovery cycle is aborted by a new recovery cycle. While dlm_recv was processing a recovery message, the recovery cycle was aborted and dlm_recoverd began cleaning up. dlm_recv decremented recover_locks_count on an rsb after dlm_recoverd had reset it to zero. This is fixed by suspending dlm_recv (taking write lock on the rwsem) before aborting the current recovery. The transitions to/from normal and recovery modes are simplified by using this new ability to block dlm_recv. The switch from normal to recovery mode means dlm_recv goes from processing locking messages, to saving them for later, and vice versa. Races are avoided by blocking dlm_recv when setting the flag that switches between modes. Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Patrick Caulfield 提交于
If the castaddr passed to the userland API is NULL then don't overwrite the existing castparam. This allows a different thread to cancel a lock request and the CANCEL AST gets delivered to the original thread. bz#306391 (for RHEL4) refers. Signed-Off-By: NPatrick Caulfield <pcaulfie@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 14 8月, 2007 1 次提交
-
-
由 David Teigland 提交于
Fix a long standing bug where a blocking callback would be missed when there's a granted lock in PR mode and waiting locks in both PR and CW modes (and the PR lock was added to the waiting queue before the CW lock). The logic simply compared the numerical values of the modes to determine if a blocking callback was required, but in the one case of PR and CW, the lower valued CW mode blocks the higher valued PR mode. We just need to add a special check for this PR/CW case in the tests that decide when a blocking callback is needed. Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 09 7月, 2007 9 次提交
-
-
由 Patrick Caulfield 提交于
Add a new flag, DLM_LSFL_FS, to be used when a file system creates a lockspace. This flag causes the dlm to use GFP_NOFS for allocations instead of GFP_KERNEL. (This updated version of the patch uses gfp_t for ls_allocation.) Signed-Off-By: NPatrick Caulfield <pcaulfie@redhat.com> Signed-Off-By: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 David Teigland 提交于
Add a function that can be used through libdlm by a system daemon to cancel another process's deadlocked lock. A completion ast with EDEADLK is returned to the process waiting for the lock. Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 David Teigland 提交于
Various fixes related to the new timeout feature: - add_timeout() missed setting TIMEWARN flag on lkb's when the TIMEOUT flag was already set - clear_proc_locks should remove a dead process's locks from the timeout list - the end-of-life calculation for user locks needs to consider that ETIMEDOUT is equivalent to -DLM_ECANCEL - make initial default timewarn_cs config value visible in configfs - change bit position of TIMEOUT_CANCEL flag so it's not copied to a remote master node - set timestamp on remote lkb's so a lock dump will display the time they've been waiting Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
A one liner fix which got missed from the earlier patches. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com> Cc: Fabio Massimo Di Nitto <fabbione@ubuntu.com> Cc: David Teigland <teigland@redhat.com>
-
由 David Teigland 提交于
In the rush to get the previous patch set sent, a compilation bug I fixed shortly before sending somehow got clobbered, probably by a missed quilt refresh or something. Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 David Teigland 提交于
When conversion deadlock is detected, cancel the conversion and return EDEADLK to the application. This is a new default behavior where before the dlm would allow the deadlock to exist indefinately. The DLM_LKF_NODLCKWT flag can now be used in a conversion to prevent the dlm from performing conversion deadlock detection/cancelation on it. The DLM_LKF_CONVDEADLK flag can continue to be used as before to tell the dlm to demote the granted mode of the lock being converted if it gets into a conversion deadlock. Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 David Teigland 提交于
Change the user/kernel device interface used by libdlm: - Add ability for userspace to check the version of the interface. libdlm can now adapt to different versions of the kernel interface. - Increase the size of the flags passed in a lock request so all possible flags can be used from userspace. - Add an opaque "xid" value for each lock. This "transaction id" will be used later to associate locks with each other during deadlock detection. - Add a "timeout" value for each lock. This is used along with the DLM_LKF_TIMEOUT flag. Also, remove a fragment of unused code in device_read(). This patch requires updating libdlm which is backward compatible with older kernels. Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 David Teigland 提交于
New features: lock timeouts and time warnings. If the DLM_LKF_TIMEOUT flag is set, then the request/conversion will be canceled after waiting the specified number of centiseconds (specified per lock). This feature is only available for locks requested through libdlm (can be enabled for kernel dlm users if there's a use for it.) If the new DLM_LSFL_TIMEWARN flag is set when creating the lockspace, then a warning message will be sent to userspace (using genetlink) after a request/conversion has been waiting for a given number of centiseconds (configurable per node). The time warnings will be used in the future to do deadlock detection in userspace. Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 David Teigland 提交于
Don't let dlm_scand run during recovery since it may try to do a resource directory removal while the directory nodes are changing. Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 01 5月, 2007 5 次提交
-
-
由 David Teigland 提交于
There are flags to enable two specialized features in the dlm: 1. CONVDEADLK causes the dlm to resolve conversion deadlocks internally by changing the granted mode of locks to NL. 2. ALTPR/ALTCW cause the dlm to change the requested mode of locks to PR or CW to grant them if the normal requested mode can't be granted. GFS direct i/o exercises both of these features, especially when mixed with buffered i/o. The dlm has problems with them. The first problem is on the master node. If it demotes a lock as a part of converting it, the actual step of converting the lock isn't being done after the demotion, the lock is just left sitting on the granted queue with a granted mode of NL. I think the mistaken assumption was that the call to grant_pending_locks() would grant it, but that function naturally doesn't look at locks on the granted queue. The second problem is on the process node. If the master either demotes or gives an altmode, the munging of the gr/rq modes is never done in the process copy of the lock, leaving the master/process copies out of sync. Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 David Teigland 提交于
A lock id is a uint32 and is used as an opaque reference to the lock. For userland apps, the lkid is passed up, through libdlm, as the return value from a write() on the dlm device. This created a problem when the high bit was 1, making the lkid look like an error. This is fixed by changing how the lkid is composed. The low 16 bits identified the hash bucket for the lock and the high 16 bits were a per-bucket counter (which eventually hit 0x8000 causing the problem). These are simply swapped around; the number of hash table buckets is far below 0x8000, making all lkid's positive when viewed as signed. Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 David Teigland 提交于
Add code for purging orphan locks. A process can also purge all of its own non-orphan locks by passing a pid of zero. Code already exists for processes to create persistent locks that become orphans when the process exits, but the complimentary capability for another process to then purge these orphans has been missing. Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 David Teigland 提交于
This splits the current create_message() function into two parts so that later patches can call the new lower-level _create_message() function when they don't have an rsb struct. No functional change in this patch. Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 David Teigland 提交于
Full cancel and force-unlock support. In the past, cancel and force-unlock wouldn't work if there was another operation in progress on the lock. Now, both cancel and unlock-force can overlap an operation on a lock, meaning there may be 2 or 3 operations in progress on a lock in parallel. This support is important not only because cancel and force-unlock are explicit operations that an app can use, but both are used implicitly when a process exits while holding locks. Summary of changes: - add-to and remove-from waiters functions were rewritten to handle situations with more than one remote operation outstanding on a lock - validate_unlock_args detects when an overlapping cancel/unlock-force can be sent and when it needs to be delayed until a request/lookup reply is received - processing request/lookup replies detects when cancel/unlock-force occured during the op, and carries out the delayed cancel/unlock-force - manipulation of the "waiters" (remote operation) state of a lock moved under the standard rsb mutex that protects all the other lock state - the two recovery routines related to locks on the waiters list changed according to the way lkb's are now locked before accessing waiters state - waiters recovery detects when lkb's being recovered have overlapping cancel/unlock-force, and may not recover such locks - revert_lock (cancel) returns a value to distinguish cases where it did nothing vs cases where it actually did a cancel; the cancel completion ast should only be done when cancel did something - orphaned locks put on new list so they can be found later for purging - cancel must be called on a lock when making it an orphan - flag user locks (ENDOFLIFE) at the end of their useful life (to the application) so we can return an error for any further cancel/unlock-force - we weren't setting COMP/BAST ast flags if one was already set, so we'd lose either a completion or blocking ast - clear an unread bast on a lock that's become unlocked Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 06 2月, 2007 9 次提交
-
-
由 David Teigland 提交于
A new lvb for a userland lock wasn't being initialized to zero. Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 David Teigland 提交于
A long, complicated sequence of events, beginning with the RESEND flag not being cleared on an lkb, can result in an unlock never completing. - lkb on waiters list for remote lookup - the remote node is both the dir node and the master node, so it optimizes the lookup into a request and sends a request reply back - the request reply is saved on the requestqueue to be processed after recovery - recovery runs dlm_recover_waiters_pre() which sets RESEND flag so the lookup will be resent after recovery - end of recovery: process_requestqueue takes saved request reply which removes the lkb off the waitesr list, _without_ clearing the RESEND flag - end of recovery: dlm_recover_waiters_post() doesn't do anything with the now completed lookup lkb (would usually clear RESEND) - later, the node unmounts, unlocks this lkb that still has RESEND flag set - the lkb is on the waiters list again, now for unlock, when recovery occurs, dlm_recover_waiters_pre() shows the lkb for unlock with RESEND set, doesn't do anything since the master still exists - end of recovery: dlm_recover_waiters_post() takes this lkb off the waiters list because it has the RESEND flag set, then reports an error because unlocks are never supposed to be handled in recover_waiters_post(). - later, the unlock reply is received, doesn't find the lkb on the waiters list because recover_waiters_post() has wrongly removed it. - the unlock operation has been lost, and we're left with a stray granted lock - unmount spins waiting for the unlock to complete The visible evidence of this problem will be a node where gfs umount is spinning, the dlm waiters list will be empty, and the dlm locks list will show a granted lock. The fix is simply to clear the RESEND flag when taking an lkb off the waiters list. Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 David Teigland 提交于
dlm_receive_message() returns 0 instead of returning 'error'. What would happen is that process_requestqueue would take a saved message off the requestqueue and call receive_message on it. receive_message would then see that recovery had been aborted, set error to EINTR, and 'goto out', expecting that the error would be returned. Instead, 0 was always returned, so process_requestqueue would think that the message had been processed and delete it instead of saving it to process next time. This means the message (usually an unlock in my tests) would be lost. Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 David Teigland 提交于
When a user process exits, we clear all the locks it holds. There is a problem, though, with locks that the process had begun unlocking before it exited. We couldn't find the lkb's that were in the process of being unlocked remotely, to flag that they are DEAD. To solve this, we move lkb's being unlocked onto a new list in the per-process structure that tracks what locks the process is holding. We can then go through this list to flag the necessary lkb's when clearing locks for a process when it exits. Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 David Teigland 提交于
Add a "ci_" prefix to the fields in the dlm_config_info struct so that we can use macros to add configfs functions to access them (in a later patch). No functional changes in this patch, just naming changes. Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 David Teigland 提交于
When the dlm fakes an unlock/cancel reply from a failed node using a stub message struct, it wasn't setting the flags in the stub message. So, in the process of receiving the fake message the lkb flags would be updated and cleared from the zero flags in the message. The problem observed in tests was the loss of the USER flag which caused the dlm to think a user lock was a kernel lock and subsequently fail an assertion checking the validity of the ast/callback field. Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 David Teigland 提交于
LVB's are not sent as part of new requests, but the code receiving the request was copying data into the lvb anyway. The space in the message where it mistakenly thought the lvb lived actually contained the resource name, so it wound up incorrectly copying this name data into the lvb. Fix is to just create the lvb, not copy junk into it. Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 David Teigland 提交于
The send_args() function is used to copy parameters into a message for a number different message types. Only some of those types are set up beforehand (in create_message) to include space for sending lvb data. send_args was wrongly copying the lvb for all message types as long as the lock had an lvb. This means that the lvb data was being written past the end of the message into unknown space. Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 David Teigland 提交于
There's a chance the new master of resource hasn't learned it's the new master before another node sends it a lock during recovery. The node sending the lock needs to resend if this happens. - A sends a master lookup for resource R to C - B sends a master lookup for resource R to C - C receives A's lookup, assigns A to be master of R and sends a reply back to A - C receives B's lookup and sends a reply back to B saying that A is the master - B receives lookup reply from C and sends its lock for R to A - A receives lock from B, doesn't think it's the master of R and sends an error back to B - A receives lookup reply from C and becomes master of R - B gets error back from A and resends its lock back to A (this resending is what this patch does) - A receives lock from B, it now sees it's the master of R and takes the lock Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 30 11月, 2006 2 次提交
-
-
由 David Teigland 提交于
RH BZ 211622 The ALTMODE flag can be set in the lock master's copy of the lock but never cleared, so ALTMODE will also be returned in a subsequent conversion of the lock when it shouldn't be. This results in lock_dlm incorrectly switching to the alternate lock mode when returning the result to gfs which then asserts when it sees the wrong lock state. The fix is to propagate the cleared sbflags value to the master node when the lock is requested. QA's d_rwrandirectlarge test triggers this bug very quickly. Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 David Teigland 提交于
Red Hat BZ 211914 There's a race between dlm_recoverd (1) enabling locking and (2) clearing out the requestqueue, and dlm_recvd (1) checking if locking is enabled and (2) adding a message to the requestqueue. An order of recoverd(1), recvd(1), recvd(2), recoverd(2) will result in a message being left on the requestqueue. The fix is to have dlm_recvd check if dlm_recoverd has enabled locking after taking the mutex for the requestqueue and if it has processing the message instead of queueing it. Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 25 9月, 2006 1 次提交
-
-
由 Steven Whitehouse 提交于
As per Andrew Morton's request, removed trailing whitespace. Cc: Andrew Morton <akpm@osdl.org> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 09 9月, 2006 1 次提交
-
-
由 David Teigland 提交于
Fixing the following scenario: - A request is on the waiters list waiting for a reply from a remote node. - The request is the first one on the resource, so first_lkid is set. - The remote node fails causing recovery. - During recovery the requesting node becomes master. - The request is now processed locally instead of being a remote operation. - At this point we need to call confirm_master() on the resource since we're certain we're now the master node. This will clear first_lkid. - We weren't calling confirm_master(), so first_lkid was not being cleared causing subsequent requests on that resource to get stuck. Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 24 8月, 2006 1 次提交
-
-
由 David Teigland 提交于
The down-conversion optimization was resulting in the lkb flags being cleared because the stub message reply had no flags value set. Copy the current flags into the stub message so they'll be copied back into the lkb as part of processing the fake reply. Also add an assertion to catch this error more directly if it exists elsewhere. Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 23 8月, 2006 2 次提交
-
-
由 Patrick Caulfield 提交于
Oh, and here's (hopefully) the last of these ua_tmp patches. I think I've caught all the paths now. Sorry it didn't make the last one. Signed-off-by: NPatrick Caulfield <pcaulfie@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Patrick Caulfield 提交于
This patch fixes bz#203444 where the LKSB was lost during userland conversion operations Signed-off-by: NPatrick Caulfield <pcaulfie@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 21 8月, 2006 1 次提交
-
-
由 David Teigland 提交于
Introduce new function dlm_dump_rsb() to call within assertions instead of dlm_print_rsb(). The new function dumps info about all locks on the rsb in addition to rsb details. Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 08 8月, 2006 1 次提交
-
-
由 Patrick Caulfield 提交于
This patch fixes the userland DLM unlock code so that it correctly returns the address of the userland lock status block in its completion AST. It fixes bug #201348 Patrick Signed-Off-By: NPatrick Caulfield <pcaulfie@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 26 7月, 2006 2 次提交
-
-
由 David Teigland 提交于
The loop through all waiting locks in recover_waiters can potentially be long, so we should schedule explicitly. Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 David Teigland 提交于
The loop in grant_after_purge is intended to find all rsb's in each hash bucket that have the LOCKS_PURGED flag set. The loop was quitting the current bucket after finding just one rsb instead of going until there are no more. Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 20 7月, 2006 1 次提交
-
-
由 David Teigland 提交于
User NOQUEUE lock requests to a remote node that failed with -EAGAIN were never being removed from a process's list of locks. Signed-off-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-