1. 09 Jul, 2007 4 commits
  2. 08 Jul, 2007 1 commit
    • DLM must depend on SYSFS · 95511ad4
      Authored by Adrian Bunk
      The dependency of DLM on SYSFS got lost in
      commit 6ed7257b resulting in the
      following compile error with CONFIG_DLM=y, CONFIG_SYSFS=n:
      
      <--  snip  -->
      
      ...
        LD      .tmp_vmlinux1
      fs/built-in.o: In function `dlm_lockspace_init':
      /home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/fs/dlm/lockspace.c:231: undefined reference to `kernel_subsys'
      fs/built-in.o: In function `configfs_init':
      /home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/fs/configfs/mount.c:143: undefined reference to `kernel_subsys'
      make[1]: *** [.tmp_vmlinux1] Error 1
      
      <--  snip  -->
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 03 May, 2007 1 commit
  4. 01 May, 2007 15 commits
    • [DLM] lowcomms style · 617e82e1
      Authored by David Teigland
      Replace some printk with log_print, and fix some simple cases of lines
      over 80.  Also, return -ENOTCONN if lowcomms_start fails due to no local
      IP address being available.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] Lowcomms nodeid range & initialisation fixes · 30d3a237
      Authored by Patrick Caulfield
      Fix a few range & initialization bugs in lowcomms.
      - max_nodeid is really the highest nodeid encountered, so all loops must include
      it in their iterations.
      - clean dlm_local_count & connection_idr so we can do a clean restart.
      - Remove a spurious BUG_ON
      Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
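The first point of the nodeid range fix can be illustrated with a minimal Python sketch. It assumes connections live in a table keyed by nodeid; `visit_connections` and `connections` are hypothetical names, not taken from fs/dlm. Because max_nodeid is the highest nodeid seen rather than a count, the loop bound has to be inclusive.

```python
def visit_connections(connections, max_nodeid):
    """Return the nodeids visited when walking the connection table.

    `connections` maps nodeid -> connection object (hypothetical layout).
    max_nodeid is the highest nodeid encountered, so the range is inclusive.
    """
    visited = []
    # An exclusive bound, range(0, max_nodeid), would skip the last node.
    for nodeid in range(0, max_nodeid + 1):
        if nodeid in connections:
            visited.append(nodeid)
    return visited


connections = {1: "conn-1", 3: "conn-3"}
print(visit_connections(connections, max_nodeid=3))
```

With an exclusive bound the connection for nodeid 3 would never be visited, which is the class of bug the commit message describes.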
    • [DLM] Fix dlm_lowcoms_stop hang · 2439fe50
      Authored by Josef Bacik
      When you attempt to release a lockspace in DLM, it will hang trying to down a
      semaphore that has already been downed.  The attached patch fixes the problem.
      Signed-off-by: Josef Bacik <jwhiter@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Patrick Caulfield <pcaulfie@redhat.com>
    • [DLM] fix mode munging · 7d3c1feb
      Authored by David Teigland
      There are flags to enable two specialized features in the dlm:
      1. CONVDEADLK causes the dlm to resolve conversion deadlocks internally by
         changing the granted mode of locks to NL.
      2. ALTPR/ALTCW cause the dlm to change the requested mode of locks to PR
         or CW to grant them if the normal requested mode can't be granted.
      
      GFS direct i/o exercises both of these features, especially when mixed
      with buffered i/o.  The dlm has problems with them.
      
      The first problem is on the master node. If it demotes a lock as a part of
      converting it, the actual step of converting the lock isn't being done
      after the demotion, the lock is just left sitting on the granted queue
      with a granted mode of NL.  I think the mistaken assumption was that the
      call to grant_pending_locks() would grant it, but that function naturally
      doesn't look at locks on the granted queue.
      
      The second problem is on the process node.  If the master either demotes
      or gives an altmode, the munging of the gr/rq modes is never done in the
      process copy of the lock, leaving the master/process copies out of sync.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] fs/dlm/ast.c should #include "ast.h" · 8fa1de38
      Authored by Adrian Bunk
      Every file should include the headers containing the prototypes for
      its global functions.
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] Consolidate transport protocols · 6ed7257b
      Authored by Patrick Caulfield
      This patch consolidates the TCP & SCTP protocols for the DLM into a single file
      and makes it switchable at run-time (well, at least before the DLM actually
      starts up!)
      
      For RHEL5 this patch requires Neil Horman's patch that expands the in-kernel
      socket API but that has already been twice ACKed so it should be OK.
      
      The patch adds a new lowcomms.c file that replaces the existing lowcomms-sctp.c
      & lowcomms-tcp.c files.
      Signed-off-By: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] Remove redundant assignment · fc7c44f0
      Authored by Patrick Caulfield
      This patch removes a redundant (and incorrect) assignment from compat_output
      Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] change lkid format · ce03f12b
      Authored by David Teigland
      A lock id is a uint32 and is used as an opaque reference to the lock.  For
      userland apps, the lkid is passed up, through libdlm, as the return value
      from a write() on the dlm device.  This created a problem when the high
      bit was 1, making the lkid look like an error.  This is fixed by changing
      how the lkid is composed.  The low 16 bits identified the hash bucket for
      the lock and the high 16 bits were a per-bucket counter (which eventually
      hit 0x8000 causing the problem).  These are simply swapped around; the
      number of hash table buckets is far below 0x8000, making all lkid's
      positive when viewed as signed.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
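The lkid change described above can be sketched in Python as follows. This is an illustration of the bit layout from the commit message only, not the kernel code; the function names `old_lkid`, `new_lkid`, and `as_signed32` are hypothetical, and the example bucket and counter values are made up.

```python
def old_lkid(bucket, counter):
    # Old layout: per-bucket counter in the high 16 bits, hash bucket in the low 16.
    return ((counter & 0xFFFF) << 16) | (bucket & 0xFFFF)


def new_lkid(bucket, counter):
    # New layout: the two halves are swapped, so the small bucket number sits on top.
    return ((bucket & 0xFFFF) << 16) | (counter & 0xFFFF)


def as_signed32(value):
    # Interpret a uint32 the way a signed 32-bit return value would see it.
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


bucket, counter = 5, 0x8000  # counter has just reached the problematic value
print(hex(old_lkid(bucket, counter)), as_signed32(old_lkid(bucket, counter)))
print(hex(new_lkid(bucket, counter)), as_signed32(new_lkid(bucket, counter)))
```

In the old layout a counter of 0x8000 sets the top bit, so the lkid reads as negative and is mistaken for an error when returned from write(). In the new layout the top bit can only be set if the bucket number reaches 0x8000, which the commit message says is far above the real number of hash buckets.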
    • [DLM] interface for purge (2/2) · 72c2be77
      Authored by David Teigland
      Add code to accept purge commands from userland.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] add orphan purging code (1/2) · 8499137d
      Authored by David Teigland
      Add code for purging orphan locks.  A process can also purge all of its
      own non-orphan locks by passing a pid of zero.  Code already exists for
      processes to create persistent locks that become orphans when the process
      exits, but the complementary capability for another process to then purge
      these orphans has been missing.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] split create_message function · 7e4dac33
      Authored by David Teigland
      This splits the current create_message() function into two parts so that
      later patches can call the new lower-level _create_message() function when
      they don't have an rsb struct.  No functional change in this patch.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] overlapping cancel and unlock · ef0c2bb0
      Authored by David Teigland
      Full cancel and force-unlock support.  In the past, cancel and force-unlock
      wouldn't work if there was another operation in progress on the lock.  Now,
      both cancel and unlock-force can overlap an operation on a lock, meaning there
      may be 2 or 3 operations in progress on a lock in parallel.  This support is
      important not only because cancel and force-unlock are explicit operations
      that an app can use, but both are used implicitly when a process exits while
      holding locks.
      
      Summary of changes:
      
      - add-to and remove-from waiters functions were rewritten to handle situations
        with more than one remote operation outstanding on a lock
      
      - validate_unlock_args detects when an overlapping cancel/unlock-force
        can be sent and when it needs to be delayed until a request/lookup
        reply is received
      
      - processing request/lookup replies detects when cancel/unlock-force
        occurred during the op, and carries out the delayed cancel/unlock-force
      
      - manipulation of the "waiters" (remote operation) state of a lock moved under
        the standard rsb mutex that protects all the other lock state
      
      - the two recovery routines related to locks on the waiters list changed
        according to the way lkb's are now locked before accessing waiters state
      
      - waiters recovery detects when lkb's being recovered have overlapping
        cancel/unlock-force, and may not recover such locks
      
      - revert_lock (cancel) returns a value to distinguish cases where it did
        nothing vs cases where it actually did a cancel; the cancel completion ast
        should only be done when cancel did something
      
      - orphaned locks put on new list so they can be found later for purging
      
      - cancel must be called on a lock when making it an orphan
      
      - flag user locks (ENDOFLIFE) at the end of their useful life (to the
        application) so we can return an error for any further cancel/unlock-force
      
      - we weren't setting COMP/BAST ast flags if one was already set, so we'd lose
        either a completion or blocking ast
      
      - clear an unread bast on a lock that's become unlocked
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
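The COMP/BAST item in the summary above (a second ast flag not being set when one was already pending) can be sketched in Python. The flag values and both function names are assumptions for illustration, and `queue_ast_lossy` is only a guess at the general shape of the bug, not the original kernel code.

```python
AST_COMP = 0x1  # completion ast pending (value assumed for this sketch)
AST_BAST = 0x2  # blocking ast pending (value assumed for this sketch)


def queue_ast_lossy(pending, ast_type):
    # Illustrative buggy shape: the new type is recorded only if nothing is pending,
    # so a second ast arriving before delivery is silently dropped.
    if pending == 0:
        pending = ast_type
    return pending


def queue_ast(pending, ast_type):
    # Fixed shape: always OR the new type in, so both asts stay pending.
    return pending | ast_type


print(queue_ast_lossy(AST_COMP, AST_BAST))  # the blocking ast is lost
print(queue_ast(AST_COMP, AST_BAST))        # both flags are kept
```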
    • [DLM] fix coverity-spotted stupidity · 03206727
      Authored by Patrick Caulfield
      Replacement patch to remove redundant code rather than moving it around.
      Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] Don't delete misc device if lockspace removal fails · 254da030
      Authored by Patrick Caulfield
      Currently if the lockspace removal fails the misc device associated with a
      lockspace is left deleted. After that there is no way to access the orphaned
      lockspace from userland.
      
      This patch recreates the misc device if the dlm_release_lockspace fails. I
      believe this is better than attempting to remove the lockspace first because
      that leaves an unattached device lying around. The potential gap in which there
      is no access to the lockspace between removing the misc device and recreating it
      is acceptable ... after all the application is trying to remove it, and only new
      users of the lockspace will be affected.
      Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] Fix uninitialised variable in receiving · 89adc934
      Authored by Patrick Caulfield
      The length of the second element of the kvec array was not initialised before
      being added to the first one. This could cause invalid lengths to be passed to
      kernel_recvmsg
      Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  5. 08 Mar, 2007 1 commit
  6. 13 Feb, 2007 1 commit
  7. 12 Feb, 2007 1 commit
  8. 10 Feb, 2007 1 commit
  9. 06 Feb, 2007 15 commits
    • [DLM] fix softlockup in dlm_recv · a34fbc63
      Authored by Patrick Caulfield
      This patch stops the dlm_recv workqueue from busy-waiting when a node
      disconnects. This can cause soft lockup errors on debug systems and bad
      performance generally.
      Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] zero new user lvbs · 62a0f623
      Authored by David Teigland
      A new lvb for a userland lock wasn't being initialized to zero.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM/GFS2] indent help text · 9beeb9f3
      Authored by Randy Dunlap
      Indent help text as expected.
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2/DLM] fix GFS2 circular dependency · 00117277
      Authored by Adrian Bunk
      On Sun, Jan 28, 2007 at 11:08:18AM +0100, Jiri Slaby wrote:
      > Andrew Morton napsal(a):
      > >Temporarily at
      > >
      > >	http://userweb.kernel.org/~akpm/2.6.20-rc6-mm1/
      >
      > Unable to select IPV6. Menuconfig doesn't offer it when INET is selected.
      > When it's not it appears in the menu, but after state change it gets away.
      > The same behaviour in xconfig, gconfig.
      >
      > $ mkdir ../a/tst
      > $ make O=../a/tst menuconfig
      >   HOSTCC  scripts/basic/fixdep
      > [...]
      >   HOSTLD  scripts/kconfig/mconf
      > scripts/kconfig/mconf arch/i386/Kconfig
      > Warning! Found recursive dependency: INET GFS2_FS_LOCKING_DLM SYSFS
      > OCFS2_FS INET
      >
      > Maybe this is the problem?
      
      Yes, patch below.
      
      > regards,
      
      cu
      Adrian
      
      <--  snip  -->
      
      This patch fixes a circular dependency by letting GFS2_FS_LOCKING_DLM
      and DLM depend on instead of select SYSFS.
      
      Since SYSFS depends on EMBEDDED this change shouldn't cause any problems
      for users.
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2/DLM] use sysfs · 67f55897
      Authored by Randy Dunlap
      With CONFIG_DLM=m, CONFIG_PROC_FS=n, and CONFIG_SYSFS=n, kernel build
      fails with:
      
      WARNING: "kernel_subsys" [fs/gfs2/locking/dlm/lock_dlm.ko] undefined!
      WARNING: "kernel_subsys" [fs/dlm/dlm.ko] undefined!
      WARNING: "kernel_subsys" [fs/configfs/configfs.ko] undefined!
      make[1]: *** [__modpost] Error 1
      make: *** [modules] Error 2
      
      Since fs/dlm/lockspace.c and fs/gfs2/locking/dlm/sysfs.c use
      kernel_subsys, they should either DEPEND on it or SELECT it.
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] can miss clearing resend flag · b790c3b7
      Authored by David Teigland
      A long, complicated sequence of events, beginning with the RESEND flag not
      being cleared on an lkb, can result in an unlock never completing.
      
      - lkb on waiters list for remote lookup
      - the remote node is both the dir node and the master node, so
        it optimizes the lookup into a request and sends a request
        reply back
      - the request reply is saved on the requestqueue to be processed
        after recovery
      - recovery runs dlm_recover_waiters_pre() which sets RESEND flag
        so the lookup will be resent after recovery
      - end of recovery: process_requestqueue takes saved request reply
        which removes the lkb off the waiters list, _without_ clearing
        the RESEND flag
      - end of recovery: dlm_recover_waiters_post() doesn't do anything
        with the now completed lookup lkb (would usually clear RESEND)
      - later, the node unmounts, unlocks this lkb that still has RESEND
        flag set
      - the lkb is on the waiters list again, now for unlock, when recovery
        occurs, dlm_recover_waiters_pre() shows the lkb for unlock with RESEND
        set, doesn't do anything since the master still exists
      - end of recovery: dlm_recover_waiters_post() takes this lkb off
        the waiters list because it has the RESEND flag set, then reports
        an error because unlocks are never supposed to be handled in
        recover_waiters_post().
      - later, the unlock reply is received, doesn't find the lkb on
        the waiters list because recover_waiters_post() has wrongly
        removed it.
      - the unlock operation has been lost, and we're left with a
        stray granted lock
      - unmount spins waiting for the unlock to complete
      
      The visible evidence of this problem will be a node where gfs umount is
      spinning, the dlm waiters list will be empty, and the dlm locks list will
      show a granted lock.
      
      The fix is simply to clear the RESEND flag when taking an lkb off the
      waiters list.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
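The fix stated at the end of this commit message (clear RESEND whenever an lkb leaves the waiters list) can be sketched in Python. The `Lkb` class, the `RESEND` bit value, and `remove_from_waiters` are hypothetical stand-ins chosen for this illustration; the kernel structures differ.

```python
RESEND = 0x1  # bit value assumed for this sketch


class Lkb:
    """Minimal stand-in for a lock block: an id plus a flags word."""

    def __init__(self, lkid):
        self.lkid = lkid
        self.flags = 0


def remove_from_waiters(waiters, lkb):
    # Take the lkb off the waiters list and clear RESEND in the same step,
    # so a stale flag cannot survive into a later, unrelated operation.
    waiters.remove(lkb)
    lkb.flags &= ~RESEND


lkb = Lkb(lkid=0x10001)
waiters = [lkb]
lkb.flags |= RESEND            # set during recovery, as in dlm_recover_waiters_pre()
remove_from_waiters(waiters, lkb)
print(waiters, bool(lkb.flags & RESEND))
```

Tying the flag reset to list removal means no caller has to remember to do it, which is the path the saved request reply took in the sequence above.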
    • [DLM] saved dlm message can be dropped · 8fd3a98f
      Authored by David Teigland
      dlm_receive_message() returns 0 instead of returning 'error'.  What would
      happen is that process_requestqueue would take a saved message off the
      requestqueue and call receive_message on it.  receive_message would then
      see that recovery had been aborted, set error to EINTR, and 'goto out',
      expecting that the error would be returned.  Instead, 0 was always
      returned, so process_requestqueue would think that the message had been
      processed and delete it instead of saving it to process next time.  This
      means the message (usually an unlock in my tests) would be lost.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
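The failure described above can be modelled with a small Python sketch. It is a simplified model, not the kernel code: the function bodies, the `return_error` switch used to toggle the bug, and the use of a deque for the requestqueue are all assumptions made for illustration.

```python
import errno
from collections import deque


def receive_message(msg, recovery_aborted, return_error=True):
    # Model of the receive path: notice the aborted recovery and set an error.
    error = 0
    if recovery_aborted:
        error = -errno.EINTR
    # return_error=False models the bug: 0 is returned regardless of `error`.
    return error if return_error else 0


def process_requestqueue(queue, recovery_aborted, return_error=True):
    # Saved messages stay queued unless receive_message reports success.
    while queue:
        error = receive_message(queue[0], recovery_aborted, return_error)
        if error == -errno.EINTR:
            break              # keep the message for the next recovery pass
        queue.popleft()        # treated as processed and deleted
    return list(queue)


print(process_requestqueue(deque(["unlock"]), True, return_error=False))  # message lost
print(process_requestqueue(deque(["unlock"]), True, return_error=True))   # message kept
```

Once the error is propagated, the caller can tell "processed" apart from "interrupted" and leaves the saved unlock on the queue.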
    • [DLM] Make sock_sem into a mutex · f1f1c1cc
      Authored by Patrick Caulfield
      Now that there can be multiple dlm_recv threads running we need to prevent two
      recvs running for the same connection - it's unlikely but it can happen and it
      causes message corruption.
      Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] fix lowcomms receiving · bd44e2b0
      Authored by Patrick Caulfield
      This patch fixes a bug whereby data on a newly accepted connection would be
      ignored if it arrived soon after the accept.
      Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] lowcomms tidy · f2f5095f
      Authored by Patrick Caulfield
      This patch removes some redundant fields from the connection structure and adds
      some lockdep annotation to remove spurious warnings.
      Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] fix master recovery · 222d3960
      Authored by David Teigland
      If master recovery happens on an rsb in one recovery sequence, then that
      sequence is aborted before lock recovery happens, then in the next
      sequence, we rely on the previous master recovery (which may now be
      invalid due to another node ignoring a lookup result) and go on to do the
      lock recovery where we get stuck due to an invalid master value.
      
       recovery cycle begins: master of rsb X has left
       nodes A and B send node C an rcom lookup for X to find the new master
       C gets lookup from B first, sets B as new master, and sends reply back to B
       C gets lookup from A next, and sends reply back to A saying B is master
       A gets lookup reply from C and sets B as the new master in the rsb
       recovery cycle on A, B and C is aborted to start a new recovery
       B gets lookup reply from C and ignores it since there's a new recovery
       recovery cycle begins: some other node has joined
       B doesn't think it's the master of X so it doesn't rebuild it in the directory
       C looks up the master of X, no one is master, so it becomes new master
       B looks up the master of X, finds it's C
       A believes that B is the master of X, so it sends its lock to B
       B sends an error back to A
       A resends
       this repeats forever, the incorrect master value on A is never corrected
      
      The fix is to do master recovery on an rsb that still has the NEW_MASTER
      flag set from an earlier recovery sequence, and therefore didn't complete
      lock recovery.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] fix user unlocking · a1bc86e6
      Authored by David Teigland
      When a user process exits, we clear all the locks it holds.  There is a
      problem, though, with locks that the process had begun unlocking before it
      exited.  We couldn't find the lkb's that were in the process of being
      unlocked remotely, to flag that they are DEAD.  To solve this, we move
      lkb's being unlocked onto a new list in the per-process structure that
      tracks what locks the process is holding.  We can then go through this
      list to flag the necessary lkb's when clearing locks for a process when it
      exits.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] Use workqueues for dlm lowcomms · 1d6e8131
      Authored by Patrick Caulfield
      This patch converts the DLM TCP lowcomms to use workqueues rather than using its
      own daemon functions. Simultaneously removing a lot of code and making it more
      scalable on multi-processor machines.
      Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] expose dlm_config_info fields in configfs · d200778e
      Authored by David Teigland
      Make the dlm_config_info values readable and writeable via configfs
      entries.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] add config entry to enable log_debug · 99fc6487
      Authored by David Teigland
      Add a new dlm_config_info field to enable log_debug output and change
      log_debug() to use it.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>