1. 08 Feb, 2007 6 commits
    • sysfs: Shadow directory support · b592fcfe
      Eric W. Biederman committed
      The problem.  When implementing a network namespace I need to be able
      to have multiple network devices with the same name.  Currently this
      is a problem for /sys/class/net/*. 
      
      What I want is a separate /sys/class/net directory in sysfs for each
      network namespace, and I want to name each of them /sys/class/net.
      
      I looked and the VFS actually allows that.  All that is needed is
      for /sys/class/net to implement a follow link method to redirect
      lookups to the real directory you want. 
      
      Implementing a follow link method that is sensitive to the current
      network namespace turns out to be 3 lines of code so it looks like a
      clean approach.  Modifying sysfs so it doesn't get in my way is a bit
      trickier. 
      
      I am calling the concept of multiple directories all at the same path
      in the filesystem shadow directories, with the directory entry really
      at that location being the shadow master.
      
      The following patch modifies sysfs so it can handle a directory
      structure slightly different from the kobject tree so I can implement
      the shadow directories for handling /sys/class/net/.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Maneesh Soni <maneesh@in.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
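The namespace-sensitive follow-link idea described in this commit can be pictured with a small user-space model. This is only a sketch under stated assumptions: `demo_shadow_dir`, `demo_follow_shadow`, and the integer namespace ids are hypothetical names invented for illustration and are not the sysfs or VFS interfaces the patch touches.

```c
#include <stddef.h>

/* Hypothetical model: one shadow directory per network namespace. */
struct demo_shadow_dir {
    int ns_id;                 /* namespace this directory belongs to */
    const char *backing_name;  /* illustrative label for the real directory */
};

/*
 * Stand-in for a namespace-aware follow-link step: pick the shadow
 * directory that matches the caller's namespace, or NULL if none exists.
 */
static const struct demo_shadow_dir *
demo_follow_shadow(const struct demo_shadow_dir *dirs, size_t n, int current_ns)
{
    for (size_t i = 0; i < n; i++)
        if (dirs[i].ns_id == current_ns)
            return &dirs[i];
    return NULL;
}
```

Every caller looks up the same path, and the redirect step alone decides which real directory is reached, which is why the lookup can stay small.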
    • sysfs: error handling in sysfs, fill_read_buffer() · 82244b16
      Oliver Neukum committed
      If a driver returns an error in fill_read_buffer(), the buffer will be
      marked as filled. Subsequent reads will return EOF. But there is
      no data because of an error, not because it has been read.
      Not marking the buffer filled is the obvious fix.
      Signed-off-by: Oliver Neukum <oliver@neukum.name>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
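The fix described above can be sketched in user space as follows. The struct, the callback type, and every `demo_` name are assumptions made for illustration; the kernel's real `fill_read_buffer()` has a different signature and operates on sysfs internals.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for a sysfs read buffer; not the kernel's struct. */
struct demo_buffer {
    char page[64];
    size_t count;        /* bytes of valid data in page */
    int needs_read_fill; /* 1 = page must be (re)filled before the next read */
};

/* Hypothetical driver callback: returns bytes written or a negative error. */
typedef ssize_t (*demo_show_fn)(char *page, size_t size);

/* Fill the buffer, marking it filled only when the callback succeeded. */
static ssize_t demo_fill_read_buffer(struct demo_buffer *buf, demo_show_fn show)
{
    ssize_t ret = show(buf->page, sizeof(buf->page));

    if (ret >= 0) {
        buf->count = (size_t)ret;
        buf->needs_read_fill = 0; /* success: later reads may reach EOF */
    }
    /* On error needs_read_fill stays set, so the next read retries. */
    return ret < 0 ? ret : 0;
}

/* Example callback that succeeds and writes three bytes. */
static ssize_t demo_show_ok(char *page, size_t size)
{
    const char msg[] = "42\n";

    if (size < sizeof(msg))
        return -1;
    memcpy(page, msg, sizeof(msg) - 1);
    return (ssize_t)(sizeof(msg) - 1);
}

/* Example callback that always fails; -5 stands in for an I/O error. */
static ssize_t demo_show_fail(char *page, size_t size)
{
    (void)page;
    (void)size;
    return -5;
}
```

With the flag left untouched on failure, a reader that retries gets a fresh call into the driver instead of a misleading end-of-file.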
    • sysfs: kobject_put cleanup · f7506536
      Mariusz Kozlowski committed
      This patch removes redundant argument checks for kobject_put().
      Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • sysfs: suppress lockdep warnings · d3fc373a
      Frederik Deweerdt committed
      Lockdep issues the following warning:
      [    9.064000] =============================================
      [    9.064000] [ INFO: possible recursive locking detected ]
      [    9.064000] 2.6.20-rc3-mm1 #3
      [    9.064000] ---------------------------------------------
      [    9.064000] init/1 is trying to acquire lock:
      [    9.064000]  (&sysfs_inode_imutex_key){--..}, at: [<c03e6afc>] mutex_lock+0x1c/0x1f
      [    9.064000]
      [    9.064000] but task is already holding lock:
      [    9.064000]  (&sysfs_inode_imutex_key){--..}, at: [<c03e6afc>] mutex_lock+0x1c/0x1f
      [    9.065000]
      [    9.065000] other info that might help us debug this:
      [    9.065000] 2 locks held by init/1:
      [    9.065000]  #0:  (tty_mutex){--..}, at: [<c03e6afc>] mutex_lock+0x1c/0x1f
      [    9.065000]  #1:  (&sysfs_inode_imutex_key){--..}, at: [<c03e6afc>] mutex_lock+0x1c/0x1f
      [    9.065000]
      [    9.065000] stack backtrace:
      [    9.065000]  [<c010390d>] show_trace_log_lvl+0x1a/0x30
      [    9.066000]  [<c0103935>] show_trace+0x12/0x14
      [    9.066000]  [<c0103a2f>] dump_stack+0x16/0x18
      [    9.066000]  [<c0138cb8>] print_deadlock_bug+0xb9/0xc3
      [    9.066000]  [<c0138d17>] check_deadlock+0x55/0x5a
      [    9.066000]  [<c013a953>] __lock_acquire+0x371/0xbf0
      [    9.066000]  [<c013b7a9>] lock_acquire+0x69/0x83
      [    9.066000]  [<c03e6b7e>] __mutex_lock_slowpath+0x75/0x2d1
      [    9.066000]  [<c03e6afc>] mutex_lock+0x1c/0x1f
      [    9.066000]  [<c01b249c>] sysfs_drop_dentry+0xb1/0x133
      [    9.066000]  [<c01b25d1>] sysfs_hash_and_remove+0xb3/0x142
      [    9.066000]  [<c01b30ed>] sysfs_remove_file+0xd/0x10
      [    9.067000]  [<c02849e0>] device_remove_file+0x23/0x2e
      [    9.067000]  [<c02850b2>] device_del+0x188/0x1e6
      [    9.067000]  [<c028511b>] device_unregister+0xb/0x15
      [    9.067000]  [<c0285318>] device_destroy+0x9c/0xa9
      [    9.067000]  [<c0261431>] vcs_remove_sysfs+0x1c/0x3b
      [    9.067000]  [<c0267a08>] con_close+0x5e/0x6b
      [    9.067000]  [<c02598f2>] release_dev+0x4c4/0x6e5
      [    9.067000]  [<c0259faa>] tty_release+0x12/0x1c
      [    9.067000]  [<c0174872>] __fput+0x177/0x1a0
      [    9.067000]  [<c01746f5>] fput+0x3b/0x41
      [    9.068000]  [<c0172ee1>] filp_close+0x36/0x65
      [    9.068000]  [<c0172f73>] sys_close+0x63/0xa4
      [    9.068000]  [<c0102a96>] sysenter_past_esp+0x5f/0x99
      [    9.068000]  =======================
      
      This is due to sysfs_hash_and_remove() holding dir->d_inode->i_mutex
      before calling sysfs_drop_dentry() which calls orphan_all_buffers()
      which in turn takes node->i_mutex.
      Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com>
      Cc: Oliver Neukum <oliver@neukum.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • Driver core: fix race in sysfs between sysfs_remove_file() and read()/write() · 94bebf4d
      Oliver Neukum committed
      This patch prevents a race between IO and removing a file from sysfs.
      It introduces a list of sysfs_buffers associated with a file at the inode.
      Upon removal of a file the list is walked and the buffers marked orphaned.
      IO to orphaned buffers fails with -ENODEV. The driver can safely free
      associated data structures or be unloaded.
      Signed-off-by: Oliver Neukum <oliver@neukum.name>
      Acked-by: Maneesh Soni <maneesh@in.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
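The orphaning scheme in this commit (a per-inode list of buffers, marked orphaned on removal, with later I/O failing with -ENODEV) can be modelled roughly like this. All `demo_` names are hypothetical, and the locking the real patch needs is deliberately left out of the sketch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of one open file's buffer. */
struct demo_sysfs_buffer {
    int orphaned;                    /* set once the file has been removed */
    struct demo_sysfs_buffer *next;  /* next buffer on the same inode */
};

/* Hypothetical model of the inode that owns the list of buffers. */
struct demo_inode {
    struct demo_sysfs_buffer *buffers;
};

/* Called at open time: link the new buffer into the inode's list. */
static void demo_add_buffer(struct demo_inode *inode,
                            struct demo_sysfs_buffer *buf)
{
    buf->orphaned = 0;
    buf->next = inode->buffers;
    inode->buffers = buf;
}

/* Called at file removal: walk the list and mark every buffer orphaned. */
static void demo_orphan_all_buffers(struct demo_inode *inode)
{
    for (struct demo_sysfs_buffer *b = inode->buffers; b; b = b->next)
        b->orphaned = 1;
}

/* I/O entry point: refuse to touch driver data behind an orphaned buffer. */
static int demo_read(const struct demo_sysfs_buffer *buf)
{
    if (buf->orphaned)
        return -ENODEV;
    return 0; /* a real read would call into the driver here */
}
```

Once every open buffer is marked, no later read or write reaches the driver, so the driver can free its data or unload.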
    • driver core: Allow device_move(dev, NULL). · c744aeae
      Cornelia Huck committed
      If we allow NULL as the new parent in device_move(), we need to make sure
      that the device is placed into the same place as it would if it was
      newly registered:
      
      - Consider the device virtual tree. In order to be able to reuse code,
        setup_parent() has been tweaked a bit.
      - kobject_move() can fall back to the kset's kobject.
      - sysfs_move_dir() uses the sysfs root dir as fallback.
      Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      Cc: Marcel Holtmann <marcel@holtmann.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
  2. 07 Feb, 2007 4 commits
  3. 06 Feb, 2007 30 commits
    • [DLM] fix softlockup in dlm_recv · a34fbc63
      Patrick Caulfield committed
      This patch stops the dlm_recv workqueue from busy-waiting when a node
      disconnects. This can cause soft lockup errors on debug systems and bad
      performance generally.
      Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] zero new user lvbs · 62a0f623
      David Teigland committed
      A new lvb for a userland lock wasn't being initialized to zero.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM/GFS2] indent help text · 9beeb9f3
      Randy Dunlap committed
      Indent help text as expected.
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Fix unlink deadlocks · ddee7608
      Russell Cattelan committed
      Move the glock acquisition to outside of the transactions.
      
      Lock ordering must be preserved in order to prevent ABBA
      deadlocks. The current gfs2_change_nlink code tries
      to grab the glock after having started a transaction, and thus while
      holding the log lock. This is inconsistent with other code paths in
      gfs that grab the resource group glock prior to starting
      a transaction.
      
      One problem with this fix is that the resource group
      lock is now always grabbed, even if the inode still has
      a ref count and cannot be marked for unlink.
      Signed-off-by: Russell Cattelan <cattelan@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Put back semaphore to avoid umount problem · 61be084e
      Steven Whitehouse committed
      Dave Teigland fixed this bug a while back, but I managed to mistakenly
      remove the semaphore during later development. It is required to avoid
      the list of inodes changing during an invalidate_inodes call. I have
      made it an rwsem since the read side will be taken frequently during
      normal filesystem operation. The write side will only be taken during
      umount of the file system.
      
      Also the bug only triggers when using the DLM lock manager and only then
      under certain conditions as it's timing related.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: David Teigland <teigland@redhat.com>
    • [GFS2] more CURRENT_TIME_SEC · bbb28ab7
      Eric Sandeen committed
      Whoops, quilt user error, missed this one in the previous patch.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2/DLM] fix GFS2 circular dependency · 00117277
      Adrian Bunk committed
      On Sun, Jan 28, 2007 at 11:08:18AM +0100, Jiri Slaby wrote:
      > Andrew Morton napsal(a):
      > >Temporarily at
      > >
      > >	http://userweb.kernel.org/~akpm/2.6.20-rc6-mm1/
      >
      > Unable to select IPV6. Menuconfig doesn't offer it when INET is selected.
      > When it's not it appears in the menu, but after state change it gets away.
      > The same behaviour in xconfig, gconfig.
      >
      > $ mkdir ../a/tst
      > $ make O=../a/tst menuconfig
      >   HOSTCC  scripts/basic/fixdep
      > [...]
      >   HOSTLD  scripts/kconfig/mconf
      > scripts/kconfig/mconf arch/i386/Kconfig
      > Warning! Found recursive dependency: INET GFS2_FS_LOCKING_DLM SYSFS
      > OCFS2_FS INET
      >
      > Maybe this is the problem?
      
      Yes, patch below.
      
      > regards,
      
      cu
      Adrian
      
      <--  snip  -->
      
      This patch fixes a circular dependency by letting GFS2_FS_LOCKING_DLM
      and DLM depend on instead of select SYSFS.
      
      Since SYSFS depends on EMBEDDED this change shouldn't cause any problems
      for users.
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2/DLM] use sysfs · 67f55897
      Randy Dunlap committed
      With CONFIG_DLM=m, CONFIG_PROC_FS=n, and CONFIG_SYSFS=n, kernel build
      fails with:
      
      WARNING: "kernel_subsys" [fs/gfs2/locking/dlm/lock_dlm.ko] undefined!
      WARNING: "kernel_subsys" [fs/dlm/dlm.ko] undefined!
      WARNING: "kernel_subsys" [fs/configfs/configfs.ko] undefined!
      make[1]: *** [__modpost] Error 1
      make: *** [modules] Error 2
      
      Since fs/dlm/lockspace.c and fs/gfs2/locking/dlm/sysfs.c use
      kernel_subsys, they should either DEPEND on it or SELECT it.
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] make lock_dlm drop_count tunable in sysfs · ee32e4f3
      David Teigland committed
      We want to be able to change or disable the default drop_count (number at
      which the dlm asks gfs to limit the number of locks it's holding).
      Add it to the collection of sysfs tunables for an fs.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] increase default lock limit · 2f708649
      David Teigland committed
      Increase the number of locks at which point the dlm begins asking gfs to
      reduce its lock usage.  The default value is largely arbitrary, but the
      current value of 50,000 ends up limiting performance unnecessarily for too
      many users.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Fix list corruption in lops.c · 8bd95727
      Steven Whitehouse committed
      The patch below appears to fix the list corruption that we are seeing on
      occasion. Although the transaction structure is private to a single
      thread, when the queued structures are dismantled during an in-core
      commit, it's possible for a different thread to be trying to add the same
      structure to another, new, transaction at the same time.
      
      To avoid this, this patch takes the log spinlock during this operation.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Fix recursive locking attempt with NFS · d7c103d0
      Steven Whitehouse committed
      In certain cases, it's possible for NFS to call the lookup code while
      holding the glock (when doing a readdirplus operation) so we need to
      check for that and not try to lock the glock twice. This also fixes a
      typo in a previous NFS related GFS2 patch.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] can miss clearing resend flag · b790c3b7
      David Teigland committed
      A long, complicated sequence of events, beginning with the RESEND flag not
      being cleared on an lkb, can result in an unlock never completing.
      
      - lkb on waiters list for remote lookup
      - the remote node is both the dir node and the master node, so
        it optimizes the lookup into a request and sends a request
        reply back
      - the request reply is saved on the requestqueue to be processed
        after recovery
      - recovery runs dlm_recover_waiters_pre() which sets RESEND flag
        so the lookup will be resent after recovery
      - end of recovery: process_requestqueue takes saved request reply
        which removes the lkb off the waiters list, _without_ clearing
        the RESEND flag
      - end of recovery: dlm_recover_waiters_post() doesn't do anything
        with the now completed lookup lkb (would usually clear RESEND)
      - later, the node unmounts, unlocks this lkb that still has RESEND
        flag set
      - the lkb is on the waiters list again, now for unlock, when recovery
        occurs, dlm_recover_waiters_pre() shows the lkb for unlock with RESEND
        set, doesn't do anything since the master still exists
      - end of recovery: dlm_recover_waiters_post() takes this lkb off
        the waiters list because it has the RESEND flag set, then reports
        an error because unlocks are never supposed to be handled in
        recover_waiters_post().
      - later, the unlock reply is received, doesn't find the lkb on
        the waiters list because recover_waiters_post() has wrongly
        removed it.
      - the unlock operation has been lost, and we're left with a
        stray granted lock
      - unmount spins waiting for the unlock to complete
      
      The visible evidence of this problem will be a node where gfs umount is
      spinning, the dlm waiters list will be empty, and the dlm locks list will
      show a granted lock.
      
      The fix is simply to clear the RESEND flag when taking an lkb off the
      waiters list.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
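The one-line nature of this fix (clear RESEND whenever an lkb leaves the waiters list) can be sketched as below. The flag value, the struct layout, and the `demo_` names are hypothetical placeholders and do not reproduce the DLM's own definitions.

```c
/* Hypothetical flag bit standing in for the lkb RESEND flag. */
#define DEMO_IFL_RESEND 0x1u

/* Hypothetical, heavily reduced model of a lock block. */
struct demo_lkb {
    unsigned int flags;
    int on_waiters;  /* 1 while the lkb sits on the waiters list */
};

/* Put the lkb on the waiters list while a reply is outstanding. */
static void demo_add_to_waiters(struct demo_lkb *lkb)
{
    lkb->on_waiters = 1;
}

/*
 * Take the lkb off the waiters list. Clearing RESEND here, on every
 * removal path, means a stale flag cannot survive into a later operation.
 */
static void demo_remove_from_waiters(struct demo_lkb *lkb)
{
    lkb->on_waiters = 0;
    lkb->flags &= ~DEMO_IFL_RESEND;
}
```

Tying the flag's lifetime to list membership removes the need for each caller to remember the cleanup, which is where the long failure sequence above began.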
    • [DLM] saved dlm message can be dropped · 8fd3a98f
      David Teigland committed
      dlm_receive_message() returns 0 instead of returning 'error'.  What would
      happen is that process_requestqueue would take a saved message off the
      requestqueue and call receive_message on it.  receive_message would then
      see that recovery had been aborted, set error to EINTR, and 'goto out',
      expecting that the error would be returned.  Instead, 0 was always
      returned, so process_requestqueue would think that the message had been
      processed and delete it instead of saving it to process next time.  This
      means the message (usually an unlock in my tests) would be lost.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
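A reduced model of the behaviour described above: the queue processor deletes a saved message only when the receive routine reports success, so the receive routine must propagate its error. The queue layout and all `demo_` names are assumptions for illustration, not the DLM's data structures.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_QUEUE_MAX 8

/* Hypothetical saved-message queue; messages are plain ints here. */
struct demo_queue {
    int msgs[DEMO_QUEUE_MAX];
    size_t len;
};

/* Returns 0 when the message was handled, -EINTR if recovery was aborted. */
static int demo_receive_message(int msg, int recovery_aborted)
{
    int error = 0;

    (void)msg;
    if (recovery_aborted) {
        error = -EINTR;
        goto out;
    }
    /* ... the message would be handled here ... */
out:
    return error; /* the bug described above amounts to returning 0 here */
}

/*
 * Process saved messages from the head of the queue. A message is deleted
 * only after it was handled; on error it and everything behind it stay
 * queued for the next attempt. Returns the number of messages processed.
 */
static size_t demo_process_requestqueue(struct demo_queue *q,
                                        int recovery_aborted)
{
    size_t done = 0;

    while (done < q->len) {
        if (demo_receive_message(q->msgs[done], recovery_aborted) != 0)
            break;
        done++;
    }
    for (size_t i = done; i < q->len; i++)
        q->msgs[i - done] = q->msgs[i];
    q->len -= done;
    return done;
}
```

If `demo_receive_message()` returned 0 unconditionally, the loop would count the aborted message as processed and drop it, which mirrors the lost unlock reported in the commit.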
    • [DLM] Make sock_sem into a mutex · f1f1c1cc
      Patrick Caulfield committed
      Now that there can be multiple dlm_recv threads running we need to prevent two
      recvs running for the same connection - it's unlikely but it can happen and it
      causes message corruption.
      Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Fix typo in glock.c · d043e190
      Steven Whitehouse committed
      This is a one letter typo fix in glock.c, spotted by Rob Kenna.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] use CURRENT_TIME_SEC instead of get_seconds in gfs2 · ddfe0627
      Eric Sandeen committed
      I was looking something else up and came across this...
      
      I don't honestly have a good reason to change it other than to make it
      like every other Linux filesystem in this regard.  ;-)  It doesn't
      functionally change anything, but makes some lines shorter. :)
      
      I'm also curious; why does gfs2 have 64-bits of on-disk timestamps, but
      not in timespec_t format, and only stores second resolutions?  Seems like
      you're halfway to sub-second resolutions already.
      
      I suppose if that gets implemented then all of the below should
      instead be CURRENT_TIME not CURRENT_TIME_SEC.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Compile fix for glock.c · 90101c31
      Steven Whitehouse committed
      This one liner got missed from the previous patch.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Remove queue_empty() function · 12132933
      Steven Whitehouse committed
      This function is no longer required since we do not do recursive
      locking in the glock layer. As a result all its callers can be
      replaced with list_empty() calls.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
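For readers unfamiliar with the check that replaces queue_empty(), here is a user-space sketch of a circular doubly linked list with an emptiness test in the style of the kernel's list_empty(). The `demo_` names are made up for this example; only the idea that an empty list head points at itself is taken from the kernel's list convention.

```c
/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct demo_list_head {
    struct demo_list_head *next, *prev;
};

/* An empty list is a head whose next and prev both point at itself. */
static void demo_init_list_head(struct demo_list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert a new entry just before the head, i.e. at the tail. */
static void demo_list_add_tail(struct demo_list_head *entry,
                               struct demo_list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Unlink an entry from whatever list it is on. */
static void demo_list_del(struct demo_list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->next = entry;
    entry->prev = entry;
}

/* Emptiness is a single pointer comparison; no walk of the list is needed. */
static int demo_list_empty(const struct demo_list_head *head)
{
    return head->next == head;
}
```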
    • [DLM] fix lowcomms receiving · bd44e2b0
      Patrick Caulfield committed
      This patch fixes a bug whereby data on a newly accepted connection would be
      ignored if it arrived soon after the accept.
      Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Tidy up glops calls · b5d32bea
      Steven Whitehouse committed
      This patch doesn't make any changes to the ordering of the various
      operations related to glocking, but it does tidy up the calls to the
      glops.c functions to make the structure more obvious.
      
      The two functions: gfs2_glock_xmote_th() and gfs2_glock_drop_th() can be
      made static within glock.c since they are called by every set of glock
      operations. The xmote_th and drop_th glock operations are then made
      conditional upon those two routines existing and called from the
      previously mentioned functions in glock.c respectively.
      
      Also it can be seen that the go_sync operation isn't needed since it can
      easily be replaced by calls to xmote_bh and drop_bh respectively. This
      results in no longer (confusingly) calling back into routines in glock.c
      from glops.c and also reducing the glock operations by one member.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] lowcomms tidy · f2f5095f
      Patrick Caulfield committed
      This patch removes some redundant fields from the connection structure and adds
      some lockdep annotation to remove spurious warnings.
      Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Remove local exclusive glock mode · 1c0f4872
      Steven Whitehouse committed
      Here is a patch for GFS2 to remove the local exclusive flag. In
      the places it was used, mutexes are always held earlier in the
      call path, so it appears redundant in the LM_ST_SHARED case.
      
      Also, the GFS2 holders were setting local exclusive in any case where
      the requested lock was LM_ST_EXCLUSIVE. So the other places in the glock
      code where the flag was tested have been replaced with tests for the
      lock state being LM_ST_EXCLUSIVE in order to ensure the logic is the
      same as before (i.e. LM_ST_EXCLUSIVE is always locally exclusive as well
      as globally exclusive).
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Remove unused go_callback operation · 6bd9c8c2
      Steven Whitehouse committed
      This is never used, so we might as well remove it.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Remove the "greedy" function from glock.[ch] · e5dab552
      Steven Whitehouse committed
      The "greedy" code was an attempt to retain glocks for a minimum length
      of time when they relate to mmap()ed files. The current implementation
      of this feature is not, however, ideal in that it requires allocating
      memory in order to do this and it's overly complicated.
      
      It also misses the mark by ignoring the other I/O operations which are
      just as likely to suffer from the same problem. So the plan is to remove
      this now and then add the functionality back as part of the glock state
      machine at a later date (and thus take into account all the possible
      users of this feature).
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Shrink gfs2_inode memory by half · fee852e3
      Steven Whitehouse committed
      Here is something I spotted (while looking for something entirely
      different) the other day.
      
      Rather than using a completion in each and every struct gfs2_holder,
      this removes it in favour of hashed wait queues, thus saving a
      considerable amount of memory both on the stack (where a number of
      gfs2_holder structures are allocated) and in particular in the
      gfs2_inode which has 8 gfs2_holder structures embedded within it.
      
      As a result on x86_64 the gfs2_inode shrinks from 2488 bytes to
      1912 bytes, a saving of 576 bytes per inode (no, that's not a typo!).
      In actual practice we get a much better result than that since
      now that a gfs2_inode is under the 2048 byte barrier, we get two
      per 4k slab page effectively halving the amount of memory required
      to store gfs2_inodes.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
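The arithmetic behind the "halving" claim can be checked with a few lines of C. The helper names are hypothetical, and the model ignores any per-slab bookkeeping overhead, which is an assumption made here rather than something the commit states.

```c
#include <stddef.h>

/* How many objects of a given size fit in one page (overhead ignored). */
static size_t demo_objects_per_page(size_t page_size, size_t object_size)
{
    return page_size / object_size;
}

/* Effective memory consumed per object once page packing is accounted for. */
static size_t demo_bytes_per_object(size_t page_size, size_t object_size)
{
    size_t n = demo_objects_per_page(page_size, object_size);

    return n ? page_size / n : 0;
}
```

At 2488 bytes only one gfs2_inode fits in a 4096-byte page, so each one effectively costs a whole page. At 1912 bytes two fit, so each costs 2048 bytes. The 576-byte reduction in structure size therefore halves the real footprint.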
    • [GFS2] Remove max_atomic_write tunable · 330005c2
      Steven Whitehouse committed
      This removes an unused sysfs tunable parameter.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Clean up/speed up readdir · 3699e3a4
      Steven Whitehouse committed
      This removes the extra filldir callback which gfs2 was using to
      enclose an attempt at readahead for inodes during readdir. The
      code was too complicated and also hurt performance badly in the
      case that the getdents64/readdir call wasn't followed by
      stat(), and it wasn't even getting it right all the time when it
      was.
      
      As a result, on my test box an "ls" of a directory containing 250000
      files fell from about 7mins (freshly mounted, so nothing cached) to
      between about 15 to 25 seconds. When the directory content was cached,
      the time taken fell from about 3mins to about 4 or 5 seconds.
      
      Interestingly in the cached case, running "ls -l" once reduced the time
      taken for subsequent runs of "ls" to about 6 secs even without this
      patch. Now it turns out that there was a special case of glocks being
      used for prefetching the metadata, but because of the timeouts for these
      locks (set to 10 secs) the metadata was being timed out before it was
      being used and thus the prefetch code was constantly trying to prefetch
      the same data over and over.
      
      Calling "ls -l" meant that the inodes were brought into memory and once
      the inodes are cached, the glocks are not disposed of until the inodes
      are pushed out of the cache, thus extending the lifetime of the glocks,
      and thus bringing down the time for subsequent runs of "ls"
      considerably.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [GFS2] Add writepages for "data=writeback" mounts · a8d638e3
      Steven Whitehouse committed
      It occurred to me that although a gfs2 specific writepages for ordered
      writes and journaled data would be tricky, by hooking writepages only
      for "data=writeback" mounts we could take advantage of not needing
      buffer heads (we don't use them on the read side, nor have we for some
      time) and create much larger I/Os for the block layer.
      
      Using blktrace both before and after, it's possible to see that for large
      I/Os, most of the requests generated through writepages are now 1024
      sectors after this patch is applied as opposed to 8 sectors before.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
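To put the request sizes in more familiar units, the sector counts can be converted to bytes. The sketch assumes 512-byte sectors, which the commit message does not state explicitly, and the helper name is hypothetical.

```c
#include <stddef.h>

/* Assumption: request sizes are counted in 512-byte sectors. */
#define DEMO_SECTOR_BYTES 512u

/* Convert a request size in sectors to bytes. */
static size_t demo_request_bytes(size_t sectors)
{
    return sectors * DEMO_SECTOR_BYTES;
}
```

Under that assumption an 8-sector request is 4096 bytes and a 1024-sector request is 524288 bytes (512 KiB), so each request carries 128 times as much data after the patch.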
    • [DLM] fix master recovery · 222d3960
      David Teigland committed
      If master recovery happens on an rsb in one recovery sequence, then that
      sequence is aborted before lock recovery happens, then in the next
      sequence, we rely on the previous master recovery (which may now be
      invalid due to another node ignoring a lookup result) and go on to do the
      lock recovery where we get stuck due to an invalid master value.
      
       recovery cycle begins: master of rsb X has left
       nodes A and B send node C an rcom lookup for X to find the new master
       C gets lookup from B first, sets B as new master, and sends reply back to B
       C gets lookup from A next, and sends reply back to A saying B is master
       A gets lookup reply from C and sets B as the new master in the rsb
       recovery cycle on A, B and C is aborted to start a new recovery
       B gets lookup reply from C and ignores it since there's a new recovery
       recovery cycle begins: some other node has joined
       B doesn't think it's the master of X so it doesn't rebuild it in the directory
       C looks up the master of X, no one is master, so it becomes new master
       B looks up the master of X, finds it's C
       A believes that B is the master of X, so it sends its lock to B
       B sends an error back to A
       A resends
       this repeats forever, the incorrect master value on A is never corrected
      
      The fix is to do master recovery on an rsb that still has the NEW_MASTER
      flag set from an earlier recovery sequence, and therefore didn't complete
      lock recovery.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>