1. 19 Aug, 2013 2 commits
  2. 29 Jun, 2013 4 commits
    • locks: protect most of the file_lock handling with i_lock · 1c8c601a
      Jeff Layton committed
      Having a global lock that protects all of this code is a clear
      scalability problem. Instead of doing that, move most of the code to be
      protected by the i_lock instead. The exceptions are the global lists
      that the ->fl_link sits on, and the ->fl_block list.
      
      ->fl_link is what connects these structures to the
      global lists, so we must ensure that we hold those locks when iterating
      over or updating these lists.
      
      Furthermore, sound deadlock detection requires that we hold the
      blocked_list state steady while checking for loops. We also must ensure
      that the search and update to the list are atomic.
      
      For the checking and insertion side of the blocked_list, push the
      acquisition of the global lock into __posix_lock_file and ensure that
      checking and update of the blocked_list is done without dropping the
      lock in between.
      
      On the removal side, when waking up blocked lock waiters, take the
      global lock before walking the blocked list and dequeue the waiters from
      the global list prior to removal from the fl_block list.
      
      With this, deadlock detection should be race free while we minimize
      excessive file_lock_lock thrashing.
      
      Finally, in order to avoid a lock inversion problem when handling
      /proc/locks output we must ensure that manipulations of the fl_block
      list are also protected by the file_lock_lock.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
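The check-then-insert rule described in the commit above can be illustrated with a small user-space sketch. It is not the kernel code: `blocked_table`, `would_deadlock`, `block_on` and `unblock` are hypothetical names, lock owners are reduced to small integers, and the global lock is only marked by comments. The point is that the loop check and the insertion happen as one step, so the table can never contain a cycle.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_OWNERS 8
#define NO_OWNER   (-1)

/* Hypothetical stand-in for the global blocked list:
 * waits_for[i] is the owner that owner i is blocked on, or NO_OWNER. */
struct blocked_table {
    int waits_for[MAX_OWNERS];
};

static void blocked_table_init(struct blocked_table *t)
{
    for (int i = 0; i < MAX_OWNERS; i++)
        t->waits_for[i] = NO_OWNER;
}

/* Follow the waits-for chain starting at holder. If it leads back to
 * waiter, letting waiter block on holder would close a loop. */
static bool would_deadlock(const struct blocked_table *t, int waiter, int holder)
{
    int cur = holder;

    for (int steps = 0; cur != NO_OWNER && steps < MAX_OWNERS; steps++) {
        if (cur == waiter)
            return true;
        cur = t->waits_for[cur];
    }
    return false;
}

/* Check and insert as a single step. Returns 0 on success, -1 if the
 * request would deadlock. */
static int block_on(struct blocked_table *t, int waiter, int holder)
{
    assert(waiter >= 0 && waiter < MAX_OWNERS);
    assert(holder >= 0 && holder < MAX_OWNERS);

    /* Assumption: a real implementation takes the global lock here ... */
    if (would_deadlock(t, waiter, holder))
        return -1;
    t->waits_for[waiter] = holder;
    /* ... and releases it here, never between the check and the update. */
    return 0;
}

/* Removal side: dequeue the waiter from the global table. */
static void unblock(struct blocked_table *t, int waiter)
{
    assert(waiter >= 0 && waiter < MAX_OWNERS);
    t->waits_for[waiter] = NO_OWNER;
}
```

With owners 0, 1 and 2, `block_on(&t, 0, 1)` and `block_on(&t, 1, 2)` succeed, while `block_on(&t, 2, 0)` is refused because the chain 0 → 1 → 2 would loop back to 0.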
    • Don't pass inode to ->d_hash() and ->d_compare() · da53be12
      Linus Torvalds committed
      Instances either don't look at it at all (the majority of cases) or
      only want it to find the superblock (which can be had as dentry->d_sb).
      A few cases that want more are actually safe with dentry->d_inode -
      the only precaution needed is a check that it hasn't been replaced with
      NULL by rmdir() or by an overwriting rename(), in which case it should
      simply be treated as a cache miss.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • [readdir] constify ->actor · ac6614b7
      Al Viro committed
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • [readdir] convert gfs2 · d81a8ef5
      Al Viro committed
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  3. 28 Jun, 2013 1 commit
  4. 20 Jun, 2013 2 commits
  5. 19 Jun, 2013 1 commit
    • GFS2: aggressively issue revokes in gfs2_log_flush · 5d054964
      Benjamin Marzinski committed
      This patch looks at all the outstanding blocks in all the transactions
      on the log, and moves the completed ones to the ail2 list.  Then it
      issues revokes for these blocks.  This will hopefully speed things up
      in situations where there is a lot of contention for glocks, especially
      if they are acquired serially.
      
      revoke_lo_before_commit will issue at most one log block's full of these
      preemptive revokes. The amount of reserved log space that
      gfs2_log_reserve() ignores has been incremented to allow for this extra
      block.
      
      This patch also consolidates the common revoke instructions into one
      function, gfs2_add_revoke().
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  6. 14 Jun, 2013 2 commits
    • GFS2: fix regression in dir_double_exhash · 512cbf02
      Bob Peterson committed
      Recent commit e8830d88 introduced a bug in function dir_double_exhash;
      it was failing to set h in the fall-back case. This patch corrects it.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Add atomic_open support · 6d4ade98
      Steven Whitehouse committed
      I've restricted atomic_open to only operate on regular files, although
      I still don't understand why atomic_open should not be possible also for
      directories on GFS2. That can always be added in later though, if it
      makes sense.
      
      The ->atomic_open function can be passed negative dentries, which
      in most cases means either ENOENT (->lookup) or a call to d_instantiate
      (->create). In the GFS2 case though, we need to actually perform the
      look up, since we do not know whether there has been a new inode created
      on another node. The look up calls d_splice_alias which then tries to
      rehash the dentry - so the solution here is to simply check for that
      in d_splice_alias. The same issue is likely to affect any other cluster
      filesystem implementing ->atomic_open.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "J. Bruce Fields" <bfields fieldses org>
      Cc: Jeff Layton <jlayton@redhat.com>
  7. 11 Jun, 2013 1 commit
    • GFS2: Only do one directory search on create · 5a00f3cc
      Steven Whitehouse committed
      Creation of a new inode requires a directory search in order to ensure
      that we are not trying to create an inode with the same name as an
      existing one. This was hidden away inside the create_ok() function.
      
      In the case that there was an existing inode, and a lookup can be
      substituted for a create (which is the case with regular files
      when the O_EXCL flag is not in use) then we were doing a second
      lookup in order to return the inode.
      
      This patch merges these two lookups into one. This can be done by
      passing a flag to gfs2_dir_search() to tell it to just return -EEXIST
      in the cases where we don't actually want to look up the inode.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
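The idea of folding the existence check and the lookup into one search can be sketched in user space as follows. This is only an illustration under assumptions: `struct dir`, `dir_search` and `create_or_lookup` are hypothetical stand-ins, inodes are reduced to positive integers, and the flag name `fail_on_exist` is chosen for the example rather than taken from the commit text.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory directory: a name maps to a positive inode number. */
struct entry {
    const char *name;
    int inum;
};

struct dir {
    const struct entry *entries;
    size_t count;
};

/* One search serves both callers:
 *   found and fail_on_exist  -> -EEXIST (caller only wants the check)
 *   found and !fail_on_exist -> the inode number (caller wants the lookup)
 *   not found                -> -ENOENT */
static int dir_search(const struct dir *d, const char *name, bool fail_on_exist)
{
    assert(d != NULL && name != NULL);

    for (size_t i = 0; i < d->count; i++) {
        if (strcmp(d->entries[i].name, name) == 0)
            return fail_on_exist ? -EEXIST : d->entries[i].inum;
    }
    return -ENOENT;
}

/* Create path: with excl set (the O_EXCL case) an existing name is an
 * error; otherwise the existing inode is returned from the same search. */
static int create_or_lookup(const struct dir *d, const char *name,
                            bool excl, int new_inum)
{
    int ret = dir_search(d, name, excl);

    if (ret != -ENOENT)
        return ret;      /* existing inode number, or -EEXIST */
    return new_inum;     /* name is free: pretend a new inode was allocated */
}
```

The create path calls `dir_search()` exactly once; whether a hit becomes an error or a lookup result depends only on the flag.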
  8. 06 Jun, 2013 1 commit
  9. 05 Jun, 2013 4 commits
  10. 04 Jun, 2013 1 commit
  11. 03 Jun, 2013 4 commits
  12. 24 May, 2013 4 commits
    • GFS2: Fix typo in gfs2_log_end_write loop · e97e548b
      Steven Whitehouse committed
      There was a missing _all in this loop iterator.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: fix DLM depends to fix build errors · 75f96ce6
      Randy Dunlap committed
      Fix build errors by correcting DLM dependencies in GFS2.
      Build errors happen when CONFIG_GFS2_FS_LOCKING_DLM=y and CONFIG_DLM=m:
      
      fs/built-in.o: In function `gfs2_lock':
      file.c:(.text+0xc7abd): undefined reference to `dlm_posix_get'
      file.c:(.text+0xc7ad0): undefined reference to `dlm_posix_unlock'
      file.c:(.text+0xc7ad9): undefined reference to `dlm_posix_lock'
      fs/built-in.o: In function `gdlm_unmount':
      lock_dlm.c:(.text+0xd6e5b): undefined reference to `dlm_release_lockspace'
      fs/built-in.o: In function `sync_unlock':
      lock_dlm.c:(.text+0xd6e9e): undefined reference to `dlm_unlock'
      fs/built-in.o: In function `sync_lock':
      lock_dlm.c:(.text+0xd6fb6): undefined reference to `dlm_lock'
      fs/built-in.o: In function `gdlm_put_lock':
      lock_dlm.c:(.text+0xd7238): undefined reference to `dlm_unlock'
      fs/built-in.o: In function `gdlm_mount':
      lock_dlm.c:(.text+0xd753e): undefined reference to `dlm_new_lockspace'
      lock_dlm.c:(.text+0xd79d3): undefined reference to `dlm_release_lockspace'
      fs/built-in.o: In function `gdlm_lock':
      lock_dlm.c:(.text+0xd8179): undefined reference to `dlm_lock'
      fs/built-in.o: In function `gdlm_cancel':
      lock_dlm.c:(.text+0xd6b22): undefined reference to `dlm_unlock'
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Use single-block reservations for directories · af21ca8e
      Bob Peterson committed
      This patch changes the multi-block allocation code, such that
      directory inodes only get a single block reserved in the bitmap.
      That way, the bitmaps are more tightly packed together, and there
      are fewer spans of free blocks for in-use block reservations.
      This means it takes less time to find a free span of blocks in the
      bitmap, which speeds things up. This increases the performance of
      some workloads by almost 2X. In Nate's mockup.py script (which does
      (1) create dir, (2) create dir in dir, (3) create file in that dir)
      the test executes in 23 steps rather than 43 steps, a 47%
      performance improvement.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: two minor quota fixups · 37f71577
      Bob Peterson committed
      This patch fixes two regression problems that Abhi found in the
      GFS2 quota code.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  13. 22 May, 2013 2 commits
    • gfs2: use ->invalidatepage() length argument · 5c0bb97c
      Lukas Czerner committed
      ->invalidatepage() aop now accepts range to invalidate so we can make
      use of it in gfs2_invalidatepage().
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Acked-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: cluster-devel@redhat.com
    • mm: change invalidatepage prototype to accept length · d47992f8
      Lukas Czerner committed
      Currently there is no way to truncate partial page where the end
      truncate point is not at the end of the page. This is because it was not
      needed and the functionality was enough for file system truncate
      operation to work properly. However more file systems now support punch
      hole feature and it can benefit from mm supporting truncating page just
      up to the certain point.
      
      Specifically, with this functionality truncate_inode_pages_range() can
      be changed so it supports truncating partial page at the end of the
      range (currently it will BUG_ON() if 'end' is not at the end of the
      page).
      
      This commit changes the invalidatepage() address space operation
      prototype to accept range to be invalidated and update all the instances
      for it.
      
      We also change the block_invalidatepage() in the same way and actually
      make a use of the new length argument implementing range invalidation.
      
      Actual file system implementations will follow except the file systems
      where the changes are really simple and should not change the behaviour
      in any way. Implementation for truncate_page_range() which will be able
      to accept page unaligned ranges will follow as well.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Hugh Dickins <hughd@google.com>
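What range invalidation means at the buffer level can be sketched with a simulated page. The code below is a user-space model, not the kernel's block_invalidatepage(): `page_sim`, `invalidate_range` and the fixed 4096/1024 sizes are assumptions made for the example. It discards only the buffers that lie entirely inside the half-open range [offset, offset + length).

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SZ 4096u
#define BUF_SZ  1024u
#define NBUFS   (PAGE_SZ / BUF_SZ)

/* Hypothetical model of a page split into fixed-size buffers. */
struct page_sim {
    bool valid[NBUFS];
};

/* Invalidate every buffer fully contained in [offset, offset + length).
 * Buffers that are only partially covered are left alone. */
static void invalidate_range(struct page_sim *p, unsigned offset, unsigned length)
{
    unsigned stop = offset + length;

    assert(stop <= PAGE_SZ);    /* the range must not run past the page */

    for (unsigned i = 0; i < NBUFS; i++) {
        unsigned curr_off = i * BUF_SZ;
        unsigned next_off = curr_off + BUF_SZ;

        if (next_off > stop)    /* this buffer extends beyond the range */
            break;
        if (offset <= curr_off) /* this buffer starts inside the range */
            p->valid[i] = false;
    }
}
```

Calling `invalidate_range(&p, 1024, 2048)` clears buffers 1 and 2 and leaves buffers 0 and 3 valid, which is the "hole in the middle of a page" case that an offset-only prototype could not express.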
  14. 08 May, 2013 1 commit
  15. 29 Apr, 2013 1 commit
  16. 26 Apr, 2013 1 commit
    • GFS2: Flush work queue before clearing glock hash tables · 222cb538
      Bob Peterson committed
      There was a timing window when a GFS2 file system was unmounted
      that caused GFS2 to call BUG() and panic the kernel. The call
      to BUG() is meant to ensure that the glock reference count,
      gl_ref, never gets down to zero and bounce back up again. What was
      happening during umount is that function gfs2_put_super was dequeuing
      its glocks for well-known files. In particular, we saw it on the
      journal glock, sd_jinode_gh. The dequeue caused delayed work to be
      queued for the glock state machine, to transition the lock to an
      "unlocked" state. While the work was still queued, gfs2_put_super
      called gfs2_gl_hash_clear to clear out the glock hash tables.
      If the timing was just so, the glock work function would drop the
      reference count at the time when it was being checked for zero,
      and that caused BUG() to be called. This patch calls
      flush_workqueue before clearing the glock hash tables, thereby
      ensuring that the delayed work is executed before the hash tables
      are cleared, and therefore the reference count never goes to zero
      until the glock is cleared.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  17. 10 Apr, 2013 2 commits
    • GFS2: Add origin indicator to glock demote tracing · 7bd8b2eb
      Steven Whitehouse committed
      This adds the origin indicator to the trace point for glock
      demotion, so that it is possible to see where demote requests
      have come from.
      
      Note that requests generated from the demote_rq sysfs interface
      will show as remote, since they are intended to replicate
      exactly the effect of a demote request from a remote node. It
      is still possible to tell these apart by looking at the process
      which initiated the demote request.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Add origin indicator to glock callbacks · 81ffbf65
      Steven Whitehouse committed
      This patch adds a bool indicating whether the demote
      request was originated locally or remotely. This is then
      used by the iopen ->go_callback() to make 100% sure that
      it will only respond to remote callbacks.
      
      Since ->evict_inode() uses GL_NOCACHE when it attempts to
      get an exclusive lock on the iopen lock, this may result
      in extra scheduling of the workqueue in case that the
      exclusive promotion request failed. This patch prevents
      that from happening.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
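The effect of passing the origin to the callback can be shown with a toy model. This is a sketch under assumptions, not the GFS2 code: `glock_sim` and `iopen_callback` are hypothetical, and "scheduling the workqueue" is reduced to incrementing a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a glock: counts how often follow-up work
 * was queued in response to a demote request. */
struct glock_sim {
    int work_queued;
};

/* Callback sketch: remote is true when the demote request came from
 * another node. Locally generated requests are ignored, so a failed
 * local promotion attempt cannot trigger extra work. */
static void iopen_callback(struct glock_sim *gl, bool remote)
{
    assert(gl != NULL);

    if (!remote)
        return;
    gl->work_queued++;
}
```

A local request leaves `work_queued` unchanged; only a remote request queues work.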
  18. 08 Apr, 2013 5 commits
    • GFS2: replace gfs2_ail structure with gfs2_trans · 16ca9412
      Benjamin Marzinski committed
      In order to allow transactions and log flushes to happen at the same
      time, gfs2 needs to move the transaction accounting and active items
      list code into the gfs2_trans structure.  As a first step toward this,
      this patch removes the gfs2_ail structure, and handles the active items
      list in the gfs2_trans structure.  This keeps gfs2 from allocating an ail
      structure on log flushes, and gives us a structure that can later be used
      to store the transaction accounting outside of the gfs2 superblock
      structure.
      
      With this patch, at the end of a transaction, gfs2 will add the
      gfs2_trans structure to the superblock if there is not one already.
      This structure now has the active items fields that were previously in
      gfs2_ail.  This is not necessary in the case where the transaction was
      simply used to add revokes, since these are never written outside of the
      journal, and thus, don't need an active items list.
      
      Also, in order to make sure that the transaction structure is not
      removed while it's still in use by gfs2_trans_end, unlocking the
      sd_log_flush_lock has to happen slightly later in ending the
      transaction.
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Remove vestigial parameter ip from function rs_deltree · 20095218
      Bob Peterson committed
      The functions that delete block reservations from the rgrp block
      reservations rbtree no longer use the ip parameter. This patch
      eliminates the parameter.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Use gfs2_dinode_out() in the inode create path · 79ba7480
      Steven Whitehouse committed
      Over the previous two patches relating to inode creation, the
      content of init_dinode() has been looking more and more like
      gfs2_dinode_out(). This is not an accident! This patch replaces
      the parts of init_dinode() which are duplicated in gfs2_dinode_out()
      with a call to that function.
      
      Mostly that is straightforward, but there is one issue which needed
      to be resolved relating to the link count. The link count has to be
      set to zero in a certain error handling code path, which lands up
      calling iput(). This is now done specifically in that code path
      allowing the link count to be set earlier and written into the
      on disk inode by gfs2_dinode_out() in the normal way.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Remove gfs2_refresh_inode from inode creation path · 28fb3027
      Steven Whitehouse committed
      The original method for creating inodes used in GFS2 was to fill
      out a buffer, with all the information, and then to read that
      buffer into the in-core inode, using gfs2_refresh_inode().
      
      The problem with this approach is that all the inode's fields
      need to be calculated ahead of time, and were stored in various
      variables making the code rather complicated.
      
      The new approach is simply to allocate the in-core inode earlier
      and fill in as many fields as possible ahead of time. These can
      then be used to initialise the on disk representation. The
      code has been working towards the point where it is possible
      to remove gfs2_refresh_inode() because all the fields are
      correctly initialised ahead of time. We've now reached that
      milestone, and have reversed the order of setting up the in
      core and on disk inodes.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Clean up inode creation path · fd4b4e04
      Steven Whitehouse committed
      This patch cleans up the inode creation code path in GFS2. After the
      Orlov allocator was merged, a number of potential improvements are
      now possible, and this is a first set of these.
      
      The quota handling is now updated so that it matches the point in
      the code where the allocation takes place. This means that the one
      exception in gfs2_alloc_blocks relating to quota is now no longer
      required, and we can use the generic code everywhere.
      
      In addition the call to figure out whether we need to allocate any
      extra blocks in order to add a directory entry is moved higher up
      gfs2_create_inode. This means that if it returns an error, we
      can deal with that at a stage where it is easier to handle that case.
      The returned status cannot change during the function since we hold
      an exclusive lock on the directory.
      
      Two calls to gfs2_rindex_update have been changed to one, again at
      the top of gfs2_create_inode to simplify error handling.
      
      The time stamps are also now initialised earlier in the creation
      process, this is gradually moving towards being able to remove the
      call to gfs2_refresh_inode in gfs2_inode_create once we have all the
      fields covered.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  19. 06 Apr, 2013 1 commit
    • GFS2: Issue discards in 512b sectors · b2c87cae
      Bob Peterson committed
      This patch changes GFS2's discard issuing code so that it calls
      function sb_issue_discard rather than blkdev_issue_discard. The
      code was calling blkdev_issue_discard and specifying the correct
      sector offset and sector size, but blkdev_issue_discard expects
      these values to be in terms of 512 byte sectors, even if the native
      sector size for the device is different. Calling sb_issue_discard
      with the BLOCK size instead ensures the correct block-to-512b-sector
      translation. I verified that "minlen" is specified in blocks, so
      comparing it to a number of blocks is correct.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
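The block-to-512-byte-sector translation mentioned in the commit above comes down to a shift. The helper below is a user-space sketch with a hypothetical name, `block_to_sector512`; it assumes the filesystem block size is a power of two of at least 512 bytes and is given as its base-2 logarithm.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a filesystem block number (or block count) into units of
 * 512-byte sectors. blocksize_bits is log2 of the block size, e.g. 12
 * for 4096-byte blocks; 9 is log2(512). */
static uint64_t block_to_sector512(uint64_t block, unsigned blocksize_bits)
{
    assert(blocksize_bits >= 9 && blocksize_bits < 64);
    return block << (blocksize_bits - 9);
}
```

For 4096-byte blocks, a discard starting at block 100 and covering 8 blocks corresponds to sector 800 and a length of 64 sectors. Passing the untranslated block numbers to an interface that expects 512-byte sectors would discard the wrong, and much smaller, region.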