1. 22 Oct, 2013 5 commits
  2. 18 Oct, 2013 3 commits
    • xfs: don't break from growfs ag update loop on error · 59e5a0e8
      Eric Sandeen committed
      When xfs_growfs_data_private() is updating backup superblocks,
      it bails out on the first error encountered, whether reading or
      writing:
      
      * If we get an error writing out the alternate superblocks,
      * just issue a warning and continue.  The real work is
      * already done and committed.
      
      This can cause a problem later during repair, because repair
      looks at all superblocks, and picks the most prevalent one
      as correct.  If we bail out early in the backup superblock
      loop, we can end up with more "bad" matching superblocks than
      good, and a post-growfs repair may revert the filesystem to
      the old geometry.
      
      With the combination of superblock verifiers and old bugs,
      we're more likely to encounter read errors due to verification.
      
      And perhaps even worse, we don't even properly write any of the
      newly-added superblocks in the new AGs.
      
      Even with this change, growfs will still say:
      
        xfs_growfs: XFS_IOC_FSGROWFSDATA xfsctl failed: Structure needs cleaning
        data blocks changed from 319815680 to 335216640
      
      which might be confusing to the user, but it at least communicates
      that something has gone wrong, and dmesg will probably highlight
      the need for an xfs_repair.
      
      And this is still best-effort; if verifiers fail on more than
      half the backup supers, they may still "win" - but that's probably
      best left to repair to more gracefully handle by doing its own
      strict verification as part of the backup super "voting."
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Acked-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
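A minimal Python sketch of the behaviour this commit describes, not the kernel code. The hypothetical `update_backup_superblocks()` remembers the first error but can either bail out (old behaviour) or keep going through every AG (new behaviour), and the hypothetical `most_prevalent_geometry()` imitates repair picking the most common superblock. All names and the list-of-strings data layout are assumptions made for illustration.

```python
from collections import Counter


def update_backup_superblocks(supers, new_geometry, failing_ags, stop_on_error):
    """Write new_geometry into every backup superblock (AG 1..n-1).

    supers: list of geometry labels, index = AG number (AG 0 is the primary).
    failing_ags: set of AG numbers whose update fails (read or write error).
    Returns the AG number of the first error seen, or None.
    """
    first_error = None
    for agno in range(1, len(supers)):
        if agno in failing_ags:
            if first_error is None:
                first_error = agno
            if stop_on_error:
                break       # old behaviour: leave the loop on the first error
            continue        # new behaviour: remember the error, keep going
        supers[agno] = new_geometry
    return first_error


def most_prevalent_geometry(supers):
    """Imitate repair choosing the most common superblock geometry."""
    return Counter(supers).most_common(1)[0][0]


initial = ["new"] + ["old"] * 7   # primary already updated, 7 stale backups

bailed = list(initial)
update_backup_superblocks(bailed, "new", {2}, stop_on_error=True)

kept_going = list(initial)
update_backup_superblocks(kept_going, "new", {2}, stop_on_error=False)

print(most_prevalent_geometry(bailed), most_prevalent_geometry(kept_going))
```

In this toy run a single failure at AG 2 leaves the stale geometry in the majority when the loop bails out, while continuing past the error leaves only one stale backup.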
    • xfs: don't emit corruption noise on fs probes · 31625f28
      Eric Sandeen committed
      If we get EWRONGFS due to probing of non-xfs filesystems,
      there's no need to issue the scary corruption error and backtrace.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: remove newlines from strings passed to __xfs_printk · 08e96e1a
      Eric Sandeen committed
      __xfs_printk adds its own "\n".  Having it in the original string
      leads to unintentional blank lines from these messages.
      
      Most format strings have no newline, but a few do, leading to
      output such as:
      
      [ 7347.119911] XFS (sdb2): Access to block zero in inode 132 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1a05
      [ 7347.119911] 
      [ 7347.119919] XFS (sdb2): Access to block zero in inode 132 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1a05
      [ 7347.119919] 
      
      Fix them all.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
  3. 17 Oct, 2013 1 commit
    • xfs: prevent deadlock trying to cover an active log · 2c6e24ce
      Dave Chinner committed
      Recent analysis of a deadlocked XFS filesystem from a kernel
      crash dump indicated that the filesystem was stuck waiting for log
      space. The short story of the hang on the RHEL6 kernel is this:
      
      	- the tail of the log is pinned by an inode
      	- the inode has been pushed by the xfsaild
      	- the inode has been flushed to its backing buffer and is
      	  currently flush locked and hence waiting for backing
      	  buffer IO to complete and remove it from the AIL
      	- the backing buffer is marked for write - it is on the
      	  delayed write queue
      	- the inode buffer has been modified directly and logged
      	  recently due to unlinked inode list modification
      	- the backing buffer is pinned in memory as it is in the
      	  active CIL context.
      	- the xfsbufd won't start buffer writeback because it is
      	  pinned
      	- xfssyncd won't force the log because it sees the log as
      	  needing to be covered and hence wants to issue a dummy
      	  transaction to move the log covering state machine along.
      
      Hence there is no trigger to force the CIL to the log and hence
      unpin the inode buffer and therefore complete the inode IO, remove
      it from the AIL and hence move the tail of the log along, allowing
      transactions to start again.
      
      Mainline kernels also have the same deadlock, though the signature
      is slightly different - the inode buffer never reaches the delayed
      write lists because xfs_buf_item_push() sees that it is pinned and
      hence never adds it to the delayed write list that the xfsaild
      flushes.
      
      There are two possible solutions here. The first is to simply force
      the log before trying to cover the log and so ensure that the CIL is
      emptied before we try to reserve space for the dummy transaction in
      the xfs_log_worker(). While this might work most of the time, it is
      still racy and is no guarantee that we don't get stuck in
      xfs_trans_reserve waiting for log space to come free. Hence it's not
      the best way to solve the problem.
      
      The second solution is to modify xfs_log_need_covered() to be aware
      of the CIL. We only should be attempting to cover the log if there
      is no current activity in the log - covering the log is the process
      of ensuring that the head and tail in the log on disk are identical
      (i.e. the log is clean and at idle). Hence, by definition, if there
      are items in the CIL then the log is not at idle and so we don't
      need to attempt to cover it.
      
      When we don't need to cover the log because it is active or idle, we
      issue a log force from xfs_log_worker() - if the log is idle, then
      this does nothing.  However, if the log is active due to there being
      items in the CIL, it will force the items in the CIL to the log and
      unpin them.
      
      In the case of the above deadlock scenario, instead of
      xfs_log_worker() getting stuck in xfs_trans_reserve() attempting to
      cover the log, it will instead force the log, thereby unpinning the
      inode buffer, allowing IO to be issued and complete and hence
      removing the inode that was pinning the tail of the log from the
      AIL. At that point, everything will start moving along again. i.e.
      the xfs_log_worker turns back into a watchdog that can alleviate
      deadlocks based around pinned items that prevent the tail of the log
      from being moved...
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
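The decision described in this commit can be sketched as a small Python model. It is not the kernel implementation: `log_need_covered()` and `log_worker_step()` are hypothetical stand-ins for xfs_log_need_covered() and xfs_log_worker(), and the two boolean inputs are simplifying assumptions that replace the real covering state machine and CIL.

```python
def log_need_covered(state_wants_cover, cil_has_items):
    """Return True only if the log should be covered right now.

    state_wants_cover: the covering state machine asks for a dummy transaction.
    cil_has_items: there are items in the CIL, so the log is not idle.
    """
    if cil_has_items:
        # By definition an active log does not need covering.
        return False
    return state_wants_cover


def log_worker_step(state_wants_cover, cil_has_items):
    """Pick the action the periodic log worker takes in this model."""
    if log_need_covered(state_wants_cover, cil_has_items):
        # Needs a log reservation, so it can block waiting for log space.
        return "dummy transaction"
    # Pushes CIL items to the log and unpins them; a no-op on an idle log.
    return "force log"


# Deadlock scenario from the commit: covering is wanted but the CIL is busy.
print(log_worker_step(state_wants_cover=True, cil_has_items=True))
```

With the CIL check in place the worker forces the log in the deadlock scenario instead of waiting for a reservation it can never get.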
  4. 09 Oct, 2013 5 commits
  5. 02 Oct, 2013 3 commits
    • xfs: remove usage of is_bad_inode · d948709b
      Ben Myers committed
      XFS never calls mark_inode_bad or iget_failed, so it will never see a
      bad inode.  Remove all checks for is_bad_inode because they are
      unnecessary.
      Signed-off-by: Ben Myers <bpm@sgi.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: fix the wrong new_size/rnew_size at xfs_iext_realloc_direct() · 17ec81c1
      Jie Liu committed
      In xfs_iext_realloc_direct(), new_size is increased by if_bytes when
      the extent records were originally stored in the inline extent
      buffer and we have to switch to a direct extent list for the newly
      allocated extents. This is wrong. For example,
      
      create a file with three extents as follows:
      
      xfs_io -f -c "truncate 100m" /xfs/testme
      
      for i in $(seq 0 5 10); do
      	offset=$(($i * $((1 << 20))))
      	xfs_io -c "pwrite $offset 1m" /xfs/testme
      done
      
      Inline
      ------
      irec:	if_bytes	bytes_diff	new_size
      1st	0		16		16
      2nd	16		16		32
      
      Switching
      ---------						rnew_size
      3rd	32		16		48 + 32 = 80	roundup=128
      
      In this case, the desired value of new_size is 48, which is then
      rounded up to 64 and assigned to rnew_size.
      
      However, this issue has been masked because if_bytes is reset to
      the new_size calculated at the beginning of xfs_iext_add() before
      leaving that function, which in turn makes rnew_size correct
      again. Hence, it cannot be detected via xfstests.
      
      This patch fixes the above problem and revises the new_size comments
      in xfs_iext_realloc_direct() to make them more readable.  It also
      fixes the comments about switching from the inline extent buffer to
      a direct extent list to reflect this change.
      Signed-off-by: Jie Liu <jeff.liu@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
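The arithmetic in the table above can be replayed with a short Python sketch. The helper `roundup_pow_of_two()` here is a plain Python function written for this illustration (it only mimics rounding up to the next power of two), and the variable names simply follow the column headings of the table.

```python
def roundup_pow_of_two(n):
    """Smallest power of two that is >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


# Third extent record: switching from the inline buffer to a direct list.
if_bytes = 32      # bytes already used by the two inline records
bytes_diff = 16    # size of one additional extent record

new_size = if_bytes + bytes_diff                            # 48
wrong_rnew_size = roundup_pow_of_two(new_size + if_bytes)   # 80 rounds up to 128
right_rnew_size = roundup_pow_of_two(new_size)              # 48 rounds up to 64

print(new_size, wrong_rnew_size, right_rnew_size)
```

Adding if_bytes a second time doubles the rounded allocation size in this case, which is the discrepancy the commit removes.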
    • xfs: get rid of count from xfs_iomap_write_allocate() · 0799a3e8
      Jie Liu committed
      Get rid of function variable count from xfs_iomap_write_allocate() as
      it is unused.
      
      Additionally, checkpatch warns about the following for this change:
      WARNING: extern prototypes should be avoided in .h files
      +extern int xfs_iomap_write_allocate(struct xfs_inode *, xfs_off_t,
      
      So this patch also removes all extern function prototypes from
      xfs_iomap.h to suppress the warning and keep the code style in this
      file consistent.
      Signed-off-by: Jie Liu <jeff.liu@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
  6. 01 Oct, 2013 4 commits
    • xfs: Use kmem_free() instead of free() · aaaae980
      Thierry Reding committed
      This fixes a build failure caused by calling the free() function which
      does not exist in the Linux kernel.
      Signed-off-by: Thierry Reding <treding@nvidia.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: fix memory leak in xlog_recover_add_to_trans · 519ccb81
      tinguely@sgi.com committed
      Free the memory in error path of xlog_recover_add_to_trans().
      Normally this memory is freed in recovery pass2, but is leaked
      in the error path.
      Signed-off-by: Mark Tinguely <tinguely@sgi.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: dirent dtype presence is dependent on directory magic numbers · 367993e7
      Dave Chinner committed
      The determination of whether a directory entry contains a dtype
      field originally was dependent on the filesystem having CRCs
      enabled. This meant that the format for dtype being enabled could be
      determined by checking the directory block magic number rather than
      doing a feature bit check. This was useful in that it meant that we
      didn't need to pass a struct xfs_mount around to functions that
      were already supplied with a directory block header.
      
      Unfortunately, the introduction of dtype fields into the v4
      structure via a feature bit meant this "use the directory block
      magic number" method of discriminating the dirent entry sizes is
      broken. Hence we need to convert the places that use magic number
      checks to use feature bit checks so that they work correctly and not
      by chance.
      
      The current code works on v4 filesystems only because the dirent
      size roundup covers the extra byte needed by the dtype field in the
      places where this problem occurs.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: lockdep needs to know about 3 dquot-deep nesting · f112a049
      Dave Chinner committed
      Michael Semon reported that xfs/299 generated this lockdep warning:
      
      =============================================
      [ INFO: possible recursive locking detected ]
      3.12.0-rc2+ #2 Not tainted
      ---------------------------------------------
      touch/21072 is trying to acquire lock:
       (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64
      
      but task is already holding lock:
       (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&xfs_dquot_other_class);
        lock(&xfs_dquot_other_class);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      7 locks held by touch/21072:
       #0:  (sb_writers#10){++++.+}, at: [<c11185b6>] mnt_want_write+0x1e/0x3e
       #1:  (&type->i_mutex_dir_key#4){+.+.+.}, at: [<c11078ee>] do_last+0x245/0xe40
       #2:  (sb_internal#2){++++.+}, at: [<c122c9e0>] xfs_trans_alloc+0x1f/0x35
       #3:  (&(&ip->i_lock)->mr_lock/1){+.+...}, at: [<c126cd1b>] xfs_ilock+0x100/0x1f1
       #4:  (&(&ip->i_lock)->mr_lock){++++-.}, at: [<c126cf52>] xfs_ilock_nowait+0x105/0x22f
       #5:  (&dqp->q_qlock){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64
       #6:  (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64
      
      The lockdep annotation for dquot lock nesting only understands
      locking for user and "other" dquots, not user, group and project
      dquots. Fix the annotations to match the locking hierarchy we now
      have.
      Reported-by: Michael L. Semon <mlsemon35@gmail.com>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
  7. 26 Sep, 2013 1 commit
    • xfs: fix node forward in xfs_node_toosmall · 997def25
      Mark Tinguely committed
      Commit f5ea1100 cleans up the disk to host conversions for
      node directory entries, but because a variable is reused in
      xfs_node_toosmall() the next node is not correctly found.
      If the original node is small enough (<= 3/8 of the node size),
      this change may incorrectly cause a node collapse when it should
      not. That will cause an assert in xfstest generic/319:
      
         Assertion failed: first <= last && last < BBTOB(bp->b_length),
         file: /root/newest/xfs/fs/xfs/xfs_trans_buf.c, line: 569
      
      Keep the original node header to get the correct forward node.
      
      (When a node is considered for a merge with a sibling, it overwrites the
       sibling pointers of the original incore nodehdr with the sibling's
       pointers.  This leads to the loop considering the original node as a
       merge candidate with itself in the second pass, and so it incorrectly
       determines a merge should occur.)
      Signed-off-by: Mark Tinguely <tinguely@sgi.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      
      [v3: added Dave Chinner's (slightly modified) suggestion to the commit header,
      	cleaned up whitespace.  -bpm]
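The variable-reuse problem described in this commit can be reproduced in a small Python model. This is a simplified illustration, not the kernel loop: `NodeHdr`, `merge_candidates()` and the `keep_original` flag are hypothetical names, and the model always looks forward first and backward second.

```python
from dataclasses import dataclass


@dataclass
class NodeHdr:
    forw: int  # block number of the next sibling, 0 if none
    back: int  # block number of the previous sibling, 0 if none


def merge_candidates(blocks, blkno, keep_original):
    """Return the sibling block numbers considered for a merge with blkno.

    keep_original=False models the bug: the header variable is reused for the
    sibling, so the second pass follows the sibling's pointers instead.
    """
    hdr = blocks[blkno]
    candidates = []
    for go_forward in (True, False):
        sibling = hdr.forw if go_forward else hdr.back
        if sibling == 0:
            continue
        candidates.append(sibling)
        sibling_hdr = blocks[sibling]
        if not keep_original:
            hdr = sibling_hdr  # overwrites the original node's pointers
    return candidates


# Three chained nodes: 1 <-> 2 <-> 3. We examine node 2.
blocks = {
    1: NodeHdr(forw=2, back=0),
    2: NodeHdr(forw=3, back=1),
    3: NodeHdr(forw=0, back=2),
}

print(merge_candidates(blocks, 2, keep_original=False))
print(merge_candidates(blocks, 2, keep_original=True))
```

In the buggy variant the second pass reads node 3's back pointer and ends up treating node 2 as its own merge candidate; keeping the original header yields the real siblings 3 and 1.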
  8. 25 Sep, 2013 4 commits
    • xfs: log recovery lsn ordering needs uuid check · 566055d3
      Dave Chinner committed
      After a fair number of xfstests runs, xfs/182 started to fail
      regularly with a corrupted directory - a directory read verifier was
      failing after recovery because it found a block with a XARM magic
      number (remote attribute block) rather than a directory data block.
      
      The first time I saw this repeated failure I did /something/ and the
      problem went away, so I was never able to find the underlying
      problem. Test xfs/182 failed again today, and I found the root
      cause before I did /something else/ that made it go away.
      
      Tracing indicated that the block in question was being correctly
      logged, the log was being flushed by sync, but the buffer was not
      being written back before the shutdown occurred. Tracing also
      indicated that log recovery was also reading the block, but then
      never writing it before log recovery invalidated the cache,
      indicating that it was not modified by log recovery.
      
      More detailed analysis of the corpse indicated that the filesystem
      had a uuid of "a4131074-1872-4cac-9323-2229adbcb886" but the XARM
      block had a uuid of "8f32f043-c3c9-e7f8-f947-4e7f989c05d3", which
      indicated it was a block from an older filesystem. The reason that
      log recovery didn't replay it was that the LSN in the XARM block was
      larger than the LSN of the transaction being replayed, and so the
      block was not overwritten by log recovery.
      
      Hence, log recovery can't blindly trust the magic number and LSN in
      the block - it must verify that it belongs to the filesystem being
      recovered before using the LSN. i.e. if the UUIDs don't match, we
      need to unconditionally recover the change held in the log.
      
      This patch was first tested on a block device that was repeatedly
      causing xfs/182 to fail with the same failure on the same block with
      the same directory read corruption signature (i.e. XARM block). It
      did not fail, and hasn't failed since.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
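The rule this commit arrives at can be written as a short Python predicate. It is a sketch under stated assumptions: `should_replay()` is a hypothetical name, LSNs are modelled as plain integers with made-up values, and skipping replay when the two LSNs are equal is an assumption (the commit message only describes the "larger" case). The two UUIDs are the ones quoted in the message.

```python
import uuid


def should_replay(fs_uuid, block_uuid, block_lsn, trans_lsn):
    """Decide whether log recovery should overwrite the on-disk block."""
    if block_uuid != fs_uuid:
        # The block belongs to an older filesystem, so its LSN is meaningless:
        # replay the logged change unconditionally.
        return True
    # Same filesystem: replay only if the block on disk is older than the
    # transaction being recovered (equal LSNs are skipped by assumption).
    return block_lsn < trans_lsn


fs = uuid.UUID("a4131074-1872-4cac-9323-2229adbcb886")
stale = uuid.UUID("8f32f043-c3c9-e7f8-f947-4e7f989c05d3")

# Stale XARM block with a larger LSN than the transaction (values invented).
print(should_replay(fs, stale, block_lsn=900, trans_lsn=100))
# A block from this filesystem that is already newer than the transaction.
print(should_replay(fs, fs, block_lsn=900, trans_lsn=100))
```

Without the UUID comparison the first case would be skipped on the strength of a larger LSN left behind by a previous filesystem, which is the corruption seen in xfs/182.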
    • xfs: fix XFS_IOC_FREE_EOFBLOCKS definition · b771af2f
      Dave Chinner committed
      It uses a kernel internal structure in its definition rather than
      the user visible structure that is passed to the ioctl.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: asserting lock not held during freeing not valid · b313a5f1
      Dave Chinner committed
      When we free an inode, we do so via RCU. As an RCU lookup can occur
      at any time before we free an inode, and that lookup takes the inode
      flags lock, we cannot safely assert that the flags lock is not held
      just before marking it dead and running call_rcu() to free the
      inode.
      
      We check on allocation of a new inode structure that the lock is not
      held, so we still have protection against locks being leaked and
      hence not correctly initialised when allocated out of the slab.
      Hence just remove the assert...
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: lock the AIL before removing the buffer item · 48852358
      Dave Chinner committed
      Regression introduced by commit 46f9d2eb ("xfs: aborted buf items can
      be in the AIL") which fails to lock the AIL before removing the
      item. Spinlock debugging throws a warning about this.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
  9. 13 Sep, 2013 1 commit
  10. 12 Sep, 2013 2 commits
  11. 11 Sep, 2013 11 commits
    • super: fix for destroy lrus · f5e1dd34
      Glauber Costa committed
      This patch adds the missing call to list_lru_destroy (spotted by Li Zhong)
      and moves the deletion to after the shrinker is unregistered, as correctly
      spotted by Dave.
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • list_lru: dynamically adjust node arrays · 5ca302c8
      Glauber Costa committed
      We currently use a compile-time constant to size the node array for the
      list_lru structure.  Due to this, we don't need to allocate any memory at
      initialization time.  But as a consequence, the structures that contain
      embedded list_lru lists can become way too big (the superblock for
      instance contains two of them).
      
      This patch aims at ameliorating this situation by dynamically allocating
      the node arrays with the firmware provided nr_node_ids.
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • xfs: fix dquot isolation hang · 35163417
      Dave Chinner committed
      The new LRU list isolation code in xfs_qm_dquot_isolate() isn't
      completely up to date.  Firstly, it needs conversion to return enum
      lru_status values, not raw numbers. Secondly - most importantly - it
      fails to unlock the dquot and relock the LRU in the LRU_RETRY path.
      This leads to deadlocks in xfstests generic/232. Fix them.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Cc: Glauber Costa <glommer@gmail.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • xfs-convert-dquot-cache-lru-to-list_lru-fix · 2f5b56f8
      Andrew Morton committed
      fix warnings
      
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Glauber Costa <glommer@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • xfs: convert dquot cache lru to list_lru · cd56a39a
      Dave Chinner committed
      Convert the XFS dquot lru to use the list_lru construct and convert the
      shrinker to being node aware.
      
      [glommer@openvz.org: edited for conflicts + warning fixes]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • xfs: rework buffer dispose list tracking · a4082357
      Dave Chinner committed
      In converting the buffer lru lists to use the generic code, the locking
      for marking the buffers as on the dispose list was lost.  This results in
      confusion in LRU buffer tracking and accounting, resulting in reference
      counts being mucked up and the filesystem being unmountable.
      
      To fix this, introduce an internal buffer spinlock to protect the state
      field that holds the dispose list information.  Because there is now
      locking needed around xfs_buf_lru_add/del, and they are used in exactly
      one place each two lines apart, get rid of the wrappers and code the logic
      directly in place.
      
      Further, the LRU emptying code used on unmount is less than optimal.
      Convert it to use a dispose list as per a normal shrinker walk, and repeat
      the walk that fills the dispose list until the LRU is empty.  This avoids
      needing to drop and regain the LRU lock for every item being freed, and
      allows the same logic as the shrinker isolate call to be used.  Simpler,
      easier to understand.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • xfs-convert-buftarg-lru-to-generic-code-fix · addbda40
      Andrew Morton committed
      fix warnings
      
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Glauber Costa <glommer@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • xfs: convert buftarg LRU to generic code · e80dfa19
      Dave Chinner committed
      Convert the buftarg LRU to use the new generic LRU list and take advantage
      of the functionality it supplies to make the buffer cache shrinker node
      aware.
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: convert inode and dentry shrinking to be node aware · 9b17c623
      Dave Chinner committed
      Now that the shrinker is passing a node in the scan control structure, we
      can pass this to the generic LRU list code to isolate reclaim to the
      lists on matching nodes.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Glauber Costa <glommer@parallels.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • shrinker: convert superblock shrinkers to new API · 0a234c6d
      Dave Chinner committed
      Convert superblock shrinker to use the new count/scan API, and propagate
      the API changes through to the filesystem callouts.  The filesystem
      callouts already use a count/scan API, so it's just changing counters to
      longs to match the VM API.
      
      This requires the dentry and inode shrinker callouts to be converted to
      the count/scan API.  This is mainly a mechanical change.
      
      [glommer@openvz.org: use mult_frac for fractional proportions, build fixes]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • super: fix calculation of shrinkable objects for small numbers · 55f841ce
      Glauber Costa committed
      The sysctl knob sysctl_vfs_cache_pressure is used to determine which
      percentage of the shrinkable objects in our cache we should actively try
      to shrink.
      
      It works great in situations in which we have many objects (at least more
      than 100), because the approximation errors will be negligible.  But if
      this is not the case, especially when total_objects < 100, we may end up
      concluding that we have no objects at all (total / 100 = 0, if total <
      100).
      
      This is certainly not the biggest killer in the world, but may matter in
      very low kernel memory situations.
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
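The rounding problem described in this commit is easy to see with integer arithmetic. The Python sketch below is illustrative only: `naive_fraction()` is a hypothetical name for the divide-first calculation, and `mult_frac()` is a Python re-expression of a remainder-preserving calculation modelled on the kernel helper of the same name. That this particular fix uses that helper is an assumption here.

```python
def naive_fraction(total, numer, denom=100):
    """Divide first, then multiply: loses everything when total < denom."""
    return (total // denom) * numer


def mult_frac(x, numer, denom):
    """Compute x * numer / denom while keeping the remainder's contribution."""
    quot, rem = divmod(x, denom)
    return quot * numer + (rem * numer) // denom


pressure = 100  # hypothetical cache pressure setting, in percent

for total in (50, 99, 250):
    print(total, naive_fraction(total, pressure), mult_frac(total, pressure, 100))
```

With fewer than 100 objects the divide-first form reports zero shrinkable objects, while the remainder-preserving form still reports the expected share.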