1. 23 Feb 2011 (1 commit)
  2. 12 Jan 2011 (1 commit)
    • xfs: ensure log covering transactions are synchronous · c58efdb4
      Committed by Dave Chinner
      To ensure the log is covered and the filesystem idles correctly, we
      need to ensure that dummy transactions hit the disk and do not stay
      pinned in memory.  If the superblock is pinned in memory, it can't
      be flushed so the log covering cannot make progress. The result is
      dependent on timing - more often than not we continue to issue a
      log covering transaction every 36s rather than idling after ~90s.
      
      Fix this by making the log covering transaction synchronous. To
      avoid an additional log force from xfssyncd, make the log covering
      transaction take the place of the existing log force in the xfssyncd
      background sync process.
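
      A minimal sketch of the change (xfs_trans_set_sync() is the real
      helper XFS uses to mark a transaction synchronous; the surrounding
      lines are illustrative of the log covering path in this era):

          /* ... build the log covering (dummy) transaction as before ... */

          /* the fix: commit synchronously so the dummy transaction hits
           * the disk instead of leaving the superblock pinned in memory */
          xfs_trans_set_sync(tp);
          error = xfs_trans_commit(tp, 0);
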
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
  3. 17 Dec 2010 (1 commit)
    • xfs: reduce the number of AIL push wakeups · e677d0f9
      Committed by Dave Chinner
      The xfsaild often tries to rest to wait for congestion to pass or
      for IO to complete, but is regularly woken in tail-pushing
      situations. In severe cases, the xfsaild is getting woken tens of
      thousands of times a second. Reduce the number of needless wakeups
      by only waking the xfsaild if the new target is larger than the old
      one. Further, make short sleeps uninterruptible, as they occur when
      the xfsaild has decided it needs to back off to allow some IO to
      complete and being woken early is counter-productive.
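
      A hedged sketch of the wakeup filter (field and helper names follow
      the AIL code of this era, e.g. xfs_trans_ail_push(); treat the
      details as illustrative):

          void
          xfs_trans_ail_push(
              struct xfs_ail  *ailp,
              xfs_lsn_t       threshold_lsn)
          {
              /* only wake the xfsaild if the new target lies beyond the
               * current one; otherwise the thread is already pushing far
               * enough and the wakeup is pure overhead */
              if (XFS_LSN_CMP(threshold_lsn, ailp->xa_target) <= 0)
                  return;

              ailp->xa_target = threshold_lsn;
              wake_up_process(ailp->xa_task);
          }
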
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  4. 23 Dec 2010 (1 commit)
    • xfs: provide an inode iolock lockdep class · dcfcf205
      Committed by Dave Chinner
      The XFS iolock needs to be re-initialised to a new lock class before
      it enters reclaim to prevent lockdep false positives. Unfortunately,
      this is not sufficient protection as inodes in the XFS_IRECLAIMABLE
      state can be recycled and not re-initialised before being reused.
      
      We need to re-initialise the lock state when transferring out of
      XFS_IRECLAIMABLE state to XFS_INEW, but we need to keep the same
      class as if the inode was just allocated. Hence we need a specific
      lockdep class variable for the iolock so that both initialisations
      use the same class.
      
      While there, add a specific class for inodes in the reclaim state so
      that it is easy to tell from lockdep reports what state the inode
      was in that generated the report.
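
      A hedged sketch of the two classes (assuming the rwsem-backed
      mrlock used for the iolock in this era; the key and class names are
      illustrative):

          static struct lock_class_key xfs_iolock_active;
          static struct lock_class_key xfs_iolock_reclaimable;

          /* at first allocation AND on XFS_IRECLAIMABLE -> XFS_INEW reuse,
           * so both initialisation paths share one lockdep class: */
          lockdep_set_class_and_name(&ip->i_iolock.mr_lock,
                                     &xfs_iolock_active, "xfs_iolock_active");

          /* when the inode enters reclaim, switch to a dedicated class so
           * lockdep reports identify the inode state directly: */
          lockdep_set_class_and_name(&ip->i_iolock.mr_lock,
                                     &xfs_iolock_reclaimable,
                                     "xfs_iolock_reclaimable");
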
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  5. 13 Nov 2010 (2 commits)
    • block: clean up blkdev_get() wrappers and their users · d4d77629
      Committed by Tejun Heo
      After recent blkdev_get() modifications, open_by_devnum() and
      open_bdev_exclusive() are simple wrappers around blkdev_get().
      Replace them with blkdev_get_by_dev() and blkdev_get_by_path().
      
      blkdev_get_by_dev() is identical to open_by_devnum().
      blkdev_get_by_path() is slightly different in that it doesn't
      automatically add %FMODE_EXCL to @mode.
      
      All users are converted.  Most conversions are mechanical and don't
      introduce any behavior difference.  There are several exceptions.
      
      * btrfs now sets FMODE_EXCL in btrfs_device->mode, so there's no
        reason to OR it explicitly on blkdev_put().
      
      * gfs2, nilfs2 and the generic mount_bdev() now set FMODE_EXCL in
        sb->s_mode.
      
      * With the above changes, sb->s_mode should now always contain
        FMODE_EXCL.  WARN_ON_ONCE() is added to kill_block_super() to
        detect errors.
      
      The new blkdev_get_*() functions come with proper docbook comments.
      While at it, add a function description to blkdev_get() too.
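
      A hedged usage sketch of the two new wrappers (signatures as
      described above; the path and holder variable are illustrative):

          struct block_device *bdev;

          /* open by device number, no exclusive claim */
          bdev = blkdev_get_by_dev(dev, FMODE_READ | FMODE_WRITE, NULL);

          /* open by path; FMODE_EXCL is no longer implied, so exclusive
           * openers must pass it themselves together with a holder */
          bdev = blkdev_get_by_path("/dev/sda1",
                                    FMODE_READ | FMODE_WRITE | FMODE_EXCL,
                                    holder);
          if (IS_ERR(bdev))
              return PTR_ERR(bdev);
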
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Joern Engel <joern@lazybastard.org>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: reiserfs-devel@vger.kernel.org
      Cc: xfs-masters@oss.sgi.com
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    • block: make blkdev_get/put() handle exclusive access · e525fd89
      Committed by Tejun Heo
      Over time, the block layer has accumulated a set of APIs dealing
      with bdev
      open, close, claim and release.
      
      * blkdev_get/put() are the primary open and close functions.
      
      * bd_claim/release() deal with exclusive open.
      
      * open/close_bdev_exclusive() are combinations of open and claim and
        the other way around, respectively.
      
      * bd_link/unlink_disk_holder() to create and remove holder/slave
        symlinks.
      
      * open_by_devnum() wraps bdget() + blkdev_get().
      
      The interface is a bit confusing and the decoupling of open and claim
      makes it impossible to properly guarantee exclusive access, as an
      in-kernel open + claim sequence can disturb an existing exclusive open
      even before the block layer knows whether the current open is for
      another exclusive access.  Reorganize the interface such that,
      
      * blkdev_get() is extended to include exclusive access management.
        A @holder argument is added and, if @FMODE_EXCL is specified, it
        will gain exclusive access atomically w.r.t. other exclusive
        accesses.
      
      * blkdev_put() is similarly extended.  It now takes a @mode argument
        and
        if @FMODE_EXCL is set, it releases an exclusive access.  Also, when
        the last exclusive claim is released, the holder/slave symlinks are
        removed automatically.
      
      * bd_claim/release() and close_bdev_exclusive() are no longer
        necessary and either made static or removed.
      
      * bd_link_disk_holder() remains the same but bd_unlink_disk_holder()
        is no longer necessary and removed.
      
      * open_bdev_exclusive() becomes a simple wrapper around lookup_bdev()
        and blkdev_get().  It also has an unexpected extra bdev_read_only()
        test which probably should be moved into blkdev_get().
      
      * open_by_devnum() is modified to take @holder argument and pass it to
        blkdev_get().
      
      Most bdev open/close operations are unified into blkdev_get/put()
      and most exclusive accesses are tested atomically at open time (as
      they should be).  This cleans up the code and removes some corner
      cases - both valid and invalid, but unnecessary all the same.
      
      open_bdev_exclusive() and open_by_devnum() can use further cleanup -
      rename to blkdev_get_by_path() and blkdev_get_by_devt() and drop
      special features.  Well, let's leave them for another day.
      
      Most conversions are straightforward.  The drbd conversion is a bit
      more involved as there was some reordering, but the logic should
      stay the same.
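
      A hedged sketch of the reorganized open/claim semantics (the holder
      cookie is illustrative; any unique address works):

          static int my_claim;    /* illustrative holder cookie */

          /* open and claim atomically: fails if another holder already
           * has an exclusive claim on the device */
          ret = blkdev_get(bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL,
                           &my_claim);

          /* the matching put must pass FMODE_EXCL to release the claim;
           * holder/slave symlinks go away with the last exclusive claim */
          blkdev_put(bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
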
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Neil Brown <neilb@suse.de>
      Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: Peter Osterlund <petero2@telia.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <joel.becker@oracle.com>
      Cc: Alex Elder <aelder@sgi.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: dm-devel@redhat.com
      Cc: drbd-dev@lists.linbit.com
      Cc: Leo Chen <leochen@broadcom.com>
      Cc: Scott Branden <sbranden@broadcom.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      Cc: Joern Engel <joern@logfs.org>
      Cc: reiserfs-devel@vger.kernel.org
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
  6. 11 Nov 2010 (1 commit)
  7. 02 Nov 2010 (1 commit)
  8. 29 Oct 2010 (1 commit)
  9. 26 Oct 2010 (1 commit)
  10. 19 Oct 2010 (5 commits)
  11. 17 Sep 2010 (1 commit)
    • block: remove BLKDEV_IFL_WAIT · dd3932ed
      Committed by Christoph Hellwig
      All the blkdev_issue_* helpers can only sanely be used for synchronous
      callers.  To issue cache flushes or barriers asynchronously the caller
      needs
      to set up a bio by itself with a completion callback to move the asynchronous
      state machine ahead.  So drop the BLKDEV_IFL_WAIT flag that is always
      specified when calling blkdev_issue_* and also remove the now unused flags
      argument to blkdev_issue_flush and blkdev_issue_zeroout.  For
      blkdev_issue_discard we need to keep it for the secure discard flag, which
      gains a more descriptive name and loses the bitops vs flag confusion.
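
      A hedged before/after sketch of the calling convention:

          /* before: waiting behaviour selected by a flag */
          ret = blkdev_issue_flush(bdev, GFP_KERNEL, NULL, BLKDEV_IFL_WAIT);

          /* after: the helpers are always synchronous, flags dropped */
          ret = blkdev_issue_flush(bdev, GFP_KERNEL, NULL);

          /* discard keeps a flags argument for secure discard, now under
           * a more descriptive name */
          ret = blkdev_issue_discard(bdev, sector, nr_sects, GFP_KERNEL,
                                     BLKDEV_DISCARD_SECURE);
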
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  12. 24 Aug 2010 (2 commits)
    • xfs: dummy transactions should not dirty VFS state · 1a387d3b
      Committed by Dave Chinner
      When we need to cover the log, we issue dummy transactions to ensure
      the current log tail is on disk. Unfortunately we currently use the
      root inode in the dummy transaction, and the act of committing the
      transaction dirties the inode at the VFS level.
      
      As a result, the VFS writeback of the dirty inode will prevent the
      filesystem from idling long enough for the log covering state
      machine to complete. The state machine gets stuck in a loop issuing
      new dummy transactions to cover the log and never makes progress.
      
      To avoid this problem, the dummy transactions should not cause
      externally visible state changes. To ensure this occurs, make sure
      that dummy transactions log an unchanging field in the superblock, as
      its state is never propagated outside the filesystem. This allows
      the log covering state machine to complete successfully and the
      filesystem now correctly enters a fully idle state about 90s after
      the last modification was made.
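
      A hedged sketch of the reworked dummy transaction (names follow the
      XFS code of this era; sb_uuid is a natural choice of unchanging
      field since the filesystem UUID never changes after mkfs):

          tp = _xfs_trans_alloc(mp, XFS_TRANS_DUMMY1, KM_SLEEP);
          error = xfs_trans_reserve(tp, 0, mp->m_sb.sb_sectsize + 128,
                                    0, 0, 0);
          if (error) {
              xfs_trans_cancel(tp, 0);
              return error;
          }

          /* log an unchanging superblock field instead of the root
           * inode, so nothing is dirtied at the VFS level */
          xfs_mod_sb(tp, XFS_SB_UUID);
          error = xfs_trans_commit(tp, 0);
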
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: ensure f_ffree returned by statfs() is non-negative · 2fe33661
      Committed by Stuart Brodsky
      Because of delayed updates to the sb_icount field in the super block,
      it is possible to allocate over maxicount number of inodes.  This
      causes the arithmetic to calculate a negative number of free inodes
      in user commands like df or stat -f.
      
      Since maxicount is a somewhat arbitrary number, a slight
      over-allocation is not critical, but user commands should display 0 or
      greater and never go negative.  To do this the value in the stats
      buffer f_ffree is capped to never go negative.
      
      [ Modified to use max_t as per Christoph's comment. ]
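
      A hedged sketch of the cap in xfs_fs_statfs() (field names follow
      the XFS superblock; the exact expression is illustrative):

          __int64_t ffree;

          /* free inodes = permitted maximum minus allocated-and-in-use */
          ffree = statp->f_files - (sbp->sb_icount - sbp->sb_ifree);

          /* over-allocation past maxicount must never show as negative */
          statp->f_ffree = max_t(__int64_t, ffree, 0);
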
      Signed-off-by: Stu Brodsky <sbrodsky@sgi.com>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
  13. 10 Aug 2010 (1 commit)
  14. 27 Jul 2010 (10 commits)
    • xfs: remove obsolete osyncisosync mount option · a64afb05
      Committed by Christoph Hellwig
      Since Linux 2.6.33 the kernel has support for real O_SYNC, which made
      the osyncisosync option a no-op.  Warn the users about this and remove
      the mount flag for it.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: move inode shrinker unregister even earlier · a4190f90
      Committed by Dave Chinner
      I missed Dave Chinner's second revision of this change, and pushed
      his first version out to the repository instead.
      
      	commit a476c59ebb279d738718edc0e3fb76aab3687114
      	Author: Dave Chinner <dchinner@redhat.com>
      
      This commit compensates for that by moving a block of code up a bit
      further, with a result that matches the effect of Dave's second
      version.
      
      Dave's first version was:
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Dave's second version was:
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
    • xfs: unregister inode shrinker before freeing filesystem structures · 2727ccc9
      Committed by Dave Chinner
      Currently we don't remove the XFS mount from the shrinker list until
      late in the unmount path. By this time, we have already torn down
      the internals of the filesystem (e.g. the per-ag structures), and
      hence if the shrinker is executed between the teardown and the
      unregistering, the shrinker will get NULL per-ag structure pointers
      and panic trying to dereference them.
      
      Fix this by removing the xfs mount from the shrinker list before
      tearing down its internal structures.
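
      A hedged sketch of the corrected unmount ordering (the helper names
      are illustrative of the XFS unmount path in this era):

          /* first: guarantee no shrinker callback can run any more */
          xfs_inode_shrinker_unregister(mp);

          /* only then tear down the per-ag structures the shrinker
           * would otherwise dereference */
          xfs_free_perag(mp);
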
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
    • xfs: split xfs_itrace_entry · cca28fb8
      Committed by Christoph Hellwig
      Replace the xfs_itrace_entry catchall with specific trace points.  For
      most simple callers we now use the simple inode class, which used to
      be the iget class, but add more detailed tracing for namespace events,
      which now includes the name of the directory entries manipulated.
      
      Remove the xfs_inactive trace point, which is a duplicate of the clear_inode
      one, and the xfs_change_file_space trace point, which is immediately
      followed by the more specific alloc/free space trace points.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: remove explicit xfs_sync_data/xfs_sync_attr calls on umount · 64c86149
      Committed by Christoph Hellwig
      On the final put of a superblock the VFS already calls sync_filesystem
      for us to write out all data and wait for it.  No need to start another
      asynchronous writeback inside ->put_super.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: avoid synchronous transaction in xfs_fs_write_inode · 7a36c8a9
      Committed by Christoph Hellwig
      We already rely on the fact that the sync code will cause a synchronous
      log force later on (currently via xfs_fs_sync_fs -> xfs_quiesce_data ->
      xfs_sync_data), so no need to do this here.  This allows us to avoid
      a lot of synchronous log forces during sync, which pays off especially
      with delayed logging enabled.  Some compilebench numbers that show
      this:
      
      xfs (delayed logging, 256k logbufs)
      ===================================
      
      initial create		  25.94 MB/s	  25.75 MB/s	  25.64 MB/s
      create			   8.54 MB/s	   9.12 MB/s	   9.15 MB/s
      patch			   2.47 MB/s	   2.47 MB/s	   3.17 MB/s
      compile			  29.65 MB/s	  30.51 MB/s	  27.33 MB/s
      clean			  90.92 MB/s	  98.83 MB/s	 128.87 MB/s
      read tree		  11.90 MB/s	  11.84 MB/s	   8.56 MB/s
      read compiled		  28.75 MB/s	  29.96 MB/s	  24.25 MB/s
      delete tree		8.39 seconds	8.12 seconds	8.46 seconds
      delete compiled		8.35 seconds	8.44 seconds	5.11 seconds
      stat tree		6.03 seconds	5.59 seconds	5.19 seconds
      stat compiled tree	9.00 seconds	9.52 seconds	8.49 seconds
      
      xfs + write_inode log_force removal
      ===================================
      initial create		  25.87 MB/s	  25.76 MB/s	  25.87 MB/s
      create			  15.18 MB/s	  14.80 MB/s	  14.94 MB/s
      patch			   3.13 MB/s	   3.14 MB/s	   3.11 MB/s
      compile			  36.74 MB/s	  37.17 MB/s	  36.84 MB/s
      clean			 226.02 MB/s	 222.58 MB/s	 217.94 MB/s
      read tree		  15.14 MB/s	  15.02 MB/s	  15.14 MB/s
      read compiled tree	  29.30 MB/s	  29.31 MB/s	  29.32 MB/s
      delete tree		6.22 seconds	6.14 seconds	6.15 seconds
      delete compiled tree	5.75 seconds	5.92 seconds	5.81 seconds
      stat tree		4.60 seconds	4.51 seconds	4.56 seconds
      stat compiled tree	4.07 seconds	3.87 seconds	3.96 seconds
      
      In addition to that, also remove the delwri inode flush that is
      unnecessary now that bulkstat is always coherent.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: simplify inode to transaction joining · 898621d5
      Committed by Christoph Hellwig
      Currently we need to either call IHOLD or xfs_trans_ihold on an inode when
      joining it to a transaction via xfs_trans_ijoin.
      
      This patch instead makes xfs_trans_ijoin usable on its own by doing
      an implicit xfs_trans_ihold, which also allows us to drop the third
      argument.  For the case where we want to hold a reference on the
      inode, an xfs_trans_ijoin_ref wrapper is added which does the IHOLD
      and marks the inode for needing an xfs_iput.  In addition to the
      cleaner interface to the caller this also simplifies the
      implementation.
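
      A hedged before/after sketch of the interface change:

          /* before: join, then separately hold */
          xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
          xfs_trans_ihold(tp, ip);

          /* after: the two-argument join implies the hold ... */
          xfs_trans_ijoin(tp, ip);

          /* ... and the _ref variant additionally takes a reference and
           * marks the inode as needing an xfs_iput */
          xfs_trans_ijoin_ref(tp, ip, XFS_ILOCK_EXCL);
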
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: simplify log item descriptor tracking · e98c414f
      Committed by Christoph Hellwig
      Currently we track log item descriptors belonging to a transaction
      using a
      complex opencoded chunk allocator.  This code has been there since day one
      and seems to work around the lack of an efficient slab allocator.
      
      This patch replaces it with dynamically allocated log item descriptors
      from a dedicated slab pool, linked to the transaction by a linked list.
      
      This allows us to greatly simplify the log item descriptor tracking
      to the point where it's just a couple hundred lines in xfs_trans.c
      instead of a separate file.  The external API has also been simplified
      while we're at it - the xfs_trans_add_item and xfs_trans_del_item
      functions to add/delete items from a transaction have been simplified
      to the bare minimum, and the xfs_trans_find_item function is replaced
      with a direct dereference
      of the li_desc field.  All debug code walking the list of log items in
      a transaction is down to a simple list_for_each_entry.
      
      Note that we could easily use a singly linked list here instead of the
      doubly linked list from list.h as the fastpath only does deletion from
      sequential traversal.  But given that we don't have one available as
      a library function yet I use the list.h functions for simplicity.
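
      A hedged sketch of the simplified descriptor (shape as described
      above; the real definition carries a flags field as well):

          struct xfs_log_item_desc {
              struct xfs_log_item  *lid_item;
              struct list_head     lid_trans;  /* entry in tp->t_items */
          };

          /* debug walks reduce to a plain list iteration: */
          struct xfs_log_item_desc *lidp;

          list_for_each_entry(lidp, &tp->t_items, lid_trans)
              ASSERT(lidp->lid_item->li_ops != NULL);
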
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: remove unneeded #include statements · 3400777f
      Committed by Christoph Hellwig
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <david@fromorbit.com>
    • xfs: drop dmapi hooks · 288699fe
      Committed by Christoph Hellwig
      Dmapi support was never merged upstream, but we still have a lot of
      hooks bloating XFS for it, all over the fast paths of the filesystem.
      
      This patch drops over 700 lines of dmapi overhead.  If we ever get HSM
      support in mainline, at least the namespace events can be done much
      more sanely in the VFS instead of the individual filesystem, so it's
      not like this is much help for future work.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
  15. 20 Jul 2010 (1 commit)
  16. 24 May 2010 (2 commits)
    • xfs: Introduce delayed logging core code · 71e330b5
      Committed by Dave Chinner
      The delayed logging code only changes in-memory structures and as
      such can be enabled and disabled with a mount option. Add the mount
      option and emit a warning that this is an experimental feature that
      should not be used in production yet.
      
      We also need infrastructure to track committed items that have not
      yet been written to the log. This is what the Committed Item List
      (CIL) is for.
      
      The log item also needs to be extended to track the current log
      vector, the associated memory buffer and its location in the
      Committed Item List. Extend the log item and log vector structures
      to enable
      this tracking.
      
      To maintain the current log format for transactions with delayed
      logging, we need to introduce a checkpoint transaction and a context
      for tracking each checkpoint from initiation to transaction
      completion.  This includes adding a log ticket for tracking the log
      space required/used by the context checkpoint.
      
      To track all the changes we need an io vector array per log item,
      rather than a single array for the entire transaction. Using the new
      log vector structure for this requires two passes - the first to
      allocate the log vector structures and chain them together, and the
      second to fill them out.  This log vector chain can then be passed
      to the CIL for formatting, pinning and insertion into the CIL.
      
      Formatting of the log vector chain is relatively simple - it's just
      a loop over the iovecs on each log vector, but it is made slightly
      more complex because we re-write the iovec after the copy to point
      back at the memory buffer we just copied into.
      
      This code also needs to pin log items. If the log item is not
      already tracked in this checkpoint context, then it needs to be
      pinned. Otherwise it is already pinned and we don't need to pin it
      again.
      
      The only other complexity is calculating the amount of new log space
      the formatting has consumed. This needs to be accounted to the
      transaction in progress, and the accounting is made more complex
      because we also need to steal space from it for log metadata in the
      checkpoint transaction. Calculate all this at insert time and update
      all the tickets, counters, etc correctly.
      
      Once we've formatted all the log items in the transaction, attach
      the busy extents to the checkpoint context so the busy extents live
      until checkpoint completion and can be processed at that point in
      time. Transactions can then be freed at this point in time.
      
      Now we need to issue checkpoints - we are tracking the amount of log
      space used by the items in the CIL, so we can trigger background
      checkpoints when the space usage gets to a certain threshold.
      Otherwise, checkpoints need to be triggered when a log synchronisation
      point is reached - a log force event.
      
      Because the log write code already handles chained log vectors,
      writing the transaction is trivial, too. Construct a transaction
      header, add it
      to the head of the chain and write it into the log, then issue a
      commit record write. Then we can release the checkpoint log ticket
      and attach the context to the log buffer so it can be called during
      Io completion to complete the checkpoint.
      
      We also need to allow for synchronising multiple in-flight
      checkpoints. This is needed for two things - the first is to ensure
      that checkpoint commit records appear in the log in the correct
      sequence order (so they are replayed in the correct order). The
      second is so that xfs_log_force_lsn() operates correctly and only
      flushes and/or waits for the specific sequence it was provided with.
      
      To do this we need a wait variable and a list tracking the
      checkpoint commits in progress. We can walk this list and wait for
      the checkpoints to change state or complete easily, and this provides
      the necessary synchronisation for correct operation in both cases.
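
      A hedged, condensed sketch of the two core structures the text
      describes (the real definitions live in the XFS log headers and
      carry more fields; the names here are illustrative):

          /* one checkpoint, from initiation to commit completion */
          struct xfs_cil_ctx {
              struct xfs_cil     *cil;
              xfs_lsn_t          sequence;      /* chkpt ordering */
              struct xlog_ticket *ticket;       /* chkpt log space */
              struct list_head   busy_extents;  /* live to completion */
              struct list_head   committing;    /* in-flight chkpt list */
          };

          /* the Committed Item List itself */
          struct xfs_cil {
              struct log          *xc_log;
              struct list_head    xc_cil;        /* committed log items */
              struct xfs_cil_ctx  *xc_ctx;       /* current checkpoint */
              struct list_head    xc_committing; /* for xfs_log_force_lsn */
          };
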
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
    • xfs: Clean up XFS_BLI_* flag namespace · c1155410
      Committed by Dave Chinner
      Clean up the buffer log format (XFS_BLI_*) flags because they have a
      polluted namespace. The XFS_BLI_ prefix is used for both in-memory
      and on-disk flag fields, but with overlapping values for different
      flags. Rename the buffer log format flags to use the XFS_BLF_*
      prefix to avoid confusing them with the in-memory XFS_BLI_* prefixed
      flags.
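
      A hedged example of the rename (one representative flag; the value
      is illustrative):

          /* before: ambiguous - same prefix as the in-memory item flags */
          #define XFS_BLI_INODE_BUF   0x1

          /* after: BLF marks the on-disk buffer log format namespace */
          #define XFS_BLF_INODE_BUF   0x1
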
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
  17. 19 May 2010 (2 commits)
  18. 30 Apr 2010 (1 commit)
    • xfs: add a shrinker to background inode reclaim · 9bf729c0
      Committed by Dave Chinner
      On low memory boxes or those with highmem, the kernel can OOM before
      the background reclaim of inodes via xfssyncd kicks in. Add a shrinker
      to run inode reclaim so that inode reclaim is expedited when memory is
      low.
      
      This is more complex than it needs to be because the VM folk don't
      want a context added to the shrinker infrastructure. Hence we need
      to add a global list of XFS mount structures so the shrinker can
      traverse them.
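
      A hedged sketch of the shrinker (the callback signature matches
      kernels of this era; xfs_mount_list is the global-list workaround
      described above, and the reclaimable-count helper is a placeholder):

          static int
          xfs_reclaim_inode_shrink(int nr_to_scan, gfp_t gfp_mask)
          {
              struct xfs_mount *mp;
              int              reclaimable = 0;

              if (nr_to_scan && !(gfp_mask & __GFP_FS))
                  return -1;      /* must not recurse into the fs */

              /* no shrinker context is available, so walk the global
               * list of mounted XFS filesystems */
              list_for_each_entry(mp, &xfs_mount_list, m_mplist) {
                  if (nr_to_scan)
                      xfs_reclaim_inodes(mp, 0);
                  reclaimable += xfs_reclaimable_inodes(mp); /* placeholder */
              }
              return reclaimable;
          }

          static struct shrinker xfs_inode_shrinker = {
              .shrink = xfs_reclaim_inode_shrink,
              .seeks  = DEFAULT_SEEKS,
          };

          register_shrinker(&xfs_inode_shrinker);
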
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  19. 29 Apr 2010 (1 commit)
  20. 30 Mar 2010 (1 commit)
    • include cleanup: Update gfp.h and slab.h includes to prepare for
      breaking implicit slab.h inclusion from percpu.h · 5a0e3ad6
      Committed by Tejun Heo
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include
      those headers directly instead of assuming availability.  As this
      conversion needs to touch a large number of source files, the
      following script is used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
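
      For a typical .c file the resulting edit is tiny; the point is that
      the dependency becomes explicit rather than inherited (illustrative):

          /* before: kmalloc() was only visible via percpu.h -> slab.h */
          #include <linux/percpu.h>

          /* after: state the dependency directly */
          #include <linux/percpu.h>
          #include <linux/slab.h>   /* kmalloc()/kfree() used below */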
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there, i.e. if only gfp is used,
        gfp.h; if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order
        conforms to its surroundings.  It's put in the include block which
        contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have a fitting include block), it prints
        out an error message indicating which .h file needs to be added to
        the file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition, while for others adding it to an
         implementation .h or embedding .c file was more appropriate.  This
         step added inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on
      step 6, I'm fairly confident about the coverage of this conversion
      patch.  If there is a breakage, it's likely to be something in one of
      the arch headers which should be discoverable easily on most builds
      of the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  21. 06 Mar 2010 (2 commits)
  22. 09 Feb 2010 (1 commit)
    • xfs: log changed inodes instead of writing them synchronously · 07fec736
      Committed by Christoph Hellwig
      When an inode has already been flushed by delayed write,
      xfs_inode_clean() returns true and hence xfs_fs_write_inode() can
      return on a synchronous inode write without having written the
      inode. Currently these synchronous writes only come from sync(1),
      unmount, a synchronous NFS export and cachefiles, so they should be
      relatively rare and out of common performance paths.
      
      Realistically, a synchronous inode write is not necessary here; we
      can avoid writing the inode by logging any non-transactional changes
      that are pending.  This needs to be done with synchronous
      transactions, but it avoids seeking between the log and inode
      clusters as we do now. We don't force the log if the inode is
      pinned, though, so this differs from the fsync case.  For normal
      sys_sync and unmount behaviour this is fine because we do a
      synchronous log force in xfs_sync_data which is called from the
      ->sync_fs code.
      
      It does however break the NFS synchronous export guarantees for now,
      but work is under way to fix this at a higher level or for the
      higher level to provide an additional flag in the writeback control
      to tell us that a log force is needed.
      
      Portions of this patch are based on work from Dave Chinner.
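
      A hedged sketch of the logging helper this describes (modelled on
      the xfs_log_inode() shape of this era; the reservation names follow
      the existing fsync transaction, and details may differ):

          STATIC int
          xfs_log_inode(
              struct xfs_inode    *ip)
          {
              struct xfs_mount    *mp = ip->i_mount;
              struct xfs_trans    *tp;
              int                 error;

              tp = xfs_trans_alloc(mp, XFS_TRANS_FSYNC_TS);
              error = xfs_trans_reserve(tp, 0, XFS_FSYNC_TS_LOG_RES(mp),
                                        0, 0, 0);
              if (error) {
                  xfs_trans_cancel(tp, 0);
                  return error;
              }

              xfs_ilock(ip, XFS_ILOCK_EXCL);
              xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
              xfs_trans_ihold(tp, ip);

              /* log the pending non-transactional changes and push them
               * out with a synchronous commit - no inode write needed */
              xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
              xfs_trans_set_sync(tp);
              error = xfs_trans_commit(tp, 0);

              xfs_iunlock(ip, XFS_ILOCK_EXCL);
              return error;
          }
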
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Reviewed-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Alex Elder <aelder@sgi.com>