1. 02 9月, 2010 1 次提交
    • D
      xfs: improve buffer cache hash scalability · 9bc08a45
      Dave Chinner 提交于
      When doing large parallel file creates on a 16p machines, large amounts of
      time is being spent in _xfs_buf_find(). A system wide profile with perf top
      shows this:
      
                1134740.00 19.3% _xfs_buf_find
                 733142.00 12.5% __ticket_spin_lock
      
      The problem is that the hash contains 45,000 buffers, and the hash table width
      is only 256 buffers. That means we've got around 200 buffers per chain, and
      searching it is quite expensive. The hash table size needs to increase.
      
      Secondly, every time we do a lookup, we promote the buffer we find to the head
      of the hash chain. This is causing cachelines to be dirtied and causes
      invalidation of cachelines across all CPUs that may have walked the hash chain
      recently. hence every walk of the hash chain is effectively a cold cache walk.
      Remove the promotion to avoid this invalidation.
      
      The results are:
      
                1045043.00 21.2% __ticket_spin_lock
                 326184.00  6.6% _xfs_buf_find
      
      A 70% drop in the CPU usage when looking up buffers. Unfortunately that does
      not result in an increase in performance underthis workload as contention on
      the inode_lock soaks up most of the reduction in CPU usage.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      9bc08a45
  2. 24 8月, 2010 4 次提交
    • C
      xfs: do not discard page cache data on EAGAIN · b5420f23
      Christoph Hellwig 提交于
      If xfs_map_blocks returns EAGAIN because of lock contention we must redirty the
      page and not disard the pagecache content and return an error from writepage.
      We used to do this correctly, but the logic got lost during the recent
      reshuffle of the writepage code.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reported-by: NMike Gao <ygao.linux@gmail.com>
      Tested-by: NMike Gao <ygao.linux@gmail.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      b5420f23
    • D
      xfs: dummy transactions should not dirty VFS state · 1a387d3b
      Dave Chinner 提交于
      When we  need to cover the log, we issue dummy transactions to ensure
      the current log tail is on disk. Unfortunately we currently use the
      root inode in the dummy transaction, and the act of committing the
      transaction dirties the inode at the VFS level.
      
      As a result, the VFS writeback of the dirty inode will prevent the
      filesystem from idling long enough for the log covering state
      machine to complete. The state machine gets stuck in a loop issuing
      new dummy transactions to cover the log and never makes progress.
      
      To avoid this problem, the dummy transactions should not cause
      externally visible state changes. To ensure this occurs, make sure
      that dummy transactions log an unchanging field in the superblock as
      it's state is never propagated outside the filesystem. This allows
      the log covering state machine to complete successfully and the
      filesystem now correctly enters a fully idle state about 90s after
      the last modification was made.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      1a387d3b
    • S
      xfs: ensure f_ffree returned by statfs() is non-negative · 2fe33661
      Stuart Brodsky 提交于
      Because of delayed updates to sb_icount field in the super block, it
      is possible to allocate over maxicount number of inodes.  This
      causes the arithmetic to calculate a negative number of free inodes
      in user commands like df or stat -f.
      
      Since maxicount is a somewhat arbitrary number, a slight over
      allocation is not critical but user commands should be displayed as
      0 or greater and never go negative.  To do this the value in the
      stats buffer f_ffree is capped to never go negative.
      
      [ Modified to use max_t as per Christoph's comment. ]
      Signed-off-by: NStu Brodsky <sbrodsky@sgi.com>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      2fe33661
    • D
      xfs: handle negative wbc->nr_to_write during sync writeback · efceab1d
      Dave Chinner 提交于
      During data integrity (WB_SYNC_ALL) writeback, wbc->nr_to_write will
      go negative on inodes with more than 1024 dirty pages due to
      implementation details of write_cache_pages(). Currently XFS will
      abort page clustering in writeback once nr_to_write drops below
      zero, and so for data integrity writeback we will do very
      inefficient page at a time allocation and IO submission for inodes
      with large numbers of dirty pages.
      
      Fix this by only aborting the page clustering code when
      wbc->nr_to_write is negative and the sync mode is WB_SYNC_NONE.
      
      Cc: <stable@kernel.org>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      efceab1d
  3. 10 8月, 2010 5 次提交
    • A
      convert remaining ->clear_inode() to ->evict_inode() · b57922d9
      Al Viro 提交于
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      b57922d9
    • A
      simplify checks for I_CLEAR/I_FREEING · a4ffdde6
      Al Viro 提交于
      add I_CLEAR instead of replacing I_FREEING with it.  I_CLEAR is
      equivalent to I_FREEING for almost all code looking at either;
      it's there to keep track of having called clear_inode() exactly
      once per inode lifetime, at some point after having set I_FREEING.
      I_CLEAR and I_FREEING never get set at the same time with the
      current code, so we can switch to setting i_flags to I_FREEING | I_CLEAR
      instead of I_CLEAR without loss of information.  As the result of
      such change, checks become simpler and the amount of code that needs
      to know about I_CLEAR shrinks a lot.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      a4ffdde6
    • C
      xfs: new truncate sequence · fa9b227e
      Christoph Hellwig 提交于
      Convert XFS to the new truncate sequence.  We still can have errors after
      updating the file size in xfs_setattr, but these are real I/O errors and lead
      to a transaction abort and filesystem shutdown, so they are not an issue.
      
      Errors from ->write_begin and write_end can now be handled correctly because
      we can actually get rid of the delalloc extents while previous the buffer
      state was stipped in block_invalidatepage.
      
      There is still no error handling for ->direct_IO, because doing so will need
      some major restructuring given that we only have the iolock shared and do not
      hold i_mutex at all.  Fortunately leaving the normally allocated blocks behind
      there is not a major issue and this will get cleaned up by xfs_free_eofblock
      later.
      
      Note: the patch is against Al's vfs.git tree as that contains the nessecary
      preparations.  I'd prefer to get it applied there so that we can get some
      testing in linux-next.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      fa9b227e
    • C
      get rid of block_write_begin_newtrunc · 155130a4
      Christoph Hellwig 提交于
      Move the call to vmtruncate to get rid of accessive blocks to the callers
      in preparation of the new truncate sequence and rename the non-truncating
      version to block_write_begin.
      
      While we're at it also remove several unused arguments to block_write_begin.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      155130a4
    • C
      sort out blockdev_direct_IO variants · eafdc7d1
      Christoph Hellwig 提交于
      Move the call to vmtruncate to get rid of accessive blocks to the callers
      in prepearation of the new truncate calling sequence.  This was only done
      for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant
      was not needed anyway.  Get rid of blockdev_direct_IO_no_locking and
      its _newtrunc variant while at it as just opencoding the two additional
      paramters is shorted than the name suffix.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      eafdc7d1
  4. 27 7月, 2010 30 次提交
    • C
      direct-io: move aio_complete into ->end_io · 552ef802
      Christoph Hellwig 提交于
      Filesystems with unwritten extent support must not complete an AIO request
      until the transaction to convert the extent has been commited.  That means
      the aio_complete calls needs to be moved into the ->end_io callback so
      that the filesystem can control when to call it exactly.
      
      This makes a bit of a mess out of dio_complete and the ->end_io callback
      prototype even more complicated. 
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: Jan Kara <jack@suse.cz> 
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      552ef802
    • C
      xfs simplify and speed up direct I/O completions · 209fb87a
      Christoph Hellwig 提交于
      Our current handling of direct I/O completions is rather suboptimal,
      because we defer it to a workqueue more often than needed, and we
      perform a much to aggressive flush of the workqueue in case unwritten
      extent conversions happen.
      
      This patch changes the direct I/O reads to not even use a completion
      handler, as we don't bother to use it at all, and to perform the unwritten
      extent conversions in caller context for synchronous direct I/O.
      
      For a small I/O size direct I/O workload on a consumer grade SSD, such as
      the untar of a kernel tree inside qemu this patch gives speedups of
      about 5%.  Getting us much closer to the speed of a native block device,
      or a fully allocated XFS file.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      209fb87a
    • C
      xfs: move aio completion after unwritten extent conversion · fb511f21
      Christoph Hellwig 提交于
      If we write into an unwritten extent using AIO we need to complete the AIO
      request after the extent conversion has finished.  Without that a read could
      race to see see the extent still unwritten and return zeros.   For synchronous
      I/O we already take care of that by flushing the xfsconvertd workqueue (which
      might be a bit of overkill).
      
      To do that add iocb and result fields to struct xfs_ioend, so that we can
      call aio_complete from xfs_end_io after the extent conversion has happened.
      Note that we need a new result field as io_error is used for positive errno
      values, while the AIO code can return negative error values and positive
      transfer sizes.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      fb511f21
    • C
      direct-io: move aio_complete into ->end_io · 40e2e973
      Christoph Hellwig 提交于
      Filesystems with unwritten extent support must not complete an AIO request
      until the transaction to convert the extent has been commited.  That means
      the aio_complete calls needs to be moved into the ->end_io callback so
      that the filesystem can control when to call it exactly.
      
      This makes a bit of a mess out of dio_complete and the ->end_io callback
      prototype even more complicated.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      40e2e973
    • C
      xfs: kill the b_strat callback in xfs_buf · 939d723b
      Christoph Hellwig 提交于
      The b_strat callback is used by xfs_buf_iostrategy to perform additional
      checks before submitting a buffer.  It is used in xfs_bwrite and when
      writing out delayed buffers.  In xfs_bwrite it we can de-virtualize the
      call easily as b_strat is set a few lines above the call to
      xfs_buf_iostrategy.  For the delayed buffers the rationale is a bit
      more complicated:
      
       - there are three callers of xfs_buf_delwri_queue, which places buffers
         on the delwri list:
          (1) xfs_bdwrite - this sets up b_strat, so it's fine
          (2) xfs_buf_iorequest.  None of the callers can have XBF_DELWRI set:
      	- xlog_bdstrat is only used for log buffers, which are never delwri
      	- _xfs_buf_read explicitly clears the delwri flag
      	- xfs_buf_iodone_work retries log buffers only
      	- xfsbdstrat - only used for reads, superblock writes without the
      	  delwri flag, log I/O and file zeroing with explicitly allocated
      	  buffers.
      	- xfs_buf_iostrategy - only calls xfs_buf_iorequest if b_strat is
      	  not set
          (3) xfs_buf_unlock
      	- only puts the buffer on the delwri list if the DELWRI flag is
      	  already set.  The DELWRI flag is only ever set in xfs_bwrite,
      	  xfs_buf_iodone_callbacks, or xfs_trans_log_buf.  For
      	  xfs_buf_iodone_callbacks and xfs_trans_log_buf we require
      	  an initialized buf item, which means b_strat was set to
      	  xfs_bdstrat_cb in xfs_buf_item_init.
      
      Conclusion: we can just get rid of the callback and replace it with
      explicit calls to xfs_bdstrat_cb.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      939d723b
    • C
      xfs: remove obsolete osyncisosync mount option · a64afb05
      Christoph Hellwig 提交于
      Since Linux 2.6.33 the kernel has support for real O_SYNC, which made
      the osyncisosync option a no-op.  Warn the users about this and remove
      the mount flag for it.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      a64afb05
    • T
      xfs: Fix build when CONFIG_XFS_POSIX_ACL=n · 0f1a932f
      Tony Luck 提交于
      When CONFIG_XFS_POSIX_ACL is not set "xfs_check_acl" is #defined
      to NULL - which breaks the code attempting to add a tracepoint
      on this function.
      
      Only define the tracepoint when the function exists.
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      0f1a932f
    • D
      xfs: use GFP_NOFS for page cache allocation · aea1b953
      Dave Chinner 提交于
      Avoid a lockdep warning by preventing page cache allocation from
      recursing back into the filesystem during memory reclaim.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      aea1b953
    • D
      xfs: simplify and remove xfs_ireclaim · 2f11feab
      Dave Chinner 提交于
      xfs_ireclaim has to get and put te pag structure because it is only
      called with the inode to reclaim. The one caller of this function
      already has a reference on the pag and a pointer to is, so move the
      radix tree delete to the caller and remove xfs_ireclaim completely.
      This avoids a xfs_perag_get/put on every inode being reclaimed.
      
      The overhead was noticed in a bug report at:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=16348Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      2f11feab
    • D
      xfs: don't block on buffer read errors · ec53d1db
      Dave Chinner 提交于
      xfs_buf_read() fails to detect dispatch errors before attempting to
      wait on sychronous IO. If there was an error, it will get stuck
      forever, waiting for an I/O that was never started. Make sure the
      error is detected correctly.
      
      Further, such a failure can leave locked pages in the page cache
      which will cause a later operation to hang on the page. Ensure that
      we correctly process pages in the buffers when we get a dispatch
      error.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      ec53d1db
    • D
      xfs: move inode shrinker unregister even earlier · a4190f90
      Dave Chinner 提交于
      I missed Dave Chinner's second revision of this change, and pushed
      his first version out to the repository instead.
      
      	commit a476c59ebb279d738718edc0e3fb76aab3687114
      	Author: Dave Chinner <dchinner@redhat.com>
      
      This commit compensates for that by moving a block of code up a bit
      further, with a result that matches the the effect of Dave's second
      version.
      
      Dave's first version was:
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      Dave's second version was:
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      a4190f90
    • C
      xfs: remove a dmapi leftover · fa17b25e
      Christoph Hellwig 提交于
      The open_exec file operation is only added by the external dmapi
      patch.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      fa17b25e
    • C
      xfs: writepage always has buffers · 78558fe8
      Christoph Hellwig 提交于
      These days we always have buffers thanks to ->page_mkwrite.  And we
      already have an assert a few lines above tripping in case that was
      not true due to a bug.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      78558fe8
    • C
      xfs: allow writeback from kswapd · d4f7a5cb
      Christoph Hellwig 提交于
      We only need disable I/O from direct or memcg reclaim.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      d4f7a5cb
    • D
      xfs: unregister inode shrinker before freeing filesystem structures · 2727ccc9
      Dave Chinner 提交于
      Currently we don't remove the XFS mount from the shrinker list until
      late in the unmount path. By this time, we have already torn down
      the internals of the filesystem (e.g. the per-ag structures), and
      hence if the shrinker is executed between the teardown and the
      unregistering, the shrinker will get NULL per-ag structure pointers
      and panic trying to dereference them.
      
      Fix this by removing the xfs mount from the shrinker list before
      tearing down it's internal structures.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      2727ccc9
    • C
      xfs: split xfs_itrace_entry · cca28fb8
      Christoph Hellwig 提交于
      Replace the xfs_itrace_entry catchall with specific trace points.  For
      most simple callers we now use the simple inode class, which used to
      be the iget class, but add more details tracing for namespace events,
      which now includes the name of the directory entries manipulated.
      
      Remove the xfs_inactive trace point, which is a duplicate of the clear_inode
      one, and the xfs_change_file_space trace point, which is immediately
      followed by the more specific alloc/free space trace points.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      cca28fb8
    • C
      xfs: remove xfs_iput_new · ef35e925
      Christoph Hellwig 提交于
      We never get an i_mode of 0 or a locked VFS inode until we pass in the
      XFS_IGET_CREATE flag to xfs_iget, which makes xfs_iput_new equivalent to
      xfs_iput for the only caller.  In addition to that xfs_nfs_get_inode
      does not even need to lock the inode given that the generation never changes
      for a life inode, so just pass a 0 lock_flags to xfs_iget and release
      the inode using IRELE in the error path.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      ef35e925
    • C
      xfs: some iget tracing cleanups / fixes · d2e078c3
      Christoph Hellwig 提交于
      The xfs_iget_alloc/found tracepoints are a bit misnamed and misplaced.
      Rename them to xfs_iget_hit/xfs_iget_miss and move them to the beggining
      of the xfs_iget_cache_hit/miss functions.  Add a new xfs_iget_reclaim_fail
      tracepoint for the case where we fail to re-initialize a VFS inode,
      and add a second instance of the xfs_iget_skip tracepoint for the case
      of a failed igrab() call.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      d2e078c3
    • C
      xfs: do not use emums for flags used in tracing · 807cbbdb
      Christoph Hellwig 提交于
      The tracing code can't print flags defined as enums.  Most flags that
      we want to print are defines as macros already, but move the few remaining
      ones over to make the trace output more useful.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      807cbbdb
    • C
      xfs: remove explicit xfs_sync_data/xfs_sync_attr calls on umount · 64c86149
      Christoph Hellwig 提交于
      On the final put of a superblock the VFS already calls sync_filesystem
      for us to write out all data and wait for it.  No need to start another
      asynchronous writeback inside ->put_super.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      64c86149
    • C
      xfs: small cleanups for xfs_iomap / __xfs_get_blocks · f2bde9b8
      Christoph Hellwig 提交于
      Remove the flags argument to  __xfs_get_blocks as we can easily derive
      it from the direct argument, and remove the unused BMAPI_MMAP flag.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      f2bde9b8
    • C
      xfs: avoid synchronous transaction in xfs_fs_write_inode · 7a36c8a9
      Christoph Hellwig 提交于
      We already rely on the fact that the sync code will cause a synchronous
      log force later on (currently via xfs_fs_sync_fs -> xfs_quiesce_data ->
      xfs_sync_data), so no need to do this here.  This allows us to avoid
      a lot of synchronous log forces during sync, which pays of especially
      with delayed logging enabled.   Some compilebench numbers that show
      this:
      
      xfs (delayed logging, 256k logbufs)
      ===================================
      
      intial create		  25.94 MB/s	  25.75 MB/s	  25.64 MB/s
      create			   8.54 MB/s	   9.12 MB/s	   9.15 MB/s
      patch			   2.47 MB/s	   2.47 MB/s	   3.17 MB/s
      compile			  29.65 MB/s	  30.51 MB/s	  27.33 MB/s
      clean			  90.92 MB/s	  98.83 MB/s	 128.87 MB/s
      read tree		  11.90 MB/s	  11.84 MB/s	   8.56 MB/s
      read compiled		  28.75 MB/s	  29.96 MB/s	  24.25 MB/s
      delete tree		8.39 seconds	8.12 seconds	8.46 seconds
      delete compiled		8.35 seconds	8.44 seconds	5.11 seconds
      stat tree		6.03 seconds	5.59 seconds	5.19 seconds
      stat compiled tree	9.00 seconds	9.52 seconds	8.49 seconds
      
      xfs + write_inode log_force removal
      ===================================
      intial create		  25.87 MB/s	  25.76 MB/s	  25.87 MB/s
      create			  15.18 MB/s	  14.80 MB/s	  14.94 MB/s
      patch			   3.13 MB/s	   3.14 MB/s	   3.11 MB/s
      compile			  36.74 MB/s	  37.17 MB/s	  36.84 MB/s
      clean			 226.02 MB/s	 222.58 MB/s	 217.94 MB/s
      read tree		  15.14 MB/s	  15.02 MB/s	  15.14 MB/s
      read compiled tree	  29.30 MB/s	  29.31 MB/s	  29.32 MB/s
      delete tree		6.22 seconds	6.14 seconds	6.15 seconds
      delete compiled tree	5.75 seconds	5.92 seconds	5.81 seconds
      stat tree		4.60 seconds	4.51 seconds	4.56 seconds
      stat compiled tree	4.07 seconds	3.87 seconds	3.96 seconds
      
      In addition to that also remove the delwri inode flush that is unessecary
      now that bulkstat is always coherent.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      7a36c8a9
    • C
      xfs: simplify xfs_vm_writepage · 20cb52eb
      Christoph Hellwig 提交于
      The writepage implementation in XFS still tries to deal with dirty but
      unmapped buffers which used to caused by writes through shared mmaps.  Since
      the introduction of ->page_mkwrite these can't happen anymore, so remove the
      code dealing with them.
      
      Note that the all_bh variable which causes us to start I/O on all buffers on
      the pages was controlled by the count of unmapped buffers, which also
      included those not actually dirty.  It's now unconditionally initialized to
      0 but set to 1 for the case of small file size extensions.  It probably can
      be removed entirely, but that's left for another patch.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      20cb52eb
    • C
      xfs: simplify xfs_vm_releasepage · 89f3b363
      Christoph Hellwig 提交于
      Currently the xfs releasepage implementation has code to deal with converting
      delayed allocated and unwritten space.  But we never get called for those as
      we always convert delayed and unwritten space when cleaning a page, or drop
      the state from the buffers in block_invalidatepage.  We still keep a WARN_ON
      on those cases for now, but remove all the case dealing with it, which allows
      to fold xfs_page_state_convert into xfs_vm_writepage and remove the !startio
      case from the whole writeback path.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      89f3b363
    • E
      xfs: fix corruption case for block size < page size · 3d9b02e3
      Eric Sandeen 提交于
      xfstests 194 first truncats a file back and then extends it again by
      truncating it to a larger size.  This causes discard_buffer to drop
      the mapped, but not the uptodate bit and thus creates something that
      xfs_page_state_convert takes for unmapped space created by mmap because
      it doesn't check for the dirty bit, which also gets cleared by
      discard_buffer and checked by other ->writepage implementations like
      block_write_full_page.  Handle this kind of buffers early, and unlike
      Eric's first version of the patch simply ASSERT that the buffers is
      dirty, given that the mmap write case can't happen anymore since the
      introduction of ->page_mkwrite.  The now dead code dealing with that
      will be deleted in a follow on patch.
      Signed-off-by: NEric Sandeen <sandeen@sandeen.net>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      3d9b02e3
    • C
      xfs: remove unused delta tracking code in xfs_bmapi · b4e9181e
      Christoph Hellwig 提交于
      This code was introduced four years ago in commit
      3e57ecf6 without any review and has
      been unused since.  Remove it just as the rest of the code introduced
      in that commit to reduce that stack usage and complexity in this central
      piece of code.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      b4e9181e
    • C
      xfs: simplify inode to transaction joining · 898621d5
      Christoph Hellwig 提交于
      Currently we need to either call IHOLD or xfs_trans_ihold on an inode when
      joining it to a transaction via xfs_trans_ijoin.
      
      This patches instead makes xfs_trans_ijoin usable on it's own by doing
      an implicity xfs_trans_ihold, which also allows us to drop the third
      argument.  For the case where we want to hold a reference on the inode
      a xfs_trans_ijoin_ref wrapper is added which does the IHOLD and marks
      the inode for needing an xfs_iput.  In addition to the cleaner interface
      to the caller this also simplifies the implementation.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      898621d5
    • C
      xfs: simplify buffer pinning · 4d16e924
      Christoph Hellwig 提交于
      Get rid of the xfs_buf_pin/xfs_buf_unpin/xfs_buf_ispin helpers and opencode
      them in their only callers, just like we did for the inode pinning a while
      ago.  Also remove duplicate trace points - the bufitem tracepoints cover
      all the information that is present in a buffer tracepoint.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      4d16e924
    • C
      xfs: simplify log item descriptor tracking · e98c414f
      Christoph Hellwig 提交于
      Currently we track log item descriptor belonging to a transaction using a
      complex opencoded chunk allocator.  This code has been there since day one
      and seems to work around the lack of an efficient slab allocator.
      
      This patch replaces it with dynamically allocated log item descriptors
      from a dedicated slab pool, linked to the transaction by a linked list.
      
      This allows to greatly simplify the log item descriptor tracking to the
      point where it's just a couple hundred lines in xfs_trans.c instead of
      a separate file.  The external API has also been simplified while we're
      at it - the xfs_trans_add_item and xfs_trans_del_item functions to add/
      delete items from a transaction have been simplified to the bare minium,
      and the xfs_trans_find_item function is replaced with a direct dereference
      of the li_desc field.  All debug code walking the list of log items in
      a transaction is down to a simple list_for_each_entry.
      
      Note that we could easily use a singly linked list here instead of the
      double linked list from list.h as the fastpath only does deletion from
      sequential traversal.  But given that we don't have one available as
      a library function yet I use the list.h functions for simplicity.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      e98c414f
    • C
      xfs: remove unneeded #include statements · 3400777f
      Christoph Hellwig 提交于
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <david@fromorbit.com>
      3400777f