1. 08 Apr, 2011 7 commits
    • C
      xfs: fix xfs_debug warnings · 957935dc
      Christoph Hellwig committed
      For a CONFIG_XFS_DEBUG=n build gcc complains about statements with no
      effect in xfs_debug:
      
      fs/xfs/quota/xfs_qm_syscalls.c: In function 'xfs_qm_scall_trunc_qfiles':
      fs/xfs/quota/xfs_qm_syscalls.c:291:3: warning: statement with no effect
      
      The reason is that the various new xfs message functions have a
      return value which is never used, and in a non-debug build the
      xfs_debug macro evaluates to a plain 0, which produces the above
      warnings.  This can be fixed by turning xfs_debug into an inline function
      instead of a macro, but in addition to that I've also changed all the
      message helpers to return void as we never use their return values.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
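The macro-versus-inline point in the commit above can be illustrated in userspace. This is a minimal sketch, not the kernel change itself: `DEMO_DEBUG`, `demo_debug_macro` and `demo_debug` are hypothetical names standing in for the build switch and the message helper.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for a build-time debug switch; 0 means non-debug. */
#ifndef DEMO_DEBUG
#define DEMO_DEBUG 0
#endif

/*
 * Problematic non-debug definition: the "call" expands to a bare 0, so
 * `demo_debug_macro("x");` becomes `0;`, a statement with no effect.
 */
#define demo_debug_macro(...) 0

/*
 * Alternative: an inline function returning void. Every use is a real
 * call expression, so the compiler has no dead expression statement to
 * warn about, and callers cannot depend on a return value.
 */
static inline void demo_debug(const char *fmt, ...)
{
	assert(fmt != NULL);
#if DEMO_DEBUG
	va_list ap;

	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
#endif
}
```

Compiling a statement such as `demo_debug_macro("x");` with warnings enabled is what reproduces the "statement with no effect" diagnostic; the inline version does not trigger it.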
    • C
      xfs: fix variable set but not used warnings · ecb697c1
      Christoph Hellwig committed
      GCC 4.6 now warns about variables set but not used.  Fix the trivially
      fixable warnings of this sort.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
    • D
      xfs: push the AIL from memory reclaim and periodic sync · fd074841
      Dave Chinner committed
      When we are short on memory, we want to expedite the cleaning of
      dirty objects.  Hence when we run short on memory, we need to kick
      the AIL flushing into action to clean as many dirty objects as
      quickly as possible.  To implement this, sample the lsn of the log
      item at the head of the AIL and use that as the push target for the
      AIL flush.
      
      Further, we keep items in the AIL that are dirty that are not
      tracked any other way, so we can get objects sitting in the AIL that
      don't get written back until the AIL is pushed. Hence to get the
      filesystem to the idle state, we might need to push the AIL to flush
      out any remaining dirty objects sitting in the AIL. This requires
      the same push mechanism as the reclaim push.
      
      This patch also renames xfs_trans_ail_tail() to xfs_ail_min_lsn() to
      match the new xfs_ail_max_lsn() function introduced in this patch.
      Similarly for xfs_trans_ail_push -> xfs_ail_push.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Alex Elder <aelder@sgi.com>
    • D
      xfs: convert the xfsaild threads to a workqueue · 0bf6a5bd
      Dave Chinner committed
      Similar to the xfssyncd, the per-filesystem xfsaild threads can be
      converted to a global workqueue and run periodically by delayed
      works. This makes sense for the AIL pushing because it uses
      variable timeouts depending on the work that needs to be done.
      
      By removing the xfsaild, we simplify the AIL pushing code and
      remove the need to spread the code to implement the threading
      and pushing across multiple files.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
    • D
      xfs: introduce background inode reclaim work · a7b339f1
      Dave Chinner committed
      Background inode reclaim needs to run more frequently than the XFS
      syncd work is run as 30s is too long between optimal reclaim runs.
      Add a new periodic work item to the xfs syncd workqueue to run a
      fast, non-blocking inode reclaim scan.
      
      Background inode reclaim is kicked by the act of marking inodes for
      reclaim.  When an AG is first marked as having reclaimable inodes,
      the background reclaim work is kicked. It will continue to run
      periodically until it detects that there are no more reclaimable
      inodes. It will be kicked again when the first inode is queued for
      reclaim.
      
      To ensure shrinker based inode reclaim throttles to the inode
      cleaning and reclaim rate but still reclaims inodes efficiently,
      make it kick the background inode reclaim so that when we are low
      on memory we are trying to reclaim inodes as efficiently as
      possible. This kick should not be necessary, but it will protect
      against failures to kick the background reclaim when inodes are
      first dirtied.
      
      To provide the rate throttling, make the shrinker pass do
      synchronous inode reclaim so that it blocks on inodes under IO. This
      means that the shrinker will reclaim inodes rather than just
      skipping over them, but it does not adversely affect the rate of
      reclaim because most dirty inodes are already under IO due to the
      background reclaim work the shrinker kicked.
      
      These two modifications solve one of the two OOM killer invocations
      Chris Mason reported recently when running a stress testing script.
      The particular workload trigger for the OOM killer invocation is
      where there are more threads than CPUs all unlinking files in an
      extremely memory constrained environment. Unlike other solutions,
      this one does not have an impact on performance when
      memory is not constrained or the number of concurrent threads
      operating is <= the number of CPUs.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
    • D
      xfs: convert ENOSPC inode flushing to use new syncd workqueue · 89e4cb55
      Dave Chinner committed
      One of the problems with the current inode flush at ENOSPC is that we
      queue a flush per ENOSPC event, regardless of how many are already
      queued. This can result in hundreds of queued flushes, most of
      which simply burn CPU scanning and do no real work. This simply slows
      down allocation at ENOSPC.
      
      We really only need one active flush at a time, and we can easily
      implement that via the new xfs_syncd_wq. All we need to do is queue
      a flush if one is not already active, then block waiting for the
      currently active flush to complete. The result is that we only ever
      have a single ENOSPC inode flush active at a time and this greatly
      reduces the overhead of ENOSPC processing.
      
      On my 2p test machine, this results in tests exercising ENOSPC
      conditions running significantly faster - 042 halves execution time,
      083 drops from 60s to 5s, etc - while not introducing test
      regressions.
      
      This allows us to remove the old xfssyncd threads and infrastructure
      as they are no longer used.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
    • D
      xfs: introduce a xfssyncd workqueue · c6d09b66
      Dave Chinner committed
      All of the work xfssyncd does is background functionality. There is
      no need for a thread per filesystem to do this work - it can all be
      managed by a global workqueue now that workqueues manage concurrency
      effectively.
      
      Introduce a new global xfssyncd workqueue, and convert the periodic
      work to use this new functionality. To do this, use a delayed work
      construct to schedule the next running of the periodic sync work
      for the filesystem. When the sync work is complete, queue a new
      delayed work for the next running of the sync work.
      
      For laptop mode, we wait on completion for the sync works, so ensure
      that the sync work queuing interface can flush and wait for work to
      complete to enable the work queue infrastructure to replace the
      current sequence number and wakeup that is used.
      
      Because the sync work does non-trivial amounts of work, mark the
      new work queue as CPU intensive.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
  2. 31 Mar, 2011 1 commit
  3. 26 Mar, 2011 5 commits
    • D
      xfs: stop using the page cache to back the buffer cache · 0e6e847f
      Dave Chinner committed
      Now that the buffer cache has its own LRU, we do not need to use
      the page cache to provide persistent caching and reclaim
      infrastructure. Convert the buffer cache to use alloc_pages()
      instead of the page cache. This will remove all the overhead of page
      cache management from setup and teardown of the buffers, as well as
      needing to mark pages accessed as we find buffers in the buffer
      cache.
      
      By avoiding the page cache, we also remove the need to keep state in
      the page_private(page) field for persistent storage across buffer
      free/buffer rebuild and so all that code can be removed. This also
      fixes the long-standing problem of not having enough bits in the
      page_private field to track all the state needed for a 512
      sector/64k page setup.
      
      It also removes the need for page locking during reads as the pages
      are unique to the buffer and nobody else will be attempting to
      access them.
      
      Finally, it removes the buftarg address space lock as a point of
      global contention on workloads that allocate and free buffers
      quickly such as when creating or removing large numbers of inodes in
      parallel. This also removes the 16TB limit on filesystem size on 32 bit
      machines as the page index (32 bit) is no longer used for lookups
      of metadata buffers - the buffer cache is now solely indexed by disk
      address which is stored in a 64 bit field in the buffer.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Alex Elder <aelder@sgi.com>
    • D
      xfs: register the inode cache shrinker before quotachecks · 704b2907
      Dave Chinner committed
      During mount, we can do a quotacheck that involves a bulkstat pass
      on all inodes. If there are more inodes in the filesystem than can
      be held in memory, we require the inode cache shrinker to run to
      ensure that we don't run out of memory.
      
      Unfortunately, the inode cache shrinker is not registered until we
      get to the end of the superblock setup process, which is after a
      quotacheck is run if it is needed. Hence we need to register the
      inode cache shrinker earlier in the mount process so that we don't
      OOM during mount. This requires that we also initialise the syncd
      work before we register the shrinker, so we need to juggle that
      around as well.
      
      While there, make sure that we have set up the block sizes in the
      VFS superblock correctly before the quotacheck is run so that any
      inodes that are cached as a result of the quotacheck have their
      block size fields set up correctly.
      
      Cc: stable@kernel.org
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Alex Elder <aelder@sgi.com>
    • D
      xfs: introduce inode cluster buffer trylocks for xfs_iflush · 1bfd8d04
      Dave Chinner committed
      There is an ABBA deadlock between synchronous inode flushing in
      xfs_reclaim_inode and xfs_ifree_cluster. xfs_ifree_cluster locks the
      buffer, then takes inode ilocks, whilst synchronous reclaim takes
      the ilock followed by the buffer lock in xfs_iflush().
      
      To avoid this deadlock, separate the inode cluster buffer locking
      semantics from the synchronous inode flush semantics, allowing
      callers to attempt to lock the buffer but still issue synchronous IO
      if they can get the buffer. This requires xfs_iflush() calls that
      currently use non-blocking semantics to pass SYNC_TRYLOCK rather
      than 0 as the flags parameter.
      
      This allows xfs_reclaim_inode to avoid the deadlock on the buffer
      lock and detect the failure so that it can drop the inode ilock and
      restart the reclaim attempt on the inode. This allows
      xfs_ifree_cluster to obtain the inode lock, mark the inode stale and
      release it and hence defuse the deadlock situation. It also has the
      pleasant side effect of avoiding IO in xfs_reclaim_inode when it
      next tries to reclaim the inode as it is now marked stale.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
    • D
      vmap: flush vmap aliases when mapping fails · a19fb380
      Dave Chinner committed
      On 32 bit systems, vmalloc space is limited and XFS can chew through
      it quickly as the vmalloc space is lazily freed. This can result in
      failure to map buffers, even when there are apparently large amounts
      of vmalloc space available. Hence, if we fail to map a buffer, purge
      the aliases that have not yet been freed to hopefully free up enough
      vmalloc space to allow a retry to succeed.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
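The "purge lazily freed space, then retry the mapping" pattern from the commit above might look like the following userspace sketch. The address space model (`struct demo_space`) and every function name are hypothetical; no real vmalloc interface is used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: slots that are free now, and slots freed lazily. */
struct demo_space {
	int free_slots;
	int lazy_slots;
	char backing;	/* dummy object whose address stands for a mapping */
};

/* A mapping attempt succeeds only if a slot is free right now. */
static void *demo_map(struct demo_space *s)
{
	if (s->free_slots == 0)
		return NULL;
	s->free_slots--;
	return &s->backing;
}

/* Purging turns all lazily freed slots into usable ones. */
static void demo_purge(struct demo_space *s)
{
	s->free_slots += s->lazy_slots;
	s->lazy_slots = 0;
}

/* Try to map; after each failure purge and retry, up to max_retries. */
static void *demo_map_with_retry(struct demo_space *s, int max_retries)
{
	assert(s != NULL && max_retries >= 0);
	for (int attempt = 0; ; attempt++) {
		void *addr = demo_map(s);

		if (addr != NULL || attempt >= max_retries)
			return addr;
		demo_purge(s);
	}
}
```

The retry count is bounded so that a genuinely exhausted address space still fails instead of looping forever.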
    • D
      xfs: preallocation transactions do not need to be synchronous · 82878897
      Dave Chinner committed
      Preallocation and hole punch transactions are currently synchronous
      and this is causing performance problems in some cases. The
      transactions don't need to be synchronous as we don't need to
      guarantee the preallocation is persistent on disk until an
      fdatasync, fsync or sync operation occurs. Only if the file is
      opened O_SYNC or O_DSYNC should the transaction be issued
      synchronously.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
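The decision rule in the commit above reduces to a flag test on the open mode. A minimal userspace sketch, assuming the POSIX flag names `O_SYNC` and `O_DSYNC`; `demo_prealloc_needs_sync` is a hypothetical helper, not an XFS function.

```c
#define _POSIX_C_SOURCE 200809L	/* for O_DSYNC under strict ISO C modes */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical helper: does this open mode require a synchronous commit? */
static bool demo_prealloc_needs_sync(int open_flags)
{
	return (open_flags & (O_SYNC | O_DSYNC)) != 0;
}

/* Usage: only O_SYNC or O_DSYNC opens pay for a synchronous transaction. */
static void demo_sync_usage(void)
{
	assert(!demo_prealloc_needs_sync(O_RDWR));
	assert(demo_prealloc_needs_sync(O_RDWR | O_DSYNC));
	assert(demo_prealloc_needs_sync(O_WRONLY | O_SYNC));
}
```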
  4. 14 Mar, 2011 1 commit
  5. 12 Mar, 2011 1 commit
  6. 10 Mar, 2011 2 commits
    • J
      block: kill off REQ_UNPLUG · 721a9602
      Jens Axboe committed
      With the plugging now being explicitly controlled by the
      submitter, callers need not pass down unplugging hints
      to the block layer. If they want to unplug, it's because they
      manually plugged on their own - in which case, they should just
      unplug at will.
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • J
      block: remove per-queue plugging · 7eaceacc
      Jens Axboe committed
      Code has been converted over to the new explicit on-stack plugging,
      and delay users have been converted to use the new API for that.
      So let's kill off the old plugging along with aops->sync_page().
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  7. 07 Mar, 2011 2 commits
  8. 02 Mar, 2011 3 commits
  9. 23 Feb, 2011 2 commits
  10. 22 Feb, 2011 1 commit
    • L
      xfs: check if device support discard in xfs_ioc_trim() · 5d157655
      Lukas Czerner committed
      Right now we are relying on the fact that when we attempt to
      actually do the discard, blkdev_issue_discard() returns -EOPNOTSUPP
      and the user is informed that the device does not support discard.
      
      However, in the case where we do not hit any suitable free
      extent to trim in the FITRIM code, it will finish without any error.
      This is very confusing, because it seems that FITRIM was successful
      even though the device does not actually support discard.
      
      Solution: Check for discard support before attempting to search for
      free extents.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
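Why the early check matters can be shown with a small model: without it, a device that cannot discard still reports success whenever no extent qualifies for trimming. The sketch below is hypothetical throughout (`struct demo_dev`, `struct demo_extent`, `demo_trim`) and only counts blocks instead of issuing discards.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device and free-extent descriptions. */
struct demo_dev {
	bool supports_discard;
};

struct demo_extent {
	unsigned long long start;
	unsigned long long len;
};

/*
 * Returns the number of blocks that would be trimmed, or -EOPNOTSUPP.
 * The capability check comes before the extent walk, so the error is
 * reported even when no free extent is long enough to trim.
 */
static long long demo_trim(const struct demo_dev *dev,
			   const struct demo_extent *free_ext, size_t n,
			   unsigned long long minlen)
{
	long long trimmed = 0;

	assert(dev != NULL);
	if (!dev->supports_discard)
		return -EOPNOTSUPP;

	for (size_t i = 0; i < n; i++) {
		if (free_ext[i].len >= minlen)
			trimmed += (long long)free_ext[i].len;
	}
	return trimmed;
}
```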
  11. 02 Feb, 2011 1 commit
    • E
      fs/vfs/security: pass last path component to LSM on inode creation · 2a7dba39
      Eric Paris committed
      SELinux would like to implement a new labeling behavior of newly created
      inodes.  We currently label new inodes based on the parent and the creating
      process.  This new behavior would also take into account the name of the
      new object when deciding the new label.  This is not the (supposed) full path,
      just the last component of the path.
      
      This is very useful because creating /etc/shadow is different from creating
      /etc/passwd but the kernel hooks are unable to differentiate these
      operations.  We currently require that userspace realize it is doing some
      difficult operation like that and then userspace jumps through SELinux hoops
      to get things set up correctly.  This patch does not implement the new
      behavior, which is contained in a separate SELinux patch, but it
      does pass the needed name down to the correct LSM hook.  If no such name
      exists it is fine to pass NULL.
      Signed-off-by: Eric Paris <eparis@redhat.com>
  12. 01 Feb, 2011 1 commit
    • T
      xfs: convert to alloc_workqueue() · 83e75904
      Tejun Heo committed
      Convert from create[_singlethread]_workqueue() to alloc_workqueue().
      
      * xfsdatad_workqueue and xfsconvertd_workqueue are identity converted.
        Using higher concurrency limit might be useful but given the
        complexity of workqueue usage in xfs, proceeding cautiously seems
        better.
      
      * xfs_mru_reap_wq is converted to non-ordered workqueue with max
        concurrency of 1 as the work items don't require any specific
        ordering and already have proper synchronization.  It seems it was
        singlethreaded to save worker threads, which is no longer a concern.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Alex Elder <aelder@sgi.com>
      Cc: xfs-masters@oss.sgi.com
      Cc: Christoph Hellwig <hch@infradead.org>
  13. 28 Jan, 2011 1 commit
    • D
      xfs: limit extsize to size of AGs and/or MAXEXTLEN · 5315837d
      Dave Chinner committed
      The extent size hint can be set to larger than an AG. This means
      that the alignment process can push the range to be allocated
      outside the bounds of the AG, resulting in assert failures or
      corrupted bmbt records. Similarly, if the extsize is larger than the
      maximum extent size supported, the alignment process will produce
      extents that are too large to fit into the bmbt records, resulting
      in a different type of assert/corruption failure.
      
      Fix this by limiting extsize at the time it is set firstly to be
      less than MAXEXTLEN, then to be a maximum of half the size of the
      AGs in the filesystem for non-realtime inodes. Realtime inodes do
      not allocate out of AGs, so don't have to be restricted by the size
      of AGs.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
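The two limits named in the commit above can be expressed as one validation step. This is a hedged sketch: `DEMO_MAXEXTLEN` assumes a 21-bit extent length field, whether the bounds are strict or inclusive is an assumption, and `demo_check_extsize` is a hypothetical function that rejects bad values instead of showing the actual ioctl path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed maximum extent length in filesystem blocks (21-bit field). */
#define DEMO_MAXEXTLEN 0x001fffffu

/*
 * Validate an extent size hint given in filesystem blocks.
 * - Every inode: the hint must fit in a single extent record.
 * - Non-realtime inodes: the hint may be at most half an AG, so that
 *   alignment cannot push an allocation outside the AG.
 * Realtime inodes do not allocate from AGs and skip the second check.
 */
static int demo_check_extsize(uint64_t extsize_blocks, uint32_t ag_blocks,
			      bool realtime)
{
	assert(ag_blocks > 0);
	if (extsize_blocks > DEMO_MAXEXTLEN)
		return -EINVAL;
	if (!realtime && extsize_blocks > ag_blocks / 2)
		return -EINVAL;
	return 0;
}
```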
  14. 17 Jan, 2011 2 commits
    • C
      fallocate should be a file operation · 2fe17c10
      Christoph Hellwig committed
      Currently all filesystems except XFS implement fallocate asynchronously,
      while XFS forced a commit.  Both of these are suboptimal - in case of O_SYNC
      I/O we really want our allocation on disk, especially for the !KEEP_SIZE
      case where we actually grow the file with user-visible zeroes.  On the
      other hand always committing the transaction is a bad idea for fast-path
      uses of fallocate like for example in recent Samba versions.   Given
      that block allocation is a data plane operation anyway, change it from
      an inode operation to a file operation so that we have the file structure
      available that lets us check for O_SYNC.
      
      This also includes moving the code around for a few of the filesystems,
      and removes the already unneeded S_ISDIR checks given that we only wire
      up fallocate for regular files.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • C
      make the feature checks in ->fallocate future proof · 64c23e86
      Christoph Hellwig committed
      Instead of various home grown checks that might need updates for new
      flags just check for any bit outside the mask of the features supported
      by the filesystem.  This makes the check future proof for any newly
      added flag.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
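The "reject any bit outside the supported mask" check from the commit above is a one-line test. In this sketch the `DEMO_FL_*` flag names and values are hypothetical placeholders, with `DEMO_FL_FUTURE` standing for a flag that is added after the filesystem code was written.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical mode flags; DEMO_FL_FUTURE models a later addition. */
#define DEMO_FL_KEEP_SIZE	0x01u
#define DEMO_FL_PUNCH_HOLE	0x02u
#define DEMO_FL_FUTURE		0x04u

/* The only flags this (hypothetical) filesystem implements. */
#define DEMO_SUPPORTED	(DEMO_FL_KEEP_SIZE | DEMO_FL_PUNCH_HOLE)

/* Reject any mode containing a bit outside the supported mask. */
static int demo_fallocate_check(unsigned int mode)
{
	if (mode & ~DEMO_SUPPORTED)
		return -EOPNOTSUPP;
	return 0;
}

/* Usage: known flags pass, an unknown flag fails without code changes. */
static void demo_mask_usage(void)
{
	assert(demo_fallocate_check(0) == 0);
	assert(demo_fallocate_check(DEMO_FL_KEEP_SIZE) == 0);
	assert(demo_fallocate_check(DEMO_FL_FUTURE) == -EOPNOTSUPP);
}
```

Because the test is written against the supported set and not against a list of known-unsupported flags, it needs no update when new flags appear.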
  15. 13 Jan, 2011 1 commit
  16. 12 Jan, 2011 4 commits
    • D
      xfs: prevent NMI timeouts in cmn_err · 73efe4a4
      Dave Chinner committed
      We currently have a global error message buffer in cmn_err that is
      protected by a spin lock that disables interrupts.  Recently there
      have been reports of NMI timeouts occurring when the console is
      being flooded by SCSI error reports due to cmn_err() getting stuck
      trying to print to the console while holding this lock (i.e. with
      interrupts disabled). The NMI watchdog is seeing this CPU as
      non-responding and so is triggering a panic.  While the trigger for
      the reported case is SCSI errors, pretty much anything that spams
      the kernel log could cause this to occur.
      
      Realistically the only reason that we have the intermediate message
      buffer is to prepend the correct kernel log level prefix to the log
      message. The only reason we have the lock is to protect the global
      message buffer and the only reason the message buffer is global is
      to keep it off the stack. Hence if we can avoid needing a global
      message buffer we avoid needing the lock, and we can do this with a
      small amount of cleanup and some preprocessor tricks:
      
      	1. clean up xfs_cmn_err() panic mask functionality to avoid
      	   needing debug code in xfs_cmn_err()
      	2. remove the couple of "!" message prefixes that still exist that
      	   the existing cmn_err() code steps over.
      	3. redefine CE_* levels directly to KERN_*
      	4. redefine cmn_err() and friends to use printk() directly
      	   via variable argument length macros.
      
      By doing this, we can completely remove the cmn_err() code and the
      lock that is causing the problems, and rely solely on printk()
      serialisation to ensure that we don't get garbled messages.
      
      A series of followup patches is really needed to clean up all the
      cmn_err() calls and related messages properly, but that results in a
      series that is not easily back portable to enterprise kernels. Hence
      this initial fix is only to address the direct problem in the lowest
      impact way possible.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
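Steps 3 and 4 of the list in the commit above rest on a preprocessor trick: if the level is itself a string literal, it can be pasted in front of the format at compile time, so no intermediate buffer or lock is needed. A minimal userspace sketch follows; the `DEMO_*` names and level strings are hypothetical, and `snprintf` into a caller buffer stands in for the kernel's print function so the result can be inspected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical log level prefixes, defined as string literals. */
#define DEMO_KERN_WARNING	"<4>"
#define DEMO_KERN_NOTICE	"<5>"

/* Legacy-style level names redefined directly to the prefixes. */
#define DEMO_CE_WARN	DEMO_KERN_WARNING
#define DEMO_CE_NOTE	DEMO_KERN_NOTICE

/*
 * The level and the format are adjacent string literals, so the compiler
 * concatenates them. The first variadic argument must therefore be a
 * string literal format.
 */
#define demo_cmn_err(buf, len, lvl, ...) \
	snprintf((buf), (len), lvl __VA_ARGS__)

/* Usage: the prefix ends up in front of the formatted message. */
static void demo_log_usage(void)
{
	char out[64];

	demo_cmn_err(out, sizeof(out), DEMO_CE_WARN, "disk %d failed", 3);
	assert(strcmp(out, "<4>disk 3 failed") == 0);

	demo_cmn_err(out, sizeof(out), DEMO_CE_NOTE, "mounted");
	assert(strcmp(out, "<5>mounted") == 0);
}
```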
    • C
      xfs: fix error handling for synchronous writes · bfc60177
      Christoph Hellwig committed
      If we get an IO error on a synchronous superblock write, we attach an
      error release function to it so that when the last reference goes away
      the release function is called and the buffer is invalidated and
      unlocked. The buffer is left locked until the release function is
      called so that other concurrent users of the buffer will be locked out
      until the buffer error is fully processed.
      
      Unfortunately, for the superblock buffer the filesystem itself holds a
      reference to the buffer which prevents the reference count from
      dropping to zero and the release function being called. As a result,
      once an IO error occurs on a sync write, the buffer will never be
      unlocked and all future attempts to lock the buffer will hang.
      
      To make matters worse, this problem is not unique to such buffers;
      if there is a concurrent _xfs_buf_find() running, the lookup will grab
      a reference to the buffer and then wait on the buffer lock, preventing
      the reference count from ever falling to zero and hence unlocking the
      buffer.
      
      As such, the whole b_relse function implementation is broken because it
      cannot rely on the buffer reference count falling to zero to unlock the
      errored buffer. The synchronous write error path is the only path that
      uses this callback - it is used to ensure that the synchronous waiter
      gets the buffer error before the error state is cleared from the buffer
      by the release function.
      
      Given that the only synchronous buffer writes now go through xfs_bwrite
      and the error path in question can only occur for a write of a dirty,
      logged buffer, we can move most of the b_relse processing to happen
      inline in xfs_buf_iodone_callbacks, just like a normal I/O completion.
      In addition to that we make sure the error is not cleared in
      xfs_buf_iodone_callbacks, so that xfs_bwrite can reliably check it.
      Given that xfs_bwrite keeps the buffer locked until it has waited for
      it and checked the error, this allows us to reliably propagate the error
      to the caller, and make sure that the buffer is reliably unlocked.
      
      Given that xfs_buf_iodone_callbacks was the only instance of the
      b_relse callback we can remove it entirely.
      
      Based on earlier patches by Dave Chinner and Ajeet Yadav.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reported-by: Ajeet Yadav <ajeet.yadav.77@gmail.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
    • C
      xfs: add FITRIM support · a46db608
      Christoph Hellwig committed
      Allow manual discards from userspace using the FITRIM ioctl.  This is not
      intended to be run during normal workloads, as the freespace btree walks
      can cause large performance degradation.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
    • D
      xfs: ensure log covering transactions are synchronous · c58efdb4
      Dave Chinner committed
      To ensure the log is covered and the filesystem idles correctly, we
      need to ensure that dummy transactions hit the disk and do not stay
      pinned in memory.  If the superblock is pinned in memory, it can't
      be flushed so the log covering cannot make progress. The result is
      dependent on timing - more often than not we continue to issue a
      log covering transaction every 36s rather than idling after ~90s.
      
      Fix this by making the log covering transaction synchronous. To
      avoid additional log force from xfssyncd, make the log covering
      transaction take the place of the existing log force in the xfssyncd
      background sync process.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
  17. 11 Jan, 2011 4 commits
  18. 12 Jan, 2011 1 commit
    • D
      xfs: introduce xfs_rw_lock() helpers for locking the inode · 487f84f3
      Dave Chinner committed
      We need to obtain the i_mutex, i_iolock and i_ilock during the read
      and write paths. Add a set of wrapper functions to neatly
      encapsulate the lock ordering and shared/exclusive semantics to make
      the locking easier to follow and get right.
      
      Note that this changes some of the exclusive locking serialisation in
      that serialisation will occur against the i_mutex instead of the
      XFS_IOLOCK_EXCL. This does not change any behaviour, and it is
      arguably more efficient to use the mutex for such serialisation than
      the rw_sem.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>