1. 28 1月, 2011 6 次提交
    • D
      xfs: handle CIl transaction commit failures correctly · c6f990d1
      Dave Chinner 提交于
      Failure to commit a transaction into the CIL is not handled
      correctly. This currently can only happen when racing with a
      shutdown and requires an explicit shutdown check, so it rare and can
      be avoided. Remove the shutdown check and make the CIL commit a void
      function to indicate it will always succeed, thereby removing the
      incorrectly handled failure case.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      c6f990d1
    • D
      xfs: limit extsize to size of AGs and/or MAXEXTLEN · 5315837d
      Dave Chinner 提交于
      The extent size hint can be set to larger than an AG. This means
      that the alignment process can push the range to be allocated
      outside the bounds of the AG, resulting in assert failures or
      corrupted bmbt records. Similarly, if the extsize is larger than the
      maximum extent size supported, the alignment process will produce
      extents that are too large to fit into the bmbt records, resulting
      in a different type of assert/corruption failure.
      
      Fix this by limiting extsize at the time іt is set firstly to be
      less than MAXEXTLEN, then to be a maximum of half the size of the
      AGs in the filesystem for non-realtime inodes. Realtime inodes do
      not allocate out of AGs, so don't have to be restricted by the size
      of AGs.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      5315837d
    • D
      xfs: prevent extsize alignment from exceeding maximum extent size · 4ce15989
      Dave Chinner 提交于
      When doing delayed allocation, if the allocation size is for a
      maximally sized extent, extent size alignment can push it over this
      limit. This results in an assert failure in xfs_bmbt_set_allf() as
      the extent length is too large to find in the extent record.
      
      Fix this by ensuring that we allow for space that extent size
      alignment requires (up to 2 * (extsize -1) blocks as we have to
      handle both head and tail alignment) when limiting the maximum size
      of the extent.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      4ce15989
    • D
      xfs: limit extent length for allocation to AG size · 14b064ce
      Dave Chinner 提交于
      Delayed allocation extents can be larger than AGs, so when trying to
      convert a large range we may scan every AG inside
      xfs_bmap_alloc_nullfb() trying to find an AG with a size larger than
      an AG. We should stop when we find the first AG with a maximum
      possible allocation size. This causes excessive CPU usage when there
      are lots of AGs.
      
      The same problem occurs when doing preallocation of a range larger
      than an AG.
      
      Fix the problem by limiting real allocation lengths to the maximum
      that an AG can support. This means if we have empty AGs, we'll stop
      the search at the first of them. If there are no empty AGs, we'll
      still scan them all, but that is a different problem....
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      14b064ce
    • D
      xfs: speculative delayed allocation uses rounddown_power_of_2 badly · b8fc8263
      Dave Chinner 提交于
      rounddown_power_of_2() returns an undefined result when passed a
      value of zero. The specualtive delayed allocation code is doing this
      when the inode is zero length. Hence occasionally the preallocation
      is much, much larger than is necessary (e.g. 8GB for a 270 _byte_
      file). Ensure we don't even pass a zero value to this function so
      the result of preallocation is always the desired size.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      b8fc8263
    • D
      xfs: fix efi item leak on forced shutdown · e34a314c
      Dave Chinner 提交于
      After test 139, kmemleak shows:
      
      unreferenced object 0xffff880078b405d8 (size 400):
        comm "xfs_io", pid 4904, jiffies 4294909383 (age 1186.728s)
        hex dump (first 32 bytes):
          60 c1 17 79 00 88 ff ff 60 c1 17 79 00 88 ff ff  `..y....`..y....
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff81afb04d>] kmemleak_alloc+0x2d/0x60
          [<ffffffff8115c6cf>] kmem_cache_alloc+0x13f/0x2b0
          [<ffffffff814aaa97>] kmem_zone_alloc+0x77/0xf0
          [<ffffffff814aab2e>] kmem_zone_zalloc+0x1e/0x50
          [<ffffffff8147cd6b>] xfs_efi_init+0x4b/0xb0
          [<ffffffff814a4ee8>] xfs_trans_get_efi+0x58/0x90
          [<ffffffff81455fab>] xfs_bmap_finish+0x8b/0x1d0
          [<ffffffff814851b4>] xfs_itruncate_finish+0x2c4/0x5d0
          [<ffffffff814a970f>] xfs_setattr+0x8df/0xa70
          [<ffffffff814b5c7b>] xfs_vn_setattr+0x1b/0x20
          [<ffffffff8117dc00>] notify_change+0x170/0x2e0
          [<ffffffff81163bf6>] do_truncate+0x66/0xa0
          [<ffffffff81163d0b>] sys_ftruncate+0xdb/0xe0
          [<ffffffff8103a002>] system_call_fastpath+0x16/0x1b
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      The cause of the leak is that the "remove" parameter of IOP_UNPIN()
      is never set when a CIL push is aborted. This means that the EFI
      item is never freed if it was in the push being cancelled. The
      problem is specific to delayed logging, but has uncovered a couple
      of problems with the handling of IOP_UNPIN(remove).
      
      Firstly, we cannot safely call xfs_trans_del_item() from IOP_UNPIN()
      in the CIL commit failure path or the iclog write failure path
      because for delayed loging we have no transaction context. Hence we
      must only call xfs_trans_del_item() if the log item being unpinned
      has an active log item descriptor.
      
      Secondly, xfs_trans_uncommit() does not handle log item descriptor
      freeing during the traversal of log items on a transaction. It can
      reference a freed log item descriptor when unpinning an EFI item.
      Hence it needs to use a safe list traversal method to allow items to
      be removed from the transaction during IOP_UNPIN().
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      e34a314c
  2. 27 1月, 2011 1 次提交
    • D
      xfs: fix log ticket leak on forced shutdown. · 7db37c5e
      Dave Chinner 提交于
      The kmemleak detector shows this after test 139:
      
      unreferenced object 0xffff880079b88bb0 (size 264):
        comm "xfs_io", pid 4904, jiffies 4294909382 (age 276.824s)
        hex dump (first 32 bytes):
          00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00  .....N..........
          ff ff ff ff ff ff ff ff 48 7b c9 82 ff ff ff ff  ........H{......
        backtrace:
          [<ffffffff81afb04d>] kmemleak_alloc+0x2d/0x60
          [<ffffffff8115c6cf>] kmem_cache_alloc+0x13f/0x2b0
          [<ffffffff814aaa97>] kmem_zone_alloc+0x77/0xf0
          [<ffffffff814aab2e>] kmem_zone_zalloc+0x1e/0x50
          [<ffffffff8148f394>] xlog_ticket_alloc+0x34/0x170
          [<ffffffff81494444>] xlog_cil_push+0xa4/0x3f0
          [<ffffffff81494eca>] xlog_cil_force_lsn+0x15a/0x160
          [<ffffffff814933a5>] _xfs_log_force_lsn+0x75/0x2d0
          [<ffffffff814a264d>] _xfs_trans_commit+0x2bd/0x2f0
          [<ffffffff8148bfdd>] xfs_iomap_write_allocate+0x1ad/0x350
          [<ffffffff814ac17f>] xfs_map_blocks+0x21f/0x370
          [<ffffffff814ad1b7>] xfs_vm_writepage+0x1c7/0x550
          [<ffffffff8112200a>] __writepage+0x1a/0x50
          [<ffffffff81122df2>] write_cache_pages+0x1c2/0x4c0
          [<ffffffff81123117>] generic_writepages+0x27/0x30
          [<ffffffff814aba5d>] xfs_vm_writepages+0x5d/0x80
      
      By inspection, the leak occurs when xlog_write() returns and error
      and we jump to the abort path without dropping the reference on the
      active ticket.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAlex Elder <aelder@sgi.com>
      7db37c5e
  3. 18 1月, 2011 1 次提交
  4. 17 1月, 2011 2 次提交
    • C
      fallocate should be a file operation · 2fe17c10
      Christoph Hellwig 提交于
      Currently all filesystems except XFS implement fallocate asynchronously,
      while XFS forced a commit.  Both of these are suboptimal - in case of O_SYNC
      I/O we really want our allocation on disk, especially for the !KEEP_SIZE
      case where we actually grow the file with user-visible zeroes.  On the
      other hand always commiting the transaction is a bad idea for fast-path
      uses of fallocate like for example in recent Samba versions.   Given
      that block allocation is a data plane operation anyway change it from
      an inode operation to a file operation so that we have the file structure
      available that lets us check for O_SYNC.
      
      This also includes moving the code around for a few of the filesystems,
      and remove the already unnedded S_ISDIR checks given that we only wire
      up fallocate for regular files.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      2fe17c10
    • C
      make the feature checks in ->fallocate future proof · 64c23e86
      Christoph Hellwig 提交于
      Instead of various home grown checks that might need updates for new
      flags just check for any bit outside the mask of the features supported
      by the filesystem.  This makes the check future proof for any newly
      added flag.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      64c23e86
  5. 13 1月, 2011 1 次提交
  6. 12 1月, 2011 6 次提交
    • D
      xfs: prevent NMI timeouts in cmn_err · 73efe4a4
      Dave Chinner 提交于
      We currently have a global error message buffer in cmn_err that is
      protected by a spin lock that disables interrupts.  Recently there
      have been reports of NMI timeouts occurring when the console is
      being flooded by SCSI error reports due to cmn_err() getting stuck
      trying to print to the console while holding this lock (i.e. with
      interrupts disabled). The NMI watchdog is seeing this CPU as
      non-responding and so is triggering a panic.  While the trigger for
      the reported case is SCSI errors, pretty much anything that spams
      the kernel log could cause this to occur.
      
      Realistically the only reason that we have the intemediate message
      buffer is to prepend the correct kernel log level prefix to the log
      message. The only reason we have the lock is to protect the global
      message buffer and the only reason the message buffer is global is
      to keep it off the stack. Hence if we can avoid needing a global
      message buffer we avoid needing the lock, and we can do this with a
      small amount of cleanup and some preprocessor tricks:
      
      	1. clean up xfs_cmn_err() panic mask functionality to avoid
      	   needing debug code in xfs_cmn_err()
      	2. remove the couple of "!" message prefixes that still exist that
      	   the existing cmn_err() code steps over.
      	3. redefine CE_* levels directly to KERN_*
      	4. redefine cmn_err() and friends to use printk() directly
      	   via variable argument length macros.
      
      By doing this, we can completely remove the cmn_err() code and the
      lock that is causing the problems, and rely solely on printk()
      serialisation to ensure that we don't get garbled messages.
      
      A series of followup patches is really needed to clean up all the
      cmn_err() calls and related messages properly, but that results in a
      series that is not easily back portable to enterprise kernels. Hence
      this initial fix is only to address the direct problem in the lowest
      impact way possible.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      73efe4a4
    • A
      xfs: Add log level to assertion printk · 65a84a0f
      Anton Blanchard 提交于
      I received a ppc64 bug report involving xfs but the assertion was
      filtered out by the console log level. Use KERN_CRIT to ensure it
      makes it out.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      65a84a0f
    • J
      xfs: fix an assignment within an ASSERT() · 1884bd83
      Jesper Juhl 提交于
      In fs/xfs/xfs_trans.c::xfs_trans_unreserve_and_mod_sb() at the out:
      label we have this:
      	ASSERT(error = 0);
      I believe a comparison was intended, not an assignment. If I'm
      right, the patch below fixes that up.
      Signed-off-by: NJesper Juhl <jj@chaosbits.net>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      1884bd83
    • C
      xfs: fix error handling for synchronous writes · bfc60177
      Christoph Hellwig 提交于
      If we get an IO error on a synchronous superblock write, we attach an
      error release function to it so that when the last reference goes away
      the release function is called and the buffer is invalidated and
      unlocked. The buffer is left locked until the release function is
      called so that other concurrent users of the buffer will be locked out
      until the buffer error is fully processed.
      
      Unfortunately, for the superblock buffer the filesyetm itself holds a
      reference to the buffer which prevents the reference count from
      dropping to zero and the release function being called. As a result,
      once an IO error occurs on a sync write, the buffer will never be
      unlocked and all future attempts to lock the buffer will hang.
      
      To make matters worse, this problems is not unique to such buffers;
      if there is a concurrent _xfs_buf_find() running, the lookup will grab
      a reference to the buffer and then wait on the buffer lock, preventing
      the reference count from ever falling to zero and hence unlocking the
      buffer.
      
      As such, the whole b_relse function implementation is broken because it
      cannot rely on the buffer reference count falling to zero to unlock the
      errored buffer. The synchronous write error path is the only path that
      uses this callback - it is used to ensure that the synchronous waiter
      gets the buffer error before the error state is cleared from the buffer
      by the release function.
      
      Given that the only sychronous buffer writes now go through xfs_bwrite
      and the error path in question can only occur for a write of a dirty,
      logged buffer, we can move most of the b_relse processing to happen
      inline in xfs_buf_iodone_callbacks, just like a normal I/O completion.
      In addition to that we make sure the error is not cleared in
      xfs_buf_iodone_callbacks, so that xfs_bwrite can reliably check it.
      Given that xfs_bwrite keeps the buffer locked until it has waited for
      it and checked the error this allows to reliably propagate the error
      to the caller, and make sure that the buffer is reliably unlocked.
      
      Given that xfs_buf_iodone_callbacks was the only instance of the
      b_relse callback we can remove it entirely.
      
      Based on earlier patches by Dave Chinner and Ajeet Yadav.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reported-by: NAjeet Yadav <ajeet.yadav.77@gmail.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      bfc60177
    • C
      xfs: add FITRIM support · a46db608
      Christoph Hellwig 提交于
      Allow manual discards from userspace using the FITRIM ioctl.  This is not
      intended to be run during normal workloads, as the freepsace btree walks
      can cause large performance degradation.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      a46db608
    • D
      xfs: ensure log covering transactions are synchronous · c58efdb4
      Dave Chinner 提交于
      To ensure the log is covered and the filesystem idles correctly, we
      need to ensure that dummy transactions hit the disk and do not stay
      pinned in memory.  If the superblock is pinned in memory, it can't
      be flushed so the log covering cannot make progress. The result is
      dependent on timing - more oftent han not we continue to issues a
      log covering transaction every 36s rather than idling after ~90s.
      
      Fix this by making the log covering transaction synchronous. To
      avoid additional log force from xfssyncd, make the log covering
      transaction take the place of the existing log force in the xfssyncd
      background sync process.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      c58efdb4
  7. 11 1月, 2011 4 次提交
  8. 12 1月, 2011 1 次提交
    • D
      xfs: introduce xfs_rw_lock() helpers for locking the inode · 487f84f3
      Dave Chinner 提交于
      We need to obtain the i_mutex, i_iolock and i_ilock during the read
      and write paths. Add a set of wrapper functions to neatly
      encapsulate the lock ordering and shared/exclusive semantics to make
      the locking easier to follow and get right.
      
      Note that this changes some of the exclusive locking serialisation in
      that serialisation will occur against the i_mutex instead of the
      XFS_IOLOCK_EXCL. This does not change any behaviour, and it is
      arguably more efficient to use the mutex for such serialisation than
      the rw_sem.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      487f84f3
  9. 11 1月, 2011 3 次提交
  10. 07 1月, 2011 3 次提交
    • N
      xfs: provide simple rcu-walk ACL implementation · 880566e1
      Nick Piggin 提交于
      This simple implementation just checks for no ACLs on the inode, and
      if so, then the rcu-walk may proceed, otherwise fail it.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      880566e1
    • N
      fs: provide rcu-walk aware permission i_ops · b74c79e9
      Nick Piggin 提交于
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      b74c79e9
    • N
      fs: icache RCU free inodes · fa0d7e3d
      Nick Piggin 提交于
      RCU free the struct inode. This will allow:
      
      - Subsequent store-free path walking patch. The inode must be consulted for
        permissions when walking, so an RCU inode reference is a must.
      - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
        to take i_lock no longer need to take sb_inode_list_lock to walk the list in
        the first place. This will simplify and optimize locking.
      - Could remove some nested trylock loops in dcache code
      - Could potentially simplify things a bit in VM land. Do not need to take the
        page lock to follow page->mapping.
      
      The downsides of this is the performance cost of using RCU. In a simple
      creat/unlink microbenchmark, performance drops by about 10% due to inability to
      reuse cache-hot slab objects. As iterations increase and RCU freeing starts
      kicking over, this increases to about 20%.
      
      In cases where inode lifetimes are longer (ie. many inodes may be allocated
      during the average life span of a single inode), a lot of this cache reuse is
      not applicable, so the regression caused by this patch is smaller.
      
      The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
      however this adds some complexity to list walking and store-free path walking,
      so I prefer to implement this at a later date, if it is shown to be a win in
      real situations. I haven't found a regression in any non-micro benchmark so I
      doubt it will be a problem.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      fa0d7e3d
  11. 21 12月, 2010 2 次提交
    • D
      xfs: convert grant head manipulations to lockless algorithm · d0eb2f38
      Dave Chinner 提交于
      The only thing that the grant lock remains to protect is the grant head
      manipulations when adding or removing space from the log. These calculations
      are already based on atomic variables, so we can already update them safely
      without locks. However, the grant head manpulations require atomic multi-step
      calculations to be executed, which the algorithms currently don't allow.
      
      To make these multi-step calculations atomic, convert the algorithms to
      compare-and-exchange loops on the atomic variables. That is, we sample the old
      value, perform the calculation and use atomic64_cmpxchg() to attempt to update
      the head with the new value. If the head has not changed since we sampled it,
      it will succeed and we are done. Otherwise, we rerun the calculation again from
      a new sample of the head.
      
      This allows us to remove the grant lock from around all the grant head space
      manipulations, and that effectively removes the grant lock from the log
      completely. Hence we can remove the grant lock completely from the log at this
      point.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      d0eb2f38
    • D
      xfs: introduce new locks for the log grant ticket wait queues · 3f16b985
      Dave Chinner 提交于
      The log grant ticket wait queues are currently protected by the log
      grant lock.  However, the queues are functionally independent from
      each other, and operations on them only require serialisation
      against other queue operations now that all of the other log
      variables they use are atomic values.
      
      Hence, we can make them independent of the grant lock by introducing
      new locks just to protect the lists operations. because the lists
      are independent, we can use a lock per list and ensure that reserve
      and write head queuing do not contend.
      
      To ensure forced shutdowns work correctly in conjunction with the
      new fast paths, ensure that we check whether the log has been shut
      down in the grant functions once we hold the relevant spin locks but
      before we go to sleep. This is needed to co-ordinate correctly with
      the wakeups that are issued on the ticket queues so we don't leave
      any processes sleeping on the queues during a shutdown.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      3f16b985
  12. 15 12月, 2010 1 次提交
    • T
      workqueue: convert cancel_rearming_delayed_work[queue]() users to cancel_delayed_work_sync() · afe2c511
      Tejun Heo 提交于
      cancel_rearming_delayed_work[queue]() has been superceded by
      cancel_delayed_work_sync() quite some time ago.  Convert all the
      in-kernel users.  The conversions are completely equivalent and
      trivial.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: N"David S. Miller" <davem@davemloft.net>
      Acked-by: NGreg Kroah-Hartman <gregkh@suse.de>
      Acked-by: NEvgeniy Polyakov <zbr@ioremap.net>
      Cc: Jeff Garzik <jgarzik@pobox.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: netdev@vger.kernel.org
      Cc: Anton Vorontsov <cbou@mail.ru>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Alex Elder <aelder@sgi.com>
      Cc: xfs-masters@oss.sgi.com
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: netfilter-devel@vger.kernel.org
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: linux-nfs@vger.kernel.org
      afe2c511
  13. 10 12月, 2010 1 次提交
    • C
      xfs: log timestamp changes to the source inode in rename · 05340d4a
      Christoph Hellwig 提交于
      Now that we don't mark VFS inodes dirty anymore for internal
      timestamp changes, but rely on the transaction subsystem to push
      them out, we need to explicitly log the source inode in rename after
      updating it's timestamps to make sure the changes actually get
      forced out by sync/fsync or an AIL push.
      
      We already account for the fourth inode in the log reservation, as a
      rename of directories needs to update the nlink field, so just
      adding the xfs_trans_log_inode call is enough.
      
      This fixes the xfsqa 065 regression introduced by:
      
      	"xfs: don't use vfs writeback for pure metadata modifications"
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      05340d4a
  14. 03 12月, 2010 1 次提交
  15. 21 12月, 2010 1 次提交
    • D
      xfs: convert l_tail_lsn to an atomic variable. · 1c3cb9ec
      Dave Chinner 提交于
      log->l_tail_lsn is currently protected by the log grant lock. The
      lock is only needed for serialising readers against writers, so we
      don't really need the lock if we make the l_tail_lsn variable an
      atomic. Converting the l_tail_lsn variable to an atomic64_t means we
      can start to peel back the grant lock from various operations.
      
      Also, provide functions to safely crack an atomic LSN variable into
      it's component pieces and to recombined the components into an
      atomic variable. Use them where appropriate.
      
      This also removes the need for explicitly holding a spinlock to read
      the l_tail_lsn on 32 bit platforms.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      
      1c3cb9ec
  16. 03 12月, 2010 1 次提交
    • D
      xfs: convert l_last_sync_lsn to an atomic variable · 84f3c683
      Dave Chinner 提交于
      log->l_last_sync_lsn is updated in only one critical spot - log
      buffer Io completion - and is protected by the grant lock here. This
      requires the grant lock to be taken for every log buffer IO
      completion. Converting the l_last_sync_lsn variable to an atomic64_t
      means that we do not need to take the grant lock in log buffer IO
      completion to update it.
      
      This also removes the need for explicitly holding a spinlock to read
      the l_last_sync_lsn on 32 bit platforms.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      84f3c683
  17. 21 12月, 2010 5 次提交
    • D
      xfs: make AIL tail pushing independent of the grant lock · 2ced19cb
      Dave Chinner 提交于
      The xlog_grant_push_ail() currently takes the grant lock internally to sample
      the tail lsn, last sync lsn and the reserve grant head. Most of the callers
      already hold the grant lock but have to drop it before calling
      xlog_grant_push_ail(). This is a left over from when the AIL tail pushing was
      done in line and hence xlog_grant_push_ail had to drop the grant lock. AIL push
      is now done in another thread and hence we can safely hold the grant lock over
      the entire xlog_grant_push_ail call.
      
      Push the grant lock outside of xlog_grant_push_ail() to simplify the locking
      and synchronisation needed for tail pushing.  This will reduce traffic on the
      grant lock by itself, but this is only one step in preparing for the complete
      removal of the grant lock.
      
      While there, clean up the formatting of xlog_grant_push_ail() to match the
      rest of the XFS code.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      2ced19cb
    • D
      xfs: use wait queues directly for the log wait queues · eb40a875
      Dave Chinner 提交于
      The log grant queues are one of the few places left using sv_t
      constructs for waiting. Given we are touching this code, we should
      convert them to plain wait queues. While there, convert all the
      other sv_t users in the log code as well.
      
      Seeing as this removes the last users of the sv_t type, remove the
      header file defining the wrapper and the fragments that still
      reference it.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      eb40a875
    • D
      xfs: combine grant heads into a single 64 bit integer · a69ed03c
      Dave Chinner 提交于
      Prepare for switching the grant heads to atomic variables by
      combining the two 32 bit values that make up the grant head into a
      single 64 bit variable.  Provide wrapper functions to combine and
      split the grant heads appropriately for calculations and use them as
      necessary.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      a69ed03c
    • D
      xfs: rework log grant space calculations · 663e496a
      Dave Chinner 提交于
      The log grant space calculations are repeated for both write and
      reserve grant heads. To make it simpler to convert the calculations
      toa different algorithm, factor them so both the gratn heads use the
      same calculation functions. Once this is done we can drop the
      wrappers that are used in only a couple of place to update both
      grant heads at once as they don't provide any particular value.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      663e496a
    • D
      xfs: fact out common grant head/log tail verification code · 3f336c6f
      Dave Chinner 提交于
      Factor repeated debug code out of grant head manipulation functions into a
      separate function. This removes ifdef DEBUG spagetti from the code and makes
      the code easier to follow.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      3f336c6f