  1. 28 Jun 2017, 1 commit
  2. 20 Jun 2017, 1 commit
    • xfs: remove double-underscore integer types · c8ce540d
      Darrick J. Wong committed
      This is a purely mechanical patch that removes the private
      __{u,}int{8,16,32,64}_t typedefs in favor of using the system
      {u,}int{8,16,32,64}_t typedefs.  This is the sed script used to perform
      the transformation and fix the resulting whitespace and indentation
      errors:
      
      s/typedef\t__uint8_t/typedef __uint8_t\t/g
      s/typedef\t__uint/typedef __uint/g
      s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g
      s/__uint8_t\t/__uint8_t\t\t/g
      s/__uint/uint/g
      s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g
      s/__int/int/g
      /^typedef.*int[0-9]*_t;$/d
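
      For illustration, a hypothetical before/after pair (these two
      typedefs are real XFS types, though the exact lines touched vary
      across headers):

      /* before */
      typedef	__uint32_t	xfs_agblock_t;	/* blockno in alloc. group */
      typedef	__int64_t	xfs_lsn_t;	/* log sequence number */

      /* after */
      typedef uint32_t	xfs_agblock_t;	/* blockno in alloc. group */
      typedef int64_t	xfs_lsn_t;	/* log sequence number */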
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  3. 26 Apr 2017, 1 commit
    • xfs: more do_div cleanups · 4f1adf33
      Eric Sandeen committed
      On some architectures do_div does the pointer compare
      trick to make sure that we've sent it an unsigned 64-bit
      number.  (Why unsigned?  I don't know.)
      
      Fix up the few places that squawk about this; in
      xfs_bmap_wants_extents() we just used a bare int64_t so change
      that to unsigned.
      
      In xfs_adjust_extent_unmap_boundaries() all we wanted was the
      mod, and we have an xfs-specific function to handle that w/o
      side effects, which includes proper casting for do_div.
      
      In xfs_daddr_to_ag[b]no, we were using the wrong type anyway;
      XFS_BB_TO_FSBT returns a block in the filesystem, so use
      xfs_rfsblock_t not xfs_daddr_t, and gain the unsignedness
      from that type as a bonus.
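
      A minimal sketch of the do_div() contract at issue (the helper is
      illustrative, not from the patch): the first argument must be an
      unsigned 64-bit lvalue, which do_div() replaces with the quotient
      while returning the 32-bit remainder.

      #include <asm/div64.h>

      static inline uint32_t div_round_up(uint64_t val, uint32_t base)
      {
      	uint32_t rem = do_div(val, base);	/* val becomes val / base */

      	return (uint32_t)val + (rem ? 1 : 0);
      }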
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  4. 04 Apr 2017, 1 commit
    • xfs: use dedicated log worker wq to avoid deadlock with cil wq · 696a5620
      Brian Foster committed
      The log covering background task used to be part of the xfssyncd
      workqueue. That workqueue was removed as of commit 5889608d ("xfs:
      syncd workqueue is no more") and the associated work item scheduled
      to the xfs-log wq. The latter is used for log buffer I/O completion.
      
      Since xfs_log_worker() can invoke a log flush, a deadlock is
      possible between the xfs-log and xfs-cil workqueues. Consider the
      following codepath from xfs_log_worker():
      
      xfs_log_worker()
        xfs_log_force()
          _xfs_log_force()
            xlog_cil_force()
              xlog_cil_force_lsn()
                xlog_cil_push_now()
                  flush_work()
      
      The above is in xfs-log wq context and blocked waiting on the
      completion of an xfs-cil work item. Concurrently, the cil push in
      progress can end up blocked here:
      
      xlog_cil_push_work()
        xlog_cil_push()
          xlog_write()
            xlog_state_get_iclog_space()
              xlog_wait(&log->l_flush_wait, ...)
      
      The above is in xfs-cil context waiting on log buffer I/O
      completion, which executes in xfs-log wq context. In this scenario
      both workqueues are deadlocked waiting on each other.
      
      Add a new workqueue specifically for the high-level log covering
      and AIL pushing worker, as was the case prior to commit 5889608d.
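
      A sketch of the shape of the fix, with assumed field names and
      flag choices (only the "dedicated workqueue" idea is from the
      commit):

      /* at mount time: a wq that shares nothing with xfs-log/xfs-cil */
      mp->m_sync_workqueue = alloc_workqueue("xfs-sync/%s",
      				WQ_MEM_RECLAIM | WQ_FREEZABLE,
      				0, mp->m_fsname);

      /* the log covering worker is then queued here, so flushing the
       * CIL can never wait on work that runs in its own workqueue */
      queue_delayed_work(mp->m_sync_workqueue, &mp->m_log->l_work,
      			msecs_to_jiffies(xfs_syncd_centisecs * 10));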
      Diagnosed-by: David Jeffery <djeffery@redhat.com>
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  5. 17 Feb 2017, 1 commit
    • xfs: resurrect debug mode drop buffered writes mechanism · 9dbddd7b
      Brian Foster committed
      A debug mode write failure mechanism was introduced to XFS in commit
      801cc4e1 ("xfs: debug mode forced buffered write failure") to
      facilitate targeted testing of delalloc indirect reservation management
      from userspace. This code was subsequently rendered ineffective by the
      move to iomap based buffered writes in commit 68a9f5e7 ("xfs:
      implement iomap based buffered write path"). This likely went unnoticed
      because the associated userspace code had not made it into xfstests.
      
      Resurrect this mechanism to facilitate effective indlen reservation
      testing from xfstests. The move to iomap based buffered writes relocated
      the hook this mechanism needs to return write failure from XFS to
      generic code. The failure trigger must remain in XFS. Given that
      limitation, convert this from a write failure mechanism to one that
      simply drops writes without returning failure to userspace. Rename all
      "fail_writes" references to "drop_writes" to illustrate the point. This
      is more hacky than preferred, but still triggers the XFS error handling
      behavior required to drive the indlen tests. This is only available in
      DEBUG mode and for testing purposes only.
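
      A sketch of the resulting check in the write path (the helper
      name follows the "drop_writes" renaming described above, but its
      exact placement is illustrative):

      #ifdef DEBUG
      	if (xfs_mp_drop_writes(mp))
      		return 0;	/* pretend success, write nothing */
      #endif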
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  6. 10 Feb 2017, 1 commit
  7. 25 Jan 2017, 1 commit
    • xfs: use per-AG reservations for the finobt · 76d771b4
      Christoph Hellwig committed
      Currently we try to rely on the global reserved block pool for block
      allocations for the free inode btree, but I have customer reports
      (fairly complex workload, need to find an easier reproducer) where that
      is not enough as the AG where we free an inode that requires a new
      finobt block is entirely full.  This causes us to cancel a dirty
      transaction and thus forces a file system shutdown.
      
      I think the right way to guard against this is to treat the finobt
      the same way as the refcount btree and have a per-AG reservation for
      its possible worst-case size, and the patch below implements that.
      
      Note that this could increase mount times with large finobt trees.  In
      an ideal world we would have added a field for the number of finobt
      blocks to the AGI, similar to what we did for the refcount blocks.
      We should add it next time we rev the AGI or AGF format by adding
      new fields.
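
      In sketch form, with assumed helper names modeled on the refcount
      btree reservation:

      /* per AG: ask for the worst-case finobt size up front */
      error = xfs_finobt_calc_reserves(mp, agno, &ask, &used);
      if (!error)
      	error = xfs_ag_resv_init(pag);	/* back 'ask' blocks in this AG */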
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  8. 07 Dec 2016, 1 commit
    • xfs: use rhashtable to track buffer cache · 6031e73a
      Lucas Stach committed
      On filesystems with a lot of metadata and in metadata-intensive
      workloads, xfs_buf_find() shows up at the top of the CPU cycle
      traces. Most of the CPU time is spent on CPU cache misses while
      traversing the rbtree.
      
      As the buffer cache does not need any kind of ordering, but does
      need fast lookups, a hashtable is the natural data structure to use.
      The rhashtable
      infrastructure provides a self-scaling hashtable implementation and
      allows lookups to proceed while the table is going through a resize
      operation.
      
      This reduces the CPU time spent on the lookups to 1/3 even for small
      filesystems with a relatively small number of cached buffers, with
      possibly much larger gains on more heavily loaded filesystems.
      
      [dchinner: reduce minimum hash size to an acceptable size for large filesystems with many AGs with no active use.]
      [dchinner: remove stale rbtree asserts.]
      [dchinner: use xfs_buf_map for compare function argument.]
      [dchinner: make functions static.]
      [dchinner: remove redundant comments.]
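
      A sketch of the rhashtable setup this describes (the parameter
      values and field names are illustrative; struct rhashtable_params
      and rhashtable_init() are the real API):

      static const struct rhashtable_params xfs_buf_hash_params = {
      	.min_size		= 32,	/* small enough for idle AGs */
      	.nelem_hint		= 16,
      	.key_len		= sizeof(xfs_daddr_t),
      	.key_offset		= offsetof(struct xfs_buf, b_bn),
      	.head_offset		= offsetof(struct xfs_buf, b_rhash_head),
      	.automatic_shrinking	= true,
      	.obj_cmpfn		= _xfs_buf_obj_cmp,
      };

      error = rhashtable_init(&pag->pag_buf_hash, &xfs_buf_hash_params);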
      Signed-off-by: Lucas Stach <dev@lynxeye.de>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      
  9. 06 Oct 2016, 1 commit
    • xfs: garbage collect old cowextsz reservations · 83104d44
      Darrick J. Wong committed
      Trim CoW reservations made on behalf of a cowextsz hint if they get too
      old or we run low on quota, so long as we don't have dirty data awaiting
      writeback or directio operations in progress.
      
      Garbage collection of the cowextsize extents is kept separate from
      prealloc extent reaping because setting the CoW prealloc lifetime to a
      (much) higher value than the regular prealloc extent lifetime has been
      useful for combating CoW fragmentation on VM hosts where the VMs
      experience bursty write behaviors and we can keep the utilization ratios
      low enough that we don't start to run out of space.  IOWs, it benefits
      us to keep the CoW fork reservations around for as long as we can unless
      we run out of blocks or hit inode reclaim.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  10. 04 Oct 2016, 2 commits
  11. 19 Sep 2016, 1 commit
    • xfs: set up per-AG free space reservations · 3fd129b6
      Darrick J. Wong committed
      One unfortunate quirk of the reference count and reverse mapping
      btrees is that they can expand in size when blocks are written to
      *other* allocation groups if, say, one large extent becomes a lot of
      tiny extents.  Since we don't want to start throwing errors in the middle
      of CoWing, we need to reserve some blocks to handle future expansion.
      The transaction block reservation counters aren't sufficient here
      because we have to have a reserve of blocks in every AG, not just
      somewhere in the filesystem.
      
      Therefore, create two per-AG block reservation pools.  One feeds the
      AGFL so that rmapbt expansion always succeeds, and the other feeds all
      other metadata so that refcountbt expansion never fails.
      
      Use the count of how many reserved blocks we need to have on hand to
      create a virtual reservation in the AG.  Through selective clamping of
      the maximum length of allocation requests and of the length of the
      longest free extent, we can make it look like there's less free space
      in the AG unless the reservation owner is asking for blocks.
      
      In other words, play some accounting tricks in-core to make sure that
      we always have blocks available.  On the plus side, there's nothing to
      clean up if we crash, which is in contrast to the strategy that the
      rough draft used (actually removing extents from the freespace btrees).
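
      In sketch form (the helper name is assumed): the allocator simply
      subtracts the outstanding reservation from what it reports as
      free, unless the request comes from the reservation's owner.

      /* what a non-owner sees as the longest allocatable extent */
      xfs_extlen_t avail = pag->pagf_freeblks -
      			 xfs_ag_resv_needed(pag, args->resv);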
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
  12. 14 Sep 2016, 1 commit
    • xfs: normalize "infinite" retries in error configs · 77169812
      Eric Sandeen committed
      As it stands today, the "fail immediately" vs. "retry forever"
      values for max_retries and retry_timeout_seconds in the xfs metadata
      error configurations are not consistent.
      
      A retry_timeout_seconds of 0 means "retry forever," but a
      max_retries of 0 means "fail immediately."
      
      retry_timeout_seconds < 0 is disallowed, while max_retries == -1
      means "retry forever."
      
      Make this consistent across the error configs, such that a value of
      0 means "fail immediately" (i.e. wait 0 seconds, or retry 0 times),
      and a value of -1 always means "retry forever."
      
      This makes retry_timeout a signed long to accommodate the -1, even
      though it stores jiffies.  Given our limit of a 1 day maximum
      timeout, this should be sufficient even at much higher HZ values
      than we have available today.
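
      The normalized semantics, sketched as a predicate (the helper is
      illustrative; -1 matches the sentinel described above):

      static bool want_retry(int max_retries, int retries_done)
      {
      	if (max_retries == -1)
      		return true;			/* retry forever */
      	return retries_done < max_retries;	/* 0: fail at once */
      }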
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
  13. 03 Aug 2016, 3 commits
    • xfs: rmap btree requires more reserved free space · 52548852
      Darrick J. Wong committed
      Originally-From: Dave Chinner <dchinner@redhat.com>
      
      The rmap btree is allocated from the AGFL, which means we have to
      ensure ENOSPC is reported to userspace before we run out of free
      space in each AG. The last allocation in an AG can cause a full
      height rmap btree split, and that means we have to reserve at least
      this many blocks *in each AG* to be placed on the AGFL at ENOSPC.
      Update the various space calculation functions to handle this.
      
      Also, because the macros are now executing conditional code and are
      called quite frequently, convert them to functions that initialise
      variables in the struct xfs_mount, use the new variables everywhere
      and document the calculations better.
      
      [darrick.wong@oracle.com: don't reserve blocks if !rmap]
      [dchinner@redhat.com: update m_ag_max_usable after growfs]
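
      In sketch form, the macro-to-function conversion caches the result
      at mount time (m_ag_max_usable is named in the note above; the
      function name is assumed):

      mp->m_ag_max_usable = xfs_alloc_ag_max_usable(mp);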
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: define the on-disk rmap btree format · 035e00ac
      Darrick J. Wong committed
      Originally-From: Dave Chinner <dchinner@redhat.com>
      
      Now that we have all the surrounding call infrastructure in place, we can
      start filling out the rmap btree implementation. Start with the
      on-disk btree format; add everything needed to read, write and
      manipulate rmap btree blocks. This prepares the way for adding the
      btree operations implementation.
      
      [darrick: record owner and offset info in rmap btree]
      [darrick: fork, bmbt and unwritten state in rmap btree]
      [darrick: flags are a separate field in xfs_rmap_irec]
      [darrick: calculate maxlevels separately]
      [darrick: move the 'unwritten' bit into unused parts of rm_offset]
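
      The on-disk record shape this describes, sketched with field names
      following the notes above (the exact bit packing lives in the
      on-disk format headers):

      struct xfs_rmap_rec {
      	__be32	rm_startblock;	/* extent start block */
      	__be32	rm_blockcount;	/* extent length */
      	__be64	rm_owner;	/* extent owner */
      	__be64	rm_offset;	/* offset within the owner; the
      				 * unwritten/fork/bmbt flags live in
      				 * otherwise unused high bits */
      };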
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: rmap btree add more reserved blocks · 8018026e
      Darrick J. Wong committed
      Originally-From: Dave Chinner <dchinner@redhat.com>
      
      XFS reserves a small amount of space in each AG for the minimum
      number of free blocks needed for operation. Adding the rmap btree
      increases the number of reserved blocks, but it also increases the
      complexity of the calculation as the free inode btree is optional
      (like the rmapbt).
      
      Rather than calculate the prealloc blocks every time we need to
      check it, add a function to calculate it at mount time and store it
      in the struct xfs_mount, and convert the XFS_PREALLOC_BLOCKS macro
      to just use the xfs_mount variable directly.
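
      A sketch of the described conversion (both names are assumed; only
      "compute once at mount time and store it" is from the commit):

      /* in xfs_mountfs(), instead of evaluating a macro on every check */
      mp->m_prealloc_blocks = xfs_prealloc_blocks(mp);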
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
  14. 18 May 2016, 6 commits
  15. 05 Apr 2016, 1 commit
    • mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov committed
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long*
      time ago with the promise that one day it would be possible to
      implement a page cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And it likely never will.
      
      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE.  And it's a constant source of confusion whether
      PAGE_CACHE_* or PAGE_* constants should be used in a particular case,
      especially on the border between fs and mm.
      
      Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too
      much breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straightforward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      the script below.  For some reason, coccinelle doesn't patch header
      files, so I've called spatch on them manually.
      
      The only adjustment after coccinelle is a revert of the changes to
      the PAGE_CACHE_ALIGN definition: we are going to drop it later.
      
      There are a few places in the code that coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation
      will also be addressed in a separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
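
      A hypothetical before/after of the mechanical change:

      /* before */
      index = pos >> PAGE_CACHE_SHIFT;
      page_cache_release(page);

      /* after */
      index = pos >> PAGE_SHIFT;
      put_page(page);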
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. 15 Mar 2016, 1 commit
    • xfs: debug mode forced buffered write failure · 801cc4e1
      Brian Foster committed
      Add a DEBUG mode-only sysfs knob to enable forced buffered write
      failure. An additional side effect of this mode is brute force killing
      of delayed allocation blocks in the range of the write. The latter is
      the prime motivation behind this patch, as userspace test
      infrastructure requires a reliable mechanism to create and split
      delalloc extents without causing extent conversion.
      
      Certain fallocate operations (i.e., zero range) were used for this in
      the past, but the implementations have changed such that delalloc
      extents are flushed and converted to real blocks, rendering the test
      useless.
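
      A sketch of the DEBUG-only gate this adds (the field and helper
      names are assumed from the "fail_writes" wording):

      #ifdef DEBUG
      	if (xfs_mp_fail_writes(mp))
      		return -EIO;	/* force the buffered write to fail */
      #endif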
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      
  17. 02 Mar 2016, 1 commit
    • xfs: fix up inode32/64 (re)mount handling · 12c3f05c
      Eric Sandeen committed
      inode32/inode64 allocator behavior with respect to mount, remount
      and growfs is a little tricky.
      
      The inode32 mount option should only enable the inode32 allocator
      heuristics if the filesystem is large enough for 64-bit inodes to
      exist.  Today, it has this behavior on the initial mount, but a
      remount with inode32 unconditionally changes the allocation
      heuristics, even for a small fs.
      
      Also, an inode32 mounted small filesystem should transition to the
      inode32 allocator if the filesystem is subsequently grown to a
      sufficient size.  Today that does not happen.
      
      This patch consolidates xfs_set_inode32 and xfs_set_inode64 into a
      single new function, and moves the "is the maximum inode number big
      enough to matter" test into that function, so it doesn't rely on the
      caller to get it right - which remount did not do, previously.
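
      The consolidated helper's shape, sketched (the function name is
      assumed; the logic follows the description above):

      /* Returns maxagi. Applies the inode32 heuristics only when inode
       * numbers above 32 bits can actually exist in this filesystem;
       * called from mount, remount and growfs. */
      xfs_agnumber_t xfs_set_inode_alloc(struct xfs_mount *mp,
      				   xfs_agnumber_t agcount);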
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
  18. 08 Feb 2016, 1 commit
  19. 03 Nov 2015, 2 commits
    • xfs: don't leak uuid table on rmmod · af3b6382
      Darrick J. Wong committed
      Don't leak the UUID table when the module is unloaded.
      (Found with kmemleak.)
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: introduce BMAPI_ZERO for allocating zeroed extents · 3fbbbea3
      Dave Chinner committed
      To enable DAX to do atomic allocation of zeroed extents, we need to
      drive the block zeroing deep into the allocator. Because
      xfs_bmapi_write() can return merged extents on allocation that were
      only partially allocated (i.e. requested range spans allocated and
      hole regions, allocation into the hole was contiguous), we cannot
      zero the extent returned from xfs_bmapi_write() as that can
      overwrite existing data with zeros.
      
      Hence we have to drive the extent zeroing into the allocation code,
      prior to where we merge the extents into the BMBT and return the
      resultant map. This means we need to propagate this need down to
      xfs_alloc_vextent() and issue the block zeroing at this point.
      
      While this functionality is being introduced for DAX, there is no
      reason why it is specific to DAX - we can pre-zero blocks during the
      allocation transaction on any type of device. It's just slow (and
      usually slower than unwritten allocation and conversion) on
      traditional block devices so doesn't tend to get used. We can,
      however, hook hardware zeroing optimisations via sb_issue_zeroout()
      to this operation, so it may be useful in future and hence the
      "allocate zeroed blocks" API needs to be implementation neutral.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
  20. 12 Oct 2015, 1 commit
    • xfs: per-filesystem stats in sysfs · 225e4635
      Bill O'Donnell committed
      This patch implements per-filesystem stats objects in sysfs. It
      depends on the application of the previous patch series that
      develops the infrastructure to support both xfs global stats and
      xfs per-fs stats in sysfs.
      
      Stats objects are instantiated when an xfs filesystem is mounted
      and deleted on unmount. With this patch, the stats directory is
      created and populated with the familiar stats and stats_clear files.
      Example:
              /sys/fs/xfs/sda9/stats/stats
              /sys/fs/xfs/sda9/stats/stats_clear
      
      With this patch, the individual counts within the new per-fs
      stats file(s) remain at zero. Functions that use the macros
      to increment, decrement, and add-to the per-fs stats counts will
      be covered in a separate new patch to follow this one. Note that
      the counts within the global stats file (/sys/fs/xfs/stats/stats)
      advance normally and can be cleared just as they could prior to this patch.
      
      [dchinner: move setup/teardown to xfs_fs_{fill|put}_super() so it is
      done before/after any path that uses the per-mount stats.]
      Signed-off-by: Bill O'Donnell <billodo@redhat.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
  21. 04 Jun 2015, 1 commit
  22. 29 May 2015, 1 commit
    • xfs: use sparse chunk alignment for min. inode allocation requirement · 066a1884
      Brian Foster committed
      xfs_ialloc_ag_select() iterates through the allocation groups looking
      for free inodes or free space to determine whether to allow an inode
      allocation to proceed. If no free inodes are available, it assumes that
      an AG must have an extent longer than mp->m_ialloc_blks.
      
      Sparse inode chunk support currently allows for allocations smaller than
      the traditional inode chunk size specified in m_ialloc_blks. The current
      minimum sparse allocation is set in the superblock sb_spino_align field
      at mkfs time. Create a new m_ialloc_min_blks field in xfs_mount and use
      this to represent the minimum supported allocation size for inode
      chunks. Initialize m_ialloc_min_blks at mount time based on whether
      sparse inodes are supported.
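
      The mount-time initialization, sketched from the description (both
      field names appear above; the exact call site is assumed):

      if (xfs_sb_version_hassparseinodes(&mp->m_sb))
      	mp->m_ialloc_min_blks = mp->m_sb.sb_spino_align;
      else
      	mp->m_ialloc_min_blks = mp->m_ialloc_blks;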
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
  23. 23 Feb 2015, 7 commits
    • xfs: remove xfs_mod_incore_sb API · 964aa8d9
      Dave Chinner committed
      Now that there are no users of the bitfield based incore superblock
      modification API, just remove the whole damn lot of it, including
      all the bitfield definitions. This finally removes a lot of cruft
      that has been around for a long time.
      
      Credit goes to Christoph Hellwig for providing a great patch
      connecting all the dots to enable us to do this. This patch is
      derived from that work.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: replace xfs_mod_incore_sb_batched · 0bd5dded
      Dave Chinner committed
      Introduce helper functions for modifying fields in the superblock
      into xfs_trans.c, the only caller of xfs_mod_incore_sb_batch().  We
      can then use these directly in xfs_trans_unreserve_and_mod_sb() and
      so remove another user of the xfs_mode_incore_sb() API without
      losing any functionality or scalability of the transaction commit
      code..
      
      Based on a patch from Christoph Hellwig.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: introduce xfs_mod_frextents · bab98bbe
      Dave Chinner committed
      Add a new helper to modify the incore counter of free realtime
      extents. This matches the helpers used for inode and data block
      counters, and removes a significant user of the xfs_mod_incore_sb()
      interface.
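
      The helper's shape, matching its inode and data block siblings
      (the body is a sketch, not the patch itself):

      int xfs_mod_frextents(struct xfs_mount *mp, int64_t delta)
      {
      	int64_t	frextents;
      	int	error = 0;

      	spin_lock(&mp->m_sb_lock);
      	frextents = mp->m_sb.sb_frextents + delta;
      	if (frextents < 0)
      		error = -ENOSPC;
      	else
      		mp->m_sb.sb_frextents = frextents;
      	spin_unlock(&mp->m_sb_lock);
      	return error;
      }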
      
      Based on a patch originally from Christoph Hellwig.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: Remove icsb infrastructure · 5681ca40
      Dave Chinner committed
      Now that the in-core superblock infrastructure has been replaced with
      generic per-cpu counters, we don't need it anymore. Nuke it from
      orbit so we are sure that it won't haunt us again...
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: use generic percpu counters for free block counter · 0d485ada
      Dave Chinner committed
      XFS has hand-rolled per-cpu counters for the superblock since before
      there was any generic implementation. The free block counter is
      special in that it is used for ENOSPC detection outside transaction
      contexts for delayed allocation. This means that the counter
      needs to be accurate at zero. The current per-cpu counter code jumps
      through lots of hoops to ensure we never run past zero, but we don't
      need to make all those jumps with the generic counter
      implementation.
      
      The generic counter implementation allows us to pass a "batch"
      threshold at which the addition/subtraction to the counter value
      will be folded back into global value under lock. We can use this
      feature to reduce the batch size as we approach 0 in a very similar
      manner to the existing counters and their rebalance algorithm. If we
      use a batch size of 1 as we approach 0, then every addition and
      subtraction will be done against the global value and hence allow
      accurate detection of zero threshold crossing.
      
      Hence we can replace the hand-rolled, accurate-at-zero counters with
      generic percpu counters.
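
      The batch trick in sketch form (percpu_counter_add_batch() is the
      current kernel spelling; the threshold below is illustrative):

      int xfs_mod_fdblocks(struct xfs_mount *mp, int64_t delta)
      {
      	s32 batch = percpu_counter_batch;

      	if (percpu_counter_read(&mp->m_fdblocks) < 1024)
      		batch = 1;	/* near zero: fold every change into the
      				 * global sum so the crossing is exact */
      	percpu_counter_add_batch(&mp->m_fdblocks, delta, batch);
      	if (delta < 0 &&
      	    percpu_counter_compare(&mp->m_fdblocks, 0) < 0) {
      		percpu_counter_add(&mp->m_fdblocks, -delta);	/* undo */
      		return -ENOSPC;
      	}
      	return 0;
      }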
      
      Note: this removes just enough of the icsb infrastructure to compile
      without warnings. The rest will go in subsequent commits.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: use generic percpu counters for free inode counter · e88b64ea
      Dave Chinner committed
      XFS has hand-rolled per-cpu counters for the superblock since before
      there was any generic implementation. The free inode counter is not
      used for any limit enforcement - the per-AG free inode counters are
      used during allocation to determine if there are inodes available for
      allocation.
      
      Hence we don't need any of the complexity of the hand-rolled
      counters and we can simply replace them with generic per-cpu
      counters similar to the inode counter.
      
      This version introduces an xfs_mod_ifree() helper function from
      Christoph Hellwig.
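
      The helper named above, sketched (the real body is similarly
      small):

      int xfs_mod_ifree(struct xfs_mount *mp, int64_t delta)
      {
      	percpu_counter_add(&mp->m_ifree, delta);
      	if (percpu_counter_compare(&mp->m_ifree, 0) < 0) {
      		ASSERT(0);
      		percpu_counter_add(&mp->m_ifree, -delta);
      		return -EINVAL;
      	}
      	return 0;
      }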
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: use generic percpu counters for inode counter · 501ab323
      Dave Chinner committed
      XFS has hand-rolled per-cpu counters for the superblock since before
      there was any generic implementation. There are some warts around
      the use of them for the inode counter, as the hand-rolled counter is
      designed to be accurate at zero but has no specific accuracy at
      any other value. This design causes problems for the maximum inode
      count threshold enforcement, as there is no trigger that balances
      the counters as they get close to the maximum threshold.
      
      Instead of designing new triggers for balancing, just replace the
      hand-rolled per-cpu counter with a generic counter.  This enables us
      to update the counter through the normal superblock modification
      functions, but rather than do that we add an xfs_mod_icount() helper
      function (from Christoph Hellwig) and keep the percpu counter
      outside the superblock in the struct xfs_mount.
      
      This means we still need to initialise the per-cpu counter
      specifically when we read the superblock, and vice versa when we
      log/write it, but it does mean that we don't need to change any
      other code.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
  24. 16 Feb 2015, 1 commit
  25. 22 Jan 2015, 1 commit
    • xfs: consolidate superblock logging functions · 61e63ecb
      Dave Chinner committed
      We now have several superblock logging functions that are identical
      except for the transaction reservation and whether it should be a
      synchronous transaction or not. Consolidate these all into a single
      function with a single reservation and a sync flag, and call it
      xfs_sync_sb().
      
      Also, xfs_mod_sb() is not really a modification function - it's the
      operation of logging the superblock buffer. Hence change its name
      to reflect this.
      
      Note that we have to change the mp->m_update_flags that are passed
      around at mount time to a boolean simply to indicate a superblock
      update is needed.
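
      The consolidated function's shape (the name is from the text; the
      signature is sketched):

      /* Log the superblock and commit; synchronous if 'wait' is set. */
      int xfs_sync_sb(struct xfs_mount *mp, bool wait);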
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>