1. 06 May 2011 (1 commit)
  2. 30 March 2010 (1 commit)
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h · 5a0e3ad6
      Committed by Tejun Heo
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      The percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include
      those headers directly instead of assuming availability.  As this
      conversion needs to touch a large number of source files, the
      following script is used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following:
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there, i.e. gfp.h if only gfp is
        used, slab.h if slab is used (a sketch of such an edit follows
        this list).
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to place the new include so that its order
        conforms to its surroundings.  It is put in the include block that
        contains core kernel includes, in the same order the rest are
        ordered: alphabetical, Christmas tree, reverse Christmas tree, or
        at the end if there doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have a fitting include block), it prints
        an error message indicating which .h file needs to be added to the
        file.
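
      As an illustration, a hypothetical .c file that only compiled
      because of the implicit chain would be updated roughly like this
      (the file and function are made up; only the include edit is the
      point):

        /* Before: kmalloc() compiled only because <linux/module.h>
         * pulled in percpu.h -> slab.h -> gfp.h behind the scenes. */
        #include <linux/module.h>

        /* After the sweep: the slab dependency is spelled out.
         * slab.h also provides GFP_KERNEL via gfp.h. */
        #include <linux/module.h>
        #include <linux/slab.h>         /* kmalloc(), kfree() */

        static void *example_alloc(void)
        {
                return kmalloc(256, GFP_KERNEL);
        }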
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition, and for others adding it to an
         implementation .h or embedding .c file was more appropriate.
         This step added inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files, but without automatically
         editing them, as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored, as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles), and a few
         more options had to be turned off depending on the arch to make
         things build (like ipr on powerpc/64, which failed due to missing
         writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that they could be applied
         as a separate patch and serve as a bisection point.
      
      Given that I had only a couple of failures from the tests in step 6,
      I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the
      arch headers, which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  3. 03 April 2009 (1 commit)
  4. 25 March 2009 (2 commits)
    • Btrfs: leave btree locks spinning more often · b9473439
      Committed by Chris Mason
      btrfs_mark_buffer_dirty would set dirty bits in the extent_io tree
      for the buffers it was dirtying.  This may require a kmalloc and is
      not atomic.  So, anyone who called btrfs_mark_buffer_dirty had to
      set any btree locks they were holding to blocking first.
      
      This commit changes dirty tracking for extent buffers to just use a flag
      in the extent buffer.  Now that we have one and only one extent buffer
      per page, this can be safely done without losing dirty bits along the way.
      
      This also introduces a path->leave_spinning flag that callers of
      btrfs_search_slot can use to indicate they will properly deal with a
      path returned where all the locks are spinning instead of blocking.
      
      Many of the btree search callers now expect spinning paths,
      resulting in better btree concurrency overall.
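
      A caller that opts in might look roughly like this (a sketch based
      on the description above; the surrounding transaction and key
      handling are assumed):

        /* Sketch: search while promising not to schedule with the
         * returned locks held, so they can stay spinning. */
        static int touch_item(struct btrfs_trans_handle *trans,
                              struct btrfs_root *root,
                              struct btrfs_key *key)
        {
                struct btrfs_path *path;
                int ret;

                path = btrfs_alloc_path();
                if (!path)
                        return -ENOMEM;

                path->leave_spinning = 1;
                ret = btrfs_search_slot(trans, root, key, path, 0, 1);
                if (ret == 0)
                        btrfs_mark_buffer_dirty(path->nodes[0]); /* atomic now */

                btrfs_free_path(path);
                return ret;
        }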
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: Check for a blocking lock before taking the spin · 66d7e85e
      Committed by Chris Mason
      This reduces contention on the extent buffer spin locks by testing for a
      blocking lock before trying to take the spinlock.
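
      The idea, as a sketch (the blocking bit is the EXTENT_BUFFER_BLOCKING
      flag described in the "explicit blocking points" commit below; the
      field names are assumptions):

        /* Peek at the blocking bit first: if the holder has gone
         * blocking, spinning on the lock would only bounce the
         * cacheline, so take the slow path immediately. */
        static bool eb_try_spin(struct extent_buffer *eb)
        {
                if (test_bit(EXTENT_BUFFER_BLOCKING, &eb->bflags))
                        return false;   /* caller waits instead */
                return spin_trylock(&eb->lock);
        }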
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  5. 09 March 2009 (1 commit)
    • Btrfs: fix spinlock assertions on UP systems · b9447ef8
      Committed by Chris Mason
      btrfs_tree_locked was being used to make sure a given extent_buffer
      was properly locked in a few places.  But it wasn't correct for
      UP-compiled kernels.
      
      This switches it to using assert_spin_locked instead, and renames it to
      btrfs_assert_tree_locked to better reflect how it was really being used.
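
      The resulting helper is essentially this (a sketch; the spinlock
      field name is an assumption):

        /* spin_is_locked() is hardwired to return 0 on UP kernels, so
         * BUG_ON(!btrfs_tree_locked(eb)) fired spuriously there.
         * assert_spin_locked() does the right thing on both UP and SMP. */
        static inline void btrfs_assert_tree_locked(struct extent_buffer *eb)
        {
                assert_spin_locked(&eb->lock);
        }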
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  6. 13 February 2009 (1 commit)
    • Btrfs: make a lockdep class for the extent buffer locks · 4008c04a
      Committed by Chris Mason
      Btrfs is currently using spin_lock_nested with a nested value based
      on the tree depth of the block.  But, this doesn't quite work because
      the max tree depth is bigger than what spin_lock_nested can deal with,
      and because locks are sometimes taken before the level field is filled in.
      
      The solution here is to use lockdep_set_class_and_name instead, and to
      set the class before unlocking the pages when the block is read from the
      disk and just after init of a freshly allocated tree block.
      
      btrfs_clear_path_blocking is also changed to take the locks in the
      proper order, and it also makes sure all the locks currently held
      are properly set to blocking before it tries to retake the
      spinlocks.  Otherwise, lockdep gets upset about bad lock ordering.
      
      The lockdep magic came from Peter Zijlstra <peterz@infradead.org>
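
      The mechanism looks roughly like this (a sketch; the array and
      helper names are made up, lockdep_set_class_and_name is the real
      API):

        #ifdef CONFIG_DEBUG_LOCK_ALLOC
        /* One lockdep class per tree level, with a readable name so
         * reports say which level a lock belongs to. */
        static struct lock_class_key btrfs_eb_class[BTRFS_MAX_LEVEL + 1];
        static const char *btrfs_eb_name[BTRFS_MAX_LEVEL + 1] = {
                "btrfs-level-0", "btrfs-level-1", "btrfs-level-2", /* etc. */
        };

        /* Called after reading a block from disk and just after init of
         * a freshly allocated tree block, once the level is known. */
        static void set_eb_lockdep_class(struct extent_buffer *eb, int level)
        {
                lockdep_set_class_and_name(&eb->lock, &btrfs_eb_class[level],
                                           btrfs_eb_name[level]);
        }
        #endif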
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  7. 10 February 2009 (1 commit)
    • Btrfs: don't use spin_is_contended · 284b066a
      Committed by Chris Mason
      Btrfs was using spin_is_contended to see if it should drop locks before
      doing extent allocations during btrfs_search_slot.  The idea was to avoid
      expensive searches in the tree unless the lock was actually contended.
      
      But, spin_is_contended is specific to the ticket spinlocks on x86, so this
      is causing compile errors everywhere else.
      
      In practice, the contention could easily appear some time after we started
      doing the extent allocation, and it makes more sense to always drop the lock
      instead.
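
      The shape of the change, as an illustrative sketch (not the literal
      diff):

        /* Before: only drop the path if the spinlock looked contended;
         * spin_is_contended() only meant something for x86 ticket locks. */
        if (spin_is_contended(&eb->lock))
                btrfs_release_path(root, path);

        /* After: contention can show up at any point during the extent
         * allocation anyway, so always drop the locks first. */
        btrfs_release_path(root, path);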
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  8. 04 February 2009 (1 commit)
    • Btrfs: Change btree locking to use explicit blocking points · b4ce94de
      Committed by Chris Mason
      Most of the btrfs metadata operations can be protected by a spinlock,
      but some operations still need to schedule.
      
      So far, btrfs has been using a mutex along with a trylock loop;
      most of the time it is able to avoid going for the full mutex, so
      the trylock loop is a big performance gain.
      
      This commit is step one for getting rid of the blocking locks entirely.
      btrfs_tree_lock takes a spinlock, and the code explicitly switches
      to a blocking lock when it starts an operation that can schedule.
      
      We'll be able to get rid of the blocking locks in smaller pieces over time.
      Tracing allows us to find the most common cause of blocking, so we
      can start with the hot spots first.
      
      The basic idea is:
      
      btrfs_tree_lock() returns with the spin lock held
      
      btrfs_set_lock_blocking() sets the EXTENT_BUFFER_BLOCKING bit in
      the extent buffer flags, and then drops the spin lock.  The buffer is
      still considered locked by all of the btrfs code.
      
      If btrfs_tree_lock gets the spinlock but finds the blocking bit set, it drops
      the spin lock and waits on a wait queue for the blocking bit to go away.
      
      Much of the code that needs to set the blocking bit finishes without actually
      blocking a good percentage of the time.  So, an adaptive spin is still
      used against the blocking bit to avoid very high context switch rates.
      
      btrfs_clear_lock_blocking() clears the blocking bit and returns
      with the spinlock held again.
      
      btrfs_tree_unlock() can be called on either blocking or spinning locks,
      it does the right thing based on the blocking bit.
      
      ctree.c has a helper function to set/clear all the locked buffers in a
      path as blocking.
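
      Put together, a caller follows this pattern (a usage sketch
      assembled from the description above; the helper doing the
      schedulable work is hypothetical):

        static void modify_block(struct extent_buffer *eb)
        {
                btrfs_tree_lock(eb);            /* returns with spinlock held */

                /* About to do something that can schedule: go blocking. */
                btrfs_set_lock_blocking(eb);    /* sets the bit, drops the
                                                 * spinlock; eb is still
                                                 * logically locked */
                do_schedulable_work(eb);        /* hypothetical: IO, allocation */

                btrfs_clear_lock_blocking(eb);  /* spinlock held again */
                btrfs_tree_unlock(eb);          /* copes with either state */
        }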
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  9. 15 January 2009 (1 commit)
  10. 06 January 2009 (1 commit)
  11. 30 September 2008 (1 commit)
    • Btrfs: add and improve comments · d352ac68
      Committed by Chris Mason
      This improves the comments at the top of many functions.  It didn't
      dive into the guts of functions because I was trying to
      avoid merging problems with the new allocator and back reference work.
      
      extent-tree.c and volumes.c were both skipped, and there is definitely
      more work todo in cleaning and commenting the code.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  12. 25 September 2008 (7 commits)
    • btrfs_search_slot: reduce lock contention by cowing in two stages · 65b51a00
      Committed by Chris Mason
      A btree block cow has two parts: the first is to allocate a
      destination block, and the second is to copy the old block over.
      
      The first part needs locks in the extent allocation tree, and may need to
      do IO.  This changeset splits that into a separate function that can be
      called without any tree locks held.
      
      btrfs_search_slot is changed to drop its path and start over if it
      has to COW a contended block.  This often means that many writers
      will pre-alloc a new destination for the same contended block, but
      they cache their prealloc for later use on lower levels in the tree.
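
      The flow, as a sketch (helper names are hypothetical; the two
      stages are the ones described above):

        static int cow_block_two_stage(struct btrfs_trans_handle *trans,
                                       struct btrfs_root *root,
                                       struct extent_buffer *buf,
                                       struct extent_buffer *parent)
        {
                struct extent_buffer *prealloc;

                /* Stage 1: may take extent allocation tree locks and do
                 * IO, so run it with no btree locks held; writers cache
                 * this prealloc for reuse on lower levels. */
                prealloc = alloc_cow_destination(trans, root);  /* hypothetical */
                if (!prealloc)
                        return -ENOMEM;

                /* Stage 2: the cheap part, done under the tree lock. */
                btrfs_tree_lock(buf);
                copy_block_to(prealloc, buf);           /* hypothetical */
                point_parent_at(parent, prealloc);      /* hypothetical */
                btrfs_tree_unlock(buf);
                return 0;
        }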
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: implement memory reclaim for leaf reference cache · bcc63abb
      Committed by Yan
      The memory reclaiming issue happens when snapshots exist.  In that
      case, some cache entries may not be used during old snapshot dropping,
      so they will remain in the cache until umount.
      
      The patch adds a field to struct btrfs_leaf_ref to record the create
      time.  Besides that, the patch links all dead roots of a given
      snapshot together in order of create time.  After an old snapshot has
      been completely dropped, we check the dead root list and remove all
      cache entries created before the oldest dead root in the list.
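
      A sketch of the bookkeeping (the field and helper names are
      assumptions, not the actual patch):

        struct btrfs_leaf_ref {
                /* existing fields unchanged */
                u64 created;    /* NEW: generation when this entry was cached */
        };

        /* Once the oldest dead root of a snapshot is fully dropped,
         * nothing can reference cache entries older than it. */
        static void reclaim_leaf_refs(struct btrfs_root *root, u64 oldest_gen)
        {
                struct btrfs_leaf_ref *ref;

                while ((ref = oldest_cached_ref(root)) &&       /* hypothetical */
                       ref->created < oldest_gen)
                        free_leaf_ref(ref);                     /* hypothetical */
        }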
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: Use mutex_lock_nested for tree locking · 6dddcbeb
      Committed by Chris Mason
      Lockdep has the notion of locking subclasses so that you can identify
      locks you expect to be taken after other locks of the same class.  This
      changes the per-extent buffer btree locking routines to use a subclass based
      on the level in the tree.
      
      Unfortunately, lockdep can only handle 8 total subclasses, and the btrfs
      max level is also 8.  So when lockdep is on, use a lower max level.
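
      In code terms (a sketch; mutex_lock_nested and the 8-subclass
      lockdep limit are real, the exact definitions are illustrative):

        /* lockdep only supports 8 subclasses (MAX_LOCKDEP_SUBCLASSES),
         * and the on-disk max level is also 8, so cap the runtime level
         * lower when lockdep is built in. */
        #ifdef CONFIG_DEBUG_LOCK_ALLOC
        #define BTRFS_MAX_LEVEL 7
        #else
        #define BTRFS_MAX_LEVEL 8
        #endif

        int btrfs_tree_lock(struct extent_buffer *eb)
        {
                /* Subclass by level: lockdep then allows a parent's lock
                 * to be held while taking a child's. */
                mutex_lock_nested(&eb->mutex, btrfs_header_level(eb));
                return 0;
        }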
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: Use a mutex in the extent buffer for tree block locking · a61e6f29
      Committed by Chris Mason
      This replaces the use of the page cache lock bit for locking, which wasn't
      suitable for block size < page size and couldn't be used recursively.
      
      The mutexes alone don't fix either problem, but they are the first step.
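
      The first step amounts to this (a sketch; the point is that the
      lock lives in the extent buffer, so it no longer depends on page
      granularity):

        struct extent_buffer {
                /* other fields unchanged */
                struct mutex mutex;     /* per tree block, not per page */
        };

        int btrfs_tree_lock(struct extent_buffer *eb)
        {
                mutex_lock(&eb->mutex);
                return 0;
        }

        int btrfs_tree_unlock(struct extent_buffer *eb)
        {
                mutex_unlock(&eb->mutex);
                return 0;
        }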
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: Reduce contention on the root node · f9efa9c7
      Committed by Chris Mason
      This calls unlock_up sooner in btrfs_search_slot in order to decrease the
      amount of work done with the higher level tree locks held.
      
      Also, it changes btrfs_tree_lock to spin for a bit against the page
      lock before scheduling.  This makes a big difference in context
      switch rate under highly contended workloads.
      
      Longer term, a better locking structure is needed than the page lock.
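
      The spin looks roughly like this (a sketch; the iteration count is
      illustrative, and at this point in history the page lock of the
      buffer's first page is the tree lock):

        int btrfs_tree_lock(struct extent_buffer *eb)
        {
                int i;

                /* Adaptive spin: retry the page lock a while before
                 * paying for a context switch. */
                for (i = 0; i < 512; i++) {
                        if (trylock_page(eb->first_page))
                                return 0;
                        cpu_relax();
                }
                lock_page(eb->first_page);      /* give up and sleep */
                return 0;
        }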
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: Start btree concurrency work. · 925baedd
      Committed by Chris Mason
      The allocation trees and the chunk trees are serialized via their own
      dedicated mutexes.  This means allocation location is still not very
      fine grained.
      
      The main FS btree is protected by locks on each block in the btree.  Locks
      are taken top / down, and as processing finishes on a given level of the
      tree, the lock is released after locking the lower level.
      
      The end result of a search is now a path where only the lowest level
      is locked.  Releasing or freeing the path drops any locks held.
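
      The descent is classic lock coupling (a sketch of the scheme
      described above; the child lookup helper is hypothetical):

        static struct extent_buffer *search_to_leaf(struct btrfs_root *root,
                                                    struct btrfs_key *key)
        {
                struct extent_buffer *node, *child;

                node = btrfs_lock_root_node(root);
                while (btrfs_header_level(node) > 0) {
                        child = read_child_block(node, key);    /* hypothetical */
                        btrfs_tree_lock(child);
                        /* Parent is released only after the child is
                         * locked, so no writer can slip in between levels. */
                        btrfs_tree_unlock(node);
                        free_extent_buffer(node);
                        node = child;
                }
                return node;    /* leaf comes back locked */
        }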
      Signed-off-by: Chris Mason <chris.mason@oracle.com>