1. 25 9月, 2014 5 次提交
  2. 24 9月, 2014 1 次提交
    • T
      blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe · 0a30288d
      Tejun Heo 提交于
      blk-mq uses percpu_ref for its usage counter which tracks the number
      of in-flight commands and used to synchronously drain the queue on
      freeze.  percpu_ref shutdown takes measureable wallclock time as it
      involves a sched RCU grace period.  This means that draining a blk-mq
      takes measureable wallclock time.  One would think that this shouldn't
      matter as queue shutdown should be a rare event which takes place
      asynchronously w.r.t. userland.
      
      Unfortunately, SCSI probing involves synchronously setting up and then
      tearing down a lot of request_queues back-to-back for non-existent
      LUNs.  This means that SCSI probing may take more than ten seconds
      when scsi-mq is used.
      
      This will be properly fixed by implementing a mechanism to keep
      q->mq_usage_counter in atomic mode till genhd registration; however,
      that involves rather big updates to percpu_ref which is difficult to
      apply late in the devel cycle (v3.17-rc6 at the moment).  As a
      stop-gap measure till the proper fix can be implemented in the next
      cycle, this patch introduces __percpu_ref_kill_expedited() and makes
      blk_mq_freeze_queue() use it.  This is heavy-handed but should work
      for testing the experimental SCSI blk-mq implementation.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NChristoph Hellwig <hch@infradead.org>
      Link: http://lkml.kernel.org/g/20140919113815.GA10791@lst.de
      Fixes: add703fd ("blk-mq: use percpu_ref for mq usage count")
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Tested-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      0a30288d
  3. 20 9月, 2014 2 次提交
    • T
      percpu-refcount: make percpu_ref based on longs instead of ints · e625305b
      Tejun Heo 提交于
      percpu_ref is currently based on ints and the number of refs it can
      cover is (1 << 31).  This makes it impossible to use a percpu_ref to
      count memory objects or pages on 64bit machines as it may overflow.
      This forces those users to somehow aggregate the references before
      contributing to the percpu_ref which is often cumbersome and sometimes
      challenging to get the same level of performance as using the
      percpu_ref directly.
      
      While using ints for the percpu counters makes them pack tighter on
      64bit machines, the possible gain from using ints instead of longs is
      extremely small compared to the overall gain from per-cpu operation.
      This patch makes percpu_ref based on longs so that it can be used to
      directly count memory objects or pages.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      e625305b
    • T
      percpu-refcount: improve WARN messages · 4843c332
      Tejun Heo 提交于
      percpu_ref's WARN messages can be a lot more helpful by indicating
      who's the culprit.  Make them report the release function that the
      offending percpu-refcount is associated with.  This should make it a
      lot easier to track down the reported invalid refcnting operations.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      4843c332
  4. 08 9月, 2014 1 次提交
    • T
      percpu-refcount: add @gfp to percpu_ref_init() · a34375ef
      Tejun Heo 提交于
      Percpu allocator now supports allocation mask.  Add @gfp to
      percpu_ref_init() so that !GFP_KERNEL allocation masks can be used
      with percpu_refs too.
      
      This patch doesn't make any functional difference.
      
      v2: blk-mq conversion was missing.  Updated.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      a34375ef
  5. 28 6月, 2014 5 次提交
    • T
      percpu-refcount: implement percpu_ref_reinit() and percpu_ref_is_zero() · 2d722782
      Tejun Heo 提交于
      Now that explicit invocation of percpu_ref_exit() is necessary to free
      the percpu counter, we can implement percpu_ref_reinit() which
      reinitializes a released percpu_ref.  This can be used implement
      scalable gating switch which can be drained and then re-opened without
      worrying about memory allocation failures.
      
      percpu_ref_is_zero() is added to be used in a sanity check in
      percpu_ref_exit().  As this function will be useful for other purposes
      too, make it a public interface.
      
      v2: Use smp_read_barrier_depends() instead of smp_load_acquire().  We
          only need data dep barrier and smp_load_acquire() is stronger and
          heavier on some archs.  Spotted by Lai Jiangshan.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      2d722782
    • T
      percpu-refcount: require percpu_ref to be exited explicitly · 9a1049da
      Tejun Heo 提交于
      Currently, a percpu_ref undoes percpu_ref_init() automatically by
      freeing the allocated percpu area when the percpu_ref is killed.
      While seemingly convenient, this has the following niggles.
      
      * It's impossible to re-init a released reference counter without
        going through re-allocation.
      
      * In the similar vein, it's impossible to initialize a percpu_ref
        count with static percpu variables.
      
      * We need and have an explicit destructor anyway for failure paths -
        percpu_ref_cancel_init().
      
      This patch removes the automatic percpu counter freeing in
      percpu_ref_kill_rcu() and repurposes percpu_ref_cancel_init() into a
      generic destructor now named percpu_ref_exit().  percpu_ref_destroy()
      is considered but it gets confusing with percpu_ref_kill() while
      "exit" clearly indicates that it's the counterpart of
      percpu_ref_init().
      
      All percpu_ref_cancel_init() users are updated to invoke
      percpu_ref_exit() instead and explicit percpu_ref_exit() calls are
      added to the destruction path of all percpu_ref users.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NBenjamin LaHaise <bcrl@kvack.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
      Cc: Li Zefan <lizefan@huawei.com>
      9a1049da
    • T
      percpu-refcount: use unsigned long for pcpu_count pointer · 7d742075
      Tejun Heo 提交于
      percpu_ref->pcpu_count is a percpu pointer with a status flag in its
      lowest bit.  As such, it always goes through arithmetic operations
      which is very cumbersome to do on a pointer.  It has to be first
      casted to unsigned long and then back.
      
      Let's just make the field unsigned long so that we can skip the first
      casts.  While at it, rename it to pcpu_counter_ptr to clarify that
      it's a pointer value.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      7d742075
    • T
      percpu-refcount: add helpers for ->percpu_count accesses · eae7975d
      Tejun Heo 提交于
      * All four percpu_ref_*() operations implemented in the header file
        perform the same operation to determine whether the percpu_ref is
        alive and extract the percpu pointer.  Factor out the common logic
        into __pcpu_ref_alive().  This doesn't change the generated code.
      
      * There are a couple places in percpu-refcount.c which masks out
        PCPU_REF_DEAD to obtain the percpu pointer.  Factor it out into
        pcpu_count_ptr().
      
      * The above changes make the WARN_ON_ONCE() conditional at the top of
        percpu_ref_kill_and_confirm() the only user of REF_STATUS().  Test
        PCPU_REF_DEAD directly and remove REF_STATUS().
      
      This patch doesn't introduce any functional change.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      eae7975d
    • T
      percpu-refcount: one bit is enough for REF_STATUS · d630dc4c
      Tejun Heo 提交于
      percpu-refcount currently reserves two lowest bits of its percpu
      pointer to indicate its state; however, only one bit is used for
      PCPU_REF_DEAD.
      
      Simplify it by removing PCPU_STATUS_BITS/MASK and testing
      PCPU_REF_DEAD directly.  This also allows the compiler to choose a
      more efficient instruction depending on the architecture.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      d630dc4c
  6. 21 1月, 2014 1 次提交
  7. 08 11月, 2013 1 次提交
  8. 17 10月, 2013 1 次提交
  9. 17 6月, 2013 1 次提交
    • T
      percpu-refcount: use RCU-sched insted of normal RCU · a4244454
      Tejun Heo 提交于
      percpu-refcount was incorrectly using preempt_disable/enable() for RCU
      critical sections against call_rcu().  6a24474d ("percpu-refcount:
      consistently use plain (non-sched) RCU") fixed it by converting the
      preepmtion operations with rcu_read_[un]lock() citing that there isn't
      any advantage in using sched-RCU over using the usual one; however,
      rcu_read_[un]lock() for the preemptible RCU implementation -
      CONFIG_TREE_PREEMPT_RCU, chosen when CONFIG_PREEMPT - are slightly
      more expensive than preempt_disable/enable().
      
      In a contrived microbench which repeats the followings,
      
       - percpu_ref_get()
       - copy 32 bytes of data into percpu buffer
       - percpu_put_get()
       - copy 32 bytes of data into percpu buffer
      
      rcu_read_[un]lock() used in percpu_ref_get/put() makes it go slower by
      about 15% when compared to using sched-RCU.
      
      As the RCU critical sections are extremely short, using sched-RCU
      shouldn't have any latency implications.  Convert to RCU-sched.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NKent Overstreet <koverstreet@google.com>
      Acked-by: N"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      a4244454
  10. 14 6月, 2013 3 次提交
    • T
      percpu-refcount: implement percpu_tryget() along with percpu_ref_kill_and_confirm() · dbece3a0
      Tejun Heo 提交于
      Implement percpu_tryget() which stops giving out references once the
      percpu_ref is visible as killed.  Because the refcnt is per-cpu,
      different CPUs will start to see a refcnt as killed at different
      points in time and tryget() may continue to succeed on subset of cpus
      for a while after percpu_ref_kill() returns.
      
      For use cases where it's necessary to know when all CPUs start to see
      the refcnt as dead, percpu_ref_kill_and_confirm() is added.  The new
      function takes an extra argument @confirm_kill which is invoked when
      the refcnt is guaranteed to be viewed as killed on all CPUs.
      
      While this isn't the prettiest interface, it doesn't force synchronous
      wait and is much safer than requiring the caller to do its own
      call_rcu().
      
      v2: Patch description rephrased to emphasize that tryget() may
          continue to succeed on some CPUs after kill() returns as suggested
          by Kent.
      
      v3: Function comment in percpu_ref_kill_and_confirm() updated warning
          people to not depend on the implied RCU grace period from the
          confirm callback as it's an implementation detail.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Slightly-Grumpily-Acked-by: NKent Overstreet <koverstreet@google.com>
      dbece3a0
    • T
      percpu-refcount: implement percpu_ref_cancel_init() · bc497bd3
      Tejun Heo 提交于
      Normally, percpu_ref_init() initializes and percpu_ref_kill()
      initiates destruction which completes asynchronously.  The
      asynchronous destruction can be problematic in init failure path where
      the caller wants to destroy half-constructed object - distinguishing
      half-constructed objects from the usual release method can be painful
      for complex objects.
      
      This patch implements percpu_ref_cancel_init() which synchronously
      destroys the percpu_ref without invoking release.  To avoid
      unintentional misuses, the function requires the ref to have finished
      percpu_ref_init() but never used and triggers WARN otherwise.
      
      v2: Explain the weird name and usage restriction in the function
          comment.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NKent Overstreet <koverstreet@google.com>
      bc497bd3
    • T
      percpu-refcount: add __must_check to percpu_ref_init() and don't use... · acac7883
      Tejun Heo 提交于
      percpu-refcount: add __must_check to percpu_ref_init() and don't use ACCESS_ONCE() in percpu_ref_kill_rcu()
      
      Two small changes.
      
      * Unlike most init functions, percpu_ref_init() allocates memory and
        may fail.  Let's mark it with __must_check in case the caller
        forgets.
      
      * percpu_ref_kill_rcu() is unnecessarily using ACCESS_ONCE() to
        dereference @ref->pcpu_count, which can be misleading.  The pointer
        is guaranteed to be valid and visible and can't change underneath
        the function.  Drop ACCESS_ONCE().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      acac7883
  11. 13 6月, 2013 1 次提交
  12. 04 6月, 2013 2 次提交
    • K
      percpu-refcount: Don't use silly cmpxchg() · c1ae6e9b
      Kent Overstreet 提交于
      The cmpxchg() was just to ensure the debug check didn't race, which was
      a bit excessive. The caller is supposed to do the appropriate
      synchronization, which means percpu_ref_kill() can just do a simple
      store.
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      c1ae6e9b
    • K
      percpu: implement generic percpu refcounting · 215e262f
      Kent Overstreet 提交于
      This implements a refcount with similar semantics to
      atomic_get()/atomic_dec_and_test() - but percpu.
      
      It also implements two stage shutdown, as we need it to tear down the
      percpu counts.  Before dropping the initial refcount, you must call
      percpu_ref_kill(); this puts the refcount in "shutting down mode" and
      switches back to a single atomic refcount with the appropriate
      barriers (synchronize_rcu()).
      
      It's also legal to call percpu_ref_kill() multiple times - it only
      returns true once, so callers don't have to reimplement shutdown
      synchronization.
      
      [akpm@linux-foundation.org: fix build]
      [akpm@linux-foundation.org: coding-style tweak]
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      215e262f