1. 25 September 2014 (11 commits)
    • blk-mq, percpu_ref: start q->mq_usage_counter in atomic mode · 17497acb
      Committed by Tejun Heo
      blk-mq uses percpu_ref for its usage counter, which tracks the number
      of in-flight commands and is used to synchronously drain the queue on
      freeze.  percpu_ref shutdown takes measurable wallclock time as it
      involves a sched RCU grace period.  This means that draining a blk-mq
      queue takes measurable wallclock time.  One would think that this
      shouldn't matter as queue shutdown should be a rare event which takes
      place asynchronously w.r.t. userland.
      
      Unfortunately, SCSI probing involves synchronously setting up and then
      tearing down a lot of request_queues back-to-back for non-existent
      LUNs.  This means that SCSI probing may take more than ten seconds when
      scsi-mq is used.
      
        [    0.949892] scsi host0: Virtio SCSI HBA
        [    1.007864] scsi 0:0:0:0: Direct-Access     QEMU     QEMU HARDDISK    1.1. PQ: 0 ANSI: 5
        [    1.021299] scsi 0:0:1:0: Direct-Access     QEMU     QEMU HARDDISK    1.1. PQ: 0 ANSI: 5
        [    1.520356] tsc: Refined TSC clocksource calibration: 2491.910 MHz
      
        <stall>
      
        [   16.186549] sd 0:0:0:0: Attached scsi generic sg0 type 0
        [   16.190478] sd 0:0:1:0: Attached scsi generic sg1 type 0
        [   16.194099] osd: LOADED open-osd 0.2.1
        [   16.203202] sd 0:0:0:0: [sda] 31457280 512-byte logical blocks: (16.1 GB/15.0 GiB)
        [   16.208478] sd 0:0:0:0: [sda] Write Protect is off
        [   16.211439] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
        [   16.218771] sd 0:0:1:0: [sdb] 31457280 512-byte logical blocks: (16.1 GB/15.0 GiB)
        [   16.223264] sd 0:0:1:0: [sdb] Write Protect is off
        [   16.225682] sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
      
      This is also the reason why request_queues start in bypass mode, which
      is lifted in blk_register_queue(): shutting down a fully functional
      queue also involves an RCU grace period, and the queues for
      non-existent SCSI devices never reach registration.
      
      blk-mq basically needs to do the same thing - start the mq in a
      degraded mode which is faster to shut down and then make it fully
      functional only after the queue reaches registration.  percpu_ref
      recently grew facilities to force atomic operation until explicitly
      switched to percpu mode, which can be used for this purpose.  This
      patch makes blk-mq initialize q->mq_usage_counter in atomic mode and
      switch it to percpu mode only once blk_register_queue() is reached.
      
      Note that this issue was previously worked around by 0a30288d
      ("blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during
      probe") for v3.17.  The temp fix was reverted in preparation of adding
      persistent atomic mode to percpu_ref by 9eca8046 ("Revert "blk-mq,
      percpu_ref: implement a kludge for SCSI blk-mq stall during probe"").
      This patch and the prerequisite percpu_ref changes will be merged
      during v3.18 devel cycle.
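      
      In code, the flow described above looks roughly like this (a sketch
      based on the description; the exact call sites in blk-mq are
      paraphrased and the release-callback name is assumed):
      
        /* queue init: start the usage counter in atomic mode */
        percpu_ref_init(&q->mq_usage_counter, blk_mq_usage_counter_release,
                        PERCPU_REF_INIT_ATOMIC, GFP_KERNEL);
      
        /* blk_register_queue(): the queue is fully set up, go percpu */
        percpu_ref_switch_to_percpu(&q->mq_usage_counter);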
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Christoph Hellwig <hch@infradead.org>
      Link: http://lkml.kernel.org/g/20140919113815.GA10791@lst.de
      Fixes: add703fd ("blk-mq: use percpu_ref for mq usage count")
      Reviewed-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      17497acb
    • percpu_ref: make INIT_ATOMIC and switch_to_atomic() sticky · 1cae13e7
      Committed by Tejun Heo
      Currently, a percpu_ref which is initialized with
      PERCPU_REF_INIT_ATOMIC or switched to atomic mode via
      switch_to_atomic() automatically reverts to percpu mode on the first
      percpu_ref_reinit().  This makes the atomic mode difficult to use for
      cases where a percpu_ref is used as a persistent on/off switch which
      may be cycled multiple times.
      
      This patch makes such atomic state sticky so that it survives through
      kill/reinit cycles.  After this patch, atomic state is cleared only by
      an explicit percpu_ref_switch_to_percpu() call.
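      
      For example, after this patch the following cycle stays in atomic
      mode throughout (a sketch):
      
        percpu_ref_switch_to_atomic(&ref, NULL);
        percpu_ref_kill(&ref);
        /* ... wait for the ref to drain ... */
        percpu_ref_reinit(&ref);            /* live again, still atomic */
        percpu_ref_switch_to_percpu(&ref);  /* only this clears atomic */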
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      1cae13e7
    • percpu_ref: add PERCPU_REF_INIT_* flags · 2aad2a86
      Committed by Tejun Heo
      With the recent addition of percpu_ref_reinit(), percpu_ref now can be
      used as a persistent switch which can be turned on and off repeatedly
      where turning off maps to killing the ref and waiting for it to drain;
      however, there currently isn't a way to initialize a percpu_ref in its
      off (killed and drained) state, which can be inconvenient for certain
      persistent switch use cases.
      
      Similarly, percpu_ref_switch_to_atomic/percpu() allow dynamic
      selection of operation mode; however, currently a newly initialized
      percpu_ref is always in percpu mode making it impossible to avoid the
      latency overhead of switching to atomic mode.
      
      This patch adds @flags to percpu_ref_init() and implements the
      following flags.
      
      * PERCPU_REF_INIT_ATOMIC	: start ref in atomic mode
      * PERCPU_REF_INIT_DEAD		: start ref killed and drained
      
      These flags should be able to serve the above two use cases.
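      
      For instance, a ref backing a persistent switch that should start in
      the off state could be set up as follows (a sketch; the release
      callback my_release is hypothetical):
      
        ret = percpu_ref_init(&ref, my_release,
                              PERCPU_REF_INIT_ATOMIC | PERCPU_REF_INIT_DEAD,
                              GFP_KERNEL);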
      
      v2: target_core_tpg.c conversion was missing.  Fixed.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      2aad2a86
    • percpu_ref: decouple switching to percpu mode and reinit · f47ad457
      Committed by Tejun Heo
      percpu_ref has treated the dropping of the base reference and
      switching to atomic mode as an integral operation; however, there's
      nothing inherent tying the two together.
      
      The use cases for percpu_ref have been expanding continuously.  While
      the current init/kill/reinit/exit model can cover a lot, the coupling
      of kill/reinit with atomic/percpu mode switching is turning out to be
      too restrictive for use cases where many percpu_refs are created and
      destroyed back-to-back with only some of them reaching extended
      operation.  The coupling also makes implementing always-atomic debug
      mode difficult.
      
      This patch separates out percpu mode switching into
      percpu_ref_switch_to_percpu() and reimplements percpu_ref_reinit() on
      top of it.
      
      * DEAD still requires ATOMIC.  A dead ref can't be switched to percpu
        mode w/o going through reinit.
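      
      With that, percpu_ref_reinit() reduces to roughly the following (a
      simplified sketch of the reimplementation):
      
        void percpu_ref_reinit(struct percpu_ref *ref)
        {
                WARN_ON_ONCE(!percpu_ref_is_zero(ref));
      
                ref->percpu_count_ptr &= ~__PERCPU_REF_DEAD;
                percpu_ref_get(ref);    /* restore the base reference */
                percpu_ref_switch_to_percpu(ref);
        }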
      
      v2: __percpu_ref_switch_to_percpu() was missing static.  Fixed.
          Reported by Fengguang aka kbuild test robot.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: kbuild test robot <fengguang.wu@intel.com>
      f47ad457
    • percpu_ref: decouple switching to atomic mode and killing · 490c79a6
      Committed by Tejun Heo
      percpu_ref has treated the dropping of the base reference and
      switching to atomic mode as an integral operation; however, there's
      nothing inherent tying the two together.
      
      The use cases for percpu_ref have been expanding continuously.  While
      the current init/kill/reinit/exit model can cover a lot, the coupling
      of kill/reinit with atomic/percpu mode switching is turning out to be
      too restrictive for use cases where many percpu_refs are created and
      destroyed back-to-back with only some of them reaching extended
      operation.  The coupling also makes implementing always-atomic debug
      mode difficult.
      
      This patch separates out atomic mode switching into
      percpu_ref_switch_to_atomic() and reimplements
      percpu_ref_kill_and_confirm() on top of it.
      
      * The handling of __PERCPU_REF_ATOMIC and __PERCPU_REF_DEAD is now
        differentiated.  Among get/put operations, percpu_ref_tryget_live()
        is the only one which cares about DEAD.
      
      * percpu_ref_switch_to_atomic() can be called multiple times on the
        same ref.  This means that multiple @confirm_switch callbacks may
        get queued up, which we can't handle reliably without an extra
        memory area.  This is handled by making the later invocation
        synchronously wait for the completion of the previous one.  This
        isn't particularly desirable, but such synchronous waits shouldn't
        happen in most cases.
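      
      percpu_ref_kill_and_confirm() then becomes roughly (a simplified
      sketch):
      
        void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
                                         percpu_ref_func_t *confirm_kill)
        {
                WARN_ON_ONCE(ref->percpu_count_ptr & __PERCPU_REF_DEAD);
      
                ref->percpu_count_ptr |= __PERCPU_REF_DEAD;
                __percpu_ref_switch_to_atomic(ref, confirm_kill);
                percpu_ref_put(ref);    /* drop the base reference */
        }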
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      490c79a6
    • percpu_ref: add PCPU_REF_DEAD · 27344a90
      Committed by Tejun Heo
      percpu_ref will be restructured so that percpu/atomic mode switching
      and reference killing are decoupled.  In preparation, add
      PCPU_REF_DEAD and PCPU_REF_ATOMIC_DEAD, which is the OR of ATOMIC and
      DEAD.  For now, ATOMIC and DEAD are changed together and all
      PCPU_REF_ATOMIC uses are converted to PCPU_REF_ATOMIC_DEAD without
      causing any behavior changes.
      
      percpu_ref_init() now specifies an explicit alignment when allocating
      the percpu counters so that the pointer has enough unused low bits to
      accommodate the flags.  Note that one flag was fine, as the minimum
      alignment for percpu memory is 2 bytes, but two flags are already too
      many for the natural alignment of unsigned longs on archs like cris
      and m68k.
      
      v2: The original patch had BUILD_BUG_ON() which triggers if unsigned
          long's alignment isn't enough to accommodate the flags, which
          triggered on cris and m68k.  percpu_ref_init() updated to specify
          the required alignment explicitly.  Reported by Fengguang.
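      
      The allocation then looks roughly like this (a sketch; the flag-bit
      constant stands for the two flag bits described above):
      
        align = max_t(size_t, 1 << __PERCPU_REF_FLAG_BITS,
                      __alignof__(unsigned long));
        ref->percpu_count_ptr = (unsigned long)
                __alloc_percpu_gfp(sizeof(unsigned long), align, gfp);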
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Kent Overstreet <kmo@daterainc.com>
      Cc: kbuild test robot <fengguang.wu@intel.com>
      27344a90
    • percpu_ref: rename things to prepare for decoupling percpu/atomic mode switch · 9e804d1f
      Committed by Tejun Heo
      percpu_ref will be restructured so that percpu/atomic mode switching
      and reference killing are decoupled.  In preparation, do the following
      renames.
      
      * percpu_ref->confirm_kill	-> percpu_ref->confirm_switch
      * __PERCPU_REF_DEAD		-> __PERCPU_REF_ATOMIC
      * __percpu_ref_alive()		-> __ref_is_percpu()
      
      This patch is pure rename and doesn't introduce any functional
      changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Kent Overstreet <kmo@daterainc.com>
      9e804d1f
    • percpu_ref: replace pcpu_ prefix with percpu_ · eecc16ba
      Committed by Tejun Heo
      percpu_ref uses the pcpu_ prefix for internal stuff and percpu_ for
      externally visible ones.  This is the same convention used in the
      percpu allocator implementation.  It works fine there, but percpu_ref
      doesn't have much internal-only stuff, and the scattered uses of the
      pcpu_ prefix are more confusing than helpful.
      
      This patch replaces all pcpu_ prefixes with percpu_.  This is pure
      rename and there's no functional change.  Note that PCPU_REF_DEAD is
      renamed to __PERCPU_REF_DEAD to signify that the flag is internal.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Kent Overstreet <kmo@daterainc.com>
      eecc16ba
    • percpu_ref: minor code and comment updates · 6251f997
      Committed by Tejun Heo
      * Some comments became stale.  Updated.
      * percpu_ref_tryget() unnecessarily initializes @ret.  Removed.
      * A blank line removed from percpu_ref_kill_rcu().
      * Explicit function name in a WARN format string replaced with __func__.
      * WARN_ON() in percpu_ref_reinit() converted to WARN_ON_ONCE().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Kent Overstreet <kmo@daterainc.com>
      6251f997
    • percpu_ref: relocate percpu_ref_reinit() · a2237370
      Committed by Tejun Heo
      percpu_ref is gonna go through restructuring.  Move
      percpu_ref_reinit() after percpu_ref_kill_and_confirm().  This will
      make later changes easier to follow and result in cleaner
      organization.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Kent Overstreet <kmo@daterainc.com>
      a2237370
    • Revert "blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe" · 9eca8046
      Committed by Tejun Heo
      This reverts commit 0a30288d, which
      was a temporary fix for SCSI blk-mq stall issue.  The following
      patches will fix the issue properly by introducing atomic mode to
      percpu_ref.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@lst.de>
      9eca8046
  2. 24 September 2014 (1 commit)
    • blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe · 0a30288d
      Committed by Tejun Heo
      blk-mq uses percpu_ref for its usage counter, which tracks the number
      of in-flight commands and is used to synchronously drain the queue on
      freeze.  percpu_ref shutdown takes measurable wallclock time as it
      involves a sched RCU grace period.  This means that draining a blk-mq
      queue takes measurable wallclock time.  One would think that this
      shouldn't matter as queue shutdown should be a rare event which takes
      place asynchronously w.r.t. userland.
      
      Unfortunately, SCSI probing involves synchronously setting up and then
      tearing down a lot of request_queues back-to-back for non-existent
      LUNs.  This means that SCSI probing may take more than ten seconds
      when scsi-mq is used.
      
      This will be properly fixed by implementing a mechanism to keep
      q->mq_usage_counter in atomic mode till genhd registration; however,
      that involves rather big updates to percpu_ref which is difficult to
      apply late in the devel cycle (v3.17-rc6 at the moment).  As a
      stop-gap measure till the proper fix can be implemented in the next
      cycle, this patch introduces __percpu_ref_kill_expedited() and makes
      blk_mq_freeze_queue() use it.  This is heavy-handed but should work
      for testing the experimental SCSI blk-mq implementation.
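      
      A sketch of the resulting freeze path (simplified; the freeze-depth
      bookkeeping is elided):
      
        void blk_mq_freeze_queue(struct request_queue *q)
        {
                __percpu_ref_kill_expedited(&q->mq_usage_counter);
                blk_mq_run_queues(q, false);
                wait_event(q->mq_freeze_wq,
                           percpu_ref_is_zero(&q->mq_usage_counter));
        }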
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Christoph Hellwig <hch@infradead.org>
      Link: http://lkml.kernel.org/g/20140919113815.GA10791@lst.de
      Fixes: add703fd ("blk-mq: use percpu_ref for mq usage count")
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Tested-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      0a30288d
  3. 20 September 2014 (1 commit)
    • percpu-refcount: make percpu_ref based on longs instead of ints · e625305b
      Committed by Tejun Heo
      percpu_ref is currently based on ints and the number of refs it can
      cover is (1 << 31).  This makes it impossible to use a percpu_ref to
      count memory objects or pages on 64bit machines as it may overflow.
      This forces those users to somehow aggregate the references before
      contributing to the percpu_ref, which is often cumbersome and makes
      it challenging to reach the same level of performance as using the
      percpu_ref directly.
      
      While using ints for the percpu counters makes them pack tighter on
      64bit machines, the possible gain from using ints instead of longs is
      extremely small compared to the overall gain from per-cpu operation.
      This patch makes percpu_ref based on longs so that it can be used to
      directly count memory objects or pages.
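      
      The core of the change is widening the counter and its bias (a
      sketch):
      
        /* was: atomic_t count; */
        atomic_long_t count;
      
        /* the bias likewise widens from (1U << 31) to: */
        #define PCPU_COUNT_BIAS        (1LU << (BITS_PER_LONG - 1))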
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      e625305b
  4. 19 September 2014 (1 commit)
  5. 17 September 2014 (1 commit)
  6. 15 September 2014 (1 commit)
    • vfs: avoid non-forwarding large load after small store in path lookup · 9226b5b4
      Committed by Linus Torvalds
      The performance regression that Josef Bacik reported in the pathname
      lookup (see commit 99d263d4 "vfs: fix bad hashing of dentries") made
      me look at performance stability of the dcache code, just to verify that
      the problem was actually fixed.  That turned up a few other problems in
      this area.
      
      There are a few cases where we exit RCU lookup mode and go to the slow
      serializing case when we shouldn't, Al has fixed those and they'll come
      in with the next VFS pull.
      
      But my performance verification also shows that link_path_walk() turns
      out to have a very unfortunate 32-bit store of the length and hash of
      the name we look up, followed by a 64-bit read of the combined hash_len
      field.  That screws up the processor store-to-load forwarding, causing
      an unnecessary hiccup in this critical routine.
      
      It's caused by the ugly calling convention for the "hash_name()"
      function, and easily fixed by just making hash_name() fill in the whole
      'struct qstr' rather than passing it a pointer to just the hash value.
      
      With that, the profile for this function looks much smoother.
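      
      The problematic pattern and the fix, in sketch form:
      
        /* before: two 32-bit stores immediately followed by a 64-bit
         * load of the combined field - the stores can't be forwarded */
        this.hash = full_name_hash(name, len);
        this.len = len;
        u64 hl = this.hash_len;              /* 64-bit load: stalls */
      
        /* after: hash_name() fills in the whole 'struct qstr', so the
         * combined value is written with a single 64-bit store */
        this.hash_len = hash_name(name);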
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9226b5b4
  7. 14 September 2014 (1 commit)
    • Make hash_64() use a 64-bit multiply when appropriate · 23d0db76
      Committed by Linus Torvalds
      The hash_64() function historically does the multiply by the
      GOLDEN_RATIO_PRIME_64 number with explicit shifts and adds, because
      unlike the 32-bit case, gcc seems unable to turn the constant multiply
      into the more appropriate shift and adds when required.
      
      However, that means that we generate those shifts and adds even when the
      architecture has a fast multiplier, and could just do it better in
      hardware.
      
      Use the now-cleaned-up CONFIG_ARCH_HAS_FAST_MULTIPLIER (together with
      "is it a 64-bit architecture") to decide whether to use an integer
      multiply or the explicit sequence of shift/add instructions.
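      
      The resulting logic, roughly (a sketch following the description
      above):
      
        static __always_inline u64 hash_64(u64 val, unsigned int bits)
        {
                u64 hash = val;
      
        #if defined(CONFIG_ARCH_HAS_FAST_MULTIPLIER) && BITS_PER_LONG == 64
                hash = hash * GOLDEN_RATIO_PRIME_64;
        #else
                /* gcc can't turn the 64-bit constant multiply into
                   shifts and adds on its own, so spell them out */
                u64 n = hash;
                n <<= 18; hash -= n;
                n <<= 33; hash -= n;
                n <<= 3;  hash += n;
                n <<= 3;  hash -= n;
                n <<= 4;  hash += n;
                n <<= 2;  hash += n;
        #endif
      
                /* the high bits are the most thoroughly mixed */
                return hash >> (64 - bits);
        }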
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      23d0db76
  8. 13 September 2014 (2 commits)
    • jiffies: Fix timeval conversion to jiffies · d78c9300
      Committed by Andrew Hunter
      timeval_to_jiffies tried to round a timeval up to an integral number
      of jiffies, but the logic for doing so was incorrect: intervals
      corresponding to exactly N jiffies would become N+1.  This manifested
      itself particularly when repeatedly stopping and starting an itimer:
      
      setitimer(ITIMER_PROF, &val, NULL);
      setitimer(ITIMER_PROF, NULL, &val);
      
      would add a full tick to val, _even if it was exactly representable in
      terms of jiffies_ (say, the result of a previous rounding.)  Doing
      this repeatedly would cause unbounded growth in val.  So fix the math.
      
      Here's what was wrong with the conversion: we essentially computed
      (eliding seconds)
      
      jiffies = usec  * (NSEC_PER_USEC/TICK_NSEC)
      
      by using scaling arithmetic: we took the best approximation of
      NSEC_PER_USEC/TICK_NSEC with denominator 2^USEC_JIFFIE_SC, call it
      x/(2^USEC_JIFFIE_SC), and computed:
      
      jiffies = (usec * x) >> USEC_JIFFIE_SC
      
      and rounded this calculation up in the intermediate form (since we
      can't necessarily exactly represent TICK_NSEC in usec.) But the
      scaling arithmetic is a (very slight) *over*approximation of the true
      value; that is, instead of dividing by (1 usec/ 1 jiffie), we
      effectively divided by (1 usec/1 jiffie)-epsilon (rounding
      down). This would normally be fine, but we want to round timeouts up,
      and we did so by adding 2^USEC_JIFFIE_SC - 1 before the shift; this
      would be fine if our division was exact, but dividing this by the
      slightly smaller factor was equivalent to adding just _over_ 1 to the
      final result (instead of just _under_ 1, as desired.)
      
      In particular, with HZ=1000, we consistently computed that 10000 usec
      was 11 jiffies; the same was true for any exact multiple of
      TICK_NSEC.
      
      We could possibly still round in the intermediate form, adding
      something less than 2^USEC_JIFFIE_SC - 1, but easier still is to
      convert usec->nsec, round in nanoseconds, and then convert using
      time*spec*_to_jiffies.  This adds one constant multiplication, and is
      not observably slower in microbenchmarks on recent x86 hardware.
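      
      Conceptually, the fixed conversion is (a sketch, eliding the seconds
      part and overflow handling):
      
        u64 nsec = (u64)usec * NSEC_PER_USEC;
        /* round up in exact nanoseconds, then divide by the exact tick
           length - an interval of exactly N ticks stays N ticks */
        jiffies = div_u64(nsec + TICK_NSEC - 1, TICK_NSEC);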
      
      Tested: the following program:

      #include <stdio.h>
      #include <stddef.h>
      #include <sys/time.h>

      int main() {
        struct itimerval zero = {{0, 0}, {0, 0}};
        /* Initially set to 10 ms. */
        struct itimerval initial = zero;
        initial.it_interval.tv_usec = 10000;
        setitimer(ITIMER_PROF, &initial, NULL);
        /* Save and restore several times. */
        for (size_t i = 0; i < 10; ++i) {
          struct itimerval prev;
          setitimer(ITIMER_PROF, &zero, &prev);
          /* on old kernels, this goes up by TICK_USEC every iteration */
          printf("previous value: %ld %ld %ld %ld\n",
                 prev.it_interval.tv_sec, prev.it_interval.tv_usec,
                 prev.it_value.tv_sec, prev.it_value.tv_usec);
          setitimer(ITIMER_PROF, &prev, NULL);
        }
        return 0;
      }
      
      Cc: stable@vger.kernel.org
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Reviewed-by: Paul Turner <pjt@google.com>
      Reported-by: Aaron Jacobs <jacobsa@google.com>
      Signed-off-by: Andrew Hunter <ahh@google.com>
      [jstultz: Tweaked to apply to 3.17-rc]
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      d78c9300
    • workqueue: apply __WQ_ORDERED to create_singlethread_workqueue() · e09c2c29
      Committed by Tejun Heo
      create_singlethread_workqueue() is a compat interface for single
      threaded workqueues, which map to ordered workqueues with a rescuer in
      the current implementation.  create_singlethread_workqueue() is
      currently implemented by invoking alloc_workqueue() with the
      appropriate parameters.
      
      8719dcea ("workqueue: reject adjusting max_active or applying
      attrs to ordered workqueues") introduced __WQ_ORDERED to protect
      ordered workqueues against dynamic attribute changes which can break
      ordering guarantees but forgot to apply it to
      create_singlethread_workqueue().  This in itself is okay as nobody
      currently uses dynamic attribute change on workqueues created with
      create_singlethread_workqueue().
      
      However, 4c16bd32 ("workqueue: implement NUMA affinity for unbound
      workqueues") broke singlethreaded guarantee for ordered workqueues
      through allocating a separate pool_workqueue on each NUMA node by
      default.  A later change 8a2b7538 ("workqueue: fix ordered
      workqueues in NUMA setups") fixed it by allocating only one global
      pool_workqueue if __WQ_ORDERED is set.
      
      Combined, the __WQ_ORDERED omission in create_singlethread_workqueue()
      became critical, breaking its single-threadedness and ordering
      guarantee.
      
      Let's make create_singlethread_workqueue() wrap
      alloc_ordered_workqueue() instead so that it inherits __WQ_ORDERED and
      can implicitly track future ordered_workqueue changes.
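      
      The wrapper then becomes roughly:
      
        #define create_singlethread_workqueue(name)                     \
                alloc_ordered_workqueue("%s", WQ_MEM_RECLAIM, name)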
      
      v2: I missed that __WQ_ORDERED now protects against pwq splitting
          across NUMA nodes and incorrectly described the patch as a
          nice-to-have fix to protect against future dynamic attribute
          usages.  Oleg pointed out that this is actually a critical
          breakage due to 8a2b7538 ("workqueue: fix ordered workqueues
          in NUMA setups").
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Mike Anderson <mike.anderson@us.ibm.com>
      Cc: Oleg Nesterov <onestero@redhat.com>
      Cc: Gustavo Luiz Duarte <gduarte@redhat.com>
      Cc: Tomas Henzl <thenzl@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: 4c16bd32 ("workqueue: implement NUMA affinity for unbound workqueues")
      e09c2c29
  9. 11 September 2014 (2 commits)
    • net/mlx4: Set vlan stripping policy by the right command · 09e05c3f
      Committed by Matan Barak
      Changing the vlan stripping policy of the QP isn't supported by older
      firmware versions for the INIT2RTR command. Nevertheless, we've used it.
      
      Fix that by doing this policy change using INIT2RTR only if the
      firmware supports it; otherwise, we call the UPDATE_QP command to do
      the task.
      
      Fixes: 7677fc96 ('net/mlx4: Strengthen VLAN tags/priorities enforcement in VST mode')
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      09e05c3f
    • PCI: Add pci_ignore_hotplug() to ignore hotplug events for a device · b440bde7
      Committed by Bjorn Helgaas
      Powering off a hot-pluggable device, e.g., with pci_set_power_state(D3cold),
      normally generates a hot-remove event that unbinds the driver.
      
      Some drivers expect to remain bound to a device even while they power it
      off and back on again.  This can be dangerous, because if the device is
      removed or replaced while it is powered off, the driver doesn't know that
      anything changed.  But some drivers accept that risk.
      
      Add pci_ignore_hotplug() for use by drivers that know their device cannot
      be removed.  Using pci_ignore_hotplug() tells the PCI core that hot-plug
      events for the device should be ignored.
      
      The radeon and nouveau drivers use this to switch between a low-power,
      integrated GPU and a higher-power, higher-performance discrete GPU.  They
      power off the unused GPU, but they want to remain bound to it.
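      
      Usage from a driver's power-down path looks like this (a sketch):
      
        /* tell the PCI core not to treat the power-off as a removal */
        pci_ignore_hotplug(pdev);
        pci_set_power_state(pdev, PCI_D3cold);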
      
      This is a reimplementation of f244d8b6 ("ACPIPHP / radeon / nouveau:
      Fix VGA switcheroo problem related to hotplug") but extends it to work with
      both acpiphp and pciehp.
      
      This fixes a problem where systems with dual GPUs using the radeon
      drivers become unusable, freezing every few seconds (see the bugzillas
      below).  The radeon driver runs into problems while suspending the
      device, as in bug 79701:
      
          [drm] radeon: finishing device.
          radeon 0000:01:00.0: Userspace still has active objects !
          radeon 0000:01:00.0: ffff8800cb4ec288 ffff8800cb4ec000 16384 4294967297 force free
          ...
          WARNING: CPU: 0 PID: 67 at /home/apw/COD/linux/drivers/gpu/drm/radeon/radeon_gart.c:234 radeon_gart_unbind+0xd2/0xe0 [radeon]()
          trying to unbind memory from uninitialized GART !
      
      or while resuming it, as in bug 77261:
      
          radeon 0000:01:00.0: ring 0 stalled for more than 10158msec
          radeon 0000:01:00.0: GPU lockup ...
          radeon 0000:01:00.0: GPU pci config reset
          pciehp 0000:00:01.0:pcie04: Card not present on Slot(1-1)
          radeon 0000:01:00.0: GPU reset succeeded, trying to resume
          *ERROR* radeon: dpm resume failed
          radeon 0000:01:00.0: Wait for MC idle timedout !
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=77261
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=79701
      Reported-by: Shawn Starr <shawn.starr@rogers.com>
      Reported-by: Jose P. <lbdkmjdf@sharklasers.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Acked-by: Alex Deucher <alexander.deucher@amd.com>
      Acked-by: Rajat Jain <rajatxjain@gmail.com>
      Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Dave Airlie <airlied@redhat.com>
      CC: stable@vger.kernel.org	# v3.15+
      b440bde7
  10. 08 September 2014 (3 commits)
    • percpu-refcount: add @gfp to percpu_ref_init() · a34375ef
      Committed by Tejun Heo
      Percpu allocator now supports allocation mask.  Add @gfp to
      percpu_ref_init() so that !GFP_KERNEL allocation masks can be used
      with percpu_refs too.
      
      This patch doesn't make any functional difference.
      
      v2: blk-mq conversion was missing.  Updated.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      a34375ef
    • proportions: add @gfp to init functions · 20ae0079
      Committed by Tejun Heo
      Percpu allocator now supports allocation mask.  Add @gfp to
      [flex_]proportions init functions so that !GFP_KERNEL allocation masks
      can be used with them too.
      
      This patch doesn't make any functional difference.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      20ae0079
    • percpu_counter: add @gfp to percpu_counter_init() · 908c7f19
      Committed by Tejun Heo
      Percpu allocator now supports allocation mask.  Add @gfp to
      percpu_counter_init() so that !GFP_KERNEL allocation masks can be used
      with percpu_counters too.
      
      We could have left percpu_counter_init() alone and added
      percpu_counter_init_gfp(); however, the number of users isn't that
      high and introducing _gfp variants to all percpu data structures would
      be quite ugly, so let's just do the conversion.  This is the one with
      the most users.  Other percpu data structures are a lot easier to
      convert.
      
      This patch doesn't make any functional difference.
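      
      After the conversion, callers pass the allocation mask explicitly (a
      sketch; the counter name is illustrative):
      
        struct percpu_counter nr_things;
      
        ret = percpu_counter_init(&nr_things, 0, GFP_KERNEL);
        if (ret)
                return ret;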
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Jan Kara <jack@suse.cz>
      Acked-by: "David S. Miller" <davem@davemloft.net>
      Cc: x86@kernel.org
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      908c7f19
  11. 06 September 2014 (1 commit)
  12. 05 September 2014 (1 commit)
    • nohz: Restore NMI safe local irq work for local nohz kick · 40bea039
      Committed by Frederic Weisbecker
      The local nohz kick is currently used by perf, which needs it to be
      NMI-safe.  A recent commit, though (7d1311b9), changed its
      implementation to fire the local kick using the remote kick API.  It
      was convenient to make the code more generic, but the remote kick
      isn't NMI-safe.
      
      As a result:
      
      	WARNING: CPU: 3 PID: 18062 at kernel/irq_work.c:72 irq_work_queue_on+0x11e/0x140()
      	CPU: 3 PID: 18062 Comm: trinity-subchil Not tainted 3.16.0+ #34
      	0000000000000009 00000000903774d1 ffff880244e06c00 ffffffff9a7f1e37
      	0000000000000000 ffff880244e06c38 ffffffff9a0791dd ffff880244fce180
      	0000000000000003 ffff880244e06d58 ffff880244e06ef8 0000000000000000
      	Call Trace:
      	<NMI>  [<ffffffff9a7f1e37>] dump_stack+0x4e/0x7a
      	[<ffffffff9a0791dd>] warn_slowpath_common+0x7d/0xa0
      	[<ffffffff9a07930a>] warn_slowpath_null+0x1a/0x20
      	[<ffffffff9a17ca1e>] irq_work_queue_on+0x11e/0x140
      	[<ffffffff9a10a2c7>] tick_nohz_full_kick_cpu+0x57/0x90
      	[<ffffffff9a186cd5>] __perf_event_overflow+0x275/0x350
      	[<ffffffff9a184f80>] ? perf_event_task_disable+0xa0/0xa0
      	[<ffffffff9a01a4cf>] ? x86_perf_event_set_period+0xbf/0x150
      	[<ffffffff9a187934>] perf_event_overflow+0x14/0x20
      	[<ffffffff9a020386>] intel_pmu_handle_irq+0x206/0x410
      	[<ffffffff9a0b54d3>] ? arch_vtime_task_switch+0x63/0x130
      	[<ffffffff9a01937b>] perf_event_nmi_handler+0x2b/0x50
      	[<ffffffff9a007b72>] nmi_handle+0xd2/0x390
      	[<ffffffff9a007aa5>] ? nmi_handle+0x5/0x390
      	[<ffffffff9a0d131b>] ? lock_release+0xab/0x330
      	[<ffffffff9a008062>] default_do_nmi+0x72/0x1c0
      	[<ffffffff9a0c925f>] ? cpuacct_account_field+0xcf/0x200
      	[<ffffffff9a008268>] do_nmi+0xb8/0x100
      
      Let's fix this by restoring the use of local irq work for the nohz
      local kick.
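      
      The restored local kick is essentially (a sketch; the per-cpu
      accessor is spelled with this_cpu_ptr() here):
      
        void tick_nohz_full_kick(void)
        {
                if (!tick_nohz_full_cpu(smp_processor_id()))
                        return;
      
                /* irq_work_queue() on the local CPU is NMI-safe,
                   unlike irq_work_queue_on() */
                irq_work_queue(this_cpu_ptr(&nohz_full_kick_work));
        }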
      Reported-by: Catalin Iacob <iacobcatalin@gmail.com>
      Reported-and-tested-by: Dave Jones <davej@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      40bea039
  13. 03 September 2014 (4 commits)
    • percpu: implement asynchronous chunk population · 1a4d7607
      Committed by Tejun Heo
      The percpu allocator now supports atomic allocations by only
      allocating from already populated areas, but the mechanism to ensure
      that there's an adequate amount of populated areas was missing.
      
      This patch expands pcpu_balance_work so that in addition to freeing
      excess free chunks it also populates chunks to maintain an adequate
      level of populated areas.  pcpu_alloc() schedules pcpu_balance_work if
      the amount of free populated areas is too low or after an atomic
      allocation failure.
      
      * PERCPU_DYNAMIC_RESERVE is increased by two pages to account for
        PCPU_EMPTY_POP_PAGES_LOW.
      
      * pcpu_async_enabled is added to gate both async jobs -
        chunk->map_extend_work and pcpu_balance_work - so that we don't end
        up scheduling them while the needed subsystems aren't up yet.
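      
      The trigger is conceptually the following (a sketch; the
      populated-page counter name is an assumption, the other names are
      from the description above):
      
        /* after an allocation, or on an atomic allocation failure */
        if (pcpu_async_enabled &&
            pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
                schedule_work(&pcpu_balance_work);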
      Signed-off-by: Tejun Heo <tj@kernel.org>
      1a4d7607
    • percpu: implement [__]alloc_percpu_gfp() · 5835d96e
      Committed by Tejun Heo
      Now that pcpu_alloc_area() can allocate only from populated areas,
      it's easy to add atomic allocation support to [__]alloc_percpu().
      Update pcpu_alloc() so that it accepts @gfp and skips all the blocking
      operations and allocates only from the populated areas if @gfp doesn't
      contain GFP_KERNEL.  New interface functions [__]alloc_percpu_gfp()
      are added.
      
      While this means that atomic allocations are possible, this isn't
      complete yet as there's no mechanism to ensure that a certain amount
      of populated areas is kept available, and atomic allocations may keep
      failing under certain conditions.
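      
      For example, an atomic-context caller can now do (a sketch; struct
      my_stats is hypothetical):
      
        struct my_stats __percpu *stats;
      
        /* no GFP_KERNEL: allocate only from already populated areas */
        stats = alloc_percpu_gfp(struct my_stats, GFP_NOWAIT);
        if (!stats)
                return -ENOMEM;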
      Signed-off-by: Tejun Heo <tj@kernel.org>
      5835d96e
    • Revert "leds: convert blink timer to workqueue" · 9067359f
      Committed by Jiri Kosina
      This reverts commit 8b37e1be.
      
      It's broken as it changes led_blink_set() in a way that it can now sleep
      (while synchronously waiting for workqueue to be cancelled). That's a
      problem, because it's possible that this function gets called from atomic
      context (tpt_trig_timer() takes a readlock and thus disables preemption).
      
      This was brought up three weeks ago already [1], but no proper fix has
      materialized, and I keep seeing the problem since 3.17-rc1.
      
      [1] https://lkml.org/lkml/2014/8/16/128
      
       BUG: sleeping function called from invalid context at kernel/workqueue.c:2650
       in_atomic(): 1, irqs_disabled(): 0, pid: 2335, name: wpa_supplicant
       5 locks held by wpa_supplicant/2335:
        #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff814c7c92>] rtnl_lock+0x12/0x20
        #1:  (&wdev->mtx){+.+.+.}, at: [<ffffffffc06e649c>] cfg80211_mgd_wext_siwessid+0x5c/0x180 [cfg80211]
        #2:  (&local->mtx){+.+.+.}, at: [<ffffffffc0817dea>] ieee80211_prep_connection+0x17a/0x9a0 [mac80211]
        #3:  (&local->chanctx_mtx){+.+.+.}, at: [<ffffffffc08081ed>] ieee80211_vif_use_channel+0x5d/0x2a0 [mac80211]
        #4:  (&trig->leddev_list_lock){.+.+..}, at: [<ffffffffc081e68c>] tpt_trig_timer+0xec/0x170 [mac80211]
       CPU: 0 PID: 2335 Comm: wpa_supplicant Not tainted 3.17.0-rc3 #1
       Hardware name: LENOVO 7470BN2/7470BN2, BIOS 6DET38WW (2.02 ) 12/19/2008
        ffff8800360b5a50 ffff8800751f76d8 ffffffff8159e97f ffff8800360b5a30
        ffff8800751f76e8 ffffffff810739a5 ffff8800751f77b0 ffffffff8106862f
        ffffffff810685d0 0aa2209200000000 ffff880000000004 ffff8800361c59d0
       Call Trace:
        [<ffffffff8159e97f>] dump_stack+0x4d/0x66
        [<ffffffff810739a5>] __might_sleep+0xe5/0x120
        [<ffffffff8106862f>] flush_work+0x5f/0x270
        [<ffffffff810685d0>] ? mod_delayed_work_on+0x80/0x80
        [<ffffffff810945ca>] ? mark_held_locks+0x6a/0x90
        [<ffffffff81068a5f>] ? __cancel_work_timer+0x6f/0x100
        [<ffffffff810946ed>] ? trace_hardirqs_on_caller+0xfd/0x1c0
        [<ffffffff81068a6b>] __cancel_work_timer+0x7b/0x100
        [<ffffffff81068b0e>] cancel_delayed_work_sync+0xe/0x10
        [<ffffffff8147cf3b>] led_blink_set+0x1b/0x40
        [<ffffffffc081e6b0>] tpt_trig_timer+0x110/0x170 [mac80211]
        [<ffffffffc081ecdd>] ieee80211_mod_tpt_led_trig+0x9d/0x160 [mac80211]
        [<ffffffffc07e4278>] __ieee80211_recalc_idle+0x98/0x140 [mac80211]
        [<ffffffffc07e59ce>] ieee80211_idle_off+0xe/0x10 [mac80211]
        [<ffffffffc0804e5b>] ieee80211_add_chanctx+0x3b/0x220 [mac80211]
        [<ffffffffc08062e4>] ieee80211_new_chanctx+0x44/0xf0 [mac80211]
        [<ffffffffc080838a>] ieee80211_vif_use_channel+0x1fa/0x2a0 [mac80211]
        [<ffffffffc0817df8>] ieee80211_prep_connection+0x188/0x9a0 [mac80211]
        [<ffffffffc081c246>] ieee80211_mgd_auth+0x256/0x2e0 [mac80211]
        [<ffffffffc07eab33>] ieee80211_auth+0x13/0x20 [mac80211]
        [<ffffffffc06cb006>] cfg80211_mlme_auth+0x106/0x270 [cfg80211]
        [<ffffffffc06ce085>] cfg80211_conn_do_work+0x155/0x3b0 [cfg80211]
        [<ffffffffc06cf670>] cfg80211_connect+0x3f0/0x540 [cfg80211]
        [<ffffffffc06e6148>] cfg80211_mgd_wext_connect+0x158/0x1f0 [cfg80211]
        [<ffffffffc06e651e>] cfg80211_mgd_wext_siwessid+0xde/0x180 [cfg80211]
        [<ffffffffc06e36c0>] ? cfg80211_wext_giwessid+0x50/0x50 [cfg80211]
        [<ffffffffc06e36dd>] cfg80211_wext_siwessid+0x1d/0x40 [cfg80211]
        [<ffffffff81584d0c>] ioctl_standard_iw_point+0x14c/0x3e0
        [<ffffffff810946ed>] ? trace_hardirqs_on_caller+0xfd/0x1c0
        [<ffffffff8158502a>] ioctl_standard_call+0x8a/0xd0
        [<ffffffff81584fa0>] ? ioctl_standard_iw_point+0x3e0/0x3e0
        [<ffffffff81584b76>] wireless_process_ioctl.constprop.10+0xb6/0x100
        [<ffffffff8158521d>] wext_handle_ioctl+0x5d/0xb0
        [<ffffffff814cfb29>] dev_ioctl+0x329/0x620
        [<ffffffff810946ed>] ? trace_hardirqs_on_caller+0xfd/0x1c0
        [<ffffffff8149c7f2>] sock_ioctl+0x142/0x2e0
        [<ffffffff811b0140>] do_vfs_ioctl+0x300/0x520
        [<ffffffff815a67fb>] ? sysret_check+0x1b/0x56
        [<ffffffff810946ed>] ? trace_hardirqs_on_caller+0xfd/0x1c0
        [<ffffffff811b03e1>] SyS_ioctl+0x81/0xa0
        [<ffffffff815a67d6>] system_call_fastpath+0x1a/0x1f
       wlan0: send auth to 00:0b:6b:3c:8c:e4 (try 1/3)
       wlan0: authenticated
       wlan0: associate with 00:0b:6b:3c:8c:e4 (try 1/3)
       wlan0: RX AssocResp from 00:0b:6b:3c:8c:e4 (capab=0x431 status=0 aid=2)
       wlan0: associated
       IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
       cfg80211: Calling CRDA for country: NA
       wlan0: Limiting TX power to 27 (27 - 0) dBm as advertised by 00:0b:6b:3c:8c:e4
      
       =================================
       [ INFO: inconsistent lock state ]
       3.17.0-rc3 #1 Not tainted
       ---------------------------------
       inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
       swapper/0/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
        ((&(&led_cdev->blink_work)->work)){+.?...}, at: [<ffffffff810685d0>] flush_work+0x0/0x270
       {SOFTIRQ-ON-W} state was registered at:
         [<ffffffff81094dbe>] __lock_acquire+0x30e/0x1a30
         [<ffffffff81096c81>] lock_acquire+0x91/0x110
         [<ffffffff81068608>] flush_work+0x38/0x270
         [<ffffffff81068a6b>] __cancel_work_timer+0x7b/0x100
         [<ffffffff81068b0e>] cancel_delayed_work_sync+0xe/0x10
         [<ffffffff8147cf3b>] led_blink_set+0x1b/0x40
         [<ffffffffc081e6b0>] tpt_trig_timer+0x110/0x170 [mac80211]
         [<ffffffffc081ecdd>] ieee80211_mod_tpt_led_trig+0x9d/0x160 [mac80211]
         [<ffffffffc07e4278>] __ieee80211_recalc_idle+0x98/0x140 [mac80211]
         [<ffffffffc07e59ce>] ieee80211_idle_off+0xe/0x10 [mac80211]
         [<ffffffffc0804e5b>] ieee80211_add_chanctx+0x3b/0x220 [mac80211]
         [<ffffffffc08062e4>] ieee80211_new_chanctx+0x44/0xf0 [mac80211]
         [<ffffffffc080838a>] ieee80211_vif_use_channel+0x1fa/0x2a0 [mac80211]
         [<ffffffffc0817df8>] ieee80211_prep_connection+0x188/0x9a0 [mac80211]
         [<ffffffffc081c246>] ieee80211_mgd_auth+0x256/0x2e0 [mac80211]
         [<ffffffffc07eab33>] ieee80211_auth+0x13/0x20 [mac80211]
         [<ffffffffc06cb006>] cfg80211_mlme_auth+0x106/0x270 [cfg80211]
         [<ffffffffc06ce085>] cfg80211_conn_do_work+0x155/0x3b0 [cfg80211]
         [<ffffffffc06cf670>] cfg80211_connect+0x3f0/0x540 [cfg80211]
         [<ffffffffc06e6148>] cfg80211_mgd_wext_connect+0x158/0x1f0 [cfg80211]
         [<ffffffffc06e651e>] cfg80211_mgd_wext_siwessid+0xde/0x180 [cfg80211]
         [<ffffffffc06e36dd>] cfg80211_wext_siwessid+0x1d/0x40 [cfg80211]
         [<ffffffff81584d0c>] ioctl_standard_iw_point+0x14c/0x3e0
         [<ffffffff8158502a>] ioctl_standard_call+0x8a/0xd0
         [<ffffffff81584b76>] wireless_process_ioctl.constprop.10+0xb6/0x100
         [<ffffffff8158521d>] wext_handle_ioctl+0x5d/0xb0
         [<ffffffff814cfb29>] dev_ioctl+0x329/0x620
         [<ffffffff8149c7f2>] sock_ioctl+0x142/0x2e0
         [<ffffffff811b0140>] do_vfs_ioctl+0x300/0x520
         [<ffffffff811b03e1>] SyS_ioctl+0x81/0xa0
         [<ffffffff815a67d6>] system_call_fastpath+0x1a/0x1f
       irq event stamp: 493416
       hardirqs last  enabled at (493416): [<ffffffff81068a5f>] __cancel_work_timer+0x6f/0x100
       hardirqs last disabled at (493415): [<ffffffff81067e9f>] try_to_grab_pending+0x1f/0x160
       softirqs last  enabled at (493408): [<ffffffff81053ced>] _local_bh_enable+0x1d/0x50
       softirqs last disabled at (493409): [<ffffffff81054c75>] irq_exit+0xa5/0xb0
      
       other info that might help us debug this:
        Possible unsafe locking scenario:
      
              CPU0
              ----
         lock((&(&led_cdev->blink_work)->work));
         <Interrupt>
           lock((&(&led_cdev->blink_work)->work));
      
        *** DEADLOCK ***
      
       2 locks held by swapper/0/0:
        #0:  (((&tpt_trig->timer))){+.-...}, at: [<ffffffff810b4c50>] call_timer_fn+0x0/0x180
        #1:  (&trig->leddev_list_lock){.+.?..}, at: [<ffffffffc081e68c>] tpt_trig_timer+0xec/0x170 [mac80211]
      
       stack backtrace:
       CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.17.0-rc3 #1
       Hardware name: LENOVO 7470BN2/7470BN2, BIOS 6DET38WW (2.02 ) 12/19/2008
        ffffffff8246eb30 ffff88007c203b00 ffffffff8159e97f ffffffff81a194c0
        ffff88007c203b50 ffffffff81599c29 0000000000000001 ffffffff00000001
        ffff880000000000 0000000000000006 ffffffff81a194c0 ffffffff81093ad0
       Call Trace:
        <IRQ>  [<ffffffff8159e97f>] dump_stack+0x4d/0x66
        [<ffffffff81599c29>] print_usage_bug+0x1f4/0x205
        [<ffffffff81093ad0>] ? check_usage_backwards+0x140/0x140
        [<ffffffff810944d3>] mark_lock+0x223/0x2b0
        [<ffffffff81094d60>] __lock_acquire+0x2b0/0x1a30
        [<ffffffff81096c81>] lock_acquire+0x91/0x110
        [<ffffffff810685d0>] ? mod_delayed_work_on+0x80/0x80
        [<ffffffffc081e5a0>] ? __ieee80211_get_rx_led_name+0x10/0x10 [mac80211]
        [<ffffffff81068608>] flush_work+0x38/0x270
        [<ffffffff810685d0>] ? mod_delayed_work_on+0x80/0x80
        [<ffffffff810945ca>] ? mark_held_locks+0x6a/0x90
        [<ffffffff81068a5f>] ? __cancel_work_timer+0x6f/0x100
        [<ffffffffc081e5a0>] ? __ieee80211_get_rx_led_name+0x10/0x10 [mac80211]
        [<ffffffff8109469d>] ? trace_hardirqs_on_caller+0xad/0x1c0
        [<ffffffffc081e5a0>] ? __ieee80211_get_rx_led_name+0x10/0x10 [mac80211]
        [<ffffffff81068a6b>] __cancel_work_timer+0x7b/0x100
        [<ffffffff81068b0e>] cancel_delayed_work_sync+0xe/0x10
        [<ffffffff8147cf3b>] led_blink_set+0x1b/0x40
        [<ffffffffc081e6b0>] tpt_trig_timer+0x110/0x170 [mac80211]
        [<ffffffff810b4cc5>] call_timer_fn+0x75/0x180
        [<ffffffff810b4c50>] ? process_timeout+0x10/0x10
        [<ffffffffc081e5a0>] ? __ieee80211_get_rx_led_name+0x10/0x10 [mac80211]
        [<ffffffff810b50ac>] run_timer_softirq+0x1fc/0x2f0
        [<ffffffff81054805>] __do_softirq+0x115/0x2e0
        [<ffffffff81054c75>] irq_exit+0xa5/0xb0
        [<ffffffff810049b3>] do_IRQ+0x53/0xf0
        [<ffffffff815a74af>] common_interrupt+0x6f/0x6f
        <EOI>  [<ffffffff8147b56e>] ? cpuidle_enter_state+0x6e/0x180
        [<ffffffff8147b732>] cpuidle_enter+0x12/0x20
        [<ffffffff8108bba0>] cpu_startup_entry+0x330/0x360
        [<ffffffff8158fb51>] rest_init+0xc1/0xd0
        [<ffffffff8158fa90>] ? csum_partial_copy_generic+0x170/0x170
        [<ffffffff81af3ff2>] start_kernel+0x44f/0x45a
        [<ffffffff81af399c>] ? set_init_arg+0x53/0x53
        [<ffffffff81af35ad>] x86_64_start_reservations+0x2a/0x2c
        [<ffffffff81af36a0>] x86_64_start_kernel+0xf1/0xf4
      
      Cc: Vincent Donnefort <vdonnefort@gmail.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Bryan Wu <cooloney@gmail.com>
      9067359f
  14. 02 September 2014 (1 commit)
  15. 30 August 2014 (1 commit)
  16. 29 August 2014 (1 commit)
    • jbd2: fix descriptor block size handling errors with journal_csum · db9ee220
      Committed by Darrick J. Wong
      It turns out that there are some serious problems with the on-disk
      format of journal checksum v2.  The foremost is that the function to
      calculate descriptor tag size returns sizes that are too big.  This
      causes alignment issues on some architectures and is compounded by the
      fact that some parts of jbd2 use the structure size (incorrectly) to
      determine the presence of a 64bit journal instead of checking the
      feature flags.
      
      Therefore, introduce journal checksum v3, which enlarges the
      descriptor block tag format to allow for full 32-bit checksums of
      journal blocks, fix the journal tag function to return the correct
      sizes, and fix the jbd2 recovery code to use feature flags to
      determine 64bitness.
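      
      The enlarged v3 descriptor-block tag looks roughly like this (a
      sketch of the on-disk structure):
      
        typedef struct journal_block_tag3_s {
                __be32  t_blocknr;      /* the on-disk block number */
                __be32  t_flags;        /* tag flags */
                __be32  t_blocknr_high; /* high 32 bits of the blocknr */
                __be32  t_checksum;     /* full 32-bit checksum */
        } journal_block_tag3_t;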
      
      Add a few function helpers so we don't have to open-code quite so
      many pieces.
      
      Switching to a 16-byte block size was found to increase journal size
      overhead by a maximum of 0.1% when converting a 32-bit journal with no
      checksumming to a 32-bit journal with checksum v3 enabled.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reported-by: TR Reardon <thomas_reardon@hotmail.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      db9ee220
  17. 28 August 2014 (2 commits)
  18. 26 August 2014 (2 commits)
  19. 25 August 2014 (1 commit)
  20. 23 August 2014 (2 commits)