1. 12 10月, 2016 1 次提交
  2. 29 8月, 2016 1 次提交
  3. 14 7月, 2016 1 次提交
  4. 02 7月, 2016 1 次提交
  5. 17 6月, 2016 1 次提交
    • P
      workqueue: Fix setting affinity of unbound worker threads · d945b5e9
      Peter Zijlstra 提交于
      With commit e9d867a6 ("sched: Allow per-cpu kernel threads to
      run on online && !active"), __set_cpus_allowed_ptr() expects that only
      strict per-cpu kernel threads can have affinity to an online CPU which
      is not yet active.
      
      This assumption is currently broken in the CPU_ONLINE notification
      handler for the workqueues where restore_unbound_workers_cpumask()
      calls set_cpus_allowed_ptr() when the first cpu in the unbound
      worker's pool->attr->cpumask comes online. Since
      set_cpus_allowed_ptr() is called with pool->attr->cpumask in which
      only one CPU is online which is not yet active, we get the following
      WARN_ON during an CPU online operation.
      
      ------------[ cut here ]------------
      WARNING: CPU: 40 PID: 248 at kernel/sched/core.c:1166
      __set_cpus_allowed_ptr+0x228/0x2e0
      Modules linked in:
      CPU: 40 PID: 248 Comm: cpuhp/40 Not tainted 4.6.0-autotest+ #4
      <..snip..>
      Call Trace:
      [c000000f273ff920] [c00000000010493c] __set_cpus_allowed_ptr+0x2cc/0x2e0 (unreliable)
      [c000000f273ffac0] [c0000000000ed4b0] workqueue_cpu_up_callback+0x2c0/0x470
      [c000000f273ffb70] [c0000000000f5c58] notifier_call_chain+0x98/0x100
      [c000000f273ffbc0] [c0000000000c5ed0] __cpu_notify+0x70/0xe0
      [c000000f273ffc00] [c0000000000c6028] notify_online+0x38/0x50
      [c000000f273ffc30] [c0000000000c5214] cpuhp_invoke_callback+0x84/0x250
      [c000000f273ffc90] [c0000000000c562c] cpuhp_up_callbacks+0x5c/0x120
      [c000000f273ffce0] [c0000000000c64d4] cpuhp_thread_fun+0x184/0x1c0
      [c000000f273ffd20] [c0000000000fa050] smpboot_thread_fn+0x290/0x2a0
      [c000000f273ffd80] [c0000000000f45b0] kthread+0x110/0x130
      [c000000f273ffe30] [c000000000009570] ret_from_kernel_thread+0x5c/0x6c
      ---[ end trace 00f1456578b2a3b2 ]---
      
      This patch fixes this by limiting the mask to the intersection of
      the pool affinity and online CPUs.
      
      Changelog-cribbed-from: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Reported-by: NAbdul Haleem <abdhalee@linux.vnet.ibm.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      d945b5e9
  6. 20 5月, 2016 2 次提交
  7. 13 5月, 2016 1 次提交
    • W
      workqueue: fix rebind bound workers warning · f7c17d26
      Wanpeng Li 提交于
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 16 at kernel/workqueue.c:4559 rebind_workers+0x1c0/0x1d0
      Modules linked in:
      CPU: 0 PID: 16 Comm: cpuhp/0 Not tainted 4.6.0-rc4+ #31
      Hardware name: IBM IBM System x3550 M4 Server -[7914IUW]-/00Y8603, BIOS -[D7E128FUS-1.40]- 07/23/2013
       0000000000000000 ffff881037babb58 ffffffff8139d885 0000000000000010
       0000000000000000 0000000000000000 0000000000000000 ffff881037babba8
       ffffffff8108505d ffff881037ba0000 000011cf3e7d6e60 0000000000000046
      Call Trace:
       dump_stack+0x89/0xd4
       __warn+0xfd/0x120
       warn_slowpath_null+0x1d/0x20
       rebind_workers+0x1c0/0x1d0
       workqueue_cpu_up_callback+0xf5/0x1d0
       notifier_call_chain+0x64/0x90
       ? trace_hardirqs_on_caller+0xf2/0x220
       ? notify_prepare+0x80/0x80
       __raw_notifier_call_chain+0xe/0x10
       __cpu_notify+0x35/0x50
       notify_down_prepare+0x5e/0x80
       ? notify_prepare+0x80/0x80
       cpuhp_invoke_callback+0x73/0x330
       ? __schedule+0x33e/0x8a0
       cpuhp_down_callbacks+0x51/0xc0
       cpuhp_thread_fun+0xc1/0xf0
       smpboot_thread_fn+0x159/0x2a0
       ? smpboot_create_threads+0x80/0x80
       kthread+0xef/0x110
       ? wait_for_completion+0xf0/0x120
       ? schedule_tail+0x35/0xf0
       ret_from_fork+0x22/0x50
       ? __init_kthread_worker+0x70/0x70
      ---[ end trace eb12ae47d2382d8f ]---
      notify_down_prepare: attempt to take down CPU 0 failed
      
      This bug can be reproduced by below config w/ nohz_full= all cpus:
      
      CONFIG_BOOTPARAM_HOTPLUG_CPU0=y
      CONFIG_DEBUG_HOTPLUG_CPU0=y
      CONFIG_NO_HZ_FULL=y
      
      As Thomas pointed out:
      
      | If a down prepare callback fails, then DOWN_FAILED is invoked for all
      | callbacks which have successfully executed DOWN_PREPARE.
      |
      | But, workqueue has actually two notifiers. One which handles
      | UP/DOWN_FAILED/ONLINE and one which handles DOWN_PREPARE.
      |
      | Now look at the priorities of those callbacks:
      |
      | CPU_PRI_WORKQUEUE_UP        = 5
      | CPU_PRI_WORKQUEUE_DOWN      = -5
      |
      | So the call order on DOWN_PREPARE is:
      |
      | CB 1
      | CB ...
      | CB workqueue_up() -> Ignores DOWN_PREPARE
      | CB ...
      | CB X ---> Fails
      |
      | So we call up to CB X with DOWN_FAILED
      |
      | CB 1
      | CB ...
      | CB workqueue_up() -> Handles DOWN_FAILED
      | CB ...
      | CB X-1
      |
      | So the problem is that the workqueue stuff handles DOWN_FAILED in the up
      | callback, while it should do it in the down callback. Which is not a good idea
      | either because it wants to be called early on rollback...
      |
      | Brilliant stuff, isn't it? The hotplug rework will solve this problem because
      | the callbacks become symetric, but for the existing mess, we need some
      | workaround in the workqueue code.
      
      The boot CPU handles housekeeping duty(unbound timers, workqueues,
      timekeeping, ...) on behalf of full dynticks CPUs. It must remain
      online when nohz full is enabled. There is a priority set to every
      notifier_blocks:
      
      workqueue_cpu_up > tick_nohz_cpu_down > workqueue_cpu_down
      
      So tick_nohz_cpu_down callback failed when down prepare cpu 0, and
      notifier_blocks behind tick_nohz_cpu_down will not be called any
      more, which leads to workers are actually not unbound. Then hotplug
      state machine will fallback to undo and online cpu 0 again. Workers
      will be rebound unconditionally even if they are not unbound and
      trigger the warning in this progress.
      
      This patch fix it by catching !DISASSOCIATED to avoid rebind bound
      workers.
      
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: stable@vger.kernel.org
      Suggested-by: NLai Jiangshan <jiangshanlai@gmail.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      f7c17d26
  8. 26 4月, 2016 1 次提交
    • R
      workqueue: fix ghost PENDING flag while doing MQ IO · 346c09f8
      Roman Pen 提交于
      The bug in a workqueue leads to a stalled IO request in MQ ctx->rq_list
      with the following backtrace:
      
      [  601.347452] INFO: task kworker/u129:5:1636 blocked for more than 120 seconds.
      [  601.347574]       Tainted: G           O    4.4.5-1-storage+ #6
      [  601.347651] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  601.348142] kworker/u129:5  D ffff880803077988     0  1636      2 0x00000000
      [  601.348519] Workqueue: ibnbd_server_fileio_wq ibnbd_dev_file_submit_io_worker [ibnbd_server]
      [  601.348999]  ffff880803077988 ffff88080466b900 ffff8808033f9c80 ffff880803078000
      [  601.349662]  ffff880807c95000 7fffffffffffffff ffffffff815b0920 ffff880803077ad0
      [  601.350333]  ffff8808030779a0 ffffffff815b01d5 0000000000000000 ffff880803077a38
      [  601.350965] Call Trace:
      [  601.351203]  [<ffffffff815b0920>] ? bit_wait+0x60/0x60
      [  601.351444]  [<ffffffff815b01d5>] schedule+0x35/0x80
      [  601.351709]  [<ffffffff815b2dd2>] schedule_timeout+0x192/0x230
      [  601.351958]  [<ffffffff812d43f7>] ? blk_flush_plug_list+0xc7/0x220
      [  601.352208]  [<ffffffff810bd737>] ? ktime_get+0x37/0xa0
      [  601.352446]  [<ffffffff815b0920>] ? bit_wait+0x60/0x60
      [  601.352688]  [<ffffffff815af784>] io_schedule_timeout+0xa4/0x110
      [  601.352951]  [<ffffffff815b3a4e>] ? _raw_spin_unlock_irqrestore+0xe/0x10
      [  601.353196]  [<ffffffff815b093b>] bit_wait_io+0x1b/0x70
      [  601.353440]  [<ffffffff815b056d>] __wait_on_bit+0x5d/0x90
      [  601.353689]  [<ffffffff81127bd0>] wait_on_page_bit+0xc0/0xd0
      [  601.353958]  [<ffffffff81096db0>] ? autoremove_wake_function+0x40/0x40
      [  601.354200]  [<ffffffff81127cc4>] __filemap_fdatawait_range+0xe4/0x140
      [  601.354441]  [<ffffffff81127d34>] filemap_fdatawait_range+0x14/0x30
      [  601.354688]  [<ffffffff81129a9f>] filemap_write_and_wait_range+0x3f/0x70
      [  601.354932]  [<ffffffff811ced3b>] blkdev_fsync+0x1b/0x50
      [  601.355193]  [<ffffffff811c82d9>] vfs_fsync_range+0x49/0xa0
      [  601.355432]  [<ffffffff811cf45a>] blkdev_write_iter+0xca/0x100
      [  601.355679]  [<ffffffff81197b1a>] __vfs_write+0xaa/0xe0
      [  601.355925]  [<ffffffff81198379>] vfs_write+0xa9/0x1a0
      [  601.356164]  [<ffffffff811c59d8>] kernel_write+0x38/0x50
      
      The underlying device is a null_blk, with default parameters:
      
        queue_mode    = MQ
        submit_queues = 1
      
      Verification that nullb0 has something inflight:
      
      root@pserver8:~# cat /sys/block/nullb0/inflight
             0        1
      root@pserver8:~# find /sys/block/nullb0/mq/0/cpu* -name rq_list -print -exec cat {} \;
      ...
      /sys/block/nullb0/mq/0/cpu2/rq_list
      CTX pending:
              ffff8838038e2400
      ...
      
      During debug it became clear that stalled request is always inserted in
      the rq_list from the following path:
      
         save_stack_trace_tsk + 34
         blk_mq_insert_requests + 231
         blk_mq_flush_plug_list + 281
         blk_flush_plug_list + 199
         wait_on_page_bit + 192
         __filemap_fdatawait_range + 228
         filemap_fdatawait_range + 20
         filemap_write_and_wait_range + 63
         blkdev_fsync + 27
         vfs_fsync_range + 73
         blkdev_write_iter + 202
         __vfs_write + 170
         vfs_write + 169
         kernel_write + 56
      
      So blk_flush_plug_list() was called with from_schedule == true.
      
      If from_schedule is true, that means that finally blk_mq_insert_requests()
      offloads execution of __blk_mq_run_hw_queue() and uses kblockd workqueue,
      i.e. it calls kblockd_schedule_delayed_work_on().
      
      That means, that we race with another CPU, which is about to execute
      __blk_mq_run_hw_queue() work.
      
      Further debugging shows the following traces from different CPUs:
      
        CPU#0                                  CPU#1
        ----------------------------------     -------------------------------
        reqeust A inserted
        STORE hctx->ctx_map[0] bit marked
        kblockd_schedule...() returns 1
        <schedule to kblockd workqueue>
                                               request B inserted
                                               STORE hctx->ctx_map[1] bit marked
                                               kblockd_schedule...() returns 0
        *** WORK PENDING bit is cleared ***
        flush_busy_ctxs() is executed, but
        bit 1, set by CPU#1, is not observed
      
      As a result request B pended forever.
      
      This behaviour can be explained by speculative LOAD of hctx->ctx_map on
      CPU#0, which is reordered with clear of PENDING bit and executed _before_
      actual STORE of bit 1 on CPU#1.
      
      The proper fix is an explicit full barrier <mfence>, which guarantees
      that clear of PENDING bit is to be executed before all possible
      speculative LOADS or STORES inside actual work function.
      Signed-off-by: NRoman Pen <roman.penyaev@profitbricks.com>
      Cc: Gioh Kim <gi-oh.kim@profitbricks.com>
      Cc: Michael Wang <yun.wang@profitbricks.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-block@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: stable@vger.kernel.org
      Signed-off-by: NTejun Heo <tj@kernel.org>
      346c09f8
  9. 16 3月, 2016 1 次提交
    • P
      tags: Fix DEFINE_PER_CPU expansions · 25528213
      Peter Zijlstra 提交于
      $ make tags
        GEN     tags
      ctags: Warning: drivers/acpi/processor_idle.c:64: null expansion of name pattern "\1"
      ctags: Warning: drivers/xen/events/events_2l.c:41: null expansion of name pattern "\1"
      ctags: Warning: kernel/locking/lockdep.c:151: null expansion of name pattern "\1"
      ctags: Warning: kernel/rcu/rcutorture.c:133: null expansion of name pattern "\1"
      ctags: Warning: kernel/rcu/rcutorture.c:135: null expansion of name pattern "\1"
      ctags: Warning: kernel/workqueue.c:323: null expansion of name pattern "\1"
      ctags: Warning: net/ipv4/syncookies.c:53: null expansion of name pattern "\1"
      ctags: Warning: net/ipv6/syncookies.c:44: null expansion of name pattern "\1"
      ctags: Warning: net/rds/page.c:45: null expansion of name pattern "\1"
      
      Which are all the result of the DEFINE_PER_CPU pattern:
      
        scripts/tags.sh:200:	'/\<DEFINE_PER_CPU([^,]*, *\([[:alnum:]_]*\)/\1/v/'
        scripts/tags.sh:201:	'/\<DEFINE_PER_CPU_SHARED_ALIGNED([^,]*, *\([[:alnum:]_]*\)/\1/v/'
      
      The below cures them. All except the workqueue one are within reasonable
      distance of the 80 char limit. TJ do you have any preference on how to
      fix the wq one, or shall we just not care its too long?
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      25528213
  10. 12 3月, 2016 1 次提交
  11. 02 3月, 2016 1 次提交
  12. 18 2月, 2016 1 次提交
  13. 11 2月, 2016 1 次提交
    • T
      workqueue: handle NUMA_NO_NODE for unbound pool_workqueue lookup · d6e022f1
      Tejun Heo 提交于
      When looking up the pool_workqueue to use for an unbound workqueue,
      workqueue assumes that the target CPU is always bound to a valid NUMA
      node.  However, currently, when a CPU goes offline, the mapping is
      destroyed and cpu_to_node() returns NUMA_NO_NODE.
      
      This has always been broken but hasn't triggered often enough before
      874bbfe6 ("workqueue: make sure delayed work run in local cpu").
      After the commit, workqueue forcifully assigns the local CPU for
      delayed work items without explicit target CPU to fix a different
      issue.  This widens the window where CPU can go offline while a
      delayed work item is pending causing delayed work items dispatched
      with target CPU set to an already offlined CPU.  The resulting
      NUMA_NO_NODE mapping makes workqueue try to queue the work item on a
      NULL pool_workqueue and thus crash.
      
      While 874bbfe6 has been reverted for a different reason making the
      bug less visible again, it can still happen.  Fix it by mapping
      NUMA_NO_NODE to the default pool_workqueue from unbound_pwq_by_node().
      This is a temporary workaround.  The long term solution is keeping CPU
      -> NODE mapping stable across CPU off/online cycles which is being
      worked on.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NMike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Len Brown <len.brown@intel.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/g/1454424264.11183.46.camel@gmail.com
      Link: http://lkml.kernel.org/g/1453702100-2597-1-git-send-email-tangchen@cn.fujitsu.com
      d6e022f1
  14. 10 2月, 2016 3 次提交
    • T
      workqueue: implement "workqueue.debug_force_rr_cpu" debug feature · f303fccb
      Tejun Heo 提交于
      Workqueue used to guarantee local execution for work items queued
      without explicit target CPU.  The guarantee is gone now which can
      break some usages in subtle ways.  To flush out those cases, this
      patch implements a debug feature which forces round-robin CPU
      selection for all such work items.
      
      The debug feature defaults to off and can be enabled with a kernel
      parameter.  The default can be flipped with a debug config option.
      
      If you hit this commit during bisection, please refer to 041bd12e
      ("Revert "workqueue: make sure delayed work run in local cpu"") for
      more information and ping me.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      f303fccb
    • M
      workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs · ef557180
      Mike Galbraith 提交于
      WORK_CPU_UNBOUND work items queued to a bound workqueue always run
      locally.  This is a good thing normally, but not when the user has
      asked us to keep unbound work away from certain CPUs.  Round robin
      these to wq_unbound_cpumask CPUs instead, as perturbation avoidance
      trumps performance.
      
      tj: Cosmetic and comment changes.  WARN_ON_ONCE() dropped from empty
          (wq_unbound_cpumask AND cpu_online_mask).  If we want that, it
          should be done when config changes.
      Signed-off-by: NMike Galbraith <umgwanakikbuti@gmail.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      ef557180
    • T
      Revert "workqueue: make sure delayed work run in local cpu" · 041bd12e
      Tejun Heo 提交于
      This reverts commit 874bbfe6.
      
      Workqueue used to implicity guarantee that work items queued without
      explicit CPU specified are put on the local CPU.  Recent changes in
      timer broke the guarantee and led to vmstat breakage which was fixed
      by 176bed1d ("vmstat: explicitly schedule per-cpu work on the CPU
      we need it to run on").
      
      vmstat is the most likely to expose the issue and it's quite possible
      that there are other similar problems which are a lot more difficult
      to trigger.  As a preventive measure, 874bbfe6 ("workqueue: make
      sure delayed work run in local cpu") was applied to restore the local
      CPU guarnatee.  Unfortunately, the change exposed a bug in timer code
      which got fixed by 22b886dd ("timers: Use proper base migration in
      add_timer_on()").  Due to code restructuring, the commit couldn't be
      backported beyond certain point and stable kernels which only had
      874bbfe6 started crashing.
      
      The local CPU guarantee was accidental more than anything else and we
      want to get rid of it anyway.  As, with the vmstat case fixed,
      874bbfe6 is causing more problems than it's fixing, it has been
      decided to take the chance and officially break the guarantee by
      reverting the commit.  A debug feature will be added to force foreign
      CPU assignment to expose cases relying on the guarantee and fixes for
      the individual cases will be backported to stable as necessary.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Fixes: 874bbfe6 ("workqueue: make sure delayed work run in local cpu")
      Link: http://lkml.kernel.org/g/20160120211926.GJ10810@quack.suse.cz
      Cc: stable@vger.kernel.org
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
      Cc: Daniel Bilik <daniel.bilik@neosystem.cz>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Daniel Bilik <daniel.bilik@neosystem.cz>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Michal Hocko <mhocko@kernel.org>
      041bd12e
  15. 30 1月, 2016 1 次提交
    • T
      workqueue: skip flush dependency checks for legacy workqueues · 23d11a58
      Tejun Heo 提交于
      fca839c0 ("workqueue: warn if memory reclaim tries to flush
      !WQ_MEM_RECLAIM workqueue") implemented flush dependency warning which
      triggers if a PF_MEMALLOC task or WQ_MEM_RECLAIM workqueue tries to
      flush a !WQ_MEM_RECLAIM workquee.
      
      This assumes that workqueues marked with WQ_MEM_RECLAIM sit in memory
      reclaim path and making it depend on something which may need more
      memory to make forward progress can lead to deadlocks.  Unfortunately,
      workqueues created with the legacy create*_workqueue() interface
      always have WQ_MEM_RECLAIM regardless of whether they are depended
      upon memory reclaim or not.  These spurious WQ_MEM_RECLAIM markings
      cause spurious triggering of the flush dependency checks.
      
        WARNING: CPU: 0 PID: 6 at kernel/workqueue.c:2361 check_flush_dependency+0x138/0x144()
        workqueue: WQ_MEM_RECLAIM deferwq:deferred_probe_work_func is flushing !WQ_MEM_RECLAIM events:lru_add_drain_per_cpu
        ...
        Workqueue: deferwq deferred_probe_work_func
        [<c0017acc>] (unwind_backtrace) from [<c0013134>] (show_stack+0x10/0x14)
        [<c0013134>] (show_stack) from [<c0245f18>] (dump_stack+0x94/0xd4)
        [<c0245f18>] (dump_stack) from [<c0026f9c>] (warn_slowpath_common+0x80/0xb0)
        [<c0026f9c>] (warn_slowpath_common) from [<c0026ffc>] (warn_slowpath_fmt+0x30/0x40)
        [<c0026ffc>] (warn_slowpath_fmt) from [<c00390b8>] (check_flush_dependency+0x138/0x144)
        [<c00390b8>] (check_flush_dependency) from [<c0039ca0>] (flush_work+0x50/0x15c)
        [<c0039ca0>] (flush_work) from [<c00c51b0>] (lru_add_drain_all+0x130/0x180)
        [<c00c51b0>] (lru_add_drain_all) from [<c00f728c>] (migrate_prep+0x8/0x10)
        [<c00f728c>] (migrate_prep) from [<c00bfbc4>] (alloc_contig_range+0xd8/0x338)
        [<c00bfbc4>] (alloc_contig_range) from [<c00f8f18>] (cma_alloc+0xe0/0x1ac)
        [<c00f8f18>] (cma_alloc) from [<c001cac4>] (__alloc_from_contiguous+0x38/0xd8)
        [<c001cac4>] (__alloc_from_contiguous) from [<c001ceb4>] (__dma_alloc+0x240/0x278)
        [<c001ceb4>] (__dma_alloc) from [<c001cf78>] (arm_dma_alloc+0x54/0x5c)
        [<c001cf78>] (arm_dma_alloc) from [<c0355ea4>] (dmam_alloc_coherent+0xc0/0xec)
        [<c0355ea4>] (dmam_alloc_coherent) from [<c039cc4c>] (ahci_port_start+0x150/0x1dc)
        [<c039cc4c>] (ahci_port_start) from [<c0384734>] (ata_host_start.part.3+0xc8/0x1c8)
        [<c0384734>] (ata_host_start.part.3) from [<c03898dc>] (ata_host_activate+0x50/0x148)
        [<c03898dc>] (ata_host_activate) from [<c039d558>] (ahci_host_activate+0x44/0x114)
        [<c039d558>] (ahci_host_activate) from [<c039f05c>] (ahci_platform_init_host+0x1d8/0x3c8)
        [<c039f05c>] (ahci_platform_init_host) from [<c039e6bc>] (tegra_ahci_probe+0x448/0x4e8)
        [<c039e6bc>] (tegra_ahci_probe) from [<c0347058>] (platform_drv_probe+0x50/0xac)
        [<c0347058>] (platform_drv_probe) from [<c03458cc>] (driver_probe_device+0x214/0x2c0)
        [<c03458cc>] (driver_probe_device) from [<c0343cc0>] (bus_for_each_drv+0x60/0x94)
        [<c0343cc0>] (bus_for_each_drv) from [<c03455d8>] (__device_attach+0xb0/0x114)
        [<c03455d8>] (__device_attach) from [<c0344ab8>] (bus_probe_device+0x84/0x8c)
        [<c0344ab8>] (bus_probe_device) from [<c0344f48>] (deferred_probe_work_func+0x68/0x98)
        [<c0344f48>] (deferred_probe_work_func) from [<c003b738>] (process_one_work+0x120/0x3f8)
        [<c003b738>] (process_one_work) from [<c003ba48>] (worker_thread+0x38/0x55c)
        [<c003ba48>] (worker_thread) from [<c0040f14>] (kthread+0xdc/0xf4)
        [<c0040f14>] (kthread) from [<c000f778>] (ret_from_fork+0x14/0x3c)
      
      Fix it by marking workqueues created via create*_workqueue() with
      __WQ_LEGACY and disabling flush dependency checks on them.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-and-tested-by: NThierry Reding <thierry.reding@gmail.com>
      Link: http://lkml.kernel.org/g/20160126173843.GA11115@ulmo.nvidia.com
      Fixes: fca839c0 ("workqueue: warn if memory reclaim tries to flush !WQ_MEM_RECLAIM workqueue")
      23d11a58
  16. 08 1月, 2016 1 次提交
  17. 09 12月, 2015 2 次提交
    • T
      workqueue: implement lockup detector · 82607adc
      Tejun Heo 提交于
      Workqueue stalls can happen from a variety of usage bugs such as
      missing WQ_MEM_RECLAIM flag or concurrency managed work item
      indefinitely staying RUNNING.  These stalls can be extremely difficult
      to hunt down because the usual warning mechanisms can't detect
      workqueue stalls and the internal state is pretty opaque.
      
      To alleviate the situation, this patch implements workqueue lockup
      detector.  It periodically monitors all worker_pools periodically and,
      if any pool failed to make forward progress longer than the threshold
      duration, triggers warning and dumps workqueue state as follows.
      
       BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 31s!
       Showing busy workqueues and worker pools:
       workqueue events: flags=0x0
         pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=17/256
           pending: monkey_wrench_fn, e1000_watchdog, cache_reap, vmstat_shepherd, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, cgroup_release_agent
       workqueue events_power_efficient: flags=0x80
         pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
           pending: check_lifetime, neigh_periodic_work
       workqueue cgroup_pidlist_destroy: flags=0x0
         pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1
           pending: cgroup_pidlist_destroy_work_fn
       ...
      
      The detection mechanism is controller through kernel parameter
      workqueue.watchdog_thresh and can be updated at runtime through the
      sysfs module parameter file.
      
      v2: Decoupled from softlockup control knobs.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NDon Zickus <dzickus@redhat.com>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Chris Mason <clm@fb.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      82607adc
    • T
      workqueue: warn if memory reclaim tries to flush !WQ_MEM_RECLAIM workqueue · fca839c0
      Tejun Heo 提交于
      Task or work item involved in memory reclaim trying to flush a
      non-WQ_MEM_RECLAIM workqueue or one of its work items can lead to
      deadlock.  Trigger WARN_ONCE() if such conditions are detected.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      fca839c0
  18. 13 10月, 2015 1 次提交
  19. 01 10月, 2015 1 次提交
    • S
      workqueue: make sure delayed work run in local cpu · 874bbfe6
      Shaohua Li 提交于
      My system keeps crashing with below message. vmstat_update() schedules a delayed
      work in current cpu and expects the work runs in the cpu.
      schedule_delayed_work() is expected to make delayed work run in local cpu. The
      problem is timer can be migrated with NO_HZ. __queue_work() queues work in
      timer handler, which could run in a different cpu other than where the delayed
      work is scheduled. The end result is the delayed work runs in different cpu.
      The patch makes __queue_delayed_work records local cpu earlier. Where the timer
      runs doesn't change where the work runs with the change.
      
      [   28.010131] ------------[ cut here ]------------
      [   28.010609] kernel BUG at ../mm/vmstat.c:1392!
      [   28.011099] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
      [   28.011860] Modules linked in:
      [   28.012245] CPU: 0 PID: 289 Comm: kworker/0:3 Tainted: G        W4.3.0-rc3+ #634
      [   28.013065] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153802- 04/01/2014
      [   28.014160] Workqueue: events vmstat_update
      [   28.014571] task: ffff880117682580 ti: ffff8800ba428000 task.ti: ffff8800ba428000
      [   28.015445] RIP: 0010:[<ffffffff8115f921>]  [<ffffffff8115f921>]vmstat_update+0x31/0x80
      [   28.016282] RSP: 0018:ffff8800ba42fd80  EFLAGS: 00010297
      [   28.016812] RAX: 0000000000000000 RBX: ffff88011a858dc0 RCX:0000000000000000
      [   28.017585] RDX: ffff880117682580 RSI: ffffffff81f14d8c RDI:ffffffff81f4df8d
      [   28.018366] RBP: ffff8800ba42fd90 R08: 0000000000000001 R09:0000000000000000
      [   28.019169] R10: 0000000000000000 R11: 0000000000000121 R12:ffff8800baa9f640
      [   28.019947] R13: ffff88011a81e340 R14: ffff88011a823700 R15:0000000000000000
      [   28.020071] FS:  0000000000000000(0000) GS:ffff88011a800000(0000)knlGS:0000000000000000
      [   28.020071] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [   28.020071] CR2: 00007ff6144b01d0 CR3: 00000000b8e93000 CR4:00000000000006f0
      [   28.020071] Stack:
      [   28.020071]  ffff88011a858dc0 ffff8800baa9f640 ffff8800ba42fe00ffffffff8106bd88
      [   28.020071]  ffffffff8106bd0b 0000000000000096 0000000000000000ffffffff82f9b1e8
      [   28.020071]  ffffffff829f0b10 0000000000000000 ffffffff81f18460ffff88011a81e340
      [   28.020071] Call Trace:
      [   28.020071]  [<ffffffff8106bd88>] process_one_work+0x1c8/0x540
      [   28.020071]  [<ffffffff8106bd0b>] ? process_one_work+0x14b/0x540
      [   28.020071]  [<ffffffff8106c214>] worker_thread+0x114/0x460
      [   28.020071]  [<ffffffff8106c100>] ? process_one_work+0x540/0x540
      [   28.020071]  [<ffffffff81071bf8>] kthread+0xf8/0x110
      [   28.020071]  [<ffffffff81071b00>] ?kthread_create_on_node+0x200/0x200
      [   28.020071]  [<ffffffff81a6522f>] ret_from_fork+0x3f/0x70
      [   28.020071]  [<ffffffff81071b00>] ?kthread_create_on_node+0x200/0x200
      Signed-off-by: NShaohua Li <shli@fb.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org # v2.6.31+
      874bbfe6
  20. 12 8月, 2015 1 次提交
  21. 05 8月, 2015 1 次提交
  22. 23 7月, 2015 1 次提交
  23. 29 5月, 2015 1 次提交
  24. 28 5月, 2015 1 次提交
  25. 22 5月, 2015 3 次提交
  26. 20 5月, 2015 2 次提交
    • L
      workqueue: ensure attrs changes are properly synchronized · d4d3e257
      Lai Jiangshan 提交于
      Current modification to attrs via sysfs is not fully synchronized.
      
      Process A (change cpumask)      | Process B (change numa affinity)
      wq_cpumask_store()              |
        wq_sysfs_prep_attrs()         |
                                      | apply_workqueue_attrs()
        apply_workqueue_attrs()       |
      
      It results that the Process B's operation is totally reverted
      without any notification, it is a buggy behavior.  So this patch
      moves wq_sysfs_prep_attrs() into the protection under wq_pool_mutex
      to ensure attrs changes are properly synchronized.
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      d4d3e257
    • L
      workqueue: separate out and refactor the locking of applying attrs · a0111cf6
      Lai Jiangshan 提交于
      Applying attrs requires two locks: get_online_cpus() and wq_pool_mutex,
      and this code is duplicated at two places (apply_workqueue_attrs() and
      workqueue_set_unbound_cpumask()).  So we separate out this locking
      code into apply_wqattrs_[un]lock() and do a minor refactor on
      apply_workqueue_attrs().
      
      The apply_wqattrs_[un]lock() will be also used on later patch for
      ensuring attrs changes are properly synchronized.
      
      tj: minor updates to comments
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      a0111cf6
  27. 19 5月, 2015 2 次提交
    • L
      workqueue: simplify wq_update_unbound_numa() · f7142ed4
      Lai Jiangshan 提交于
      wq_update_unbound_numa() is known be called with wq_pool_mutex held.
      
      But wq_update_unbound_numa() requests wq->mutex before reading
      wq->unbound_attrs, wq->numa_pwq_tbl[] and wq->dfl_pwq.  But these fields
      were changed to be allowed being read with wq_pool_mutex held.  So we
      simply remove the mutex_lock(&wq->mutex).
      
      Without the dependence on the the mutex_lock(&wq->mutex), the test
      of wq->unbound_attrs->no_numa can also be moved upward.
      
      The old code need a long comment to describe the stableness of
      @wq->unbound_attrs which is also guaranteed by wq_pool_mutex now,
      so we don't need this such comment.
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      f7142ed4
    • L
      workqueue: wq_pool_mutex protects the attrs-installation · 5b95e1af
      Lai Jiangshan 提交于
      Current wq_pool_mutex doesn't proctect the attrs-installation, it results
      that ->unbound_attrs, ->numa_pwq_tbl[] and ->dfl_pwq can only be accessed
      under wq->mutex and causes some inconveniences. Example, wq_update_unbound_numa()
      has to acquire wq->mutex before fetching the wq->unbound_attrs->no_numa
      and the old_pwq.
      
      attrs-installation is a short operation, so this change will no cause any
      latency for other operations which also acquire the wq_pool_mutex.
      
      The only unprotected attrs-installation code is in apply_workqueue_attrs(),
      so this patch touches code less than comments.
      
      It is also a preparation patch for next several patches which read
      wq->unbound_attrs, wq->numa_pwq_tbl[] and wq->dfl_pwq with
      only wq_pool_mutex held.
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      5b95e1af
  28. 13 5月, 2015 1 次提交
  29. 11 5月, 2015 1 次提交
  30. 30 4月, 2015 1 次提交
    • L
      workqueue: Allow modifying low level unbound workqueue cpumask · 042f7df1
      Lai Jiangshan 提交于
      Allow to modify the low-level unbound workqueues cpumask through
      sysfs. This is performed by traversing the entire workqueue list
      and calling apply_wqattrs_prepare() on the unbound workqueues
      with the new low level mask. Only after all the preparation are done,
      we commit them all together.
      
      Ordered workqueues are ignored from the low level unbound workqueue
      cpumask, it will be handled in near future.
      
      All the (default & per-node) pwqs are mandatorily controlled by
      the low level cpumask. If the user configured cpumask doesn't overlap
      with the low level cpumask, the low level cpumask will be used for the
      wq instead.
      
      The comment of wq_calc_node_cpumask() is updated and explicitly
      requires that its first argument should be the attrs of the default
      pwq.
      
      The default wq_unbound_cpumask is cpu_possible_mask.  The workqueue
      subsystem doesn't know its best default value, let the system manager
      or the other subsystem set it when needed.
      
      Changed from V8:
        merge the calculating code for the attrs of the default pwq together.
        minor change the code&comments for saving the user configured attrs.
        remove unnecessary list_del().
        minor update the comment of wq_calc_node_cpumask().
        update the comment of workqueue_set_unbound_cpumask();
      
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Mike Galbraith <bitbucket@online.de>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Original-patch-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      042f7df1
  31. 27 4月, 2015 2 次提交
    • F
      workqueue: Create low-level unbound workqueues cpumask · b05a7928
      Frederic Weisbecker 提交于
      Create a cpumask that limits the affinity of all unbound workqueues.
      This cpumask is controlled through a file at the root of the workqueue
      sysfs directory.
      
      It works on a lower-level than the per WQ_SYSFS workqueues cpumask files
      such that the effective cpumask applied for a given unbound workqueue is
      the intersection of /sys/devices/virtual/workqueue/$WORKQUEUE/cpumask and
      the new /sys/devices/virtual/workqueue/cpumask file.
      
      This patch implements the basic infrastructure and the read interface.
      wq_unbound_cpumask is initially set to cpu_possible_mask.
      
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Mike Galbraith <bitbucket@online.de>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      b05a7928
    • L
      workqueue: split apply_workqueue_attrs() into 3 stages · 2d5f0764
      Lai Jiangshan 提交于
      Current apply_workqueue_attrs() includes pwqs-allocation and pwqs-installation,
      so when we batch multiple apply_workqueue_attrs()s as a transaction, we can't
      ensure the transaction must succeed or fail as a complete unit.
      
      To solve this, we split apply_workqueue_attrs() into three stages.
      The first stage does the preparation: allocation memory, pwqs.
      The second stage does the attrs-installaion and pwqs-installation.
      The third stage frees the allocated memory and (old or unused) pwqs.
      
      As the result, batching multiple apply_workqueue_attrs()s can
      succeed or fail as a complete unit:
      	1) batch do all the first stage for all the workqueues
      	2) only commit all when all the above succeed.
      
      This patch is a preparation for the next patch ("Allow modifying low level
      unbound workqueue cpumask") which will do a multiple apply_workqueue_attrs().
      
      The patch doesn't have functionality changed except two minor adjustment:
      	1) free_unbound_pwq() for the error path is removed, we use the
      	   heavier version put_pwq_unlocked() instead since the error path
      	   is rare. this adjustment simplifies the code.
      	2) the memory-allocation is also moved into wq_pool_mutex.
      	   this is needed to avoid to do the further splitting.
      
      tj: minor updates to comments.
      Suggested-by: NTejun Heo <tj@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Mike Galbraith <bitbucket@online.de>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      2d5f0764