1. 26 Jul 2017 (1 commit)
    • workqueue: implicit ordered attribute should be overridable · 0a94efb5
      Authored by Tejun Heo
      5c0338c6 ("workqueue: restore WQ_UNBOUND/max_active==1 to be
      ordered") automatically enabled ordered attribute for unbound
      workqueues w/ max_active == 1.  Because ordered workqueues reject
      max_active and some attribute changes, this implicit ordered mode
      broke cases where the user creates an unbound workqueue w/ max_active
      == 1 and later explicitly changes the related attributes.
      
      This patch distinguishes explicit and implicit ordered settings and
      lets attribute changes override the ordered mode when it was set
      implicitly.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: 5c0338c6 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered")
      0a94efb5
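      A minimal sketch of the restored behavior; the workqueue name and the
      max_active value 16 are illustrative, not from the patch:

        struct workqueue_struct *wq;

        /* WQ_UNBOUND + max_active == 1 makes the wq implicitly ordered */
        wq = alloc_workqueue("example_wq", WQ_UNBOUND, 1);
        if (!wq)
                return -ENOMEM;

        /* Before this fix the implicit __WQ_ORDERED made explicit
         * attribute changes fail; now they override the implicit mode. */
        workqueue_set_max_active(wq, 16);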
  2. 15 Apr 2017 (1 commit)
    • workqueue: Provide work_on_cpu_safe() · 0e8d6a93
      Authored by Thomas Gleixner
      work_on_cpu() is not protected against CPU hotplug.  Code which needs
      either to be executed on an online CPU or to fail if the CPU is not
      available would have to protect against CPU hotplug at the call site.
      
      Provide a function which does get/put_online_cpus() around the call to
      work_on_cpu() and fails the call with -ENODEV if the target CPU is not
      online.
      
      Preparatory patch to convert several racy task affinity manipulations.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Len Brown <lenb@kernel.org>
      Link: http://lkml.kernel.org/r/20170412201042.262610721@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      0e8d6a93
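      A hedged usage sketch; the callback, the CPU number and the warning
      message are hypothetical:

        static long probe_cpu_fn(void *arg)     /* hypothetical callback */
        {
                /* runs on the target CPU, or not at all */
                return 0;
        }

        /* in the caller: */
        long ret = work_on_cpu_safe(2, probe_cpu_fn, NULL);

        if (ret == -ENODEV)
                pr_warn("CPU 2 is offline, skipping probe\n");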
  3. 03 Feb 2017 (1 commit)
    • workqueue: avoid clang warning · a45463cb
      Authored by Arnd Bergmann
      Building with clang shows lots of warnings like:
      
      drivers/amba/bus.c:447:8: warning: implicit conversion from 'long long' to 'int' changes value from 4294967248 to -48
            [-Wconstant-conversion]
      static DECLARE_DELAYED_WORK(deferred_retry_work, amba_deferred_retry_func);
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      include/linux/workqueue.h:187:26: note: expanded from macro 'DECLARE_DELAYED_WORK'
              struct delayed_work n = __DELAYED_WORK_INITIALIZER(n, f, 0)
                                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      include/linux/workqueue.h:177:10: note: expanded from macro '__DELAYED_WORK_INITIALIZER'
              .work = __WORK_INITIALIZER((n).work, (f)),                      \
                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      include/linux/workqueue.h:170:10: note: expanded from macro '__WORK_INITIALIZER'
              .data = WORK_DATA_STATIC_INIT(),                                \
                      ^~~~~~~~~~~~~~~~~~~~~~~
      include/linux/workqueue.h:111:39: note: expanded from macro 'WORK_DATA_STATIC_INIT'
              ATOMIC_LONG_INIT(WORK_STRUCT_NO_POOL | WORK_STRUCT_STATIC)
              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
      include/asm-generic/atomic-long.h:32:41: note: expanded from macro 'ATOMIC_LONG_INIT'
       #define ATOMIC_LONG_INIT(i)     ATOMIC_INIT(i)
                                      ~~~~~~~~~~~~^~
      arch/arm/include/asm/atomic.h:21:27: note: expanded from macro 'ATOMIC_INIT'
       #define ATOMIC_INIT(i)  { (i) }
                              ~  ^
      
      This makes the type cast explicit, which shuts up the warning.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      a45463cb
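      The fix is along these lines (a sketch of the pattern rather than the
      verbatim hunk): make the narrowing cast explicit so clang sees an
      intentional conversion instead of a silent truncation:

        /* include/linux/workqueue.h, sketch of the change */
        WORK_STRUCT_NO_POOL = (unsigned long)WORK_OFFQ_POOL_NONE << WORK_OFFQ_POOL_SHIFT,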
  4. 29 Oct 2016 (1 commit)
  5. 18 Sep 2016 (2 commits)
    • workqueue: remove keventd_up() · 863b710b
      Authored by Tejun Heo
      keventd_up() no longer has in-kernel users.  Remove it and make
      wq_online static.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      863b710b
    • workqueue: make workqueue available early during boot · 3347fa09
      Authored by Tejun Heo
      Workqueue is currently initialized in an early init call; however,
      there are cases where early boot code has to be split and reordered to
      come after workqueue initialization, or the same code path is used
      both before and after workqueue initialization.  The latter cases have
      to gate workqueue usage with keventd_up() tests, which is nasty and
      easy to get wrong.
      
      Workqueue usage has become widespread and it would be a lot more
      convenient if it could be used very early in boot.  This patch splits
      workqueue initialization into two steps: workqueue_init_early(), which
      sets up the basic data structures so that workqueues can be created
      and work items queued, and workqueue_init(), which actually brings the
      workqueues online and starts executing queued work items.  The former
      step can be done very early during boot once memory allocation,
      cpumasks and idr are initialized; the latter right after kthreads
      become available.
      
      This allows work item queueing and canceling from very early boot
      which is what most of these use cases want.
      
      * As system_wq being initialized no longer indicates that workqueue is
        fully online, update keventd_up() to test wq_online instead.
        The follow-up patches will get rid of all its usages and the
        function itself.
      
      * Flushing doesn't make sense before workqueue is fully initialized.
        The flush functions trigger WARN and return immediately before fully
        online.
      
      * Work items are never in-flight before fully online.  Canceling can
        always succeed by skipping the flush step.
      
      * Some code paths can no longer assume to be called with irq enabled
        as irq is disabled during early boot.  Use irqsave/restore
        operations instead.
      
      v2: Watchdog init, which requires timer to be running, moved from
          workqueue_init_early() to workqueue_init().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/CA+55aFx0vPuMuxn00rBSM192n-Du5uxy+4AvKa0SBSOVJeuCGg@mail.gmail.com
      3347fa09
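      A hedged sketch of the resulting boot flow, condensed from init/main.c
      of this era (most of the sequence is elided):

        asmlinkage __visible void __init start_kernel(void)
        {
                /* ... memory allocation, cpumasks and idr come up ... */
                workqueue_init_early(); /* wqs creatable, work queueable/cancelable */
                /* ... */
                rest_init();            /* spawns the init task and kthreadd */
        }

        /* and once kthreads are available, early in the init task: */
        workqueue_init();               /* workers spawn, queued work starts running */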
  6. 29 Aug 2016 (1 commit)
  7. 14 Jul 2016 (1 commit)
  8. 30 Jan 2016 (1 commit)
    • workqueue: skip flush dependency checks for legacy workqueues · 23d11a58
      Authored by Tejun Heo
      fca839c0 ("workqueue: warn if memory reclaim tries to flush
      !WQ_MEM_RECLAIM workqueue") implemented flush dependency warning which
      triggers if a PF_MEMALLOC task or WQ_MEM_RECLAIM workqueue tries to
      flush a !WQ_MEM_RECLAIM workquee.
      
      This assumes that workqueues marked with WQ_MEM_RECLAIM sit in the
      memory reclaim path, and that making them depend on something which may
      need more memory to make forward progress can lead to deadlocks.
      Unfortunately, workqueues created with the legacy create*_workqueue()
      interface always have WQ_MEM_RECLAIM regardless of whether they are
      depended upon by memory reclaim or not.  These spurious WQ_MEM_RECLAIM
      markings cause spurious triggering of the flush dependency checks.
      
        WARNING: CPU: 0 PID: 6 at kernel/workqueue.c:2361 check_flush_dependency+0x138/0x144()
        workqueue: WQ_MEM_RECLAIM deferwq:deferred_probe_work_func is flushing !WQ_MEM_RECLAIM events:lru_add_drain_per_cpu
        ...
        Workqueue: deferwq deferred_probe_work_func
        [<c0017acc>] (unwind_backtrace) from [<c0013134>] (show_stack+0x10/0x14)
        [<c0013134>] (show_stack) from [<c0245f18>] (dump_stack+0x94/0xd4)
        [<c0245f18>] (dump_stack) from [<c0026f9c>] (warn_slowpath_common+0x80/0xb0)
        [<c0026f9c>] (warn_slowpath_common) from [<c0026ffc>] (warn_slowpath_fmt+0x30/0x40)
        [<c0026ffc>] (warn_slowpath_fmt) from [<c00390b8>] (check_flush_dependency+0x138/0x144)
        [<c00390b8>] (check_flush_dependency) from [<c0039ca0>] (flush_work+0x50/0x15c)
        [<c0039ca0>] (flush_work) from [<c00c51b0>] (lru_add_drain_all+0x130/0x180)
        [<c00c51b0>] (lru_add_drain_all) from [<c00f728c>] (migrate_prep+0x8/0x10)
        [<c00f728c>] (migrate_prep) from [<c00bfbc4>] (alloc_contig_range+0xd8/0x338)
        [<c00bfbc4>] (alloc_contig_range) from [<c00f8f18>] (cma_alloc+0xe0/0x1ac)
        [<c00f8f18>] (cma_alloc) from [<c001cac4>] (__alloc_from_contiguous+0x38/0xd8)
        [<c001cac4>] (__alloc_from_contiguous) from [<c001ceb4>] (__dma_alloc+0x240/0x278)
        [<c001ceb4>] (__dma_alloc) from [<c001cf78>] (arm_dma_alloc+0x54/0x5c)
        [<c001cf78>] (arm_dma_alloc) from [<c0355ea4>] (dmam_alloc_coherent+0xc0/0xec)
        [<c0355ea4>] (dmam_alloc_coherent) from [<c039cc4c>] (ahci_port_start+0x150/0x1dc)
        [<c039cc4c>] (ahci_port_start) from [<c0384734>] (ata_host_start.part.3+0xc8/0x1c8)
        [<c0384734>] (ata_host_start.part.3) from [<c03898dc>] (ata_host_activate+0x50/0x148)
        [<c03898dc>] (ata_host_activate) from [<c039d558>] (ahci_host_activate+0x44/0x114)
        [<c039d558>] (ahci_host_activate) from [<c039f05c>] (ahci_platform_init_host+0x1d8/0x3c8)
        [<c039f05c>] (ahci_platform_init_host) from [<c039e6bc>] (tegra_ahci_probe+0x448/0x4e8)
        [<c039e6bc>] (tegra_ahci_probe) from [<c0347058>] (platform_drv_probe+0x50/0xac)
        [<c0347058>] (platform_drv_probe) from [<c03458cc>] (driver_probe_device+0x214/0x2c0)
        [<c03458cc>] (driver_probe_device) from [<c0343cc0>] (bus_for_each_drv+0x60/0x94)
        [<c0343cc0>] (bus_for_each_drv) from [<c03455d8>] (__device_attach+0xb0/0x114)
        [<c03455d8>] (__device_attach) from [<c0344ab8>] (bus_probe_device+0x84/0x8c)
        [<c0344ab8>] (bus_probe_device) from [<c0344f48>] (deferred_probe_work_func+0x68/0x98)
        [<c0344f48>] (deferred_probe_work_func) from [<c003b738>] (process_one_work+0x120/0x3f8)
        [<c003b738>] (process_one_work) from [<c003ba48>] (worker_thread+0x38/0x55c)
        [<c003ba48>] (worker_thread) from [<c0040f14>] (kthread+0xdc/0xf4)
        [<c0040f14>] (kthread) from [<c000f778>] (ret_from_fork+0x14/0x3c)
      
      Fix it by marking workqueues created via create*_workqueue() with
      __WQ_LEGACY and disabling flush dependency checks on them.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-and-tested-by: Thierry Reding <thierry.reding@gmail.com>
      Link: http://lkml.kernel.org/g/20160126173843.GA11115@ulmo.nvidia.com
      Fixes: fca839c0 ("workqueue: warn if memory reclaim tries to flush !WQ_MEM_RECLAIM workqueue")
      23d11a58
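      Roughly what the fix does (a sketch, not the verbatim hunk): the
      legacy wrappers gain __WQ_LEGACY, and check_flush_dependency() returns
      early for workqueues carrying that flag:

        #define create_workqueue(name)                                  \
                alloc_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, 1, (name))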
  9. 09 Dec 2015 (1 commit)
    • workqueue: implement lockup detector · 82607adc
      Authored by Tejun Heo
      Workqueue stalls can happen from a variety of usage bugs such as a
      missing WQ_MEM_RECLAIM flag or a concurrency-managed work item staying
      RUNNING indefinitely.  These stalls can be extremely difficult to hunt
      down because the usual warning mechanisms can't detect workqueue
      stalls and the internal state is pretty opaque.
      
      To alleviate the situation, this patch implements a workqueue lockup
      detector.  It monitors all worker_pools periodically and, if any pool
      fails to make forward progress for longer than the threshold duration,
      triggers a warning and dumps workqueue state as follows.
      
       BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 31s!
       Showing busy workqueues and worker pools:
       workqueue events: flags=0x0
         pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=17/256
           pending: monkey_wrench_fn, e1000_watchdog, cache_reap, vmstat_shepherd, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, cgroup_release_agent
       workqueue events_power_efficient: flags=0x80
         pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
           pending: check_lifetime, neigh_periodic_work
       workqueue cgroup_pidlist_destroy: flags=0x0
         pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1
           pending: cgroup_pidlist_destroy_work_fn
       ...
      
      The detection mechanism is controlled through the kernel parameter
      workqueue.watchdog_thresh and can be updated at runtime through the
      sysfs module parameter file.
      
      v2: Decoupled from softlockup control knobs.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Don Zickus <dzickus@redhat.com>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Chris Mason <clm@fb.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      82607adc
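      A hedged usage sketch: the threshold comes from the kernel command
      line (workqueue.watchdog_thresh=30, in seconds) or from
      /sys/module/workqueue/parameters/watchdog_thresh at runtime, and code
      that legitimately stalls a CPU can pet the detector via the helper
      this patch adds (the loop and workload below are hypothetical):

        /* in a long-running, preemption-disabled loop: */
        for (i = 0; i < nr_chunks; i++) {
                process_chunk(i);       /* hypothetical workload */
                wq_watchdog_touch(raw_smp_processor_id()); /* keep the detector quiet */
        }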
  10. 18 Aug 2015 (1 commit)
    • workqueue: fix some docbook warnings · 355c0663
      Authored by Jonathan Corbet
      There are some errors in the docbook comments in workqueue.h that cause
      warnings when the docs are built; this only recently came to light because
      these comments were not used until now.  Fix the comments to make the
      warnings go away.
      
      The "args..." "fix" is a hack.  kerneldoc doesn't deal properly with named
      variadic arguments in macros, so all I've really achieved here is to make
      it shut up.  Fixing kerneldoc will have to wait for more time.
      Signed-off-by: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      355c0663
  11. 22 May 2015 (1 commit)
  12. 30 Apr 2015 (1 commit)
    • workqueue: Allow modifying low level unbound workqueue cpumask · 042f7df1
      Authored by Lai Jiangshan
      Allow modifying the low-level unbound workqueue cpumask through
      sysfs.  This is performed by traversing the entire workqueue list
      and calling apply_wqattrs_prepare() on the unbound workqueues
      with the new low-level mask.  Only after all the preparations are
      done do we commit them all together.
      
      Ordered workqueues are exempt from the low-level unbound workqueue
      cpumask for now; they will be handled in the near future.
      
      All the (default & per-node) pwqs are mandatorily controlled by
      the low-level cpumask.  If the user-configured cpumask doesn't overlap
      with the low-level cpumask, the low-level cpumask will be used for the
      wq instead.
      
      The comment of wq_calc_node_cpumask() is updated and explicitly
      requires that its first argument should be the attrs of the default
      pwq.
      
      The default wq_unbound_cpumask is cpu_possible_mask.  The workqueue
      subsystem doesn't know its best default value; let the system manager
      or another subsystem set it when needed.
      
      Changed from V8:
        merge the calculating code for the attrs of the default pwq together.
        minor changes to the code & comments for saving the user-configured attrs.
        remove unnecessary list_del().
        minor update to the comment of wq_calc_node_cpumask().
        update the comment of workqueue_set_unbound_cpumask().
      
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Mike Galbraith <bitbucket@online.de>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Original-patch-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      042f7df1
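      A minimal sketch of the in-kernel entry point this adds; error
      handling is trimmed and the CPU choice is illustrative:

        cpumask_var_t mask;
        int ret;

        if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
                return -ENOMEM;
        cpumask_setall(mask);
        cpumask_clear_cpu(0, mask);     /* e.g. keep unbound work off CPU 0 */
        ret = workqueue_set_unbound_cpumask(mask); /* prepare-then-commit inside */
        free_cpumask_var(mask);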
  13. 09 Mar 2015 (1 commit)
    • workqueue: dump workqueues on sysrq-t · 3494fc30
      Authored by Tejun Heo
      Workqueues are used extensively throughout the kernel but sometimes
      it's difficult to debug stalls involving work items because visibility
      into their inner workings is fairly limited.  Although sysrq-t task dump
      annotates each active worker task with the information on the work
      item being executed, it is challenging to find out which work items
      are pending or delayed on which queues and how pools are being
      managed.
      
      This patch implements show_workqueue_state() which dumps all busy
      workqueues and pools and is called from the sysrq-t handler.  At the
      end of sysrq-t dump, something like the following is printed.
      
       Showing busy workqueues and worker pools:
       ...
       workqueue filler_wq: flags=0x0
         pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256
           in-flight: 491:filler_workfn, 507:filler_workfn
         pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
           in-flight: 501:filler_workfn
           pending: filler_workfn
       ...
       workqueue test_wq: flags=0x8
         pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/1
           in-flight: 510(RESCUER):test_workfn BAR(69) BAR(500)
           delayed: test_workfn1 BAR(492), test_workfn2
       ...
       pool 0: cpus=0 node=0 flags=0x0 nice=0 workers=2 manager: 137
       pool 2: cpus=1 node=0 flags=0x0 nice=0 workers=3 manager: 469
       pool 3: cpus=1 node=0 flags=0x0 nice=-20 workers=2 idle: 16
       pool 8: cpus=0-3 flags=0x4 nice=0 workers=2 manager: 62
      
      The above shows that test_wq is executing test_workfn() on pid 510
      which is the rescuer and also that there are two tasks 69 and 500
      waiting for the work item to finish in flush_work().  As test_wq has
      max_active of 1, there are two work items for test_workfn1() and
      test_workfn2() which are delayed till the current work item is
      finished.  In addition, pid 492 is flushing test_workfn1().
      
      The work item for test_workfn() is being executed on pwq of pool 2
      which is the normal priority per-cpu pool for CPU 1.  The pool has
      three workers, two of which are executing filler_workfn() for
      filler_wq and the last one is assuming the manager role trying to
      create more workers.
      
      This extra workqueue state dump will hopefully help chasing down hangs
      involving workqueues.
      
      v3: cpulist_pr_cont() replaced with "%*pbl" printf formatting.
      
      v2: As suggested by Andrew, minor formatting change in pr_cont_work(),
          printk()'s replaced with pr_info()'s, and cpumask printing now
          uses cpulist_pr_cont().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      3494fc30
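      The dump helper takes no arguments, so it can also be wired into other
      debug paths; a hedged sketch (the caller below is hypothetical):

        void show_workqueue_state(void);        /* declared in linux/workqueue.h */

        static void my_hang_debug_hook(void)    /* hypothetical caller */
        {
                show_workqueue_state();         /* emits the dump shown above */
        }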
  14. 05 Mar 2015 (1 commit)
    • workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE · 8603e1b3
      Authored by Tejun Heo
      cancel[_delayed]_work_sync() are implemented using
      __cancel_work_timer() which grabs the PENDING bit using
      try_to_grab_pending() and then flushes the work item with PENDING set
      to prevent the on-going execution of the work item from requeueing
      itself.
      
      try_to_grab_pending() can always grab the PENDING bit without
      blocking except when someone else is doing the above flushing during
      cancelation.  In that case, try_to_grab_pending() returns -ENOENT and
      __cancel_work_timer() currently responds by invoking flush_work().
      The assumption is that the completion of the work item is what the
      other canceling task would be waiting for too, and thus waiting for
      the same condition and retrying should allow forward progress without
      excessive busy looping.
      
      Unfortunately, this doesn't work if preemption is disabled or the
      latter task has real time priority.  Let's say task A just got woken
      up from flush_work() by the completion of the target work item.  If,
      before task A starts executing, task B gets scheduled and invokes
      __cancel_work_timer() on the same work item, its try_to_grab_pending()
      will return -ENOENT as the work item is still being canceled by task A
      and flush_work() will also immediately return false as the work item
      is no longer executing.  This puts task B in a busy loop possibly
      preventing task A from executing and clearing the canceling state on
      the work item leading to a hang.
      
      task A			task B			worker
      
      						executing work
      __cancel_work_timer()
        try_to_grab_pending()
        set work CANCELING
        flush_work()
          block for work completion
      						completion, wakes up A
      			__cancel_work_timer()
      			while (forever) {
      			  try_to_grab_pending()
      			    -ENOENT as work is being canceled
      			  flush_work()
      			    false as work is no longer executing
      			}
      
      This patch removes the possible hang by updating __cancel_work_timer()
      to explicitly wait for clearing of CANCELING rather than invoking
      flush_work() after try_to_grab_pending() fails with -ENOENT.
      
      Link: http://lkml.kernel.org/g/20150206171156.GA8942@axis.com
      
      v3: bit_waitqueue() can't be used for work items defined in vmalloc
          area.  Switched to custom wake function which matches the target
          work item and exclusive wait and wakeup.
      
      v2: v1 used wake_up() on bit_waitqueue() which leads to NULL deref if
          the target bit waitqueue has wait_bit_queue's on it.  Use
          DEFINE_WAIT_BIT() and __wake_up_bit() instead.  Reported by Tomeu
          Vizoso.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Rabin Vincent <rabin.vincent@axis.com>
      Cc: Tomeu Vizoso <tomeu.vizoso@gmail.com>
      Cc: stable@vger.kernel.org
      Tested-by: Jesper Nilsson <jesper.nilsson@axis.com>
      Tested-by: Rabin Vincent <rabin.vincent@axis.com>
      8603e1b3
  15. 07 Jan 2015 (1 commit)
  16. 13 Sep 2014 (1 commit)
    • workqueue: apply __WQ_ORDERED to create_singlethread_workqueue() · e09c2c29
      Authored by Tejun Heo
      create_singlethread_workqueue() is a compat interface for single
      threaded workqueues which maps to an ordered workqueue w/ rescuer in
      the current implementation.  create_singlethread_workqueue() is
      currently implemented by invoking alloc_workqueue() w/ appropriate
      parameters.
      
      8719dcea ("workqueue: reject adjusting max_active or applying
      attrs to ordered workqueues") introduced __WQ_ORDERED to protect
      ordered workqueues against dynamic attribute changes which can break
      ordering guarantees but forgot to apply it to
      create_singlethread_workqueue().  This in itself is okay as nobody
      currently uses dynamic attribute change on workqueues created with
      create_singlethread_workqueue().
      
      However, 4c16bd32 ("workqueue: implement NUMA affinity for unbound
      workqueues") broke the single-threaded guarantee for ordered
      workqueues by allocating a separate pool_workqueue on each NUMA node
      by default.  A later change 8a2b7538 ("workqueue: fix ordered
      workqueues in NUMA setups") fixed it by allocating only one global
      pool_workqueue if __WQ_ORDERED is set.
      
      Combined, the __WQ_ORDERED omission in create_singlethread_workqueue()
      became critical, breaking both its single-threadedness and its
      ordering guarantee.
      
      Let's make create_singlethread_workqueue() wrap
      alloc_ordered_workqueue() instead so that it inherits __WQ_ORDERED and
      can implicitly track future ordered_workqueue changes.
      
      v2: I missed that __WQ_ORDERED now protects against pwq splitting
          across NUMA nodes and incorrectly described the patch as a
          nice-to-have fix to protect against future dynamic attribute
          usages.  Oleg pointed out that this is actually a critical
          breakage due to 8a2b7538 ("workqueue: fix ordered workqueues
          in NUMA setups").
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Mike Anderson <mike.anderson@us.ibm.com>
      Cc: Oleg Nesterov <onestero@redhat.com>
      Cc: Gustavo Luiz Duarte <gduarte@redhat.com>
      Cc: Tomas Henzl <thenzl@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: 4c16bd32 ("workqueue: implement NUMA affinity for unbound workqueues")
      e09c2c29
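      The resulting wrapper, roughly (the __WQ_LEGACY flag only joins it
      later, in 23d11a58 listed above):

        #define create_singlethread_workqueue(name)                     \
                alloc_ordered_workqueue("%s", WQ_MEM_RECLAIM, name)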
  17. 22 May 2014 (3 commits)
  18. 15 May 2014 (2 commits)
  19. 29 Mar 2014 (1 commit)
  20. 26 Mar 2014 (1 commit)
  21. 25 Mar 2014 (1 commit)
  22. 07 Mar 2014 (1 commit)
    • workqueue: remove PREPARE_[DELAYED_]WORK() · f073f922
      Authored by Tejun Heo
      Peter Hurley noticed that since a2c1c57b ("workqueue: consider
      work function when searching for busy work items"), a work item which
      gets assigned a different work function would break out of the
      non-reentrancy guarantee as workqueue would consider it a different
      work item.
      
      This is fragile and extremely subtle.  PREPARE_[DELAYED_]WORK() have
      never been widely used and their semantics have always been somewhat
      iffy.  If the work item is known not to be on a queue when
      PREPARE_WORK() is called, there's no difference from using
      INIT_WORK().  If the work item may be queued at the time of
      PREPARE_WORK(), we can't really tell whether the old or new function
      will be executed the next time.
      
      We really don't want this level of subtlety in workqueue interface for
      such marginal use cases.  The previous patches converted all existing
      users away from PREPARE_[DELAYED_]WORK().  Let's remove them.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Link: http://lkml.kernel.org/g/1392493119-9277-1-git-send-email-peter@hurleysoftware.com
      f073f922
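      The conversion pattern for former users, as a hedged sketch; 'dev' and
      retry_fn are hypothetical:

        /* before: re-point an existing, idle work item */
        PREPARE_WORK(&dev->work, retry_fn);
        queue_work(system_wq, &dev->work);

        /* after: re-initialize it, which is equivalent when the item is
         * known to be idle (the only safe case anyway) */
        INIT_WORK(&dev->work, retry_fn);
        queue_work(system_wq, &dev->work);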
  23. 19 Feb 2014 (1 commit)
  24. 14 Feb 2014 (1 commit)
    • workqueue: add args to workqueue lockdep name · fada94ee
      Authored by Li Zhong
      Tommi noticed a 'funny' lock class name: "%s#5" from a lock acquired in
      process_one_work().
      
      Maybe #fmt plus #args could be used as the lock_name to give more
      information for a fmt string like the one above.
      
      The __builtin_constant_p() check is removed (as there seems to be no
      good way to check all the variables in the args list).  However,
      removing the check only adds two additional double quotes for those
      constants.
      
      Some lockdep name examples printed out after the change:
      
      lockdep name                    wq->name
      
      "events_long"                   events_long
      "%s"("khelper")                 khelper
      "xfs-data/%s"mp->m_fsname       xfs-data/dm-3
      Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      fada94ee
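      A sketch of the effect at a call site, using the xfs example from the
      table above (the surrounding code is abridged):

        mp->m_data_workqueue = alloc_workqueue("xfs-data/%s",
                                               WQ_MEM_RECLAIM, 0, mp->m_fsname);
        /* lockdep class name: "xfs-data/%s"mp->m_fsname
         * wq->name:           xfs-data/dm-3                */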
  25. 30 Jul 2013 (1 commit)
  26. 04 Jul 2013 (1 commit)
  27. 15 May 2013 (2 commits)
  28. 01 May 2013 (1 commit)
    • workqueue: include workqueue info when printing debug dump of a worker task · 3d1cb205
      Authored by Tejun Heo
      One of the problems that arise when converting a dedicated custom
      threadpool to workqueue is that the shared worker pool used by workqueue
      anonymizes each worker, making it more difficult to identify what the
      worker was doing on which target from the output of sysrq-t or a debug
      dump from oops, BUG() and friends.
      
      This patch implements set_worker_desc() which can be called from any
      workqueue work function to set its description.  When the worker task is
      dumped for whatever reason - sysrq-t, WARN, BUG, oops, lockdep assertion
      and so on - the description will be printed out together with the
      workqueue name and the worker function pointer.
      
      The printing side is implemented by print_worker_info() which is called
      from functions in task dump paths - sched_show_task() and
      dump_stack_print_info().  print_worker_info() can be safely called on
      any task in any state as long as the task struct itself is accessible.
      It uses probe_*() functions to access worker fields.  It may print
      garbage if something went very wrong, but it wouldn't cause (another)
      oops.
      
      The description is currently limited to 24 bytes including the
      terminating \0.  worker->desc_valid and worker->desc[] are added, and
      the 64-byte marker which was already incorrect before adding the new
      fields is moved to the correct position.
      
      Here's an example dump with writeback updated to set the bdi name as
      worker desc.
      
       Hardware name: Bochs
       Modules linked in:
       Pid: 7, comm: kworker/u9:0 Not tainted 3.9.0-rc1-work+ #1
       Workqueue: writeback bdi_writeback_workfn (flush-8:0)
        ffffffff820a3ab0 ffff88000f6e9cb8 ffffffff81c61845 ffff88000f6e9cf8
        ffffffff8108f50f 0000000000000000 0000000000000000 ffff88000cde16b0
        ffff88000cde1aa8 ffff88001ee19240 ffff88000f6e9fd8 ffff88000f6e9d08
       Call Trace:
        [<ffffffff81c61845>] dump_stack+0x19/0x1b
        [<ffffffff8108f50f>] warn_slowpath_common+0x7f/0xc0
        [<ffffffff8108f56a>] warn_slowpath_null+0x1a/0x20
        [<ffffffff81200150>] bdi_writeback_workfn+0x2a0/0x3b0
       ...
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Acked-by: Jan Kara <jack@suse.cz>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3d1cb205
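      A hedged sketch modeled on the writeback conversion mentioned above: a
      work function labels its worker so task dumps can show the target
      (field access details are approximate):

        static void bdi_writeback_workfn(struct work_struct *work)
        {
                struct bdi_writeback *wb = container_of(to_delayed_work(work),
                                                        struct bdi_writeback, dwork);

                set_worker_desc("flush-%s", dev_name(wb->bdi->dev));
                /* ... the actual writeback work ... */
        }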
  29. 02 Apr 2013 (1 commit)
    • workqueue: update sysfs interface to reflect NUMA awareness and a kernel param to disable NUMA affinity · d55262c4
      Authored by Tejun Heo
      
      Unbound workqueues are now NUMA aware.  Let's add some control knobs
      and update sysfs interface accordingly.
      
      * Add the kernel param workqueue.disable_numa which disables NUMA
        affinity globally.
      
      * Replace the sysfs file "pool_id" with "pool_ids" which contains
        node:pool_id pairs.  This change is userland-visible but "pool_id"
        hasn't seen a release yet, so this is okay.
      
      * Add a new sysfs file "numa" which can toggle NUMA affinity on
        individual workqueues.  This is implemented as attrs->no_numa, which
        is special in that it isn't part of a pool's attributes.  It only
        affects how apply_workqueue_attrs() picks which pools to use.
      
      After "pool_ids" change, first_pwq() doesn't have any user left.
      Removed.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      d55262c4
  30. 14 Mar 2013 (1 commit)
    • workqueue: inline trivial wrappers · 8425e3d5
      Authored by Tejun Heo
      There's no reason to make these trivial wrappers full (exported)
      functions.  Inline the following:
      
       queue_work()
       queue_delayed_work()
       mod_delayed_work()
       schedule_work_on()
       schedule_work()
       schedule_delayed_work_on()
       schedule_delayed_work()
       keventd_up()
      Signed-off-by: Tejun Heo <tj@kernel.org>
      8425e3d5
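      For example, queue_work() becomes a static inline in
      include/linux/workqueue.h, roughly as follows (kernel-doc trimmed):

        static inline bool queue_work(struct workqueue_struct *wq,
                                      struct work_struct *work)
        {
                return queue_work_on(WORK_CPU_UNBOUND, wq, work);
        }

        static inline bool schedule_work(struct work_struct *work)
        {
                return queue_work(system_wq, work);
        }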
  31. 13 Mar 2013 (5 commits)
    • workqueue: implement current_is_workqueue_rescuer() · e6267616
      Authored by Tejun Heo
      Implement a function which queries whether the current task is running
      off a workqueue rescuer.  This will be used to convert writeback to
      workqueue.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      e6267616
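      A hedged sketch of the intended writeback-style use:

        if (current_is_workqueue_rescuer()) {
                /* running off the rescuer under memory pressure: keep
                 * making forward progress instead of waiting for more
                 * workers */
        }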
    • workqueue: implement sysfs interface for workqueues · 226223ab
      Authored by Tejun Heo
      There are cases where workqueue users want to expose control knobs to
      userland.  For example, unbound workqueues with custom attributes are
      scheduled to be used for writeback workers, and depending on
      configuration it can be useful to allow admins to tinker with the
      priority or allowed CPUs.
      
      This patch implements workqueue_sysfs_register(), which makes the
      workqueue visible under /sys/bus/workqueue/devices/WQ_NAME.  There
      currently are two attributes common to both per-cpu and unbound pools
      and extra attributes for unbound pools including nice level and
      cpumask.
      
      If alloc_workqueue*() is called with WQ_SYSFS,
      workqueue_sysfs_register() is called automatically as part of
      workqueue creation.  This is the preferred method unless the workqueue
      user wants to apply workqueue_attrs before making the workqueue
      visible to userland.
      
      v2: Disallow exposing ordered workqueues as ordered workqueues can't
          be tuned in any way.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      226223ab
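      A hedged sketch of opting in at creation time ("mywq" is
      illustrative):

        wq = alloc_workqueue("mywq", WQ_UNBOUND | WQ_SYSFS, 0);
        /* knobs then appear under /sys/bus/workqueue/devices/mywq/ */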
    • workqueue: reject adjusting max_active or applying attrs to ordered workqueues · 8719dcea
      Authored by Tejun Heo
      Adjusting max_active of or applying new workqueue_attrs to an ordered
      workqueue breaks its ordering guarantee.  The former is obvious.  The
      latter is because applying attrs creates a new pwq (pool_workqueue)
      and there is no ordering constraint between the old and new pwqs.
      
      Make apply_workqueue_attrs() and workqueue_set_max_active() trigger
      WARN_ON() if those operations are requested on an ordered workqueue
      and fail / ignore respectively.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      8719dcea
    • workqueue: make it clear that WQ_DRAINING is an internal flag · 618b01eb
      Authored by Tejun Heo
      We're gonna add another internal WQ flag.  Let's make the distinction
      clear.  Prefix WQ_DRAINING with __ and move it to bit 16.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      618b01eb
    • workqueue: implement apply_workqueue_attrs() · 9e8cd2f5
      Authored by Tejun Heo
      Implement apply_workqueue_attrs() which applies workqueue_attrs to the
      specified unbound workqueue by creating a new pwq (pool_workqueue)
      linked to worker_pool with the specified attributes.
      
      A new pwq is linked at the head of wq->pwqs instead of tail and
      __queue_work() verifies that the first unbound pwq has positive refcnt
      before choosing it for the actual queueing.  This is to cover the case
      where creation of a new pwq races with queueing.  As base ref on a pwq
      won't be dropped without making another pwq the first one,
      __queue_work() is guaranteed to make progress and not add work item to
      a dead pwq.
      
      init_and_link_pwq() is updated to return the old first pwq which the
      new pwq replaced; it is then put by apply_workqueue_attrs().
      
      Note that apply_workqueue_attrs() is almost identical to the unbound
      pwq part of alloc_and_link_pwqs().  The only difference is that there
      is no previous first pwq.  apply_workqueue_attrs() is implemented to
      handle such cases and replaces the unbound pwq handling in
      alloc_and_link_pwqs().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      9e8cd2f5
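      A minimal sketch of the new entry point; error handling is trimmed and
      the nice value and 'unbound_wq' are illustrative:

        struct workqueue_attrs *attrs;
        int ret;

        attrs = alloc_workqueue_attrs(GFP_KERNEL);
        if (!attrs)
                return -ENOMEM;
        attrs->nice = -5;                       /* higher-priority pool */
        ret = apply_workqueue_attrs(unbound_wq, attrs); /* swaps in a new pwq */
        free_workqueue_attrs(attrs);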