1. 21 3月, 2019 1 次提交
  2. 09 3月, 2019 2 次提交
    • Q
      workqueue, lockdep: Fix a memory leak in wq->lock_name · 69a106c0
      Qian Cai 提交于
      The following commit:
      
        669de8bd ("kernel/workqueue: Use dynamic lockdep keys for workqueues")
      
      introduced a memory leak as wq_free_lockdep() calls kfree(wq->lock_name),
      but wq_init_lockdep() does not point wq->lock_name to the newly allocated
      slab object.
      
      This can be reproduced by running LTP fallocate04 followed by oom01 tests:
      
       unreferenced object 0xc0000005876384d8 (size 64):
        comm "fallocate04", pid 26972, jiffies 4297139141 (age 40370.480s)
        hex dump (first 32 bytes):
          28 77 71 5f 63 6f 6d 70 6c 65 74 69 6f 6e 29 65  (wq_completion)e
          78 74 34 2d 72 73 76 2d 63 6f 6e 76 65 72 73 69  xt4-rsv-conversi
        backtrace:
          [<00000000cb452883>] kvasprintf+0x6c/0xe0
          [<000000004654ddac>] kasprintf+0x34/0x60
          [<000000001c68f311>] alloc_workqueue+0x1f8/0x6ac
          [<0000000003c2ad83>] ext4_fill_super+0x23d4/0x3c80 [ext4]
          [<0000000006610538>] mount_bdev+0x25c/0x290
          [<00000000bcf955ec>] ext4_mount+0x28/0x50 [ext4]
          [<0000000016e08fd3>] legacy_get_tree+0x4c/0xb0
          [<0000000042b6a5fc>] vfs_get_tree+0x6c/0x190
          [<00000000268ab022>] do_mount+0xb9c/0x1100
          [<00000000698e6898>] ksys_mount+0x158/0x180
          [<0000000064e391fd>] sys_mount+0x20/0x30
          [<00000000ba378f12>] system_call+0x5c/0x70
      Signed-off-by: NQian Cai <cai@lca.pw>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: NBart Van Assche <bvanassche@acm.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: catalin.marinas@arm.com
      Cc: jiangshanlai@gmail.com
      Cc: tj@kernel.org
      Fixes: 669de8bd ("kernel/workqueue: Use dynamic lockdep keys for workqueues")
      Link: https://lkml.kernel.org/r/20190307002731.47371-1-cai@lca.pwSigned-off-by: NIngo Molnar <mingo@kernel.org>
      69a106c0
    • B
      workqueue, lockdep: Fix an alloc_workqueue() error path · 009bb421
      Bart Van Assche 提交于
      This patch fixes a use-after-free and a memory leak in an alloc_workqueue()
      error path.
      
      Repoted by syzkaller and KASAN:
      
        BUG: KASAN: use-after-free in __read_once_size include/linux/compiler.h:197 [inline]
        BUG: KASAN: use-after-free in lockdep_register_key+0x3b9/0x490 kernel/locking/lockdep.c:1023
        Read of size 8 at addr ffff888090fc2698 by task syz-executor134/7858
      
        CPU: 1 PID: 7858 Comm: syz-executor134 Not tainted 5.0.0-rc8-next-20190301 #1
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        Call Trace:
         __dump_stack lib/dump_stack.c:77 [inline]
         dump_stack+0x172/0x1f0 lib/dump_stack.c:113
         print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
         kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
         __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
         __read_once_size include/linux/compiler.h:197 [inline]
         lockdep_register_key+0x3b9/0x490 kernel/locking/lockdep.c:1023
         wq_init_lockdep kernel/workqueue.c:3444 [inline]
         alloc_workqueue+0x427/0xe70 kernel/workqueue.c:4263
         ucma_open+0x76/0x290 drivers/infiniband/core/ucma.c:1732
         misc_open+0x398/0x4c0 drivers/char/misc.c:141
         chrdev_open+0x247/0x6b0 fs/char_dev.c:417
         do_dentry_open+0x488/0x1160 fs/open.c:771
         vfs_open+0xa0/0xd0 fs/open.c:880
         do_last fs/namei.c:3416 [inline]
         path_openat+0x10e9/0x46e0 fs/namei.c:3533
         do_filp_open+0x1a1/0x280 fs/namei.c:3563
         do_sys_open+0x3fe/0x5d0 fs/open.c:1063
         __do_sys_openat fs/open.c:1090 [inline]
         __se_sys_openat fs/open.c:1084 [inline]
         __x64_sys_openat+0x9d/0x100 fs/open.c:1084
         do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
        Allocated by task 7789:
         save_stack+0x45/0xd0 mm/kasan/common.c:75
         set_track mm/kasan/common.c:87 [inline]
         __kasan_kmalloc mm/kasan/common.c:497 [inline]
         __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:470
         kasan_kmalloc+0x9/0x10 mm/kasan/common.c:511
         __do_kmalloc mm/slab.c:3726 [inline]
         __kmalloc+0x15c/0x740 mm/slab.c:3735
         kmalloc include/linux/slab.h:553 [inline]
         kzalloc include/linux/slab.h:743 [inline]
         alloc_workqueue+0x13c/0xe70 kernel/workqueue.c:4236
         ucma_open+0x76/0x290 drivers/infiniband/core/ucma.c:1732
         misc_open+0x398/0x4c0 drivers/char/misc.c:141
         chrdev_open+0x247/0x6b0 fs/char_dev.c:417
         do_dentry_open+0x488/0x1160 fs/open.c:771
         vfs_open+0xa0/0xd0 fs/open.c:880
         do_last fs/namei.c:3416 [inline]
         path_openat+0x10e9/0x46e0 fs/namei.c:3533
         do_filp_open+0x1a1/0x280 fs/namei.c:3563
         do_sys_open+0x3fe/0x5d0 fs/open.c:1063
         __do_sys_openat fs/open.c:1090 [inline]
         __se_sys_openat fs/open.c:1084 [inline]
         __x64_sys_openat+0x9d/0x100 fs/open.c:1084
         do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
        Freed by task 7789:
         save_stack+0x45/0xd0 mm/kasan/common.c:75
         set_track mm/kasan/common.c:87 [inline]
         __kasan_slab_free+0x102/0x150 mm/kasan/common.c:459
         kasan_slab_free+0xe/0x10 mm/kasan/common.c:467
         __cache_free mm/slab.c:3498 [inline]
         kfree+0xcf/0x230 mm/slab.c:3821
         alloc_workqueue+0xc3e/0xe70 kernel/workqueue.c:4295
         ucma_open+0x76/0x290 drivers/infiniband/core/ucma.c:1732
         misc_open+0x398/0x4c0 drivers/char/misc.c:141
         chrdev_open+0x247/0x6b0 fs/char_dev.c:417
         do_dentry_open+0x488/0x1160 fs/open.c:771
         vfs_open+0xa0/0xd0 fs/open.c:880
         do_last fs/namei.c:3416 [inline]
         path_openat+0x10e9/0x46e0 fs/namei.c:3533
         do_filp_open+0x1a1/0x280 fs/namei.c:3563
         do_sys_open+0x3fe/0x5d0 fs/open.c:1063
         __do_sys_openat fs/open.c:1090 [inline]
         __se_sys_openat fs/open.c:1084 [inline]
         __x64_sys_openat+0x9d/0x100 fs/open.c:1084
         do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
        The buggy address belongs to the object at ffff888090fc2580
         which belongs to the cache kmalloc-512 of size 512
        The buggy address is located 280 bytes inside of
         512-byte region [ffff888090fc2580, ffff888090fc2780)
      
      Reported-by: syzbot+17335689e239ce135d8b@syzkaller.appspotmail.com
      Signed-off-by: NBart Van Assche <bvanassche@acm.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Fixes: 669de8bd ("kernel/workqueue: Use dynamic lockdep keys for workqueues")
      Link: https://lkml.kernel.org/r/20190303220046.29448-1-bvanassche@acm.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      009bb421
  3. 08 3月, 2019 1 次提交
  4. 05 3月, 2019 1 次提交
  5. 28 2月, 2019 1 次提交
    • B
      kernel/workqueue: Use dynamic lockdep keys for workqueues · 669de8bd
      Bart Van Assche 提交于
      The following commit:
      
        87915adc ("workqueue: re-add lockdep dependencies for flushing")
      
      improved deadlock checking in the workqueue implementation. Unfortunately
      that patch also introduced a few false positive lockdep complaints.
      
      This patch suppresses these false positives by allocating the workqueue mutex
      lockdep key dynamically.
      
      An example of a false positive lockdep complaint suppressed by this patch
      can be found below. The root cause of the lockdep complaint shown below
      is that the direct I/O code can call alloc_workqueue() from inside a work
      item created by another alloc_workqueue() call and that both workqueues
      share the same lockdep key. This patch avoids that that lockdep complaint
      is triggered by allocating the work queue lockdep keys dynamically.
      
      In other words, this patch guarantees that a unique lockdep key is
      associated with each work queue mutex.
      
        ======================================================
        WARNING: possible circular locking dependency detected
        4.19.0-dbg+ #1 Not tainted
        fio/4129 is trying to acquire lock:
        00000000a01cfe1a ((wq_completion)"dio/%s"sb->s_id){+.+.}, at: flush_workqueue+0xd0/0x970
      
        but task is already holding lock:
        00000000a0acecf9 (&sb->s_type->i_mutex_key#14){+.+.}, at: ext4_file_write_iter+0x154/0x710
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #2 (&sb->s_type->i_mutex_key#14){+.+.}:
               down_write+0x3d/0x80
               __generic_file_fsync+0x77/0xf0
               ext4_sync_file+0x3c9/0x780
               vfs_fsync_range+0x66/0x100
               dio_complete+0x2f5/0x360
               dio_aio_complete_work+0x1c/0x20
               process_one_work+0x481/0x9f0
               worker_thread+0x63/0x5a0
               kthread+0x1cf/0x1f0
               ret_from_fork+0x24/0x30
      
        -> #1 ((work_completion)(&dio->complete_work)){+.+.}:
               process_one_work+0x447/0x9f0
               worker_thread+0x63/0x5a0
               kthread+0x1cf/0x1f0
               ret_from_fork+0x24/0x30
      
        -> #0 ((wq_completion)"dio/%s"sb->s_id){+.+.}:
               lock_acquire+0xc5/0x200
               flush_workqueue+0xf3/0x970
               drain_workqueue+0xec/0x220
               destroy_workqueue+0x23/0x350
               sb_init_dio_done_wq+0x6a/0x80
               do_blockdev_direct_IO+0x1f33/0x4be0
               __blockdev_direct_IO+0x79/0x86
               ext4_direct_IO+0x5df/0xbb0
               generic_file_direct_write+0x119/0x220
               __generic_file_write_iter+0x131/0x2d0
               ext4_file_write_iter+0x3fa/0x710
               aio_write+0x235/0x330
               io_submit_one+0x510/0xeb0
               __x64_sys_io_submit+0x122/0x340
               do_syscall_64+0x71/0x220
               entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
        other info that might help us debug this:
      
        Chain exists of:
          (wq_completion)"dio/%s"sb->s_id --> (work_completion)(&dio->complete_work) --> &sb->s_type->i_mutex_key#14
      
         Possible unsafe locking scenario:
      
               CPU0                    CPU1
               ----                    ----
          lock(&sb->s_type->i_mutex_key#14);
                                       lock((work_completion)(&dio->complete_work));
                                       lock(&sb->s_type->i_mutex_key#14);
          lock((wq_completion)"dio/%s"sb->s_id);
      
         *** DEADLOCK ***
      
        1 lock held by fio/4129:
         #0: 00000000a0acecf9 (&sb->s_type->i_mutex_key#14){+.+.}, at: ext4_file_write_iter+0x154/0x710
      
        stack backtrace:
        CPU: 3 PID: 4129 Comm: fio Not tainted 4.19.0-dbg+ #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
        Call Trace:
         dump_stack+0x86/0xc5
         print_circular_bug.isra.32+0x20a/0x218
         __lock_acquire+0x1c68/0x1cf0
         lock_acquire+0xc5/0x200
         flush_workqueue+0xf3/0x970
         drain_workqueue+0xec/0x220
         destroy_workqueue+0x23/0x350
         sb_init_dio_done_wq+0x6a/0x80
         do_blockdev_direct_IO+0x1f33/0x4be0
         __blockdev_direct_IO+0x79/0x86
         ext4_direct_IO+0x5df/0xbb0
         generic_file_direct_write+0x119/0x220
         __generic_file_write_iter+0x131/0x2d0
         ext4_file_write_iter+0x3fa/0x710
         aio_write+0x235/0x330
         io_submit_one+0x510/0xeb0
         __x64_sys_io_submit+0x122/0x340
         do_syscall_64+0x71/0x220
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      Signed-off-by: NBart Van Assche <bvanassche@acm.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Johannes Berg <johannes.berg@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: https://lkml.kernel.org/r/20190214230058.196511-20-bvanassche@acm.org
      [ Reworked the changelog a bit. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      669de8bd
  6. 22 2月, 2019 1 次提交
  7. 02 2月, 2019 1 次提交
    • J
      psi: fix aggregation idle shut-off · 1b69ac6b
      Johannes Weiner 提交于
      psi has provisions to shut off the periodic aggregation worker when
      there is a period of no task activity - and thus no data that needs
      aggregating.  However, while developing psi monitoring, Suren noticed
      that the aggregation clock currently won't stay shut off for good.
      
      Debugging this revealed a flaw in the idle design: an aggregation run
      will see no task activity and decide to go to sleep; shortly thereafter,
      the kworker thread that executed the aggregation will go idle and cause
      a scheduling change, during which the psi callback will kick the
      !pending worker again.  This will ping-pong forever, and is equivalent
      to having no shut-off logic at all (but with more code!)
      
      Fix this by exempting aggregation workers from psi's clock waking logic
      when the state change is them going to sleep.  To do this, tag workers
      with the last work function they executed, and if in psi we see a worker
      going to sleep after aggregating psi data, we will not reschedule the
      aggregation work item.
      
      What if the worker is also executing other items before or after?
      
      Any psi state times that were incurred by work items preceding the
      aggregation work will have been collected from the per-cpu buckets
      during the aggregation itself.  If there are work items following the
      aggregation work, the worker's last_func tag will be overwritten and the
      aggregator will be kept alive to process this genuine new activity.
      
      If the aggregation work is the last thing the worker does, and we decide
      to go idle, the brief period of non-idle time incurred between the
      aggregation run and the kworker's dequeue will be stranded in the
      per-cpu buckets until the clock is woken by later activity.  But that
      should not be a problem.  The buckets can hold 4s worth of time, and
      future activity will wake the clock with a 2s delay, giving us 2s worth
      of data we can leave behind when disabling aggregation.  If it takes a
      worker more than two seconds to go idle after it finishes its last work
      item, we likely have bigger problems in the system, and won't notice one
      sample that was averaged with a bogus per-CPU weight.
      
      Link: http://lkml.kernel.org/r/20190116193501.1910-1-hannes@cmpxchg.org
      Fixes: eb414681 ("psi: pressure stall information for CPU, memory, and IO")
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Reported-by: NSuren Baghdasaryan <surenb@google.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1b69ac6b
  8. 31 1月, 2019 1 次提交
  9. 25 1月, 2019 1 次提交
  10. 28 11月, 2018 1 次提交
  11. 30 8月, 2018 1 次提交
  12. 22 8月, 2018 2 次提交
    • J
      workqueue: re-add lockdep dependencies for flushing · 87915adc
      Johannes Berg 提交于
      In flush_work(), we need to create a lockdep dependency so that
      the following scenario is appropriately tagged as a problem:
      
        work_function()
        {
          mutex_lock(&mutex);
          ...
        }
      
        other_function()
        {
          mutex_lock(&mutex);
          flush_work(&work); // or cancel_work_sync(&work);
        }
      
      This is a problem since the work might be running and be blocked
      on trying to acquire the mutex.
      
      Similarly, in flush_workqueue().
      
      These were removed after cross-release partially caught these
      problems, but now cross-release was reverted anyway. IMHO the
      removal was erroneous anyway though, since lockdep should be
      able to catch potential problems, not just actual ones, and
      cross-release would only have caught the problem when actually
      invoking wait_for_completion().
      
      Fixes: fd1a5b04 ("workqueue: Remove now redundant lock acquisitions wrt. workqueue flushes")
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      87915adc
    • J
      workqueue: skip lockdep wq dependency in cancel_work_sync() · d6e89786
      Johannes Berg 提交于
      In cancel_work_sync(), we can only have one of two cases, even
      with an ordered workqueue:
       * the work isn't running, just cancelled before it started
       * the work is running, but then nothing else can be on the
         workqueue before it
      
      Thus, we need to skip the lockdep workqueue dependency handling,
      otherwise we get false positive reports from lockdep saying that
      we have a potential deadlock when the workqueue also has other
      work items with locking, e.g.
      
        work1_function() { mutex_lock(&mutex); ... }
        work2_function() { /* nothing */ }
      
        other_function() {
          queue_work(ordered_wq, &work1);
          queue_work(ordered_wq, &work2);
          mutex_lock(&mutex);
          cancel_work_sync(&work2);
        }
      
      As described above, this isn't a problem, but lockdep will
      currently flag it as if cancel_work_sync() was flush_work(),
      which *is* a problem.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      d6e89786
  13. 13 6月, 2018 1 次提交
    • K
      treewide: kzalloc() -> kcalloc() · 6396bb22
      Kees Cook 提交于
      The kzalloc() function has a 2-factor argument form, kcalloc(). This
      patch replaces cases of:
      
              kzalloc(a * b, gfp)
      
      with:
              kcalloc(a * b, gfp)
      
      as well as handling cases of:
      
              kzalloc(a * b * c, gfp)
      
      with:
      
              kzalloc(array3_size(a, b, c), gfp)
      
      as it's slightly less ugly than:
      
              kzalloc_array(array_size(a, b), c, gfp)
      
      This does, however, attempt to ignore constant size factors like:
      
              kzalloc(4 * 1024, gfp)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kzalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kzalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kzalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kzalloc
      + kcalloc
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kzalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kzalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kzalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc(C1 * C2 * C3, ...)
      |
        kzalloc(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc(sizeof(THING) * C2, ...)
      |
        kzalloc(sizeof(TYPE) * C2, ...)
      |
        kzalloc(C1 * C2 * C3, ...)
      |
        kzalloc(C1 * C2, ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      Signed-off-by: NKees Cook <keescook@chromium.org>
      6396bb22
  14. 07 6月, 2018 1 次提交
    • K
      treewide: Use struct_size() for kmalloc()-family · acafe7e3
      Kees Cook 提交于
      One of the more common cases of allocation size calculations is finding
      the size of a structure that has a zero-sized array at the end, along
      with memory for some number of elements for that array. For example:
      
      struct foo {
          int stuff;
          void *entry[];
      };
      
      instance = kmalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);
      
      Instead of leaving these open-coded and prone to type mistakes, we can
      now use the new struct_size() helper:
      
      instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL);
      
      This patch makes the changes for kmalloc()-family (and kvmalloc()-family)
      uses. It was done via automatic conversion with manual review for the
      "CHECKME" non-standard cases noted below, using the following Coccinelle
      script:
      
      // pkey_cache = kmalloc(sizeof *pkey_cache + tprops->pkey_tbl_len *
      //                      sizeof *pkey_cache->table, GFP_KERNEL);
      @@
      identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
      expression GFP;
      identifier VAR, ELEMENT;
      expression COUNT;
      @@
      
      - alloc(sizeof(*VAR) + COUNT * sizeof(*VAR->ELEMENT), GFP)
      + alloc(struct_size(VAR, ELEMENT, COUNT), GFP)
      
      // mr = kzalloc(sizeof(*mr) + m * sizeof(mr->map[0]), GFP_KERNEL);
      @@
      identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
      expression GFP;
      identifier VAR, ELEMENT;
      expression COUNT;
      @@
      
      - alloc(sizeof(*VAR) + COUNT * sizeof(VAR->ELEMENT[0]), GFP)
      + alloc(struct_size(VAR, ELEMENT, COUNT), GFP)
      
      // Same pattern, but can't trivially locate the trailing element name,
      // or variable name.
      @@
      identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
      expression GFP;
      expression SOMETHING, COUNT, ELEMENT;
      @@
      
      - alloc(sizeof(SOMETHING) + COUNT * sizeof(ELEMENT), GFP)
      + alloc(CHECKME_struct_size(&SOMETHING, ELEMENT, COUNT), GFP)
      Signed-off-by: NKees Cook <keescook@chromium.org>
      acafe7e3
  15. 24 5月, 2018 1 次提交
    • M
      workqueue: move function definitions within CONFIG_SMP block · 66448bc2
      Mathieu Malaterre 提交于
      In commit 7ee681b2 ("workqueue: Convert to state machine callbacks"),
      three new function definitions were added: ‘workqueue_prepare_cpu’,
      ‘workqueue_online_cpu’ and ‘workqueue_offline_cpu’.
      
      Move these function definitions within a CONFIG_SMP block since they are
      not used outside of it. This will match function declarations in header
      <include/linux/workqueue.h>, and silence the following gcc warning (W=1):
      
        kernel/workqueue.c:4743:5: warning: no previous prototype for ‘workqueue_prepare_cpu’ [-Wmissing-prototypes]
        kernel/workqueue.c:4756:5: warning: no previous prototype for ‘workqueue_online_cpu’ [-Wmissing-prototypes]
        kernel/workqueue.c:4783:5: warning: no previous prototype for ‘workqueue_offline_cpu’ [-Wmissing-prototypes]
      Signed-off-by: NMathieu Malaterre <malat@debian.org>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      66448bc2
  16. 21 5月, 2018 1 次提交
  17. 18 5月, 2018 5 次提交
    • T
      workqueue: Show the latest workqueue name in /proc/PID/{comm,stat,status} · 6b59808b
      Tejun Heo 提交于
      There can be a lot of workqueue workers and they all show up with the
      cryptic kworker/* names making it difficult to understand which is
      doing what and how they came to be.
      
        # ps -ef | grep kworker
        root           4       2  0 Feb25 ?        00:00:00 [kworker/0:0H]
        root           6       2  0 Feb25 ?        00:00:00 [kworker/u112:0]
        root          19       2  0 Feb25 ?        00:00:00 [kworker/1:0H]
        root          25       2  0 Feb25 ?        00:00:00 [kworker/2:0H]
        root          31       2  0 Feb25 ?        00:00:00 [kworker/3:0H]
        ...
      
      This patch makes workqueue workers report the latest workqueue it was
      executing for through /proc/PID/{comm,stat,status}.  The extra
      information is appended to the kthread name with intervening '+' if
      currently executing, otherwise '-'.
      
        # cat /proc/25/comm
        kworker/2:0-events_power_efficient
        # cat /proc/25/stat
        25 (kworker/2:0-events_power_efficient) I 2 0 0 0 -1 69238880 0 0...
        # grep Name /proc/25/status
        Name:   kworker/2:0-events_power_efficient
      
      Unfortunately, ps(1) truncates comm to 15 characters,
      
        # ps 25
          PID TTY      STAT   TIME COMMAND
           25 ?        I      0:00 [kworker/2:0-eve]
      
      making it a lot less useful; however, this should be an easy fix from
      ps(1) side.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Craig Small <csmall@enc.com.au>
      6b59808b
    • T
      workqueue: Set worker->desc to workqueue name by default · 8bf89593
      Tejun Heo 提交于
      Work functions can use set_worker_desc() to improve the visibility of
      what the worker task is doing.  Currently, the desc field is unset at
      the beginning of each execution and there is a separate field to track
      the field is set during the current execution.
      
      Instead of leaving empty till desc is set, worker->desc can be used to
      remember the last workqueue the worker worked on by default and users
      that use set_worker_desc() can override it to something more
      informative as necessary.
      
      This simplifies desc handling and helps tracking the last workqueue
      that the worker exected on to improve visibility.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      8bf89593
    • T
      workqueue: Make worker_attach/detach_pool() update worker->pool · a2d812a2
      Tejun Heo 提交于
      For historical reasons, the worker attach/detach functions don't
      currently manage worker->pool and the callers are manually and
      inconsistently updating it.
      
      This patch moves worker->pool updates into the worker attach/detach
      functions.  This makes worker->pool consistent and clearly defines how
      worker->pool updates are synchronized.
      
      This will help later workqueue visibility improvements by allowing
      safe access to workqueue information from worker->task.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      a2d812a2
    • T
      workqueue: Replace pool->attach_mutex with global wq_pool_attach_mutex · 1258fae7
      Tejun Heo 提交于
      To improve workqueue visibility, we want to be able to access
      workqueue information from worker tasks.  The per-pool attach mutex
      makes that difficult because there's no way of stabilizing task ->
      worker pool association without knowing the pool first.
      
      Worker attach/detach is a slow path and there's no need for different
      pools to be able to perform them concurrently.  This patch replaces
      the per-pool attach_mutex with global wq_pool_attach_mutex to prepare
      for visibility improvement changes.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      1258fae7
    • S
      scsi: zfcp: workqueue: set description for port work items with their WWPN as context · 5c750d58
      Steffen Maier 提交于
      As a prerequisite, complement commit 3d1cb205 ("workqueue: include
      workqueue info when printing debug dump of a worker task") to be usable with
      kernel modules by exporting the symbol set_worker_desc().  Current built-in
      user was introduced with commit ef3b1019 ("writeback: set worker desc to
      identify writeback workers in task dumps").
      
      Can help distinguishing work items which do not have adapter scope.
      Description is printed out with task dump for debugging on WARN, BUG, panic,
      or magic-sysrq [show-task-states(t)].
      
      Example:
      $ echo 0 >| /sys/bus/ccw/drivers/zfcp/0.0.1880/0x50050763031bd327/failed &
      $ echo 't' >| /proc/sysrq-trigger
      $ dmesg
      sysrq: SysRq : Show State
        task                        PC stack   pid father
      ...
      zfcp_q_0.0.1880 S14640  2165      2 0x02000000
      Call Trace:
      ([<00000000009df464>] __schedule+0xbf4/0xc78)
       [<00000000009df57c>] schedule+0x94/0xc0
       [<0000000000168654>] rescuer_thread+0x33c/0x3a0
       [<000000000016f8be>] kthread+0x166/0x178
       [<00000000009e71f2>] kernel_thread_starter+0x6/0xc
       [<00000000009e71ec>] kernel_thread_starter+0x0/0xc
      no locks held by zfcp_q_0.0.1880/2165.
      ...
      kworker/u512:2  D11280  2193      2 0x02000000
      Workqueue: zfcp_q_0.0.1880 zfcp_scsi_rport_work [zfcp] (zrpd-50050763031bd327)
                                                              ^^^^^^^^^^^^^^^^^^^^^
      Call Trace:
      ([<00000000009df464>] __schedule+0xbf4/0xc78)
       [<00000000009df57c>] schedule+0x94/0xc0
       [<00000000009e50c0>] schedule_timeout+0x488/0x4d0
       [<00000000001e425c>] msleep+0x5c/0x78                  >>test code only<<
       [<000003ff8008a21e>] zfcp_scsi_rport_work+0xbe/0x100 [zfcp]
       [<0000000000167154>] process_one_work+0x3b4/0x718
       [<000000000016771c>] worker_thread+0x264/0x408
       [<000000000016f8be>] kthread+0x166/0x178
       [<00000000009e71f2>] kernel_thread_starter+0x6/0xc
       [<00000000009e71ec>] kernel_thread_starter+0x0/0xc
      2 locks held by kworker/u512:2/2193:
       #0:  (name){++++.+}, at: [<0000000000166f4e>] process_one_work+0x1ae/0x718
       #1:  ((&(&port->rport_work)->work)){+.+.+.}, at: [<0000000000166f4e>] process_one_work+0x1ae/0x718
      ...
      
      =============================================
      Showing busy workqueues and worker pools:
      workqueue zfcp_q_0.0.1880: flags=0x2000a
        pwq 512: cpus=0-255 flags=0x4 nice=0 active=1/1
          in-flight: 2193:zfcp_scsi_rport_work [zfcp]
      pool 512: cpus=0-255 flags=0x4 nice=0 hung=0s workers=4 idle: 5 2354 2311
      
      Work items with adapter scope are already identified by the workqueue name
      "zfcp_q_<devbusid>" and the work item function name.
      Signed-off-by: NSteffen Maier <maier@linux.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Reviewed-by: NBenjamin Block <bblock@linux.ibm.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
      5c750d58
  18. 21 3月, 2018 2 次提交
  19. 20 3月, 2018 1 次提交
    • T
      RCU, workqueue: Implement rcu_work · 05f0fe6b
      Tejun Heo 提交于
      There are cases where RCU callback needs to be bounced to a sleepable
      context.  This is currently done by the RCU callback queueing a work
      item, which can be cumbersome to write and confusing to read.
      
      This patch introduces rcu_work, a workqueue work variant which gets
      executed after a RCU grace period, and converts the open coded
      bouncing in fs/aio and kernel/cgroup.
      
      v3: Dropped queue_rcu_work_on().  Documented rcu grace period behavior
          after queue_rcu_work().
      
      v2: Use rcu_barrier() instead of synchronize_rcu() to wait for
          completion of previously queued rcu callback as per Paul.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: N"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      05f0fe6b
  20. 14 3月, 2018 2 次提交
  21. 21 2月, 2018 1 次提交
  22. 17 2月, 2018 1 次提交
  23. 15 1月, 2018 1 次提交
    • N
      staging: lustre: lnet: convert selftest to use workqueues · 6106c0f8
      NeilBrown 提交于
      Instead of the cfs workitem library, use workqueues.
      
      As lnet wants to provide a cpu mask of allowed cpus, it
      needs to be a WQ_UNBOUND work queue so that tasks can
      run on cpus other than where they were submitted.
      
      This patch also exported apply_workqueue_attrs() which is
      a documented part of the workqueue API, that isn't currently
      exported.  lustre needs it to allow workqueue thread to be limited
      to a subset of CPUs.
      
      Acked-by: Tejun Heo <tj@kernel.org> (for export of apply_workqueue_attrs)
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6106c0f8
  24. 13 1月, 2018 1 次提交
  25. 08 1月, 2018 2 次提交
    • T
      workqueue: allow WQ_MEM_RECLAIM on early init workqueues · 40c17f75
      Tejun Heo 提交于
      Workqueues can be created early during boot before workqueue subsystem
      in fully online - work items are queued waiting for later full
      initialization.  However, early init wasn't supported for
      WQ_MEM_RECLAIM workqueues causing unnecessary annoyances for a subset
      of users.  Expand early init support to include WQ_MEM_RECLAIM
      workqueues.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      40c17f75
    • T
      workqueue: separate out init_rescuer() · 983c7515
      Tejun Heo 提交于
      Separate out init_rescuer() from __alloc_workqueue_key() to prepare
      for early init support for WQ_MEM_RECLAIM.  This patch doesn't
      introduce any functional changes.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      983c7515
  26. 11 12月, 2017 1 次提交
  27. 05 12月, 2017 3 次提交
  28. 28 11月, 2017 1 次提交
  29. 22 11月, 2017 1 次提交
    • K
      treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE casts · 841b86f3
      Kees Cook 提交于
      With all callbacks converted, and the timer callback prototype
      switched over, the TIMER_FUNC_TYPE cast is no longer needed,
      so remove it. Conversion was done with the following scripts:
      
          perl -pi -e 's|\(TIMER_FUNC_TYPE\)||g' \
              $(git grep TIMER_FUNC_TYPE | cut -d: -f1 | sort -u)
      
          perl -pi -e 's|\(TIMER_DATA_TYPE\)||g' \
              $(git grep TIMER_DATA_TYPE | cut -d: -f1 | sort -u)
      
      The now unused macros are also dropped from include/linux/timer.h.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      841b86f3